2
exposed to higher risks. Therefore, the German Federal Ministry of Labour and Social Affairs
implemented a guideline to lower the risk of accidents in short-term roadwork sites. However, Hands-
on experience shows that this is difficult to observe for maintenance workers, as there are many
situations in which the “No-Entry Zone” is entered, consciously or unconsciously.
Our solution addresses this problem by trying to detect the approaching vehicle from both front and rear
directions based on computer vision and trigger a warning signal to alert the maintenance personnel. To
reach that aim, we utilize the recent groundbreaking deep learning-based Object detection for detecting
approaching vehicles on the road in both directions. Furthermore, the system can track such vehicles
and record their trajectories. After tracking vehicles, we utilize our traffic flow check algorithm to filter
unnecessary warnings.
Figure 2: Architecture of the vision-based warning system
There are two inputs to the system. One is a video streaming of a front-mounted conventional RGB
camera, and another is a video streaming of the rear-mounted conventional RGB camera. With these
two cameras, we can observe both directions of the construction site. The whole system is composed
of three main components: (1) a fast detector, (2) a detector-based tracker (3) a traffic flow check
algorithm (see Figure 2).
2. Related works
Vehicle detection is a considerably mature and proven technology that is crucial for the whole system.
Traditional detectors combine the sliding window method with traditional machine learning classifiers
[4, 5]. Such methods rely on handcrafted features, require much feature engineering work, and cannot
achieve satisfactory accuracy. In recent decades, due to the groundbreaking improvements in deep
learning, modern detectors are mainly composed of the convolutional neural network (CNN) [6, 7, 8].
CNN has a similar property to the traditional fully connected network but considerably reduces the
amount of the neural network’s parameter, making CNN much more robust and less prone to overfitting.
Such CNN-based detectors require large-scale datasets to train their neural network and can achieve a
better generalization capability. As a result, their accuracy is significantly improved in comparison to
traditional detectors. Since the working condition of maintenance is outdoor and constantly varies from
one country road to another, some country roads’ backgrounds are dense forest and mountain bodies,
while the others ought to be open fields. The weather can also vary seasonally, from rain to sunshine.
Therefore, we need a robust and highly reliable detector for the application and choose a CNN-based
detector. CNNs have a high requirement of computing resources to get a fast inference speed, so in the
project, we equipped the vehicle computer with GPU, which can accelerate the neural network’s
inference.
There are also two main branches of modern detectors. The first is R-CNNs based on the regional
proposal, and the inference process consists of two stages [6, 9, 10]. The other is a one-shot method,
which operates the feature extraction and object localization at the same time [7, 8, 11]. The one-shot