II. DATASET
The Road Damage Dataset 2020 [2] was curated and annotated for automated road inspection. This multi-country dataset is released as part of the IEEE Big Data Cup Challenge [23]. The task is to detect road damage at a global scale and report performance on the Test 1 and Test 2 datasets.
The damage types vary across countries. To generalize damage category detection, the classes considered for the analysis (Table I) are: D00: Longitudinal Crack, D10: Transverse Crack, D20: Alligator Crack, and D40: Pothole. The Test 1 and Test 2 data are provided by the challenge committee [23] for evaluation and submission. Upon submission, an average F1 score is added to our private leaderboard, and to the public leaderboard if it exceeds all previous scores on our private leaderboard.
A. Global Road Damage Dataset
The latest dataset is collected from the Czech Republic and India, in addition to the data made available by the GIS Association of Japan. The 2020 dataset provides training images of size 600×600 pixels, with each damage annotated as a bounding box and an associated damage class. Class labels and bounding-box coordinates, defined by four numbers (xmin, ymin, xmax, ymax), are stored in XML format as per PASCAL VOC [12].
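As a minimal sketch of how such PASCAL VOC annotations can be read, the snippet below parses one XML file into (label, xmin, ymin, xmax, ymax) tuples. The example file path and the damage-class filter are illustrative assumptions, not part of the dataset specification.

```python
# Sketch: parse a PASCAL VOC style annotation file into box tuples.
import xml.etree.ElementTree as ET

DAMAGE_CLASSES = {"D00", "D10", "D20", "D40"}  # classes used in this analysis

def parse_voc_annotation(xml_path):
    tree = ET.parse(xml_path)
    boxes = []
    for obj in tree.getroot().iter("object"):
        label = obj.find("name").text
        if label not in DAMAGE_CLASSES:
            continue  # ignore damage types outside the four considered classes
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (int(float(bb.find(tag).text))
                                  for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, xmin, ymin, xmax, ymax))
    return boxes

# Example (hypothetical path):
# boxes = parse_voc_annotation("train/Japan/annotations/xmls/Japan_000001.xml")
```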
The provided training data contains 21,041 images in total: 2,829 from the Czech Republic (CZ), 10,506 from Japan (JP), and 7,706 from India (IN), with annotations stored in individual XML files. Fig. 1 shows the file structure, the bounding-box XML tags, and a corresponding image example.
The shared test data are divided into two sets. Test 1 consists of 349 Czech, 969 Indian, and 1,313 Japanese road images without annotated ground truth. Test 2 consists of 360 Czech, 990 Indian, and 1,314 Japanese road images without annotated ground truth. The detection results on these test images are submitted to the challenge [23] for average F1-score evaluation.
In order to run the experiments, we split the given training dataset proportionally into Train (T), Val (V), and Test (T) subsets in an 80:15:5 ratio. Fig. 2 lists the resulting image and annotation counts used for training and tuning.
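A minimal sketch of such an 80:15:5 split is shown below; the shuffling, seed, and per-country handling are assumptions for illustration rather than details from the paper.

```python
# Sketch: split a list of image identifiers into 80:15:5 Train/Val/Test subsets.
import random

def split_dataset(image_ids, seed=42):
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)          # deterministic shuffle for reproducibility
    n = len(ids)
    n_train = int(0.80 * n)
    n_val = int(0.15 * n)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]              # remaining ~5%
    return train, val, test

# train_ids, val_ids, test_ids = split_dataset(all_image_ids)
```

In practice the split would be applied per country so that each subset keeps the original country proportions.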
As we fine-tune the models, we also create composite datasets with Train+Test (T+T) and Train+Val (T+V) compositions. This lets the models use the entire labeled data for learning and evaluation.
B. Evaluation Strategy
A prediction is counted as correct when its class label matches that of the ground-truth bounding box and the predicted bounding box has more than 50% Intersection over Union (IoU) with the ground truth. Both precision and recall rely on IoU, which is defined as the area of overlap between the predicted and ground-truth bounding boxes divided by the area of their union.
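The matching rule can be sketched as follows, assuming boxes are given as (xmin, ymin, xmax, ymax); the function names are illustrative, not from the challenge toolkit.

```python
# Sketch: IoU between two axis-aligned boxes and the 50% matching criterion.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union > 0 else 0.0

def is_match(pred_label, pred_box, gt_label, gt_box, threshold=0.5):
    # true positive: same class and IoU above the 50% threshold
    return pred_label == gt_label and iou(pred_box, gt_box) > threshold
```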
The evaluation of the match is done using the Mean F1 Score metric. The F1 score, commonly used in information retrieval, measures accuracy using the statistics of precision p and recall r. Precision is the ratio of true positives (tp) to all predicted positives (tp + fp) while recall is the ratio of true positives to all actual positives (tp + fn). Maximizing the F1-score ensures reasonably high precision and recall.
The F1 score is given by F1 = 2pr / (p + r).
The average F1 score serves as a balanced metric for precision and recall. This is the metric we obtain on our private leaderboard upon submitting the evaluation results on the Test 1 or Test 2 dataset.
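For concreteness, the sketch below computes precision, recall, and F1 from true-positive, false-positive, and false-negative counts, mirroring the definitions above; the example counts are made up for illustration.

```python
# Sketch: precision p, recall r, and F1 from tp/fp/fn counts.
def f1_score(tp, fp, fn):
    p = tp / (tp + fp) if (tp + fp) else 0.0   # precision
    r = tp / (tp + fn) if (tp + fn) else 0.0   # recall
    return 2 * p * r / (p + r) if (p + r) else 0.0

# e.g. f1_score(tp=80, fp=20, fn=10) -> approximately 0.842
```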
Authors:
(1) Rahul Vishwakarma, Big Data Analytics & Solutions Lab, Hitachi America Ltd. Research & Development, Santa Clara, CA, USA ([email protected]);
(2) Ravigopal Vennelakanti, Big Data Analytics & Solutions Lab, Hitachi America Ltd. Research & Development, Santa Clara, CA, USA ([email protected]).