Over the last decade, myriad efforts have gone into algorithmic improvements and dataset creation for semantic segmentation. Of late, this field, a subset of visual scene understanding, has seen rapid gains, driven mainly by deep learning methodologies. But deep learning techniques have an Achilles' heel: they consume vast amounts of annotated data. Here we review some widely used, open urban semantic segmentation datasets for self-driving car applications.
What is Semantic Segmentation?
The task of semantic segmentation is to annotate every pixel of an image with an object class. In a self-driving context, these classes could be pedestrians, vehicles, buildings, vegetation, sky, void, and so on. For example, semantic segmentation helps SDCs (Self-Driving Cars) discover the drivable areas in an image.
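Concretely, a segmentation model's output is just a label map: one class ID per pixel. The sketch below illustrates the idea with a toy label map and hypothetical class IDs (real datasets define their own ID schemes), and recovers the "drivable area" as the set of road-labeled pixels:

```python
import numpy as np

# Hypothetical class IDs for illustration only; each dataset defines its own.
CLASSES = {0: "void", 1: "road", 2: "pedestrian", 3: "vehicle", 4: "sky"}

# A tiny 4x4 label map standing in for a full-resolution segmented image.
label_map = np.array([
    [4, 4, 4, 4],
    [3, 1, 1, 2],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
])

# Drivable area = all pixels labeled "road" (class 1 here).
drivable = (label_map == 1)
print("drivable pixels:", int(drivable.sum()))    # 10
print("fraction drivable:", drivable.mean())      # 0.625
```

A real pipeline works the same way, just at image resolution: the boolean mask over road pixels can be fed directly to downstream planning code.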
The Cambridge-driving Labeled Video Database (CamVid) was one of the first semantically segmented datasets released in the self-driving space, in late 2007. Its creators used their own image annotation software to annotate 700 images sampled from a ten-minute video sequence. The camera was set up on the dashboard of a car, with a field of view similar to that of the driver.
The KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset was released in 2012, but without semantically segmented images; independent groups have since annotated frames for their own use cases. KITTI does, however, include a dataset and benchmark suite for road and lane detection. For this smaller dataset, a variety of sensors, including grayscale and color cameras, laser scanners, and GPS/IMU units, were mounted atop a car.
The Daimler Urban Segmentation (DUS) dataset comprises 5,000 grayscale images, of which only 500 are semantically segmented. Unlike most datasets, it does not contain a "nature" class. It is part of a larger research initiative called 6D-Vision by researchers at automaker Daimler.
Due to its small size, this would serve as a good testbed to see how well a semantic segmentation model generalizes.
The Cityscapes dataset is a continuation of the Daimler Urban Segmentation dataset, expanding the scope of geography and climate to capture a greater variety of urban scenes. It also contains coarsely annotated images to enable methods that leverage large volumes of weakly labeled data. As with DUS, the cameras are mounted behind the windshield.
Its 30 classes are additionally grouped into 8 higher-level categories. A unique feature of this dataset is that the authors provide 20,000 more images with coarse segmentation, and many deep learning techniques have used this additional data to improve their IoU scores.
The most recent models achieve an IoU (Intersection over Union) above 80% on this benchmark. The dataset's website explains the scoring methodology and hosts the benchmarking suite.
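The IoU metric itself is straightforward: for each class, divide the number of pixels where prediction and ground truth agree on that class (intersection) by the number of pixels where either assigns it (union), then average over classes. A minimal sketch, using made-up label maps rather than the benchmark's actual evaluation code:

```python
import numpy as np

def class_iou(pred, gt, cls):
    """Intersection over Union for a single class ID."""
    p, g = (pred == cls), (gt == cls)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union else float("nan")  # nan if class absent in both

# Toy 2x4 predicted and ground-truth label maps with two classes (0 and 1).
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0]])
gt   = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 0]])

print(class_iou(pred, gt, 1))                       # 4 / 5 = 0.8
# Mean IoU averages the per-class scores, ignoring absent classes.
print(np.nanmean([class_iou(pred, gt, c) for c in (0, 1)]))  # 0.775
```

Benchmarks typically report this mean IoU over all classes, which is why a single missing class (e.g., a rarely seen "rider") can noticeably drag a model's score down.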
Mapillary is a street-level imagery platform where participants collaborate to build better maps. Mapillary has made part of its image collection available and annotated it with pixel-level accuracy. As of this writing, it is the world's largest and most diverse open dataset of its kind, with geographical coverage spanning continents.
This dataset also provides instance-level annotations for 37 of its 66 classes. Since images on the Mapillary platform are collaboratively collected, they span a wide variety of viewing angles, as can be seen in their online explorer.
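Instance-level annotation goes one step beyond semantic segmentation: where a semantic mask labels every car pixel with the same class ID, an instance mask additionally distinguishes one car from another. A toy illustration (class and instance IDs invented for the example):

```python
import numpy as np

# Semantic mask: both cars share the same hypothetical class ID, 3.
semantic = np.array([[3, 3, 0, 3, 3],
                     [3, 3, 0, 3, 3]])

# Instance mask: each distinct car additionally gets its own ID,
# so the two cars can be told apart even though they touch no other class.
instance = np.array([[1, 1, 0, 2, 2],
                     [1, 1, 0, 2, 2]])

car_instances = np.unique(instance[semantic == 3])
print("number of cars:", len(car_instances))  # 2
```

This distinction matters for driving: counting and tracking individual vehicles requires instance IDs, while the semantic mask alone only says "car pixels exist here."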
Algorithm submissions can be made through their benchmark website.
The world of open datasets is ever growing as researchers look to create newer benchmarks. As each new dataset increases in size and diversity, models can be evaluated on how well they generalize to the natural world around us. And if you thought this was the end of open dataset development, check out SYNTHIA, a repository of images from virtual urban scenes!
Stay tuned for more on deep learning models for urban semantic segmentation.