In recent years, more and more companies and research institutions have made their autonomous driving datasets open to the public. However, the best datasets are not always easy to find, and scouring the internet for them takes time.
To help, we at SiaSearch have put together a list of the top 15 open datasets for autonomous driving. The resources below collectively contain millions of data samples, many of which are already annotated. We hope this list provides you with a solid starting point for learning more about the field, or for starting your own autonomous driving project.
1. A2D2 Dataset
The Audi Autonomous Driving Dataset (A2D2) features over 41,000 frames labeled with 38 features. Around 2.3 TB in total, A2D2 is split by annotation type (e.g. semantic segmentation, 3D bounding boxes).
2. ApolloScape Dataset
ApolloScape is an evolving research project that aims to foster innovation across all aspects of autonomous driving, from perception to navigation and control. Via their website, users can explore a variety of simulation tools as well as over 100,000 street-view frames, 80,000 lidar point clouds, and 1,000 km of trajectories for urban traffic.
An example of lanemark segmentation in the ApolloScape dataset
3. Argoverse Dataset
The Argoverse dataset includes 3D tracking annotations for 113 scenes and over 324,000 unique vehicle trajectories for motion forecasting.
4. Berkeley DeepDrive Dataset
Also known as BDD100K, the DeepDrive dataset gives users access to 100,000 annotated videos and 10 tasks to evaluate image recognition algorithms for autonomous driving. The dataset represents more than 1,000 hours of driving experience with more than 100 million frames, as well as information on geographic, environmental, and weather diversity.
5. CityScapes Dataset
CityScapes is a large-scale dataset focused on the semantic understanding of urban street scenes in 50 German cities. It features semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories. The entire dataset includes 5,000 images with fine annotations and an additional 20,000 images with coarse annotations.
Examples of scenes from the CityScapes dataset - overlaid colors encode semantic classes
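Because the fine annotations ship as per-pixel label-ID images, a common first step is converting those IDs into training classes. The snippet below is a minimal sketch that assumes the official cityscapesScripts helper package is installed (pip install cityscapesscripts) and uses a hypothetical file name; adapt it to your local copy of the dataset.

# Minimal sketch: map a Cityscapes labelIds image to train IDs.
# Requires cityscapesScripts, numpy, and Pillow; the file path is hypothetical.
import numpy as np
from PIL import Image
from cityscapesscripts.helpers.labels import labels  # official id/trainId/color table

label_img = np.array(Image.open("aachen_000000_000019_gtFine_labelIds.png"))

id_to_train_id = np.full(256, 255, dtype=np.uint8)    # 255 = ignore
for lbl in labels:
    if 0 <= lbl.id < 256:
        id_to_train_id[lbl.id] = lbl.trainId if lbl.trainId >= 0 else 255

train_id_img = id_to_train_id[label_img]              # per-pixel training classes
print(np.unique(train_id_img))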
6. Comma2k19 Dataset
This dataset includes 33 hours of commute time recorded on Highway 280 in California. Each one-minute scene was captured on a 20 km section of highway driving between San Jose and San Francisco. The data was collected using comma EONs, which feature a road-facing camera, phone GPS, thermometers, and a 9-axis IMU.
7. Google Landmarks Dataset
Published by Google in 2018, the Landmarks dataset is divided into two sets of images to evaluate recognition and retrieval of human-made and natural landmarks. The original dataset contains over 2 million images depicting 30,000 unique landmarks from across the world. In 2019, Google published Landmarks-v2, an even larger dataset with 5 million images and 200,000 landmarks.
8. KITTI Vision Benchmark Suite
First released in 2012 by Geiger et al., the KITTI dataset introduced a novel set of real-world computer vision benchmarks with the intent of advancing autonomous driving research. One of the first-ever autonomous driving datasets, KITTI boasts over 4,000 academic citations and counting.
An example of images from the KITTI semantic instance segmentation benchmark
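For readers who want to work directly with KITTI's object detection annotations, the sketch below parses one line of a label file following the 15-column format described in the benchmark's readme. The file path is a hypothetical example; only the field layout is taken from the official documentation.

# Minimal sketch: parse objects from a KITTI object-detection label file.
# Assumes the standard 15-column label format; the path below is hypothetical.
def parse_kitti_label_line(line):
    fields = line.split()
    return {
        "type": fields[0],                                # e.g. 'Car', 'Pedestrian', 'Cyclist'
        "truncated": float(fields[1]),                    # 0 (visible) .. 1 (fully truncated)
        "occluded": int(fields[2]),                       # 0..3 occlusion level
        "alpha": float(fields[3]),                        # observation angle [-pi, pi]
        "bbox": [float(v) for v in fields[4:8]],          # 2D box: left, top, right, bottom (pixels)
        "dimensions": [float(v) for v in fields[8:11]],   # 3D box: height, width, length (m)
        "location": [float(v) for v in fields[11:14]],    # 3D position x, y, z in camera coords (m)
        "rotation_y": float(fields[14]),                  # yaw around the camera Y axis [-pi, pi]
    }

with open("training/label_2/000000.txt") as f:            # hypothetical example path
    objects = [parse_kitti_label_line(l) for l in f if l.strip()]
print(len(objects), objects[0]["type"])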
9. Leddar PixSet Dataset
Launched in 2021, Leddar PixSet is a new, publicly available dataset for autonomous driving research and development that contains data from a full AV sensor suite (cameras, LiDARs, radar, IMU), including full-waveform data from the Leddar Pixell, a 3D solid-state flash LiDAR sensor. The dataset contains 29,000 frames in 97 sequences, with more than 1.3 million 3D boxes annotated.
10. Lyft Level 5 Dataset
Published by the rideshare company Lyft, the Level 5 dataset is another great source of autonomous driving data. It includes over 55,000 human-labeled 3D annotated frames captured by 7 cameras and up to 3 LiDAR sensors, along with a surface map and an underlying HD spatial semantic map that can be used to contextualize the data.
11. nuScenes Dataset
Developed by Motional, the nuScenes dataset is one of the largest open-source datasets for autonomous driving. Recorded in Boston and Singapore using a full sensor suite (a 32-beam LiDAR, six cameras with combined 360° coverage, and radars), the dataset contains over 1.44 million camera images capturing a diverse range of traffic situations, driving maneuvers, and unexpected behaviors.
Examples from the nuScenes dataset: images collected from clear weather (col 1), nighttime (col 2), rain (col 3) and construction zones (col 4).
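The dataset is typically explored through the official nuscenes-devkit. The sketch below assumes the devkit is installed (pip install nuscenes-devkit) and that the v1.0-mini split has been downloaded to /data/sets/nuscenes; both the version string and the data root are assumptions to adapt to your setup.

# Minimal sketch using the nuscenes-devkit; version and dataroot are assumptions.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-mini", dataroot="/data/sets/nuscenes", verbose=True)

scene = nusc.scene[0]                                      # pick the first scene
sample = nusc.get("sample", scene["first_sample_token"])   # its first keyframe sample
print(scene["name"], list(sample["data"].keys()))          # sensor channels in this sample

# Render the front camera image with its annotations (opens a matplotlib figure).
nusc.render_sample_data(sample["data"]["CAM_FRONT"])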
12. Oxford Radar RobotCar Dataset
The Oxford RobotCar Dataset contains over 100 recordings of a consistent route through Oxford, UK, captured over the course of more than a year. The dataset covers many different environmental conditions, including weather, traffic, and pedestrians, along with longer-term changes such as construction and roadworks.
13. PandaSet
PandaSet was the first open-source AV dataset available for both academic and commercial use. It contains 48,000 camera images, 16,000 LiDAR sweeps, 28 annotation classes, and 37 semantic segmentation labels taken from a full sensor suite.
14. Udacity Self Driving Car Dataset
Online education platform Udacity has open-sourced a variety of projects for autonomous driving, including neural networks trained to predict steering angles, camera mounts, and dozens of hours of real driving data.
15. Waymo Open Dataset
The Waymo Open Dataset is an open-source multimodal sensor dataset for autonomous driving. Extracted from Waymo self-driving vehicles, the data covers a wide variety of driving scenarios and environments. It contains 1,000 driving segments, each capturing 20 seconds of continuous driving, corresponding to 200,000 frames at 10 Hz per sensor.
A proud supporter of the research community, SiaSearch offers free, enhanced access to popular autonomous driving datasets such as nuScenes and KITTI via our Open Data initiative.
In fact, many of the datasets on this list have already been integrated into our platform. Sign up for a free account to easily query, curate, and transform datasets for autonomous driving.
Also published at: https://www.siasearch.io/blog/best-open-source-autonomous-driving-datasets
Lead image via CityScapes