Today, we can already see several autonomous vehicle programs under trial. They are plying the roads of a number of cities in controlled environments and collecting critical data for real-world driving. Researchers can work with at least 15 well-known large-scale public datasets for autonomous driving research, released by major autonomous driving firms and A.I. research labs.
The earlier datasets, such as BDD100K and ApolloScape, contained mainly annotated frames from monocular camera video. The datasets released in 2019 are richer and include data from other sensors such as LiDAR, radar and stereo cameras. Most of these datasets cover different city scenarios, weather conditions, times of day and scene types, to help researchers make their autonomous driving models and algorithms work well in different situations.
SAE (Society of Automotive Engineers) International's J3016™ standard defines six levels of driving automation, from Level 0 (No Driving Automation) to Level 5 (Full Driving Automation). Most vehicles on our roads today are at Level 0, that is, manually driven. The 2019 Audi A8L with Traffic Jam Pilot will be classified as Level 3 when rolled out.
Level 4 vehicles are currently in geo-fenced test-bedding, which is the frontline of autonomous driving research in the world today. Most of the advanced autonomous vehicles are at this stage. One example is Baidu's Apollo Go Robotaxi service, which has about 67 cars and more than 600 pick-up and drop-off spots in a 60-square-kilometer trial area in the Beijing Economic and Technological Development Zone.
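As an illustration only, the six J3016 levels can be encoded as an ordered enumeration so that software can compare levels directly. The names `SAELevel` and `requires_human_fallback` are hypothetical. The helper also simplifies the standard. At Levels 0 to 3 a human driver must drive, supervise, or take over when the system requests it. At Levels 4 and 5 the automated system handles the fallback itself, at Level 4 only within its operating conditions.

```python
from enum import IntEnum


class SAELevel(IntEnum):
    """Illustrative encoding of the six SAE J3016 driving automation levels."""
    NO_AUTOMATION = 0           # Level 0: manually driven
    DRIVER_ASSISTANCE = 1       # Level 1
    PARTIAL_AUTOMATION = 2      # Level 2
    CONDITIONAL_AUTOMATION = 3  # Level 3, e.g. Audi A8L Traffic Jam Pilot (per the text)
    HIGH_AUTOMATION = 4         # Level 4, e.g. geo-fenced robotaxi trials
    FULL_AUTOMATION = 5         # Level 5: full driving automation


def requires_human_fallback(level: SAELevel) -> bool:
    """Hypothetical, simplified helper: True if a human must be ready to drive.

    Levels 0-3 rely on a human driver (at Level 3, on request);
    Levels 4-5 perform the fallback themselves.
    """
    return level <= SAELevel.CONDITIONAL_AUTOMATION


for level in SAELevel:
    print(level.value, level.name, requires_human_fallback(level))
```

Because `IntEnum` members compare as integers, checks such as `level >= SAELevel.HIGH_AUTOMATION` read naturally in calling code.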
A future of Level 5 fully autonomous driving will greatly reduce fatalities on the road, solve traffic issues such as congestion and parking, and improve the environment by reducing personal cars on the road and maximizing shared transport.
Apolloscape is an autonomous driving AI dataset product developed by Baidu (China) for the transport & mobility industry. Baidu released the Apolloscape dataset on 19 March 2018.
ApolloScape is part of Baidu's Apollo program. Its dataset contains RGB videos with high-resolution image sequences (146,997 frames) and per-pixel annotation, along with survey-grade dense 3D points with semantic segmentation. The data was collected in different cities under various traffic conditions using mid-sized SUVs equipped with high-resolution cameras and a Riegl acquisition system.
Product | Apolloscape |
Company | Baidu (China) |
Function | autonomous driving AI dataset |
Release Date | 2018-03-19 |
#Frames/Images | 146997 |
Locations | China - urban areas |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | http://apolloscape.auto/scene.html.. |
Argoverse is an autonomous driving AI dataset product developed by Argo (USA) for the transport & mobility industry. The Argoverse dataset was released publicly by Argo on 19 June 2019 and provides 16,389,650 frames of data in locations around Pittsburgh and Miami.
According to Argo, the Argoverse dataset was collected with cars equipped with 2 roof-mounted LiDAR sensors, 7 HD ring cameras and 2 front-facing stereo cameras.
Location/Environment: 204 linear kilometers in Miami and 86 linear kilometers in Pittsburgh, two US cities with distinct urban driving challenges and local driving habits. The data spans different seasons, weather conditions and times of day to provide a broad range of real-world driving scenarios.
Product | Argoverse |
Company | Argo (USA) |
Function | autonomous driving AI dataset |
Release Date | 2019-06-19 |
#Frames/Images | 16389650 |
Locations | USA - Pittsburgh and Miami |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | https://www.argoverse.org/ |
Audi Autonomous Driving Dataset (A2D2) is an autonomous driving AI dataset developed by Audi (Germany) for the transport & mobility industry.
Audi released A2D2 in 2020. The dataset contains 41,277 frames of driving imagery from cities in southern Germany.
Product | Audi Autonomous Driving Dataset (A2D2) |
Company | Audi (Germany) |
Function | autonomous driving AI dataset |
Release Date | 2020-04-14 |
#Frames/Images | 41277 |
Locations | Germany - southern cities |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | https://www.a2d2.audi/a2d2/en.html.. |
BDD100K is an autonomous driving AI dataset product developed by the Berkeley Artificial Intelligence Research Lab (USA) for the transport & mobility industry.
The Berkeley Artificial Intelligence Research Lab (BAIR) released the Berkeley DeepDrive dataset, or BDD100K, on 5 June 2018. The dataset contains driving segments along roads in Berkeley, San Francisco and New York, with close to 120 million frames of data.
Datasets released in 2019 by Aptiv, Argo, Lyft and Waymo have started to incorporate multi-modal data from other sensors such as LiDAR, radar and stereo cameras. While BDD100K lacks the multi-modal data of its newer counterparts, it is the largest dataset based on monocular videos with 120 million image frames across multiple cities, weather conditions, times of day and scene types.
Product | BDD100K |
Company | Berkeley Artificial Intelligence Research Lab (USA) |
Function | autonomous driving AI dataset |
Release Date | 2018-06-05 |
#Frames/Images | 120000000 |
Locations | USA - New York, Berkeley, San Francisco, Bay Area |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | https://bdd-data.berkeley.edu/.. |
Canadian Adverse Driving Conditions Dataset is an autonomous driving AI dataset developed by the University of Waterloo (Canada) and the University of Toronto (Canada) for the transport & mobility industry.
The Canadian Adverse Driving Conditions Dataset was published in 2020 and contains 56,000 frames of imagery featuring adverse (snowy) driving conditions in Waterloo, Canada.
Product | Canadian Adverse Driving Conditions Dataset |
Company | University of Waterloo (Canada) and University of Toronto (Canada) |
Function | autonomous driving AI dataset |
Release Date | 2020-02-03 |
#Frames/Images | 56000 |
Locations | Canada - Waterloo |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | http://cadcd.uwaterloo.ca/ |
KITTI Vision Benchmark Suite is an autonomous driving AI dataset developed by the Karlsruhe Institute of Technology (Germany) for the transport & mobility industry.
The open dataset was released on 20 March 2012 and provides 15,000 frames of data recorded in and around Karlsruhe, Germany.
Product | KITTI Vision Benchmark Suite |
Company | Karlsruhe Institute of Technology (Germany) |
Function | autonomous driving AI dataset |
Release Date | 2012-03-20 |
#Frames/Images | 15000 |
Locations | Germany - Karlsruhe |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | http://www.cvlibs.net/datasets/kitti/.. |
LeddarTech PixSet Dataset is an autonomous driving AI dataset developed by LeddarTech (Canada) for the transport & mobility industry.
The PixSet dataset was released by LeddarTech on 24 February 2021 and provides 29,000 frames of data from urban areas in Canada.
Product | LeddarTech PixSet Dataset |
Company | LeddarTech (Canada) |
Function | autonomous driving AI dataset |
Release Date | 2021-02-24 |
#Frames/Images | 29000 |
Locations | Canada - urban areas |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | https://leddartech.com/solutions/leddar-pixset-dataset/.. |
Lyft released the Level 5 Dataset on 23 July 2019. According to Lyft, the company collected the data with a fleet of Ford Fusion vehicles in two different LiDAR and camera configurations. The Lyft Level 5 Dataset includes over 55,000 human-labeled 3D annotated frames, data from 7 cameras and up to 3 LiDARs, a drivable surface map, and an underlying HD spatial semantic map of the surveyed region (including lanes and crosswalks).
Location/Environment: San Francisco
Product | Lyft Level 5 Dataset |
Company | Lyft (USA) |
Function | autonomous driving AI dataset |
Release Date | 2019-07-23 |
#Frames/Images | 55000 |
Locations | USA - San Francisco |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | https://level5.lyft.com/dataset/.. |
Mapillary Vistas Dataset is an autonomous driving AI dataset developed by Mapillary AB (Sweden) for the transport & mobility industry.
The Mapillary Vistas dataset was released publicly on 3 May 2017 and provides close to 25,000 frames of data from locations worldwide.
Product | Mapillary Vistas Dataset |
Company | Mapillary AB (Sweden) |
Function | autonomous driving AI dataset |
Release Date | 2017-05-03 |
#Frames/Images | 25000 |
Locations | locations worldwide |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | https://www.mapillary.com/dataset/vistas.. |
NVIDIA DRIVE Sim is an autonomous driving AI dataset and metaverse platform (virtual world) developed by Nvidia (USA) for the ICT, software & AI development and transport & mobility industries. It lets developers virtually deploy and test self-driving vehicles in a high-fidelity, physically accurate simulated world, and generate datasets for training perception models or testing the decision-making of self-driving algorithms.
Product | NVIDIA DRIVE Sim |
Company | Nvidia (USA) |
Function | autonomous driving AI dataset and metaverse platform (virtual world) |
Industry | ICT, Software & AI Development and Transport & Mobility |
Tech Domain | AI |
URL | https://developer.nvidia.com/drive/drive-sim.. |
Oxford Robotcar Dataset is an autonomous driving AI dataset developed by the Oxford Robotics Institute (UK) for the transport & mobility industry.
The Oxford Robotics Institute released the Oxford Robotcar Dataset in 2016. The dataset contained around 20 million frames of driving imagery around central Oxford, England.
Product | Oxford Robotcar Dataset |
Company | Oxford Robotics Institute (UK) |
Function | autonomous driving AI dataset |
Release Date | 2016-11-29 |
#Frames/Images | 20000000 |
Locations | UK - central Oxford |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | https://robotcar-dataset.robots.ox.ac.uk/.. |
PandaSet is an autonomous driving AI dataset product developed by Hesai (China) and Scale (USA) for the transport & mobility industry.
The PandaSet autonomous driving dataset was released by Hesai and Scale on 20 May 2020. The dataset contains 60,000 frames of driving imagery around San Francisco, Palo Alto and San Mateo.
PandaSet combines Hesai's best-in-class LiDAR sensors with Scale's high-quality data annotation.
Location/Environment: PandaSet scenes are selected from 2 routes in Silicon Valley: (1) San Francisco; and (2) El Camino Real from Palo Alto to San Mateo. They showcase complex urban driving scenarios, including steep hills, construction, dense traffic and pedestrians, at a variety of times of day and lighting conditions: morning, afternoon, dusk and evening.
Product | PandaSet |
Company | Hesai (China) and Scale (USA) |
Function | autonomous driving AI dataset |
Release Date | 2020-05-20 |
#Frames/Images | 60000 |
Locations | USA - San Francisco, Palo Alto, San Mateo |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | https://scale.com/open-datasets/pandaset.. |
Waymo released its open dataset on 21 August 2019. According to Waymo, the Waymo Open Dataset contains data from 1,000 driving segments. Each segment captures 20 seconds of continuous driving. At 10 Hz per sensor, this amounts to 200,000 frames in total.
Each segment contains sensor data from five high-resolution Waymo lidars and five front-and-side-facing cameras, and includes lidar frames and images with vehicles, pedestrians, cyclists, and signage carefully labeled, capturing a total of 12 million 3D labels and 1.2 million 2D labels.
The Waymo team believes the dataset is "one of the largest, richest, and most diverse self-driving datasets ever released for research".
Location/Environment: The dataset covers diverse driving environments, including dense urban and suburban environments across Phoenix, AZ; Kirkland, WA; Mountain View, CA; and San Francisco, CA, with a wide spectrum of driving conditions (day and night, dawn and dusk, sun and rain).
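The Waymo frame count follows from simple arithmetic: frames per sensor = segments × segment duration × sampling rate, that is, 1,000 × 20 s × 10 Hz = 200,000. Below is a minimal Python sketch of that calculation. The function name `total_frames` is an illustrative assumption, not part of any Waymo tooling.

```python
def total_frames(segments: int, seconds_per_segment: float, hz: float) -> int:
    """Frames per sensor = number of segments * segment duration (s) * rate (Hz)."""
    return round(segments * seconds_per_segment * hz)


# Figures quoted for the Waymo Open Dataset in the text.
WAYMO_SEGMENTS = 1_000
SECONDS_PER_SEGMENT = 20
SENSOR_RATE_HZ = 10

frames = total_frames(WAYMO_SEGMENTS, SECONDS_PER_SEGMENT, SENSOR_RATE_HZ)
per_segment = total_frames(1, SECONDS_PER_SEGMENT, SENSOR_RATE_HZ)
print(f"{per_segment} frames per segment, {frames:,} frames per sensor in total")
```

The same function also gives the per-segment figure: 20 s at 10 Hz is 200 frames per sensor for each segment.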
Product | Waymo Open Dataset |
Company | Waymo (USA) |
Function | autonomous driving AI dataset |
Release Date | 2019-08-21 |
#Frames/Images | 200000 |
Locations | USA - Phoenix, Kirkland, Mountain View, San Francisco |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | https://waymo.com/open/ |
nuPlan is an autonomous driving AI dataset developed by Motional (USA) for the transport & mobility industry.
Product | nuPlan |
Company | Motional (USA) |
Function | autonomous driving AI dataset |
Industry | Transport & Mobility |
Tech Domain | Robotics and AI |
URL | https://www.nuscenes.org/nuplan.. |
nuReality is an autonomous driving AI dataset and metaverse platform (virtual world) developed by Motional (USA) for the transport & mobility industry. Its key value is that autonomous driving researchers and developers can use its custom-built virtual reality (VR) environment to study interactions between autonomous vehicles (AVs) and pedestrians.
Product | nuReality |
Company | Motional (USA) |
Function | autonomous driving AI dataset and metaverse platform (virtual world) |
Industry | Transport & Mobility |
Tech Domain | Metaverse and AI |
URL | https://www.nureality.org/ |
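The key/value tables in this article can also be treated as structured records for quick comparisons. The sketch below transcribes the release dates and reported frame counts from those tables. Entries without a frame count (NVIDIA DRIVE Sim, nuPlan and nuReality) are left out. The `DatasetEntry` class and the helper functions are hypothetical. Each provider counts "frames" differently (camera images, LiDAR sweeps or video frames), so the ranking is only a rough guide.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence


@dataclass(frozen=True)
class DatasetEntry:
    """One catalog row: product name, release date and reported frame count."""
    product: str
    released: date
    frames: int


# Values transcribed from the tables in this article.
CATALOG = [
    DatasetEntry("Apolloscape", date(2018, 3, 19), 146_997),
    DatasetEntry("Argoverse", date(2019, 6, 19), 16_389_650),
    DatasetEntry("A2D2", date(2020, 4, 14), 41_277),
    DatasetEntry("BDD100K", date(2018, 6, 5), 120_000_000),
    DatasetEntry("Canadian Adverse Driving Conditions", date(2020, 2, 3), 56_000),
    DatasetEntry("KITTI", date(2012, 3, 20), 15_000),
    DatasetEntry("PixSet", date(2021, 2, 24), 29_000),
    DatasetEntry("Lyft Level 5", date(2019, 7, 23), 55_000),
    DatasetEntry("Mapillary Vistas", date(2017, 5, 3), 25_000),
    DatasetEntry("Oxford Robotcar", date(2016, 11, 29), 20_000_000),
    DatasetEntry("PandaSet", date(2020, 5, 20), 60_000),
    DatasetEntry("Waymo Open Dataset", date(2019, 8, 21), 200_000),
]


def largest(entries: Sequence[DatasetEntry], n: int = 3) -> List[DatasetEntry]:
    """Return the n entries with the highest reported frame counts."""
    return sorted(entries, key=lambda e: e.frames, reverse=True)[:n]


def released_in(entries: Sequence[DatasetEntry], year: int) -> List[str]:
    """Return product names released in the given year, in catalog order."""
    return [e.product for e in entries if e.released.year == year]


for entry in largest(CATALOG):
    print(f"{entry.product}: {entry.frames:,}")
```

Run as-is, it lists BDD100K, Oxford Robotcar and Argoverse as the three largest by reported frame count, and `released_in(CATALOG, 2019)` returns Argoverse, Lyft Level 5 and the Waymo Open Dataset.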