Autonomous Driving Open Datasets Released To Date (2022)

  •  Gary Yates
  •  Feb 10, 2022

To date, at least 15 large-scale datasets, ranging in size from 25,000 frames to 120 million frames of driving imagery, have been released openly to accelerate research into self-driving cars.


Today, several autonomous vehicle programs are already under trial, plying the roads of a number of cities in controlled environments and collecting critical data for real-world driving. Major autonomous driving firms and A.I. research labs have released at least 15 well-known large-scale public datasets that researchers can use for autonomous driving research.


Earlier datasets such as BDD100K and ApolloScape contained primarily annotated frames from monocular camera video. The datasets released in 2019 come in richer variety and include data from other sensors such as LiDAR, radar and stereo cameras. Most of these datasets cover different city scenarios, multiple weather conditions, times of day and scene types to help researchers make their autonomous driving models and algorithms work well in different situations.


Autonomous Driving Datasets

 


Where are we today, in terms of autonomous driving?

There are six levels of driving automation in the J3016™ driving automation standard from SAE International (Society of Automotive Engineers), from Level 0 (No driving automation) to Level 5 (Full driving automation). Most vehicles on our roads today are at Level 0, that is, manually driven. The 2019 Audi A8L with Traffic Jam Pilot will be classified at Level 3 when rolled out.
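As a minimal sketch, the six levels can be represented as a small Python enumeration. The names for Levels 1 to 4 are not given in this article and are filled in from the J3016 standard; `SAELevel` and `label` are hypothetical names used only for illustration.

```python
from enum import IntEnum


class SAELevel(IntEnum):
    """SAE J3016 levels of driving automation (0 to 5)."""

    NO_DRIVING_AUTOMATION = 0
    DRIVER_ASSISTANCE = 1               # name taken from J3016, not from this article
    PARTIAL_DRIVING_AUTOMATION = 2      # name taken from J3016, not from this article
    CONDITIONAL_DRIVING_AUTOMATION = 3  # name taken from J3016, not from this article
    HIGH_DRIVING_AUTOMATION = 4         # name taken from J3016, not from this article
    FULL_DRIVING_AUTOMATION = 5


def label(level: SAELevel) -> str:
    """Return a readable label, e.g. 'Level 0 (No driving automation)'."""
    name = level.name.replace("_", " ").capitalize()
    return f"Level {level.value} ({name})"


if __name__ == "__main__":
    for level in SAELevel:
        print(label(level))
```

Using an `IntEnum` keeps the levels ordered, so a comparison such as `SAELevel(3) < SAELevel(4)` behaves as expected.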

Level 4 vehicles are in geo-fenced test-bedding stages at the moment, and that is the frontline of autonomous driving research in the world today. This is where most of the advanced autonomous vehicles are today, such as Baidu's Apollo Go Robotaxi service, with about 67 cars serving more than 600 pick-up and drop-off spots in a 60-square-kilometer trial area at the Beijing Economic and Technological Development Zone.

A future of Level 5 fully autonomous driving will greatly reduce fatalities on the road, solve traffic issues such as congestion and parking, and improve the environment by reducing personal cars on the road and maximizing shared transport.



Highlights

Apolloscape

Apolloscape is an autonomous driving AI dataset product developed by Baidu (China) for the transport & mobility industry. Baidu released the Apolloscape dataset on 19 March 2018.

ApolloScape is part of the Baidu Apollo Program. Its dataset contains RGB videos with high-resolution image sequences (146,997 frames) and per-pixel annotation, along with survey-grade dense 3D points with semantic segmentation. The data was collected in different cities under various traffic conditions using mid-sized SUVs equipped with high-resolution cameras and a Riegl acquisition system.


Key Information
Product Apolloscape 
Company Baidu (China) 
Function autonomous driving AI dataset 
Release Date 2018-03-19 
#Frames/Images 146997 
Locations China - urban areas 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL http://apolloscape.auto/scene.html.. 

Argoverse

Argoverse is an autonomous driving AI dataset product developed by Argo (USA) for the transport & mobility industry. The Argoverse dataset was released publicly by Argo on 19 June 2019 and provides 16,389,650 frames of data in locations around Pittsburgh and Miami.


According to Argo, the Argoverse dataset was acquired from cars equipped with 2 roof-mounted LiDAR sensors, 7 HD ring cameras and 2 front-facing stereo cameras, and includes:


  • One dataset with 3D tracking annotations for 113 scenes

  • One dataset with 327,793 interesting vehicle trajectories extracted from over 1000 driving hours

  • Two high-definition (HD) maps with lane centerlines, traffic direction, ground height, and more

  • One API to connect the map data with sensor information


Location/Environment: 204 linear kilometers in Miami and 86 linear kilometers in Pittsburgh — two US cities with distinct urban driving challenges and local driving habits, across different seasons, weather conditions, and times of day to provide a broad range of real-world driving scenarios.


Key Information
Product Argoverse 
Company Argo (USA) 
Function autonomous driving AI dataset 
Release Date 2019-06-19 
#Frames/Images 16389650 
Locations USA - Pittsburgh and Miami 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL https://www.argoverse.org/ 

Audi Autonomous Driving Dataset (A2D2)

Audi Autonomous Driving Dataset (A2D2) is an autonomous driving AI dataset developed by Audi (Germany) for the transport & mobility industry.

Audi released the A2D2 (Audi Autonomous Driving Dataset) in 2020. The dataset contained 41,277 frames of driving imagery in the southern cities of Germany.


Key Information
Product Audi Autonomous Driving Dataset (A2D2) 
Company Audi (Germany) 
Function autonomous driving AI dataset 
Release Date 2020-04-14 
#Frames/Images 41277 
Locations Germany - southern cities 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL https://www.a2d2.audi/a2d2/en.html.. 

BDD100K

BDD100K is an autonomous driving AI dataset product developed by Berkeley Artificial Intelligence Research Lab (USA) for the transport & mobility industry.

The Berkeley Artificial Intelligence Research Lab (BAIR) released the Berkeley Deep Drive dataset, or BDD100K, on 5 June 2018. The dataset contains driving segments along roads in Berkeley, San Francisco and New York, with close to 120 million frames of data.


BDD100K — Berkeley Deep Drive dataset

Datasets released in 2019 by Aptiv, Argo, Lyft and Waymo have started to incorporate multi-modal data from other sensors such as LiDAR, radar and stereo cameras. While BDD100K lacks the multi-modal data of its newer counterparts, it is the largest dataset based on monocular videos with 120 million image frames across multiple cities, weather conditions, times of day and scene types.


Key Information
Product BDD100K 
Company Berkeley Artificial Intelligence Research Lab (USA) 
Function autonomous driving AI dataset 
Release Date 2018-06-05 
#Frames/Images 120000000 
Locations USA - New York, Berkeley, San Francisco, Bay Area 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL https://bdd-data.berkeley.edu/.. 

Canadian Adverse Driving Conditions Dataset

Canadian Adverse Driving Conditions Dataset is an autonomous driving AI dataset developed by the University of Waterloo (Canada) and the University of Toronto (Canada) for the transport & mobility industry.

The Canadian Adverse Driving Conditions Dataset was published in 2020 and contains 56,000 frames of imagery featuring adverse (snowy) driving conditions in Waterloo, Canada.


Key Information
Product Canadian Adverse Driving Conditions Dataset 
Company University of Waterloo (Canada) and University of Toronto (Canada) 
Function autonomous driving AI dataset 
Release Date 2020-02-03 
#Frames/Images 56000 
Locations Canada - Waterloo 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL http://cadcd.uwaterloo.ca/ 

KITTI Vision Benchmark Suite

KITTI Vision Benchmark Suite is an autonomous driving AI dataset developed by Karlsruhe Institute of Technology (Germany) for the transport & mobility industry.

The open dataset was released on 20 March 2012 and provides 15,000 frames of data from locations around Karlsruhe, Germany.


Key Information
Product KITTI Vision Benchmark Suite 
Company Karlsruhe Institute of Technology (Germany) 
Function autonomous driving AI dataset 
Release Date 2012-03-20 
#Frames/Images 15000 
Locations Germany - Karlsruhe 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL http://www.cvlibs.net/datasets/kitti/.. 

LeddarTech PixSet Dataset

LeddarTech PixSet Dataset is an autonomous driving AI dataset developed by LeddarTech (Canada) for the transport & mobility industry.

The PixSet dataset was released by LeddarTech on 24 February 2021 and provides 29,000 frames of data from urban areas in Canada.


Key Information
Product LeddarTech PixSet Dataset 
Company LeddarTech (Canada) 
Function autonomous driving AI dataset 
Release Date 2021-02-24 
#Frames/Images 29000 
Locations Canada - urban areas 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL https://leddartech.com/solutions/leddar-pixset-dataset/.. 

Lyft Level 5 Dataset

Lyft Level 5 — Self-driving Car Division

Lyft released the Level 5 Dataset on 23 July 2019. According to Lyft, the company collected the data with a fleet of Ford Fusion vehicles using two different configurations of LiDARs and cameras. The Lyft Level 5 Dataset includes over 55,000 human-labeled 3D annotated frames, data from 7 cameras and up to 3 LiDARs, a drivable surface map and an underlying HD spatial semantic map of the surveyed region (including lanes and crosswalks).

Location/Environment: San Francisco


Key Information
Product Lyft Level 5 Dataset 
Company Lyft (USA) 
Function autonomous driving AI dataset 
Release Date 2019-07-23 
#Frames/Images 55000 
Locations USA - San Francisco 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL https://level5.lyft.com/dataset/.. 

Mapillary Vistas Dataset

Mapillary Vistas Dataset is an autonomous driving AI dataset developed by Mapillary AB (Sweden) for the transport & mobility industry.

The Mapillary Vistas dataset was released publicly on 3 May 2017 and provides close to 25,000 frames of data from locations worldwide.


Key Information
Product Mapillary Vistas Dataset 
Company Mapillary AB (Sweden) 
Function autonomous driving AI dataset 
Release Date 2017-05-03 
#Frames/Images 25000 
Locations locations worldwide 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL https://www.mapillary.com/dataset/vistas.. 

NVIDIA DRIVE Sim

NVIDIA DRIVE Sim is an autonomous driving AI dataset and metaverse platform (virtual world) developed by Nvidia (USA) for the ICT, software & AI development and transport & mobility industries. It enables developers to virtually deploy and test self-driving vehicles in a high-fidelity, physically accurate simulated world, and to generate datasets for perception training or for testing the decision-making of the self-driving algorithm.

Key Information
Product NVIDIA DRIVE Sim 
Company Nvidia (USA) 
Function autonomous driving AI dataset and metaverse platform (virtual world) 
Industry ICT, Software & AI Development and Transport & Mobility 

Oxford Robotcar Dataset

Oxford Robotcar Dataset is an autonomous driving AI dataset developed by Oxford Robotics Institute (UK) for the transport & mobility industry.

The Oxford Robotics Institute released the Oxford Robotcar Dataset in 2016. The dataset contained around 20 million frames of driving imagery around central Oxford, England.


Key Information
Product Oxford Robotcar Dataset 
Company Oxford Robotics Institute (UK) 
Function autonomous driving AI dataset 
Release Date 2016-11-29 
#Frames/Images 20000000 
Locations UK - central Oxford 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL https://robotcar-dataset.robots.ox.ac.uk/.. 

PandaSet

PandaSet is an autonomous driving AI dataset product developed by Hesai (China) and Scale (USA) for the transport & mobility industry.

The PandaSet autonomous driving dataset was released by Hesai and Scale on 20 May 2020. The dataset contains 60,000 frames of driving imagery around San Francisco, Palo Alto and San Mateo.


Combining Hesai’s best-in-class LiDAR sensors with Scale’s high-quality data annotation, the full PandaSet dataset will feature:


  • 60,000 camera images

  • 20,000 LiDAR sweeps

  • 125 scenes of 8s each

  • 28 annotation classes

  • 37 semantic segmentation labels

  • Full sensor suite: 1x mechanical LiDAR, 1x solid-state LiDAR, 6x cameras, On-board GPS/IMU
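The figures in the list above can be cross-checked with a little arithmetic. The sketch below derives the implied per-sensor capture rate, assuming (this is not stated in the article) that images and sweeps are split evenly across the 6 cameras and 2 LiDARs and that every sensor recorded every scene. The function name `per_sensor_rate_hz` is hypothetical.

```python
# Figures quoted in the PandaSet feature list.
CAMERA_IMAGES = 60_000
LIDAR_SWEEPS = 20_000
SCENES = 125
SCENE_SECONDS = 8
NUM_CAMERAS = 6
NUM_LIDARS = 2  # 1x mechanical + 1x solid-state


def per_sensor_rate_hz(total_samples: int, num_sensors: int) -> float:
    """Implied samples per second per sensor.

    Assumes samples are split evenly across sensors and that every
    sensor recorded all scenes (an assumption, not a documented fact).
    """
    total_seconds = SCENES * SCENE_SECONDS  # 125 scenes * 8 s = 1,000 s
    return total_samples / num_sensors / total_seconds


camera_hz = per_sensor_rate_hz(CAMERA_IMAGES, NUM_CAMERAS)
lidar_hz = per_sensor_rate_hz(LIDAR_SWEEPS, NUM_LIDARS)

if __name__ == "__main__":
    print(f"Implied camera rate: {camera_hz} Hz")
    print(f"Implied LiDAR rate:  {lidar_hz} Hz")
```

Under these assumptions both sensor types come out at the same rate, which suggests the quoted totals are mutually consistent.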


Location/Environment: PandaSet scenes are selected from 2 routes in Silicon Valley: (1) San Francisco; and (2) El Camino Real from Palo Alto to San Mateo, showcasing complex urban driving scenarios, including steep hills, construction, dense traffic and pedestrians, and a variety of times of day and lighting conditions in the morning, afternoon, dusk and evening.


Key Information
Product PandaSet 
Company Hesai (China) and Scale (USA) 
Function autonomous driving AI dataset 
Release Date 2020-05-20 
#Frames/Images 60000 
Locations USA - San Francisco, Palo Alto, San Mateo 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL https://scale.com/open-datasets/pandaset.. 

Waymo Open Dataset

Waymo released its open dataset on 21 August 2019. According to Waymo, the Waymo Open Dataset contains data from 1,000 driving segments. Each segment captures 20 seconds of continuous driving; at 10 Hz per sensor, the 1,000 segments together correspond to 200,000 frames.


Each segment contains sensor data from five high-resolution Waymo lidars and five front-and-side-facing cameras, and includes lidar frames and images with vehicles, pedestrians, cyclists, and signage carefully labeled, capturing a total of 12 million 3D labels and 1.2 million 2D labels.
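The 200,000-frame figure follows from the segment count, segment length and sensor rate quoted above. The sketch below reproduces it and also derives a rough average of 3D labels per frame; that average assumes the 12 million 3D labels are spread over those 200,000 frames, which the article does not state explicitly.

```python
# Figures quoted by Waymo, as reported in this article.
SEGMENTS = 1_000
SEGMENT_SECONDS = 20
SENSOR_RATE_HZ = 10
LABELS_3D = 12_000_000

# Frames captured by one sensor in one 20-second segment.
frames_per_segment = SEGMENT_SECONDS * SENSOR_RATE_HZ

# Frames per sensor across the whole dataset.
total_frames = SEGMENTS * frames_per_segment

# Rough average only; assumes 3D labels are counted per frame (an assumption).
avg_3d_labels_per_frame = LABELS_3D / total_frames

if __name__ == "__main__":
    print(f"Frames per segment: {frames_per_segment}")
    print(f"Total frames per sensor: {total_frames:,}")
    print(f"Average 3D labels per frame (rough): {avg_3d_labels_per_frame}")
```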

The Waymo team believes the dataset is "one of the largest, richest, and most diverse self-driving datasets ever released for research".

Location/Environment: The dataset covers diverse driving environments, including dense urban and suburban environments across Phoenix, AZ, Kirkland, WA, Mountain View, CA and San Francisco, CA with a wide spectrum of driving conditions (day and night, dawn and dusk, sun and rain).


Key Information
Product Waymo Open Dataset 
Company Waymo (USA) 
Function autonomous driving AI dataset 
Release Date 2019-08-21 
#Frames/Images 200000 
Locations USA - Phoenix, Kirkland, Mountain View, San Francisco 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL https://waymo.com/open/ 

nuPlan

nuPlan is an autonomous driving AI dataset developed by Motional (USA) for the transport & mobility industry.

Key Information
Product nuPlan 
Company Motional (USA) 
Function autonomous driving AI dataset 
Industry Transport & Mobility 
Tech Domain Robotics and AI 
URL https://www.nuscenes.org/nuplan.. 

nuReality

nuReality is an autonomous driving AI dataset and metaverse platform (virtual world) developed by Motional (USA) for the transport & mobility industry. Its key value is that it lets autonomous-driving researchers and developers use its custom-built virtual reality (VR) environment to study the interactions between autonomous vehicles (AVs) and pedestrians.

Key Information
Product nuReality 
Company Motional (USA) 
Function autonomous driving AI dataset and metaverse platform (virtual world) 
Industry Transport & Mobility 
Tech Domain Metaverse and AI 
URL https://www.nureality.org/
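
To compare the datasets by size, the frame counts from the Key Information tables above can be collected and ranked. This is a minimal sketch: the values are copied from the "#Frames/Images" rows in this article, and `FRAME_COUNTS` and `rank_by_frames` are hypothetical names. NVIDIA DRIVE Sim, nuPlan and nuReality are left out because no frame count is listed for them.

```python
# Frame counts copied from the "#Frames/Images" rows in this article.
FRAME_COUNTS = {
    "Apolloscape": 146_997,
    "Argoverse": 16_389_650,
    "Audi Autonomous Driving Dataset (A2D2)": 41_277,
    "BDD100K": 120_000_000,
    "Canadian Adverse Driving Conditions Dataset": 56_000,
    "KITTI Vision Benchmark Suite": 15_000,
    "LeddarTech PixSet Dataset": 29_000,
    "Lyft Level 5 Dataset": 55_000,
    "Mapillary Vistas Dataset": 25_000,
    "Oxford Robotcar Dataset": 20_000_000,
    "PandaSet": 60_000,
    "Waymo Open Dataset": 200_000,
}


def rank_by_frames(counts: dict) -> list:
    """Return (name, frames) pairs sorted from largest to smallest."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_frames(FRAME_COUNTS)

if __name__ == "__main__":
    for name, frames in ranked:
        print(f"{name:45s} {frames:>12,d}")
```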