Announcing Virtual KITTI 2 - Naver Labs Europe

New release of the popular synthetic image dataset for training and testing.

Virtual KITTI [1] was one of the first datasets to explore how synthetic data could be used to train and test machine learning models. The idea, which came from a researcher on our computer vision team who liked to play video games, was that synthetic data, if it could be made good enough, could help meet these models' need for large amounts of diverse, fully annotated data. To create the data, the researchers carefully recreated real-world videos from the popular KITTI tracking benchmark [3] using the Unity game engine. They called it Virtual KITTI.

Since the introduction of Virtual KITTI in 2016, other synthetic datasets have appeared and, together, they’ve successfully demonstrated that while synthetic datasets cannot completely replace real-world data, they’re a cost-effective alternative with good transferability. Not only are they useful for evaluating preliminary prototypes but, in combination with real-world datasets, they can sometimes improve performance [1,2].

Fast forward to 2020: with the need for machine learning data greater than ever, we’re happy to release a new version of the dataset called Virtual KITTI 2.

What’s new in Virtual KITTI 2

Virtual KITTI 2 is a more photorealistic version of the original dataset with richer ground truth. It exploits recent improvements in the lighting and post-processing of the Unity game engine [4] (version 2018.4 LTS) to narrow the gap between Virtual KITTI and the real KITTI images.

To showcase the capabilities of Virtual KITTI 2, we re-ran the original experiments of Gaidon et al. [1] and added new ones on stereo matching, monocular depth estimation, camera pose estimation and semantic segmentation. A single dataset can thus cover tasks that previously required several separate ones. Moreover, in addition to the two stereo cameras, each rendering RGB, class segmentation, instance segmentation and depth, you now also get backward optical flow (forward was already available in the previous version) as well as forward and backward scene flow images.
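As a rough illustration of how you might consume such per-frame ground truth, here is a minimal Python sketch that reads one RGB image and its depth map. The directory layout, file names and depth encoding below are assumptions made for the example, not the official specification; the download page linked further down documents the actual format.

import numpy as np
from PIL import Image

# Minimal sketch of loading one Virtual KITTI 2 frame.
# ASSUMPTIONS (hypothetical, for illustration only): the scene/variation
# folder names, the camera folder names, the 5-digit frame indices and the
# depth encoding (16-bit PNG storing depth in centimeters) are guesses --
# consult the dataset's own documentation for the authoritative layout.

SCENE = "Scene01/clone"   # hypothetical scene/variation pair
CAMERA = "Camera_0"       # one camera of the stereo pair
FRAME = 0

rgb_path = f"vkitti2/{SCENE}/frames/rgb/{CAMERA}/rgb_{FRAME:05d}.jpg"
depth_path = f"vkitti2/{SCENE}/frames/depth/{CAMERA}/depth_{FRAME:05d}.png"

# RGB image: standard 8-bit JPEG -> (H, W, 3) uint8 array.
rgb = np.asarray(Image.open(rgb_path))

# Depth map: assumed single-channel 16-bit PNG in centimeters;
# convert to meters for downstream use.
depth_cm = np.asarray(Image.open(depth_path))
depth_m = depth_cm.astype(np.float32) / 100.0

print(rgb.shape, depth_m.min(), depth_m.max())

The same pattern would extend to the other modalities (class and instance segmentation, optical flow, scene flow), each stored as per-frame images alongside the RGB data.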

The results show that Virtual KITTI 2 is even closer to real KITTI than the previous version, which makes it well suited for training and testing algorithms under controlled conditions.

To start testing, you can download the dataset here: https://europe.naverlabs.com/Research/Computer-Vision/Proxy-Virtual-Worlds/

Our paper with more information about the experiments and usage of the new dataset can be found here: arxiv link

[1] https://europe.naverlabs.com/research/publications/virtual-worlds-as-proxy-for-multi-object-tracking-analysis/

[2] https://europe.naverlabs.com/research/publications/procedural-generation-of-videos-to-train-deep-action-recognition-networks/

[3] http://www.cvlibs.net/datasets/kitti/index.php

[4] https://unity.com/