Virtual Worlds as Proxy for Multi-Object Tracking Analysis

Published by NAVER LABS Europe on 11 March 2016

Adrien Gaidon, Qiao Wang, Yohann Cabon, Eleonora Vig

CVPR, Las Vegas, Nevada, USA; June 26 - July 1st, 2016.


Assessing performance on data not seen during training is critical for validating machine learning models. In computer vision, however, experimentally measuring the actual robustness and generalization performance of high-level recognition methods is difficult in practice, especially in video analysis, due to high data acquisition and labeling costs.

Furthermore, it is sometimes nearly impossible to acquire data for some test scenarios of interest (e.g., storms or accidents). In this work, we show how to leverage the recent progress in computer graphics (especially off-the-shelf tools like game engines) to generate photo-realistic virtual worlds useful for assessing the performance of video analysis algorithms.

The main benefits of our approach are (i) the low cost of data generation, including high-quality detailed annotations, (ii) the flexibility to automatically generate rich and varied scenes and their annotations, including under rare conditions, to perform “what-if” and “ceteris paribus” analysis, and (iii) techniques to quantify the “transferability of conclusions” from synthetic to real-world data.

The main novel idea behind our approach consists in initializing the virtual worlds from 3D synthetic clones of real-world video sequences.

Citation: CVPR 2016, Las Vegas, Nevada, USA; June 26 - July 1, 2016.

Also: MIT Technology Review | 16 March 2016

Modern computer vision algorithms typically require expensive data acquisition and accurate manual labeling. In this work, we instead leverage the recent progress in computer graphics to generate fully labeled, dynamic, and photo-realistic proxy virtual worlds. We propose an efficient real-to-virtual world cloning method, and validate our approach by building and publicly releasing a new video dataset, called “Virtual KITTI”, automatically labeled with accurate ground truth for object detection, tracking, scene and instance segmentation, depth, and optical flow. We provide quantitative experimental evidence suggesting that (i) modern deep learning algorithms pre-trained on real data behave similarly in real and virtual worlds, and (ii) pre-training on virtual data improves performance. As the gap between real and virtual worlds is small, virtual worlds enable measuring the impact of various weather and imaging conditions on recognition performance, all other things being equal. We show that these factors may drastically affect otherwise high-performing deep models for tracking.
