|Adrien Gaidon, Qiao Wang, Yohann Cabon, Eleonora Vig|
|CVPR, Las Vegas, Nevada, USA; June 26 - July 1st, 2016.|
Assessing performance on data not seen during training is critical to validating machine learning models. In computer vision, however, experimentally measuring the actual robustness and generalization performance of high-level recognition methods is difficult in practice, especially in video analysis, due to high data acquisition and labeling costs.
Furthermore, it is sometimes nearly impossible to acquire data for some test scenarios of interest (e.g., storms, accidents, …). In this work, we show how to leverage the recent progress in computer graphics (especially off-the-shelf tools like game engines) to generate photo-realistic virtual worlds useful for assessing the performance of video analysis algorithms.
The main benefits of our approach are (i) the low cost of data generation, including with high-quality detailed annotations, (ii) the flexibility to automatically generate rich and varied scenes and their annotations, including under rare conditions to perform “what-if” and “ceteris paribus” analysis, and (iii) techniques to quantify the “transferability of conclusions” from synthetic to real-world data.
The main novel idea behind our approach is to initialize the virtual worlds from 3D synthetic clones of real-world video sequences.
Media coverage: MIT Technology Review, 16 March 2016.
Modern computer vision algorithms typically require expensive data acquisition and accurate manual labeling. In this work, we instead leverage the recent progress in computer graphics to generate fully labeled, dynamic, and photo-realistic proxy virtual worlds. We propose an efficient real-to-virtual world cloning method, and validate our approach by building and publicly releasing a new video dataset, called “Virtual KITTI”, automatically labeled with accurate ground truth for object detection, tracking, scene and instance segmentation, depth, and optical flow. We provide quantitative experimental evidence suggesting that (i) modern deep learning algorithms pre-trained on real data behave similarly in real and virtual worlds, and (ii) pre-training on virtual data improves performance. As the gap between real and virtual worlds is small, virtual worlds enable measuring the impact of various weather and imaging conditions on recognition performance, all other things being equal. We show that these factors may drastically affect otherwise high-performing deep models for tracking.
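The "all other things being equal" analysis above can be sketched as a simple comparison of one metric across rendering variants of the same cloned scene. A minimal, hypothetical sketch: the condition names echo the paper's weather/imaging variants, but the helper `condition_deltas` and the scores are placeholders for illustration, not results from the paper.

```python
# Hedged sketch of a "ceteris paribus" evaluation: the same tracker is
# scored on a real-to-virtual clone and on its weather/imaging variants,
# and we report each variant's drop relative to the clone baseline.
# All names and numbers below are illustrative assumptions.

def condition_deltas(scores, baseline="clone"):
    """Return each variant's performance drop relative to the baseline
    rendering, with all other scene factors held equal."""
    base = scores[baseline]
    return {cond: round(base - s, 3)
            for cond, s in scores.items() if cond != baseline}

# Hypothetical tracking scores (e.g., a MOTA-like metric) per variant.
scores = {"clone": 0.78, "fog": 0.61, "rain": 0.64, "morning": 0.75}
deltas = condition_deltas(scores)
# Larger deltas flag conditions that degrade an otherwise strong model.
```

Because every variant renders the same underlying 3D scene, any score drop can be attributed to the varied condition alone rather than to a change of content.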