Capturing the geometry of object categories from video supervision

Published by Diane Larlus on 30 September 2018

David Novotny, Diane Larlus, Andrea Vedaldi

IEEE Transactions on Pattern Analysis and Machine Intelligence, September 2018

@article{novotny18pami,
author = {Novotny, David and Larlus, Diane and Vedaldi, Andrea},
year = {2018},
month = {09},
pages = {1-1},
title = {Capturing the Geometry of Object Categories from Video Supervision},
volume = {PP},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
doi = {10.1109/TPAMI.2018.2871117}
}

Abstract

In this article, we are interested in capturing the 3D geometry of object categories simply by looking around them. Our unsupervised method fundamentally departs from traditional approaches that require either CAD models or manual supervision. It only uses video sequences capturing a handful of instances of an object category to train a deep architecture tailored for extracting 3D geometry predictions. Our deep architecture has three components. First, a Siamese viewpoint factorization network robustly aligns the input videos and, as a consequence, learns to predict the absolute category-specific viewpoint from a single image depicting any previously unseen instance of that category. Second, a depth estimation network performs monocular depth prediction. Finally, a 3D shape completion network predicts the full shape of the depicted object instance by re-using the output of the monocular depth prediction module. We also propose a way to configure networks so they can perform probabilistic predictions. We demonstrate that, properly used in our framework, this self-assessment mechanism is crucial for obtaining high quality predictions. Our network achieves state-of-the-art results on viewpoint prediction, depth estimation, and 3D point cloud estimation on public benchmarks.
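The probabilistic predictions mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian likelihood, in which a network outputs both a value and a log-variance for each prediction and is trained with the negative log-likelihood; the abstract does not specify the likelihood the authors use, so this choice and the function names `gaussian_nll` and `mean_nll` are hypothetical and serve only as illustration.

```python
import math


def gaussian_nll(pred_mean, pred_log_var, target):
    """Negative log-likelihood of `target` under N(pred_mean, exp(pred_log_var)).

    Assumption: a Gaussian likelihood, chosen for illustration only.
    """
    variance = math.exp(pred_log_var)
    squared_error = (target - pred_mean) ** 2
    return 0.5 * (math.log(2.0 * math.pi) + pred_log_var + squared_error / variance)


def mean_nll(pred_means, pred_log_vars, targets):
    """Average the per-element loss over a batch of scalar predictions."""
    losses = [
        gaussian_nll(m, s, t)
        for m, s, t in zip(pred_means, pred_log_vars, targets)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Same prediction error (3.0), two different self-assessed uncertainties.
    confident = gaussian_nll(pred_mean=0.0, pred_log_var=0.0, target=3.0)
    uncertain = gaussian_nll(pred_mean=0.0, pred_log_var=2.0, target=3.0)
    print(f"confident but wrong: {confident:.3f}")
    print(f"uncertain and wrong: {uncertain:.3f}")
```

Under this loss, a prediction that is wrong but flagged as uncertain is penalized less than one that is wrong and confident, while an accurate prediction is rewarded for reporting a low variance. This is one common way for a network to assess the reliability of its own outputs.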

This article is an extended version of an ICCV 2017 conference paper. More information is available on the VGG webpage.

