In an ambient intelligence world, which involves robots and self-driving vehicles, we naturally need to have a good understanding of the 3D world we live in.
- 1 paper (R2D2: Reliable and Repeatable Detectors and Descriptors for Joint Sparse Keypoint Detection and Local Feature Extraction) accepted as oral at NeurIPS 2019.
- New paper on leveraging semantic segmentation for VSLAM accepted at Deep Learning for VSLAM workshop at ICCV (pdf).
- 2 papers accepted for ICCV 2019 (NAVER LABS total: 4): Learning with Average Precision: Training Image Retrieval with a Listwise Loss (PDF) and Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings
- New paper about Wi-Fi-based localization to appear at IPIN 2019
- New feature detector R2D2 wins Visual Localization Challenge at CVPR 2019 (paper on arXiv and publication database)
- Paper on Visual Localization at CVPR19
- Tutorial on structure-from-motion
- arXiv pre-print paper: “From handcrafted to deep local invariant features”
- Embedded Vision Workshop at CVPR19
- Paper: “Vision-based autonomous feeding robot” at OAGM18
Results in tasks related to this understanding, such as 3D reconstruction, mapping and visual localization, have been improving steadily.
To reconstruct the geometry of the world as accurately as possible, it’s common practice to use sensors such as LIDAR, radar and, of course, cameras. This is because geometry is well understood and many applications need to ‘measure the world’. However, progress in solving 3D vision tasks with geometry alone has been slowing, and the methods that exist today are not sufficiently robust for everyday situations such as changing environments and weather conditions.
One reason for this lack of robustness is that not everything can be measured or described in a way that a computer can reliably detect. Furthermore, even if a scene were to be perfectly reconstructed, there’s no guarantee that a computer would understand, analyse and interpret it correctly.
A popular strategy in the computer vision community to overcome these problems is to use machine learning techniques rather than hand-crafted approaches, and their success has proven it to be a good choice. There have been some outstanding results in topics such as image categorization, image retrieval and object detection.
However, geometric properties constitute a significant part of the world and we believe they should not be neglected entirely in favour of learning.
Our strategy therefore combines both approaches. We want to measure as much as we can and learn what we cannot.
The research focus of the 3D Vision team lies in the design of methods which combine geometry and learning-based approaches to solve specific real-world challenges such as visual localization, camera pose estimation and 3D reconstruction. Examples of our target applications are robot navigation, indoor mapping, augmented reality (AR) and, more generally speaking, systems which enable ambient intelligence in day-to-day life.