A novel, efficient model for whole-body 3D pose estimation (including bodies, hands and faces), trained by mimicking the output of hand-, body- and face-pose experts.
Method for extreme pruning of artificial neural networks at initialization.
Improved integration of pose proposals for multi-person 2D and 3D pose detection in natural images.
Data mixing strategies that can be computed on-the-fly with minimal computational overhead and that yield highly transferable visual representations.
Benchmark associated with the 3DV 2020 paper of the same name.
Updated photo-realistic synthetic video dataset designed for training and evaluating computer vision models on several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation.
A strategy to learn a stream that takes only RGB frames as input but leverages both appearance and motion information from them.
713 YouTube video clips of mimed actions, covering a subset of 50 classes from the Kinetics400 dataset.
Dataset addressing all possible POI change scenarios, for automatically updating complex indoor maps.
Evaluated on classic feature-matching benchmarks (HPatches) and on challenging visual localization datasets.
Repository containing the models and evaluation scripts for the papers ‘End-to-end Learning of Deep Visual Representations for Image Retrieval’ and ‘Learning with Average Precision: Training Image Retrieval with a Listwise Loss’.
Contains 39,982 videos, with more than 1,000 examples for each of 35 action categories.