Code implementing the model introduced in Learning to Rank Images with Cross-Modal Graph Convolutions (ECIR’20).
A novel, efficient model for whole-body 3D pose estimation (including bodies, hands and faces), trained by mimicking the output of hand-, body- and face-pose experts.
Method for extreme pruning of artificial neural networks at initialization.
Kapture is a file format as well as a set of tools for manipulating datasets, and in particular Visual Localization and Structure from Motion data.
Improved integration of pose proposals for multi-person 2D and 3D pose detection in natural images.
Implementation of a fully fledged Lisp interpreter with data structures, pattern programming and high-level functions with lazy evaluation à la Haskell. Comes with the editor from TAMGU.
Data mixing strategies that can be computed on the fly with minimal computational overhead and that yield highly transferable visual representations.
Benchmark associated with the 3DV2020 paper of the same name.
Updated photo-realistic synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation.
A strategy to learn a stream that takes only RGB frames as input but leverages both appearance and motion information from them.
A method to predict the drop in accuracy of a trained model.
713 video clips from YouTube of mimed actions for a subset of 50 classes from the Kinetics400 dataset.
Dataset that addresses all possible POI change scenarios, for automatically updating complex indoor maps.
Benchmarked on classic feature matching benchmarks (HPatches) and challenging visual localization datasets.
An open-source programming language to help create, annotate and augment corpora and data.
Targets challenges such as varying lighting conditions and different occlusion levels for tasks such as depth estimation, instance segmentation and visual localization.
585 samples (1006 sentences) randomly selected and annotated with the SemEval2016 annotation guidelines for the restaurant domain.
Theoretical and experimental findings to improve regression applications.
Repository containing models and evaluation scripts for the papers ‘End-to-end Learning of Deep Visual Representations for Image Retrieval’ and ‘Learning with Average Precision: Training Image Retrieval with a Listwise Loss’.
Contains 39,982 videos, with more than 1,000 examples for each of the 35 action categories.