PoseEmbroider
A transformer-based multi-modal alignment retrieval model that processes 3D poses, pictures of people and textual pose descriptions to produce an enhanced 3D-, visual- and semantic-aware human pose representation that can handle partial information (e.g. an image with the lower body occluded). Code accompanying the ECCV 2024 paper.
UNIC
General encoder for classification. Accompanies the ECCV 2024 paper.
DEBiT (Dual Encoder Binocular Transformer)
An end-to-end trained agent for image goal navigation. Accompanies the ICLR 2024 paper “End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon”.
SHiNe
A novel classifier that uses semantic knowledge from class hierarchies. It can be seamlessly integrated with any off-the-shelf open-vocabulary object detection (OvOD) detector, with no additional computational overhead during inference.
ZLaP
Code accompanying the CVPR 2024 paper “Label Propagation for Zero-shot Classification with Vision-Language Models”.
POC
Code for the ECCV 2024 paper “Placing Objects in Context via Inpainting for Out-of-distribution Segmentation”.
A simple yet effective single-shot method to detect multiple people in an image and estimate their pose, body shape and expression. Training and demo code.
The SHOWMe dataset comprises 96 videos with their associated high-quality textured meshes of a hand holding an object.
Collaboration with INRIA.
The PoseFix dataset consists of several thousand paired 3D poses and corresponding text feedback that describes how the source pose needs to be modified to obtain the target pose.
Learning augmentation policies without prior knowledge.
A codebase to evaluate the robustness and uncertainty properties of semantic segmentation models as implemented in the CVPR 2024 paper.
Model for transfer learning.
Two ResNet50 models pretrained on our synthetic ImageNet clones: ImageNet-100-SD and ImageNet-1K-SD.
An Explicit Matching module for compatibility and an Implicit Similarity module for relevance.
Code for running our FIRe model, based solely on mid-level features that we call super-features.
Model trained by mimicking the BERT algorithm from the natural language processing community.
An auto-regressive transformer-based approach that internally compresses human motion into quantized latent sequences.
A dataset pairing 3D human poses with both automatically generated and human-written descriptions.
A PyTorch research codebase for replicating the CVPR 2022 paper.
Official repo for the NeurIPS 2022 paper.
Dataset that allows exploration of cross-modal retrieval where images contain scene-text instances.
A method that is simple, easy to implement and train, and broadly applicable.
Code repository for the ImageNet-CoG Benchmark introduced in the ICCV 2021 paper.