A simple yet effective single-shot method to detect multiple people in an image and estimate their pose, body shape and expression. Training and demo code.
The SHOWMe dataset comprises 96 videos of a hand holding an object, each paired with a high-quality textured mesh.
Collaboration with INRIA.
The PoseFix dataset consists of several thousand pairs of 3D poses, each with text feedback describing how the source pose should be modified to obtain the target pose.
A model trained by adapting the BERT masked-prediction scheme from the natural language processing community.
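As an illustration of this masked-modelling idea applied to pose sequences, here is a minimal PyTorch sketch; all module names, dimensions and the masking ratio are assumptions for illustration, not the released code.

```python
import torch
import torch.nn as nn

class MaskedPoseModel(nn.Module):
    """BERT-style sketch: hide some frames of a pose sequence and reconstruct them."""
    def __init__(self, pose_dim=72, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, poses, mask):
        # poses: (B, T, pose_dim); mask: (B, T) boolean, True where the frame is hidden
        x = self.embed(poses)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        return self.head(self.encoder(x))

model = MaskedPoseModel()
poses = torch.randn(2, 16, 72)                 # dummy per-frame pose parameters
mask = torch.zeros(2, 16, dtype=torch.bool)
mask[:, ::4] = True                            # hide every fourth frame
recon = model(poses, mask)
loss = ((recon - poses)[mask] ** 2).mean()     # reconstruct only the masked frames
loss.backward()
```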
An auto-regressive transformer-based approach which internally compresses human motion into quantized latent sequences.
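A minimal sketch of the two ingredients named above, quantizing continuous motion latents against a codebook and modelling the resulting index sequence with a causal transformer; all names, sizes and interfaces are assumptions, not the released model (in practice the codebook would itself be learned, e.g. with a VQ-VAE-style reconstruction stage).

```python
import torch
import torch.nn as nn

class MotionPrior(nn.Module):
    def __init__(self, codebook_size=512, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, d_model)   # quantized motion latents
        self.tok_embed = nn.Embedding(codebook_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, codebook_size)

    def quantize(self, z):
        # z: (B, T, d_model) continuous per-frame latents -> nearest codebook index
        dist = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)   # (B, T, K)
        return dist.argmin(-1)                                           # (B, T)

    def forward(self, indices):
        # causal mask so each position only attends to earlier latents
        T = indices.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        x = self.transformer(self.tok_embed(indices), mask=causal)
        return self.head(x)                                              # next-index logits

model = MotionPrior()
z = torch.randn(2, 32, 256)                    # dummy continuous motion latents
idx = model.quantize(z)                        # discrete latent sequence
logits = model(idx[:, :-1])                    # predict index t from indices < t
loss = nn.functional.cross_entropy(logits.reshape(-1, 512), idx[:, 1:].reshape(-1))
loss.backward()
```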
A dataset pairing 3D human poses with both automatically generated and human-written descriptions.
A novel, efficient model for whole-body 3D pose estimation (including bodies, hands and faces), trained by mimicking the output of hand-, body- and face-pose experts.
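The sketch below illustrates this kind of expert distillation: a single whole-body student regresses the keypoints predicted by frozen part experts on the same images. The keypoint counts, architecture and loss are assumptions for illustration only.

```python
import torch
import torch.nn as nn

PARTS = {"body": 13, "hand": 2 * 21, "face": 84}   # assumed keypoint counts per part

class WholeBodyStudent(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.heads = nn.ModuleDict(
            {part: nn.Linear(feat_dim, n_kpts * 3) for part, n_kpts in PARTS.items()})

    def forward(self, images):
        feat = self.backbone(images)
        return {part: head(feat) for part, head in self.heads.items()}

def distillation_loss(student_out, expert_out):
    # regress each part towards the corresponding expert's pseudo ground truth
    return sum(nn.functional.l1_loss(student_out[p], expert_out[p]) for p in PARTS)

student = WholeBodyStudent()
images = torch.randn(4, 3, 128, 128)
# stand-ins for the predictions of frozen body-, hand- and face-pose experts
expert_out = {p: torch.randn(4, n * 3) for p, n in PARTS.items()}
loss = distillation_loss(student(images), expert_out)
loss.backward()
```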
Improved integration of pose proposals for multi-person 2D and 3D pose detection in natural images.
Benchmark associated with the 3DV2020 paper of the same name.
A strategy for training a stream that takes only RGB frames as input yet leverages both their appearance and motion information.
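One common way to realise this, shown in the hedged sketch below, is to train the RGB stream both to classify the clip and to mimic the features of a frozen network trained on optical flow, so motion information is absorbed while only RGB frames are needed at test time. The tiny 3D-conv backbone, feature dimension and loss weighting are assumptions, not the released code.

```python
import torch
import torch.nn as nn

class ClipNet(nn.Module):
    """Tiny 3D-conv clip classifier standing in for a real video backbone."""
    def __init__(self, in_channels, n_classes=400, feat_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, clip):
        feat = self.features(clip)
        return feat, self.classifier(feat)

rgb_net = ClipNet(in_channels=3)                  # trained stream (RGB only at test time)
flow_net = ClipNet(in_channels=2).eval()          # frozen teacher trained on optical flow
for p in flow_net.parameters():
    p.requires_grad_(False)

rgb = torch.randn(2, 3, 8, 56, 56)                # dummy RGB clips
flow = torch.randn(2, 2, 8, 56, 56)               # corresponding optical flow
labels = torch.randint(0, 400, (2,))

rgb_feat, logits = rgb_net(rgb)
flow_feat, _ = flow_net(flow)
loss = nn.functional.cross_entropy(logits, labels) \
     + nn.functional.mse_loss(rgb_feat, flow_feat)  # feature-mimicry term
loss.backward()
```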
713 YouTube video clips of mimed actions covering a subset of 50 classes from the Kinetics400 dataset.
Contains 39,982 videos, with more than 1,000 examples for each of 35 action categories.