SMPLy Benchmarking 3D Human Pose in-the-Wild - NAVER LABS Europe


Mannequin Benchmark


In order to use our annotations, you first need to download the MannequinChallenge dataset by following its download instructions.

Following the instructions will extract 3 sets of sequences (train, test and validation). Our annotations come as a compressed JSON file. Some recovered poses were not precise enough and were discarded from the evaluation, which means that some humans in the sequences are not annotated.


You will need unzip and Python 3.6 with JSON 0.1.1. An easy way is to use conda in your favorite terminal like this:


~/$ sudo apt-get install unzip # needed to unzip the archive

~/$ conda create --name MCB python=3.6 # We need to create a simple env to load json files

~/$ conda activate MCB

~/$ conda install -c jmcmurray json=0.1.1


The annotations can be downloaded here.

How to manipulate the annotations:


~/$ cd /path/to/download/

~/$ unzip

~/$ python

Python 3.6.12 |Anaconda, Inc.| (default, Sep 8 2020, 23:10:56)

[GCC 7.3.0] on linux

Type "help", "copyright", "credits" or "license" for more information.

>>> import json

>>> bchk = json.load(open('./mc_benchmark_release.json','r'))

>>> bchk.keys() # print the names of the sets

dict_keys(['test', 'train', 'validation'])


The data structure follows that of the MannequinChallenge dataset. ‘bchk’ contains 3 entries, ‘train’, ‘test’ and ‘validation’, one for each set. Each set contains sequences identified by a string, e.g. ‘e14e8516ec32cf00’.

In our work, we compared methods regardless of the sets, but kept the structure of the database as-is for clarity. Within each sequence, we identified human instances with an index.
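As a quick sanity check, the sets and sequences can be traversed to count annotated human instances (a minimal sketch; `bchk` is the dictionary loaded above, mimicked here by a toy example):

```python
def count_instances(bchk):
    """Count sequences and annotated human instances per set."""
    stats = {}
    for set_name, sequences in bchk.items():
        n_instances = sum(len(instances) for instances in sequences.values())
        stats[set_name] = {'sequences': len(sequences),
                          'instances': n_instances}
    return stats

# Toy dictionary with the same nesting as mc_benchmark_release.json
toy = {'train': {'e14e8516ec32cf00': {'0': {}}},
       'test': {'aaaa000011112222': {'0': {}, '1': {}}},
       'validation': {}}
print(count_instances(toy))
```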

For example, the sequence ‘e14e8516ec32cf00’ that is in the train set has 1 tracked human instance:


>>> bchk['train']['e14e8516ec32cf00'].keys() # human instances for this sequence

dict_keys(['0'])


Each instance has a SMPL pose and shape that are assumed to be constant along the whole sequence. The pose consists of a flattened vector of 24×3 float values:


>>> len(bchk['train']['e14e8516ec32cf00']['0']['SMPL_pose']) # SMPL pose parameters; the first 3 values define the global rotation of the model in the world

72


and a vector of 10 float values for the shape:


>>> len(bchk['train']['e14e8516ec32cf00']['0']['betas']) # SMPL shape parameters

10


The resulting 3D pose can be obtained using the SMPL model code available here.
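Independently of the full SMPL model, the 72-dimensional pose vector can be reshaped into 24 axis-angle vectors of 3 values each, and any of them converted to a rotation matrix with the Rodrigues formula (a numpy sketch; the `pose` array stands in for the `SMPL_pose` list loaded above):

```python
import numpy as np

def rodrigues(axis_angle):
    """Convert a 3D axis-angle vector to a 3x3 rotation matrix."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        return np.eye(3)
    k = axis_angle / theta  # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])  # cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

pose = np.zeros(72)                 # placeholder for bchk[...]['SMPL_pose']
joints_aa = pose.reshape(24, 3)     # one axis-angle vector per joint
global_rotation = rodrigues(joints_aa[0])  # first 3 values: global rotation
```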

Additionally, every frame is provided with the following:


>>> bchk['train']['e14e8516ec32cf00']['0']['crops']['00000.png'].keys()

dict_keys(['min_x', 'min_y', 'max_x', 'max_y', 'j_vis', '2D_joints', 'local_SMPL_pose'])


The parameters min_x, min_y, max_x, max_y define the crop around the subject in the image. ‘2D_joints’ contains the coordinates of the projections of the joints in the image.

For instance, here are the cropping parameters and the 2D joints for the first frame of this sequence:

[Mannequin Benchmark image: cropping parameters and 2D joints for the first frame]
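The crop can be applied directly with array slicing (a sketch; `frame` is assumed to be the full image as an H×W×3 array, `ann` the per-frame dictionary above, and the flat x,y layout of ‘2D_joints’ is an assumption):

```python
import numpy as np

def crop_subject(frame, ann):
    """Cut out the bounding box around the subject and shift the 2D joints
    into crop coordinates."""
    x0, y0 = int(ann['min_x']), int(ann['min_y'])
    x1, y1 = int(ann['max_x']), int(ann['max_y'])
    crop = frame[y0:y1, x0:x1]
    # 2D joints are given in full-image coordinates; shift them to the crop.
    joints = np.asarray(ann['2D_joints'], dtype=float).reshape(-1, 2)
    joints_local = joints - np.array([x0, y0])
    return crop, joints_local
```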

The visibility of every joint is accessible with:

>>> bchk['train']['e14e8516ec32cf00']['0']['crops']['00000.png']['j_vis']

which is a vector of 24 boolean values.
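This mask is useful, for example, to evaluate the 2D joint error only on visible joints (a sketch; the predicted joints are hypothetical):

```python
import numpy as np

def mean_visible_error(pred_2d, gt_2d, j_vis):
    """Mean Euclidean distance between predicted and ground-truth 2D joints,
    computed over visible joints only."""
    vis = np.asarray(j_vis, dtype=bool)
    dists = np.linalg.norm(np.asarray(pred_2d, dtype=float)
                           - np.asarray(gt_2d, dtype=float), axis=1)
    return dists[vis].mean()
```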

Because the camera parameters are unknown at inference time, all methods use a fictive camera for a given image and predict the SMPL pose in this coordinate system.

Considering that this fictive camera looks in the direction of the z-axis (forward), we provide for each image the SMPL pose rotated into the fictive camera coordinate system, so that methods can be compared:


>>> bchk['train']['e14e8516ec32cf00']['0']['crops']['00000.png']['local_SMPL_pose']


These are the SMPL parameters that we considered for comparisons between methods when not using Procrustes Alignment.
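When comparing methods with Procrustes Alignment, each predicted set of 3D joints is instead rigidly aligned to the ground truth (optimal rotation, translation and scale) before measuring the joint error; a standard similarity-transform sketch:

```python
import numpy as np

def procrustes_align(pred, gt):
    """Align pred (Nx3) to gt (Nx3) with the optimal similarity transform
    (rotation, translation, scale), as used for PA-MPJPE-style metrics."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    P, G = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the covariance matrix.
    U, S, Vt = np.linalg.svd(P.T @ G)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # avoid reflections
        Vt[-1] *= -1
        S[-1] *= -1
        R = Vt.T @ U.T
    scale = S.sum() / (P ** 2).sum()
    return scale * (pred - mu_p) @ R.T + mu_g
```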

Cite this dataset


@inproceedings{leroy2020smply,
  title = {SMPLy Benchmarking 3D Human Pose Estimation in the Wild},
  author = {Vincent Leroy and Philippe Weinzaepfel and Romain Brégier and Hadrien Combaluzier and Grégory Rogez},
  booktitle = {International Conference on 3D Vision (3DV)},
  year = {2020}
}