SMPLy benchmarking 3D human pose in-the-wild

Mannequin Benchmark

Benchmark

To use our annotations, you first need to download the MannequinChallenge Dataset by following these instructions: https://google.github.io/mannequinchallenge/www/index.html.

Following these instructions will extract three sets of sequences (train, test and validation). Our annotations come as a compressed JSON file. Recovered poses that were not precise enough were discarded from the evaluation, which means that some humans in the sequences are not annotated.

Prerequisites:

You need unzip and Python 3.6 with the json package (0.1.1). An easy way to set this up is with conda (https://docs.conda.io/projects/conda/en/latest/user-guide/install/) from your favorite terminal:

 

~/$ sudo apt-get install unzip # needed to unzip the archive

~/$ conda create --name MCB python=3.6 # create a simple env to load the JSON files

~/$ conda activate MCB

~/$ conda install -c jmcmurray json=0.1.1

Annotations:

The annotations can be downloaded here.

How to manipulate the annotations:

 

~/$ cd /path/to/download/


~/$ unzip mc_benchmark_release.zip

~/$ python

Python 3.6.12 |Anaconda, Inc.| (default, Sep 8 2020, 23:10:56)

[GCC 7.3.0] on linux

Type "help", "copyright", "credits" or "license" for more information.

>>> import json

>>> bchk = json.load(open('./mc_benchmark_release.json','r'))

>>> bchk.keys() #print the name of the sets

dict_keys(['test', 'train', 'validation'])

 

The data structure follows that of the MannequinChallenge dataset. ‘bchk’ contains 3 entries, ‘train’, ‘test’ and ‘validation’, one for each set. Each set contains sequences identified by a string, e.g. ‘e14e8516ec32cf00’.

In our work, we compared methods across all sets indistinctly, but kept the structure of the database as is for clarity. Within each sequence, human instances are identified by an index.

For example, the sequence ‘e14e8516ec32cf00’ in the train set has one tracked human instance:

 

>>> bchk['train']['e14e8516ec32cf00'].keys() # human instances for this sequence

dict_keys(['0'])
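As a small example (a sketch, assuming ‘bchk’ has been loaded as above), you can count the sequences and annotated human instances per set; the exact numbers depend on the release:

>>> for set_name, sequences in bchk.items():
...     n_instances = sum(len(instances) for instances in sequences.values())
...     print(set_name, ':', len(sequences), 'sequences,', n_instances, 'annotated instances')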

 

Each instance has an SMPL pose and shape that are assumed to be constant along the whole sequence. The pose consists of a flattened vector of 24×3 float values:

 

>>> len(bchk['train']['e14e8516ec32cf00']['0']['SMPL_pose']) # SMPL parameters of the pose, the first 3 values define the global rotation of the model in the world.

72

 

and a vector of 10 float values for the shape:

 

>>> len(bchk['train']['e14e8516ec32cf00']['0']['betas']) # SMPL parameters of the shape

10

 

The resulting 3D pose can be obtained using the SMPL model code available here: https://smpl.is.tue.mpg.de/
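For illustration only, here is a minimal sketch using the smplx Python package (pip install smplx), which is one possible PyTorch implementation of SMPL; the model path is an assumption to adapt to where you downloaded the SMPL files:

import torch
import smplx

# Hypothetical path to the SMPL model files downloaded from https://smpl.is.tue.mpg.de/
model = smplx.create('/path/to/smpl/models', model_type='smpl', gender='neutral')

instance = bchk['train']['e14e8516ec32cf00']['0']
pose = torch.tensor(instance['SMPL_pose'], dtype=torch.float32).reshape(1, 72)
betas = torch.tensor(instance['betas'], dtype=torch.float32).reshape(1, 10)

# First 3 values: global rotation; remaining 69: per-joint axis-angle body pose
output = model(global_orient=pose[:, :3], body_pose=pose[:, 3:], betas=betas)
joints_3d = output.joints.detach().numpy()   # 3D joint positions
vertices = output.vertices.detach().numpy()  # posed SMPL mesh vertices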

Additionally, every frame is provided with the following:

 

>>> bchk['train']['e14e8516ec32cf00']['0']['crops']['00000.png'].keys()

dict_keys(['min_x', 'min_y', 'max_x', 'max_y', 'j_vis', '2D_joints', 'local_SMPL_pose'])

 

The parameters min_x, min_y, max_x and max_y define the crop around the subject in the image. ‘2D_joints’ contains the image coordinates of the projected joints.

For instance, the cropping parameters and the 2D joints for the first frame of this sequence:

[Image: cropping parameters and 2D joints visualized on the first frame of this sequence]
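As an illustrative sketch only, this information can be overlaid on the frame; the frame path is hypothetical and we assume ‘2D_joints’ stores one [x, y] pair per joint:

import matplotlib.pyplot as plt
import matplotlib.patches as patches

crop = bchk['train']['e14e8516ec32cf00']['0']['crops']['00000.png']
img = plt.imread('/path/to/frames/e14e8516ec32cf00/00000.png')  # hypothetical path to the extracted frame

fig, ax = plt.subplots()
ax.imshow(img)
# Rectangle defined by the cropping parameters around the subject
width = crop['max_x'] - crop['min_x']
height = crop['max_y'] - crop['min_y']
ax.add_patch(patches.Rectangle((crop['min_x'], crop['min_y']), width, height, fill=False, edgecolor='red'))
# Projected 2D joints (assumed to be a list of [x, y] pairs)
xs = [p[0] for p in crop['2D_joints']]
ys = [p[1] for p in crop['2D_joints']]
ax.scatter(xs, ys, s=10, color='lime')
plt.show()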

The visibility of every joint is accessible with:

>>> bchk['train']['e14e8516ec32cf00']['0']['crops']['00000.png']['j_vis']

which is a vector of 24 boolean values.
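When evaluating against the 2D joints, one would typically keep only the visible ones; a minimal sketch, assuming ‘2D_joints’ holds one row of coordinates per joint, aligned with ‘j_vis’:

>>> import numpy as np
>>> crop = bchk['train']['e14e8516ec32cf00']['0']['crops']['00000.png']
>>> vis = np.array(crop['j_vis'], dtype=bool)   # one boolean per SMPL joint
>>> joints_2d = np.array(crop['2D_joints'])     # assumed aligned with 'j_vis'
>>> visible_joints = joints_2d[vis]             # keep only the joints marked as visible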

Because the camera parameters are unknown at inference time, all methods assume a fictive camera for each image and predict the SMPL pose in this camera's coordinate system.

Assuming that the fictive camera looks in the direction of the z-axis (forward), we provide for each image the SMPL pose rotated into the fictive camera coordinate system, so that methods can be compared:

 

>>> bchk['train']['e14e8516ec32cf00']['0']['crops']['00000.png']['local_SMPL_pose']

 

These are the SMPL parameters that we used for comparisons between methods when not using Procrustes alignment.
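As a non-authoritative sketch of how such a comparison could look, assuming a hypothetical prediction pred_joints expressed in the fictive camera frame with the same joint layout as the smplx output (the SMPL model path is again an assumption):

import numpy as np
import torch
import smplx

# Hypothetical path to the SMPL model files (see https://smpl.is.tue.mpg.de/)
model = smplx.create('/path/to/smpl/models', model_type='smpl', gender='neutral')

instance = bchk['train']['e14e8516ec32cf00']['0']
crop = instance['crops']['00000.png']
local_pose = torch.tensor(crop['local_SMPL_pose'], dtype=torch.float32).reshape(1, 72)
betas = torch.tensor(instance['betas'], dtype=torch.float32).reshape(1, 10)

# Ground-truth joints expressed in the fictive camera coordinate system
gt = model(global_orient=local_pose[:, :3], body_pose=local_pose[:, 3:], betas=betas)
gt_joints = gt.joints[0].detach().numpy()

def mpjpe(pred, gt, root=0):
    # Mean per-joint position error after aligning both skeletons on the root joint
    pred = pred - pred[root:root + 1]
    gt = gt - gt[root:root + 1]
    return np.linalg.norm(pred - gt, axis=-1).mean()

# pred_joints = ...  # hypothetical prediction from the method under evaluation
# error = mpjpe(pred_joints, gt_joints)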

Cite this dataset

@inproceedings{leroy2020smply,
  title     = {SMPLy Benchmarking 3D Human Pose Estimation in the Wild},
  author    = {Vincent Leroy and Philippe Weinzaepfel and Romain Brégier and Hadrien Combaluzier and Grégory Rogez},
  booktitle = {International Conference on 3D Vision (3DV)},
  year      = {2020}
}
