PoseGPT: Quantizing human motion for large scale generative modeling

Published by Gregory Rogez on 23 October 2022

Thomas Lucas, Fabien Baradel, Philippe Weinzaepfel, Gregory Rogez

European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October, 2022

We address the problem of action-conditioned generation of human motion sequences. Existing work falls into two categories: forecasting models conditioned on observed past motion, and generative models conditioned on action labels and duration only. In contrast, we generate motion conditioned on observations of arbitrary length, including none. To solve this generalized problem, we propose PoseGPT, an auto-regressive transformer-based approach which internally compresses human motion into quantized latent sequences. An auto-encoder first maps human motion to latent index sequences in a discrete space, and vice versa. Inspired by the Generative Pretrained Transformer (GPT), we propose to train a GPT-like model for next-index prediction in that space; this allows PoseGPT to output distributions over possible futures, with or without conditioning on past motion. The discrete and compressed nature of the latent space allows the GPT-like model to focus on long-range signal, as it removes low-level redundancy in the input signal. Predicting discrete indices also alleviates the common pitfall of predicting averaged poses, a typical failure case when regressing continuous values, as the average of discrete targets is not a target itself. Our experimental results show that the proposed approach achieves state-of-the-art results on HumanAct12, a standard but small-scale dataset; on BABEL, a recent large-scale MoCap dataset; and on GRAB, a human-object interaction dataset.
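
The two ideas in the abstract, mapping continuous latents to discrete indices and the observation that an average of discrete targets is not itself a target, can be illustrated with a toy sketch. This is not the authors' implementation: the nearest-neighbour codebook lookup, the 2-D codebook, and the names `quantize`, `dequantize` and `is_codebook_entry` are assumptions chosen purely for illustration.

```python
import numpy as np

# Hypothetical toy codebook: K=4 entries in a 2-D latent space.
codebook = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])

def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest codebook entry."""
    # Squared Euclidean distances, shape (T, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def dequantize(indices, codebook):
    """Map an index sequence back to codebook vectors."""
    return codebook[indices]

def is_codebook_entry(vector, codebook):
    """True if the vector coincides with some codebook entry."""
    return bool(np.any(np.all(np.isclose(codebook, vector), axis=1)))

# A short sequence of continuous latents (made-up values).
latents = np.array([[0.10, 0.90],
                    [-0.05, 1.10],
                    [0.90, 0.95],
                    [1.20, 0.10]])

indices = quantize(latents, codebook)          # discrete index sequence
reconstructed = dequantize(indices, codebook)  # back to latent vectors

# Suppose two futures are equally plausible: entry 0 or entry 3.
# Regressing a continuous value tends toward their mean, which is
# not a valid codebook entry.
mean_latent = codebook[[0, 3]].mean(axis=0)

# A categorical distribution over indices keeps both modes instead;
# sampling from it always yields a valid entry.
rng = np.random.default_rng(0)
sampled_index = int(rng.choice([0, 3], p=[0.5, 0.5]))
sampled_latent = codebook[sampled_index]
```

In this toy setting `mean_latent` falls between the two candidate entries, while `sampled_latent` is always one of them. A model that predicts a distribution over the next index, as described in the abstract, would take the place of the fixed 50/50 probabilities used here.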
