PoTion: pose motion representation for action recognition

Published by NAVER LABS Europe on 5 April 2018

Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, Cordelia Schmid

Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, US, 18-22 June, 2018

@inproceedings{choutas2018potion,
  title={{PoTion}: Pose motion representation for action recognition},
  author={Choutas, Vasileios and Weinzaepfel, Philippe and Revaud, J{\'e}r{\^o}me and Schmid, Cordelia},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={7024--7033},
  year={2018}
}

Most state-of-the-art methods for action recognition rely on a two-stream architecture that processes appearance and motion independently. In this paper, we claim that considering them jointly offers rich information for action recognition. We introduce a novel representation that gracefully encodes the movement of some semantic keypoints. We use the human joints as these keypoints and term our Pose moTion representation PoTion. Specifically, we first run a state-of-the-art human pose estimator and extract heatmaps for the human joints in each frame. We obtain our PoTion representation by temporally aggregating these probability maps. This is achieved by colorizing each of them depending on the relative time of the frames in the video clip and summing them. This fixed-size representation for an entire video clip is suitable to classify actions using a shallow convolutional neural network. Our experimental evaluation shows that PoTion outperforms other state-of-the-art pose representations. Furthermore, it is complementary to standard appearance and motion streams. When combining PoTion with the recent two-stream I3D approach [5], we obtain state-of-the-art performance on the JHMDB, HMDB and UCF101 datasets.
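
The temporal aggregation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `aggregate_heatmaps`, the `(T, J, H, W)` input layout, and the two-channel linear colorization (the first channel fades out over time, the second fades in) are assumptions chosen for illustration. The abstract only states that each heatmap is colorized according to the relative time of its frame and that the results are summed.

```python
import numpy as np


def aggregate_heatmaps(heatmaps):
    """Aggregate per-frame joint heatmaps into one time-colorized map per joint.

    heatmaps: array-like of shape (T, J, H, W), one heatmap per frame and joint.
    Returns an array of shape (J, H, W, 2), independent of the clip length T.

    Assumption (illustrative only): a two-channel linear color scheme in which
    frame t gets the color (1 - r, r), with r its relative time in [0, 1].
    """
    heatmaps = np.asarray(heatmaps, dtype=np.float64)
    num_frames = heatmaps.shape[0]

    # Relative time of each frame: 0 for the first frame, 1 for the last.
    rel_time = np.linspace(0.0, 1.0, num_frames)

    # One color per frame, shape (T, 2).
    colors = np.stack([1.0 - rel_time, rel_time], axis=1)

    # Colorize each heatmap with its frame's color, then sum over time.
    return np.einsum("tjhw,tc->jhwc", heatmaps, colors)


# Toy clip: 3 frames, 1 joint, 2x2 heatmaps, with the joint moving over time.
clip = np.zeros((3, 1, 2, 2))
clip[0, 0, 0, 0] = 1.0  # first frame: joint at the top-left pixel
clip[1, 0, 0, 1] = 1.0  # middle frame: joint at the top-right pixel
clip[2, 0, 1, 1] = 1.0  # last frame: joint at the bottom-right pixel
potion = aggregate_heatmaps(clip)
```

In this toy clip the trajectory of the joint is readable from the colors alone: the pixel visited first carries only the first channel, the pixel visited last carries only the second, and the pixel visited mid-clip carries an equal mix. Because the time axis is summed out, the output has the same size for any clip length, which is what makes a fixed-size input to a shallow convolutional network possible.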
