Theo Cachet, Julien Perez, Christopher Dance
Proceedings of the 38th International Conference on Machine Learning (ICML), PMLR 139:2376-2387, 2021
In few-shot imitation, an agent is given a few demonstrations of a previously unseen task and must then successfully perform that task. We propose a novel approach to learning few-shot imitation agents that we call demonstration-conditioned reinforcement learning (DCRL). Given a training set consisting of demonstrations, reward functions and transition distributions for multiple tasks, the idea is to define a policy that takes demonstrations and the current state as inputs, and to train this policy to maximize the average of the cumulative reward over the set of training tasks. Compared to concurrent approaches, DCRL has several advantages, such as the ability to improve upon suboptimal demonstrations, to operate given state-only demonstrations, and to cope with a domain shift between the demonstrator and the agent. Moreover, we show that DCRL outperforms methods based on behaviour cloning by a large margin, on navigation tasks and on robotic manipulation tasks from the Meta-World benchmark.
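To make the idea concrete, the following is a minimal sketch of what a demonstration-conditioned policy might look like, assuming state-only demonstrations, discrete actions, and a transformer encoder over demonstration steps. The class, dimensions, and pooling choice are hypothetical illustrations rather than the authors' implementation; in DCRL such a policy would be trained with a reinforcement learning algorithm to maximize the average cumulative reward over training tasks, not by behaviour cloning.

```python
import torch
import torch.nn as nn

class DemoConditionedPolicy(nn.Module):
    """Hypothetical sketch: maps (demonstrations, current state) -> action logits.

    Demonstrations are encoded with a transformer and pooled into a task
    context vector, which is combined with the current state embedding.
    """
    def __init__(self, state_dim: int, n_actions: int, embed_dim: int = 64):
        super().__init__()
        self.demo_embed = nn.Linear(state_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.demo_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.state_embed = nn.Linear(state_dim, embed_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, n_actions),
        )

    def forward(self, demos: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # demos: (batch, demo_steps, state_dim); state: (batch, state_dim)
        z = self.demo_encoder(self.demo_embed(demos))  # encode demo steps
        demo_ctx = z.mean(dim=1)                       # pool into a task context
        return self.head(torch.cat([demo_ctx, self.state_embed(state)], dim=-1))

# Usage: at test time the same policy is conditioned on demonstrations of an
# unseen task, with no further training (illustrative shapes only).
policy = DemoConditionedPolicy(state_dim=8, n_actions=4)
demos = torch.randn(2, 50, 8)   # batch of 2 tasks, 50 demonstration steps each
state = torch.randn(2, 8)       # current state in each task
logits = policy(demos, state)   # (2, 4) action logits
```

Because the policy conditions on demonstrations only as an input, rather than fitting them as supervised targets, the RL objective is free to exceed the demonstrator's performance, which is what enables improvement upon suboptimal demonstrations.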