Theo Cachet, Julien Perez, Christopher Dance
Proceedings of the 38th International Conference on Machine Learning (ICML), PMLR 139:2376-2387, 2021
In few-shot imitation, an agent is given a few demonstrations of a previously unseen task and must then successfully perform that task. We propose a novel approach to learning few-shot imitation agents that we call demonstration-conditioned reinforcement learning (DCRL). Given a training set consisting of demonstrations, reward functions and transition distributions for multiple tasks, the idea is to define a policy that takes demonstrations and the current state as inputs, and to train this policy to maximize the average of the cumulative reward over the set of training tasks. Compared to concurrent approaches, DCRL has several advantages, such as the ability to improve upon suboptimal demonstrations, to operate given state-only demonstrations, and to cope with a domain shift between the demonstrator and the agent. Moreover, we show that DCRL outperforms methods based on behaviour cloning by a large margin, on navigation tasks and on robotic manipulation tasks from the Meta-World benchmark.
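To make the idea concrete, the following is a minimal sketch of what a demonstration-conditioned policy might look like, assuming state-only demonstrations, discrete actions, and a transformer encoder over demonstration steps. The class, dimensions, and pooling choice are hypothetical illustrations rather than the authors' implementation; in DCRL such a policy would be trained with a reinforcement learning algorithm to maximize the average cumulative reward over training tasks, not by behaviour cloning.

```python
import torch
import torch.nn as nn

class DemoConditionedPolicy(nn.Module):
    """Hypothetical sketch: maps (demonstrations, current state) -> action logits.

    Demonstrations are encoded with a transformer and pooled into a task
    context vector, which is combined with the current state embedding.
    """
    def __init__(self, state_dim: int, n_actions: int, embed_dim: int = 64):
        super().__init__()
        self.demo_embed = nn.Linear(state_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.demo_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.state_embed = nn.Linear(state_dim, embed_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, n_actions),
        )

    def forward(self, demos: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # demos: (batch, demo_steps, state_dim); state: (batch, state_dim)
        z = self.demo_encoder(self.demo_embed(demos))  # encode demo steps
        demo_ctx = z.mean(dim=1)                       # pool into a task context
        return self.head(torch.cat([demo_ctx, self.state_embed(state)], dim=-1))

# Usage: at test time the same policy is conditioned on demonstrations of an
# unseen task, with no further training (illustrative shapes only).
policy = DemoConditionedPolicy(state_dim=8, n_actions=4)
demos = torch.randn(2, 50, 8)   # batch of 2 tasks, 50 demonstration steps each
state = torch.randn(2, 8)       # current state in each task
logits = policy(demos, state)   # (2, 4) action logits
```

Because the policy conditions on demonstrations only as an input, rather than fitting them as supervised targets, the RL objective is free to exceed the demonstrator's performance, which is what enables improvement upon suboptimal demonstrations.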