Behavioral mode discovery for fine-tuning multimodal generative policies

Published by Jean-Michel Renders on 6 July 2026

Alberta Longhini, David Emukpere, Jean-Michel Renders, Seungsu Kim

The Forty-Third International Conference on Machine Learning (ICML), Seoul, South Korea, 6-11 July, 2026

We address the problem of fine-tuning pre-trained generative policies with reinforcement learning while preserving the multimodality of the action distributions of such policies. Current methods for fine-tuning generative policies (e.g. diffusion policies) with reinforcement learning improve task performance but tend to collapse diverse behaviors into a single reward-maximizing mode. To overcome this, we propose MD-MAD, an unsupervised mode discovery framework that uncovers latent behaviors in generative policies, together with a mutual information metric to quantify multimodality. The discovered modes allow mutual information to be used as an intrinsic reward, regularizing reinforcement learning fine-tuning to improve success rates while maintaining diverse strategies. Experiments on robotic manipulation tasks demonstrate that our method consistently outperforms conventional fine-tuning, achieving high task success while preserving richer multimodal action distributions.
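To make the role of mutual information more concrete, the sketch below shows one common way such a quantity can be estimated and turned into a reward bonus. It is an illustration under stated assumptions, not the MD-MAD implementation: the function names, the count table over (latent mode, behavior cluster), the mode classifier `log_q`, the uniform prior over modes, and the weight `beta` are all hypothetical and do not come from the paper. The bonus follows the widely used variational form log q(z | trajectory) - log p(z).

```python
import numpy as np

def mutual_information(joint_counts):
    """Empirical mutual information I(Z; B) in nats from a count table.

    joint_counts[i, j] is the number of rollouts generated under latent
    mode i that ended in behavior cluster j. Both discretizations are
    assumptions made for this sketch.
    """
    joint = np.asarray(joint_counts, dtype=float)
    joint = joint / joint.sum()                # joint distribution p(z, b)
    pz = joint.sum(axis=1, keepdims=True)      # marginal p(z), shape (K, 1)
    pb = joint.sum(axis=0, keepdims=True)      # marginal p(b), shape (1, M)
    mask = joint > 0                           # skip empty cells (0 * log 0 = 0)
    ratio = joint[mask] / (pz @ pb)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

def intrinsic_reward(log_q, z, num_modes):
    """Per-sample bonus log q(z | trajectory) - log p(z).

    log_q has shape (N, K): log-probabilities from a hypothetical mode
    classifier. z holds the N mode indices that generated the samples.
    p(z) is assumed uniform, so -log p(z) = log K.
    """
    log_q = np.asarray(log_q, dtype=float)
    z = np.asarray(z, dtype=int)
    return log_q[np.arange(len(z)), z] + np.log(num_modes)

def shaped_reward(task_reward, log_q, z, num_modes, beta=0.1):
    """Task reward plus a beta-weighted mutual-information bonus."""
    bonus = intrinsic_reward(log_q, z, num_modes)
    return np.asarray(task_reward, dtype=float) + beta * bonus
```

In this sketch, a policy whose latent modes map one-to-one onto distinct behaviors attains the maximum value log K, while a policy that has collapsed to a single behavior scores zero, which is why a bonus of this kind can act as a regularizer against mode collapse during fine-tuning.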
