Speaker: Radu Horaud, director of research, Perception project, INRIA Rhône-Alpes, Montbonnot, France
Abstract: Audio-visual integration has been an active research topic, in particular for disambiguating the audio modality based on visual information, for example, lip reading to improve speech recognition performance. We address the more general problem of how to combine visual and auditory data in the context of cognitive interaction between an artificial agent (a robot) and a group of people. The overall task is to recover a multi-party, multi-modal dialog and to allow a robot to purposively interact with people. One key ingredient is to properly align images with speech in an unconstrained setting. Modern computer vision and signal processing methods use high-dimensional descriptors to represent images and sounds. An important task is therefore to extract low-dimensional latent information from these high-dimensional observations, for example, to track over time face locations and orientations, as well as the clean speech signals emitted by individual speakers. We propose a novel high-dimensional-to-low-dimensional mapping model and briefly describe the associated methodology for learning the model parameters, based on mixture and latent-variable models and on EM inference. We then illustrate an instance of this model that robustly and efficiently aligns speech utterances with images of faces.
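To make the mapping step more concrete, here is a minimal, hypothetical sketch of a mixture-based high-dimensional-to-low-dimensional regression learned with EM: a joint Gaussian mixture is fitted on concatenated observation/latent pairs, and the low-dimensional latent variable is then predicted by Gaussian mixture regression. This is not the speaker's actual model; all names, dimensions, and data below are illustrative assumptions, and scikit-learn's GaussianMixture is used only to provide the EM fit.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Toy setup (all dimensions hypothetical): high-dimensional descriptors X
# (e.g. audio-visual features) generated from a low-dimensional latent
# variable Y (e.g. a 2-D face location), plus noise.
rng = np.random.default_rng(0)
D, L, N, K = 50, 2, 2000, 5              # observation dim, latent dim, samples, mixture components
Y = rng.uniform(-1.0, 1.0, size=(N, L))  # latent variables
A = rng.normal(size=(D, L))
X = np.tanh(Y @ A.T) + 0.05 * rng.normal(size=(N, D))  # nonlinear high-dim observations

# EM step: fit a joint Gaussian mixture on [X, Y]; each component acts as a
# local linear model, so the mixture approximates the nonlinear X -> Y map.
gmm = GaussianMixture(n_components=K, covariance_type="full", random_state=0)
gmm.fit(np.hstack([X, Y]))

def predict_latent(x):
    """Gaussian mixture regression: E[Y | X = x] under the fitted joint GMM."""
    mu_x, mu_y = gmm.means_[:, :D], gmm.means_[:, D:]
    S_xx = gmm.covariances_[:, :D, :D]
    S_yx = gmm.covariances_[:, D:, :D]
    # Responsibilities p(k | x), computed in log space for numerical safety.
    log_resp = np.log(gmm.weights_) + np.array([
        multivariate_normal.logpdf(x, mean=mu_x[k], cov=S_xx[k], allow_singular=True)
        for k in range(K)
    ])
    resp = np.exp(log_resp - logsumexp(log_resp))
    # Per-component conditional means E[Y | X = x, k] of the joint Gaussian.
    cond = np.array([
        mu_y[k] + S_yx[k] @ np.linalg.solve(S_xx[k], x - mu_x[k])
        for k in range(K)
    ])
    return resp @ cond

print("true latent:     ", Y[0])
print("estimated latent:", predict_latent(X[0]))
```

In the setting described in the abstract, X would stand in for the high-dimensional visual and auditory descriptors and Y for the tracked low-dimensional quantities (face locations, orientations, or a clean speech representation); the speaker's actual formulation and inference details may differ from this sketch.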