Learning from vision and natural language

Published by NAVER LABS Europe on 24 September 2021

NAVER LABS Europe seminars are open to the public. This seminar is virtual and requires registration.

Date: 28th September 2021, 11:00 am CEST

Abstract: This talk focusses on the interface between vision and natural language research and is split into two parts. The first part describes how a video question-answering model can be trained without a manually annotated vision dataset, using only narrated web videos (Reference: ICCV’21 Oral paper [https://arxiv.org/abs/2012.00451]). The second part describes a method for scaling vision-text transformers to large-scale text-to-vision search (Reference: CVPR’21 paper).

About the Speaker: Antoine Miech is a Research Scientist at DeepMind. His main research interest is weakly-supervised video understanding using natural language. Before joining DeepMind, he completed his Ph.D. in computer vision in the WILLOW team, which is part of Inria Paris and École Normale Supérieure, under the supervision of Dr. Ivan Laptev and Dr. Josef Sivic. He was an intern at Facebook AI and DeepMind. Antoine was awarded the Google Ph.D. Fellowship in 2018 for his contribution to computer vision.