NAVER LABS Europe seminars are open to the public. This seminar is virtual and requires registration.
Learning from vision and natural language
Abstract: This talk focusses on the interface between vision and natural language research and is split into two parts. The first part describes how a video question-answering model can be trained without any manually annotated vision dataset, using only narrated web videos (reference: ICCV’21 Oral paper [https://arxiv.org/abs/2012.00451]). The second part describes a method for scaling vision-text transformers for large-scale text-to-vision search (reference: CVPR’21 paper).
About the Speaker: Antoine Miech is a Research Scientist working at DeepMind. His main research interest is weakly-supervised video understanding using natural language. Prior to joining DeepMind, he completed his Ph.D. in computer vision in the WILLOW team, which is part of Inria Paris and Ecole Normale Supérieure, under the supervision of Dr. Ivan Laptev and Dr. Josef Sivic. He was an intern at Facebook AI and DeepMind. Antoine was awarded the Google Ph.D. fellowship in 2018 for his contribution to computer vision.