NAVER LABS Europe seminars are open to the public. This seminar is virtual and requires registration.
Date: 9th February 2021, 10:00 am (GMT+01:00)
Multi-modal learning for robot perception
Speaker: Andrea Cavallaro, Professor of Multimedia Signal Processing and founding Director of the Centre for Intelligent Sensing at Queen Mary University of London, UK. He is a Fellow of the International Association for Pattern Recognition (IAPR) and a Turing Fellow at the Alan Turing Institute, the UK National Institute for Data Science and Artificial Intelligence. He is Editor-in-Chief of Signal Processing: Image Communication; Chair of the IEEE Image, Video, and Multidimensional Signal Processing Technical Committee; an IEEE Signal Processing Society Distinguished Lecturer; an elected member of the IEEE Video Signal Processing and Communication Technical Committee; and a Senior Area Editor for the IEEE Transactions on Image Processing. He currently coordinates the projects CORSMAL (Collaborative object recognition, shared manipulation and learning, 2019-22) and GraphNEx (Graph Neural Networks for Explainable Artificial Intelligence, 2021-24).
Prof. Cavallaro received his Ph.D. in Electrical Engineering from the Swiss Federal Institute of Technology (EPFL), Lausanne, in 2002, and was a Research Fellow with British Telecommunications (BT) in 2004/2005. His awards include the Royal Academy of Engineering Teaching Prize (2007); three student paper awards on target tracking and perceptually sensitive coding at IEEE ICASSP (2005, 2007 and 2009); and the best paper award at IEEE AVSS 2009. He is a past Area Editor for the IEEE Signal Processing Magazine (2012-2014) and past Associate Editor for the IEEE Transactions on Image Processing (2011-2015), IEEE Transactions on Signal Processing (2009-2011), IEEE Transactions on Multimedia (2009-2010), IEEE Signal Processing Magazine (2008-2011) and IEEE Multimedia. He is a past elected member of the IEEE Multimedia Signal Processing Technical Committee and past chair of the Awards Committee of the IEEE Signal Processing Society Image, Video, and Multidimensional Signal Processing Technical Committee. Prof. Cavallaro has published over 280 journal and conference papers, one monograph on Video tracking (2011, Wiley) and three edited books: Multi-camera networks (2009, Elsevier); Analysis, retrieval and delivery of multimedia content (2012, Springer); and Intelligent multimedia surveillance (2013, Springer).
Abstract: The audio-visual analysis of the environment surrounding a robot is important for the recognition of activities, objects, interactions and intentions. In this talk I will discuss methods that enable a robot to understand a dynamic scene using only its on-board sensors in order to interact with humans. These methods include a multi-modal training strategy that leverages complementary information across observation modalities to improve the testing performance of a uni-modal system; a multi-channel technique that improves the acoustic sensing performance of a small microphone array mounted on a drone; an audio-visual tracker that exploits visual observations to guide the acoustic processing to localise people in 3D from a compact multi-sensor platform; and the estimation of the physical properties of unknown containers manipulated by humans to inform the control of a robot grasping the container during a dynamic handover. I will show several examples of multi-modal dynamic scene understanding and discuss open research directions.
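The idea of using an extra modality at training time while testing with a single sensor can be illustrated with a cross-modal distillation sketch. The code below is not the speaker's method; it is a minimal, hypothetical example with synthetic data, where a "teacher" fitted on audio and visual features supervises an audio-only "student" that is all that remains at test time.

```python
import numpy as np

# Hypothetical sketch of multi-modal training, uni-modal testing:
# a teacher model sees audio + visual features during training; a
# student model that uses only audio is trained to mimic the teacher,
# so at deployment only the audio sensor is required.
rng = np.random.default_rng(0)

n, d_audio, d_visual = 200, 8, 8
audio = rng.normal(size=(n, d_audio))    # synthetic audio features
visual = rng.normal(size=(n, d_visual))  # synthetic visual features

# Synthetic targets that genuinely depend on both modalities.
w_a, w_v = rng.normal(size=d_audio), rng.normal(size=d_visual)
y = audio @ w_a + visual @ w_v

# Teacher: least-squares fit on the concatenated modalities.
X_multi = np.hstack([audio, visual])
w_teacher, *_ = np.linalg.lstsq(X_multi, y, rcond=None)
teacher_pred = X_multi @ w_teacher

# Student: audio-only model regressed onto the teacher's outputs
# (the distillation step).
w_student, *_ = np.linalg.lstsq(audio, teacher_pred, rcond=None)

# At test time, only audio is available.
test_audio = rng.normal(size=(50, d_audio))
student_out = test_audio @ w_student
```

With linear models the sketch only shows the training/testing asymmetry; the benefit of the complementary modality appears with richer students and shared representations, which is the setting the talk addresses.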