Abstract: There is huge value in enabling machines to understand and interpret, through vision, written information in the unconstrained conditions of the world around us. At the same time, our visual interpretation capacity is jointly acquired with the linguistic structures we use to describe the world – it would be desirable for machines to be able to learn in a similar way.
My research group at the Computer Vision Centre focuses on the design of computational models at the meeting point between vision and language that efficiently exploit available textual information to solve computer vision challenges. We investigate new technologies to give machines the capacity to read, as well as methods that enable computer vision models to learn by properly exploiting textual information, in or about images, and to use natural language interfaces to interact with humans.
In this talk I will discuss recent research in the group on modelling the interplay between visual and textual information for computer vision applications. I will focus on our recent work on image captioning and visual question answering, and during the presentation I will also touch upon scene text recognition methods, cross-modal image retrieval, joint visual-textual embeddings, semantic retrieval and self-supervised learning.
Date: 26th April 2019