AI Seminar at the University of Copenhagen: What is the best atomic unit to represent text?

Published by Claudia Heyer on 13 January 2020

23rd January 2020; 1:00 PM-4:00 PM. Place: Aud. 01, August Krogh Building, Universitetsparken 13, 2100 Copenhagen, Denmark

Speaker: Matthias Gallé, group lead of the NAVER LABS Europe Natural Language Processing group.
Title: Text Representation Units for Neural Machine Translation
Abstract: What is the best atomic unit to represent text? This important decision lies at the heart of the intersection between the continuous representation of modern NLP and the discrete world. To understand the effectiveness of byte-pair encoding (BPE), we test the hypothesis that it lies in the compression capacity of that algorithm. We test this by linking it to the broader family of dictionary-based compression algorithms. We then study character-based NMT with Transformer models, showing the consequences of using characters as atomic symbols on overall translation quality and robustness, as well as the need for deeper models. This is joint work with Rohit Gupta, Laurent Besacier and Marc Dymetman.

Organizers: Wouter Boomsma and Francois Lauze, Department of Computer Science, University of Copenhagen (DIKU)

The seminar is free and open to everyone.
