Analyzing Information Flow in Transformers

Published by NAVER LABS Europe on 30 January 2020

Seminars at NAVER LABS Europe are open to the public, but space is limited. Please register.

Date: 30th January 2020, 11:00 AM-12:00 PM

Speaker: Lena Voita, Yandex Research, University of Amsterdam

Abstract:
We will discuss what Transformers learn, how, and why, by analyzing:
1. the mechanisms the model uses to encode different kinds of information;
2. how the training objective defines information flow in the model.
We will start with an in-depth analysis of multi-head attention. Using attribution methods, we will assess the importance of individual heads and show that the most important heads play interpretable roles. Surprisingly, the remaining heads are redundant and, using our novel head-pruning method, can be pruned with almost no loss in translation quality.
Then, we will look at how the representations of individual tokens in the Transformer evolve between layers under different learning objectives: machine translation (MT), language modeling (LM) and masked language modeling (MLM, BERT-style). While previous work mostly used so-called ‘probing tasks’ and made some interesting observations, an explanation of the process behind the observed behavior has been lacking. I will attempt to explain more generally why such behavior is observed by characterizing how the learning objective determines the information flow in the model. I approach this question from the information bottleneck perspective on learning in neural networks and will show that the patterns of information flow differ substantially across objectives. For example, while LMs gradually forget the past when forming predictions about the future, for MLMs the evolution proceeds in two stages: context encoding followed by token reconstruction.
