Seminars at NAVER LABS Europe are open to the public but space is limited. Please register here.
Date: 25th October 2019
Rachel Bawden, research associate at the University of Edinburgh
While Machine Translation (MT) has traditionally been performed on individual sentences, recent years have seen growing interest in exploiting context beyond the sentence to improve translation quality. Such context can come from the text itself (the preceding and following sentences) or from information about how the text was produced (speaker information, topic of discussion, scenario, etc.). This information can be important, if not essential, for a range of phenomena, notably discourse-level ones such as lexical cohesion and reference.
This talk will focus on work carried out during my PhD on contextual MT, and more specifically on how to adapt evaluation strategies to account for the types of phenomena that adding context is likely to improve. I will present a new open-source parallel dataset of English-French bilingual MT-mediated written dialogues, complete with context and metadata about the speakers and topic. The dataset was designed with three main aims: (i) to serve as an analysis set for future study of the interaction between humans and MT, (ii) to provide a test set of spontaneous dialogues for future contextual MT models, and (iii) to offer a way of collecting human evaluation judgements of MT quality for existing MT models. On this third point, I will present preliminary results comparing the two MT models used to produce the dataset, one non-contextual and one lightly contextual, showing that their types of errors differ and that this method of collecting human evaluation judgements is viable.