NAVER LABS Europe seminars are open to the public. This seminar is virtual and requires registration.
Date: 16th May 2023, 10:00 am (CEST)
RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question
About the speaker: I’m currently a Ph.D. student in the EDIC department at EPFL, a research assistant at the Idiap Research Institute (advisor: Dr. James Henderson), and a postdoctoral researcher at UZH (advisor: Prof. Rico Sennrich). During my studies, I have worked on representation learning, multilinguality, and encoding structured data. Additionally, I did two internships, at NAVER LABS Europe and Meta AI, working on machine translation and question answering. Prior to that, I received my bachelor’s degree in electrical engineering from Sharif University of Technology, with a minor in computer science.
Abstract: Existing metrics for evaluating the quality of automatically generated questions, such as BLEU, ROUGE, BERTScore, and BLEURT, compare the reference and predicted questions, providing a high score when there is considerable lexical overlap or semantic similarity between the candidate and the reference questions. This approach has two major shortcomings. First, it requires expensive human-provided reference questions. Second, it penalises valid questions that may not have high lexical or semantic similarity to the reference questions. In this work, we propose a new metric, RQUGE, based on the answerability of the candidate question given the context. The metric consists of a question-answering module and a span scorer module, both of which use pre-trained models from the existing literature, so our metric can be used without further training. We show that RQUGE has a higher correlation with human judgment without relying on the reference question, and that it is significantly more robust to several adversarial corruptions. Additionally, we illustrate that we can significantly improve the performance of QA models on out-of-domain datasets by fine-tuning on synthetic data generated by a question generation model and re-ranked by RQUGE.
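The two-module pipeline described in the abstract can be illustrated with a minimal sketch: a QA module answers the candidate question over the context, and a span scorer compares the predicted answer span with the gold answer span. Both modules below are toy stand-ins for illustration only (the actual work uses pre-trained models from the literature); the function names and the token-F1 scorer are assumptions, not the paper's implementation.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens (toy tokenizer for the sketch)."""
    return re.findall(r"\w+", text.lower())


def toy_qa_model(context: str, question: str) -> str:
    """Toy QA module: return the context sentence with the most word overlap
    with the question. A stand-in for a pre-trained QA model."""
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    q_words = set(tokenize(question))
    return max(sentences, key=lambda s: len(q_words & set(tokenize(s))))


def toy_span_scorer(predicted_answer: str, gold_answer: str) -> float:
    """Toy span scorer: token-level F1 between predicted and gold answer spans.
    A stand-in for a pre-trained span-scoring model."""
    pred, gold = tokenize(predicted_answer), tokenize(gold_answer)
    common = set(pred) & set(gold)
    if not common:
        return 0.0
    precision = len(common) / len(pred)
    recall = len(common) / len(gold)
    return 2 * precision * recall / (precision + recall)


def rquge_style_score(context: str, candidate_question: str, gold_answer: str) -> float:
    """Reference-free score: how well the candidate question, when answered
    over the context, recovers the gold answer span. No reference question
    is needed at any point."""
    predicted = toy_qa_model(context, candidate_question)
    return toy_span_scorer(predicted, gold_answer)


context = "Marie Curie was born in Warsaw. She won the Nobel Prize in Physics in 1903."
# A valid question targeting the gold answer scores well even if it shares
# no wording with any human-written reference question.
print(rquge_style_score(context, "Where was Marie Curie born?", "Warsaw"))
```

Note that a question about a different answer span (e.g. the prize) would score zero against the gold answer "Warsaw" here, which is the key property: the score measures answerability with respect to the gold answer, not similarity to a reference question.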