NAVER LABS Europe seminars are open to the public. This seminar is virtual and requires registration.
Date: 1st July 2021, 10:00 am (GMT+02:00)
About the speaker: Sebastian Hofstätter is a PhD student and research assistant at the Vienna University of Technology (TU Wien), supervised by Prof. Allan Hanbury. He works in the field of Information Retrieval on efficient and interpretable neural re-ranking and effective dense retrieval models. His goal is to make neural techniques in IR accessible to a large audience. To this end, he studies and optimizes the cost-effectiveness trade-off from multiple angles, so that anyone can deploy these techniques.
Abstract: A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows. The neural IR community has recently made great advances in training effective dual-encoder dense retrieval (DR) models. A dense text retrieval model uses a single vector representation per query and passage to score a match, which enables low-latency first-stage retrieval with a nearest neighbour search. Increasingly, however, common training approaches require enormous compute power, as they either sample negative passages from a continuously refreshed index or require very large batch sizes.
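The scoring workflow described above can be sketched in a few lines: passages are encoded into single vectors offline, a query is encoded into one vector at search time, and first-stage retrieval is a nearest-neighbour search over dot-product scores. This is a minimal illustration only; the `encode` function below is a hypothetical stand-in for a trained dual-encoder (a real system would use a neural encoder and an approximate nearest-neighbour library such as FAISS).

```python
# Minimal sketch of single-vector dense retrieval scoring.
# NOTE: encode() is a hypothetical stand-in, not a real model.
import numpy as np

def encode(text, dim=8):
    # Deterministic pseudo-random unit vector seeded by the text's bytes.
    # In a real system this is a trained dual-encoder network.
    v = np.random.default_rng(sum(text.encode())).standard_normal(dim)
    return v / np.linalg.norm(v)

passages = ["neural re-ranking models", "dense passage retrieval", "BM25 baselines"]
index = np.stack([encode(p) for p in passages])   # offline indexing: one vector per passage

query = "dense passage retrieval"
scores = index @ encode(query)                    # one dot product per passage
ranking = np.argsort(-scores)                     # best match first
print(passages[ranking[0]])
```

Because the match score is a single dot product, the expensive encoding of the collection happens once at indexing time, and query latency reduces to one encoder pass plus a nearest-neighbour lookup.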
In this talk we first look at the problem setting of dense retrieval; then at new training techniques made possible by the vector scoring workflow; and finally at how we employ knowledge distillation and efficient sampling techniques to greatly improve the quality and zero-shot transfer ability of efficient dense retrieval models.