NAVER LABS Europe seminars are open to the public. This seminar is virtual and requires registration.
Date: 8th July 2022, 4:00 pm (CEST)
Understanding performance of long-document ranking models through comprehensive evaluation and leaderboarding
About the speaker: Leonid Boytsov is a researcher at the Bosch Center for Artificial Intelligence (BCAI) where he works on adversarial robustness for computer vision, information retrieval and extraction. He serves as an ARR action editor and co-advises several MS and PhD students.
Leonid holds a PhD in language technologies from Carnegie Mellon University (2018) and an MSc/BSc in applied mathematics and computer science from Moscow State University (1997).
Overall, Leonid Boytsov has been a professional computer scientist for 25 years, working on information retrieval, computer vision, speech recognition, and financial management systems. He remembers dependency parsing and gradient-boosted decision trees.
An important by-product of his research is NMSLIB, an efficient and flexible library for k-NN search created in collaboration with several other researchers. NMSLIB has 2M+ downloads; it was adopted by Amazon and incorporated into TensorFlow Similarity.
Abstract: We carry out a comprehensive evaluation of 13 recent models for long-document ranking using two popular collections (MS MARCO documents and Robust04). Our model zoo includes two specialized Transformer models (including Longformer) that can process long documents without the need to split them. Along the way, we document several difficulties in training and comparing such models. Somewhat surprisingly, we find the simple FirstP baseline (truncating documents to satisfy the input-sequence constraint of a typical Transformer model) to be quite effective. We analyze the distribution of relevant passages inside documents to explain this phenomenon. We further argue that, despite their widespread use, Robust04 and MS MARCO documents are not particularly useful for benchmarking long-document models.
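The FirstP baseline mentioned in the abstract can be sketched as a simple truncation step. The following is a minimal illustration, not the authors' implementation: the whitespace tokenization, the function name, and the 512-token limit are assumptions chosen for the example (real systems use subword tokenizers such as WordPiece or BPE, and the limit depends on the Transformer model).

```python
def firstp_truncate(document: str, max_tokens: int = 512) -> str:
    """FirstP baseline sketch: keep only the leading tokens of a long
    document so it fits a typical Transformer's input-length limit.
    Naive whitespace tokenization stands in for a subword tokenizer."""
    tokens = document.split()
    return " ".join(tokens[:max_tokens])

# A long "document" of 1,000 words is reduced to its first 512 words;
# the ranker then scores the query against this prefix only.
doc = " ".join(f"word{i}" for i in range(1000))
truncated = firstp_truncate(doc)
print(len(truncated.split()))  # 512
```

Because relevant passages tend to concentrate near the beginning of documents in these collections, scoring only this prefix already recovers much of the ranking signal, which is the phenomenon the abstract highlights.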