Abstract
Carlos Lassance, Maroua Maachou, Joohee Park, Stéphane Clinchant
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Madrid, Spain, 11-15 July, 2022
BERT-based rankers have been shown to be very effective as rerankers in information retrieval tasks. To extend these models to full-ranking scenarios, the ColBERT model was recently proposed, which adopts a late-interaction mechanism. This mechanism allows document representations to be precomputed. However, late interaction leads to a large index size, as one needs to store a representation for each token of every document. In this work, we focus on token pruning techniques to address this problem. We test four methods, ranging from simpler ones to the use of a single attention layer to select the tokens to keep at indexing time. Our experiments show that for the MS MARCO-passages collection, indexes can be pruned up to 70% of their original size without a significant drop in performance. We also evaluate on the MS MARCO-documents collection and the BEIR benchmark, which reveals some challenges for the proposed mechanism.
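To make the index-size trade-off concrete, here is a minimal sketch of late-interaction scoring with pruning applied at indexing time. It is not the authors' implementation: the toy 2-dimensional vectors, the helper names (`dot`, `maxsim_score`, `prune_first_k`), and the "keep the first k tokens" rule are all assumptions chosen for illustration, since the abstract does not name the four methods it tests.

```python
# Illustrative sketch only: toy vectors and hypothetical helper names.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: for each query token, take its best-matching
    document token, then sum those maxima over the query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

def prune_first_k(doc_vecs, k):
    """Assumed pruning rule: keep only the first k token vectors."""
    return doc_vecs[:k]

# One vector per token; a real index stores these for every document.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.1, 0.1]]

full_score = maxsim_score(query, doc)

# Pruning happens once, at indexing time; only kept vectors are stored.
pruned_doc = prune_first_k(doc, 2)
pruned_score = maxsim_score(query, pruned_doc)
kept_fraction = len(pruned_doc) / len(doc)
```

In this toy case the two dropped vectors were never the best match for any query token, so the score is unchanged while the stored document shrinks to half its size. Pruning only costs effectiveness when a removed token would have been some query token's maximum.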