Our participation in CLEF 2008 (Ad-Hoc Track, TEL Subtask) was an opportunity to develop and assess methods that tackle multilinguality in a principled, if rather simple, way. It was also an opportunity to demonstrate the effectiveness of the dictionary adaptation method we designed last year for the domain-specific track.
Our goal was to use a single retrieval model and a single index for all the languages of a given collection. However, this approach requires assigning a weight to each language in order to merge the dictionaries at retrieval time. While assigning such weights requires prior knowledge about the collections, the dictionary adaptation mechanism provides a partial solution to this problem by adapting the weights to each query.
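To make the role of the language weights concrete, the following minimal sketch merges per-language translation dictionaries into a single weighted lexicon. It is an illustration only, not the system used in our runs: the function name `merge_dictionaries`, the nested-dictionary layout, the toy entries, and the choice of a normalized linear mixture are all assumptions made for the example.

```python
from collections import defaultdict


def merge_dictionaries(dictionaries, language_weights):
    """Merge per-language dictionaries into one weighted lexicon.

    Hypothetical layout (an assumption for this sketch):
      dictionaries:     {language: {source_term: {target_term: probability}}}
      language_weights: {language: non-negative weight}
    """
    # Normalize the weights over the languages actually present.
    total = sum(language_weights[lang] for lang in dictionaries)
    if total <= 0:
        raise ValueError("language weights must sum to a positive value")

    merged = defaultdict(lambda: defaultdict(float))
    for lang, lexicon in dictionaries.items():
        weight = language_weights[lang] / total
        for source, translations in lexicon.items():
            for target, prob in translations.items():
                # Linear mixture: each language contributes in
                # proportion to its normalized weight.
                merged[source][target] += weight * prob
    return {source: dict(targets) for source, targets in merged.items()}


# Toy entries, invented for illustration only.
dictionaries = {
    "fr": {"library": {"bibliothèque": 1.0}},
    "de": {"library": {"Bibliothek": 0.8, "Bücherei": 0.2}},
}
language_weights = {"fr": 1.0, "de": 3.0}
merged = merge_dictionaries(dictionaries, language_weights)
```

In this sketch the weights are fixed in advance, which is exactly where prior knowledge about the collection would be needed; a query-dependent adaptation step would instead recompute them for each query.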
Unfortunately, an accumulation of mistakes made our official runs perform relatively poorly. In particular, a misunderstanding of the definition of the “bilingual task” led us to leave a significant part of the collections unindexed. In this note, we present the reasons for these mistakes and report how we partly corrected some of them in a set of extra, unofficial runs whose performance is among the best; these runs demonstrate that dictionary adaptation is effective for the TEL task and corpora. This set of extra experiments relies on a simplifying assumption: every bilingual task is treated as strictly bilingual, with one source language and a single target language (the official language of the target collection). Further work will require re-processing the collections to include the documents we did not index.
We will also need to return to a true multilingual setting by solving the problem of how to weight the individual bilingual lexicons and monolingual thesauri differently.