XRCE’s Participation at Patent Image Classification and Image-based Patent Retrieval Tasks of the CLEF-IP 2011

Published by NAVER LABS Europe at 7 April 2013

Gabriela Csurka, Jean-Michel Renders, Guillaume Jacquet

Conference on Multilingual and Multimodal Information Access Evaluation, Amsterdam, Netherlands, 19-22 September 2011.

The aim of this document is to describe the methods we used in the Patent Image Classification and Image-based Patent Retrieval tasks of the Clef-IP 2011 track. The patent image classification task consisted in categorizing patent images into predefined categories such as abstract drawing, graph, flowchart, table, etc. Our main aim in participating in this sub-task was to test how our image categorizer performs on this type of categorization problem. Therefore, we used SIFT-like local orientation histograms as low level features and on the top of that we built a visual vocabularies specific to patent images using Gaussian mixture model (GMM). This allowed us to represent images with Fisher Vectors and to use linear classifiers to train one-versusall classifiers. As the results show, we obtain very good classification performance. Concerning the Image-based Patent Retrieval task, we kept the same image representation as for the Image Classification task and used dot product as similarity measure. Nevertheless, in the case of patents the aim was to rank patents based on patent similarities, which in the case of pure image-based retrieval implies to be able to compare a set of images versus another set of images. Therefore, we investigated different strategies such as averaging Fisher Vector representation of an image set or considering the maximum similarity between pairs of images. Finally, we also built runs where the predicted image classes were considered in the retrieval process. For the text-based patent retrieval, we decided simply to weight differently the different fields of the patent, giving more weight to some of them, before concatenating the different fields. Monolingually, we then used the standard cosine measure, after applying the tf-idf weighting scheme, to compute the similarity between the query and the documents of the collection. To handle the multi-lingual aspect, we either used late fusion of monolingual similarities (French / English / German) or translated non-English fields into English (and then computed simple monolingual similarities).
In addition to these standard textual similarities, we also computed similarities between patents based on the IPC-categories they share and similarities based on the patent citation graph; we used late fusion to merge these new similarities with the former ones.
Finally to combine the image-based and the text-based rankings, we normalized the ranking scores and used again weighted late fusion strategy. As our expectation for the visual expert was low, we used a much stronger weight for the textual expert, than for the visual one. We have shown that while indeed the visual expert performed poorly, combined with text experts the multi-modal system outperformed the corresponding text-only based retrieval system.

INTERACTION

Equip robots to interact safely with humans, other robots and systems.

VISION

Perception to help robots understand and interact with the environment.

ACTION

Providing embodied agents with sequential decision-making capabilities to safely execute complex tasks in dynamic environments.

NAVER FRANCE Gender Equality 2025

All

Publications

Blog

News

Code & Data

Careers

People

XRCE’s Participation at Patent Image Classification and Image-based Patent Retrieval Tasks of the CLEF-IP 2011

All

Publications

Blog

News

Code & Data

Careers

People

Cookie settings