Assisting Medical Annotation in Swiss-Prot using Statistical Classifiers

Published by NAVER LABS Europe on 6 April 2013

Pavel Dobrokhotov, Cyril Goutte, Anne-Lise Veuthey, Eric Gaussier

International Journal of Medical Informatics 74(2-4):317-324.

Biomedical knowledge bases are valuable resources for the research community. Original publications are the main source used to annotate them. Medical annotation in Swiss-Prot is specifically targeted at finding and extracting data about human genetic diseases and polymorphisms. Curators have to scan through hundreds of publications to select the relevant ones. This workload can be greatly reduced by using bio-text mining techniques. Using a combination of Natural Language Processing (NLP) techniques and statistical classifiers, we achieve recall of up to 84% on the potentially interesting documents and a precision of more than 96% in detecting irrelevant documents. Careful analysis of the document pre-processing chain allows us to measure the impact of individual steps on the overall result, as well as to test different classifier configurations. The best combination was used to create a prototype of a search and classification tool that is currently being tested by the database curators. This article is an extended version of the papers presented at MIE 2003 and ISMB 2003.
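The two figures quoted in the abstract are measured on different classes: recall on the potentially interesting documents, and precision on the documents flagged as irrelevant. A minimal Python sketch of how such per-class metrics can be computed is given here for illustration only. The function names, the label strings and the toy data are assumptions made for this example; they are not taken from the paper and do not reproduce its results.

```python
def recall(y_true, y_pred, label):
    """Fraction of documents truly in `label` that were assigned to `label`."""
    assigned = [p for t, p in zip(y_true, y_pred) if t == label]
    if not assigned:
        raise ValueError(f"no document has true label {label!r}")
    return sum(p == label for p in assigned) / len(assigned)


def precision(y_true, y_pred, label):
    """Fraction of documents assigned to `label` that truly belong to `label`."""
    truths = [t for t, p in zip(y_true, y_pred) if p == label]
    if not truths:
        raise ValueError(f"no document was assigned label {label!r}")
    return sum(t == label for t in truths) / len(truths)


# Hypothetical toy labels (illustrative only, not data from the paper).
y_true = ["relevant", "relevant", "relevant",
          "irrelevant", "irrelevant", "irrelevant", "irrelevant", "irrelevant"]
y_pred = ["relevant", "relevant", "irrelevant",
          "irrelevant", "irrelevant", "irrelevant", "relevant", "irrelevant"]

relevant_recall = recall(y_true, y_pred, "relevant")
irrelevant_precision = precision(y_true, y_pred, "irrelevant")
print(f"recall on relevant documents: {relevant_recall:.2f}")
print(f"precision on irrelevant documents: {irrelevant_precision:.2f}")
```

High precision on the irrelevant class matters for a curation aid because documents flagged as irrelevant are the ones a curator would skip, so errors there mean missed annotations.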
