Document Layout Analysis (DLA) aims to map a page image to a structured output that captures the hierarchical structure(s) of the page (its regions), and to categorize these regions (for example as lines, paragraphs, or headings).
While structured machine learning provides some tools to capture context (most recently Graph Neural Networks), decisions are ultimately taken at a very low level (the input units: pixels, words), so post-processing is often required, and it is often task-specific.
Ideally, a structured output (a graph in general) should be directly generated by the method.
To tackle this structured output problem, we would like to learn to mimic what a human does when creating ground-truth material for these tasks. Human annotation consists of a sequence of operations, which a system can learn, in particular a Reinforcement Learning (RL) system. The agent (in RL terms) plays the role of the human annotator and performs the actions that a human annotator would perform to create ground-truth data.
A similar approach was recently tested on the problem of chip design, which may be considered, to a certain extent, similar to document layout problems.
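To make the idea of annotation as a sequence of operations more concrete, here is a minimal sketch in Python. It is only an illustration under strong assumptions, not the method proposed here: the `Word` and `AnnotationEnv` classes, the two actions (`APPEND`, `START`), the reward of +1/-1 derived from ground-truth line labels, and the hand-written `threshold_policy` are all hypothetical. A real system would replace the hand-written policy with a learned RL agent and would use a much richer action set (regions, hierarchy, categories).

```python
from dataclasses import dataclass

# Hypothetical action set: a real annotator has many more operations.
APPEND, START = 0, 1  # add word to the current line / open a new line


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # left coordinate on the page
    y: float  # top coordinate on the page


class AnnotationEnv:
    """Toy episodic environment: the agent groups words (in reading order)
    into text lines, taking one annotation action per word."""

    def __init__(self, words, gold_line_ids):
        self.words = words
        self.gold = gold_line_ids  # assumed ground truth: one line id per word
        self.reset()

    def reset(self):
        self.lines = [[0]]  # the first word always opens the first line
        self.cursor = 1     # index of the next word to annotate
        return self._observe()

    def _observe(self):
        if self.cursor >= len(self.words):
            return None
        prev, cur = self.words[self.cursor - 1], self.words[self.cursor]
        return (cur.x - prev.x, cur.y - prev.y)  # offset from previous word

    def step(self, action):
        same_line = self.gold[self.cursor] == self.gold[self.cursor - 1]
        # Assumed reward: +1 if the action matches what the annotator did.
        reward = 1.0 if (action == APPEND) == same_line else -1.0
        if action == APPEND:
            self.lines[-1].append(self.cursor)
        else:
            self.lines.append([self.cursor])
        self.cursor += 1
        done = self.cursor >= len(self.words)
        return self._observe(), reward, done


def threshold_policy(observation, max_dy=10.0):
    """Hand-written stand-in for a learned policy (an assumption)."""
    _, dy = observation
    return START if dy > max_dy else APPEND


def run_episode(env, policy):
    observation, total, done = env.reset(), 0.0, False
    while not done:
        observation, reward, done = env.step(policy(observation))
        total += reward
    return env.lines, total


words = [Word("Document", 0, 0), Word("Layout", 60, 0),
         Word("Analysis", 0, 20), Word("task", 70, 20)]
env = AnnotationEnv(words, gold_line_ids=[0, 0, 1, 1])
lines, total = run_episode(env, threshold_policy)
print(lines, total)
```

In this sketch the structured output (here, a flat grouping of words into lines) is built directly by the sequence of actions, so no separate post-processing step is needed; extending the action set to nested regions would yield a tree or graph instead.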
NAVER LABS Europe has full-time positions as well as PhD and PostDoc opportunities throughout the year, which are advertised here and on the sites of international conferences that we sponsor, such as CVPR, ICCV, ICML, NeurIPS and EMNLP.
NAVER LABS Europe is an equal opportunity employer.
NAVER LABS Europe is located in Grenoble, in the French Alps. We take a multi- and interdisciplinary approach to research, with scientists in machine learning, computer vision, artificial intelligence, natural language processing, ethnography and UX working together to create next-generation technology and services that deeply understand users and their contexts.