Versatile Layout Understanding via Conjugate Graph - Naver Labs Europe

Abstract

Recent advances in document understanding, especially text recognition, provide new opportunities to address the page segmentation problem. In this paper, we propose a method to group text lines into semantic objects. We model a page as a graph in which nodes represent text lines and edges represent their geometric relations. The logical segmentation task then consists of identifying all text lines that belong to a given logical subdivision of the page. We model this task as categorizing each edge as relevant or not for building the targeted subdivision (subgraph). This edge categorization is performed using structured machine learning algorithms (graph Conditional Random Field and Edge Convolutional Network). Following the edge classification, we use a connected-components-based approach to aggregate the nodes. This simple approach shows very robust results across various layouts and various page subdivisions. We experiment on table segmentation into multiple subdivisions (rows, columns, and cells) and on minutes segmentation into resolutions. Our approach, which is oblivious to both the subdivision and the page layout, performs nearly on par with task-dedicated approaches and even outperforms them in certain setups.
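
The aggregation step described above can be illustrated with a minimal sketch. It assumes the edge classifier has already produced one boolean decision per edge; the function name `group_text_lines` and its arguments are hypothetical and not taken from the paper. Union-find is used here as one common way to compute connected components, and the paper's actual implementation may differ.

```python
from collections import defaultdict


def group_text_lines(num_lines, edges, keep):
    """Group text lines into objects via connected components.

    num_lines: number of text lines (graph nodes), indexed 0..num_lines-1
    edges:     list of (i, j) pairs linking geometrically related lines
    keep:      list of booleans, one per edge; True means the edge was
               classified as relevant (assumed classifier output)
    """
    parent = list(range(num_lines))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the endpoints of every edge classified as relevant.
    for (i, j), relevant in zip(edges, keep):
        if relevant:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect nodes that share a root into one cluster.
    clusters = defaultdict(list)
    for node in range(num_lines):
        clusters[find(node)].append(node)
    return sorted(clusters.values())


# Five text lines in a chain; only edges (0, 1) and (2, 3) are kept.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
keep = [True, False, True, False]
print(group_text_lines(5, edges, keep))  # [[0, 1], [2, 3], [4]]
```

Each returned cluster corresponds to one candidate object (for example a table row or a resolution), and a text line with no relevant edge forms a cluster of its own.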