Table Rows Segmentation - Naver Labs Europe


We consider the Document Understanding problem of segmenting tables into rows. We propose a method that first enumerates virtual row separator candidates and then selects the correct ones through a classification task, solved using supervised structured machine learning. Interestingly, the task is the joint classification of virtual separators and real text lines. We describe and test several alternative candidate generation methods and report the results of our experiments for each on two different types of registry books from the 19th century.
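
The two-stage idea of enumerating candidates and then selecting among them can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `TextLine` type, the midpoint-between-lines candidate rule, and the `min_gap` threshold are hypothetical, and the threshold is only a simple stand-in for the learned structured classifier described above, not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextLine:
    """Hypothetical text line, reduced to its vertical extent in pixels."""
    top: float
    bottom: float


def candidate_separators(lines: List[TextLine]) -> List[float]:
    """Enumerate one virtual separator midway between each pair of
    vertically consecutive text lines (an assumed generation rule)."""
    ordered = sorted(lines, key=lambda line: line.top)
    return [(a.bottom + b.top) / 2 for a, b in zip(ordered, ordered[1:])]


def select_separators(lines: List[TextLine], min_gap: float = 5.0) -> List[float]:
    """Keep candidates whose surrounding vertical gap is at least min_gap.

    This threshold is a placeholder for the selection step; the source
    uses a supervised structured classifier instead.
    """
    ordered = sorted(lines, key=lambda line: line.top)
    kept = []
    for a, b in zip(ordered, ordered[1:]):
        if b.top - a.bottom >= min_gap:
            kept.append((a.bottom + b.top) / 2)
    return kept


lines = [TextLine(0, 10), TextLine(12, 22), TextLine(40, 50)]
candidates = candidate_separators(lines)
selected = select_separators(lines)
```

In this toy input the first two lines sit close together and would fall in the same row, so only the candidate in the larger gap is kept. A learned model would make that decision from features of both the candidates and the text lines rather than from a fixed threshold.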