HTML-to-XML Migration by Means of Sequential Learning and Grammatical Inference

Published by NAVER LABS Europe on 6 April 2013

IJCAI 05 Workshop on Grammatical Inference Applications, Edinburgh, Scotland, 30 July, 2005

We consider the problem of converting documents from layout-oriented HTML into semantics-oriented XML annotation. An important part of the conversion problem can be reduced to a sequential learning framework in which the leaves of the source tree are labeled with XML tags. We review sequential learning methods developed for NLP applications, including Naive Bayes and maximum entropy. We then extend these methods with a hidden Markov model (HMM), which injects transition probabilities into the leaf classification function. Finally, we address the issue of HMM topology: we adopt grammatical inference methods to induce the topology and show how to extend the sequential learning methods accordingly. We test all methods on a particular conversion case and report the evaluation results.
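To make the idea of injecting transition probabilities into leaf classification concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a fully connected HMM topology, uses the per-leaf probabilities from a classifier as stand-ins for emission scores (a simplification), and decodes with the standard Viterbi algorithm. The function name `viterbi`, the tag names `title` and `author`, and all probability values are hypothetical.

```python
import math


def _log(p):
    # Log of a probability, mapping zero to negative infinity.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(leaf_probs, transition, initial):
    """Return the most probable tag sequence for a list of leaves.

    leaf_probs: one dict per leaf, tag -> score from a per-leaf classifier
                (assumption: used here in place of HMM emission probabilities)
    transition: dict (previous_tag, tag) -> P(tag | previous_tag)
    initial:    dict tag -> P(tag) for the first leaf
    """
    if not leaf_probs:
        return []
    tags = list(initial)

    # best[t][tag]: best log score of any path ending in `tag` at leaf t
    best = [{tag: _log(initial[tag]) + _log(leaf_probs[0].get(tag, 0.0))
             for tag in tags}]
    back = []  # back[t - 1][tag]: best predecessor of `tag` at leaf t

    for probs in leaf_probs[1:]:
        scores, pointers = {}, {}
        for tag in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + _log(transition.get((p, tag), 0.0)),
            )
            scores[tag] = (best[-1][prev]
                           + _log(transition.get((prev, tag), 0.0))
                           + _log(probs.get(tag, 0.0)))
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical example: three leaves, two candidate XML tags.
initial = {"title": 0.9, "author": 0.1}
transition = {
    ("title", "title"): 0.1, ("title", "author"): 0.9,
    ("author", "author"): 0.9, ("author", "title"): 0.1,
}
leaf_probs = [
    {"title": 0.8, "author": 0.2},
    {"title": 0.55, "author": 0.45},
    {"title": 0.3, "author": 0.7},
]
labels = viterbi(leaf_probs, transition, initial)
```

In this toy example, labeling each leaf independently would tag the second leaf as `title`, while the transition probabilities shift the joint decoding toward `author`. Restricting which entries of `transition` are non-zero is one simple way to picture a constrained HMM topology of the kind the abstract proposes to induce by grammatical inference.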
