Experiments in Unsupervised Entropy-Based Corpus Segmentation

Published by NAVER LABS Europe on 6 April 2013

Andre Kempe

Proc. CoNLL'99, Bergen, Norway, pp. 7-13

The paper presents an entropy-based approach to segmenting a corpus into words when no additional information about the corpus or the language is available, and no other resources such as a lexicon or grammar can be used. To segment the corpus, the algorithm searches for separators without knowing a priori which symbols constitute them. Good results can be obtained with corpora containing ‘clearly perceptible’ separators such as blank or newline.
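
As a rough illustration of how entropy can point to separators, the following minimal sketch scores each symbol by the Shannon entropy of the symbol that follows it and picks the highest-scoring one. This is not the paper's algorithm, which the abstract does not spell out; the function names `successor_entropy` and `guess_separator` and the "highest successor entropy" heuristic are assumptions made for this example only.

```python
import math
from collections import Counter, defaultdict


def successor_entropy(corpus):
    """Map each symbol to the entropy (in bits) of the symbol following it.

    Illustrative heuristic only, not the method from the paper.
    """
    followers = defaultdict(Counter)
    # Count, for every symbol, which symbols appear immediately after it.
    for current, nxt in zip(corpus, corpus[1:]):
        followers[current][nxt] += 1

    entropy = {}
    for symbol, counts in followers.items():
        total = sum(counts.values())
        entropy[symbol] = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
    return entropy


def guess_separator(corpus):
    """Return the symbol whose successor is least predictable (assumed heuristic)."""
    entropy = successor_entropy(corpus)
    return max(entropy, key=entropy.get)


if __name__ == "__main__":
    sample = "a b a c a d"
    print(repr(guess_separator(sample)))
```

On the toy string `"a b a c a d"`, the blank is the only symbol followed by more than one distinct symbol, so it is the one selected. Real corpora would need far more data, and probably both left and right context, before such a score becomes meaningful.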
