The paper presents an entropybased approach to segment a corpus into words, when no additional information about the corpus or the language, and no other resources such as a lexicon or grammar are available. To segment the corpus, the algorithm searches for separators, without knowing a priori by which symbols they are constituted. Good results can be obtained with corpora containing ‘clearly perceptible’ separators such as blank or newline.

