A Hierarchical Model for Clustering and Categorising Documents

Published by NAVER LABS Europe on 5 April 2013

Eric Gaussier, Cyril Goutte, Kris Popat, Francine Chen

Advances in Information Retrieval -- Proceedings of the 24th BCS-IRSG European Colloquium on IR Research (ECIR-02), Glasgow, March 25-27, 2002. Lecture Notes in Computer Science 2291, pp. 229-247, Springer.

We propose a new hierarchical generative model for textual data, where words may be generated by topic-specific distributions at any level in the hierarchy. This model is naturally well suited to clustering documents in preset or automatically generated hierarchies, as well as categorising new documents in an existing hierarchy. Training algorithms are derived for both cases and illustrated on real data by clustering news stories and categorising newsgroup messages. Finally, the generated model may be used to derive a Fisher kernel expressing similarity between documents.
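
The idea that words may come from a distribution at any level of the hierarchy can be illustrated with a small sketch. This is only a toy under stated assumptions, not the paper's parameterisation or training algorithm: the hierarchy, the node and class names, all probabilities, the smoothing constant `EPS`, and the uniform class prior are hypothetical, and the parameters are fixed by hand rather than learned. Each leaf class mixes the word distributions of the nodes on its path to the root, and a new document is assigned to the class with the highest log-likelihood.

```python
import math

# Toy hierarchy, child -> parent; the root has no parent. Hypothetical names.
PARENT = {"root": None, "sports": "root", "politics": "root"}

# P(word | node): general words sit at the root, topical words at the leaves.
# Toy numbers chosen by hand, not taken from the paper.
WORD_GIVEN_NODE = {
    "root": {"the": 0.6, "of": 0.4},
    "sports": {"goal": 0.7, "match": 0.3},
    "politics": {"vote": 0.8, "election": 0.2},
}

# P(node | class): for each leaf class, mixing weights over the nodes
# on its path to the root (assumed values).
NODE_GIVEN_CLASS = {
    "sports": {"root": 0.5, "sports": 0.5},
    "politics": {"root": 0.4, "politics": 0.6},
}

EPS = 1e-9  # assumed floor so that unseen words do not give log(0)


def path_to_root(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def word_prob(word, cls):
    """P(word | class) as a mixture over the nodes above the class."""
    return sum(
        NODE_GIVEN_CLASS[cls].get(node, 0.0)
        * WORD_GIVEN_NODE[node].get(word, 0.0)
        for node in path_to_root(cls)
    )


def doc_log_likelihood(words, cls):
    """Log-likelihood of a bag of words under one class."""
    return sum(math.log(word_prob(w, cls) + EPS) for w in words)


def categorise(words, classes):
    """Pick the class with the highest log-likelihood (uniform prior assumed)."""
    return max(classes, key=lambda c: doc_log_likelihood(words, c))
```

For example, `categorise(["the", "goal", "match"], ["sports", "politics"])` selects `"sports"` in this toy setup: the shared word "the" is explained by the root in both classes, while "goal" and "match" have non-negligible probability only under the sports leaf.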
