The abundant availability of health-care data calls for effective analysis methods that help medical experts gain a better understanding of their data. While the focus has been largely on prediction, the “representation” and “exploration” of health-care data have received little attention. In this paper, we introduce CORE, a framework for representing and exploring patient cohorts. Obtaining a readable and succinct representation of a cohort's health data is challenging because cohorts often consist of hundreds of patients whose medical actions are of various types and occur at different points in time. We extend the Needleman-Wunsch algorithm for sequence matching to handle temporal sequences, and propose “trajectory families”, a customized index for efficiently comparing and aggregating patient trajectories into a cohort representation. We define cohort exploration as finding cohorts similar to a given cohort. This problem is challenging because the number of potentially similar cohorts is huge. We propose a two-stage approach that first limits the search space to “contrast cohorts” and then computes their similarity to the given cohort. To speed up cohort similarity computation, we use “event sets” in the same spirit as the double dictionary encoding proposed for keyword search. We run qualitative and quantitative experiments on real data to evaluate the efficiency and usefulness of CORE. We show that CORE representations reduce time-to-insight from hours to seconds and help medical experts find insights better than state-of-the-art Visual Analytics tools.
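To make the idea of matching temporal sequences concrete, the following is a minimal sketch of a Needleman-Wunsch global alignment score in which matching actions are penalized by how far apart they occur in time. It is an illustration under stated assumptions, not the extension used in CORE: the event encoding, the function name `temporal_nw_score`, and the scoring parameters (`match`, `mismatch`, `gap`, `time_weight`) are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical encoding: a trajectory is a list of (action label, time in days).
Event = Tuple[str, float]


def temporal_nw_score(
    a: List[Event],
    b: List[Event],
    match: float = 2.0,        # assumed reward for aligning identical actions
    mismatch: float = -1.0,    # assumed penalty for aligning different actions
    gap: float = -1.0,         # assumed penalty for leaving an action unaligned
    time_weight: float = 0.1,  # assumed penalty per day of time difference
) -> float:
    """Global alignment score of two trajectories (classic dynamic program)."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]

    # Aligning a prefix against the empty sequence costs one gap per event.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (label_a, time_a), (label_b, time_b) = a[i - 1], b[j - 1]
            if label_a == label_b:
                # Same action: the reward shrinks as the time difference grows.
                pair = match - time_weight * abs(time_a - time_b)
            else:
                pair = mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two events
                score[i - 1][j] + gap,       # skip an event of a
                score[i][j - 1] + gap,       # skip an event of b
            )
    return score[n][m]


patient_1 = [("admit", 0.0), ("scan", 2.0), ("surgery", 5.0)]
patient_2 = [("admit", 0.0), ("scan", 4.0), ("surgery", 5.0)]
print(temporal_nw_score(patient_1, patient_2))
```

With these assumed parameters, two trajectories that share the same actions score lower the further apart their timings are, which is the intuition behind comparing patient trajectories as temporal rather than purely ordinal sequences.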