We address the problem of generating human motion sequences that can be conditioned on three categories of context: semantic motion information (e.g., target action or object labels), geometric scene information, and past or future motion observations. State-of-the-art methods are either models specialized to a single setting or unconditional models, which by nature have limited applicability. In contrast, we introduce \ours, a generalized approach that can generate human motion sequences of variable length conditioned on various combinations of the three context categories. It can leverage large amounts of unconditional data and be used in various settings, depending on the desired control and the available information. \ours addresses the problem in a two-stage fashion, inspired by neural discrete representation learning. First, it encodes unconditional human motion into a discrete latent space. Second, a generative model trained for next-step prediction in this space synthesizes sequences of latent indices. Our model can be conditioned on any combination of available context. In contrast to most existing methods, ours is learning-based and does not rely on a test-time optimization loop. Our experimental results show that, besides offering more controllability than existing methods, our model generates motion sequences that are compelling, diverse, and coherent with the given contextual information.
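To make the two-stage design concrete, the sketch below is a minimal PyTorch-style illustration, not the authors' implementation: the VQ-VAE-style straight-through quantizer, the Transformer prior, the additive context injection, and all module names and sizes are our own illustrative assumptions. The prior models $p(s_t \mid s_{<t}, c)$ over codebook indices $s_t$ given a context embedding $c$ (e.g., an action label, scene features, or past motion).
\begin{verbatim}
import torch
import torch.nn as nn

class MotionQuantizer(nn.Module):
    """Stage 1 (sketch): continuous motion features -> discrete code indices."""
    def __init__(self, num_codes=512, code_dim=128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, z):                        # z: (batch, time, code_dim)
        flat = z.reshape(-1, z.size(-1))
        d = torch.cdist(flat, self.codebook.weight)  # distance to every code
        idx = d.argmin(-1).view(z.shape[:-1])    # (batch, time) indices
        z_q = self.codebook(idx)                 # quantized latents
        z_q = z + (z_q - z).detach()             # straight-through gradients
        return z_q, idx

class LatentPrior(nn.Module):
    """Stage 2 (sketch): next-step prediction over latent indices,
    optionally conditioned on a context embedding."""
    def __init__(self, num_codes=512, dim=128, ctx_dim=128):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim)
        self.ctx_proj = nn.Linear(ctx_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, idx, ctx=None):            # idx: (batch, time)
        h = self.embed(idx)
        if ctx is not None:                      # inject context additively
            h = h + self.ctx_proj(ctx).unsqueeze(1)
        t = h.size(1)                            # causal mask for next-step
        mask = torch.triu(torch.full((t, t), float('-inf'),
                                     device=h.device), diagonal=1)
        return self.head(self.backbone(h, mask=mask))

if __name__ == "__main__":
    prior = LatentPrior()
    ctx = torch.randn(1, 128)                    # stand-in context embedding
    idx = torch.zeros(1, 1, dtype=torch.long)    # seed index
    for _ in range(16):                          # autoregressive sampling
        logits = prior(idx, ctx)[:, -1]
        nxt = torch.distributions.Categorical(logits=logits).sample()
        idx = torch.cat([idx, nxt.unsqueeze(1)], dim=1)
    print(idx.shape)                             # torch.Size([1, 17])
\end{verbatim}
Because generation happens index by index in the discrete latent space, variable-length sequences fall out naturally, and any subset of context signals can be dropped or supplied at sampling time.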