In the context of human-robot dialogue, we consider the problem of annotating utterances with non-verbal behaviour. We argue for the need for a data-driven approach that could learn those behaviours. As a first step, we propose to replicate the results obtained by the current state of the art, the rule-based system BEAT. Our proposed baseline, based on multi-label logistic regression with some simple features, shows that more work is needed for such an approach to produce satisfactory results. However, analyzing the results in more detail reveals that for some behaviours such a simple model might already be good enough.