12th February 2020, South England Natural Language Processing Meetup, London, UK
Speaker: Hady Elsahar
Abstract: In real-world applications where errors cannot be tolerated, ML models deployed in production require close monitoring of performance. This is usually done manually, by continuously annotating evaluation examples to measure model performance; such a process is prohibitively expensive and slow, making it unsuitable as an alerting mechanism at run time.
In my talk, I’ll present a method for predicting the performance drop of ML models on new examples seen at test time. In our experiments, this method was able to predict performance drops of a sentiment classifier with an error rate as low as 2.15%. At the end of the talk, I’ll leave you with a practical recipe for implementing an inexpensive runtime methodology for monitoring your ML model in production.
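The abstract does not spell out the method itself. As a purely illustrative sketch of the kind of inexpensive, annotation-free runtime monitor the talk alludes to, one common proxy signal is the model's own confidence: a sustained drop in the mean top-class probability on incoming batches, relative to a reference measured on in-domain data, can serve as a cheap drift alert. All names and thresholds below are hypothetical, not the speaker's actual recipe:

```python
import numpy as np

def average_confidence(probs: np.ndarray) -> float:
    """Mean top-class probability over a batch of softmax outputs
    (shape: [batch_size, num_classes])."""
    return float(np.max(probs, axis=1).mean())

def confidence_drop_alert(reference_conf: float,
                          live_probs: np.ndarray,
                          tolerance: float = 0.05):
    """Flag a live batch whose mean confidence falls more than
    `tolerance` below the reference confidence measured on
    annotated in-domain data. Returns (alert, live_conf)."""
    live_conf = average_confidence(live_probs)
    return (reference_conf - live_conf) > tolerance, live_conf

# Reference confidence from in-domain validation predictions.
in_domain = np.array([[0.90, 0.10], [0.85, 0.15]])
ref_conf = average_confidence(in_domain)

# A drifted live batch: the model is noticeably less sure.
live_batch = np.array([[0.55, 0.45], [0.60, 0.40]])
alert, live_conf = confidence_drop_alert(ref_conf, live_batch)
```

This requires no new labels at run time, which is what makes it viable as an alerting mechanism; the trade-off is that raw confidence can be miscalibrated, so in practice the threshold would be tuned against held-out shifted data.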