Machine learning (ML) is a driving force behind many successful applications in Artificial Intelligence. Trustworthy ML pipelines require guarantees on the system in its entirety (i.e., horizontal certification) as well as on each individual component (i.e., vertical certification). Horizontal certification covers the full pipeline, from data acquisition to data visualization, and spans the user-centered, technical, financial, and regulatory aspects of the system. Vertical certification exploits the theory of ML to guarantee error bounds, sampling complexity, energy consumption, execution time, time-to-think, and memory and communication demands. Understanding an ML pipeline in its entirety requires collaboration between researchers from the database and ML communities.
The ETMLP workshop will examine the aforementioned opportunities and their associated challenges. Its main objective is to create a forum where researchers from machine learning and data management, together with practitioners, engage with ideas around explainability and certified trustworthiness of ML pipelines, at both the pipeline level and the component level.
The ultimate goal of the workshop is to discuss recommendations for further work in science, industry, and society regarding explainable ML pipelines.
Workshop date: 30 March 2020
Submission deadline: 12 January 2020, 11:59PM CET (extended from 20 December 2019)
Notification of outcome: 31 January 2020, 11:59PM CET (before EDBT 2020 early registration deadline)
Camera ready due: 4 February 2020, 11:59PM CET
We invite authors to submit either of the following:
Regular papers may be at most 6 pages long; abstracts at most one page. Both page limits exclude references. Abstracts will be considered only for oral presentation at the workshop and will not be included in the workshop proceedings. Papers must follow the latest ACM Proceedings format 2020 (sigconf, double-column). Reviewing for ETMLP 2020 is single-blind.
Submissions will be handled through EasyChair: https://easychair.org/conferences/?conf=etmlp2020
Monday, 30 March 2020, at the Scandic Falkoner Congress Center, room 205
13:00-13:45 Keynote by Yanlei Diao
13:45-14:45 Session 1
15:00-16:00 Panel on Explainability for Trustworthy ML Pipelines
16:00-16:40 Session 2
16:40-17:00 Closing remarks
Sihem Amer-Yahia (CNRS, France)
Vassilis Christophides (University of Crete, Greece and Institute for Advanced Studies, Cergy France)
Joao Comba (UFRGS, Brazil)
Yanlei Diao (Ecole Polytechnique, France)
Fosca Giannotti (Information Science and Technology Institute A. Faedo, Italy)
Dimitrios Gunopulos (National and Kapodistrian University of Athens, Greece)
Thomas Liebig (Materna Information and Communications SE, Germany)
Adrian Mos (NAVER LABS Europe, France)
Dino Pedreschi (University of Pisa, Italy)
Nico Piatkowski (TU Dortmund, Germany)
Stefan Rüping (Fraunhofer Institute for Intelligent Analysis and Information Systems, Germany)
Eric Simon (SAP, France)
Divesh Srivastava (AT&T Labs, USA)
Thibaut Thonet (NAVER LABS Europe, France)
Ioannis Tsamardinos (University of Crete, Greece)
Title: Model Learning and Explanation Discovery for Exploring Large Datasets
Abstract: There is a growing gap between the rapid growth of data and the limited human ability to comprehend it. Consequently, there has been increasing demand for analytics tools that can bridge this gap and help users retrieve high-value content from data. In this talk, I present two analytics tools that our team has developed to address this challenge. First, I will introduce AIDEme, a scalable interactive data exploration system for efficiently learning a user interest pattern over a large dataset. The system is cast in a principled active learning (AL) framework, which iteratively presents strategically selected records for user labeling, thereby building an increasingly accurate model of the user's interest. However, existing AL techniques converge slowly when learning the user interest on large datasets. To overcome this problem, AIDEme exploits properties of the user labeling process and the class distribution of the observed data to design new AL algorithms, which come with provable results on model accuracy and approximation; evaluation results show much faster convergence than existing AL methods while maintaining interactive speed. Second, I will introduce EXAD, a new system designed not only to identify interesting patterns in large amounts of data but also to produce a concrete explanation of the learned model. Such explanations provide more insightful information to the user and enable timely action or better strategies in the future.
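The active-learning loop described in the abstract (fit a model, pick the record the model is least certain about, ask the user to label it, repeat) can be illustrated with a generic uncertainty-sampling sketch. This is not AIDEme's actual algorithm; the dataset, the logistic-regression model, and the selection rule below are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic 2-D dataset: the simulated "user interest" is the region x0 + x1 > 1.
X = rng.uniform(0, 1, size=(500, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# Seed the loop with a few labeled records from each class.
pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
labeled = list(rng.choice(pos, 5, replace=False)) + list(rng.choice(neg, 5, replace=False))
unlabeled = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression()
for _ in range(20):
    # Fit on the current labels, then pick the record closest to the
    # decision boundary (probability nearest 0.5) and "ask the user" for
    # its label (simulated here by looking up y).
    model.fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[unlabeled])[:, 1]
    idx = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]
    labeled.append(idx)
    unlabeled.remove(idx)

accuracy = model.score(X, y)
```

With only 30 labels, the model approximates the interest region over all 500 records; the point of AL is that boundary-adjacent labels are far more informative than randomly chosen ones.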
Bio: Yanlei Diao is Professor of Computer Science at Ecole Polytechnique, France and the University of Massachusetts Amherst, USA. Her research interests lie in big data analytics and scalable intelligent information systems, with a focus on interactive data exploration, explainable anomaly detection, optimization in cloud analytics, data streams, and uncertain data management. She received her PhD in Computer Science from the University of California, Berkeley in 2005. Prof. Diao was a recipient of the 2016 ERC Consolidator Award, the 2013 CRA-W Borg Early Career Award (awarded each year to one female computer scientist for outstanding contributions), the IBM Scalable Innovation Faculty Award, and an NSF CAREER Award. She spoke at the Distinguished Faculty Lecture Series at the University of Texas at Austin and Technische Universitaet Darmstadt. She has served as Editor-in-Chief of the ACM SIGMOD Record, Associate Editor of ACM TODS, Chair of the ACM SIGMOD Research Highlight Award Committee, and member of the SIGMOD and PVLDB Executive Committees. She was PC Co-Chair of IEEE ICDE 2017 and ACM SoCC 2016, and served on the organizing committees of SIGMOD, PVLDB, and CIDR, as well as on the program committees of many international conferences and workshops.