Workshop on methods and measures for explainability and trustworthiness of machine learning pipelines
Machine Learning (ML) pipelines are end-to-end systems that assemble various data-centric components to ensure that the ML product works for the end-user. Examples of such systems are KeystoneML and SystemML. Complex ML pipelines need to be transparent and explainable, so that the end-user can understand how the final output is produced by the different functionalities in the pipeline. NAVER LABS Europe (NLE) is organizing a workshop on this topic on 30 March 2020 to create a forum for researchers and data practitioners to discuss important aspects of explainability in ML pipelines. In this post, we elaborate on the topic and the workshop objectives. If you’d like to submit (which we hope you will!), the deadline is 20 December 2019.
To qualify as explainable, an ML pipeline needs to provide guarantees on the system as a whole (horizontal certification) as well as on each single component (vertical certification). Horizontal certification covers the full pipeline, from data acquisition and data preparation to data presentation and visualization, and also spans the user-centered and regulatory aspects of the system. Vertical certification exploits ML theory to provide guarantees on error bounds, sample complexity, energy consumption, execution time, time-to-think, and memory and communication demands.
The “Explainability for Trustworthy ML Pipelines” workshop (ETMLP 2020) will examine the opportunities, difficulties and open challenges associated with vertical and horizontal guarantees. The main objective is to engage researchers and practitioners from machine learning and data management with ideas around explainability and certified trustworthiness of ML pipelines, at the component level as much as at the pipeline level.
Why “explainability” matters in ML pipelines
To date, ML pipelines have often been compiled by assembling separate components (HILDA 2019 and SIGMOD 2019 tutorial). Consider, for instance, a user analytics pipeline that discovers clusters of users, identifies the interesting clusters and then visualizes them, as shown in the architecture below. Because each component only addresses an isolated part of the user data analytics pipeline, the compiled pipeline prevents analysts from obtaining end-to-end, user-centric insights directly from raw user data. Unsupervised methods such as subspace clustering [Agrawal et al., ACM 1998] and community detection [Girvan and Newman, 2002] can generate up to millions of user clusters, and it’s difficult for analysts to pinpoint interesting subsets of results in such a voluminous search space. On the other hand, tools such as Vexus [Amer-Yahia et al., ICDE 2018] and TextTile [Felix et al., TVCG 2017] offer a visual analytics methodology for exploring user clusters as first-class citizens, but it’s not clear how those clusters should first be discovered, and analysts need to connect different analytical tools together (e.g. provide the output of a clustering or exploration method as the input of a visualization method) to build a pipeline manually.
Fragmented pipelines have several drawbacks, among them an inevitable learning cost: because fragmented pipelines lack explainability, analysts have difficulty understanding how the results of downstream components originate from the outputs of upstream ones.
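To make the fragmentation concrete, here is a minimal, hypothetical sketch of such a hand-wired pipeline (our own illustration, assuming scikit-learn and matplotlib; it is not taken from any of the systems or tools mentioned above). A clustering component produces labels that the analyst manually passes to a visualization component, and nothing in between records why the clusters are interesting or how they were obtained.

```python
# Illustrative sketch of a manually assembled, "fragmented" pipeline:
# a clustering component whose output is hand-wired into a visualization
# component. Library and parameter choices are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Component 1: data acquisition (here, just synthetic user feature vectors).
rng = np.random.default_rng(0)
user_features = rng.normal(size=(500, 8))

# Component 2: cluster discovery. The analyst picks k by hand; the output
# carries no explanation of why these clusters are interesting.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(user_features)

# Component 3: visualization. The analyst must manually connect the clustering
# output (labels) to the visualization input, with no shared provenance.
coords = PCA(n_components=2).fit_transform(user_features)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=10)
plt.title("User clusters: clustering output hand-wired into a plot")
plt.show()
```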
How to approach explainability
Explainability in ML pipelines can be looked at from both the user and the system perspective. From the user side, we’re interested in understanding what can be done to help users comprehend learned models and inspect their applications (i.e. explainable AI). From the system perspective, we’re interested in how learned models can be characterized and ultimately certified. The trustworthiness of an ML pipeline is achieved by combining theoretical insights about the estimation errors and success probabilities of the underlying ML methods. Errors propagate through the pipeline and can disturb the final result in unpredictable ways, so we’re interested in the precise quantification of errors and an understanding of how they propagate through the system. Moreover, error bounds and randomized learning techniques come with success probabilities or uncertainties, which may lead to a wrong interpretation of the final outcome, so it’s important to integrate such uncertainties into the explanation process.
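As a rough illustration of how component-level guarantees might be aggregated (a sketch under strong simplifying assumptions, not a method proposed by the workshop), suppose each component guarantees an error of at most eps_i with probability at least 1 - delta_i. A conservative end-to-end statement can then be obtained by adding the errors and combining the failure probabilities with a union bound:

```python
# Hedged, illustrative sketch: combine per-component (error, failure-probability)
# guarantees into one conservative pipeline-level statement via a union bound.
def pipeline_guarantee(component_bounds):
    """component_bounds: list of (eps_i, delta_i) pairs, one per component.

    Assumes errors accumulate additively and failure probabilities combine by
    the union bound. A real analysis must model how each component amplifies
    or dampens the error it receives from upstream.
    """
    total_eps = sum(eps for eps, _ in component_bounds)
    total_delta = min(1.0, sum(delta for _, delta in component_bounds))
    return total_eps, total_delta

# Example: three components (sampling, clustering, summarization), each with
# its own hypothetical error bound and failure probability.
eps, delta = pipeline_guarantee([(0.05, 0.01), (0.10, 0.05), (0.02, 0.01)])
print(f"End-to-end: error <= {eps:.2f} with probability >= {1 - delta:.2f}")
```

Even this toy calculation shows why communicating such bounds to application experts is non-trivial: the resulting guarantee is conditional, conservative and only as meaningful as the assumptions behind it.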
Horizontal and vertical certifications
First we focus on the ML pipeline in its entirety, i.e. a horizontal perspective. The goal here is to verify the explainability quality of the output as a function of the input data, regardless of the internal connections in the pipeline. The focus of horizontal guarantees is not limited to theoretical bounds on the error and success probabilities, but also extends to bounds on memory and energy consumption. While a plethora of proven bounds exists for many ML models, the challenging question is how they can be communicated to application experts in an intuitive and useful way. Even in the successful scenario where the theoretical bounds are implemented in a particular programming paradigm and hardware configuration, the next natural question is what kind of testing procedures should be employed to certify such an implementation. A core concept in a testing procedure is to verify whether the pipeline has successfully captured the relationship between changes in the input data and changes in the learning outcome.
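As one hypothetical example of such a testing procedure (our own sketch under simple assumptions, not a prescribed certification protocol), an implemented pipeline could be probed empirically by perturbing the input data and checking that the output moves no more than a declared tolerance:

```python
# Illustrative perturbation test: check that small changes in the input data
# lead to bounded changes in the pipeline's output. Names, noise model and
# tolerance are assumptions made for the sake of the example.
import numpy as np

def stability_test(pipeline, data, n_trials=20, noise_scale=0.01, tolerance=0.1, seed=0):
    """pipeline: callable mapping a data matrix to a numeric output vector."""
    rng = np.random.default_rng(seed)
    baseline = pipeline(data)
    for _ in range(n_trials):
        perturbed = data + rng.normal(scale=noise_scale, size=data.shape)
        shift = np.linalg.norm(pipeline(perturbed) - baseline)
        if shift > tolerance:
            return False  # the output moved more than the declared tolerance
    return True

# Example with a toy "pipeline": the column means of a user-feature matrix.
data = np.random.default_rng(1).normal(size=(200, 5))
print(stability_test(lambda X: X.mean(axis=0), data))
```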
Second, we focus on each single component of an ML pipeline, i.e. a vertical perspective. ML pipelines start with “data acquisition”, which can operate on snapshots or streams of data. ML theory and applications investigate efficient methods for data description, data compression, feature extraction and selection, and sampling. A challenging question, however, is how these components can be made explainable without hurting their performance; in other words, can we build an explanation-by-design data acquisition approach? Moreover, data impurities may travel through an ML pipeline from data acquisition to subsequent components and degrade the quality of the pipeline downstream. Here, the questions are how data cleaning methods can help remove impurities and how these impurities can be explained using existing outlier detection algorithms (see the sketch after this paragraph). On top of that, there is often a data presentation component such as “data visualization” at the end of an ML pipeline. Visualization has proven to be an effective approach for making ML models interpretable by humans; however, it’s still not clear how visualizations can achieve maximal explainability.
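To illustrate the data-cleaning question above (a minimal sketch of our own, assuming scikit-learn’s IsolationForest; real explanation methods are considerably richer), one could flag impurities with an off-the-shelf outlier detector and attach a crude, feature-level explanation, namely which features of a flagged record deviate most from a robust center of the data:

```python
# Illustrative sketch: flag data impurities with an off-the-shelf outlier
# detector and report, per flagged row, the feature that deviates most from
# the column medians (a very crude form of outlier explanation).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 4))
data[:5] += 6.0  # inject a few synthetic impurities

flags = IsolationForest(random_state=0).fit_predict(data)  # -1 marks outliers
medians = np.median(data, axis=0)
mad = np.median(np.abs(data - medians), axis=0) + 1e-9  # robust spread per feature

for i in np.where(flags == -1)[0][:3]:
    deviation = np.abs(data[i] - medians) / mad
    worst = int(np.argmax(deviation))
    print(f"row {i}: most anomalous feature = {worst} "
          f"(deviates by {deviation[worst]:.1f} robust spreads)")
```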
Societal impacts of explainable ML pipelines
The issue of responsibility for data, and for the services built upon data, covers the whole spectrum of an ML pipeline, and companies need a clear policy that governs this spectrum. The policy should include how to measure quality with testing routines, and it should also govern data rights. Best-practice procedures for companies should be recognized and made easy to use. As a consequence of digital privacy legislation (e.g. the GDPR in Europe), the donors and owners of data should be informed about the storage and use of their data. System users should be sufficiently informed to be able to assess the outputs of ML pipelines, such as recommendations, rankings and statistics.
User-centered impact of explainable ML pipelines
Human analysts are part of the ML analysis loop and may interact with the pipeline at all stages. Their interactions, intermediate choices, insights and decisions need to be made transparent and documented together with their potential impact. Most ML pipelines are fragmented, with the analyst manually migrating from one component to the next, and explanations in such pipelines are crucial for guiding analysts through this iterative process. Explanations can be provided in a myriad of ways and can be data-driven, interaction-driven or algorithm-driven. At the same time, explanation overload should not hurt ML performance and robustness.
At ETMLP 2020 we hope to achieve a more comprehensive view of the methods and measures for explainability and trustworthiness of machine learning pipelines.
Topics of interest for the workshop include, but are not limited to: challenges of explainable AI (XAI), explaining black-box ML models, interpretable machine learning, error and uncertainty bounds, explanation interfaces, algorithmic experience, models for explainable recommendations, traceability and provenance of ML pipelines, visual analytics for enabling explanations, explainability for debugging ML pipelines, outlier explanation, variability explanation, explainable human-computer interaction, interpretability and explainability by design, explanation learning, and explainability evaluation.
The ETMLP workshop is organized in collaboration with Professor Katharina Morik from the Technical University of Dortmund, who is co-chair with us from NAVER LABS Europe. It will take place in Copenhagen, Denmark, on 30 March 2020, in conjunction with the 23rd International Conference on Extending Database Technology (EDBT). The deadline for submitting papers is 20 December 2019, 11:59PM CET.
The program committee comprises world-class experts in the domain: Sihem Amer-Yahia (CNRS, France), Francesco Bonchi (ISI Foundation, Italy, and Eurecat Barcelona, Spain), Vassilis Christophides (University of Crete, Greece, and Institute for Advanced Studies, Cergy, France), Joao Comba (UFRGS, Brazil), Yanlei Diao (Ecole Polytechnique, France), Fosca Giannotti (Information Science and Technology Institute A. Faedo, Italy), Dimitrios Gunopulos (National and Kapodistrian University of Athens, Greece), Thomas Liebig (Materna Information and Communications SE, Germany), Adrian Mos (NAVER LABS Europe), Dino Pedreschi (University of Pisa, Italy), Nico Piatkowski (TU Dortmund, Germany), Stefan Rüping (Fraunhofer Institute for Intelligent Analysis and Information Systems, Germany), Eric Simon (SAP, France), Divesh Srivastava (AT&T Labs, USA), Thibaut Thonet (NAVER LABS Europe) and Ioannis Tsamardinos (University of Crete, Greece).