Been Kim (Google), “Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)”
3400 N Charles St
Baltimore, MD 21218
The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net’s internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result; for example, how sensitive a prediction of “zebra” is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
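The abstract's core recipe, learn a direction (the CAV) separating a concept's activations from random activations, then take directional derivatives of the class logit along that direction, can be sketched as follows. This is a minimal illustration with synthetic NumPy data, not the paper's implementation: the activations and logit gradients here are randomly generated placeholders standing in for a real network's layer outputs, and the linear separator is a hand-rolled logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder layer activations: examples of a concept (e.g. "striped"
# images) vs. random counterexamples. In real TCAV these come from a
# chosen hidden layer of the trained network.
concept_acts = rng.normal(1.0, 1.0, size=(50, 8))
random_acts = rng.normal(0.0, 1.0, size=(50, 8))

X = np.vstack([concept_acts, random_acts])
y = np.array([1] * 50 + [0] * 50)

# Train a linear classifier separating concept from random activations;
# the CAV is the (unit-normalized) normal vector of its decision boundary.
w, b = np.zeros(8), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)
cav = w / np.linalg.norm(w)

# Directional derivative of the class logit (e.g. "zebra") along the CAV.
# Placeholder gradients stand in for d(logit)/d(activations) per input.
logit_grads = rng.normal(0.5, 1.0, size=(100, 8))
directional_derivs = logit_grads @ cav

# TCAV score: fraction of class inputs whose prediction moves positively
# with the concept direction.
tcav_score = float(np.mean(directional_derivs > 0))
print(f"TCAV score: {tcav_score:.2f}")
```

With a real model, one would substitute actual layer activations for the two synthetic sets and per-input gradients of the target logit for `logit_grads`; the score near 1.0 would then indicate the concept consistently increases the class prediction.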
Been Kim is a research scientist at Google Brain. Her research focuses on building interpretable machine learning. The vision of her research is to make humans empowered by machine learning, not overwhelmed by it. Before joining Brain, she was a research scientist at the Allen Institute for Artificial Intelligence (AI2) and an affiliate faculty member in the Department of Computer Science & Engineering at the University of Washington. She received her PhD from MIT.