Advanced neural language models have grown ever larger and more complex, pushing forward the limits of language understanding and generation while diminishing interpretability. The black-box nature of deep neural networks prevents humans from understanding models, as well as trusting and applying them in real-world applications. This talk will introduce interpretation techniques that bridge the gap between humans and models for developing trustworthy natural language processing (NLP). I will first show how to explain black-box models and evaluate their explanations in order to understand their prediction behavior. Then I will introduce how to improve the interpretability of neural language models by making their decision-making transparent and rationalized. Finally, I will discuss how to diagnose and improve models (e.g., their robustness) through the lens of explanations. I will conclude with future research directions that are centered around model interpretability and committed to facilitating communication and interaction between intelligent machines, system developers, and end users for long-term trustworthy AI.
Hanjie Chen is a Ph.D. candidate in Computer Science at the University of Virginia, advised by Prof. Yangfeng Ji. Her research interests lie in Trustworthy AI, Natural Language Processing (NLP), and Interpretable Machine Learning. She develops interpretation techniques to explain neural language models and make their prediction behavior transparent and reliable. She is a recipient of the Carlos and Esther Farrar Fellowship and the Best Poster Award at ACM CAPWIC 2021. Her work has been published at top-tier NLP/AI conferences (e.g., ACL, AAAI, EMNLP, NAACL) and was selected as a finalist for the National Center for Women & Information Technology (NCWIT) Collegiate Award 2021. As the primary instructor, she co-designed and taught the course Interpretable Machine Learning, and was awarded the UVA CS Outstanding Graduate Teaching Award and nominated for the University-wide Graduate Teaching Awards (top 5% of graduate instructors). More details can be found at https://www.cs.virginia.edu/~hc9mx
The arms race to build increasingly large and powerful language models (LMs) in the past year has been remarkable. Yet incorporating LMs effectively into practical applications that facilitate manual workflows remains challenging. I will discuss LMs’ limiting factors and our efforts to overcome them. I will start with challenges surrounding efficient and robust LM alignment. I will share insights from our recent paper “Self-Instruct” (ACL 2023), where we used vanilla (unaligned) LMs to align themselves, an approach that has yielded some success. Then, I will move on to the challenge of tracing the output of LMs to reliable sources, a weakness that makes them prone to hallucination. I will discuss our recent approach of ‘according-to’ prompting, which steers LMs to quote directly from sources observed in their pre-training data. If time permits, I will discuss our ongoing project to adapt LMs to interact with web pages. Throughout the presentation, I will highlight our progress and end with open questions about future directions.
Daniel Khashabi is an assistant professor in computer science at Johns Hopkins University and a member of the Center for Language and Speech Processing (CLSP). He is interested in building reasoning-driven modular NLP systems that are robust, transparent, and communicative, particularly those that use natural language as the communication medium. Khashabi has published over 40 papers on natural language processing and AI in top-tier venues. His research has won the ACL 2023 Outstanding Paper Award, the NAACL 2022 Best Paper Award, research gifts from the Allen Institute for AI, and an Amazon Research Award 2023. Before joining Hopkins, he was a postdoctoral fellow at the Allen Institute for AI (2019-2022) and obtained his Ph.D. from the University of Pennsylvania in 2019.