While large language models have advanced the state of the art in natural language processing, these models are trained on large-scale datasets, which may include harmful information. Studies have shown that as a result, the models exhibit social biases and generate misinformation after training. In this talk, I will discuss my work on analyzing and interpreting the risks of large language models across the areas of fairness, trustworthiness, and safety. I will first describe my research on detecting dialect bias between African American English (AAE) and Standard American English (SAE). The second part investigates the trustworthiness of models through the memorization and subsequent generation of conspiracy theories. I will end my talk with recent work in AI safety regarding text that may lead to physical harm.
Sharon is a 5th-year Ph.D. candidate at the University of California, Santa Barbara, where she is advised by Professor William Wang. Her research interests lie in natural language processing, with a focus on Responsible AI. Sharon’s research spans the subareas of fairness, trustworthiness, and safety, with publications in ACL, EMNLP, WWW, and LREC. She has spent summers interning at AWS, Meta, and Pinterest. Sharon is a 2022 EECS Rising Star and a current recipient of the Amazon Alexa AI Fellowship for Responsible AI.
The arms race to build increasingly large and powerful language models (LMs) in the past year has been remarkable. Yet incorporating LMs effectively into practical applications that facilitate manual workflows remains challenging. I will discuss LMs’ limiting factors and our efforts to overcome them. I will start with challenges surrounding efficient and robust LM alignment. I will share insights from our recent paper “Self-Instruct” (ACL 2023), where we used a vanilla (unaligned) LM to align itself, an approach that has yielded some success. Then, I will move on to the challenge of tracing the output of LMs to reliable sources, a weakness that makes them prone to hallucinations. I will discuss our recent approach of ‘according-to’ prompting, which steers LMs to quote directly from sources observed in their pre-training. If time permits, I will discuss our ongoing project to adapt LMs to interact with web pages. Throughout the presentation, I will highlight our progress, and I will end with questions about our future progress.
Daniel Khashabi is an assistant professor in computer science at Johns Hopkins University and a member of the Center for Language and Speech Processing (CLSP). He is interested in building reasoning-driven modular NLP systems that are robust, transparent, and communicative, particularly those that use natural language as the communication medium. Khashabi has published over 40 papers on natural language processing and AI in top-tier venues. His work touches upon developing. His research has won the ACL 2023 Outstanding Paper Award, the NAACL 2022 Best Paper Award, research gifts from the Allen Institute for AI, and an Amazon Research Award 2023. Before joining Hopkins, he was a postdoctoral fellow at the Allen Institute for AI (2019-2022) and obtained a Ph.D. from the University of Pennsylvania in 2019.