Seminars

Sharon Levy (University of California, Santa Barbara) “Responsible AI via Responsible Large Language Models” @ Hackerman Hall B17
Feb 6 @ 12:00 pm – 1:15 pm

Abstract

While large language models have advanced the state-of-the-art in natural language processing, these models are trained on large-scale datasets, which may include harmful information. Studies have shown that, as a result, the models exhibit social biases and generate misinformation after training. In this talk, I will discuss my work on analyzing and interpreting the risks of large language models across the areas of fairness, trustworthiness, and safety. I will first describe my research on the detection of dialect bias between African American English (AAE) and Standard American English (SAE). The second part investigates the trustworthiness of models through the memorization and subsequent generation of conspiracy theories. I will end my talk with recent work in AI safety regarding text that may lead to physical harm.

Biography

Sharon is a 5th-year Ph.D. candidate at the University of California, Santa Barbara, where she is advised by Professor William Wang. Her research interests lie in natural language processing, with a focus on Responsible AI. Sharon’s research spans the subareas of fairness, trustworthiness, and safety, with publications in ACL, EMNLP, WWW, and LREC. She has spent summers interning at AWS, Meta, and Pinterest. Sharon is a 2022 EECS Rising Star and a current recipient of the Amazon Alexa AI Fellowship for Responsible AI.

Emily Prud’hommeaux (Boston College) “Endangered or Just Under-Resourced? Evaluating ASR Quality and Utility When Data is Scarce” @ Hackerman Hall B17
Mar 31 @ 12:00 pm – 1:15 pm

Abstract

Despite many recent advances in automatic speech recognition (ASR), linguists and language communities engaged in language documentation projects continue to face the obstacle of the “transcription bottleneck”. Researchers in NLP typically do not distinguish between widely spoken languages that currently happen to have few training resources and endangered languages that will never have abundant data. As a result, we often fail to thoroughly explore when ASR is helpful for language documentation, which architectures work best for the sorts of languages that are in need of documentation, and how data can be collected and organized to produce optimal results. In this talk, I will describe several projects that attempt to bridge the gap between the promise of ASR for language documentation and the reality of using this technology in real-world settings.

Biography

Emily Prud’hommeaux is the Gianinno Family Sesquicentennial Assistant Professor in the Department of Computer Science at Boston College. She received her BA (Harvard) and MA (University of California, Los Angeles) in Linguistics, and her PhD in Computer Science and Engineering (OHSU/OGI). Her research area is natural language processing in low-resource settings, with a particular focus on endangered languages and the language of individuals with conditions impacting communication and cognition.

Center for Language and Speech Processing