Seminars

Supervising Models that are Smarter than Us – Shi Feng (George Washington University) @ Hackerman Hall B17
Fri, Sep 19 @ 12:00 pm – 1:15 pm

Abstract:

Advanced AI systems are being deployed for increasingly complex tasks. To ensure reliable human oversight of these systems, we need supervision protocols that remain effective as both task complexity and model capabilities grow. Many approaches to this challenge assist the human supervisor with a second model that can compensate for the human’s weaknesses; however, this assistance can also introduce new vulnerabilities. In this talk, I will discuss new research on both methods and threat models for assisted supervision protocols. I will also share my thoughts on the meta-question of how we can make progress in scalable oversight, and on how it overlaps with other AI safety research agendas.
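(Editorial illustration.) For readers new to the setup, here is a minimal Python sketch of the kind of assisted supervision protocol the abstract describes. All names and the critique/judging interfaces are hypothetical placeholders for illustration, not the speaker’s protocol.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    approve: bool
    rationale: str

def assisted_supervision(
    task: str,
    untrusted_answer: str,
    assistant_critique: Callable[[str, str], str],
    human_judge: Callable[[str, str, str], Verdict],
) -> Verdict:
    # The second model critiques the untrusted answer; the human sees the
    # task, the answer, and the critique, then decides. This is also where
    # new vulnerabilities enter: a deceptive or colluding assistant can
    # steer the human toward approving a bad answer.
    critique = assistant_critique(task, untrusted_answer)
    return human_judge(task, untrusted_answer, critique)

# Toy usage with stub callables:
verdict = assisted_supervision(
    "Summarize the contract's termination clause.",
    "Either party may terminate at any time without notice.",
    assistant_critique=lambda t, a: "Unsupported: clause 7 requires 30 days' notice.",
    human_judge=lambda t, a, c: Verdict(approve=False, rationale=c),
)
print(verdict)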

Bio: 

Shi Feng is an assistant professor of computer science at George Washington University. He received his PhD from the University of Maryland and completed postdocs at the University of Chicago and New York University. He works on AI safety, with a recent focus on mitigating the risks of AI systems sabotaging human oversight and control, exploring concepts such as deception, collusion, and honesty. In the past, he worked on adversarial robustness and interpretability.

Auditing Memorization, Dissecting Mechanisms, and Evaluating Behavior of Large Language Models – Robin Jia (USC) @ Hackerman Hall B17
Fri, Sep 26 @ 12:00 pm – 1:15 pm

Abstract:

The widespread adoption of large language models (LLMs) places a responsibility on the AI research community to rigorously study and understand them. In this talk, I will describe my group’s research on analyzing LLMs’ memorization of pre-training data, their internal mechanisms, and their downstream behavior. First, I will introduce the Hubble project, in which we have pre-trained LLMs (up to 8B parameters) on controlled pre-training corpora to understand when and how they memorize sensitive data related to copyright risks, privacy leakage, and test set contamination; we envision these models as a valuable open-source resource for scientific inquiry into LLM memorization. Next, I will describe my group’s work on understanding how language models work internally, including vignettes about how they perform arithmetic with Fourier features and how they can learn optimization subroutines for in-context learning. Finally, I will highlight a recent collaboration with USC oncologists in which we uncover LLM sycophancy issues that arise when patients ask these models for medical advice.
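(Editorial illustration.) As a rough, self-contained sketch of the “arithmetic with Fourier features” idea mentioned above, the Python toy below shows the general concept, not the group’s analysis code or the model’s actual circuitry: if integers are encoded as sinusoidal features over several periods, addition becomes a composition of phase rotations. The choice of periods is a hypothetical illustration.

import numpy as np

PERIODS = [2, 5, 10, 100]  # hypothetical periods, chosen for illustration

def encode(n: int) -> np.ndarray:
    # Represent an integer as (cos, sin) pairs over several periods.
    feats = []
    for T in PERIODS:
        theta = 2 * np.pi * n / T
        feats.extend([np.cos(theta), np.sin(theta)])
    return np.array(feats)

def add_in_feature_space(a: int, b: int) -> np.ndarray:
    # Compute the features of a + b by composing rotations (the cos/sin
    # angle-addition formulas), without ever forming a + b directly.
    fa, fb = encode(a), encode(b)
    out = []
    for i in range(0, len(fa), 2):
        ca, sa = fa[i], fa[i + 1]
        cb, sb = fb[i], fb[i + 1]
        out.extend([ca * cb - sa * sb, sa * cb + ca * sb])
    return np.array(out)

assert np.allclose(add_in_feature_space(27, 48), encode(27 + 48))
print("Phase rotation reproduces the encoding of the sum.")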

Bio:

Robin Jia is an Assistant Professor of Computer Science at the University of Southern California. He received his Ph.D. in Computer Science from Stanford University, where he was advised by Percy Liang. He has also spent time as a visiting researcher at Facebook AI Research, working with Luke Zettlemoyer and Douwe Kiela. He is interested broadly in natural language processing and machine learning, with a focus on scientifically understanding NLP models. Robin’s work has received best paper awards at ACL and EMNLP.

Center for Language and Speech Processing