Seminars

Oct
14
Fri
He He (New York University) “What We Talk about When We Talk about Spurious Correlations in NLP” @ Hackerman Hall B17
Oct 14 @ 12:00 pm – 1:15 pm

Abstract

Model robustness and spurious correlations have received increasing attention in the NLP community, both in methods and in evaluation. The term “spurious correlation” is overloaded, though: it can refer to any undesirable shortcut learned by the model, as judged by domain experts.

When designing mitigation algorithms, we often (implicitly) assume that a spurious feature is irrelevant for prediction. However, many features in NLP (e.g., word overlap and negation) are not spurious in the sense that the background is spurious for classifying objects in an image; rather, they carry important information that humans need to make predictions. In this talk, we argue that it is more productive to characterize features in terms of their necessity and sufficiency for prediction. We then discuss the implications of this categorization for representation, learning, and evaluation.

Biography

He He is an Assistant Professor in the Department of Computer Science and the Center for Data Science at New York University. She obtained her PhD in Computer Science from the University of Maryland, College Park. Before joining NYU, she spent a year at AWS AI and, before that, was a postdoc at Stanford University. She is interested in building robust and trustworthy NLP systems in human-centered settings. Her recent research focuses on robust language understanding, collaborative text generation, and understanding the capabilities and issues of large language models.

Feb
3
Fri
Sasha Rush (Cornell University) “Pretraining Without Attention” @ Hackerman Hall B17
Feb 3 @ 12:00 pm – 1:15 pm

Abstract

Transformers are essential to pretraining. As we approach five years of BERT, the connection between attention as an architecture and transfer learning remains key to this central thread in NLP. Other architectures, such as CNNs and RNNs, have been used to replicate pretraining results, but they either fail to reach the same accuracy or require supplemental attention layers. This work revisits the seminal BERT result and considers pretraining without attention. We consider replacing self-attention layers with recently developed approaches to long-range sequence modeling and with transformer architecture variants. Specifically, inspired by recent work such as the structured state space sequence model (S4), we use simple routing layers based on state-space models (SSMs) and a bidirectional model architecture based on multiplicative gating. We discuss the results of the proposed Bidirectional Gated SSM (BiGS) and present a range of analyses of its properties. The results show that architecture does seem to have a notable impact on downstream performance and yields a different inductive bias that is worth exploring further.
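
For readers who want a concrete picture of the gating idea mentioned in the abstract, the following is a minimal, hypothetical PyTorch sketch. It is not the authors’ BiGS implementation: the directional routing layers are passed in as placeholders (any module mapping a (batch, length, d_model) tensor to the same shape, e.g. an SSM layer), and the exact gating arrangement shown here is an assumption for illustration only.

```python
import torch
import torch.nn as nn

class BidirectionalGatedBlock(nn.Module):
    """Illustrative sketch of an attention-free block: two directional
    routing layers (stand-ins for SSM layers) combined via multiplicative
    gating, with a residual connection. Not the official BiGS code."""

    def __init__(self, d_model: int, route_fwd: nn.Module, route_bwd: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.route_fwd = route_fwd               # left-to-right sequence routing (e.g., an SSM layer)
        self.route_bwd = route_bwd               # right-to-left sequence routing
        self.gate = nn.Linear(d_model, d_model)  # produces the multiplicative gate
        self.val_fwd = nn.Linear(d_model, d_model)
        self.val_bwd = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        h = self.norm(x)
        u = torch.sigmoid(self.gate(h))                        # gate values in (0, 1)
        fwd = self.route_fwd(self.val_fwd(h))                  # forward-direction pass
        bwd = self.route_bwd(self.val_bwd(h).flip(1)).flip(1)  # reverse, route, restore order
        return x + self.out(u * (fwd + bwd))                   # gated mix plus residual

# Smoke test with identity routing layers standing in for SSMs.
block = BidirectionalGatedBlock(64, nn.Identity(), nn.Identity())
print(block(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```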

Biography

Alexander “Sasha” Rush is an Associate Professor at Cornell Tech. His work is at the intersection of natural language processing and generative modeling, with applications in text generation, efficient inference, and controllability. He has written several popular open-source software projects supporting NLP research and data science, and works part-time as a researcher at Hugging Face. He is the secretary of ICLR and developed software used to run virtual conferences during COVID. His work has received paper and demo awards at major NLP, visualization, and hardware conferences, an NSF CAREER Award, and a Sloan Fellowship. He tweets and blogs, mostly about coding and ML, at @srush_nlp.

Oct
27
Fri
Sharon Goldwater (University of Edinburgh) “Analyzing Representations of Self-Supervised Speech Models” @ Hackerman Hall B17
Oct 27 @ 12:00 pm – 1:15 pm

Abstract

Recent advances in speech technology make heavy use of pre-trained models that learn from large quantities of raw (untranscribed) speech using “self-supervised” (i.e., unsupervised) learning. These models learn to transform the acoustic input into a different representational format that makes supervised learning (for tasks such as transcription or even translation) much easier. However, *what* speech-relevant information is encoded in these representations, and *how*, is not well understood. I will talk about some work at various stages of completion in which my group is analyzing the structure of these representations, to gain a more systematic understanding of how word-level, phonetic, and speaker information is encoded.

Biography

Sharon Goldwater is a Professor in the Institute for Language, Cognition and Computation at the University of Edinburgh’s School of Informatics. She received her PhD in 2007 from Brown University and spent two years as a postdoctoral researcher at Stanford University before moving to Edinburgh. Her research interests include unsupervised and minimally supervised learning for speech and language processing, computer modelling of language acquisition in children, and computational studies of language use. Her main focus within linguistics has been on the lower levels of structure, including phonetics, phonology, and morphology. Prof. Goldwater has received awards including the 2016 Roger Needham Award from the British Computer Society for “distinguished research contribution in computer science by a UK-based researcher who has completed up to 10 years of post-doctoral research.” She has served on the editorial boards of several journals, including Computational Linguistics, Transactions of the Association for Computational Linguistics, and the inaugural board of OPEN MIND: Advances in Cognitive Science. She was a program chair for the EACL 2014 Conference and chaired the EACL governing board from 2019 to 2020.

Center for Language and Speech Processing