Karen Livescu (Toyota Technological Institute at Chicago) “What Do Pre-Trained Speech Representation Models Know? Layer-Wise Analysis and Benchmarking”

When:
December 1, 2023 @ 12:00 pm – 1:15 pm
Where:
Hackerman Hall B17
3400 N. Charles Street
Baltimore
MD 21218
Cost:
Free

Abstract

Pre-trained speech representation models have become ubiquitous in speech processing over the past few years.  They have both improved the state of the art and made it feasible to learn task-specific models with very little labeled data.  However, it is not well understood what linguistic information is encoded in pre-trained models and how best to apply them to downstream tasks. In this talk I will describe recent work that begins to build an understanding of the layer-wise information learned by pre-trained speech models.  We consider a number of popular pre-trained models and investigate the extent to which their layers encode spectral, phonetic, and word-level information.  The results of these analyses also suggest some ways to improve or simplify the application of pre-trained models for downstream tasks.  Finally, I will describe our efforts to benchmark model performance on a variety of spoken language understanding tasks, in order to broaden our understanding of the capabilities of state-of-the-art models.

This talk is based in part on work presented in:

A. Pasad et al., “Comparative layer-wise analysis of self-supervised speech models,” ICASSP 2023.

A. Pasad et al., “What do self-supervised speech models know about words?,” arXiv:2307.00162, 2023.

S. Shon et al., “SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks,” ACL 2023.

Bio

Karen Livescu is a Professor at TTI-Chicago. She completed her PhD at MIT in 2005. She is an ISCA Fellow and a recent IEEE Distinguished Lecturer.  She has served as a program chair/co-chair for ICLR, Interspeech, and ASRU, and is an Associate Editor for TACL and IEEE T-PAMI.  Her group’s work spans a variety of topics in spoken, written, and signed language processing.

Center for Language and Speech Processing