Recent advances in speech technology make heavy use of pre-trained models that learn from large quantities of raw (untranscribed) speech using “self-supervised” (i.e., unsupervised) learning. These models learn to transform the acoustic input into a different representational format that makes supervised learning (for tasks such as transcription or even translation) much easier. However, *what* speech-relevant information is encoded in these representations, and *how* it is encoded, is not well understood. I will talk about some work at various stages of completion in which my group is analyzing the structure of these representations to gain a more systematic understanding of how word-level, phonetic, and speaker information is encoded.
Sharon Goldwater is a Professor in the Institute for Language, Cognition and Computation at the University of Edinburgh’s School of Informatics. She received her PhD in 2007 from Brown University and spent two years as a postdoctoral researcher at Stanford University before moving to Edinburgh. Her research interests include unsupervised and minimally supervised learning for speech and language processing, computer modelling of language acquisition in children, and computational studies of language use. Her main focus within linguistics has been on the lower levels of structure, including phonetics, phonology, and morphology.

Prof. Goldwater has received awards including the 2016 Roger Needham Award from the British Computer Society for “distinguished research contribution in computer science by a UK-based researcher who has completed up to 10 years of post-doctoral research.” She has served on the editorial boards of several journals, including Computational Linguistics, Transactions of the Association for Computational Linguistics, and the inaugural board of OPEN MIND: Advances in Cognitive Science. She was a program chair for the EACL 2014 Conference and chaired the EACL governing board from 2019 to 2020.