Abdelrahman Mohamed (Facebook) “Better Use of Un-paired Data and Pre-trained Modules for Speech Recognition”
Although training end-to-end neural speech recognition models requires a large volume of transcribed data for new domains, sharing of trained neural modules is not common. In this talk, I'll present LegoNN, a procedure for building encoder-decoder architectures whose decoder modules can be reused across tasks without an extra joint fine-tuning phase. To achieve reusability, the interface between modules is a sequence of marginal distributions over a discrete vocabulary. We report competitive results for LegoNN models on two large-scale ASR and MT tasks while reusing modules across domains and languages. Then, I'll discuss our other recent work on pre-training audio encoders using self-, semi-, and weakly-supervised learning methods.
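To make the interface idea concrete, here is a minimal toy sketch (not the LegoNN implementation; all names, shapes, and the tiny vocabulary are hypothetical) of an encoder that emits a per-position marginal distribution over a discrete vocabulary, and a decoder that consumes those soft distributions via an expected embedding rather than raw hidden states:

```python
import numpy as np

VOCAB_SIZE = 8  # toy discrete vocabulary; real systems use thousands of units

def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def encoder_module(features, rng):
    """Map input features (T, d) to per-position marginals (T, V).

    The random projection is a stand-in for a trained encoder; the key
    point is that the output is a proper distribution at every position,
    so any decoder trained against this interface can consume it.
    """
    W = rng.normal(size=(features.shape[1], VOCAB_SIZE))
    return softmax(features @ W)

def decoder_module(marginals, embedding):
    """Ingest marginals (T, V) as expected embeddings (T, d_emb).

    (T, V) @ (V, d_emb) -> (T, d_emb): differentiable and independent of
    the upstream encoder's internals, which is what allows module reuse.
    """
    return marginals @ embedding

T, d, d_emb = 5, 16, 4
feats = np.random.default_rng(1).normal(size=(T, d))
marginals = encoder_module(feats, np.random.default_rng(0))
assert np.allclose(marginals.sum(axis=-1), 1.0)  # each row is a distribution
emb = np.random.default_rng(2).normal(size=(VOCAB_SIZE, d_emb))
dec_in = decoder_module(marginals, emb)
print(dec_in.shape)
```

Because the contract between modules is only "a sequence of distributions over this vocabulary," a decoder trained against one encoder can, in principle, be reattached to a different encoder that speaks the same vocabulary.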
Abdelrahman Mohamed is a research scientist at Facebook AI Research (FAIR) in Seattle. Before FAIR, he was a principal scientist and manager in the Amazon Alexa AI team, and from 2014 to 2017 he was at Microsoft Research, Redmond. He received his PhD from the University of Toronto, working with Geoffrey Hinton and Gerald Penn, where he was part of the team that started the Deep Learning revolution in Spoken Language Processing in 2009. He is the recipient of the IEEE Signal Processing Society Best Journal Paper Award for 2016. His research interests span Deep Learning, Spoken Language Processing, and Natural Language Understanding.