Factored Adaptation for Separating Speaker and Environment Variability – Mike Seltzer (Microsoft Research)

April 17, 2012 all-day

Acoustic model adaptation can reduce the degradation in speech recognition accuracy caused by mismatch between the speech seen at runtime and that seen in training. This mismatch is caused by many factors, including the speaker and the environment. Standard data-driven adaptation techniques address any and all of these differences blindly. While this is a benefit, it can also be a drawback, as it is unknown precisely what mismatch is being compensated for. This prevents the transforms from being reliably reused across sessions of an application that can be used in different environments, such as voice search on a mobile phone. In this talk, I’ll discuss our recent research in factored adaptation, which jointly compensates for acoustic mismatch in a manner that enables multiple sources of variability to be separated. By performing adaptation in this way, we can increase the utility of the adaptation data and more effectively reuse transforms across user sessions. The effectiveness of the proposed approach will be shown in a series of experiments on a small vocabulary noisy digits task and a large vocabulary voice search task.
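As a rough illustration of why separating the sources of variability makes transforms reusable, the sketch below cascades two affine feature transforms, one for the environment and one for the speaker. This is not the method presented in the talk: the affine form, the function names `apply_affine` and `compose`, and all numeric values are assumptions chosen purely for illustration, and the transforms are hand-picked rather than estimated from adaptation data.

```python
import numpy as np

def apply_affine(A, b, x):
    """Apply the affine map x -> A @ x + b to each row of x."""
    return x @ A.T + b

def compose(A_outer, b_outer, A_inner, b_inner):
    """Collapse outer(inner(x)) into a single affine transform."""
    return A_outer @ A_inner, A_outer @ b_inner + b_outer

rng = np.random.default_rng(0)
dim = 3
features = rng.normal(size=(5, dim))  # 5 hypothetical feature frames

# Hypothetical transforms; a real system would estimate these from data.
A_spk, b_spk = np.eye(dim) * 1.1, np.full(dim, 0.2)    # speaker factor
A_car, b_car = np.eye(dim) * 0.9, np.full(dim, -0.5)   # environment: car
A_cafe, b_cafe = np.eye(dim) * 0.7, np.full(dim, 0.3)  # environment: cafe

# The same speaker transform is reused with two different environments.
A_joint_car, b_joint_car = compose(A_spk, b_spk, A_car, b_car)
A_joint_cafe, b_joint_cafe = compose(A_spk, b_spk, A_cafe, b_cafe)

adapted_car = apply_affine(A_joint_car, b_joint_car, features)
adapted_cafe = apply_affine(A_joint_cafe, b_joint_cafe, features)
```

Because the speaker factor is stored separately, moving the same user to a new environment only requires swapping the environment transform. A single unfactored transform would have to be re-estimated from scratch.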
Mike Seltzer received the Sc.B. with honors from Brown University in 1996, and M.S. and Ph.D. degrees from Carnegie Mellon University in 2000 and 2003, respectively, all in electrical engineering. From 1996 to 1998, he was an applications engineer at Teradyne, Inc., Boston, MA working on semiconductor test solutions for mixed-signal devices. From 1998 to 2003, he was a member of the Robust Speech Recognition group at Carnegie Mellon University. In 2003, Dr. Seltzer joined the Speech Technology Group at Microsoft Research, Redmond, WA. In 2006, Dr. Seltzer was awarded the Best Young Author paper award from the IEEE Signal Processing Society. From 2006 to 2008, he was a member of the Speech & Language Technical Committee (SLTC) and was the Editor-in-Chief of the SLTC e-Newsletter. He was a general co-chair of the 2008 International Workshop on Acoustic Echo and Noise Control and Publicity Chair of the 2008 IEEE Workshop on Spoken Language Technology. He is currently an Associate Editor of the IEEE Transactions on Audio, Speech and Language Processing. His current research interests include speech recognition in adverse acoustical environments, acoustic model adaptation, acoustic modeling, microphone array processing, and machine learning for speech and audio applications.

Johns Hopkins University

Johns Hopkins University, Whiting School of Engineering

Center for Language and Speech Processing
Hackerman 226
3400 North Charles Street, Baltimore, MD 21218-2680