Vikramjit Mitra (SRI International, Speech Technology and Research Laboratory) “Toward Robust Speech Processing Systems: Speech Perception, Gestural Phonology and Machine Learning”

November 8, 2016 @ 12:00 pm – 1:15 pm
Hackerman Hall B17
3400 N Charles St
Baltimore, MD 21218
Center for Language and Speech Processing


Speech processing applications such as speech recognition, keyword spotting, language recognition, and speaker identification are crucial for data analytics, voice-operated systems, biometrics, and information triage. Current speech processing systems perform well under matched/clean acoustic conditions but can degrade rapidly with even minor environmental changes. In particular, systems remain highly sensitive to acoustic variation from noise, reverberation, differing receiving and transmitting devices, multiple speakers, and so on. For real-world applications, speech processing systems must perform reliably under the widely varying environmental conditions of everyday life. This talk will present SRI’s recent work on improving the robustness of speech processing systems by exploiting findings from speech perception, machine learning, and gestural phonology. We will discuss speech-perception-motivated robust acoustic features developed in our lab that have demonstrated noise/channel robustness across multiple speech processing tasks. We will present a set of features, motivated by gestural phonology, that dynamically estimate vocal tract constriction location and degree over time, and describe how they were used in several speech processing tasks. We will share details of acoustic modeling approaches that we have found to be quite robust to background distortions. Finally, we will present results on standard speech processing tasks and examine how the proposed systems perform under varying and unseen data conditions.
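To give a flavor of what "perception-motivated" acoustic features look like, the sketch below computes log mel-filterbank energies with NumPy. This is a generic textbook illustration, not the speaker's actual features (which are SRI-specific): the mel scale simply approximates the ear's nonlinear frequency resolution, one of the perceptual findings such features build on. All function and parameter names here are illustrative choices.

```python
import numpy as np

def mel_filterbank_features(signal, sr=16000, n_fft=512, n_mels=26,
                            frame_len=400, hop=160):
    """Log mel-filterbank energies: a simple perception-motivated feature.

    Generic illustration only; not the robust features described in the talk.
    """
    # Frame the signal (25 ms windows, 10 ms hop at 16 kHz) and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # Power spectrum of each frame
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2

    # Build triangular filters spaced evenly on the mel scale
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)  # rising slope
        fbank[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)  # falling slope

    # Log compression mimics the ear's nonlinear loudness response
    return np.log(spec @ fbank.T + 1e-10)

# Usage: one second of a noisy synthetic tone standing in for speech
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
sig = np.sin(2 * np.pi * 300 * t) + 0.1 * rng.standard_normal(16000)
feats = mel_filterbank_features(sig)
print(feats.shape)  # (frames, mel bands) -> (98, 26)
```

Features of this family are typically fed to an acoustic model in place of raw spectra; robustness-oriented variants of the kind the talk covers add further perceptual processing (e.g. modulation filtering) on top of such a filterbank front end.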


Dr. Vikramjit Mitra is an Advanced Research Computer Scientist in SRI International’s Speech Technology and Research (STAR) Laboratory. He received his Ph.D. in Electrical Engineering from the University of Maryland, College Park; an M.S. in Electrical Engineering from the University of Denver; and a B.E. in Electrical Engineering from Jadavpur University, India. His research focuses on signal processing for noise/channel/reverberation robustness, speech recognition, production/perception-motivated signal processing, information retrieval, machine learning, and speech analytics. His work has been funded by the NSF, DARPA, IARPA, AFRL, Sandia National Laboratories, and others. He is a senior member of the IEEE, an affiliate member of the SLTC, and has served on NSF panels.