Robust Automatic Speech Recognition in the 21st Century – Richard M. Stern (Carnegie Mellon University)
Baltimore, MD 21218
Over the past decade, speech recognition technology has become increasingly commonplace in consumer, enterprise, and government applications. As the technology matures and higher expectations and greater demands are placed on it, robustness in recognition is becoming increasingly important. This talk will review and discuss several classical and contemporary approaches that make automatic speech recognition systems and related technology robust to changes and degradations in the acoustical environment in which they operate. Distortions produced by quasi-stationary additive noise and quasi-stationary linear filtering can be largely ameliorated by “classical” techniques such as cepstral high-pass filtering, as well as by techniques that develop statistical models of the distortion (such as vector Taylor series expansion). These approaches, however, provide little useful improvement when speech is degraded by transient or non-stationary noise such as background music or speech, or in environments that include nonlinear distortion. We describe and compare the effectiveness in difficult acoustical environments of techniques based on missing-feature compensation, combination of complementary streams of information, multiple microphones, physiologically motivated auditory processing, and specialized techniques directed at compensation for nonlinearities, with a focus on how these techniques are applied to the practical problems facing us today.
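As an illustration of the “classical” cepstral high-pass filtering the abstract refers to, the sketch below shows cepstral mean normalization (CMN), one simple instance of that family. The idea is that a quasi-stationary linear channel adds a nearly constant offset to the cepstral coefficients, so subtracting the long-term mean of each coefficient (a crude high-pass operation over time) cancels the channel. The array shapes and the synthetic "speech" data are assumptions for demonstration only, not material from the talk.

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Subtract the per-coefficient mean over an utterance.

    cepstra: array of shape (num_frames, num_coeffs) holding
    cepstral features (e.g., MFCCs). A stationary linear filter
    appears as an additive constant in the cepstral domain, so
    removing the long-term mean removes the channel's effect.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Hypothetical demo: the same underlying features observed through two
# different stationary channels (modeled as constant cepstral offsets)
# become identical after CMN.
rng = np.random.default_rng(0)
speech = rng.standard_normal((200, 13))   # synthetic cepstral frames
channel_a = speech + 0.5                  # channel A: constant offset
channel_b = speech - 1.2                  # channel B: different offset
a = cepstral_mean_normalization(channel_a)
b = cepstral_mean_normalization(channel_b)
```

Because both observations differ from the clean features only by a constant cepstral offset, `a` and `b` coincide after normalization; this is exactly why CMN helps with quasi-stationary linear filtering but not with the transient or nonlinear degradations the talk goes on to address.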
Richard M. Stern received the S.B. degree from the Massachusetts Institute of Technology in 1970, the M.S. from the University of California, Berkeley, in 1972, and the Ph.D. from MIT in 1977, all in electrical engineering. He has been on the faculty of Carnegie Mellon University since 1977, where he is currently a Professor in the Department of Electrical and Computer Engineering, the Department of Computer Science, and the Language Technologies Institute, and a Lecturer in the School of Music. Much of Dr. Stern’s current research is in spoken language systems, where he is particularly concerned with the development of techniques with which automatic speech recognition can be made more robust with respect to changes in environment and acoustical ambience. In addition to his work in speech recognition, Dr. Stern has worked extensively in psychoacoustics, where he is best known for theoretical work in binaural perception. Dr. Stern is a Fellow of the IEEE, the Acoustical Society of America, and the International Speech Communication Association (ISCA). He was the ISCA 2008-2009 Distinguished Lecturer, a recipient of the Allen Newell Award for Research Excellence in 1992, and he served as the General Chair of Interspeech 2006. He is also a member of the Audio Engineering Society.