Statistical speech recognition using a functional model of “hidden” processes in human speech communication – Li Deng (University of Waterloo)
In this talk, I will present a general Bayesian statistical framework for constraint-free speech recognition based on a functional model of the global characteristics of human speech communication (production and perception). The model consists of a nonlinear (autosegmental-based) phonological component, which determines the structure of the speech recognizer, and a dynamic phonetic-interface component; it contains the conventional HMM-based speech model as a highly simplified, degenerate special case. I will show how the model can be efficiently parameterized and how its parameters can be automatically estimated from a very small amount of acoustic speech data. Some evaluation results for the speech recognizer on the TIMIT database will be presented. Finally, I will outline our current work on applying the model to multilingual speech recognition, aiming at cross-language portability (i.e., constructing speech recognizers for a target language using training speech data from only one or two source languages).
Li Deng (S’83-M’86-SM’91) received the B.S. degree in biophysics from the University of Science and Technology of China in 1982, and the M.S. and Ph.D. degrees in electrical engineering from the University of Wisconsin-Madison in 1984 and 1986, respectively. He worked on large-vocabulary automatic speech recognition at INRS-Telecommunications, Montreal, Canada, from 1986 to 1989. Since 1989, he has been with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada, where he is currently a Full Professor. From 1992 to 1993, he conducted sabbatical research at the Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass., working on statistical models of speech production and the related speech recognition algorithms. His research interests include acoustic-phonetic modeling of speech; speech recognition, synthesis, and enhancement; speech production and perception; statistical methods for signal analysis and modeling; nonlinear signal processing; neural network algorithms; computational phonetics and phonology for the world’s languages; and auditory speech processing.