Non-Stationary Multi-Stream Processing Towards Robust and Adaptive Speech Recognition – Hervé Bourlard (Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) and Swiss Federal Institute of Technology at Lausanne (EPFL), Switzerland)

November 6, 2000 all-day

Multi-stream automatic speech recognition (ASR) extends the standard hidden Markov model (HMM) approach by assuming that the speech signal is processed by several (independent) “experts”, each focusing on a different characteristic of the signal, and that the stream likelihoods (or posteriors) are combined at some (temporal) stage to yield a global recognition output. The most successful approach developed so far combines the stream likelihoods by integrating over all possible stream combinations, i.e., over all possible values of a hidden variable indicating which streams are the most reliable. Subband-based speech recognition, a particular case of this approach, will also be discussed.
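One way to picture the "integration over all possible stream combinations" is a frame-level sketch in Python. It assumes per-stream class posteriors are already available and, as a stand-in for dedicated experts trained on each stream subset, approximates each subset posterior by assuming the streams are conditionally independent given the class. The subset weights play the role of the hidden reliability variable. The names `full_combination` and `subset_posterior`, and the independence approximation itself, are illustrative assumptions rather than the method presented in the talk.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Sequence


def all_subsets(n_streams: int) -> Iterator[FrozenSet[int]]:
    """Enumerate every subset of stream indices, including the empty set."""
    for size in range(n_streams + 1):
        for combo in combinations(range(n_streams), size):
            yield frozenset(combo)


def subset_posterior(stream_posts: Sequence[Sequence[float]],
                     prior: Sequence[float],
                     subset: FrozenSet[int]) -> List[float]:
    """Approximate P(q | X_S) from single-stream posteriors.

    Assumption (illustration only): streams are conditionally independent
    given the class, so P(q | X_S) is proportional to
    prod_{k in S} P(q | X_k) / P(q)^(|S| - 1).
    The empty subset falls back to the class prior.
    """
    if not subset:
        return list(prior)
    scores = []
    for q, p_q in enumerate(prior):
        score = p_q
        for k in subset:
            score *= stream_posts[k][q] / p_q
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]


def full_combination(stream_posts: Sequence[Sequence[float]],
                     prior: Sequence[float],
                     weights: Dict[FrozenSet[int], float]) -> List[float]:
    """Sum subset posteriors weighted by subset reliability.

    `weights` maps a subset to a (possibly unnormalized) weight standing in
    for P(this subset is the reliable one | X); missing subsets get 0.
    """
    combined = [0.0] * len(prior)
    total_weight = 0.0
    for subset in all_subsets(len(stream_posts)):
        w = weights.get(subset, 0.0)
        if w <= 0.0:
            continue
        for q, p in enumerate(subset_posterior(stream_posts, prior, subset)):
            combined[q] += w * p
        total_weight += w
    if total_weight == 0.0:
        raise ValueError("at least one subset needs a positive weight")
    return [c / total_weight for c in combined]


# Usage: stream 0 is clean; stream 1 is assumed to be hit by band-limited noise.
prior = [0.5, 0.5]
stream_posts = [[0.9, 0.1], [0.2, 0.8]]
weights = {frozenset({0}): 0.7, frozenset({0, 1}): 0.2, frozenset({1}): 0.1}
print(full_combination(stream_posts, prior, weights))
```

With most of the weight on the subset containing only the clean stream, the combined posterior keeps that stream's decision even though the noisy stream points the other way; in practice the weights would have to be estimated per frame rather than fixed by hand.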
In this framework, we will introduce different mathematical models and discuss some interesting relationships with psychoacoustic evidence. As a further extension of multi-stream ASR, we will also introduce a new approach, referred to as HMM2, in which the HMM emission probabilities are estimated by state-specific, feature-based HMMs responsible for merging the stream information and modeling their possible correlation. For each case, recognition results on non-stationary noise will be presented, and the possibility of fast adaptation (of a limited number of parameters) will be illustrated through specific examples.
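The abstract does not give HMM2's equations, so what follows is only a minimal sketch of the general idea under stated assumptions: each outer (temporal) state owns a small inner HMM that runs along the components of a single feature vector, for instance subband features ordered by frequency, and the inner forward likelihood serves as that state's emission score. The 1-D Gaussian inner emissions, the left-to-right inner topology, and all names (`InnerHMM`, `hmm2_log_likelihood`) are hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p: float) -> float:
    """log(p), mapping zero probabilities to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian (assumed inner-state emission model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


class InnerHMM:
    """Hypothetical state-specific HMM that scans one frame's components in
    order (e.g. low to high subbands); its transitions are where dependencies
    between neighbouring components can be captured."""

    def __init__(self, init, trans, means, variances):
        self.init, self.trans = init, trans
        self.means, self.variances = means, variances

    def log_likelihood(self, frame: Sequence[float]) -> float:
        """Forward algorithm along the feature axis: log p(frame | outer state)."""
        n = len(self.init)
        alpha = [safe_log(self.init[j])
                 + log_gauss(frame[0], self.means[j], self.variances[j])
                 for j in range(n)]
        for x in frame[1:]:
            alpha = [logsumexp([alpha[i] + safe_log(self.trans[i][j]) for i in range(n)])
                     + log_gauss(x, self.means[j], self.variances[j])
                     for j in range(n)]
        return logsumexp(alpha)


def hmm2_log_likelihood(frames, outer_init, outer_trans, inner_models) -> float:
    """Forward algorithm along time; each outer state's emission score is the
    log-likelihood its own inner HMM assigns to the current frame."""
    n = len(outer_init)
    alpha = [safe_log(outer_init[q]) + inner_models[q].log_likelihood(frames[0])
             for q in range(n)]
    for frame in frames[1:]:
        alpha = [logsumexp([alpha[p] + safe_log(outer_trans[p][q]) for p in range(n)])
                 + inner_models[q].log_likelihood(frame)
                 for q in range(n)]
    return logsumexp(alpha)


# Toy usage: two outer states, each with a two-state left-to-right inner HMM.
low_high = InnerHMM([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [0.0, 3.0], [1.0, 1.0])
flat = InnerHMM([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [1.5, 1.5], [1.0, 1.0])
frames = [[0.1, -0.2, 2.9, 3.1], [0.0, 0.3, 3.2, 2.8]]
score = hmm2_log_likelihood(frames, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [low_high, flat])
print(score)
```

Because the inner transitions decide how neighbouring components are grouped, frames whose values sit high in the upper components are scored well by `low_high` and poorly by `flat`; in this sketch, that grouping is the only place where dependencies across components enter the emission score.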

Herve Bourlard is a Professor at the Swiss Federal Institute of Technology at Lausanne (EPFL, Switzerland) and Director of the Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP, Martigny, Switzerland), a semi-private research institute affiliated with EPFL that performs research in speech processing, vision, and machine learning. He is also an External Fellow of the International Computer Science Institute (ICSI), Berkeley, CA.
With nearly 20 years of experience in speech processing, statistical pattern recognition, applied mathematics, and artificial neural networks, Herve Bourlard is the author or coauthor of over 140 reviewed papers and book chapters, and of two books. In 1996, he received the IEEE Signal Processing Society Award for the paper “Continuous Speech Recognition: An Introduction to the Hybrid HMM/Connectionist Approach” (co-authored with N. Morgan of ICSI, Berkeley), published in the IEEE Signal Processing Magazine in May 1995. Herve Bourlard is an IEEE Fellow “for contributions in the field of statistical speech recognition and neural networks”.
Herve Bourlard is co-Editor-in-Chief of Speech Communication, member of the IEEE Technical Committee for Neural Network Signal Processing, member of the Administration Committee of EURASIP (European Association for Signal Processing), member of the Advisory Committee of ISCA (International Speech Communication Association), appointed expert for the European Community, and member of the Foundation Council of the Swiss Network for Innovation.

Center for Language and Speech Processing