New Developments in the Use of Markov Models and Artificial Neural Networks for Speech Recognition – Hervé Bourlard (Faculté Polytechnique de Mons/Mons, Belgium and International Computer Science Institute/Berkeley)
Abstract
Recently it has been shown that Artificial Neural Networks (ANNs) can be used to augment speech recognizers whose underlying structure is essentially that of Hidden Markov Models (HMMs). In particular, we have shown that fairly simple layered structures, which we have lately termed “Big Dumb Neural Networks” (BDNNs), can be discriminatively trained to estimate emission probabilities for HMMs. Many relatively simple speech recognition systems based on this approach, generally referred to as hybrid HMM/ANN systems, have proved in controlled tests to be both accurate and efficient in their CPU and memory run-time requirements. In terms of accuracy, recent results show this hybrid approach slightly ahead of more traditional HMM systems when evaluated on both British and American English tasks, using a 20,000-word vocabulary and a trigram language model.

In this talk, after a short description of the basic HMM/ANN approach, we will first discuss some of the issues raised by this approach, including the use of temporal information, the role of prior probabilities vs. likelihoods, and language information vs. acoustic information. We will then discuss some current research topics on extending these results to somewhat more complex systems, including new theoretical and experimental developments on transition-based recognition systems and the training of HMM/ANN hybrids to directly maximize the global posterior probabilities.

This talk will assume some background in both hidden Markov models and artificial neural networks.