Learning from Mistakes: A Recent History of Negative Results in ASR – Nelson Morgan (International Computer Science Institute)
Abstract
Despite significant progress in speech recognition technology over the last 20 years, we sometimes observe results that do not match our preconceptions. Of course, this is often due to poor design or buggy code. However, when experiments are based upon solid motivations and sound methodology, the inquiry can often be pursued past the initial results for diagnostic purposes. This sometimes leads to better understanding, which in turn can eventually be instrumental in improving performance.
In this talk I will discuss some cases of negative results in speech recognition from the last few years, and speculate on what we should have learned from the experiences. Examples will include:
- An addendum to the famous ARPA error rate reduction graphs
- Increasing the error rate with channel normalization
- In search of the “Speechgram” – trying to find the “important” regions in the speech time series; also, learning what is deficient in our current Switchboard systems
I will conclude by summarizing some of the areas of opportunity that are suggested by these experiences.
Biography
Nelson Morgan began his professional career as a recording engineer, working with music and film (he was a technical consultant on Godfather Part II, for instance), until returning to school full time in 1977. He received B.S., M.S., and Ph.D. degrees from the EECS Dept. at UC Berkeley in 1977, 1979, and 1980 respectively.
From 1980 to 1984, he conducted and directed research in speech analysis, synthesis, and recognition, as well as DSP architectures, at National Semiconductor in Santa Clara. From 1984 to 1988 he worked at the EEG Systems Lab in San Francisco on the analysis of brain waves collected in controlled experiments on cognitive behaviors. In both jobs he developed connectionist approaches to signal analysis.
Since 1988, Nelson Morgan has led a computer engineering research department at the International Computer Science Institute (ICSI), a non-profit research laboratory closely associated with the EECS Dept. at UC Berkeley. The research in this group spans a vertical slice from algorithms through architectures down to hardware. It has included the development of algorithmic approaches such as HMM/ANN hybrid systems (with Herve Bourlard) and RASTA (with Hynek Hermansky), as well as a vector microcomputer that is being used to support this work (developed in collaboration with John Wawrzynek). His current professional interests are focused on the integration of perceptual and statistical models for speech recognition. In July of 1991 Dr. Morgan also received an appointment as an Adjunct Professor in EECS at UC Berkeley. He co-teaches a Berkeley course called “Audio Signal Processing in Humans and Machines” with Ben Gold, and the two of them are currently writing a textbook for the course.