Automatic speech recognition is often accomplished by decoding with a well-defined language model, typically an N-gram. However, for many problems encountered in real life, a complete characterization of the task grammar is unavailable because of a lack of training data. This degrades speech recognition performance, especially when the input utterances are generated spontaneously, as in most real-life dialogue tasks.
In this talk, we discuss a new paradigm based on a detection approach. Instead of trying to characterize the grammar entirely, we construct a family of detectors, each designed to locate specific speech events of interest. These events, such as keywords and key phrases, are related to the task at a higher level and can be specified even without any training data. The speech utterance is first processed by the collection of detectors to produce an event lattice. The event lattice is then parsed, incorporating more speech and language knowledge, to generate multiple candidate answers. These candidates are then rescored to produce the recognized sentence. When the events are labeled with semantic attributes, the recognized sentence can also be "understood" to generate corresponding semantic actions. We report on experimental results with two real-life tasks and conclude that the proposed detection approach is more robust than the conventional decoding approach, especially in dealing with ill-formed inputs.
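The flow described above (detectors, event lattice, parsing, rescoring, semantic attributes) can be illustrated with a toy sketch. It is not the system presented in the talk: it operates on text tokens rather than audio, the confidence scores are invented, parsing is reduced to enumerating non-overlapping event sequences, and every name in it (Event, make_detector, build_lattice, parse, rescore, recognize) is hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    start: int      # index of the first token covered
    end: int        # index one past the last token covered
    label: str      # detected key phrase
    attribute: str  # semantic attribute attached to the event
    score: float    # detector confidence (invented for this toy)


def make_detector(phrase, attribute, score):
    """Return a detector that spots one key phrase in a token list."""
    words = phrase.split()
    n = len(words)

    def detect(tokens):
        return [Event(i, i + n, phrase, attribute, score)
                for i in range(len(tokens) - n + 1)
                if tokens[i:i + n] == words]

    return detect


def build_lattice(tokens, detectors):
    """Run every detector and collect the events, ordered by position."""
    events = [e for detect in detectors for e in detect(tokens)]
    return sorted(events, key=lambda e: (e.start, e.end))


def parse(lattice):
    """Enumerate sequences of non-overlapping events as candidate answers."""
    candidates = []
    for size in range(1, len(lattice) + 1):
        for combo in combinations(lattice, size):
            # The lattice is sorted, so checking neighbours is sufficient.
            if all(a.end <= b.start for a, b in zip(combo, combo[1:])):
                candidates.append(combo)
    return candidates


def rescore(candidates):
    """Pick the candidate with the highest total detector score."""
    return max(candidates, key=lambda c: sum(e.score for e in c))


def recognize(tokens, detectors):
    """Detect, parse, and rescore; return the winning event sequence."""
    return rescore(parse(build_lattice(tokens, detectors)))


# Hypothetical detectors for a toy travel task.
detectors = [
    make_detector("fly", "ACTION", 0.7),
    make_detector("new york", "CITY", 0.9),
    make_detector("york", "CITY", 0.4),   # competes with "new york"
    make_detector("boston", "CITY", 0.8),
]

# An ill-formed, spontaneous input with fillers.
tokens = "uh i want to um fly from new york to boston please".split()
best = recognize(tokens, detectors)
print([(e.label, e.attribute) for e in best])
```

Note that the fillers and unmodeled words are simply skipped: only the detected events contribute to the result, which is the intuition behind the robustness to ill-formed inputs claimed in the abstract.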
Chin-Hui Lee received the B.S. degree in Electrical Engineering
from National Taiwan University, Taipei, in 1973, the M.S. degree in Engineering
and Applied Science from Yale University, New Haven, in 1977, and the Ph.D.
degree in Electrical Engineering with a minor in Statistics from the University
of Washington, Seattle, in 1981. In 1981, Dr. Lee joined Verbex Corporation,
Bedford, MA, and was involved in research work on connected word recognition.
In 1984, he became affiliated with Digital Sound Corporation, Santa Barbara,
where he engaged in research and product development in speech coding,
speech synthesis, speech recognition and signal processing for the development
of the DSC-2000 Voice Server. Since 1986, he has been with Bell Laboratories,
Murray Hill, New Jersey, where he is now a Distinguished Member of Technical
Staff and Head of the newly established Dialogue Systems Research Department.
His current research interests include multimedia signal processing, speech and speaker
recognition, speech and language modeling, adaptive and discriminative
learning, spoken dialogue processing, biometric authentication and information
retrieval. His research scope is reflected in a recent book, entitled "Automatic
Speech and Speaker Recognition: Advanced Topics", published by Kluwer Academic
Publishers in 1996. Dr. Lee has participated actively in both domestic
and international professional societies. He is a member of the IEEE Signal
Processing Society, Communication Society, and the European Speech Communication
Association. He is also a lifetime member of the Computational Linguistic
Society in Taiwan. From 1991 to 1995, he was an associate editor for the
IEEE Transactions on Signal Processing and Transactions on Speech and Audio
Processing. During the same period, he was also a member of the ARPA Spoken
Language Coordination Committee. Since 1995 he has been a member of the
Speech Technical Committee of the IEEE Signal Processing Society (SPS).
In 1996 he helped promote the newly established SPS Multimedia Signal Processing
(MMSP) Technical Committee and is a member of the MMSP-TC. Due to his continuous
contributions to speech and language processing in Taiwan, he was recently
appointed as a member of the Advisory Board of the Institute of Information
Science at Academia Sinica, Taipei, Taiwan. Dr. Lee is a Fellow of the
IEEE, and serves as the Chairman of the SPS Speech Processing Technical
Committee. He has over 170 published papers and 16 patents on the subject
of automatic speech and speaker recognition. He received the SPS Senior
Award in 1994 and the SPS Best Paper Award in 1997. His inventions have
been widely used in Lucent's products and services deployed over the telecommunication
networks. Recently he was awarded the prestigious Bell Labs President's
Gold Award for his contributions to the Lucent Speech Processing Solutions
product.