Language-Universal Speech Modeling: What, Why, When and How – Chin-Hui Lee (Georgia Institute of Technology)

November 9, 2010

Acoustic segmental modeling (ASM) is an extension of frame-based vector quantization (VQ) that establishes a common set of segment-based fundamental speech units to characterize the acoustic universe. ASM has been applied to speech recognition by building an acoustic lexicon for all words in a vocabulary. It has also been utilized in spoken language recognition, with latent semantic analysis features and vector space modeling, to produce high recognition accuracies. With the recently proposed automatic speech attribute transcription (ASAT) paradigm, another set of language-universal speech units based on speech attributes emerges. In contrast to the conventional HMM-based framework, which is top-down in nature, one major goal of the ASAT paradigm is to develop a bottom-up approach to automatic speech recognition via attribute detection and knowledge integration. These two key technologies can also be applied to other applications. In this talk we report on recent studies in language-universal speech characterization on two related tasks, namely: (i) language-universal and cross-language attribute and phone recognition; and (ii) automatic spoken language recognition. We show that language-universal speech attribute models can outperform language-specific attribute models for attribute detection and phone recognition. We also demonstrate that, by extending ASM-based algorithms to language recognition with two simple sets of manner and place of articulation models, recognition accuracies can surpass those of state-of-the-art spoken language recognition systems. We anticipate that these universal speech attribute modeling tools will provide opportunities to explore future research in multilingual acoustic characterization and speech recognition.
Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Dr. Lee received the B.S. degree in Electrical Engineering from National Taiwan University, Taipei, in 1973, the M.S. degree in Engineering and Applied Science from Yale University, New Haven, in 1977, and the Ph.D. degree in Electrical Engineering with a minor in Statistics from the University of Washington, Seattle, in 1981.

Dr. Lee started his professional career at Verbex Corporation, Bedford, MA, where he was involved in research on connected word recognition. In 1984, he became affiliated with Digital Sound Corporation, Santa Barbara, where he engaged in research and product development in speech coding, speech synthesis, speech recognition, and signal processing for the development of the DSC-2000 Voice Server. Between 1986 and 2001, he was with Bell Laboratories, Murray Hill, New Jersey, where he became a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. His research interests include multimedia communication, multimedia signal and information processing, speech and speaker recognition, speech and language modeling, spoken dialogue processing, adaptive and discriminative learning, biometric authentication, and information retrieval. From August 2001 to August 2002 he was a visiting professor at the School of Computing, National University of Singapore. In September 2002, he joined the faculty of the Georgia Institute of Technology.

Prof. Lee has participated actively in professional societies. He is a member of the IEEE Signal Processing Society (SPS), the IEEE Communications Society, and the International Speech Communication Association (ISCA). From 1991 to 1995, he was an associate editor for the IEEE Transactions on Signal Processing and the Transactions on Speech and Audio Processing. During the same period, he served as a member of the ARPA Spoken Language Coordination Committee.
From 1995 to 1998 he was a member of the SPS Speech Processing Technical Committee, serving as its chairman from 1997 to 1998. In 1996, he helped establish the SPS Multimedia Signal Processing Technical Committee, of which he is a founding member.

Dr. Lee is a Fellow of the IEEE, has published more than 300 papers, and holds 25 patents. He received the SPS Senior Award in 1994 and the SPS Best Paper Award in 1997 and 1999. In 1997, he was awarded the prestigious Bell Labs President's Gold Award for his contributions to the Lucent Speech Processing Solutions product. Dr. Lee often gives invited lectures to a wide international audience. In 2000, he was named one of six Distinguished Lecturers by the IEEE Signal Processing Society. He was also named one of ISCA's two inaugural Distinguished Lecturers for 2007-2008. He won the SPS's 2006 Technical Achievement Award for "Exceptional Contributions to the Field of Automatic Speech Recognition".

Johns Hopkins University, Whiting School of Engineering

Center for Language and Speech Processing
Hackerman 226
3400 North Charles Street, Baltimore, MD 21218-2680