On Representing Acoustics of Speech for Speech Processing – Bishnu Atal (University of Washington)
Proper representation of the acoustic speech signal is crucial for almost every speech processing application. We often use the short-time Fourier transform to convert the time-domain speech waveform into a new signal that is a function of both time and frequency, by applying a moving time window of about 20 ms in duration. Many issues, such as the size and shape of the window, remain unresolved. The use of a relatively short window is widespread. In the early development of the sound spectrograph, both narrow-band and wide-band analysis were used, but narrow-band analysis faded away. In digital speech coding applications (multipulse and code-excited linear prediction), high-quality speech is produced at low bit rates only when prediction over both short and long intervals is used. Recently, Hermansky and others have argued that the analysis window for automatic speech recognition should be long, perhaps extending to as much as 1 s. What issues arise in using a short or a long window? What are the relative advantages and disadvantages? In this talk, we will discuss these topics and present results suggesting that a short-time Fourier transform using long windows has advantages. In most speech representations, the Fourier components are not used directly but are converted to their magnitude spectrum; the so-called phase is considered to be irrelevant. There are open questions regarding the use of phase information, and we will discuss this important issue in the talk.
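The trade-off between short and long windows can be illustrated with a minimal sketch. It is not taken from the talk: the Hann window, the 50% overlap, the 8 kHz sampling rate, the 200 ms "long" window, the two-tone test signal, and the helper name `stft_magnitude` are all assumptions chosen only for illustration. The sketch computes the magnitude of a short-time Fourier transform and shows that the spacing between frequency bins shrinks as the window grows, while the number of time frames drops.

```python
import numpy as np

def stft_magnitude(signal, sample_rate, window_ms):
    """Magnitude of a short-time Fourier transform (hypothetical helper).

    Uses a Hann window and 50% overlap; both are illustrative assumptions.
    The phase of each Fourier component is discarded by np.abs.
    """
    win_len = int(round(sample_rate * window_ms / 1000.0))
    hop = max(1, win_len // 2)
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

# Assumed test signal: one second of two tones 40 Hz apart, sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1040 * t)

short_mag = stft_magnitude(signal, sample_rate, window_ms=20)   # 160 samples
long_mag = stft_magnitude(signal, sample_rate, window_ms=200)   # 1600 samples

# Frequency-bin spacing is sample_rate / window_length.
short_bin_hz = sample_rate / 160
long_bin_hz = sample_rate / 1600

print("short window:", short_mag.shape, "bin spacing", short_bin_hz, "Hz")
print("long window: ", long_mag.shape, "bin spacing", long_bin_hz, "Hz")
```

With the 20 ms window the bins are 50 Hz apart, so the two tones fall into neighboring bins and cannot be separated, although the analysis yields many frames along the time axis. With the 200 ms window the bins are 5 Hz apart and the tones appear as distinct peaks, at the cost of far fewer frames. Replacing `np.abs` with `np.angle` would return the phase that the magnitude spectrum discards.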
Bishnu S. Atal is an Affiliate Professor in the Electrical Engineering Department at the University of Washington, Seattle, WA. He retired in March 2002 after working for more than 40 years at Lucent Bell Labs and AT&T Labs. He was a Technical Director at the AT&T Shannon Laboratory, Florham Park, New Jersey, from 1997, where he was engaged in research in speech coding and in automatic speech recognition. He joined the technical staff of AT&T Bell Laboratories in 1961, became head of the Acoustics Research Department in 1985, and head of the Speech Research Department in 1990.
He is internationally recognized for his many contributions to speech analysis, synthesis, and coding. His pioneering work in linear predictive coding of speech established linear prediction as one of the most important speech analysis techniques, leading to many applications in coding, recognition, and synthesis of speech. His research work is documented in over 90 technical papers, and he holds 17 U.S. and numerous international patents in speech processing.
He was elected to the National Academy of Engineering in 1987 and to the National Academy of Sciences in 1993. He is a Fellow of the Acoustical Society of America and the IEEE. He received the IEEE Morris N. Liebmann Memorial Field Award in 1986, the Thomas Edison Patent Award from the R&D Council of New Jersey in 1994, the New Jersey Inventors Hall of Fame Inventor of the Year Award in 2000, and the Benjamin Franklin Medal in Electrical Engineering in 2003.
Bishnu and his wife, Kamla, reside in Mukilteo, Washington. They have two daughters, Alka and Namita, two granddaughters, Jyotica and Sonali, and two grandsons, Ananth and Niguel.