Abstract:
The explosive growth of speech recognition technology in the
marketplace has resulted from two significant advances over the past
forty years: ubiquitous high-speed computing and the evolution of
statistical methods in science and engineering. In the 1950s and
1960s, approaches to speech recognition were dominated by attempts to
blend linguistic knowledge about sound production with analog
electronics. These simple systems, based on analog filter banks, made
for impressive demonstrations but were not scalable to large
problems. In the 1970s, with the advent of modern computing, many of
these techniques were transformed into their digital equivalents, yet
they met with only limited success. In the late 1970s, statistical
methods slowly began to emerge. By the mid-1990s, such techniques had
become the dominant approach to speech recognition.
Statistical methods are popular because of their simplicity: we model
variation in the data using well-known statistical models, such as
Gaussian distributions, together with machine learning techniques. The goal of
the speech recognizer is to estimate the message sent by the user by
maximizing the probability of a correct choice. In this talk, we will
provide an overview of modern statistical approaches to speech
recognition and show how various aspects of the problem, such as
signal processing and language modeling, can be combined using a
single probabilistic framework.
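
As a rough illustration of the single probabilistic framework described
above, the sketch below picks the word W that maximizes
log P(A|W) + log P(W), where A is an observed acoustic feature. Everything
in it is a hypothetical toy and is not taken from the talk: the
two-word vocabulary, the one-dimensional Gaussian acoustic models, the
prior probabilities, and the names ACOUSTIC_MODEL, LANGUAGE_MODEL, and
recognize are all assumptions made for the example.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log density of a one-dimensional Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Hypothetical acoustic model: one Gaussian (mean, variance) per word,
# standing in for P(A | W). Real systems use far richer models.
ACOUSTIC_MODEL = {"yes": (1.0, 0.25), "no": (-1.0, 0.25)}

# Hypothetical language model: prior word probabilities P(W).
LANGUAGE_MODEL = {"yes": 0.7, "no": 0.3}

def recognize(feature):
    """Return the word maximizing log P(A | W) + log P(W)."""
    scores = {
        word: gaussian_log_pdf(feature, *ACOUSTIC_MODEL[word])
        + math.log(LANGUAGE_MODEL[word])
        for word in ACOUSTIC_MODEL
    }
    return max(scores, key=scores.get)
```

In this toy setup, a feature that lies exactly halfway between the two
acoustic means is decided by the language model alone, which is one simple
way to see how the two knowledge sources combine in a single score.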