PUTTING LANGUAGE (and speech) INTO LANGUAGE MODELING

                       Frederick Jelinek
                       CLSP/JHU
                       October 10, 1995

                                  Abstract

So far, speech recognition language models have been constructed to
minimize the per-word entropy H(W), or, what is practically the same, to
maximize the probability of the training text. However, it follows
directly from information theory that to minimize the recognition error, a
language model ought to minimize H(W|A), the expected uncertainty of the
spoken text W given the observed acoustic sequence A.
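
In symbols, the two criteria can be written roughly as follows. This is a
sketch in notation the abstract does not spell out: n is the number of
words in the text, h_i is the history of the i-th word, and the sum in the
last line runs over all competing word strings W'.

```latex
\begin{align*}
H(W) &\approx -\frac{1}{n} \sum_{i=1}^{n} \log P(w_i \mid h_i) \\
H(W \mid A) &= -\mathbb{E}\bigl[\log P(W \mid A)\bigr] \\
P(W \mid A) &= \frac{P(W)\, P(A \mid W)}{\sum_{W'} P(W')\, P(A \mid W')}
\end{align*}
```

Fano's inequality is one standard result that bounds the probability of
error in terms of a conditional entropy such as H(W|A); whether it is the
argument the talk relies on is an assumption here.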

In constructing such an acoustic-sensitive language model (ASLM), one
could hold the recognizer's acoustic model P(A|W) fixed. In this way the
ASLM could compensate for any weaknesses in the acoustic model.

Language models are based on probabilities P(w|$(h)) where h denotes the
history (i.e., the hypothesized past word string) and $(h) is the
equivalence class to which h belongs. Language modeling consists of the
determination of the history equivalence classification $ followed by the
estimation of P(w|$(h)) from training data. It seems intuitively obvious
that $ should be chosen to help the recognizer discriminate between
similar-sounding words. The resulting ASLM can be relatively indifferent
to words that are easily distinguished by their acoustics.
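
The two steps named above, choosing $ and then estimating P(w|$(h)), can
be sketched in a few lines. This is a minimal illustration and not the
method of the talk: the classifier bigram_class (which keeps only the last
word of the history), the function names, and the toy training text are
all assumptions, and the estimate is a plain relative frequency with no
smoothing.

```python
from collections import Counter, defaultdict
import math


def bigram_class(history):
    """Hypothetical choice of $: keep only the last word of the history."""
    return history[-1] if history else "<s>"


def estimate_lm(sentences, classify=bigram_class):
    """Relative-frequency estimate of P(w | $(h)) from training sentences."""
    counts = defaultdict(Counter)
    for words in sentences:
        for i, w in enumerate(words):
            counts[classify(tuple(words[:i]))][w] += 1
    model = {}
    for cls, ctr in counts.items():
        total = sum(ctr.values())
        model[cls] = {w: c / total for w, c in ctr.items()}
    return model


def per_word_entropy(model, sentences, classify=bigram_class):
    """Average of -log2 P(w | $(h)) over the text, in bits per word.

    Raises KeyError for events unseen in training, since nothing is smoothed.
    """
    log_sum, n = 0.0, 0
    for words in sentences:
        for i, w in enumerate(words):
            log_sum -= math.log2(model[classify(tuple(words[:i]))][w])
            n += 1
    return log_sum / n


# Toy training text (made up for illustration).
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = estimate_lm(train)
entropy_bits = per_word_entropy(model, train)
```

Swapping in a different classify function changes the equivalence
classification $ without touching the estimation step, which is the
separation the paragraph describes.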

The talk will outline an approach to acoustic-sensitive language modeling.
It will show how to estimate the criterion H(W|A) from transcribed speech,
and how the classification $ may depend on linguistic analysis of the
history. It will also point out the very serious hurdles that stand in the
way of success.
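
One plausible way to estimate H(W|A) from transcribed speech is to average
-log P(w|a) over the transcribed tokens, with P(w|a) obtained by Bayes'
rule from the language model and the fixed acoustic model. The sketch
below makes several assumptions that are not from the abstract: the sum
over competing hypotheses is approximated by a short candidate list, the
function names are hypothetical, and the probabilities are invented toy
numbers.

```python
import math


def posterior(candidates, true_index):
    """P(w | a) of the transcribed word, by Bayes' rule over a candidate list.

    candidates: list of (language model probability, acoustic likelihood)
    pairs, one per competing word; true_index marks the transcribed word.
    """
    joint = [p_w * p_a for p_w, p_a in candidates]
    return joint[true_index] / sum(joint)


def conditional_entropy_estimate(samples):
    """Average of -log2 P(w_i | a_i) over all tokens, in bits per word."""
    total = sum(-math.log2(posterior(c, t)) for c, t in samples)
    return total / len(samples)


# Toy data (invented): each sample is (candidate list, index of true word).
samples = [
    ([(0.5, 0.5), (0.5, 0.5)], 0),    # posterior of true word is 0.5
    ([(0.25, 0.4), (0.75, 0.4)], 0),  # posterior of true word is 0.25
]
h_w_given_a = conditional_entropy_estimate(samples)
```

In the second sample the acoustics do not separate the two candidates, so
the posterior is decided entirely by the language model. That is the case
where the choice of $ matters most.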
