Putting Language (and speech) into Language Modeling – Frederick Jelinek (CLSP/Johns Hopkins University)
So far, speech recognition language models have been constructed to minimize the per-word entropy H(W) or, what is practically the same, to maximize the probability of the training text. However, it follows directly from Information Theory that to minimize the recognition error, a language model ought to minimize H(W|A), the expected uncertainty of the spoken text W given the observed acoustic sequence A.

In constructing such an acoustic-sensitive language model (ASLM), one could hold the recognizer's acoustic model P(A|W) fixed. In this way the ASLM could compensate for any weaknesses in the acoustic model.

Language models are based on probabilities P(w|$(h)), where h denotes the history (i.e., the hypothesized past word string) and $(h) is the equivalence class to which h belongs. Language modeling consists of determining the history equivalence classification $ and then estimating P(w|$(h)) from training data. It seems intuitively obvious that $ should be chosen to help the recognizer discriminate between similar-sounding words. The resulting ASLM can be relatively indifferent to words that are easily distinguished by acoustics.

The talk will outline an approach to acoustic-sensitive language modeling. It will show how to estimate the criterion H(W|A) from transcribed speech, and how the classification $ may depend on linguistic analysis of the history. It will also show which very serious hurdles stand in the way of success.
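As a rough illustration of the criterion, and not the estimation procedure of the talk, H(W|A) can be approximated on transcribed speech by averaging -log2 P(W|A) of the reference transcript, with P(W|A) obtained by Bayes' rule from a fixed acoustic score P(A|W) and the language model P(W). The sketch below assumes that each utterance comes with a short list of competing hypotheses that contains the reference and that the posterior is normalized over that list only. The function names and the toy data layout are hypothetical.

```python
import math


def posterior(hypotheses, lm_logprob):
    """Return P(W|A) for each hypothesis, normalized over the list.

    hypotheses: list of (word_string, acoustic_logprob), where
                acoustic_logprob is ln P(A|W) from a fixed acoustic model.
    lm_logprob: function mapping a word string to ln P(W).
    """
    # Joint log score ln P(A|W) + ln P(W) for every hypothesis.
    scores = {w: ac + lm_logprob(w) for w, ac in hypotheses}
    # Log-sum-exp normalization for numerical stability.
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {w: math.exp(s - log_z) for w, s in scores.items()}


def conditional_entropy_estimate(utterances, lm_logprob):
    """Average -log2 P(W_ref|A) in bits per reference word.

    utterances: list of (reference, hypotheses); the reference string
                is assumed to appear among the hypotheses.
    """
    total_bits = 0.0
    total_words = 0
    for reference, hypotheses in utterances:
        post = posterior(hypotheses, lm_logprob)
        total_bits += -math.log2(post[reference])
        total_words += len(reference.split())
    return total_bits / total_words
```

With two acoustically indistinguishable hypotheses and a language model that does not prefer either, the estimate for a two-word reference is 0.5 bits per word. A language model that favors the reference lowers the estimate, which is the sense in which an ASLM is rewarded for separating acoustically confusable word strings.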