Grammatical Trigrams – John Lafferty (Carnegie Mellon University)
Abstract
It is widely believed among speech and language researchers that incorporating linguistic information should improve statistical models of natural language and benefit applications such as speech recognition. This expectation has yet to be realized. In this talk I will discuss some previous attempts at building more effective language models and present some new ideas suggested by that past work. In particular, I will give an overview of current work at CMU to develop language modeling techniques that combine grammatical information with n-gram statistics. This work uses link grammar to extract structural information and exponential models to estimate probabilities. After introducing the relevant concepts, I will discuss recent work that makes this approach practical, including techniques that reduce the computational burden of parameter estimation and robust parsing algorithms that allow the approach to be applied to disfluent and ungrammatical speech.
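For concreteness, an exponential (log-linear) language model of the kind referred to here can be written in the generic form

P_\lambda(w \mid h) \;=\; \frac{1}{Z_\lambda(h)} \exp\!\Big(\sum_i \lambda_i f_i(h, w)\Big),
\qquad
Z_\lambda(h) \;=\; \sum_{w'} \exp\!\Big(\sum_i \lambda_i f_i(h, w')\Big),

where h is the conditioning history, the feature functions f_i may encode both n-gram context and grammatical structure (for example, links proposed by a link grammar parser), and the weights \lambda_i are estimated from data. This is only the standard general form; the particular feature set and estimation procedure used in this work are the subject of the talk.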