[Horizontal line]

Project: Phrase Structure Language Models


TEAM MEMBERS

Salim Roukos (IBM) (Project Leader)
David Harris (DOD)
Steve Lowe (Dragon)
Srinivasa Rao (IBM)
Eric Ristad (Princeton)
Xiaoqiang Luo (student assistant)

[Horizontal line]


TEAM GOALS


The goal is to develop language models that improve the accuracy of conversational speech recognition. We want to explore the use of phrase structure (possibly including syntactic and lexical information such as morphology and part-of-speech tags) to improve on the infamous trigram language model. Specifically, we would like to explore parsing-based models for predicting the next word.
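
For reference, the trigram baseline mentioned above predicts each word from only the two words before it. The following is a minimal sketch of that idea, not the team's implementation: the function names, the toy corpus, the sentence padding symbols, and the add-one smoothing are all illustrative assumptions.

```python
from collections import Counter

def train_trigram(sentences):
    """Count trigrams and their two-word contexts over tokenized sentences."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for words in sentences:
        # Pad so the first real word also has a two-word context (assumed convention).
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        vocab.update(padded[2:])  # predictable symbols: the words plus end-of-sentence
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            tri[context + (padded[i],)] += 1
            ctx[context] += 1
    return tri, ctx, vocab

def trigram_prob(word, prev2, prev1, tri, ctx, vocab):
    """Add-one smoothed estimate of P(word | prev2, prev1) (smoothing choice is an assumption)."""
    return (tri[(prev2, prev1, word)] + 1) / (ctx[(prev2, prev1)] + len(vocab))

# Hypothetical toy corpus, for illustration only.
corpus = [["i", "mean", "it"], ["i", "mean", "well"]]
tri, ctx, vocab = train_trigram(corpus)
p = trigram_prob("it", "i", "mean", tri, ctx, vocab)
```

A parsing-based model would replace the fixed two-word context with information drawn from a partial parse of the sentence so far; the sketch only shows the baseline it is meant to improve on.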

We expect to use the various available treebanks (Wall Street Journal, Brown Corpus) for written text, but we need a treebank for conversational speech. Specifically, we want one million words of Switchboard marked for disfluency and surface structure similar to the WSJ Treebank.

[Horizontal line]

Part of Speech LM
[Horizontal line]