Phrase Structure Language Models

Research Group of the 1995 Summer Workshop

The goal is to develop language models that improve the accuracy of conversational speech recognition. We want to explore the use of phrase structure (possibly including syntactic and lexical information such as morphology and part-of-speech tags) to improve on the infamous trigram language model. Specifically, we would like to explore parsing-based models for predicting the next word.
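For readers unfamiliar with the baseline, the trigram model predicts each word from the two words before it. The following is only a minimal sketch of that idea, not the workshop's system: the function names, the toy sentences, the `<s>` and `</s>` padding symbols, and the unsmoothed maximum-likelihood estimate are all assumptions made for illustration.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word contexts over tokenized sentences."""
    trigram_counts = Counter()
    context_counts = Counter()
    for words in sentences:
        # Pad so the first real word also has a two-word context.
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            trigram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    return trigram_counts, context_counts


def trigram_prob(trigram_counts, context_counts, w1, w2, w3):
    """Unsmoothed estimate of P(w3 | w1, w2); 0.0 if the context was never seen."""
    total = context_counts[(w1, w2)]
    return trigram_counts[(w1, w2, w3)] / total if total else 0.0


# Hypothetical toy data, standing in for a real training corpus.
toy = [["i", "think", "so"], ["i", "think", "not"], ["i", "know", "so"]]
tri, ctx = train_trigram(toy)
p_so = trigram_prob(tri, ctx, "i", "think", "so")
```

Because such a model conditions only on the two preceding words, it cannot use structure that spans a longer stretch of the sentence, which is the limitation a parsing-based predictor is meant to address.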

We expect to use the available treebanks for written text (Wall Street Journal, Brown Corpus), but we also need a treebank for conversational speech. Specifically, we want one million words of Switchboard annotated for disfluencies and for surface structure, in a style similar to the WSJ Treebank.

Team Members
Senior Members
David Harris	DOD
Steve Lowe	Dragon
Srinivasa Rao	IBM
Eric Ristad	Princeton
Salim Roukos	IBM
Graduate Students
Xiaoqiang Luo	CLSP


Center for Language and Speech Processing