Phrase Structure Language Models

The goal is to develop language models that improve the accuracy of conversational speech recognition. We want to explore the use of phrase structure (possibly including syntactic lexical information such as morphology and part-of-speech tags) to improve on the infamous trigram language model. Specifically, we would like to explore parsing-based models for predicting the next word.

We expect to use the various available treebanks (Wall Street Journal, Brown Corpus) for written text, but we also need a treebank for conversational speech. Specifically, we want one million words of Switchboard annotated for disfluencies and for surface structure, in a style similar to the WSJ Treebank.

Team Members

Senior Members
David Harris, DOD
Steve Lowe, Dragon
Srinivasa Rao, IBM
Eric Ristad, Princeton
Salim Roukos, IBM

Graduate Students
Xiaoqiang Luo, CLSP

Center for Language and Speech Processing