A Simple, Corpus-Based Method for Finding Base Noun Phrases – Claire Cardie (Cornell University)

March 31, 1998 (all day)

Finding simple, non-recursive base noun phrases (base NPs) is an important subtask for many natural language processing applications. Previous empirical methods for base NP identification have been rather complex. This talk instead proposes a very simple algorithm, tailored to the relative simplicity of the task. In particular, the talk will present a corpus-based approach that finds base NPs by matching part-of-speech tag sequences. The training phase of the algorithm builds on two successful techniques. First, the base NP grammar is read from a "treebank" corpus (à la Charniak [1996]). Then, the grammar is improved by selecting the rules with high "benefit" scores (à la Brill [1993]). Using this simple algorithm with a naïve heuristic for matching rules, we achieve surprisingly high accuracy in an evaluation on the Wall Street Journal portion of the Penn Treebank.
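The abstract describes the training and matching steps only at a high level. What follows is a minimal Python sketch of those steps, not the talk's actual implementation, and it rests on three assumptions. First, training data is taken to be a list of POS-tag sequences, each paired with its gold base-NP spans. Second, the "benefit" score is simplified to correct brackets minus incorrect brackets over a single pass of a held-out pruning set, which may differ from the scoring used in the talk. Third, the "naïve heuristic" is taken to be longest match, scanning left to right. The function names (`extract_grammar`, `match_longest`, `prune`) and the toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int]                  # half-open [start, end) indices into a tag sequence
Rule = Tuple[str, ...]                  # a base-NP rule: a sequence of POS tags
Example = Tuple[List[str], List[Span]]  # (POS tags, gold base-NP spans)


def extract_grammar(corpus: Iterable[Example]) -> Counter:
    """Read the grammar off the treebank: each bracketed base NP's tag sequence is a rule."""
    rules: Counter = Counter()
    for tags, spans in corpus:
        for start, end in spans:
            rules[tuple(tags[start:end])] += 1
    return rules


def match_longest(tags: List[str], rules: Set[Rule]) -> List[Span]:
    """Assumed heuristic: scan left to right, bracketing the longest rule that matches."""
    max_len = max((len(r) for r in rules), default=0)
    spans: List[Span] = []
    i = 0
    while i < len(tags):
        for length in range(min(max_len, len(tags) - i), 0, -1):
            if tuple(tags[i:i + length]) in rules:
                spans.append((i, i + length))
                i += length
                break
        else:  # no rule starts at position i; advance one tag
            i += 1
    return spans


def prune(rules: Set[Rule], pruning_corpus: Iterable[Example], threshold: int = 0) -> Set[Rule]:
    """Single-pass pruning with a simplified (assumed) benefit score:
    +1 for each bracket a rule produces that matches the gold annotation, -1 otherwise."""
    benefit: Counter = Counter()
    for tags, gold in pruning_corpus:
        gold_spans = set(gold)
        for start, end in match_longest(tags, rules):
            benefit[tuple(tags[start:end])] += 1 if (start, end) in gold_spans else -1
    # Rules that never fired keep benefit 0 and therefore survive a threshold of 0.
    return {r for r in rules if benefit[r] >= threshold}


# Toy data, invented for illustration; the last training bracket is a deliberate annotation error.
train = [
    (["DT", "JJ", "NN", "VBD", "DT", "NN"], [(0, 3), (4, 6)]),
    (["NNP", "NNP", "VBZ", "PRP"], [(0, 2), (3, 4)]),
    (["NN", "VBZ"], [(0, 2)]),
]
held_out = [(["NN", "VBZ", "JJ"], [(0, 1)])]

grammar = prune(set(extract_grammar(train)), held_out)
print(sorted(grammar))
# → [('DT', 'JJ', 'NN'), ('DT', 'NN'), ('NNP', 'NNP'), ('PRP',)]
print(match_longest(["DT", "JJ", "NN", "VBD", "PRP"], grammar))
# → [(0, 3), (4, 5)]
```

Longest match keeps the matcher trivial. It needs no probabilities, only one set lookup per candidate span. The cost is that a long but wrong rule, such as the noisy `NN VBZ` above, can swallow a shorter correct bracket. Catching that kind of rule is exactly what the benefit-based pruning step is for.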

Johns Hopkins University

Johns Hopkins University, Whiting School of Engineering

Center for Language and Speech Processing
Hackerman 226
3400 North Charles Street, Baltimore, MD 21218-2680
