Learning Probabilistic and Lexicalized Grammars for Natural Language Processing – Rebecca Hwa (University of Maryland)

March 27, 2001 all-day

This talk addresses two questions: what are the properties of a good grammar representation for natural language processing applications, and how can such grammars be constructed automatically and efficiently? I shall begin by describing a formalism called Probabilistic Lexicalized Tree Insertion Grammars (PLTIGs), which has several linguistically motivated properties that are helpful for processing natural languages. Next, I shall present a learning algorithm that automatically induces PLTIGs from human-annotated text corpora. I have conducted empirical studies showing that a trained PLTIG compares favorably with other formalisms on several kinds of tasks. Finally, I shall discuss ways of making grammar induction more efficient. In particular, I want to reduce the dependency of the induction process on human-annotated training data. I will show that by applying a learning technique called sample selection to grammar induction, we can significantly decrease the number of training examples needed, and thereby reduce the human effort spent on annotating training data.
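As a rough illustration of the sample-selection idea mentioned above (not Hwa's actual algorithm), pool-based selection ranks unlabeled examples by some uncertainty measure and sends only the highest-ranked ones to the human annotator. The uncertainty function below, which simply treats longer sentences as harder to parse, is a hypothetical stand-in for a real score such as the entropy of the parser's distribution over parses:

```python
def select_samples(pool, uncertainty, budget):
    """Pick the `budget` unlabeled examples with the highest
    uncertainty score; only these would be hand-annotated."""
    ranked = sorted(pool, key=uncertainty, reverse=True)
    return ranked[:budget]

# Toy pool of unannotated sentences and a stand-in uncertainty
# measure (sentence length in words).
pool = [
    "a short one",
    "this sentence is considerably longer and harder",
    "medium length sentence here",
]
chosen = select_samples(pool, uncertainty=lambda s: len(s.split()), budget=1)
# `chosen` holds the single sentence judged most worth annotating.
```

In a real induction loop, the grammar would be retrained after each batch of newly annotated examples and the uncertainty scores recomputed, so annotation effort concentrates on the sentences the current grammar handles worst.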

Rebecca Hwa is currently a postdoctoral research fellow at the University of Maryland, College Park. Her research interests include natural language processing, machine learning, and human-computer interaction.

Center for Language and Speech Processing