Predicting sentence specificity, with applications to news summarization
Ani Nenkova, University of Pennsylvania
August 3, 2011
A well-written text contains a mix of general statements and sentences that provide specific details. Yet no current work in computational linguistics has addressed the task of predicting the level of specificity of a sentence. In this talk I will present the development and evaluation of an automatic classifier capable of identifying general and specific sentences in news articles. We show that it is feasible to use existing annotations of discourse relations as training data, and we validate the resulting classifier on sentences directly judged by multiple annotators. We also provide a task-based evaluation of the classifier on general and specific summaries written by people, and demonstrate that its predictions distinguish between the two types of human-authored summaries. In addition, we analyze the level of specific and general content in news documents and in their human and automatic summaries. We find that while human abstracts contain a more balanced mix of general and specific content, automatic summaries are overwhelmingly specific, and that too much specificity adversely affects the quality of a summary. The study of sentence specificity extends our prior work on text quality, which I will briefly review. This is joint work with my student Annie Louis.
Ani Nenkova is an assistant professor of computer and information science at the University of Pennsylvania. Her main areas of research are automatic summarization, discourse, and text quality. She obtained her PhD in computer science from Columbia University in 2006 and spent a year and a half as a postdoctoral fellow at Stanford University before joining Penn in fall 2007.