Query-focused Summarization Using Text-to-Text Generation: When Information Comes from Multilingual Sources
Kathy McKeown, Columbia University
March 8, 2011
The past five years have seen the emergence of robust, scalable natural language processing systems that can summarize and answer questions about online material. One key to the success of such systems is that they re-use text that appeared in the documents rather than generating new sentences from scratch. Re-using text is absolutely essential for the development of robust systems; full semantic interpretation of unrestricted text is beyond the state of the art. Better summaries and answers can be produced, however, if systems can generate new sentences from the input text, fusing relevant phrases and discarding irrelevant ones. When the underlying sources for summarization come from multiple languages, the need for text-to-text generation is even more pronounced.
In this talk I first present the concept of text-to-text generation, showing the different kinds of editing that can be done. I then show how it has been used in our research on summarization and open-ended question-answering. Because our sources include informal genres as well as formal genres and draw from English, Arabic and Chinese, editing is critical for improving the intelligibility of responses. In our systems, we exploit information available at question-answering time to edit sentences, removing redundant and irrelevant information and correcting errors in translated sentences. We also present new work on machine translation which uses information from multiple systems to post-edit the translations, again using text-to-text generation but within a TAG formalism.
Kathleen R. McKeown is the Henry and Gertrude Rothschild Professor of Computer Science at Columbia University. She served as Department Chair from 1998-2003. Her research interests include text summarization, natural language generation, multi-media explanation, digital libraries, concept-to-speech generation and natural language interfaces. McKeown received the Ph.D. in Computer Science from the University of Pennsylvania in 1982 and has been at Columbia since then. In 1985 she received a National Science Foundation Presidential Young Investigator Award, in 1991 she received a National Science Foundation Faculty Award for Women, in 1994 she was selected as a AAAI Fellow, and in 2003 she was elected as an ACM Fellow. McKeown is also quite active nationally. She is a board member of the Computing Research Association and serves as secretary of the board. She served as President of the Association for Computational Linguistics in 1992, Vice President in 1991, and Secretary-Treasurer for 1995-1997. She has served on the Executive Council of the Association for Artificial Intelligence and was co-program chair of their annual conference in 1991.