The Discourse Parsing and Summarization of Free Texts – Daniel Marcu (Information Sciences Institute and Department of Computer Science, University of Southern California)
Abstract
Researchers in natural language have repeatedly acknowledged that coherent texts are not just simple sequences of sentences. Rather, they are complex artifacts whose semantic units are connected by rhetorical, logical, argumentative, and cohesive relations. I present research in theoretical, empirical, and applied computational linguistics that aims at uncovering the constraints that characterize the abstract structure of well-formed texts, and at producing algorithms for the automatic derivation of these structures. I show how automatically constructed discourse structures are exploited in a text summarization system and discuss other open problems in text processing that can be properly addressed in a discourse-based framework.