Learning to Model Text Structure – Regina Barzilay (MIT)

April 25, 2006

Discourse models capture relations that hold across sentences in a document and are crucial in applications that must generate coherent text. Traditionally, rule-based approaches have been predominant in discourse research. However, these models are hard to incorporate as-is into modern systems: they rely on handcrafted rules that are valid only for limited domains, with no guarantee of scalability or portability. In this talk, I will present discourse models that can be learned effectively from a collection of unannotated texts. The key premise of our work is that the distribution of entities in coherent texts exhibits certain regularities. The models I will present operate over an automatically computed representation that reflects distributional, syntactic, and referential information about discourse entities. This representation allows us to induce the properties of coherent texts from a given corpus, without recourse to manual annotation or a predefined knowledge base. To conclude, I will show how these models can be effectively integrated into statistical generation and summarization systems. This is joint work with Mirella Lapata and Lillian Lee.
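To make the idea of an entity-based representation concrete, here is a minimal sketch (not the speakers' actual code) in the spirit of an entity grid: rows correspond to sentences, columns to discourse entities, and each cell records the entity's grammatical role in that sentence — `S` (subject), `O` (object), or `-` (absent). The toy input sentences and role annotations below are invented for illustration; a real system would derive them automatically via parsing and coreference resolution.

```python
# Hypothetical, hand-annotated input: one dict per sentence,
# mapping each discourse entity to its grammatical role there.
sentences = [
    {"Microsoft": "S", "suit": "O"},      # sentence 1
    {"government": "S", "Microsoft": "O"},  # sentence 2
    {"suit": "S"},                        # sentence 3
]

def build_entity_grid(sentences):
    """Build a sentence-by-entity grid of grammatical roles.

    Cells hold the role label, or "-" when the entity is absent.
    """
    entities = sorted({e for sent in sentences for e in sent})
    grid = [[sent.get(e, "-") for e in entities] for sent in sentences]
    return entities, grid

def role_transitions(grid, column):
    """Vertical role bigrams for one entity column, e.g. ("S", "O").

    Distributions over such transitions are the kind of regularity
    a coherence model can learn from unannotated text.
    """
    col = [row[column] for row in grid]
    return list(zip(col, col[1:]))

entities, grid = build_entity_grid(sentences)
```

Here `entities` is `["Microsoft", "government", "suit"]` and the first column's transitions are `[("S", "O"), ("O", "-")]` — a salient entity appearing as subject, then object, then dropping out.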

Center for Language and Speech Processing