Learning to Model Text Structure – Regina Barzilay (MIT)
Discourse models capture relations that hold across sentences in a document. These models are crucial in applications that must generate coherent text. Traditionally, rule-based approaches have been predominant in discourse research. However, such models are hard to incorporate as-is into modern systems: they rely on handcrafted rules that are valid only for limited domains, with no guarantee of scalability or portability. In this talk, I will present discourse models that can be effectively learned from a collection of unannotated texts. The key premise of our work is that the distribution of entities in coherent texts exhibits certain regularities. The models I will present operate over an automatically computed representation that reflects distributional, syntactic, and referential information about discourse entities. This representation allows us to induce the properties of coherent texts from a given corpus, without recourse to manual annotation or a predefined knowledge base. To conclude, I will show how these models can be effectively integrated into statistical generation and summarization systems. This is joint work with Mirella Lapata and Lillian Lee.
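The entity-based representation alluded to above can be illustrated with a minimal entity-grid sketch in the spirit of Barzilay and Lapata's model: rows are sentences, columns are discourse entities, and cells record each entity's grammatical role (S = subject, O = object, X = other, "-" = absent). The example document and its role annotations below are hand-made for illustration; in the actual model, roles are computed automatically by a parser together with coreference resolution.

```python
from collections import Counter

# Toy document: each sentence maps entity -> grammatical role.
# Hand-annotated here; the real model derives these automatically.
sentences = [
    {"Microsoft": "S", "suit": "O"},
    {"Microsoft": "S", "markets": "X"},
    {"suit": "S", "Microsoft": "O"},
]

# Build the grid: one row per sentence, one column per entity,
# with "-" marking entities absent from a sentence.
entities = sorted({e for s in sentences for e in s})
grid = [[s.get(e, "-") for e in entities] for s in sentences]

# Coherence features: probabilities of role transitions (e.g. S -> O)
# down each column -- the regularities the model learns from a corpus.
transitions = Counter()
for col in range(len(entities)):
    for row in range(len(sentences) - 1):
        transitions[(grid[row][col], grid[row + 1][col])] += 1

total = sum(transitions.values())
probs = {t: n / total for t, n in transitions.items()}
```

A coherent text tends to keep salient entities in prominent roles (e.g. frequent S-to-S transitions), so these transition probabilities serve as features for ranking candidate orderings in generation and summarization.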