Amir Zeldes (Georgetown University) “Feature Rich Models for Discourse Signaling”

February 23, 2018 @ 12:00 pm – 1:15 pm
Hackerman Hall B17
3400 N Charles St
Baltimore, MD 21218


Discourse relations such as ‘contrast’, ‘cause’ or ‘evidence’ are often postulated to explain how humans understand the function of one sentence in relation to another. Some relations are signaled fairly directly by words such as “because” or “on the other hand”, but signals are often highly ambiguous or implicit and cannot be tied to specific words. This raises questions about how exactly we recognize relations and what kinds of computational models we can build to account for them.

In this talk I will explore models that capture discourse signals in the framework of Rhetorical Structure Theory (Mann & Thompson 1988), using data from the RST Signaling Corpus (Taboada & Das 2013) and a richly annotated corpus called GUM (Zeldes 2017). Using manually annotated data that indicates the presence of lexical and implicit signals, I will show that purely text-based models using recurrent neural networks (RNNs) and word embeddings inevitably miss important aspects of discourse structure. I will argue that richly annotated data beyond the textual level, including syntactic and semantic information, is required to form a more complete picture of discourse relations in text.


Amir Zeldes is an assistant professor of Computational Linguistics at Georgetown University, specializing in Corpus Linguistics. He studied Cognitive Science, Linguistics and Computational Linguistics in Jerusalem, Potsdam and Berlin, and received his PhD in Linguistics from Humboldt University in 2012. His interests center on the syntax-semantics interface, where meaning and knowledge about the world are mapped onto language-specific choices. His most recent work focuses on computational discourse models that reflect common ground and communicative intent across sentences. He is also involved in developing tools for corpus search, annotation and visualization, and has worked on representations of textual data in Linguistics and the Digital Humanities.

Center for Language and Speech Processing