Graph Grammars and Automata for NLP – David Chiang (Notre Dame)

October 31, 2014 (all day)

All natural language processing systems implicitly or explicitly depend on some representation of linguistic objects and some framework for defining (probabilistic) transformations of those representations. Two such representations that have been extraordinarily successful are sequences (and finite-state machines over sequences) and trees (and finite-state machines over trees). For representing linguistic meaning, however, there is currently much interest in going beyond sequences or trees to graphs (for example, abstract meaning representations), and in this talk I will present some recent and current work on graphs, graph automata, and graph grammars.

Hyperedge replacement grammar (HRG) is a formalism for generating graphs whose kinship with context-free grammars immediately leads to some important results: a probabilistic version, a synchronous version, and efficient algorithms for searching or summing packed “forests.” However, until recently there was no practical algorithm for finding an HRG derivation of a given input graph. An algorithm due to Lautemann was known to run in polynomial time for graphs that are connected and of bounded degree. I will describe an optimization of this algorithm and some important implementation details, resulting in an algorithm that is practical for natural-language applications. This is joint work with Kevin Knight and the ISI summer interns of 2013.
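To make the CFG analogy concrete, here is a minimal sketch of the rewrite step itself, under an illustrative toy encoding (a hypergraph as a list of labeled edges, nonterminal edges rewritten by gluing in a rule's right-hand side at its external nodes); the names and encoding are ours, not from any particular HRG implementation.

```python
import itertools

NONTERMINALS = {"X"}
_fresh = itertools.count(1)  # supply of fresh node names

def replace(graph, rule):
    """Rewrite the first nonterminal hyperedge whose label matches the
    rule's left-hand side: the rule's RHS subgraph is glued in, with its
    external nodes identified with the edge's attachment nodes (the
    hyperedge-replacement analogue of expanding a CFG nonterminal)."""
    lhs_label, external, rhs = rule
    for i, (label, nodes) in enumerate(graph):
        if label == lhs_label and label in NONTERMINALS:
            mapping = dict(zip(external, nodes))
            def rename(n):
                if n not in mapping:           # internal RHS node:
                    mapping[n] = f"_{next(_fresh)}"  # give it a fresh name
                return mapping[n]
            new_edges = [(lab, tuple(rename(n) for n in ns))
                         for lab, ns in rhs]
            return graph[:i] + new_edges + graph[i + 1:]
    return graph  # no nonterminal edge left to rewrite

# Rule 1: X(a,b) -> an "e" edge from a to a new node m, then X(m,b)
rule_grow = ("X", ("a", "b"), [("e", ("a", "m")), ("X", ("m", "b"))])
# Rule 2: X(a,b) -> a single terminal "e" edge from a to b
rule_stop = ("X", ("a", "b"), [("e", ("a", "b"))])

g = [("X", ("s", "t"))]
g = replace(g, rule_grow)  # grow the chain by one edge
g = replace(g, rule_stop)  # close off the remaining nonterminal
print(g)  # a two-edge path from s to t
```

Because each derivation step is local to one nonterminal edge, derivations can be packed and scored exactly as CFG forests are, which is what makes the probabilistic and synchronous extensions carry over.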

DAG acceptors are another formalism, much simpler than HRG. They were recently extended by Quernheim and Knight, but still lacked efficient algorithms for searching or summing over the derivations of an input graph, as well as a well-defined probability model. I will present solutions to both of these problems and discuss those that remain. This is joint work with Frank Drewes, Dan Gildea, Adam Lopez, Giorgio Satta, and other participants in the 2014 JHU summer workshop.
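As a rough illustration of why DAG acceptors are so much simpler than HRG, here is a hedged sketch of a deterministic bottom-up run: like a tree automaton, each node is assigned one state from its label and its children's states, but shared (reentrant) subgraphs are evaluated only once. The encoding and rule format are illustrative assumptions, not Quernheim and Knight's exact definition.

```python
def run(dag, rules, final):
    """dag: {node: (label, [children])};
    rules: {(label, tuple_of_child_states): state}.
    Returns True iff every root (node with no parent) reaches a final state."""
    state = {}
    def visit(n):
        if n not in state:  # memoized: shared nodes are visited once
            label, children = dag[n]
            kids = tuple(visit(c) for c in children)
            state[n] = rules[(label, kids)]
        return state[n]
    children_of = {c for _, cs in dag.values() for c in cs}
    roots = [n for n in dag if n not in children_of]
    return all(visit(r) in final for r in roots)

# Tiny example: "want(boy, go(boy))" as a DAG in which the "boy" node
# is shared, the kind of reentrancy an AMR-style meaning graph exhibits.
dag = {
    "w": ("want", ["b", "g"]),
    "g": ("go", ["b"]),
    "b": ("boy", []),
}
rules = {
    ("boy", ()): "ENT",
    ("go", ("ENT",)): "EVT",
    ("want", ("ENT", "EVT")): "EVT",
}
print(run(dag, rules, {"EVT"}))  # accepted
```

The nondeterministic case, where a shared node must carry one state consistent with all of its parents, is exactly where searching and summing over runs becomes nontrivial.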

David Chiang (PhD, University of Pennsylvania, 2004) is an associate professor in the Department of Computer Science and Engineering at the University of Notre Dame. His research is on computational models for learning human languages, particularly how to translate from one language to another. His work on applying formal grammars and machine learning to translation has been recognized with two best paper awards (at ACL 2005 and NAACL HLT 2009) and has transformed the field of machine translation. He has received research grants from DARPA, NSF, and Google, has served on the executive board of NAACL and the editorial board of Computational Linguistics and JAIR, and is currently on the editorial board of Transactions of the ACL.

Johns Hopkins University, Whiting School of Engineering

Center for Language and Speech Processing
Hackerman 226
3400 North Charles Street, Baltimore, MD 21218-2680