Modelling human syntax by means of probabilistic dependency grammars
Matthias Buch-Kromann, Center for Computational Modelling of Language, Department of Computational Linguistics, Copenhagen Business School
October 23, 2007
Probabilistic dependency grammars have played an important role in computational linguistics since they were introduced by Collins (1996) and Eisner (1996). In most computational formulations, a dependency grammar can be viewed as a projective context-free grammar in which all phrases have a lexical head. However, there are many linguistic phenomena that a context-free dependency grammar cannot properly account for, such as non-projective word order (in topicalizations, scramblings, and extrapositions), secondary dependencies (in complex VPs, control constructions, relative clauses, elliptic coordinations and parasitic gaps), and punctuation (which is highly context-sensitive). In the talk, I will present a generative dependency model that can account for these phenomena and others. Although exact probabilistic parsing is NP-hard in this model, heuristic parsing need not be, and I will briefly describe a family of error-driven incremental parsing algorithms with repair that have time complexity O(n log^k(n)) given realistic assumptions about island constraints. In this parsing framework, the dependency model must assign probabilities to partial dependency analyses. I will show one way of doing this and outline how it introduces the need for adding time-dependence into the model in order to support the left-to-right incremental processing of the text.
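To make the notion of projectivity concrete: a dependency tree is projective exactly when, drawn above the sentence, no two dependency arcs cross. The sketch below is a standard textbook-style check, not part of the model presented in the talk; the `heads` encoding (word index to head index, -1 for the root) is an assumption for illustration.

```python
def is_projective(heads):
    """Check whether a dependency analysis is projective.

    `heads[i]` is the index of word i's head, or -1 if word i is the root.
    The tree is projective iff no two dependency arcs cross.  An
    artificial root node placed before the sentence (at position -1)
    governs the root word, so a root buried inside another arc's span
    is also detected as non-projective.
    """
    arcs = [(min(i, h), max(i, h)) if h >= 0 else (-1, i)
            for i, h in enumerate(heads)]
    for a1, b1 in arcs:
        for a2, b2 in arcs:
            # Arcs (a1, b1) and (a2, b2) cross when exactly one
            # endpoint of the second lies strictly inside the first.
            if a1 < a2 < b1 < b2:
                return False
    return True

# "John saw Mary", heads = [1, -1, 1]: projective.
print(is_projective([1, -1, 1]))      # True
# A wh-extraction analysis of "What did you see" with What -> see,
# did as root, you -> did, see -> did: the arc What-see spans the
# root, so the analysis is non-projective.
print(is_projective([3, -1, 1, 1]))   # False
```

Topicalizations, scramblings, and extrapositions of the kind mentioned above all produce such crossing arcs, which is why a projective (context-free) dependency grammar cannot represent them directly.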
Matthias Buch-Kromann is head of the Computational Linguistics Group at the Copenhagen Business School (CBS). He is also a member of the Center for Computational Modelling of Language and the Center for Research in Translation and Translation Technology at CBS. His current research interests include dependency treebanks, probabilistic dependency models of texts and translations, and computational models of human parsing and translation. His dr.ling.merc. dissertation (habilitation) from 2006 proposes a dependency-based model of human parsing and language learning. He has been the driving force behind the 100,000-word Danish Dependency Treebank (used in the CoNLL 2006 shared task) and the Copenhagen Danish-English Parallel Dependency Treebank.