Multilingual Dependency Parsing with Spanning Tree Algorithms – Ryan McDonald (University of Pennsylvania)

February 21, 2006

A dependency representation of a sentence identifies, for each word, all of its modifying arguments. Dependency parsing is an important problem because such representations have proven useful in machine translation, information extraction and many other common NLP tasks. In this talk I will present some recent work on dependency parsing. The models I describe are based on two key components. The first is an aggressive edge-based factorization that allows maximum spanning tree algorithms to be used during inference. This is advantageous since it enables parsing of both projective and non-projective languages, e.g., free-word-order languages like Dutch, German and Czech. The second component is a large-margin discriminative training model that can use rich feature sets to overcome these rather weak factorization assumptions. I will present a number of experiments, including parsing results for 14 languages using a single model, as well as experiments on using the parser for sentence compression.

Center for Language and Speech Processing