Coarse-to-Fine Models for Natural Language Processing – Dan Klein (University of California, Berkeley)
State-of-the-art NLP models are anything but compact. Parsers have huge grammars, machine translation systems have huge transfer tables, and so on across a range of tasks. With such complexity come two challenges. First, how can we learn highly complex models? Second, how can we efficiently infer optimal structures within them?
Hierarchical coarse-to-fine (CTF) methods address both questions. CTF approaches exploit sequences of models that introduce complexity gradually. At the top of the sequence is a trivial model in which learning and inference are both cheap. Each subsequent model refines the previous one, until a final, full-complexity model is reached. Because each refinement introduces only limited complexity, both learning and inference can be done incrementally. In this talk, I describe several coarse-to-fine NLP systems.
In the domain of syntactic parsing, complexity comes from the grammar. I present a latent-variable approach that begins with an X-bar grammar and learns by iteratively splitting grammar symbols. For example, noun phrases might be split into subjects and objects, singular and plural, and so on. This splitting process admits an efficient incremental inference scheme that reduces parsing times by orders of magnitude. I also present a multiscale variant that splits grammar rules rather than grammar symbols. In the multiscale approach, complexity need not be uniform across the entire grammar, which yields space savings of orders of magnitude. These approaches produce the best parsing accuracies in a variety of languages, in a fully language-general fashion.
In the domain of syntactic machine translation, complexity arises from both the translation model and the language model. In short, there are too many transfer rules and too many target-language word types. To manage the translation model, we compute minimizations that drop rules with high computational cost but low importance.
To manage the language model, we translate into target-language clusterings of increasing vocabulary size. These approaches give dramatic speed-ups while actually improving final translation quality.
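The symbol-splitting hierarchy and incremental inference scheme from the parsing paragraph can be sketched in a toy setting. The Python below is a minimal, hypothetical illustration, not the speaker's implementation. A split step turns each symbol X into X-0 and X-1, so every binary rule A → B C becomes eight rules with lightly perturbed probabilities. A pruning step then uses the projection from refined symbols back to their parents to keep only those refined chart items whose coarse projection had a high enough posterior in the previous pass. The rule representation, the function names `split_grammar` and `prune_refined_items`, the noise level, the threshold, and the posterior values are all assumptions made for illustration. The EM re-estimation that would follow each split, merging, lexical rules, and the inside-outside pass that would actually produce coarse posteriors are omitted.

```python
import random
from collections import defaultdict

# Toy, hypothetical representation: a grammar is a dict mapping binary
# rules (parent, left, right) to probabilities. Lexical rules, EM
# re-estimation after each split, and merging are omitted.


def split_grammar(rules, noise=0.01, seed=0):
    """Split every symbol X into X-0 and X-1.

    Each rule A -> B C becomes 8 refined rules. For each refined parent, the
    original mass is divided over the 4 child combinations and perturbed
    slightly to break symmetry, then renormalized so each refined parent
    keeps its original parent's total mass. Returns (refined_rules,
    projection), where projection maps each refined symbol to the symbol it
    was split from.
    """
    rng = random.Random(seed)
    refined = {}
    projection = {}
    old_total = defaultdict(float)  # rule mass per original parent
    new_total = defaultdict(float)  # rule mass per refined parent
    for (a, b, c), p in rules.items():
        old_total[a] += p
        for a2 in (f"{a}-0", f"{a}-1"):
            projection[a2] = a
            for b2 in (f"{b}-0", f"{b}-1"):
                projection[b2] = b
                for c2 in (f"{c}-0", f"{c}-1"):
                    projection[c2] = c
                    q = (p / 4) * (1 + rng.uniform(-noise, noise))
                    refined[(a2, b2, c2)] = q
                    new_total[a2] += q
    refined = {
        rule: q * old_total[projection[rule[0]]] / new_total[rule[0]]
        for rule, q in refined.items()
    }
    return refined, projection


def prune_refined_items(coarse_posteriors, projection, threshold=1e-4):
    """Keep a refined chart item (span, X-i) only if its coarse projection
    (span, X) reached the posterior threshold in the previous pass."""
    subsymbols = defaultdict(list)
    for fine, coarse in projection.items():
        subsymbols[coarse].append(fine)
    return {
        (span, fine)
        for (span, coarse), posterior in coarse_posteriors.items()
        if posterior >= threshold
        for fine in subsymbols[coarse]
    }


grammar = {("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 1.0, ("VP", "VB", "NP"): 1.0}
level1, proj1 = split_grammar(grammar)
level2, proj2 = split_grammar(level1, seed=1)
print(len(level1), len(level2))  # 24 192

# Hypothetical posteriors from a coarse (level-0) pass over span (0, 2).
coarse_posteriors = {((0, 2), "NP"): 0.9, ((0, 2), "VP"): 1e-5}
print(sorted(prune_refined_items(coarse_posteriors, proj1)))
# [((0, 2), 'NP-0'), ((0, 2), 'NP-1')]
```

Each call to `split_grammar` yields the next, finer grammar in the hierarchy together with its projection map, so chaining calls produces a sequence of models of the kind the talk describes. At parse time, posteriors computed with the level-k grammar would gate which items are built with the level-(k+1) grammar, which is where the savings in this sketch would come from.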
Dan Klein is an assistant professor of computer science at the University of California, Berkeley (PhD Stanford, MSt Oxford, BA Cornell). His research focuses on statistical natural language processing, including unsupervised learning methods, syntactic parsing, information extraction, and machine translation. Academic honors include a Marshall Fellowship, a Microsoft New Faculty Fellowship, the ACM Grace Murray Hopper award, and best paper awards at the ACL, NAACL, and EMNLP conferences.