Models of Synchronous Grammar Induction for SMT

The last decade of research in Statistical Machine Translation (SMT) has seen rapid progress. The most successful methods have been based on synchronous context-free grammars (SCFGs), which encode translational equivalences and license reordering between tokens in the source and target languages. Yet while closely related language pairs can now be translated with a high degree of precision, results for distant pairs remain far from acceptable. In theory, however, the “right” SCFG is capable of handling most, if not all, structurally divergent language pairs. We therefore propose to focus on the crucial practical aspects of acquiring such SCFGs from bilingual text. We will take the pragmatic approach of starting with existing algorithms for inducing unlabelled SCFGs (e.g. the popular Hiero model), and then using state-of-the-art hierarchical non-parametric Bayesian methods to iteratively refine the syntactic constituents used in the translation rules of the grammar. The aim is to approach, in an unsupervised manner, the quality of SCFGs learned from massive quantities of manually “tree-banked” parallel text.

Abstract
Final Presentation: First Session | Second Session
Final Presentation Video

Team Members
Senior Members
Phil Blunsom, University of Oxford
Trevor Cohn, University of Sheffield
Chris Dyer, University of Maryland
Jonathan Graehl, USC/ISI
Adam Lopez, University of Edinburgh
Graduate Students
Ziyuan Wang, CLSP
Jan Botha, University of Oxford
Vladimir Eidelman, University of Maryland
ThuyLinh Nguyen, Carnegie Mellon University
Undergraduate Students
Olivia Buzek, University of Maryland
Desai Chen, Carnegie Mellon University

Center for Language and Speech Processing