Models of Synchronous Grammar Induction for SMT

The last decade of research in Statistical Machine Translation (SMT) has seen rapid progress. The most successful methods have been based on synchronous context-free grammars (SCFGs), which encode translational equivalences and license reordering between tokens in the source and target languages. Yet while closely related language pairs can now be translated with a high degree of precision, the results for distant pairs remain far from acceptable. In theory, however, the "right" SCFG is capable of handling most, if not all, structurally divergent language pairs. We therefore propose to focus on the crucial practical problem of acquiring such SCFGs from bilingual text. We will take the pragmatic approach of starting with existing algorithms for inducing unlabelled SCFGs (e.g. the popular Hiero model), and then using state-of-the-art hierarchical non-parametric Bayesian methods to iteratively refine the syntactic constituents used in the translation rules of the grammar. The aim is to approach, in an unsupervised manner, the quality of SCFGs learned from massive quantities of manually "tree-banked" parallel text.
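To make "license reordering" concrete, the sketch below applies a single synchronous rule whose co-indexed nonterminals X1 and X2 appear in opposite orders on the source and target sides. It is a minimal illustration only. The `Rule` class, the `apply_rule` helper and the Chinese-English possessive rule are hypothetical examples in the unlabelled, Hiero-style format (one nonterminal label, X). They are not taken from the project's grammars or code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rule:
    """One synchronous rule: a shared left-hand side with paired right-hand sides.

    Tokens "X1" and "X2" are co-indexed nonterminal slots (hypothetical encoding).
    """
    lhs: str
    source: List[str]
    target: List[str]


def apply_rule(rule: Rule,
               fillers: Dict[str, Tuple[List[str], List[str]]]) -> Tuple[str, str]:
    """Substitute a (source, target) sub-translation into each co-indexed slot."""
    src: List[str] = []
    tgt: List[str] = []
    for tok in rule.source:
        # A slot expands to its source-side filler; a terminal is copied as-is.
        src.extend(fillers[tok][0] if tok in fillers else [tok])
    for tok in rule.target:
        # The same slot expands to its target-side filler, possibly in a new position.
        tgt.extend(fillers[tok][1] if tok in fillers else [tok])
    return " ".join(src), " ".join(tgt)


# Illustrative rule: X -> < X1 的 X2 , the X2 of X1 >; the two slots swap order.
rule = Rule("X", ["X1", "的", "X2"], ["the", "X2", "of", "X1"])
fillers = {"X1": (["中国"], ["China"]), "X2": (["首都"], ["capital"])}
print(apply_rule(rule, fillers))  # ('中国 的 首都', 'the capital of China')
```

Because both sides share the same co-indexed slots, one rule captures a reordering pattern independently of the words that fill it. Refining the single label X into finer syntactic categories, as proposed above, would restrict which sub-translations may fill each slot.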



Team Members
Senior Members
Phil Blunsom, University of Oxford
Trevor Cohn, University of Sheffield
Chris Dyer, University of Maryland
Jonathan Graehl, USC/ISI
Adam Lopez, University of Edinburgh
Graduate Students
Ziyuan Wang, CLSP
Jan Botha, University of Oxford
Vladimir Eidelman, University of Maryland
ThuyLinh Nguyen, Carnegie Mellon University
Undergraduate Students
Olivia Buzek, University of Maryland
Desai Chen, Carnegie Mellon University

Center for Language and Speech Processing