Abstract
The last decade of research in Statistical Machine Translation (SMT) has seen rapid progress. The most successful methods have been based on synchronous context-free grammars (SCFGs), which encode translational equivalences and license reordering between tokens in the source and target languages. Yet while closely related language pairs can now be translated with a high degree of precision, the results for distant pairs are far from acceptable. In theory, however, the “right” SCFG is capable of handling most, if not all, structurally divergent language pairs. We therefore propose to focus on the crucial practical aspects of acquiring such SCFGs from bilingual text. We will take the pragmatic approach of starting with existing algorithms for inducing unlabelled SCFGs (e.g. the popular Hiero model) and then using state-of-the-art hierarchical non-parametric Bayesian methods to iteratively refine the syntactic constituents used in the grammar's translation rules. The aim is to approach, in an unsupervised manner, the quality of SCFGs learned from massive quantities of manually “tree-banked” parallel text.
Team Members | |
---|---|
Senior Members | |
Phil Blunsom | University of Oxford |
Trevor Cohn | University of Sheffield |
Chris Dyer | University of Maryland |
Jonathan Graehl | USC/ISI |
Adam Lopez | University of Edinburgh |
Graduate Students | |
Ziyuan Wang | CLSP |
Jan Botha | University of Oxford |
Vladimir Eidelman | University of Maryland |
ThuyLinh Nguyen | Carnegie Mellon University |
Undergraduate Students | |
Olivia Buzek | University of Maryland |
Desai Chen | Carnegie Mellon University |