Models of Synchronous Grammar Induction for SMT

Research Group of the 2010 Summer Workshop

The last decade of research in Statistical Machine Translation (SMT) has seen rapid progress. The most successful methods have been based on synchronous context-free grammars (SCFGs), which encode translational equivalences and license reordering between tokens in the source and target languages. Yet, while closely related language pairs can now be translated with a high degree of precision, results for distant pairs remain far from acceptable. In theory, however, the “right” SCFG is capable of handling most, if not all, structurally divergent language pairs. We therefore propose to focus on the crucial practical aspects of acquiring such SCFGs from bilingual text. We will take the pragmatic approach of starting with existing algorithms for inducing unlabelled SCFGs (e.g. the popular Hiero model) and then using state-of-the-art hierarchical non-parametric Bayesian methods to iteratively refine the syntactic constituents used in the translation rules of the grammar. The hope is to approach, in an unsupervised manner, SCFGs learned from massive quantities of manually “tree-banked” parallel text.

Abstract
Final Presentation: First Session | Second Session
Final Presentation Video

Team Members
Senior Members
Phil Blunsom	University of Oxford
Trevor Cohn	University of Sheffield
Chris Dyer	University of Maryland
Jonathan Graehl	USC/ISI
Adam Lopez	University of Edinburgh
Graduate Students
Ziyuan Wang	CLSP
Jan Botha	University of Oxford
Vladimir Eidelman	University of Maryland
ThuyLinh Nguyen	Carnegie Mellon University
Undergraduate Students
Olivia Buzek	University of Maryland
Desai Chen	Carnegie Mellon University

Center for Language and Speech Processing