Beyond Left-to-Right: Multiple Decomposition Structures for SMT – Kristina Toutanova (Microsoft Research)

When:
April 25, 2014 all-day
Where:
3400 N Charles St
Baltimore, MD 21218
USA

Abstract
Standard phrase-based translation models do not explicitly model context dependence between translation units. As a result, they rely on large phrase pairs and target language models to recover contextual effects in translation. In this work, we explore n-gram models over Minimal Translation Units (MTUs) to explicitly capture contextual dependencies across phrase boundaries. We examine the independence assumptions entailed by the direction of the n-gram decomposition order, and explore multiple static alternatives to the standard left-to-right decomposition. Additionally, we implement and test a dynamic bidirectional decomposition order, in which each translation unit can select its most predictive context. The resulting models are evaluated both on an intrinsic lexical selection task for MT and within a full MT system via n-best reranking. These experiments demonstrate that additional contextual modeling does indeed benefit a phrase-based system and that the direction of conditioning is important. Integrating multiple conditioning orders provides consistent benefit, and the most important directions differ by language pair.
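As a rough illustration of what "decomposition order" means, the Python sketch below factors a sequence of MTUs with the chain rule either left-to-right or right-to-left and scores it with a bigram model. It is not the authors' implementation. The "source|target" string encoding of MTUs, the toy corpus, and add-one smoothing are all assumptions made for this example, and the dynamic bidirectional order is not shown.

```python
from collections import Counter
from math import exp, log

BOS, EOS = "<s>", "</s>"  # boundary markers (under r2l, BOS marks the right edge)


def factors(mtus, n=2, direction="l2r"):
    """Chain-rule factorization of an MTU sequence into (context, unit) pairs.

    "l2r" conditions each unit on the n-1 units to its left;
    "r2l" conditions each unit on the n-1 units to its right.
    """
    seq = list(mtus) if direction == "l2r" else list(reversed(mtus))
    padded = [BOS] * (n - 1) + seq + [EOS]
    return [(tuple(padded[i - n + 1:i]), padded[i]) for i in range(n - 1, len(padded))]


def train(corpus, n=2, direction="l2r"):
    """Count context and (context, unit) events for one decomposition order."""
    ctx_counts, pair_counts, vocab = Counter(), Counter(), set()
    for mtus in corpus:
        for ctx, unit in factors(mtus, n, direction):
            ctx_counts[ctx] += 1
            pair_counts[(ctx, unit)] += 1
            vocab.add(unit)
    return ctx_counts, pair_counts, vocab


def log_prob(model, mtus, n=2, direction="l2r"):
    """Add-one smoothed log-probability of an MTU sequence under one order."""
    ctx_counts, pair_counts, vocab = model
    V = len(vocab) + 1  # +1 reserves probability mass for unseen units
    total = 0.0
    for ctx, unit in factors(mtus, n, direction):
        total += log((pair_counts[(ctx, unit)] + 1) / (ctx_counts[ctx] + V))
    return total


# Hypothetical toy corpus of German-English MTU sequences.
corpus = [["das|the", "Haus|house"], ["das|the", "Auto|car"]]
for direction in ("l2r", "r2l"):
    model = train(corpus, direction=direction)
    print(direction, factors(corpus[0], direction=direction))
    print(direction, exp(log_prob(model, corpus[0], direction=direction)))
```

Under the left-to-right order each MTU is predicted from the unit before it, and under the right-to-left order from the unit after it, so the two models make different independence assumptions. On this tiny, symmetric corpus they happen to assign the same probability to the first sequence. Combining several such orders is the kind of integration the abstract refers to.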

Joint work with Hui Zhang, Chris Quirk, and Jianfeng Gao
Biography
Kristina Toutanova is a researcher at Microsoft Research, Redmond, and an affiliate assistant professor at the University of Washington. She obtained her Ph.D. from the Computer Science Department at Stanford University with Christopher Manning. She has been active in research on modeling the structure of natural language using machine learning, especially in the areas of machine translation, syntactic and semantic parsing, and morphological analysis. She is a Program Co-chair for ACL 2014, a member of the Computational Linguistics editorial board, and an action editor for TACL.

Center for Language and Speech Processing