Statistical Machine Translation by Parsing

Machine translation (MT) is more important than ever. The quality of MT output has increased substantially in recent years, owing to more sophisticated use of statistical learning methods and objective evaluation methods. However, statistical MT (SMT) systems often generate “word salad”: output that contains many correct words but in the wrong order, which makes it hard to understand. We propose to investigate a new approach to SMT that, in contrast to other syntax-based approaches, has models of word order at its core. Models that integrate word order more directly promise to greatly improve the readability of translations. Our research will focus simultaneously on two language pairs, English/French and English/Arabic, to demonstrate the generality of the approach. In addition to improved MT, the goals of the workshop include training students to contribute to MT and NLP research for years to come and producing a complete, easy-to-use reference implementation for worldwide distribution.

Final Presentation
The GenPar Toolkit for generalized parsing
MTV, a Multitree Viewer
Final Report

 

Team Members
Senior Members
Stephen Clark, Oxford University
Keith Hall, Johns Hopkins University
Mary Hearne, Dublin City University
Dan Melamed, New York University
Andy Way, Dublin City University
Dekai Wu, Hong Kong University of Science and Technology
Graduate Students
Marine Carpuat, Hong Kong University of Science and Technology
Markus Dreyer, Johns Hopkins University
Declan Groves, Dublin City University
Yihai Shen, Hong Kong University of Science and Technology
Ben Wellington, New York University
Undergraduate Students
Andrea Burbank, Stanford University
Pamela Fox, University of Southern California

Center for Language and Speech Processing