Statistical Machine Translation by Parsing

Research Group of the 2005 Summer Workshop

Machine translation (MT) is more important than ever. The quality of MT output has improved substantially in recent years, owing to more sophisticated use of statistical learning methods and of objective evaluation methods. However, statistical MT (SMT) systems often generate “word salad”: output that may contain many correct words, but in the wrong order, which makes it hard to understand. We propose to investigate a new approach to SMT that, in contrast to other syntax-based approaches, has models of word order at its core. Models that integrate word order more directly promise to greatly improve the readability of translations. Our research will focus simultaneously on two language pairs, English/French and English/Arabic, to demonstrate the generality of the approach. In addition to improved MT, the goals of the workshop include training students to contribute to MT and NLP research for years to come and producing a complete, easy-to-use reference implementation for worldwide distribution.

Final Presentation
The GenPar Toolkit for generalized parsing
MTV, a Multitree Viewer
Final Report

Team Members
Senior Members
Stephen Clark	Oxford University
Keith Hall	Johns Hopkins University
Mary Hearne	Dublin City University
Dan Melamed	New York University
Andy Way	Dublin City University
Dekai Wu	Hong Kong University of Science and Technology
Graduate Students
Marine Carpuat	Hong Kong University of Science and Technology
Markus Dreyer	Johns Hopkins University
Declan Groves	Dublin City University
Yihai Shen	Hong Kong University of Science and Technology
Ben Wellington	New York University
Undergraduate Students
Andrea Burbank	Stanford University
Pamela Fox	University of Southern California

Center for Language and Speech Processing