Machine translation (MT) is more important than ever. The quality of MT output has increased substantially in recent years, thanks to more sophisticated use of statistical learning methods and objective evaluation methods. However, statistical MT (SMT) systems often generate “word salad”: output that contains many correct words but in the wrong order, making it hard to understand. We propose to investigate a new approach to SMT that, in contrast to other syntax-based approaches, has models of word order at its core. Models that integrate word order more directly promise to greatly improve the readability of translations. Our research will focus simultaneously on two language pairs, English/French and English/Arabic, thus demonstrating the generality of the approach. In addition to improved MT, the goals of the workshop include training students to contribute to MT and NLP research for years to come and producing a complete, easy-to-use reference implementation for worldwide distribution.
Final Presentation
The GenPar Toolkit for generalized parsing
MTV, a Multitree Viewer
Final Report
Team Members | |
---|---|
Senior Members | |
Stephen Clark | Oxford University |
Keith Hall | Johns Hopkins University |
Mary Hearne | Dublin City University |
Dan Melamed | New York University |
Andy Way | Dublin City University |
Dekai Wu | Hong Kong University of Science and Technology |
Graduate Students | |
Marine Carpuat | Hong Kong University of Science and Technology |
Markus Dreyer | Johns Hopkins University |
Declan Groves | Dublin City University |
Yihai Shen | Hong Kong University of Science and Technology |
Ben Wellington | New York University |
Undergraduate Students | |
Andrea Burbank | Stanford University |
Pamela Fox | University of Southern California |