Improved Statistical Machine Translation Using Paraphrases – Chris Callison-Burch (University of Edinburgh)

December 5, 2006

In this talk I show how automatically generated paraphrases can improve the quality of statistical machine translation. Specifically, I show how paraphrases can alleviate problems associated with out-of-vocabulary words and phrases. Statistical translation systems currently perform poorly when they encounter a word that was not seen in the training corpus. Since they have not learned a translation for it, they either reproduce the foreign word untranslated or delete it. I propose replacing the unknown source phrase with a paraphrase whose translation the model has learned, and then translating the paraphrase. I present experimental results indicating that coverage can be increased dramatically, with most of the newly covered items translating accurately.

Related publications: Chris Callison-Burch, Philipp Koehn and Miles Osborne. “Improved Statistical Machine Translation Using Paraphrases.” In Proceedings of NAACL-2006.
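The substitution idea described in the abstract can be illustrated with a minimal sketch. This is not the system presented in the talk: the function name `translate_phrase`, the two toy tables, and the probabilities are all hypothetical, and a real phrase-based decoder would score many candidates rather than take the first match.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy phrase table: source phrase -> learned translation.
PHRASE_TABLE: Dict[str, str] = {
    "garantizar": "guarantee",
}

# Hypothetical paraphrase table: source phrase -> [(paraphrase, probability)].
PARAPHRASES: Dict[str, List[Tuple[str, float]]] = {
    "asegurar": [("afirmar", 0.2), ("garantizar", 0.7)],
}


def translate_phrase(
    phrase: str,
    phrase_table: Dict[str, str],
    paraphrases: Dict[str, List[Tuple[str, float]]],
) -> Optional[str]:
    """Translate a phrase, falling back to a paraphrase if it is unknown."""
    # Known phrase: use the learned translation directly.
    if phrase in phrase_table:
        return phrase_table[phrase]

    # Unknown phrase: try its paraphrases, most probable first, and
    # translate the first one the model has a translation for.
    candidates = sorted(
        paraphrases.get(phrase, []), key=lambda pair: pair[1], reverse=True
    )
    for paraphrase, _probability in candidates:
        if paraphrase in phrase_table:
            return phrase_table[paraphrase]

    # No usable paraphrase: the phrase stays uncovered.
    return None
```

With these toy tables, `translate_phrase("asegurar", PHRASE_TABLE, PARAPHRASES)` is covered through the paraphrase "garantizar", while a phrase with no entry in either table still returns `None`.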

Chris Callison-Burch is a PhD student at the University of Edinburgh. He is currently finishing his thesis, entitled “Paraphrasing and Translation.” This summer he participated in the CLSP summer workshop on Factored Translation Models. In 2002 he co-founded a machine translation startup company called Linear B (http://

Center for Language and Speech Processing