Workshop 2006 Saturday, November 7, 2009

Open Source Toolkit for Statistical Machine Translation


The objective of this JHU Workshop is to develop novel methods for statistical machine translation that improve the state of the art, specifically factored translation models and lattice-based decoding methods. As part of this workshop, we will implement these techniques and distribute them in an open source toolkit.

We propose to extend phrase-based statistical machine translation models using a factored representation. Current statistical MT approaches represent each word simply as its textual form. A factored translation approach replaces this representation with a feature vector for each word, derived from a variety of information sources. These features may include the surface form, lemma, stem, part-of-speech tag, morphological information, and syntactic, semantic, or automatically derived categories. This representation is then used to construct statistical translation models that can be combined to maximize translation quality.
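As a rough illustration of the factored representation described above, the following Python sketch turns each word into a small record of factors. The pipe-separated input notation, the choice of three factors (surface form, lemma, part-of-speech tag), and the names `FactoredWord` and `parse_factored` are assumptions made for this example only; they are not taken from the workshop toolkit.

```python
from typing import List, NamedTuple


class FactoredWord(NamedTuple):
    """One word described by several factors instead of its text alone."""
    surface: str  # the word as it appears in the sentence
    lemma: str    # dictionary form
    pos: str      # part-of-speech tag


def parse_factored(sentence: str, sep: str = "|") -> List[FactoredWord]:
    """Parse tokens written as surface|lemma|pos (a hypothetical notation)."""
    words = []
    for token in sentence.split():
        parts = token.split(sep)
        if len(parts) != 3:
            raise ValueError(f"expected 3 factors in token {token!r}")
        words.append(FactoredWord(*parts))
    return words


# Illustrative input; the tags are hand-written for this example.
sentence = "houses|house|NNS are|be|VBP big|big|JJ"
factored = parse_factored(sentence)

# A model can now be built over any single factor or combination of factors,
# for example over lemmas only or over part-of-speech tags only.
lemmas = [w.lemma for w in factored]
tags = [w.pos for w in factored]
```

Separate models over the lemma sequence and the tag sequence could then be trained and combined, which is the general idea the paragraph above describes.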

We also propose to extend current MT decoding methods to process multiple, ambiguous hypotheses in the form of an input lattice. A lattice representation allows an MT system to arbitrate between multiple ambiguous hypotheses from upstream processing so that the best translation can be produced. During the workshop we will implement lattice decoding and run experiments with errorful automatic speech recognition (ASR) input. We will compare different lattice-based strategies against single-hypothesis input results.
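To make the contrast between lattice input and single-hypothesis input concrete, here is a toy Python sketch of a word lattice stored as a directed acyclic graph. The edge format, the probabilities, and the function names `best_path` and `all_paths` are hypothetical and chosen for illustration. The sketch only shows how a lattice holds competing upstream hypotheses; it does not translate anything, and a lattice decoder would consider the alternative paths during translation rather than committing to one path beforehand.

```python
import math
from collections import defaultdict

# Each edge is (from_node, to_node, words, log_probability).
# Assumption: nodes are numbered so that every edge goes from a lower
# to a higher node number, which gives a topological order for free.
EDGES = [
    (0, 1, "recognize", math.log(0.6)),
    (0, 1, "wreck a nice", math.log(0.4)),
    (1, 2, "speech", math.log(0.7)),
    (1, 2, "beach", math.log(0.3)),
]


def _outgoing(edges):
    """Group edges by their start node."""
    out = defaultdict(list)
    for u, v, words, logp in edges:
        out[u].append((v, words, logp))
    return out


def best_path(edges, start, goal):
    """Single-hypothesis baseline: the highest-scoring path (Viterbi)."""
    out = _outgoing(edges)
    best = {start: (0.0, [])}
    for u in sorted(out):  # valid because edges always point forward
        if u not in best:
            continue  # node not reachable from start
        score, path = best[u]
        for v, words, logp in out[u]:
            cand = score + logp
            if v not in best or cand > best[v][0]:
                best[v] = (cand, path + [words])
    return best[goal]


def all_paths(edges, start, goal):
    """Every hypothesis the lattice encodes, with its log score."""
    out = _outgoing(edges)

    def walk(node, path, score):
        if node == goal:
            yield score, path
            return
        for v, words, logp in out[node]:
            yield from walk(v, path + [words], score + logp)

    return list(walk(start, [], 0.0))


one_best_score, one_best_words = best_path(EDGES, 0, 2)
hypotheses = all_paths(EDGES, 0, 2)
```

Here `one_best_words` corresponds to what a single-hypothesis system would receive, while `hypotheses` lists everything a lattice-aware decoder could still choose from.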


Find details about the plans and progress of this project here and here.

 

Team Members:

Philipp Koehn Team Leader University of Edinburgh pkoehn at inf dot ed dot ac dot uk
Marcello Federico Senior Researcher ITC-IRST federico at itc dot it
Wade Shen Senior Researcher Lincoln Labs swade at ll dot mit dot edu
Nicola Bertoldi Senior Researcher ITC-IRST bertoldi at itc dot it
Chris Callison-Burch Graduate Student University of Edinburgh callison-burch at ed dot ac dot uk
Richard Zens Graduate Student Aachen University zens at i6 dot informatik dot rwth-aachen dot de
Hieu Hoang Graduate Student University of Edinburgh H.Hoang at sms dot ed dot ac dot uk
Brooke Cowan Graduate Student MIT brooke at csail dot mit dot edu
Ondrej Bojar Graduate Student Charles University bojar at ufal dot mff dot cuni dot cz
Chris Dyer Graduate Student University of Maryland redpony at umd dot edu
Alexandra Constantin Undergraduate Student Williams College 07aec_2 at williams dot edu
Evan Herbst Undergraduate Student Cornell evh4 at cornell dot edu
Christine Corbett Moran Undergraduate Student MIT corbett at csail dot mit dot edu
 

The Center for Language and Speech Processing
The Johns Hopkins University
3400 North Charles Street, Barton Hall
Baltimore, MD 21218
*Telephone: (410) 516-4237 *Fax: (410) 516-5050 *E-mail: clsp@clsp.jhu.edu