Fall 2003: CLSP Seminar Series

Advances in Statistical Machine Translation: Phrases, Noun Phrases and Beyond

Philipp Koehn - October 14th, 2003

USC/ISI

Presentation Slides: PDF


I will review the state of the art in statistical machine translation (SMT), present my dissertation work, and sketch out the research challenges of syntactically structured statistical machine translation.

The best current methods in SMT translate phrases (arbitrary sequences of words) rather than single words. Phrase translation pairs are learned automatically from parallel corpora. Although the output of SMT systems often conveys much of the meaning of the original text, it is frequently ungrammatical and incoherent.
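As a rough illustration of how phrase pairs can be read off a parallel corpus, the sketch below extracts every phrase pair that is consistent with a word alignment for a single sentence pair. It assumes the word alignment is already given. The function name `extract_phrase_pairs`, the `max_len` limit, and the example sentences are hypothetical, chosen only for illustration, and are not taken from the talk or from any particular system.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Collect phrase pairs consistent with a word alignment.

    src, tgt:   lists of tokens for one sentence pair.
    alignment:  set of (src_index, tgt_index) links (assumed given).
    max_len:    hypothetical cap on phrase length, in words.
    """
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if s_start <= i <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency check: no word in the target span may be
            # linked to a source word outside the source span.
            consistent = all(
                s_start <= i <= s_end
                for (i, j) in alignment
                if t_start <= j <= t_end
            )
            if consistent:
                pairs.add((
                    " ".join(src[s_start:s_end + 1]),
                    " ".join(tgt[t_start:t_end + 1]),
                ))
    return pairs


# Toy example with a one-to-one, monotone alignment.
src = ["das", "Haus", "ist", "klein"]
tgt = ["the", "house", "is", "small"]
alignment = {(0, 0), (1, 1), (2, 2), (3, 3)}
phrase_pairs = extract_phrase_pairs(src, tgt, alignment)
```

In a full system, pairs collected this way over a whole corpus would be counted to estimate phrase translation probabilities. That step, and the handling of unaligned words, are omitted here.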

The research challenge at this point is to introduce syntactic knowledge into state-of-the-art systems in order to improve translation quality. My approach breaks up the translation process along linguistic lines. I will present my thesis work on noun phrase translation and ideas about clause structure.

Biographical Information

Philipp Koehn is expected to receive his PhD in Computer Science from the University of Southern California in Fall 2003. He is a research assistant at the Information Sciences Institute. He has worked as a visiting researcher at AT&T Labs and Whizbang Labs. He has published a number of papers on machine translation, lexical acquisition, machine learning and related subjects. He has also given tutorials on statistical machine translation at recent HLT/NAACL and MT Summit conferences.



The Center for Language and Speech Processing
The Johns Hopkins University
3400 North Charles Street, Barton Hall
Baltimore, MD 21218
Telephone: (410) 516-4237, Fax: (410) 516-5050, E-mail: clsp@clsp.jhu.edu