Dynamic Programming based Search and Segmentation Algorithms for Statistical Machine Translation – Christoph Tillman (IBM T.J. Watson Research Center)

October 1, 2002 all-day

This talk is about the use of dynamic programming (DP) techniques for statistical machine translation (SMT). I will present a search procedure for SMT based on dynamic programming. The starting point is a DP solution to the traveling salesman problem. For SMT, the cities correspond to source sentence positions to be translated. Imposing restrictions on the order in which the source positions are translated yields a DP algorithm to carry out the word re-ordering efficiently. A simple data-driven search organization allows to prune unlikely translation hypotheses. Furthermore, I will sketch a DP-based segmentation procedure for SMT. The units of segmentation are blocks – pairs of source and target clumps. Here, the segmentation problem is related to the set cover problem and an efficient DP segmentation algorithm exists if the blocks are restricted by an underlying word-to-word alignment.

Christoph Tillmann is a Research Staff Member at the IBM T.J. Watson Research Center. He received his Dipl. degree in computer science in 1996 and his Dr. degree in computer science in 2001, both from Aachen University of Technology (RWTH), Germany. Currently, he is working on statistical machine translation. His research interests include probabilistic language modeling and probabilistic parsing.

Center for Language and Speech Processing