Machine Translation Applications of Stochastic Inversion Transduction Grammars

Dekai Wu, Hong Kong University of Science & Technology, Department of Computer Science

November 1, 1995


Abstract

We have introduced and are developing the notion of bilingual language modeling, an approach that shows promise for several aspects of statistical machine translation. A bilingual language model simultaneously generates matched strings in two languages according to a parametric probability distribution. The formalism we are currently investigating, the stochastic inversion transduction grammar (SITG), is context-free but incorporates an inversion constraint that reduces computational complexity while retaining sufficient word-order flexibility. We introduce bilingual parsing, together with an efficient parsing algorithm for SITGs, and show its usefulness for sub-sentential alignment and bracketing of parallel texts and for automatic extraction of phrasal translations. We have also developed an iterative EM training algorithm for estimating SITG probabilities from corpora. Finally, we discuss current and future directions.
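The effect of the inversion constraint on word order can be illustrated with a small sketch. It assumes a purely binary bracketing grammar in which every rule combines two adjacent constituents either in the same order in both languages (straight) or in reversed order (inverted). The hypothetical function `itg_reachable` tests, with a greedy shift-reduce check, whether a given reordering of source words could be produced under that assumption. This only illustrates the constraint. It is not the SITG parsing algorithm described here, and it ignores probabilities and lexical content.

```python
from itertools import permutations


def itg_reachable(perm):
    """Hypothetical helper: return True if `perm` (perm[i] = target position
    of source word i, a permutation of 0..n-1) can be built by repeatedly
    combining two adjacent constituents in straight or inverted order."""
    stack = []  # each entry is the (lo, hi) range of target positions covered
    for p in perm:
        stack.append((p, p))
        # Greedily merge the top two spans while their target ranges abut.
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2:      # straight: [A B], same order in both languages
                stack[-2:] = [(lo1, hi2)]
            elif hi2 + 1 == lo1:    # inverted: <A B>, order reversed in the target
                stack[-2:] = [(lo2, hi1)]
            else:
                break
    return len(stack) == 1


def count_reachable(n):
    """Count the permutations of n words that pass the binary ITG check."""
    return sum(itg_reachable(p) for p in permutations(range(n)))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, count_reachable(n))
```

Enumerating all orders for n = 1 to 6 should give 1, 2, 6, 22, 90 and 394 reachable permutations, against n! = 1, 2, 6, 24, 120 and 720. For four words the only excluded orders are the two "inside-out" patterns [1, 3, 0, 2] and [2, 0, 3, 1]. The reachable set grows much more slowly than n!, which is one way to see how the constraint limits the search space while still admitting most reorderings of short spans.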