Machine Translation Applications of Stochastic Inversion Transduction Grammars – Dekai Wu (Hong Kong University of Science & Technology, Department of Computer Science)

November 1, 1995

We have introduced and are developing the notion of bilingual language modeling, an approach that shows promise for a number of aspects of statistical machine translation. A bilingual language model simultaneously generates matched strings in two languages according to a parametric distribution. The formalism we are currently investigating, the stochastic inversion transduction grammar (SITG), is context-free but incorporates an inversion constraint that reduces computational complexity while maintaining sufficient word-order flexibility. We introduce bilingual parsing, together with an efficient parsing algorithm for SITGs, which has useful applications in sub-sentential alignment and bracketing of parallel texts and in automatic extraction of phrasal translations. An iterative EM training algorithm for SITGs has been developed for corpus-based estimation of the grammar's probabilities. We will also discuss current and future directions.
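
To make the inversion constraint concrete, here is a minimal Python sketch of how a single derivation tree in an inversion transduction grammar yields a matched pair of strings. It is an illustration under stated assumptions, not the speaker's implementation: the names `Leaf`, `Node`, and `yield_pair` and the English/French example are hypothetical, and probabilities, parsing, and EM training are all omitted. Each binary node combines its children either in the same order in both languages ("straight") or in reversed order in the second language ("inverted").

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    """A lexical pairing; an empty string stands for 'no word' on that side."""
    src: str
    tgt: str


@dataclass
class Node:
    """A binary node; `inverted` reverses child order on the target side only."""
    left: "Tree"
    right: "Tree"
    inverted: bool = False


Tree = Union[Leaf, Node]


def yield_pair(tree: Tree) -> Tuple[List[str], List[str]]:
    """Read off the (source, target) word sequences generated by one tree."""
    if isinstance(tree, Leaf):
        src = [tree.src] if tree.src else []
        tgt = [tree.tgt] if tree.tgt else []
        return src, tgt
    left_src, left_tgt = yield_pair(tree.left)
    right_src, right_tgt = yield_pair(tree.right)
    # The source side always keeps left-to-right order.
    src = left_src + right_src
    # The target side is either straight or inverted, never an arbitrary permutation.
    tgt = right_tgt + left_tgt if tree.inverted else left_tgt + right_tgt
    return src, tgt


# Hypothetical example: the adjective-noun pair is inverted, the rest is straight.
tree = Node(
    Leaf("the", "la"),
    Node(Leaf("red", "rouge"), Leaf("house", "maison"), inverted=True),
)

source_words, target_words = yield_pair(tree)
print(" ".join(source_words))
print(" ".join(target_words))
```

Because every node allows only two orderings of its children, the space of permitted reorderings is far smaller than the space of all permutations, which is the intuition behind the reduced complexity described above. A stochastic version would attach a probability to each rule and lexical pairing.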

Center for Language and Speech Processing