Bibliography for Statistical Machine Translation

Overviews | Word-for-word translation models | Phrasal or syntactic translation models
Translation models with context | Translation models from comparable corpora | Decoding and search
Sentence Alignment | Transliteration | Corpus collection | Discriminative Training | Miscellaneous

[The original bibliography by Kevin Knight covered work through 2000.]

Overviews

[Weaver, 1955]
Weaver, W., "Translation," in Machine Translation of Languages, W. Locke and A. Donald Booth, eds. New York: John Wiley & Sons.
[Brown et al, 1990]
Brown, P., J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, J. Lafferty, R. Mercer, and P. Roossin, "A Statistical Approach to Machine Translation," Computational Linguistics, 16(2). (http://www.aclweb.org/anthology/J90-2002)
[Berger et al, 1994]
Berger, A., P. Brown, S. Della Pietra, V. Della Pietra, J. Gillett, J. Lafferty, R. Mercer, H. Printz, L. Ures, "The Candide System for Machine Translation," Proceedings of the DARPA Workshop on Human Language Technology (HLT).
[Berger et al, 1996]
Berger, A., P. Brown, S. Della Pietra, V. Della Pietra, A. Kehler, R. Mercer, "Language Translation Apparatus and Method Using Context-Based Translation Models," U.S. Patent 5,510,981.
[Knight, 1997]
Knight, K., "Automating Knowledge Acquisition for Machine Translation," AI Magazine, 18(4).
[Knight, 1999]
Knight, K., "A Statistical MT Tutorial Workbook." (http://www.isi.edu/natural-language/mt/wkbk.rtf).
[Vogel et al., 2000]
Stephan Vogel, Franz Josef Och, Christoph Tillmann, Sonja Nießen, Hassan Sawaf, Hermann Ney. "Statistical Methods for Machine Translation". In: "Verbmobil: Foundations of Speech-to-Speech Translation", pp. 377-393, Wolfgang Wahlster (ed.). Springer Verlag, Berlin, July 2000. (http://www-i6.informatik.rwth-aachen.de/Colleagues/och/VMBUCH.ps)
[Och and Ney, 2000]
Franz Josef Och, Hermann Ney. "Statistical Machine Translation". EAMT Workshop, pp. 39-46, Ljubljana, Slovenia, May 2000. (http://www-i6.informatik.rwth-aachen.de/Colleagues/och/EAMT00.ps)

Word-for-word translation models

[Brown et al, 1993a]
Brown, P., S. Della Pietra, V. Della Pietra, and R. Mercer, "The Mathematics of Statistical Machine Translation: Parameter Estimation," Computational Linguistics, 19(2). (http://www.aclweb.org/anthology/J93-2003)
[Brown et al, 1993b]
Brown, P., S. Della Pietra, V. Della Pietra, M. Goldsmith, J. Hajic, R. Mercer, S. Mohanty, "But Dictionaries Are Data Too," Proceedings of the DARPA Workshop on Human Language Technology (HLT).
[Dagan et al, 1993]
Dagan, I., K. Church, and W. Gale, "Robust Bilingual Word Alignment for Machine Aided Translation," Proceedings of the Workshop on Very Large Corpora (WVLC). (http://www.aclweb.org/anthology/W93-0301)
[Brousseau et al, 1995]
Brousseau, J., C. Drouin, G. Foster, P. Isabelle, R. Kuhn, Y. Normandin, and P. Plamondon, "French Speech Recognition in an Automatic Dictation System for Translators: the TransTalk Project," Proceedings of Eurospeech 95.
[Berger, Della Pietra, and Della Pietra, 1996]
Berger, A., S. Della Pietra, and V. Della Pietra, "A Maximum Entropy Approach to Natural Language Processing," Computational Linguistics, 22(1). (http://www.aclweb.org/anthology/J96-1002)
[Wang, Lafferty, and Waibel, 1996]
Wang, Y., J. Lafferty, and A. Waibel, "Word Clustering with Parallel Spoken Language Corpora," Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP).
[Melamed, 1996]
Melamed, I., "Automatic Construction of Clean Broad-Coverage Translation Lexicons," Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA).
[Vogel, Ney, and Tillmann, 1996]
Vogel, S., H. Ney, and C. Tillmann, "HMM-Based Word Alignment in Statistical Translation," Proceedings of the International Conference on Computational Linguistics (COLING). (http://www.aclweb.org/anthology/C96-2141)
[Melamed, 1997]
Melamed, I., "A Word-to-Word Model of Translational Equivalence," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P97-1063)
[Melamed, 1998]
Melamed, I., Empirical Methods for Exploiting Parallel Texts, Ph.D. Dissertation, University of Pennsylvania.
[McCarley and Roukos, 1998]
McCarley, S. and S. Roukos, "Fast Document Translation for Cross-Language Information Retrieval," Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA).
[Och and Weber, 1998]
Och, F.-J. and H. Weber, "Improving Statistical Natural Language Translation by Categories and Rules," Proceedings of the International Conference on Computational Linguistics (COLING). (http://www.aclweb.org/anthology/P98-2162)
[Turcato, 1998]
Turcato, D., "Automatically Creating Bilingual Lexicons for Machine Translation from Bilingual Text," Proceedings of the International Conference on Computational Linguistics (COLING). (http://www.aclweb.org/anthology/P98-2212)
[Och, 1999]
Och, F.-J., "An Efficient Method for Determining Bilingual Word Classes," Conference of the European Chapter of the Association for Computational Linguistics (EACL). (http://www.aclweb.org/anthology/E99-1010)
[Al-Onaizan et al, 1999]
Al-Onaizan, Y., J. Curin, M. Jahr, K. Knight, J. Lafferty, D. Melamed, F.-J. Och, D. Purdy, N. Smith, and D. Yarowsky, "Statistical Machine Translation," tech report, the Center for Language and Speech Processing, Johns Hopkins University. (http://www.clsp.jhu.edu/ws99/final/Stat_Machine_Translation.pdf)
[Al-Onaizan et al, 2000]
Al-Onaizan, Y., U. Germann, U. Hermjakob, K. Knight, P. Koehn, D. Marcu, K. Yamada, "Translating with Scarce Resources," Proceedings of the National Conference on Artificial Intelligence (AAAI), 2000. (http://www.isi.edu/natural-language/mt/tetun.ps)
[Melamed, 2000]
Melamed, I. "Models of Translational Equivalence among Words," Computational Linguistics, 26(2). (http://www.aclweb.org/anthology/J00-2004)
[Och and Ney, 2000]
Och, F.-J. and H. Ney, "A Comparison of Alignment Models for Statistical Machine Translation." Proceedings of the International Conference on Computational Linguistics (COLING). (http://www.aclweb.org/anthology/C00-2164)
[Sumita, 2000]
Sumita, E., "Lexical Transfer Using a Vector-Space Model," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P00-1054)
[Amengual, et al, 2000]
Amengual, J. C., J. M. Benedi, F. Casacuberta, A. Castano, A. Castellanos, V. M. Jimenez, D. Llorens, A. Marzal, M. Pastor, F. Prat, E. Vidal, J. M. Vilar, "The Eutrans-I Speech Translation System," Machine Translation (special issue), forthcoming.
[Och and Ney, 2000]
Franz Josef Och, Hermann Ney. "Improved Statistical Alignment Models". ACL00: Proc. of the 38th Annual Meeting of the Association for Computational Linguistics, pp. 440-447, Hong Kong, China, October 2000. (http://www.aclweb.org/anthology/P00-1056)
[Garcia-Varea et al., 2001]
Ismael Garcia-Varea, Franz Josef Och, Hermann Ney, Francisco Casacuberta. "Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach". In: "ACL 2001: Proc. of the 39th Annual Meeting of the Association for Computational Linguistics", pp. 204-211, Toulouse, France, July 2001. (http://www.aclweb.org/anthology/P01-1027)
[Zens, Och, and Ney, 2002]
Richard Zens, Franz Josef Och, Hermann Ney. "Phrase-Based Statistical Machine Translation". In Proc. German Conference on Artificial Intelligence (KI 2002), Springer Verlag, September 2002. (http://link.springer-ny.com/link/service/series/0558/bibs/2479/24790018.htm)
[Garcia-Varea et al., 2002]
Ismael Garcia-Varea and Franz Josef Och and Hermann Ney and Francisco Casacuberta. "Improving alignment quality in statistical machine translation using context-dependent maximum entropy models", In Proc. Int. Conf. on Computational Linguistics, Taipei, Taiwan. August 2002. (http://www-i6.informatik.rwth-aachen.de/Colleagues/och/COLING02.ps)
[Garcia-Varea et al., 2002b]
Ismael Garcia-Varea and Franz Josef Och and Hermann Ney and Francisco Casacuberta. "Efficient integration of maximum entropy lexicon models within the training of statistical alignment models", In AMTA 2002, Tiburon, CA. October 2002. (http://link.springer-ny.com/link/service/series/0558/bibs/2499/24990054.htm)
[Och and Ney, 2003]
Franz Josef Och, Hermann Ney. "A Systematic Comparison of Various Statistical Alignment Models", Computational Linguistics, 2003.

Phrasal or syntactic translation models

[Alshawi, Buchsbaum, and Xia, 1997]
Alshawi, H., A. Buchsbaum, and F. Xia, "A Comparison of Head Transducers and Transfer for a Limited Domain Translation Application", Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P97-1046)
[Melamed, 1997]
Melamed, I., "Automatic Discovery of Non-Compositional Compounds," Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). (http://www.aclweb.org/anthology/W97-0311)
[Wu, 1997]
Wu, D., "Statistical Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora," Computational Linguistics, 23(3). (http://www.aclweb.org/anthology/J97-3002)
[Wu and Wong, 1998]
Wu, D. and H. Wong, "Machine Translation with a Stochastic Grammatical Channel," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P98-2230)
[Alshawi, Bangalore, and Douglas, 1998]
Alshawi, H., S. Bangalore, and S. Douglas, "Automatic Acquisition of Hierarchical Transduction Models for Machine Translation," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P98-1006)
[Sato and Nakanishi, 1998]
Sato, K. and M. Nakanishi, "Maximum Entropy Model Learning of the Translation Rules," Proceedings of the International Conference on Computational Linguistics (COLING). (http://www.aclweb.org/anthology/P98-2191)
[Wang and Waibel, 1998]
Wang, Y. and A. Waibel, "Modeling with Structures in Statistical Machine Translation," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P98-2221)
[Boutsis and Piperidis, 1998]
Boutsis, S., and S. Piperidis, "Aligning Clauses in Parallel Text," Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
[Alshawi, Bangalore, and Douglas, 2000]
Alshawi, H., S. Bangalore, and S. Douglas, "Learning Dependency Translation Models as Collections of Finite State Head Transducers," Computational Linguistics, 26(1). (http://www.aclweb.org/anthology/J00-1004)
[Och, Tillmann, and Ney, 1999]
Och, F.-J., C. Tillmann, and H. Ney. "Improved Alignment Models for Statistical Machine Translation." Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP). (http://www.aclweb.org/anthology/W99-0604)
[Yamada, 2002]
Yamada, K., A Syntax-based Translation Model, Ph.D. Thesis, University of Southern California. (http://www.isi.edu/natural-language/projects/rewrite/yamada-thesis.ps)
[Charniak, Knight, Yamada, 2003]
Charniak, E., K. Knight, and K. Yamada, "Syntax-based Language Models for Statistical Machine Translation." (http://www.isi.edu/~kyamada/jhu03/mtlang.pdf)
[Koehn, Och, and Marcu, 2003]
Philipp Koehn, Franz Josef Och, Daniel Marcu. "Statistical Phrase-Based Translation". In Proceedings of the Human Language Technology Conference 2003 (HLT-NAACL 2003), Edmonton, Canada, May 2003. (http://www.isi.edu/~koehn/publications/phrase2003.html)

Translation models with context

[Brown et al, 1991]
Brown, P., S. Della Pietra, V. Della Pietra, and R. Mercer, "Word-Sense Disambiguation Using Statistical Methods," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P91-1034)
[Hermjakob and Mooney, 1997]
Hermjakob, U. and R. Mooney, "Learning Parse and Translation Decisions from Examples with Rich Context," Proceedings of the Conference of the Association for Computational Linguistics (ACL/EACL). (http://www.aclweb.org/anthology/P97-1062)

Translation models from comparable corpora

[Rapp, 1995]
Rapp, R., "Identifying Word Translations in Non-Parallel Texts," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P95-1050)
[Fung and Yee, 1998]
Fung, P. and L. Y. Yee, "An IR Approach for Translating New Words from Nonparallel, Comparable Texts," COLING/ACL-98. (http://www.aclweb.org/anthology/P98-1069)
[Kikui, 1999]
Kikui, G., "Resolving Translation Ambiguity using Non-parallel Bilingual Corpora," Proceedings of the ACL Workshop on Unsupervised Learning in Natural Language Processing. (http://www.aclweb.org/anthology/W99-0905)
[Rapp, 1999]
Rapp, R., "Automatic Identification of Word Translations from Unrelated English and German Corpora," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P99-1067)
[Diab and Finch, 2000]
Diab, M. and S. Finch, "A Statistical Word-Level Translation Model for Comparable Corpora", Proceedings of the Conference on Content-Based Multimedia Information Access (RIAO).
[Koehn and Knight, 2000]
Koehn, P. and K. Knight, "Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm", Proceedings of the National Conference on Artificial Intelligence (AAAI), 2000. (http://www.isi.edu/~koehn/publications/aaai2000.ps)
[Pantel and Lin, 2000]
Pantel, P. and D. Lin, "Word-for-Word Glossing with Contextually Similar Words," Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).

Decoding and Search

[Berger et al, 1996]
Berger, A., P. Brown, S. Della Pietra, V. Della Pietra, A. Kehler, R. Mercer, "Language Translation Apparatus and Method Using Context-Based Translation Models," U.S. Patent 5,510,981.
[Wu, 1996]
Wu, D., "A Polynomial-Time Algorithm for Statistical Machine Translation," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P96-1021)
[Tillmann et al, 1997]
Tillmann, C., S. Vogel, H. Ney and A. Zubiaga, "A DP Based Search Using Monotone Alignments in Statistical Translation," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P97-1037)
[Wang and Waibel, 1997]
Wang, Y. and A. Waibel, "Decoding Algorithm in Statistical Machine Translation," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P97-1047)
[Knight and Al-Onaizan, 1998]
Knight, K. and Y. Al-Onaizan, "Translation with Finite-State Devices," Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA).
[Niessen et al, 1998]
Niessen, S., S. Vogel, H. Ney, and C. Tillmann, "A DP Based Search Algorithm for Statistical Machine Translation," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P98-2158)
[Knight, 1999]
Knight, K., "Decoding Complexity in Word-Replacement Translation Models," Computational Linguistics, 25(4). (http://www.aclweb.org/anthology/J99-4005)
[Vogel and Ney, 2000]
Vogel, S. and H. Ney, "Translation with Cascaded Finite State Transducers," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P00-1004)
[Och, Ueffing, and Ney, 2001]
Franz Josef Och, Nicola Ueffing, Hermann Ney. "An Efficient A* Search Algorithm for Statistical Machine Translation". In: "Data-Driven Machine Translation Workshop", pp. 55-62, Toulouse, France, July 2001. (http://www-i6.informatik.rwth-aachen.de/Colleagues/och/DDMT01.ps)
[Ueffing, Och, and Ney, 2002]
Nicola Ueffing, Franz Josef Och, Hermann Ney. "Generation of Word Graphs in Statistical Machine Translation". In "Proc. Conference on Empirical Methods in Natural Language Processing", pp. 156-163, Philadelphia, PA, July 2002. (http://www-i6.informatik.rwth-aachen.de/Colleagues/och/EMNLP02.ps)

Sentence Alignment

[Brown, Lai, and Mercer, 1991]
Brown, P., J. Lai, and R. Mercer, "Aligning Sentences in Parallel Corpora," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P91-1022)
[Melamed, 1997a]
Melamed, I., "A Portable Algorithm for Mapping Bitext Correspondence," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P97-1039)
[Ribeiro, Lopes, and Mexia, 2000]
Ribeiro, A., G. Lopes, and J. Mexia, "A Self-Learning Method of Parallel Texts Alignment," Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA).

Transliteration

[Knight and Graehl, 1997]
Knight, K. and J. Graehl, "Machine Transliteration," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P97-1017)
[Knight and Graehl, 1998]
Knight, K. and J. Graehl, "Machine Transliteration," Computational Linguistics, 24(4). (http://www.aclweb.org/anthology/J98-4003)
[Chen, Huang, Ding, and Tsai, 1998]
Chen, H.-H., S.-J. Huang, Y.-W. Ding, and S.-C. Tsai, "Proper Name Translation in Cross-Language Information Retrieval," Proceedings of the International Conference on Computational Linguistics (COLING). (http://www.aclweb.org/anthology/P98-1036)
[Wan and Verspoor, 1998]
Wan, S. and C. Verspoor, "Automatic English-Chinese Name Transliteration for Development of Multilingual Resources," Proceedings of the International Conference on Computational Linguistics (COLING). (http://www.aclweb.org/anthology/P98-2220)
[Stalls and Knight, 1998]
Stalls, B. and K. Knight, "Translating Names and Technical Terms in Arabic Text," Proceedings of the COLING-ACL Workshop on Computational Approaches to Semitic Languages. (http://www.aclweb.org/anthology/W98-1005)
[Knight and Yamada, 1999]
Knight, K. and K. Yamada, "A Computational Approach to Deciphering Unknown Scripts," Proceedings of the ACL Workshop on Unsupervised Learning in Natural Language Processing. (http://www.aclweb.org/anthology/W99-0906)

Corpus collection

[Resnik, 1999]
Resnik, P., "Mining the Web for Bilingual Text," Proceedings of the Conference of the Association for Computational Linguistics (ACL). (http://www.aclweb.org/anthology/P99-1068)
[Chen and Nie, 2000]
Chen, J. and J.-Y. Nie, "Automatic Construction of Parallel English-Chinese Corpus for Cross-Language Information Retrieval," Proceedings of the Conference on Applied Natural Language Processing (ANLP). (http://www.aclweb.org/anthology/A00-1004)

Discriminative Training

[Och and Ney, 2002]
Franz Josef Och, Hermann Ney. "Discriminative Training and Maximum Entropy Models for Statistical Machine Translation". In "ACL 2002: Proc. of the 40th Annual Meeting of the Association for Computational Linguistics" (best paper award), pp. 295-302, Philadelphia, PA, July 2002. (http://www.aclweb.org/anthology/P02-1038)
[Och, 2003]
Franz Josef Och. "Minimum Classification Error Training for Statistical Machine Translation". In "ACL 2003: Proc. of the 41st Annual Meeting of the Association for Computational Linguistics", Sapporo, Japan, July 2003.

Miscellaneous

[Macherey, Och, and Ney, 2001]
Klaus Macherey, Franz Josef Och, Hermann Ney. "Natural Language Understanding Using Statistical Machine Translation". In: "EUROSPEECH 2001 - 7th European Conference on Speech Communication and Technology", pp. 2205-2208, Aalborg, Denmark, September 2001. (http://www-i6.informatik.rwth-aachen.de/Colleagues/och/eurospeech2001.ps)
[Och and Ney, 2001]
Franz Josef Och, Hermann Ney. "Statistical Multi-Source Translation". In: "MT Summit 2001", pp. 253-258, Santiago de Compostela, Spain, September 2001. (http://www-i6.informatik.rwth-aachen.de/Colleagues/och/MST.ps)
[Och and Ney, 2001b]
Franz Josef Och, Hermann Ney. "What Can Machine Translation Learn from Speech Recognition?". In: "Workshop: MT 2010 - Towards a Road Map for MT", pp. 26-31, Santiago de Compostela, Spain, September 2001. (http://www-i6.informatik.rwth-aachen.de/Colleagues/och/WhatCanMTLearnFromASR.ps)
[Ney, Och, and Vogel, 2001]
Hermann Ney, Franz Josef Och, Stephan Vogel. "The RWTH System for Statistical Translation of Spoken Dialogues". In: "HLT: Human Language Technology", San Diego, CA, March 2001. (http://www-i6.informatik.rwth-aachen.de/Colleagues/och/HLT_SanDiego_FullPaper_04Mar01.ps)
[Och, Zens, and Ney, 2003]
F.J. Och, R. Zens, H. Ney. "Efficient Search for Interactive Statistical Machine Translation". In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Budapest, Hungary, pp. 387-393, April 2003.
(Note: this bibliography is incomplete. It excludes most sentence-alignment work and most cross-language information retrieval work, as well as a lot of interesting work in information retrieval, summarization, question answering, etc., that has been inspired by statistical MT techniques. It also omits most example-based statistical MT work.)