CLSP Web Site
WS 98 Site Map
WS 98 Research Projects
An NSF Workshop: Language Engineering for Students
and Professionals Integrating Research and Education 
 
NEW: software tools developed during the workshop now available for download!
EGYPT: a statistical machine translation toolkit
 
One Statistical Machine Translation
Team Goals
Project Description (old)
Project Goals (7/12/99)
Subtle Ad
Team Members
Kevin Knight, USC/ISI (Team Leader) knight@isi.edu and knight@clsp.jhu.edu
Yaser Al-Onaizan, USC/ISI yaser@isi.edu and yaser@clsp.jhu.edu
David Purdy, DoD dpurdy_smt@rofti.org and dpurdy@clsp.jhu.edu
Jan Curin, Charles Univ, CR curin@ufal.ms.mff.cuni.cz and curin@clsp.jhu.edu
Michael Jahr, Stanford mjahr@stanford.edu and jahr@clsp.jhu.edu
John Lafferty, CMU lafferty@cmu.edu and lafferty@clsp.jhu.edu
Dan Melamed, West Group
Noah Smith, UMD nasmith@cs.umd.edu and nasmith@clsp.jhu.edu
Franz Josef Och, RWTH Aachen och@i6.informatik.rwth-aachen.de and och@clsp.jhu.edu
David Yarowsky, CLSP/JHU yarowsky@blaze.cs.jhu.edu
 
Technical Papers & Resources
NEW!! Downloadable software tools developed during the workshop
FINAL REPORT, 12/11/99
First Planning Meeting (April, 1999)
Second Planning Meeting (May, 1999)
First Project Report (July 24, 1999)
Second Project Report (August 11, 1999)
"A Statistical MT Tutorial Workbook" (also available as a Word version)
Initial Translation Model To-Do List
User-Level Documentation: Translation Model Training
Code-Level Documentation: Translation Model Training
"The Mathematics of Statistical Machine Translation" (P. Brown, S. Della Pietra, V. Della Pietra, R. Mercer), appeared in Computational Linguistics 19(2), 1993.
User-Level Documentation: Language Model Training
Code-Level Documentation: Language Model Training
User-Level Documentation: Decoding
Code-Level Documentation: Decoding
Documented Script: Analyzing Rough Features of Corpus
Whittle: Corpus Preparation Tool
Other Useful Corpus-Preparation Scripts (prep and split)
Facts about the Czech/English Corpus
Facts about the 50K French/English Corpus, very rare words replaced by UNK
Facts about the 100K French/English Corpus, very rare words replaced by UNK
Facts about the 200K French/English Corpus, very rare words replaced by UNK
Facts about the 300K French/English Corpus, very rare words replaced by UNK
Facts about the 400K French/English Corpus, very rare words replaced by UNK
Facts about the 500K French/English Corpus, very rare words replaced by UNK
Cairo: Alignment Inspection Tool (see sample screen dump)
How to Put Alignments in Cairo-Viewable Format
Czech Environment Settings
Evaluation of Czech/English Translations
 

The Center for Language and Speech Processing

Johns Hopkins University

3400 N. Charles Street, Barton Hall, Baltimore, MD 21218 
Telephone: 410 516 4237 Fax: 410 516 5050 E-mail: clsp@jhu.edu