JHU-CLSP-WS2003: Confidence Estimation Laboratory (July 10, 2003)
Author: Simona Gandrabur

Mini-Doc for each entry of the SAMPLE-NBEST file (one entry per line):
----------------------------------------------------------------------

NBESTLOGALL
0     // sentence number
-1    // not used: number of errors
#     // separator symbol in front of Chinese sentence
今年 前 两月 广东 高 新技术 产品 出口 37.6亿 美元
#     // sep-symbol before MT-tokenized English sentence
the first two months of this year guangdong high @-@ tech products 3.76 billion us dollars
#     // sep-symbol before parse-tokenized English sentence
the first two months of this year guangdong high @-@ tech products 3.76 billion us dollars
@     // sep-symbol in front of hidden variable info
-1 -1 -1 -1 -1 2181658 2181658 2181658 2181658 84766 -1 -1 -1 -1 -1 -1 -1 -1
s     // sep-symbol in front of alignment info
A 11 4 A 10 4 A 9 4 A 8 4 A 7 3
#     // sep-symbol in front of feature function value info
CostsLanguageModel -65.242 CostsLanguageModel2 -73.3438 CostsLanguageModel3 -69.7971 CostsLanguageModel4 -74.1036 CostsWordPenalty -17 CostsAlTempPenalty -4 CostsSWLex -6.34862 CostsAlTemp -1.58593 CostsRuleBased1 -2

Mini-Doc for extracted n-best features and tags:
-----------------------------------------------

The SAMPLE-NBEST.dat file contains one line of 18 features + one tag
per n-best alternative:

 0: sentence number (irrelevant for training, only for convenience)
 1: rank in n-best list
 2: CostsLanguageModel
 3: CostsLanguageModel2
 4: CostsLanguageModel3
 5: CostsLanguageModel4
 6: CostsWordPenalty
 7: CostsAlTempPenalty
 8: CostsSWLex
 9: CostsAlTemp
10: CostsRuleBased1
11: CostsRuleBased2
12: CostsRuleBased3
13: CostsJump
14: source sentence length
15: target (translation) length
16: source-length/target-length ratio
17: base-model sentence score
18: wer = tag

test data:  1/10 of SAMPLE-NBEST.dat
train data: 9/10 of SAMPLE-NBEST.dat

The corresponding SAMPLE-NBEST-thres.tr/.te files must be in Torch format
(header = #lines #cols).
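
As an illustration of the Torch header mentioned above, here is a minimal Python sketch. It is not part of the lab scripts. It assumes the header is a single first line holding the number of data lines and the number of columns, separated by a space, and that the column count includes the tag column. Check the examples in $CE_DATA before relying on this. The function name to_torch_format and the toy rows are hypothetical.

```python
def to_torch_format(rows):
    """Return the rows as text, preceded by a '<n_lines> <n_cols>' header line.

    Assumption: the header counts every column, tag column included.
    """
    rows = [list(r) for r in rows]
    if not rows:
        raise ValueError("no data rows")
    n_cols = len(rows[0])
    if any(len(r) != n_cols for r in rows):
        raise ValueError("rows have differing column counts")
    header = f"{len(rows)} {n_cols}"
    body = [" ".join(str(v) for v in r) for r in rows]
    return "\n".join([header] + body) + "\n"


# Toy rows (made-up values, only 4 columns instead of the real 19).
sample = [[0, 1, -65.242, 1], [0, 2, -70.5, 0]]
text = to_torch_format(sample)
print(text, end="")
```

Keeping the header-free rows in a separate file is still needed for the Matlab ROC computations.
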
The wer tag was replaced by 0/1 tags according to an arbitrarily selected
threshold: translations with a wer smaller than the threshold are tagged 1,
all others 0.

Setup:
------

CE_HOME = /export/ws03_est
CE_DATA = $CE_HOME/data
CE_SRC  = $CE_HOME/src

mkdir ~you/labce/
mkdir ~you/data/
cd ~you/data/
ln -s $CE_HOME/data/SAMPLE-NBEST .
ln -s $CE_HOME/data/SAMPLE-NBEST.ref .
ln -s $CE_HOME/data/SAMPLE-NBEST.te .
ln -s $CE_HOME/data/SAMPLE-NBEST.tr .

TASK 1: Generate base .tr/.te files
-----------------------------------

To generate the SAMPLE-NBEST-thres.tr/.te files IN YOUR CURRENT DIRECTORY (!!!),
run the $CE_SRC/thres.pl script. These files are big! Delete them after the lab!
Start with thres = 75.
Adapt your .tr/.te files to Torch format (look in $CE_DATA for examples),
but keep a copy of the original-format .te file (without the header)
for the Matlab ROC computations.

TASK 2: Train/test MLPs on your .tr/.te files
---------------------------------------------

Try 3 different MLP architectures: 0, 10, 25 hidden units (hu).
See example and usage in $CE_HOME/example_mlp.

TASK 3: Compare base-model sentence score ROC with CE-score ROC
---------------------------------------------------------------

To generate ROC curves, run the $CE_HOME/matlab/ROC.m file in Matlab.
See example and usage in $CE_HOME/example_ROC.

TASK 4: Compare discriminativity of features
--------------------------------------------

Run the Matlab function $CE_HOME/matlab/CA_CR_table() for individual features
on the .te file (in the original format, no Torch header).
Compare IROC and equal error rate (EER = threshold where FR = FA).
Compare with the results obtained with the CE-score.

TASK 5: Compare wer/0/1-tagging thresholds
------------------------------------------

Repeat steps 1-4 with different thresholds (don't keep the data files,
just the resulting ROC curves and CA-CR tables).
What happens?

TASK 6: HAVE FUN :-)
--------------------
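
For reference, the 0/1 tagging rule and the equal-error point from the tasks above can be sketched in a few lines of Python. This is only an illustration, not the code in thres.pl, ROC.m or CA_CR_table(). It assumes that a higher score means "more likely correct" (flip the sign for cost-like features) and that a translation is accepted when its score is at or above the cut. All function names and the toy numbers are hypothetical.

```python
def tag_by_wer(wers, thres):
    """Tag 1 when the wer is strictly smaller than thres, otherwise 0."""
    return [1 if w < thres else 0 for w in wers]


def fa_fr(scores, tags, cut):
    """Return (FA, FR) when accepting every item with score >= cut.

    FA: fraction of tag-0 items that are accepted (false acceptance).
    FR: fraction of tag-1 items that are rejected (false rejection).
    """
    neg = [s for s, t in zip(scores, tags) if t == 0]
    pos = [s for s, t in zip(scores, tags) if t == 1]
    if not neg or not pos:
        raise ValueError("need at least one item of each tag")
    fa = sum(s >= cut for s in neg) / len(neg)
    fr = sum(s < cut for s in pos) / len(pos)
    return fa, fr


def equal_error_point(scores, tags):
    """Return (cut, FA, FR) for the observed score where |FA - FR| is smallest."""
    best = None
    for cut in sorted(set(scores)):
        fa, fr = fa_fr(scores, tags, cut)
        if best is None or abs(fa - fr) < abs(best[1] - best[2]):
            best = (cut, fa, fr)
    return best


# Toy data (made up): six translations with their wer and a confidence score.
wers = [10, 40, 60, 80, 90, 120]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
tags = tag_by_wer(wers, 75)
cut, fa, fr = equal_error_point(scores, tags)
print(tags, cut, fa, fr)
```

Re-running the sketch with a different thres changes the tags and therefore the FA/FR trade-off, which is the effect Task 5 asks you to observe.
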