Vanilla System Progress Notes 10.7.97


The JHU gang, comprising Bill, Murat, Harriet and me, is building a
baseline system for the workshop and the vanilla system for our group.

1. The pronunciation trees were rebuilt using all the available
   annotation from ICSI and most of the TIMIT annotation.  Here is a
   summary of the perplexity results on the 32-minute portion of the
   ICSI (WS96 dev-test) set.   Murat will conduct another test on a
   held-out portion of the training data.  (Recall this test set may
   be a bit odd.)

             Source of training          ICSI    I+T     TIMIT   TIMIT
             Source of test set          ICSI    ICSI    ICSI    TIMIT

             Perplexity at root (bits)   0.9008  0.8630  0.8992  0.3402

             Perplexity of tree (bits)   0.6611  0.6007  0.7330  0.1524

             Efficiency I(X;Y)/H(X)      26.61%  30.39%  18.48%  55.19%

             (I+T = ICSI + TIMIT training data)

    Remark:  The ICSI transcriptions used for testing have now been
    normalized to the new annotation conventions.  This makes
    comparisons with the old results difficult :)) 
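
    Sketch:  The efficiency row appears to be the relative reduction
    from the root figure to the tree figure, i.e. (root - tree) / root,
    reading the root figure as H(X) and the tree figure as H(X|Y).
    That reading is an assumption, not something stated above; the
    function and variable names below are made up for illustration.
    A quick check in Python:

```python
def efficiency(h_root: float, h_tree: float) -> float:
    """Relative entropy reduction (H(X) - H(X|Y)) / H(X), both in bits.

    Assumes h_root is the root figure and h_tree the tree figure
    from the table; this interpretation is ours, not the authors'.
    """
    if h_root <= 0:
        raise ValueError("root entropy must be positive")
    return (h_root - h_tree) / h_root


# (training, test) -> (root bits, tree bits, reported efficiency in %)
# Numbers copied from the table in item 1.
RESULTS = {
    ("ICSI", "ICSI"): (0.9008, 0.6611, 26.61),
    ("I+T", "ICSI"): (0.8630, 0.6007, 30.39),
    ("TIMIT", "ICSI"): (0.8992, 0.7330, 18.48),
    ("TIMIT", "TIMIT"): (0.3402, 0.1524, 55.19),
}

for (train, test), (h_root, h_tree, reported) in RESULTS.items():
    computed = 100.0 * efficiency(h_root, h_tree)
    print(f"{train:>5} -> {test:<5}  computed {computed:.2f}%  reported {reported:.2f}%")
```

    Recomputing from the rounded table entries matches the reported
    efficiencies to within about 0.01 percentage points; the small
    gap in the TIMIT/TIMIT column is presumably because the reported
    figure was computed from unrounded values.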

2.  We also outlined a procedure for training the Vanilla system in a
    manner consistent with HTK's training philosophy of incremental 
    modification.  In summary, the Vanilla system discards the
    pronouncing dictionary as the source of multiple pronunciations,
    and instead uses pronunciation trees built from the ICSI and TIMIT
    transcriptions.  Some details and the current state of affairs are
    outlined in the document

     	/people/CLSP/sanjeev/vanilla-progress.ps

3.  A usable baseline system for the workshop incorporating ML-VTLN 
    and MLLR is ready (Bill). It has a word accuracy of about 58% and a
    lattice WER of about 10% on the WS96 dev-test.  Lattices have also
    been created on the WS97 (linguistically segmented) dev-test.  
    However, this set has some other problems, such as the presence of 
    long silences inside utterances, which are probably throwing off
    the adaptation, etc.  Bill is not yet happy enough with them to
    report any results, and will have final numbers on this by the 
    weekend.

Harriet Nock
Last modified: Thu Jul 10 19:52:23 EDT 1997