Pronunciation Group Meeting 23.6.97  (Sanjeev Khudanpur)

 1.	Bill Byrne, Sanjeev Khudanpur, Mike Riley, Murat Saraclar and
      Chuck Wooters were present.

 2.	Murat reported some ongoing work in preparation for building the
      vanilla system for the workshop.  The tree-building tools
      definitely work.

  (a)	Approximately two hours of ICSI transcriptions were used to build
      decision trees for the phoneme-to-phone mapping.  They were tested
      on a 32-minute portion of the WS96 dev-test set, which has also
      been transcribed by ICSI.

  (b)	A second set of trees was built by mixing a subset of TIMIT with
      the ICSI data.  These trees were also tested on the 32-minute set
      mentioned above.

(c&d)	A third set of trees was grown using only TIMIT training data.
      These trees were tested on the 32-minute ICSI test set as well as
      on a test set of comparable size drawn from TIMIT.

      Because the tree probabilities are not smoothed, the test set
      contains unseen events that receive zero probability.  Rather than
      smoothing the trees, we discarded the worst-scoring 10% of test
      tokens when computing the following test results.

        Experiment                  (a)     (b)     (c)     (d)
        Training data               ICSI    I+T     TIMIT   TIMIT
        Test data                   ICSI    ICSI    ICSI    TIMIT

        Perplexity at root (bits)   0.9109  0.8749  0.9084  0.3402
        Perplexity of tree (bits)   0.6827  0.6099  0.7406  0.1524
        Efficiency I(X;Y)/H(X)      25.05%  30.28%  18.48%  55.19%

      Note that while the TIMIT/TIMIT trees reduce the "unigram" (root)
      perplexity by more than half, the ICSI/ICSI trees reduce it by
      only about a quarter.
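      The figures in the table above can be reproduced along the
      following lines.  This is a minimal Python sketch under our own
      assumptions: the function names are hypothetical, each test token
      is represented by the probability its phone receives at the tree
      root and at its leaf, and the discarded 10% are taken to be the
      tokens with the lowest leaf probability, with the same tokens
      dropped from the root figure.  The minutes do not say exactly how
      the worst tokens were chosen.

```python
import math
from typing import List, Tuple


def trimmed_entropies(tokens: List[Tuple[float, float]],
                      discard_frac: float = 0.10) -> Tuple[float, float]:
    """Return (root, tree) cross-entropy in bits per token.

    Each token is (p_root, p_tree): the probability of the observed phone
    at the tree root and at the leaf the token falls into.  ASSUMPTION:
    the worst tokens are those with the lowest leaf probability, and the
    same tokens are dropped from both averages.
    """
    ordered = sorted(tokens, key=lambda t: t[1])  # worst leaf scores first
    n_drop = int(round(discard_frac * len(ordered)))
    kept = ordered[n_drop:]
    if not kept or any(pr <= 0.0 or pt <= 0.0 for pr, pt in kept):
        raise ValueError("zero-probability tokens remain after trimming")
    h_root = -sum(math.log2(pr) for pr, _ in kept) / len(kept)
    h_tree = -sum(math.log2(pt) for _, pt in kept) / len(kept)
    return h_root, h_tree


def efficiency(h_root: float, h_tree: float) -> float:
    """I(X;Y)/H(X), with I(X;Y) = H(X) - H(X|Y): the share of the root
    entropy removed by the tree's questions."""
    return (h_root - h_tree) / h_root


# Recompute the Efficiency row from the rounded table entries.
TABLE = {"(a)": (0.9109, 0.6827), "(b)": (0.8749, 0.6099),
         "(c)": (0.9084, 0.7406), "(d)": (0.3402, 0.1524)}
for name, (h_r, h_t) in TABLE.items():
    print(f"{name}: {100 * efficiency(h_r, h_t):.2f}%")
```

      Run on the rounded table entries, this prints 25.05%, 30.29%,
      18.47% and 55.20%, within 0.02 percentage points of the reported
      row; the residue comes from rounding of the table values.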

 3.	While on the subject of the 32-minute ICSI test set, we note that
      its sentences are a shade harder to predict than those in the
      training set!  The test set is about one fourth the size of the
      training set.  Murat randomly divided the 2-hour ICSI training
      set into four roughly equal parts and built five sets of
      "root-only" trees: one from each fourth and one from the test
      set.  Each set of trees was then tested on every data set,
      including its own training data.


                         Trained on
        Tested on        1       2       3       4       test
            1            0.69    0.71    0.71    0.71    0.78
            2            0.70    0.71    0.71    0.71    0.79
            3            0.69    0.72    0.71    0.71    0.79
            4            0.70    0.72    0.71    0.71    0.79
          test           0.92    0.91    0.91    0.91    0.87

      At this point, we conjecture that the differences arise because
      the 32-minute set comes from last year's labeling effort, while
      most of the training material was labeled this year using
      slightly different conventions!
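      The cross-testing experiment in item 3 can be sketched as below.
      This is a minimal Python illustration, not Murat's code.  It
      assumes that a root-only tree is just the context-free
      distribution P(phone | phoneme) estimated by counting, and that
      each data set is a list of utterances, each a list of (phoneme,
      phone) pairs.  Unlike the 10% trimming used in item 2, it floors
      unseen events at a small probability for simplicity.  All names
      are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Utterance = List[Tuple[str, str]]    # (phoneme, phone) pairs
Model = Dict[str, Dict[str, float]]  # phoneme -> {phone: P(phone | phoneme)}


def fit_root_only(utterances: List[Utterance]) -> Model:
    """Relative-frequency estimate of P(phone | phoneme), ignoring context."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for utt in utterances:
        for phoneme, phone in utt:
            counts[phoneme][phone] += 1
    return {ph: {p: c / sum(cnt.values()) for p, c in cnt.items()}
            for ph, cnt in counts.items()}


def cross_entropy(model: Model, utterances: List[Utterance],
                  floor: float = 1e-6) -> float:
    """Average -log2 P(phone | phoneme) in bits per token.
    SIMPLIFICATION: unseen events are floored rather than discarded."""
    total, n = 0.0, 0
    for utt in utterances:
        for phoneme, phone in utt:
            p = model.get(phoneme, {}).get(phone, 0.0)
            total -= math.log2(max(p, floor))
            n += 1
    return total / n


def split_four(utterances: List[Utterance], seed: int = 0) -> List[List[Utterance]]:
    """Randomly divide whole utterances into four roughly equal parts."""
    shuffled = list(utterances)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::4] for i in range(4)]


def cross_test(sets: Dict[str, List[Utterance]]) -> Dict[Tuple[str, str], float]:
    """Entropy for every (test set, training set) pair."""
    models = {name: fit_root_only(data) for name, data in sets.items()}
    return {(test, train): cross_entropy(models[train], sets[test])
            for test in sets for train in sets}
```

      With hypothetical lists train_utts and test_utts, one would build
      sets = {str(i + 1): part for i, part in
      enumerate(split_four(train_utts))}, add sets["test"] = test_utts,
      and read the entry in row "test", column "1" of the table as
      cross_test(sets)[("test", "1")].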

 4.	There was also some discussion of whether entropy-like measures
      are appropriate for judging the quality of the decision trees for
      our purposes.  It was felt that this should be resolved soon, so
      that the benefits of adding new features to the trees, or of other
      modifications, can be assessed without running a full recognition
      experiment.  The discussion will resume when we meet in July.

 5.	Mike reported that the FSA tools for converting a (training) word
      `sequence' plus trees into a phone network seem to be working
      well.  The FSA tools for converting a (test) word `lattice' plus
      trees into a phone graph are almost working; Mike is weeding out
      the last of the bugs.
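      To make the idea behind item 5 concrete, here is a toy Python
      sketch of expanding a word sequence into a phone network.  It is
      not Mike's FSA tools and uses no FSA library.  It assumes a
      lexicon mapping each word to a single phoneme string, and a
      hypothetical predict(left, center, right) function standing in
      for the decision trees, returning a distribution over surface
      phones for a phoneme given its neighbours; the empty string marks
      a deletion (an epsilon arc).  The real trees may ask about richer
      context, and the toy probabilities are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Arc = Tuple[int, int, str, float]  # (from_state, to_state, phone, prob); "" = deletion
Predictor = Callable[[str, str, str], Dict[str, float]]


def word_sequence_to_phone_network(words: List[str],
                                   lexicon: Dict[str, List[str]],
                                   predict: Predictor) -> Tuple[List[Arc], int]:
    """Build a left-to-right network: state i -> i+1 for the i-th phoneme
    of the concatenated baseforms, one arc per predicted surface phone.
    Returns the arcs and the final state (the start state is 0)."""
    phonemes = [ph for w in words for ph in lexicon[w]]
    arcs: List[Arc] = []
    for i, center in enumerate(phonemes):
        # Context crosses word boundaries; sentence edges are marked.
        left = phonemes[i - 1] if i > 0 else "<s>"
        right = phonemes[i + 1] if i + 1 < len(phonemes) else "</s>"
        for phone, p in predict(left, center, right).items():
            if p > 0.0:
                arcs.append((i, i + 1, phone, p))
    return arcs, len(phonemes)


def count_paths(arcs: List[Arc], final: int) -> int:
    """Number of distinct start-to-final paths (arcs only go i -> i+1)."""
    paths = [0] * (final + 1)
    paths[0] = 1
    for src, dst, _, _ in sorted(arcs):
        paths[dst] += paths[src]
    return paths[final]


def toy_predict(left: str, center: str, right: str) -> Dict[str, float]:
    """HYPOTHETICAL stand-in for a tree lookup; numbers are made up."""
    if center == "t" and right == "</s>":
        return {"t": 0.6, "": 0.4}  # utterance-final /t/ may be deleted
    if center == "dh":
        return {"dh": 0.8, "d": 0.2}
    return {center: 1.0}


LEXICON = {"the": ["dh", "ah"], "cat": ["k", "ae", "t"]}
arcs, final = word_sequence_to_phone_network(["the", "cat"], LEXICON, toy_predict)
```

      For "the cat" this yields 7 arcs over states 0 to 5 and 4 paths,
      two choices for /dh/ times two for the final /t/.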

 6.	Bill reported on his ongoing work with vocal tract length (VTL)
      normalization and MLLR-based adaptation.  The gains from these two
      techniques do not seem to be as large in our HTK-based system as
      those some other sites (e.g., BBN) reported in the last Hub5
      evaluation.  The following are the results on the WS96 dev-test.

        System                          Word accuracy
        Baseline system with MF-PLP     54.19%
        VTL on test only                55.25%
        VTL on train + test             55.66%
        VTL + one pass of MLLR          56.04%
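      As a quick check on how large these gains are, the sketch below
      turns the accuracies above into absolute gains and relative
      reductions in word error.  It assumes WER = 100% minus word
      accuracy, which holds when accuracy is computed net of insertions
      (as HTK's accuracy figure is); the function name is ours.

```python
from typing import Dict, Tuple

# Word accuracies (%) on the WS96 dev-test, copied from the table above.
RESULTS = {
    "Baseline (MF-PLP)": 54.19,
    "VTL on test only": 55.25,
    "VTL on train + test": 55.66,
    "VTL + one pass of MLLR": 56.04,
}


def gains(results: Dict[str, float], baseline: str) -> Dict[str, Tuple[float, float]]:
    """Return {system: (absolute accuracy gain, relative WER reduction in %)}.
    ASSUMPTION: WER = 100 - word accuracy."""
    base_acc = results[baseline]
    base_wer = 100.0 - base_acc
    return {name: (round(acc - base_acc, 2),
                   round(100.0 * (acc - base_acc) / base_wer, 1))
            for name, acc in results.items()}


for name, (abs_gain, rel) in gains(RESULTS, "Baseline (MF-PLP)").items():
    print(f"{name:25s} +{abs_gain:.2f} abs, {rel:.1f}% relative WER reduction")
```

      The full VTL + MLLR chain gains 1.85 points, i.e. it removes about
      4% of the baseline word errors.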

      For VTL normalization, the warping factor was chosen to maximize
      the likelihood of the output of the first decoding pass; a second
      decoding pass was then run on the warped data, and MLLR was
      performed on the output of this second pass.  These gains are not
      very large, and it would be good to know why; even so, we expect
      the improvement to be enough to reduce the lattice WER
      substantially.
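      The maximum-likelihood choice of warping factor described above
      amounts to a search over candidate factors.  The toy Python sketch
      below is our own illustration, not the HTK-based system: it warps
      the frequency axis of each spectral frame linearly, scores the
      warped frames under a single diagonal Gaussian that stands in for
      the acoustic models aligned to the first-pass output, and keeps
      the factor with the highest likelihood.  The function names, the
      linear warp and the grid of factors are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def warp_frame(frame: Sequence[float], alpha: float) -> List[float]:
    """Linearly warp the frequency axis (ASSUMED warp shape): output bin k
    takes the value at input position alpha * k, linearly interpolated
    and clamped to the last bin."""
    n = len(frame)
    out = []
    for k in range(n):
        pos = min(alpha * k, n - 1)
        lo = int(pos)  # pos >= 0, so int() is floor()
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * frame[lo] + frac * frame[hi])
    return out


def log_likelihood(frames: Sequence[Sequence[float]],
                   mean: Sequence[float], var: Sequence[float]) -> float:
    """Sum of diagonal-Gaussian log densities over all frames and bins.
    Stand-in for the likelihood of the first-pass alignment."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for f in frames for x, m, v in zip(f, mean, var))


def choose_warp(frames: Sequence[Sequence[float]], mean: Sequence[float],
                var: Sequence[float], grid: Sequence[float]) -> Tuple[float, float]:
    """Return (alpha, log-likelihood) for the best warp factor in grid."""
    scored = [(alpha, log_likelihood([warp_frame(f, alpha) for f in frames],
                                     mean, var))
              for alpha in grid]
    return max(scored, key=lambda s: s[1])
```

      In the real system the chosen factor would be applied to the
      front end before the second decoding pass; here the search is
      exhaustive over a small grid because each candidate costs only one
      rescoring.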

If you have managed to read all the way down to this line, you are one
tenacious person :)

Cheers,

	- Sanjeev

Harriet Nock
Last modified: Sat Jul 5 14:58:29 EDT 1997