Switchboard Trained Hidden Dynamic Model
HBR 180898



The following figures show how well the HDM can reproduce the data used to train it, given the same phone sequence and timing. In each figure three plots are shown:

This final plot shows where the model fits the given data poorly, and highlights some things which require further attention. In particular, after an initial scan through the training data, it seems that the acoustic error may be dominated by errors due to mistranscription, misalignment, and poor signal quality (notice the clicks in the spectrograms below).

Fig 1. Reproduction of the training data using the same segmentation and labelling.
The speaker says "salary" here.
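
The suspicion that a few bad regions dominate the acoustic error can be checked with a per-frame error profile. The following is a minimal sketch only, not the procedure used to produce these figures: it assumes the observed and model-generated spectra are available as equal-length lists of frames, takes the error to be a sum of squared differences per frame, and flags frames whose error exceeds a multiple of the median. The function names, the squared-error measure, and the factor of 5 are all illustrative assumptions.

```python
from statistics import median


def frame_errors(observed, predicted):
    """Return the sum of squared differences for each frame (assumed error measure)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same number of frames")
    errors = []
    for obs, pred in zip(observed, predicted):
        if len(obs) != len(pred):
            raise ValueError("frames must have the same number of channels")
        errors.append(sum((o - p) ** 2 for o, p in zip(obs, pred)))
    return errors


def dominant_frames(errors, factor=5.0):
    """Return indices of frames whose error exceeds factor * median error.

    The factor is a hypothetical threshold, chosen here for illustration.
    """
    if not errors:
        return []
    threshold = factor * median(errors)
    return [i for i, e in enumerate(errors) if e > threshold]


def error_share(errors, indices):
    """Return the fraction of the total error contributed by the given frames."""
    total = sum(errors)
    if total == 0:
        return 0.0
    return sum(errors[i] for i in indices) / total


if __name__ == "__main__":
    # Toy data: four frames of two channels; the last frame imitates a click.
    observed = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [9.0, 2.0]]
    predicted = [[1.0, 2.5], [1.5, 2.0], [1.0, 2.5], [1.0, 2.0]]
    errs = frame_errors(observed, predicted)
    bad = dominant_frames(errs)
    print("flagged frames:", bad)
    print("share of total error:", error_share(errs, bad))
```

If a handful of flagged frames account for most of the total error, inspecting them by hand (for clicks, chopped context, or misaligned labels) is likely to be more informative than the overall error figure.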

Fig 2 shows an example of a poor fit to the data caused by missing left context (the file was chopped at this point during the preparation stage).

Fig 2. A chopped file, removing the left context.

Fig 3 shows an example of misalignment. It is possible that problems like these could be rectified in a realignment phase.

Fig 3. An example of transcription error in the training data.