Switchboard Trained Hidden Dynamic Model
HBR 180898
The following figures show how well the HDM can reproduce the data used to
train it, when given the same phone sequence and timing as the original.
In each figure three plots are shown:
- A spectrogram of the original data
- A spectrogram of the reproduction using the HDM
- A plot of the acoustic error
This final plot shows where the model fits the given data poorly, and
highlights some issues which require further attention.
In particular, we have noticed the following problems:
- Transcription errors
- Poor alignment
- Poor modelling
After an initial scan through the training data, it seems that the acoustic
error may be dominated by errors due to mistranscription, misalignment, and
signal quality (notice the clicks in the spectrograms below).
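As a rough illustration of how an acoustic error trace like the one in the third plot might be computed, the sketch below takes the per-frame mean squared difference between the original and reproduced spectrograms and lists the worst-fitting frames. The error measure, the function names `frame_error` and `worst_frames`, and the toy data are all assumptions made for illustration; this page does not state how the error was actually calculated.

```python
def frame_error(original, reproduced):
    """Per-frame mean squared difference between two spectrograms.

    Each spectrogram is a list of frames, and each frame is a list of
    spectral values for one time step. This measure is an assumption,
    not necessarily the one used to produce the plots.
    """
    if len(original) != len(reproduced):
        raise ValueError("spectrograms must have the same number of frames")
    errors = []
    for orig_frame, repr_frame in zip(original, reproduced):
        if len(orig_frame) != len(repr_frame) or not orig_frame:
            raise ValueError("frames must be non-empty and of equal size")
        squared = sum((o - r) ** 2 for o, r in zip(orig_frame, repr_frame))
        errors.append(squared / len(orig_frame))
    return errors


def worst_frames(errors, k=3):
    """Indices of the k frames with the largest error, worst first."""
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return order[:k]


# Toy data (hypothetical): three frames, two channels each.
original = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reproduced = [[1.0, 2.0], [3.0, 2.0], [4.0, 6.0]]

errors = frame_error(original, reproduced)
print(errors)                   # [0.0, 2.0, 0.5]
print(worst_frames(errors, 2))  # [1, 2]
```

Frames picked out this way would still need checking by hand, since a high error does not by itself distinguish mistranscription, misalignment, poor signal quality, or poor modelling.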
Fig 1. Reproduction of the training data using the same segmentation and labelling.
The speaker says "salary" here.
Fig 2 shows an example of a poor fit to the data because the left context
is missing (the file has been chopped here in the preparation stage).
Fig 2. A chopped file, removing the left context.
Fig 3 shows an example of misalignment. It is possible that problems like
these could be rectified in a realignment phase.
Fig 3. An example of transcription error in the training data.