Formants as the Hidden Space
HBR 180898



Here, we have trained our Hidden Dynamic Model on the data shown on the previous page.

The hidden dynamic parameters (admittedly after some scaling and offsetting) have been plotted under a spectrogram of speech synthesised by the model. As can plainly be seen, in this case the hidden parameters are the formants on some warped frequency scale.

Nothing in the construction of the HDM, however, constrains the learning process to make the hidden dynamic space coincide with formant space. This is because an equivalent HDM can be constructed in which the hidden space is transformed by an arbitrary invertible linear transform: it is possible simply to transform the targets by an arbitrary non-singular matrix, and to modify the linear transform given by the first layer of MLP weights by the inverse of this matrix, giving an equivalent HDM.
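The equivalence argument can be checked numerically. The following is a minimal sketch, not the model used here: the layer sizes, the tanh hidden units, the random weights and the first-order smoothing with a single scalar time constant (`alpha`) are all assumptions made for illustration. With a scalar time constant the dynamics commute with the matrix `A`; with a different time constant per dimension, the dynamics would need transforming as well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2-D hidden space, 8 hidden units, 16 spectral outputs.
hidden_dim, n_units, n_out = 2, 8, 16

# Random stand-ins for trained MLP weights (assumption, not real parameters).
W1 = rng.normal(size=(n_units, hidden_dim))
b1 = rng.normal(size=n_units)
W2 = rng.normal(size=(n_out, n_units))
b2 = rng.normal(size=n_out)


def mlp(z, W1, b1, W2, b2):
    """Map a hidden trajectory z (frames x hidden_dim) to output frames."""
    return np.tanh(z @ W1.T + b1) @ W2.T + b2


def trajectory(targets, z0, alpha=0.7):
    """Assumed dynamics: first-order smoothing of the state towards each target."""
    z = np.empty_like(targets)
    prev = z0
    for t, target in enumerate(targets):
        prev = alpha * prev + (1.0 - alpha) * target
        z[t] = prev
    return z


# An arbitrary non-singular matrix (determinant 1.32).
A = np.array([[1.5, -0.4],
              [0.3, 0.8]])

targets = rng.normal(size=(50, hidden_dim))  # made-up per-frame targets
z0 = np.zeros(hidden_dim)

# Original model.
z = trajectory(targets, z0)
out = mlp(z, W1, b1, W2, b2)

# Equivalent model: targets transformed by A, first-layer weights by inv(A).
z_new = trajectory(targets @ A.T, A @ z0)
W1_new = W1 @ np.linalg.inv(A)
out_new = mlp(z_new, W1_new, b1, W2, b2)

outputs_match = np.allclose(out, out_new)
hidden_spaces_differ = not np.allclose(z, z_new)
print("outputs match:", outputs_match)
print("hidden trajectories differ:", hidden_spaces_differ)
```

The two models produce the same output frames although their hidden trajectories differ, which is why training alone cannot single out formant space among its linear transformations.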

In other words, it is just coincidence in this case that we have (almost) directly uncovered the formants; in general, a two-dimensional system trained on vowel data is likely to result in some linear transformation of formant space.
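One way to test whether a learned hidden space is a linear transformation (plus offset) of formant space is to fit an affine map by least squares and look at the residual. The sketch below is illustrative only: the function name `fit_affine` is hypothetical, and the "formant" and "hidden" data are synthetic stand-ins generated in the code, not measurements from the model.

```python
import numpy as np


def fit_affine(hidden, formants):
    """Least-squares fit of formants ~= hidden @ M + c; returns M, c, RMS residual."""
    # Append a column of ones so the offset c is fitted together with M.
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, formants, rcond=None)
    M, c = coef[:-1], coef[-1]
    residual = formants - X @ coef
    rms = float(np.sqrt(np.mean(residual ** 2)))
    return M, c, rms


# Synthetic stand-in data (assumption): 200 frames of two "formants"
# in arbitrary warped-frequency units.
rng = np.random.default_rng(1)
formants = rng.uniform([2.0, 8.0], [8.0, 15.0], size=(200, 2))

# Build a "hidden space" that is an exact affine image of the formants.
true_M = np.array([[2.0, 0.5],
                   [-1.0, 1.5]])
true_c = np.array([3.0, 9.0])
hidden = (formants - true_c) @ np.linalg.inv(true_M)

M, c, rms = fit_affine(hidden, formants)
print("recovered M:\n", M)
print("recovered c:", c)
print("RMS residual:", rms)
```

On real data, a small residual would support the linear-transformation reading, while a large one would suggest the hidden space encodes something other than the formants.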

Fig 1. Reproduced spectrogram and hidden space dynamics for a vowel utterance.

It is interesting to note that the two-dimensional parameter set has managed to encode silence by taking the second formant below the first, moving into a region of formant space that is not usually defined.

Fig 2. Transitions in the unscaled hidden dynamic space.