Training the Parameters of the Hidden Dynamic Model of Speech Production

In our approach, the properties of the filter (the hidden dynamics) remain fixed during training. Because the filter is governed by a single parameter, that parameter can reasonably be chosen in advance (perhaps!).
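The text does not give the filter's exact form. As a purely illustrative assumption, the sketch below uses a first-order, target-directed filter in which one constant `gamma` controls how quickly the hidden trajectory approaches each target, z[t] = gamma * z[t-1] + (1 - gamma) * T[t]. The function name `filter_targets` is hypothetical.

```python
def filter_targets(targets, gamma, z0):
    """Run an assumed first-order target-directed filter over a frame sequence.

    targets: one target vector per frame (piecewise constant over each phone).
    gamma:   the single filter parameter, 0 <= gamma < 1, held fixed in training.
    z0:      initial hidden-state vector (an assumption; not specified in the text).
    Returns the hidden trajectory as a list of vectors, one per frame.
    """
    z = list(z0)
    trajectory = []
    for target in targets:
        # Move part of the way from the previous state towards the current target.
        z = [gamma * zi + (1.0 - gamma) * ti for zi, ti in zip(z, target)]
        trajectory.append(z)
    return trajectory


# Three frames with the same 1-D target: the state approaches 1.0 smoothly.
traj = filter_targets([[1.0], [1.0], [1.0]], gamma=0.5, z0=[0.0])
print(traj)  # [[0.5], [0.75], [0.875]]
```

With `gamma = 0` the trajectory jumps straight to each target; values closer to 1 give slower, smoother transitions between targets.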

The remaining model parameters, however, must be estimated from training data:

  1. Target values (a vector in hidden parameter space for each phone class)
  2. Non-linear mapping parameters (MLP weights)

These trainable parameters are marked by the red "variable" arrows in Fig 1.
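To make the two sets of trainable parameters concrete, here is a minimal forward-pass sketch. Everything beyond the text is an assumption: the per-phone target dictionary, a one-hidden-layer tanh MLP, a zero initial hidden state, and a first-order filter z[t] = gamma * z[t-1] + (1 - gamma) * T[t] standing in for the single-parameter hidden dynamics. All function names are hypothetical. Note that `gamma` is passed separately from `params`, since only the targets and the MLP weights are trained.

```python
import math


def expand_targets(segments, targets):
    """Turn a phone segmentation [(phone, n_frames), ...] into one target per frame."""
    frames = []
    for phone, n_frames in segments:
        frames.extend([targets[phone]] * n_frames)
    return frames


def filter_targets(frames, gamma, z0):
    """Assumed first-order filter: z[t] = gamma * z[t-1] + (1 - gamma) * T[t]."""
    z, trajectory = list(z0), []
    for target in frames:
        z = [gamma * zi + (1.0 - gamma) * ti for zi, ti in zip(z, target)]
        trajectory.append(z)
    return trajectory


def mlp_forward(z, W1, b1, W2, b2):
    """One-hidden-layer tanh MLP mapping a hidden state to an acoustic vector."""
    h = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b) for row, b in zip(W1, b1)]
    return [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]


def predict(segments, params, gamma):
    """Predict one acoustic vector per frame.

    Trainable: params["targets"] (phone -> target vector) and the MLP weights.
    Fixed:     gamma, the single filter parameter.
    """
    frames = expand_targets(segments, params["targets"])
    z0 = [0.0] * len(frames[0])  # assumed initial hidden state
    trajectory = filter_targets(frames, gamma, z0)
    return [mlp_forward(z, params["W1"], params["b1"], params["W2"], params["b2"])
            for z in trajectory]


# Toy example: two phone classes, 2-D hidden space, 3 hidden units, 1-D output.
params = {
    "targets": {"a": [1.0, 0.0], "i": [0.0, 1.0]},
    "W1": [[0.5, -0.5], [0.2, 0.1], [0.0, 1.0]],
    "b1": [0.0, 0.0, 0.0],
    "W2": [[0.3, -0.2, 0.4]],
    "b2": [0.1],
}
outputs = predict([("a", 2), ("i", 3)], params, gamma=0.6)
print(len(outputs))  # 5 (one prediction per frame)
```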

Fig 1. Model parameters and backpropagation paths.

Given a set of training utterances, each with its correct phone sequence and timings as well as the corresponding acoustic data, the model parameters can be adjusted to reduce the error on these data. This is made easier by the fact that error derivatives can be backpropagated through the MLP to its weights, and through both the MLP and the hidden dynamics to the targets. The derivatives of the error function with respect to the MLP weights and the target parameters are therefore both available, so all of these parameters can be trained by gradient descent.
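The training procedure described above can be sketched in plain Python. The following are assumptions, not details from the text: a squared-error objective, a first-order filter z[t] = gamma * z[t-1] + (1 - gamma) * T[t] with `gamma` fixed, a one-hidden-layer tanh MLP, a zero initial hidden state, and synthetic toy observations. All function names are hypothetical. The backward pass accumulates MLP weight gradients frame by frame, then carries the error on each hidden state back through the filter recursion, g[t] = dE/dz[t] + gamma * g[t+1], so that each frame contributes (1 - gamma) * g[t] to the gradient of its phone's target.

```python
import math
import random


def forward(segments, params, gamma):
    """Forward pass, caching (phone, z, h, y) for every frame."""
    z = [0.0] * len(next(iter(params["targets"].values())))  # assumed z[0] = 0
    cache = []
    for phone, n_frames in segments:
        target = params["targets"][phone]
        for _ in range(n_frames):
            # Fixed hidden dynamics: z[t] = gamma*z[t-1] + (1-gamma)*T[t]
            z = [gamma * zi + (1.0 - gamma) * ti for zi, ti in zip(z, target)]
            h = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
                 for row, b in zip(params["W1"], params["b1"])]
            y = [sum(w * hj for w, hj in zip(row, h)) + b
                 for row, b in zip(params["W2"], params["b2"])]
            cache.append((phone, z, h, y))
    return cache


def loss_and_grads(segments, obs, params, gamma):
    """Squared error E = 0.5 * sum_t |y[t] - obs[t]|^2 and its gradients."""
    cache = forward(segments, params, gamma)
    W1, W2 = params["W1"], params["W2"]
    grads = {"W1": [[0.0] * len(r) for r in W1], "b1": [0.0] * len(W1),
             "W2": [[0.0] * len(r) for r in W2], "b2": [0.0] * len(W2),
             "targets": {p: [0.0] * len(v) for p, v in params["targets"].items()}}
    loss = 0.0
    g_next = [0.0] * len(cache[0][1])  # dE/dz[t+1]; zero after the last frame
    for t in reversed(range(len(cache))):
        phone, z, h, y = cache[t]
        dy = [yk - ok for yk, ok in zip(y, obs[t])]
        loss += 0.5 * sum(d * d for d in dy)
        # Output layer: y = W2 h + b2.
        for k, d in enumerate(dy):
            grads["b2"][k] += d
            for j, hj in enumerate(h):
                grads["W2"][k][j] += d * hj
        # Hidden layer: h = tanh(W1 z + b1).
        dh = [sum(W2[k][j] * dy[k] for k in range(len(dy))) for j in range(len(h))]
        da = [dhj * (1.0 - hj * hj) for dhj, hj in zip(dh, h)]
        for j, d in enumerate(da):
            grads["b1"][j] += d
            for i, zi in enumerate(z):
                grads["W1"][j][i] += d * zi
        # Dynamics: z[t] feeds y[t] directly and z[t+1] through gamma.
        dz = [sum(W1[j][i] * da[j] for j in range(len(da))) for i in range(len(z))]
        g = [dzi + gamma * gi for dzi, gi in zip(dz, g_next)]
        for i, gi in enumerate(g):
            grads["targets"][phone][i] += (1.0 - gamma) * gi
        g_next = g
    return loss, grads


def sgd_step(params, grads, lr):
    """Plain gradient descent on the MLP weights and the phone targets."""
    for name in ("W1", "W2"):
        for row, grow in zip(params[name], grads[name]):
            for j, gj in enumerate(grow):
                row[j] -= lr * gj
    for name in ("b1", "b2"):
        for j, gj in enumerate(grads[name]):
            params[name][j] -= lr * gj
    for phone, vec in params["targets"].items():
        for i, gi in enumerate(grads["targets"][phone]):
            vec[i] -= lr * gi


def rand():
    return random.uniform(-0.5, 0.5)


# Toy run: known segmentation, synthetic 2-D "acoustic" observations.
random.seed(0)
segments = [("a", 4), ("i", 3), ("a", 3)]
obs = [[math.sin(0.3 * t), math.cos(0.3 * t)] for t in range(10)]
params = {"targets": {"a": [rand(), rand()], "i": [rand(), rand()]},
          "W1": [[rand(), rand()] for _ in range(4)], "b1": [0.0] * 4,
          "W2": [[rand() for _ in range(4)] for _ in range(2)], "b2": [0.0] * 2}
gamma = 0.7  # fixed filter parameter, chosen beforehand
first, _ = loss_and_grads(segments, obs, params, gamma)
for _ in range(200):
    _, grads = loss_and_grads(segments, obs, params, gamma)
    sgd_step(params, grads, lr=0.02)
final, _ = loss_and_grads(segments, obs, params, gamma)
print(f"loss: {first:.3f} -> {final:.3f}")
```

Hand-derived gradients like these are easy to get subtly wrong; comparing one component against a central finite difference of the loss is a cheap check.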