Hidden Space Dynamics using Kalman Filter Algorithms
HBR/JSB 180898

A Kalman Filter (KF) computes the parameters of posterior probability distributions for certain kinds of stochastic processes. Such processes are characterized by linear transformations and additive gaussian "noise". KFs generalize the kinds of linear filters familiar in signal processing.
We can think of such processes as hidden Markov models in which the hidden state and the observations are continuous random variables. This changes the nature of the computations: Because of the linearity and the gaussian nature of the randomness, all the posteriors are gaussian too, so instead of dealing with explicit probability distributions over a finite state space we deal with means and covariances.
In our acoustic-phonetic model we use a simple KF as a fancy kind of smoother with a variable time-constant. The full KF is a multi-dimensional system, but we only need one dimension at a time. The stochastic process that generates our smoothing filter has a scalar state (x_i) which evolves as a simple gaussian random walk whose steps have a variance of 1. The observations (t_i) are produced by adding another zero-mean gaussian sample to the state. The variance of this observation noise (p_i) is time-varying in a known way.
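As an illustration of the generating process just described, the following is a minimal sketch that draws one sample path. The function name `simulate`, the starting value `x0` and the `seed` argument are assumptions made for this example, not part of the original model description.

```python
import random


def simulate(pliancies, x0=0.0, seed=0):
    """Draw one sample path from the hypothetical generating process.

    pliancies: known observation-noise variances, one per frame.
    x0: assumed starting value of the state before the first step.
    Returns (states, observations), two lists of equal length.
    """
    rng = random.Random(seed)
    x = x0
    states, observations = [], []
    for p in pliancies:
        # Random-walk step: zero-mean gaussian with variance 1.
        x += rng.gauss(0.0, 1.0)
        states.append(x)
        # Observation: state plus zero-mean gaussian noise with variance p.
        observations.append(x + rng.gauss(0.0, p ** 0.5))
    return states, observations


if __name__ == "__main__":
    states, observations = simulate([1.0, 4.0, 0.25, 1.0])
    for x, t in zip(states, observations):
        print(f"state {x:+.3f}   observation {t:+.3f}")
```

Frames with a large pliancy give observations that scatter widely around the state; frames with a small pliancy pin the observation close to it.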
The KF equations for estimating the state of this hypothetical generating process are as follows:
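The estimated mean and variance of the state at frame i, conditioned on observations up to and including frame i-1, follow from the posterior at frame i-1: the mean carries over unchanged, and the variance grows by 1, the variance of the random-walk step.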
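For the scalar model described above (random-walk variance 1, observation variance p_i), the standard KF recursion can be written as below. The notation is an assumption made for this sketch: \hat{x}_{i|j} and v_{i|j} denote the mean and variance of the state at frame i given observations up to frame j, and k_i is the usual Kalman gain.

```latex
% Prediction: propagate the posterior at frame i-1 through the random walk
\hat{x}_{i|i-1} = \hat{x}_{i-1|i-1}, \qquad
v_{i|i-1} = v_{i-1|i-1} + 1

% Update: combine the prediction with the observation t_i of variance p_i
k_i = \frac{v_{i|i-1}}{v_{i|i-1} + p_i}, \qquad
\hat{x}_{i|i} = \hat{x}_{i|i-1} + k_i \,\bigl(t_i - \hat{x}_{i|i-1}\bigr), \qquad
v_{i|i} = (1 - k_i)\, v_{i|i-1}
```

A large pliancy p_i makes k_i small, so the estimate barely moves towards t_i; a small pliancy pulls the estimate almost all the way to the target. This is the "variable time-constant" behaviour of the smoother.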
Taken together with the reverse-time versions, these equations solve the problem of the equilibrium state of the spring model of acoustic-phonetic dynamics.
In our dynamic phonetic state generator, the sequence of target values is treated as the observations t_i, and we also have a pliancy p_i associated with each target value.
Fig 1 shows a sequence of phonetic targets and associated "standard deviations" (square roots of the pliancies).
A forward KF pass computes a mean and variance at each frame, conditioned on the observations so far (Fig 2), and a backward KF pass considers only the future (Fig 3). To obtain a symmetrical smoother we combine the two estimates (Fig 4).
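The forward pass, backward pass and combination can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: the names `kf_pass` and `smooth` are hypothetical, each pass is assumed to start from a diffuse prior (zero precision), and the combination multiplies the forward prediction, the backward prediction and the observation at each frame.

```python
def kf_pass(targets, pliancies, step_var=1.0):
    """One scalar KF pass over the frames, in the order given.

    Returns (prior_means, prior_precs): for each frame, the mean and
    precision (1/variance) of the state predicted from earlier frames only.
    Assumption: the pass starts from a diffuse prior (precision 0).
    """
    prior_means, prior_precs = [], []
    mean, prec = 0.0, 0.0  # diffuse start: the mean value is irrelevant
    for t, p in zip(targets, pliancies):
        prior_means.append(mean)
        prior_precs.append(prec)
        # Update: precisions add, means are precision-weighted.
        obs_prec = 1.0 / p
        post_prec = prec + obs_prec
        post_mean = (prec * mean + obs_prec * t) / post_prec
        # Predict: the random walk keeps the mean and adds step_var.
        mean = post_mean
        prec = 1.0 / (1.0 / post_prec + step_var)
    return prior_means, prior_precs


def smooth(targets, pliancies):
    """Combine forward and backward passes into (mean, variance) per frame."""
    fwd_m, fwd_p = kf_pass(targets, pliancies)
    bwd_m, bwd_p = kf_pass(targets[::-1], pliancies[::-1])
    bwd_m, bwd_p = bwd_m[::-1], bwd_p[::-1]
    result = []
    for t, p, m1, p1, m2, p2 in zip(targets, pliancies,
                                    fwd_m, fwd_p, bwd_m, bwd_p):
        # Product of three gaussians: past, future, and current observation.
        prec = p1 + p2 + 1.0 / p
        mean = (p1 * m1 + p2 * m2 + t / p) / prec
        result.append((mean, 1.0 / prec))
    return result


if __name__ == "__main__":
    targets = [0.0, 0.0, 2.0, 2.0, 0.0]
    pliancies = [1.0, 1.0, 0.1, 10.0, 1.0]
    for mean, var in smooth(targets, pliancies):
        print(f"mean {mean:+.3f}   std {var ** 0.5:.3f}")
```

Working in precisions rather than variances makes the diffuse starting prior easy to express and keeps the combination step a plain sum.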




(It is important to understand that we are not claiming that the target sequence is generated by the "model process" that the KF corresponds to. We do not even claim that the dynamic phonetic state construction process is a model of the actual speech pattern generation process.)
Fig 6 shows how the Kalman filter propagates the posterior distribution for the current frame forward to form a prior distribution for the state of the system at the next frame. This prior distribution can be combined with the observation distribution to then form the posterior at the next frame, given all of the observations seen up until this point.
Fig 5 shows an example of combining a prior and an observation distribution to obtain a posterior. In this example the prior specifies the estimated state value to a much greater precision than the observation distribution (shown by the wide bell-shaped curve).
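The combination of a gaussian prior with a gaussian observation can be sketched in a few lines. The function name `combine` and the numbers used are assumptions chosen for illustration: a precise prior (variance 1) meets a vague observation (variance 100).

```python
def combine(prior_mean, prior_var, obs_mean, obs_var):
    """Posterior mean and variance from a gaussian prior and observation."""
    # The gain is the fraction of the way the mean moves towards the
    # observation; it is small when the prior is the more precise of the two.
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs_mean - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


if __name__ == "__main__":
    mean, var = combine(prior_mean=0.0, prior_var=1.0,
                        obs_mean=10.0, obs_var=100.0)
    print(f"posterior mean {mean:.4f}, variance {var:.4f}")
```

With these values the gain is 1/101, so the posterior mean moves only about one percent of the way from the prior to the observation, and the posterior variance is slightly smaller than the prior variance.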
Fig 6 shows how the posterior distribution evolves over time given a sequence of 4 observations, all with the same error distribution (shown in red). The mean value of this posterior gradually evolves towards the mean of the observation distributions. Fig 2 also shows how these distributions change with incremental observations. The distribution is represented here by the mean (shown in blue), and the plus and minus one standard deviation points (shown in green).


Fig 8 shows how forward and backward Kalman filter passes through the data can be combined to give the best estimate at time t using the evidence from all of the data points (not only those before, or only those after, time t). First, we use the Kalman filter maths to obtain the best estimate at each point using all of the data prior to time t (the forward pass). We then do the same in the reverse direction, obtaining the best estimate given all the data following time t.
Then, thanks to these forward and backward recursions, at each time t we have a mean and variance for the estimated position:
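One standard way to write this combination is as a product of three gaussians: the forward prediction, the backward prediction and the observation at the frame itself. The notation below is an assumption made for this sketch: superscripts f and b mark the forward and backward passes, i indexes the frame (time t in the text), and t_i and p_i are the target and pliancy at that frame.

```latex
% Precisions (inverse variances) add
\frac{1}{v_i} = \frac{1}{v^{f}_{i|i-1}} + \frac{1}{v^{b}_{i|i+1}} + \frac{1}{p_i}

% The mean is the precision-weighted average of the three contributions
\hat{x}_i = v_i \left(
    \frac{\hat{x}^{f}_{i|i-1}}{v^{f}_{i|i-1}}
  + \frac{\hat{x}^{b}_{i|i+1}}{v^{b}_{i|i+1}}
  + \frac{t_i}{p_i}
\right)
```

At the first and last frames one of the two predictions carries no information, which corresponds to letting its variance tend to infinity so that its term drops out.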
All the gruesome mathematical details can be found here.
To return to our springs and beads view of what we are doing, the forward pass calculates the mean position for bead i if we cut the spring connecting it to bead i+1, and the "variance" propagated forward in the same recursion represents the "springiness" of bead i given the network on the left (i.e. if you tried to move it, how strongly it would oppose the displacement). This is a kind of "equivalent circuit" for the spring network, as we could replace the whole network with one spring attached to one position. If we followed the resistor network analogy given on a previous page, then this would be a Thevenin equivalent circuit.
The same can be done for the spring network to the right of the bead, and so the whole network can be reduced to a network of just one bead attached to three springs (which are themselves attached to three positions):

Fig 8 shows how this has been done. Means and variances are propagated from the left and the right to the position in question. This defines two "prior" distributions (shown in blue on the right of Fig 8), and these, when combined with the observation distribution (shown in red), give the posterior distribution (shown in green) at this point.
