Introducing Target Pliancies for Consonant Production
HBR 180898

For our production model, each speech segment is associated with a single target vector for the duration of that segment. In the simpler version of the model, the hidden dynamic state is obtained by applying a simple linear, time-invariant filter to the sequence of target values over time, resulting in smooth trajectories for the hidden dynamic parameters.
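As a rough illustration of the simpler version, the sketch below passes piecewise-constant segment targets through a first-order low-pass filter. The filter order, the coefficient `alpha`, the frame-based durations and the function name `filter_targets` are all assumptions made for this example; the text only says that the filter is linear and time-invariant.

```python
def filter_targets(targets, durations, alpha=0.8, x0=0.0):
    """Smooth per-segment targets with x[t] = alpha * x[t-1] + (1 - alpha) * u[t].

    targets   -- one target value per segment (one-dimensional case)
    durations -- number of frames in each segment
    alpha     -- assumed filter coefficient (0 < alpha < 1), not taken from the model
    """
    # Expand the per-segment targets into one target value per frame.
    frames = [t for t, d in zip(targets, durations) for _ in range(d)]
    x = x0
    trajectory = []
    for u in frames:
        # The same coefficient is used at every frame, so the filter is time-invariant.
        x = alpha * x + (1.0 - alpha) * u
        trajectory.append(x)
    return trajectory


# Three segments of 20 frames each, with targets 0, 1, 0.
trajectory = filter_targets([0.0, 1.0, 0.0], [20, 20, 20])
```

Because the coefficient is the same everywhere, every target pulls on the trajectory equally hard. This is the limitation that the pliancy parameters are meant to remove.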
However, the hidden dynamic state of natural speech production, i.e. the articulatory system, does not behave like this simple model, for the following reason.
For consonants, usually only the action of a single "critical" articulator is consistent for a given phone. For example, the critical action in producing a /d/ is to raise the tongue tip to the alveolar ridge. The remaining shape of the vocal tract is not important for the identity of this phone, and assumes a shape that depends on the context.

An example of this is shown in Fig 1. The "critical" articulator here is A2, and only this articulator is heavily influenced by the consonant "targets" during the second segment. The other "non-critical" articulators move smoothly between the targets defined by the vowel context.
To allow our model to exhibit this kind of interpolation behaviour, we add to each target vector a "variance" vector which describes how much the hidden representation is allowed to deviate from the target. These extra "pliancy" parameters determine the relative influences of the current targets and the targets of the surrounding segments. The smaller the value of the pliancy for a given target parameter, the more it will "pull" the hidden dynamic trajectory towards it. The greater the pliancy, the less that target will influence the trajectory.
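One way to read the pliancy as a variance is to weight each competing target by its inverse pliancy. The sketch below shows only this static idea for two targets. The function name `combine_targets` and the numeric values are hypothetical, and this is not necessarily how the model itself combines targets.

```python
def combine_targets(target_a, pliancy_a, target_b, pliancy_b):
    """Inverse-variance weighted compromise between two competing targets.

    Each target is weighted by 1 / pliancy, so a small pliancy
    means a strong pull towards that target.
    """
    weight_a = 1.0 / pliancy_a
    weight_b = 1.0 / pliancy_b
    return (weight_a * target_a + weight_b * target_b) / (weight_a + weight_b)


# Current target 1.0 competing with a context target 0.0 (illustrative values).
stiff_current = combine_targets(1.0, 0.1, 0.0, 10.0)    # low pliancy on the current target
pliant_current = combine_targets(1.0, 10.0, 0.0, 0.1)   # high pliancy on the current target
```

With a low pliancy on the current target the result lies close to that target. With a high pliancy it lies close to the context value instead.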
The following figures show what happens when different values of pliancy are assumed for a given target in a one-dimensional system, and how these extra parameters can be used to achieve the sort of natural behaviour shown in Fig 1.


Fig 2 shows the target values for three consecutive segments, with error bars representing the pliancy parameters for those targets. Fig 3 shows the influence these parameters have on the trajectory produced (the green error bars represent the hidden trajectory posterior variance, and will be explained later). The target is reached in the first and last segments, which have a low target pliancy. In the centre segment, however, where the pliancy is greater, the hidden dynamic trajectory approaches the target but falls far short of the actual target value during the segment.

Fig 4 shows what happens when we set the target pliancy for the centre segment to approximately the same value as the surrounding segments. Here the low pliancy value asserts the target, so the target is easily reached for much of the segment duration.

Fig 5 shows what happens when the pliancy is large for the centre segment. The hidden dynamic trajectory is little influenced by the target value.
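The qualitative behaviour in Figs 3 to 5 can be imitated with a small one-dimensional sketch. It assumes a quadratic cost: the squared deviation from each frame's target divided by its pliancy, plus a penalty on frame-to-frame changes. The cost, the `stiffness` weight, the pliancy values and the function name `smooth_trajectory` are assumptions for illustration only; the implementation actually used in the model is the one described on the next page.

```python
def smooth_trajectory(targets, pliancies, durations, stiffness=50.0):
    """Minimise sum_t (x[t] - u[t])**2 / p[t] + stiffness * sum_t (x[t] - x[t-1])**2.

    Setting the gradient to zero gives a tridiagonal linear system,
    solved here with the Thomas algorithm.
    """
    # Expand per-segment targets and pliancies to one value per frame.
    u = [t for t, d in zip(targets, durations) for _ in range(d)]
    p = [v for v, d in zip(pliancies, durations) for _ in range(d)]
    n = len(u)

    off = -stiffness  # constant sub- and super-diagonal entry
    diag = []
    for t in range(n):
        neighbours = (1 if t > 0 else 0) + (1 if t < n - 1 else 0)
        diag.append(1.0 / p[t] + stiffness * neighbours)
    rhs = [u[t] / p[t] for t in range(n)]

    # Forward elimination.
    for t in range(1, n):
        w = off / diag[t - 1]
        diag[t] -= w * off
        rhs[t] -= w * rhs[t - 1]

    # Back substitution.
    x = [0.0] * n
    x[-1] = rhs[-1] / diag[-1]
    for t in range(n - 2, -1, -1):
        x[t] = (rhs[t] - off * x[t + 1]) / diag[t]
    return x


def centre_peak(centre_pliancy):
    """Largest trajectory value when only the centre segment's pliancy is varied."""
    x = smooth_trajectory([0.0, 1.0, 0.0], [0.01, centre_pliancy, 0.01], [20, 20, 20])
    return max(x)


moderate = centre_peak(1.0)    # greater pliancy than the neighbours, as in Fig 3
low = centre_peak(0.01)        # same pliancy as the neighbours, as in Fig 4
high = centre_peak(100.0)      # large pliancy, as in Fig 5
```

Under these assumed settings the centre target is essentially reached when its pliancy is low, only partly approached when it is moderate, and almost ignored when it is large.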
It can now be seen how these pliancy parameters can be used to achieve realistic coarticulation for consonant articulation. For consonant targets, critical articulators (hidden parameters) would have a low pliancy, forcing the system to achieve the target value (for consonants a closure or constriction). Non-critical articulators, however, would have much larger pliancies, so the articulator values for these segments would be mostly determined by their context, usually the surrounding vowels. The way in which these pliancy parameters have been implemented in our model of speech production is explained on the next page.
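As a minimal sketch of how such targets might be stored, the example below attaches one target and one pliancy per articulator to a segment and picks out the critical articulators by thresholding the pliancy. The class `SegmentTarget`, the articulator names `A1` to `A3`, the threshold and all numeric values are hypothetical and chosen only to mirror the /d/ example.

```python
from dataclasses import dataclass


@dataclass
class SegmentTarget:
    phone: str
    target: dict   # articulator name -> target value
    pliancy: dict  # articulator name -> pliancy (variance-like, larger = weaker pull)


def critical_articulators(segment, threshold=1.0):
    """Return the articulators whose pliancy is below an assumed threshold."""
    return sorted(name for name, p in segment.pliancy.items() if p < threshold)


# Hypothetical consonant segment: only A2 has a tightly specified target.
d_segment = SegmentTarget(
    phone="d",
    target={"A1": 0.0, "A2": 1.0, "A3": 0.0},
    pliancy={"A1": 100.0, "A2": 0.01, "A3": 100.0},
)

critical = critical_articulators(d_segment)
```

In this layout the non-critical articulators still carry a nominal target, but their large pliancies mean the surrounding vowel targets would dominate their trajectories.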
Öhman, S. E. G. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America 39, 151-168.