Making use of better models of speech
HBR 14/07/1998



Rescoring the acoustic match  

More complex models cannot be used efficiently directly in decoding
Rescore the N-best hypotheses from a conventional recogniser
Add the correct transcription to the N-best list
Assess performance by how often the correct transcription is chosen.
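
The evaluation loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `correct_selection_rate` and `toy_score` are hypothetical, the scoring function is a stand-in and not the rescoring algorithm of this note, and a higher score is assumed to mean a better acoustic match.

```python
from typing import Callable, List, Sequence, Tuple

# One utterance: (acoustic data, N-best hypotheses, correct transcription).
# The representation of the acoustic data is an assumption of this sketch.
Utterance = Tuple[Sequence[float], List[str], str]


def correct_selection_rate(
    utterances: Sequence[Utterance],
    score_fn: Callable[[Sequence[float], str], float],
) -> float:
    """Fraction of utterances for which the rescorer picks the correct transcription."""
    chosen_correct = 0
    for acoustics, nbest, reference in utterances:
        candidates = list(nbest)
        # Add the correct transcription if the recogniser did not propose it.
        if reference not in candidates:
            candidates.append(reference)
        # Rescore every candidate and keep the highest-scoring one.
        best = max(candidates, key=lambda hyp: score_fn(acoustics, hyp))
        if best == reference:
            chosen_correct += 1
    return chosen_correct / len(utterances)


def toy_score(acoustics: Sequence[float], hypothesis: str) -> float:
    """Placeholder scorer: prefers hypotheses with one word per acoustic value."""
    return -abs(len(hypothesis.split()) - len(acoustics))


utterances = [
    ([0.1, 0.2, 0.3], ["a b", "a b c d"], "a b c"),  # correct transcription wins
    ([0.1, 0.2], ["x y", "x"], "x y z"),             # an N-best hypothesis wins
]
rate = correct_selection_rate(utterances, toy_score)
print(rate)  # 0.5 for this toy data
```

A real rescorer would replace `toy_score`; the surrounding loop only shows how the "how often is the correct transcription chosen" measure could be computed.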

Fig 1. Rescoring the output(s) from a conventional recogniser.

The challenge is to find a rescoring algorithm that performs better than the conventional recogniser.


Training parameters of the rescoring algorithm 

Fig 2 shows the inputs required for training the rescorer.
A phone sequence derived from the correct transcription is aligned to the acoustic data using some method, preferably automatic.
The aligner here could be a conventional HMM-based recogniser run in forced-recognition (forced-alignment) mode.
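
As a rough illustration of what the aligned training input might look like, the sketch below cuts a sequence of acoustic frames into one segment per phone. The note does not specify an alignment format, so the `(phone, start frame, end frame)` representation, the exclusive end index, and the name `segment_frames` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical alignment entry: (phone label, start frame, end frame, end exclusive).
AlignedPhone = Tuple[str, int, int]
Frame = Sequence[float]


def segment_frames(
    frames: Sequence[Frame], alignment: Sequence[AlignedPhone]
) -> List[Tuple[str, List[Frame]]]:
    """Pair each phone with the acoustic frames it was aligned to."""
    segments = []
    for phone, start, end in alignment:
        # Reject boundaries that fall outside the utterance or are empty.
        if not (0 <= start < end <= len(frames)):
            raise ValueError(f"bad boundaries for phone {phone!r}: {start}-{end}")
        segments.append((phone, list(frames[start:end])))
    return segments


# Toy data: five one-dimensional frames and a three-phone alignment.
frames = [[0.0], [0.1], [0.2], [0.3], [0.4]]
alignment = [("k", 0, 2), ("ae", 2, 4), ("t", 4, 5)]
segments = segment_frames(frames, alignment)
```

Each `(phone, frames)` pair is then one candidate training example for the rescorer; how the rescorer actually uses these segments is not fixed by this sketch.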

Fig 2. Training the rescoring algorithm.

The next two pages describe the contents of the rescoring box, how its parameters are trained, and how it is used to rescore the acoustic match given the acoustic data and the aligned phone sequences.