| word entrance penalty = 15, lm scaling factor = 0.3 | word entrance penalty = 4, lm scaling factor = 0.5 |
| 64.9 % | 64.3 % |
| speaker stats | speaker stats |
| word entrance penalty = 15, lm scaling factor = 0.6 |
| 63.6 % |
| speaker stats |
| Recognizer | Approximate frequency range (Hz) | Input layer size | Frame-level error % (cv set) | Word error rate % (test set) |
| Baseline (conventional) | 0:4000 | 234 | 48.62 | 67.6 |
| MLP merging of | ... | ... | ... | ... |
| all 7 subbands | 0:4000 | 392 | - | 63.7 |
| subbands 2-7 | 200:4000 | 336 | - | 64.1 |
| subbands 1-6 | 0:2767 | 336 | - | 64.2 |
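To put the differences in this table on a common scale, the relative word error rate reduction against the baseline can be computed as below. This is only a minimal sketch: the helper name `relative_wer_reduction` is hypothetical, and the input values are simply copied from the table.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction in percent.

    A positive value means the system has a lower WER than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Test-set WERs copied from the table above.
baseline = 67.6
merged_systems = {
    "all 7 subbands": 63.7,
    "subbands 2-7": 64.1,
    "subbands 1-6": 64.2,
}

for name, wer in merged_systems.items():
    reduction = relative_wer_reduction(baseline, wer)
    print(f"{name}: {reduction:.1f}% relative reduction")
```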
| Recognizer | Approximate frequency range (Hz)*** | Input layer size | Frame-level error % (cv set) | Word error rate % (test set) |
| subband 1 | 0-300 | 153 | 72.27 | 74.6 |
| subband 2 | 200-670 | 207 | 66.53 | 72.6 |
| subband 3 | 550-980 | 153 | 66.81 | 78.1 |
| subband 4 | 820-1420 | 153 | 66.73 | 75.4 |
| subband 5 | 1200-1984 | 153 | 65.11 | 74.0 |
| subband 6 | 1670-2765 | 153 | 67.39 | 73.9 |
| subband 7 | 2300-4000 | 153 | 70.21 | 74.5 |
*** Note: each sub-band roughly covers two critical bands.
| Recognizer | Frequency range (Hz) | NN size | Frame-level error % | Word error rate % |
| subband 1 | 0-300 | 153:500:56 | 70.76 | 72.70 |
| subband 2 | 200-670 | 207:500:56 | 61.23 | 68.30 |
| subband 3 | 550-980 | 153:500:56 | 62.13 | 69.80 |
| subband 4 | 820-1420 | 153:500:56 | 61.05 | 69.70 |
| subband 5 | 1200-1984 | 153:500:56 | 57.80 | 67.80 |
| subband 6 | 1670-2765 | 153:500:56 | 62.33 | 68.10 |
| subband 7 | 2300-4000 | 153:500:56 | 66.56 | 71.00 |
| . | . | . | . | . |
| Baseline | 0-4000 | 234:500:56 | 42.65 | 60.90 |
| MLP merging | 0-4000 | 392:500:56 | 42.88 | 59.70 |
| Linear merging | 0-4000 | . | . | 65.10 |
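The table lists "Linear merging" without defining it. As a rough illustration only, one common reading is a weighted average of the per-band posterior vectors at each frame. The sketch below assumes that reading, with equal weights by default; the function name `linear_merge` and its arguments are hypothetical and not taken from the original experiments.

```python
from typing import List, Optional


def linear_merge(
    band_posteriors: List[List[float]],
    weights: Optional[List[float]] = None,
) -> List[float]:
    """Combine per-band class posteriors for one frame by a weighted average.

    band_posteriors: one posterior vector per sub-band, all of the same length
                     (e.g. 56 classes, matching the output layer size above).
    weights:         one weight per sub-band; equal weights are ASSUMED if omitted.
    """
    if not band_posteriors:
        raise ValueError("need at least one sub-band")
    n_bands = len(band_posteriors)
    n_classes = len(band_posteriors[0])
    if any(len(p) != n_classes for p in band_posteriors):
        raise ValueError("all posterior vectors must have the same length")
    if weights is None:
        weights = [1.0 / n_bands] * n_bands
    if len(weights) != n_bands:
        raise ValueError("need exactly one weight per sub-band")

    # Weighted sum over sub-bands, class by class.
    merged = [0.0] * n_classes
    for w, posterior in zip(weights, band_posteriors):
        for k, p in enumerate(posterior):
            merged[k] += w * p

    # Renormalize so the merged vector sums to one.
    total = sum(merged)
    if total <= 0:
        raise ValueError("merged posteriors sum to zero")
    return [m / total for m in merged]


# Toy example with two sub-bands and two classes.
frame = linear_merge([[0.5, 0.5], [1.0, 0.0]])
print(frame)
```

By contrast, the "MLP merging" row uses a trained network (392:500:56) on top of the sub-band outputs, so the combination is learned rather than fixed.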
Comparison with the earlier 7 band experiment:
The four bands were recombined at the state level using:
The four band experiment
Data sets being used are:
Parameters used are:
Software used:
Recognition results
| Recognizer | Approximate frequency range (Hz) | Input layer size | Frame-level error % (cv set) | Word error rate % (test set) |
| subband 1 | 0-900 | 234 | 59.7 | 68.8 |
| subband 2 | 800-1660 | 234 | 57.9 | 68.5 |
| subband 3 | 1500-2550 | 162 | 60.3 | 69.2 |
| subband 4 | 2300-4000 | 162 | 67.2 | 69.8 |
We have also combined the four bands (recombined with the ANN) with the full-band probabilities:
word entrance penalty = 15
lm scaling factor = 0.6
61.4 %
word entrance penalty = 15
lm scaling factor = 0.6
61.0 %
word entrance penalty = 15
lm scaling factor = 0.6
59.4 %
The speaking rate
word entrance penalty = 15
lm scaling factor = 0.6
63.5 %
The modulation spectral filters
word entrance penalty = 15
lm scaling factor = 0.6
62.4 %
The CHAF
The results obtained with the CHAF are reported here
Last modified on August 23, 1996
Christophe Ris <ris@cspjhu.ece.jhu.edu>