CLSP
WORKSHOP '96

The Results


This page collects all the results obtained during this workshop. Please refer to the "Description of the experiments" page for a complete description of the experiments.

The baseline system

These experiments were performed using:

The multiband system

The seven band experiment

Data sets being used are:

Parameters used are:

Software used:

Latest results:

Recognizer               Approx. frequency range (Hz)  Input layer size  Frame level error % (cv set)  Word error rate % (test set)
Baseline (conventional)  0-4000                        234               48.62                         67.6
MLP merging of:
  all 7 subbands         0-4000                        392               -                             63.7
  subbands 2-7           200-4000                      336               -                             64.1
  subbands 1-6           0-2767                        336               -                             64.2
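The input layer sizes of the merging MLPs are exact multiples of the 56 phone outputs of one subband net: 392 = 7 × 56 and 336 = 6 × 56. The Python sketch below illustrates that reading. It assumes (this page does not say so) that the merger is fed the concatenated outputs of the subband MLPs for a single frame; `merger_input` is a hypothetical name.

```python
NUM_PHONES = 56  # ICSI phoneme set size, as given in the four-band section


def merger_input(subband_posteriors):
    """Hypothetical: concatenate the per-subband output vectors of one frame."""
    for p in subband_posteriors:
        if len(p) != NUM_PHONES:
            raise ValueError("each subband MLP must produce 56 outputs")
    return [x for p in subband_posteriors for x in p]


# Dummy uniform posteriors for one frame from the 7 subband MLPs.
frame = [[1.0 / NUM_PHONES] * NUM_PHONES for _ in range(7)]

print(len(merger_input(frame)))      # 392: all 7 subbands
print(len(merger_input(frame[1:])))  # 336: subbands 2-7
print(len(merger_input(frame[:6])))  # 336: subbands 1-6
```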

Recognizer  Approx. frequency range (Hz)***  Input layer size  Frame level error % (cv set)  Word error rate % (test set)
subband 1   0-300                            153               72.27                         74.6
subband 2   200-670                          207               66.53                         72.6
subband 3   550-980                          153               66.81                         78.1
subband 4   820-1420                         153               66.73                         75.4
subband 5   1200-1984                        153               65.11                         74.0
subband 6   1670-2765                        153               67.39                         73.9
subband 7   2300-4000                        153               70.21                         74.5

*** Note: each sub-band roughly covers two critical bands.
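A rough consistency check of this note, assuming the PLP Bark warping z = 6 asinh(f/600) (an assumption; the page does not give the formula used): the whole 0-4000 Hz range spans about 15.6 Bark, so seven non-overlapping subbands would cover about 2.2 Bark each. The band edges in the table overlap, so this is only a back-of-the-envelope figure.

```python
import math


def hz_to_bark(f_hz):
    """PLP-style Bark warping (assumed here): z = 6 * asinh(f / 600)."""
    return 6.0 * math.asinh(f_hz / 600.0)


N_SUBBANDS = 7
total_bark = hz_to_bark(4000.0) - hz_to_bark(0.0)
per_band = total_bark / N_SUBBANDS  # width if the bands did not overlap

print(f"0-4000 Hz spans {total_bark:.1f} Bark, "
      f"about {per_band:.1f} Bark per subband")
```

Run as is, it reports about 15.6 Bark in total and about 2.2 Bark per subband.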

Recent multi-band results

Recognizer      Frequency range (Hz)  NN size     Frame level error %  Word error rate %
subband 1       0-300                 153:500:56  70.76                72.70
subband 2       200-670               207:500:56  61.23                68.30
subband 3       550-980               153:500:56  62.13                69.80
subband 4       820-1420              153:500:56  61.05                69.70
subband 5       1200-1984             153:500:56  57.80                67.80
subband 6       1670-2765             153:500:56  62.33                68.10
subband 7       2300-4000             153:500:56  66.56                71.00

Baseline        0-4000                234:500:56  42.65                60.90
MLP merging     0-4000                392:500:56  42.88                59.70
Linear merging  0-4000                -           -                    65.10
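The NN size column reads input:hidden:output units; the four-band section below confirms 500 hidden units and 56 outputs, and the input sizes match the subband table above. A small Python sketch that turns such a string into a parameter count, assuming fully connected layers with one bias per unit (the helper names are hypothetical):

```python
def parse_topology(spec):
    """Turn an 'input:hidden:output' string such as '153:500:56' into ints."""
    return [int(n) for n in spec.split(":")]


def num_parameters(spec):
    """Weights plus biases of a fully connected MLP (assumed layout)."""
    sizes = parse_topology(spec)
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


for name, spec in [("subband 1", "153:500:56"), ("subband 2", "207:500:56"),
                   ("Baseline", "234:500:56"), ("MLP merging", "392:500:56")]:
    print(f"{name:<12} {spec:<11} {num_parameters(spec):>7} parameters")
```

Under that assumption, a 153:500:56 subband net has 105,056 parameters and the 234:500:56 baseline net 145,556.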

7 sub-band model.

Comparison with the earlier 7-band experiment:


The four band experiment

Data sets being used are:

  • Training set: 4hr ICSI male cut
  • Cross-validation (cv) set: 10% of the training sentences
  • Test set: 240 randomly selected utterances (male speakers only) from the dev-test set

Parameters used are:

  • Features: PLP cepstra with CMS + delta_cepstra + delta-log-energy + delta-delta-log-energy
  • Number of subbands: 4
  • Number of MLP hidden units: 500
  • Number of MLP outputs: 56 (ICSI phoneme set)

Software used:

  • QuickNet from ICSI
  • STRUT lattice decoder from Mons

Recognition results

Recognizer  Approx. frequency range (Hz)  Input layer size  Frame level error % (cv set)  Word error rate % (test set)
subband 1   0-900                         234               59.7                          68.8
subband 2   800-1660                      234               57.9                          68.5
subband 3   1500-2550                     162               60.3                          69.2
subband 4   2300-4000                     162               67.2                          69.8
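The page does not say which scoring tool produced the word error rates. For reference, a minimal Python sketch of the standard definition, (substitutions + deletions + insertions) / number of reference words, computed with a word-level Levenshtein alignment; `word_error_rate` is a hypothetical helper, not part of QuickNet or STRUT.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent: (S + D + I) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution,       # match or substitution
                          d[i - 1][j] + 1,    # deletion
                          d[i][j - 1] + 1)    # insertion
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)


# One deletion out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```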

The four bands were recombined at the state level using:

  1. an untrained linear recombination strategy: sum of the per-band log-likelihoods
    word entrance penalty = 15
    lm scaling factor = 0.6
    word error rate = 61.4 %

  2. a trained linear recombination strategy: ANN without hidden layer
    word entrance penalty = 15
    lm scaling factor = 0.6
    word error rate = 61.0 %
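A Python sketch of the two strategies for a single frame. Strategy 1 follows the description directly: for each state, the band log-likelihoods are added. For strategy 2 the page only says "ANN without hidden layer", so the form used here (one weight layer over the concatenated band outputs, followed by a softmax) is an assumption, and the random weights only stand in for trained ones.

```python
import math
import random

N_BANDS, N_STATES = 4, 56  # four subbands, 56 ICSI phone states


def untrained_merge(band_loglik):
    """Strategy 1: for each state, add up the log-likelihoods of all bands."""
    return [sum(values) for values in zip(*band_loglik)]


def single_layer_merge(band_loglik, weights, biases):
    """Strategy 2 (assumed form): one trained weight layer, no hidden units.

    The concatenated band outputs feed one weighted sum per state; a
    softmax then normalises these sums into state log-probabilities.
    """
    x = [v for band in band_loglik for v in band]
    z = [b + sum(w * xi for w, xi in zip(row, x))
         for row, b in zip(weights, biases)]
    m = max(z)  # subtract the maximum for numerical stability
    log_norm = m + math.log(sum(math.exp(zj - m) for zj in z))
    return [zj - log_norm for zj in z]


random.seed(0)
# Dummy per-band log-likelihoods for one frame (4 bands x 56 states).
frame = [[random.uniform(-5.0, 0.0) for _ in range(N_STATES)]
         for _ in range(N_BANDS)]
# Random placeholder weights; real ones would be learned on training data.
W = [[random.gauss(0.0, 0.01) for _ in range(N_BANDS * N_STATES)]
     for _ in range(N_STATES)]
b = [0.0] * N_STATES

merged_sum = untrained_merge(frame)
merged_ann = single_layer_merge(frame, W, b)
print(len(merged_sum), len(merged_ann))  # 56 56
```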

We also combined the four bands (recombined with the ANN) with the full-band probabilities:
word entrance penalty = 15
lm scaling factor = 0.6
word error rate = 59.4 %
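The page does not document how STRUT applies the word entrance penalty and the lm scaling factor. One common log-domain convention, sketched below in Python as an assumption (the sign, log base, and which score the factor scales may differ in STRUT), adds the scaled language-model log-probability to the acoustic score and charges a fixed penalty for each word entered; `path_score` is hypothetical.

```python
def path_score(acoustic_logprob, lm_logprob, n_words,
               lm_scale=0.6, word_entrance_penalty=15.0):
    """Hypothetical log-domain score of one decoding hypothesis.

    Higher is better: the language-model log-probability is scaled by
    lm_scale, and a fixed penalty is charged each time a word is entered.
    """
    return (acoustic_logprob
            + lm_scale * lm_logprob
            - word_entrance_penalty * n_words)


# Two made-up hypotheses for the same utterance.
hyp_short = path_score(acoustic_logprob=-520.0, lm_logprob=-18.0, n_words=3)
hyp_long = path_score(acoustic_logprob=-505.0, lm_logprob=-24.0, n_words=5)
print(hyp_short, hyp_long)
```

With these made-up numbers the five-word hypothesis has the better acoustic score but loses once the extra word penalties are charged, which is how such a penalty discourages word insertions.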


The speaking rate

word entrance penalty = 15
lm scaling factor = 0.6
word error rate = 63.5 %


The modulation spectral filters

word entrance penalty = 15
lm scaling factor = 0.6
word error rate = 62.4 %


The CHAF

The results obtained with the CHAF are reported here.
Last modified on August 23, 1996
Christophe Ris <ris@cspjhu.ece.jhu.edu>