| words | ||
| phonemes | ||
| phonology | "pronunciation modeling" | |
| phonetic elements | allophones? | |
| alignments | ||
| targets | one (vector) target per phonetic element type. dimensionality? | |
| dynamics | time constants, shapes, overlap, dependence on phonetic label | |
| dynamic state | Formants? Vocal Tract Resonances? vocal tract shape? ... | |
| mapping | Form, invertibility, learning, algorithms, scalability, initialization | |
| acoustic pattern | MFCC, ... | |
| acoustic metric | Euclidean distance in transformed log spectrum, (perceptually motivated?) | |
| data | artificial, synthetic speech, selected, laboratory, field, speakers. | |
| testing/use | acoustics -> dynamic state -> HMM; score aligned transcripts, select; synthesize unseen contexts; score lattice? new recognizer?? |
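
To make the "targets", "dynamics", and "acoustic metric" rows concrete, here is a minimal sketch. It assumes a first-order target-directed smoother with a single time constant `tau` (in frames) shared by all segments; the table leaves the actual shapes, overlap, and label dependence open. The function names, the two-dimensional formant-like state, and the numeric values are hypothetical and chosen for illustration only.

```python
import math


def target_directed_trajectory(segments, tau, x0):
    """Generate a hidden-state trajectory that relaxes toward per-segment targets.

    segments: list of (target_vector, n_frames), one entry per phonetic element
    tau:      time constant in frames (assumed > 0, shared by all segments)
    x0:       initial state vector
    """
    a = math.exp(-1.0 / tau)  # per-frame decay factor implied by tau
    x = list(x0)
    trajectory = []
    for target, n_frames in segments:
        for _ in range(n_frames):
            # First-order update: move a fraction (1 - a) of the way to the target.
            x = [a * xi + (1.0 - a) * ti for xi, ti in zip(x, target)]
            trajectory.append(list(x))
    return trajectory


def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


if __name__ == "__main__":
    # Hypothetical two-dimensional state (e.g. two resonance frequencies in Hz).
    segs = [([500.0, 1500.0], 10), ([300.0, 2200.0], 10)]
    traj = target_directed_trajectory(segs, tau=5.0, x0=[400.0, 1800.0])
    print(len(traj), traj[-1])
```

With a short segment the state does not reach its target before the next one takes over, which is one simple way to get context-dependent undershoot; the distance function would be applied to the mapped acoustic vectors rather than to the hidden state itself.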