MODELING DISFLUENCIES

 
General strategy:
 
  - Build an n-gram model on the DF-cleaned text (all disfluencies
removed), then extend it with explicit modeling, outside the n-gram,
of the various DF phenomena.
  - Model each DF phenomenon by clustering the contexts in which it
occurs.
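
A minimal sketch of the two steps above, under assumptions not in these
notes: tokens are (word, df_tag) pairs, df_tag is None for fluent words,
and the tag names ("FP", "REP") and function names are hypothetical. It
counts bigrams on the cleaned text and, separately, the contexts in which
each DF type occurs (the raw material for clustering).

```python
from collections import Counter

# Hypothetical annotation: each token is (word, df_tag); df_tag is None for
# fluent words, else a DF label such as "FP" (filled pause) or "REP".
def clean(tokens):
    """Drop all DF-tagged tokens, keeping the fluent word sequence."""
    return [w for w, tag in tokens if tag is None]

def bigram_counts(sentences):
    """Count bigrams over DF-cleaned sentences, with <s> and </s> padding."""
    counts = Counter()
    for tokens in sentences:
        words = ["<s>"] + clean(tokens) + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts

def df_context_counts(sentences):
    """Count (previous fluent word, DF label) pairs: the contexts to cluster."""
    counts = Counter()
    for tokens in sentences:
        prev = "<s>"
        for w, tag in tokens:
            if tag is None:
                prev = w          # fluent word: becomes the new context
            else:
                counts[(prev, tag)] += 1
    return counts
```

Only the preceding fluent word is used as context here; a real clustering
would presumably use richer context features.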
 
The DFs to be modeled this way:
        filled pauses
        repetitions 
        substitutions (how? ignore for now.)
        deletions
        discourse markers
        editing phrases (very few anyway?)
        sentential conjunctions
 
filled pause modeling:
  Possible features
   - position in sentence/segment
   - lexical properties of word before/after
  Distinguish UH from UM: check whether the choice correlates with the
        length of the following pause
        (may not be needed, since UH and UM are already recognized well)
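
A rough sketch of extracting the features listed above for each filled
pause. Assumptions: input is a plain word list for one sentence/segment,
word identity stands in for "lexical properties", and the feature names
are made up for illustration.

```python
FILLED_PAUSES = {"uh", "um"}

def fp_features(words):
    """Return one feature dict per filled pause in a sentence/segment."""
    feats = []
    n = len(words)
    for i, w in enumerate(words):
        if w.lower() not in FILLED_PAUSES:
            continue
        feats.append({
            "fp": w.lower(),
            "index": i,                                   # absolute position
            "rel_pos": i / (n - 1) if n > 1 else 0.0,     # 0 = start, 1 = end
            "prev": words[i - 1].lower() if i > 0 else "<s>",
            "next": words[i + 1].lower() if i + 1 < n else "</s>",
        })
    return feats
```

Pause length after the filled pause is left out, since it needs timing
information that a word list does not carry.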
 
repetition modeling:
  Possible features
   - position in sentence/segment
   - length
   - text following the repetition
   - # of repetition cycles
   - interaction with filled pauses
  Exclude grammatical repetitions, e.g.:
   - VERY, REALLY, THAT
  Use prosody, e.g. silence (see below)
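
One possible way to find repetitions and read off some of the features
above (position, length, cycles, following word). This is a sketch under
assumptions: only exact, immediately adjacent repeats are detected, the
shortest repeating unit wins, "cycles" means copies beyond the first, and
max_len=3 is an arbitrary cap. Prosody and filled pauses are ignored.

```python
GRAMMATICAL = {"very", "really", "that"}   # single words that repeat legitimately

def find_repetitions(words, max_len=3):
    """Find immediate repeats of word sequences up to max_len words long."""
    words = [w.lower() for w in words]
    reps, i = [], 0
    while i < len(words):
        for k in range(1, max_len + 1):          # try the shortest unit first
            unit = words[i:i + k]
            if len(unit) < k or words[i + k:i + 2 * k] != unit:
                continue
            copies = 2
            while words[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            end = i + copies * k
            if not (k == 1 and unit[0] in GRAMMATICAL):
                reps.append({
                    "start": i,
                    "length": k,                 # words per repeated unit
                    "cycles": copies - 1,        # repeats beyond the first copy
                    "next": words[end] if end < len(words) else "</s>",
                })
            i = end                              # skip past the whole repeat
            break
        else:
            i += 1                               # no repeat starts here
    return reps
```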
 
deletion modeling:
  Reset the n-gram context at deletion points in training; allow a
  reset in testing
  Possible features
   - # of words from beginning
   - no coarticulation
   - pauses
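
A sketch of the training-side context reset for a bigram model. The
"<del>" marker and the choice of "<s>" as the reset context are
assumptions; the notes do not say how deletion points are annotated.

```python
from collections import Counter

DEL = "<del>"     # hypothetical marker for an annotated deletion point
RESET = "<s>"     # context used after a reset (assumed: sentence start)

def bigram_counts_with_reset(tokens):
    """Count bigrams, restarting the context after each deletion point."""
    counts = Counter()
    prev = RESET
    for tok in tokens:
        if tok == DEL:
            counts[(prev, DEL)] += 1   # the deletion itself is an LM event
            prev = RESET               # words after it see a fresh context
            continue
        counts[(prev, tok)] += 1
        prev = tok
    counts[(prev, "</s>")] += 1
    return counts
```

On the test side, one reading of "allow a reset" is to let the decoder
choose between P(w | prev) and P(<del> | prev) * P(w | <s>); that
interpretation is not spelled out in these notes.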
 
discourse marker modeling:
 - collect stats from annotated data.
 - cluster them?
 
editing phrase modeling:
 - collect stats.  They are probably too rare to worry about.
 - relate to filled pauses and repetitions?
 
* Make use of prosodic cues, e.g.:
 - Consider silence an LM event, and model it similarly to other DFs.
   Problem: HTK lattices do not provide silence info.
        --> look at energy levels and annotate the timeline with
            silences, then find their position in the hypothesis.
   Define several silence classes by duration, or model the length
        as a continuous parameter.
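
The silence workaround might look roughly like this. Everything concrete
here is an assumption: per-frame energies at a 10 ms frame rate, a fixed
energy threshold, the 0.3 s / 1.0 s class boundaries, the token names,
and a hypothesis given as (word, start_sec, end_sec) triples.

```python
def find_silences(energies, threshold, min_frames=10):
    """Return (start_frame, end_frame) runs where energy stays below threshold."""
    silences, start = [], None
    # The infinite sentinel closes a run that lasts to the final frame.
    for i, e in enumerate(list(energies) + [float("inf")]):
        if e < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                silences.append((start, i))
            start = None
    return silences

def silence_token(duration, bounds=(0.3, 1.0)):
    """Map a silence duration in seconds to one of three discrete LM tokens."""
    if duration < bounds[0]:
        return "<sil_short>"
    if duration < bounds[1]:
        return "<sil_mid>"
    return "<sil_long>"

def annotate(hyp, silences, frame_sec=0.01):
    """Merge silence tokens into a time-aligned hypothesis by start time."""
    events = [(start, word) for word, start, _ in hyp]
    for s, e in silences:
        events.append((s * frame_sec, silence_token((e - s) * frame_sec)))
    return [tok for _, tok in sorted(events)]
```

This covers the discrete-class option only; treating length as a
continuous parameter would need a duration model instead of
silence_token.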
 
* Model the conversation dynamics by considering back-side responses
as black-box LM events (classified by length and, if short, maybe by
type).  This will not help WER in the current setup, because a turn
end implies a lattice end.
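
A toy sketch of collapsing a response into one black-box event token. The
type lexicon, the two-word cutoff for "short", and the token names are
all invented for illustration; the notes define none of them.

```python
# Hypothetical type lexicon for short responses; the notes do not define one.
SHORT_TYPES = {
    "yeah": "agree", "yes": "agree", "right": "agree",
    "uh-huh": "backchannel", "mhm": "backchannel",
    "no": "disagree",
}

def response_event(words, short_max=2):
    """Collapse the other speaker's response into a single LM event token."""
    if not words:
        return "<resp_none>"
    if len(words) > short_max:
        return "<resp_long>"           # long responses: length class only
    kinds = {SHORT_TYPES.get(w.lower(), "other") for w in words}
    kind = kinds.pop() if len(kinds) == 1 else "mixed"
    return f"<resp_short_{kind}>"      # short responses: also typed
```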