Models of speech recognition (by both human and machine) have traditionally assumed the phoneme to serve as the fundamental unit of phonetic and phonological analysis. However, phoneme-centric models have failed to provide a convincing theoretical account of the process by which the brain extracts meaning from the speech signal and have fared poorly in automatic recognition of natural, informal speech (e.g., the Switchboard corpus). Over the past five months the Switchboard Transcription Project has phonetically transcribed a portion of the Switchboard corpus in an effort to better understand the failure of phoneme-centric models for machine recognition of speech, as well as to provide a database through which to improve the performance of recognition systems focused on conversational dialogs. Transcription of spoken dialogs illustrates the pitfalls of a phoneme-based system. Many words are articulated in such a fashion as to either omit or significantly transform the phonetic properties of phonemic constituents, thus resulting in wide variation of word pronunciations. Often, only the barest hint of a segment is realized phonetically, in spite of good intelligibility. Despite this large variability in phonetic realization of words, the temporal properties of speech segments, both phones and syllables, appear to conform to regular patterns. This temporal regularity suggests that much of the linguistic information in speech may be signaled through temporal variations in amplitude, pitch and the coarse spectrum, and that such patterns may be useful in the design of future-generation speech recognition systems.