SuperSID: Exploiting High-level Information for High-performance Speaker Recognition

Research Group of the 2002 Summer Workshop

Identifying individuals based on their speech is an important component technology in many applications, whether automatically tagging speakers in the transcription of a board-room meeting (to track who said what), verifying users for computer security, or picking out a known terrorist or narcotics trader among millions of ongoing satellite telephone calls.
How do we recognize the voices of the people we know? Generally, we use multiple levels of speaker information conveyed in the speech signal. At the lowest level, we recognize a person based on the sound of his or her voice (e.g., low or high pitch, bass, nasality). But we also use other types of information in the speech signal to recognize a speaker, such as a unique laugh, particular phrase usage, or speaking rate.

Most current state-of-the-art automatic speaker recognition systems, however, use only the low-level sound information (specifically, very short-term features based on purely acoustic signals computed on 10-20 ms intervals of speech) and ignore higher-level information. While these systems have shown reasonably good performance, speech carries much more information that could be used to improve accuracy and robustness, potentially by a wide margin.
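
To make "very short-term features computed on 10-20 ms intervals" concrete, the following is a minimal sketch of short-term analysis, not the workshop's actual front end. It assumes an 8 kHz telephone-style sampling rate, a 20 ms frame advanced every 10 ms, and uses per-frame log energy as a stand-in for the acoustic features a real system would compute. The names `frame_signal` and `log_energy` are hypothetical and chosen only for illustration.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Split a sample sequence into overlapping short-term frames.

    frame_ms and hop_ms are assumed values picked from the 10-20 ms
    range mentioned in the text; they are not taken from the source.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop lengths must be positive")
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


def log_energy(frame):
    """Log of the mean squared amplitude of one frame (illustrative feature)."""
    energy = sum(x * x for x in frame) / len(frame)
    return math.log(energy + 1e-12)  # small floor avoids log(0) on silence


# Synthetic input: one second of a 200 Hz tone at an assumed 8 kHz rate.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200 * n / sample_rate)
          for n in range(sample_rate)]

frames = frame_signal(signal, sample_rate)
features = [log_energy(f) for f in frames]
```

Each element of `features` describes only a 20 ms slice of sound. Nothing at this level captures word choice, laughter, or speaking rate, which is the gap the higher-level knowledge sources are meant to fill.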

In this workshop we will look at how to augment traditional signal-processing-based speaker recognition systems with such higher-level knowledge sources. We will explore ways to define speaker-distinctive markers and create new classifiers that make use of these multi-layered knowledge sources. The team will work on a corpus of recorded telephone conversations (the Switchboard I and II corpora) that have been transcribed both by humans and by machine and augmented with a rich database of phonetic and prosodic features. A well-defined performance evaluation procedure will be used to measure progress and the utility of newly developed techniques.

Team Members
Senior Members
Walter Andrews	DoD
Joe Campbell	MIT Lincoln Laboratory
Jiri Navratil	IBM
Barbara Peskin	ICSI
Doug Reynolds	MIT Lincoln Laboratory
Graduate Students
Andre Adami	OGI
Qin Jin	Carnegie Mellon University
David Klusacek	Charles University
Undergraduate Students
Joy Abramson	York University
Radu Mihaescu	Princeton University

Center for Language and Speech Processing