I know that voice: an interactive lecture-demonstration of human assisted speaker recognition
John J. Godfrey and Craig S. Greenberg, Department of Defense / National Institute of Standards and Technology
May 4, 2012
As we heard from our recent seminar guest Diana Sidtis, the ability to recognize other human voices, but most especially those of our family and close associates, has deep biological roots and an interesting neurological basis, including a sharp difference between familiar and unfamiliar voices. Computers make no such distinction. While we have made enormous progress in enabling computers to recognize voices, we have not paid much attention to how humans do it. We should – we need to know both the limits and the special capabilities of humans, both to improve our modeling and to enable computers to work hand in hand with humans in practical applications like forensics and biometrics.
So how good are humans at utilizing automatic speaker recognition technology for performing speaker verification tasks?
Don’t believe what you see on CSI or in the papers! And keep an eye on the case surrounding the tragic death of Trayvon Martin in Florida, which is likely to involve such matters.
The 2010 NIST Speaker Recognition Evaluation (SRE10) included a test of Human Assisted Speaker Recognition (HASR) in which systems based in whole or in part on human expertise were evaluated on limited sets of trials. Results were submitted for 20 systems from 15 sites from 6 countries. The performance results suggest that the chosen trials were indeed difficult, as is often the case in real-life situations, and that the HASR systems did not appear to perform as well as the best fully automatic systems on these trials. This does not mean that machines are simply, always, everywhere “better” than people at speaker recognition. But what does it mean? This is worth discussing.
This lecture-demonstration will provide a live, interactive speaker recognition exercise for the audience, giving everyone a firsthand experience of the task a human forensic examiner often faces. Prepared with such experience, the audience will then hear the highlights of the NIST HASR evaluations.
John Godfrey received his PhD in Linguistics from Georgetown University, did a postdoc at AMRL in Dayton, and spent 10 years at UT-Dallas’ Callier Center as an Assistant/Associate Professor, where he focused on speech perception and psycholinguistics. He later joined the Texas Instruments Speech Research Group, where, in addition to phonetics research, he worked on corpus-based evaluation, designing and collecting corpora such as Wall Street Journal, TI-MIT, ATIS, and SWITCHBOARD. These corpora are widely acknowledged to have helped drive speech research for the next decade and more.
He also served as the first Executive Director of the LDC, creating the infrastructure that has supported evaluation-based “big data” research in HLT ever since.
In 1999 he became Chief of HLT Research at NSA, where he oversaw both government and external R&D efforts in Speaker, Language and Speech Recognition, as well as the annual NIST evaluations in these areas. His strategic responsibilities also included liaison with academic and industrial labs, DARPA, IARPA, and NSF.
In recent years his research group’s success on classified applications has been demonstrated and become widely known in the Intelligence Community. The group won the NSA Research Team of the Year award in 2010.
As HLT Chief Scientist for NSA Research, he also conducts and oversees research in speaker recognition by man and machine.
Craig Greenberg received his B.A. (Hons.) degree in Logic, Information, & Computation from the University of Pennsylvania (2007) and his B.M. degree from Vanderbilt University (2003). He is currently working toward his M.S. degree (to be awarded in May 2012) in Applied Mathematics at Johns Hopkins University in the Engineering and Applied Science Program for Professionals.
He works as a Mathematician at the Gaithersburg, Maryland campus of the National Institute of Standards and Technology (NIST) in the areas of speaker recognition and language recognition. Previous positions he has held include: Computer Scientist Intern at the National Institute of Standards and Technology, Research Assistant for Professor Mitch Marcus at the University of Pennsylvania, Programmer at the Linguistic Data Consortium, and English Language Annotator at the Institute for Research in Cognitive Science.
Mr. Greenberg has been a member of the International Speech Communication Association (ISCA) since 2008. He has received two official letters of recognition for his contribution to speaker recognition evaluation.