Recent Advances in Audio Information Retrieval – Bhuvana Ramabhadran (IBM)

February 5, 2008

Early word-spotting systems processed the audio signal to produce phonetic transcripts without the use of an automatic speech recognition (ASR) system. In the past decade, most research on spoken data retrieval has focused on extending classical IR techniques to word transcripts, some of it in the framework of the NIST TREC Spoken Document Retrieval tracks. More recently, the use of word and phonetic transcripts was explored in the context of the Spoken Term Detection (STD) 2006 evaluation conducted by NIST. In this talk, I will begin with IBM's submission to the STD evaluation and cover recent work at IBM to enhance the performance of the end-to-end audio search system. The first technique is a similarity measure based on a phonetic confusion matrix that accounts for higher-order phonetic confusions (phone bi-grams and tri-grams). The second is an application of vector space modeling, specifically Latent Semantic Analysis (LSA), to shortlist the most relevant audio segments; searching only this shortlist, 3% of the overall collection, gives the same level of performance as searching the entire collection.
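
The LSA shortlisting idea mentioned above can be illustrated with a small sketch. This is not IBM's system: it assumes word transcripts are already available as plain strings, uses raw word counts rather than any particular term weighting, and the names `build_term_segment_matrix`, `lsa_shortlist`, `rank`, `keep_fraction` and `EXAMPLE_SEGMENTS` are hypothetical, chosen only for illustration. Segments and the query are projected into a low-rank latent space obtained by truncated SVD, and only the segments closest to the query are kept for the more expensive search.

```python
import numpy as np

# Toy stand-ins for ASR word transcripts of audio segments (illustrative only).
EXAMPLE_SEGMENTS = [
    "speech recognition audio search",
    "audio search phonetic index",
    "stock market prices fall",
    "market prices rise today",
]


def build_term_segment_matrix(segments):
    """Return a (terms x segments) count matrix and the term-to-row index."""
    vocab = sorted({word for seg in segments for word in seg.split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(segments)))
    for j, seg in enumerate(segments):
        for word in seg.split():
            matrix[index[word], j] += 1.0
    return matrix, index


def lsa_shortlist(segments, query, rank=2, keep_fraction=0.5):
    """Return indices of the segments closest to the query in LSA space.

    rank and keep_fraction are assumed tuning knobs, not values from the talk.
    """
    matrix, index = build_term_segment_matrix(segments)

    # Truncated SVD: keep the top `rank` left singular vectors.
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    k = min(rank, len(s))
    u_k = u[:, :k]

    # Project every segment and the query into the k-dimensional latent space.
    segment_vecs = matrix.T @ u_k            # shape: (n_segments, k)
    query_counts = np.zeros(matrix.shape[0])
    for word in query.split():
        if word in index:                    # out-of-vocabulary words are skipped
            query_counts[index[word]] += 1.0
    query_vec = u_k.T @ query_counts         # shape: (k,)

    # Cosine similarity between the query and each segment.
    eps = 1e-12
    sims = (segment_vecs @ query_vec) / (
        np.linalg.norm(segment_vecs, axis=1) * np.linalg.norm(query_vec) + eps
    )

    # Keep only the best-scoring fraction of the collection (at least one).
    n_keep = max(1, int(np.ceil(keep_fraction * len(segments))))
    order = np.argsort(-sims)
    return [int(i) for i in order[:n_keep]]


if __name__ == "__main__":
    print(sorted(lsa_shortlist(EXAMPLE_SEGMENTS, "audio speech")))
```

In this toy collection the two audio-related segments are shortlisted for the query "audio speech". A real system would operate on far larger collections, where a small `keep_fraction` is what makes the subsequent search cheaper.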

Dr. Bhuvana Ramabhadran is a Research Staff Member in Multilingual Analytics and User Technologies at the IBM T.J. Watson Research Center. Since joining IBM in 1995, she has made significant contributions to the ViaVoice line of products and served as the Principal Investigator for the NSF-funded project MALACH (Multilingual Access to Large Spoken Archives) and the EU-funded project TC-STAR (Technology and Corpora for Speech-to-Speech Translation). She currently manages a group that focuses on large vocabulary speech transcription, audio information retrieval and text-to-speech synthesis. Her research interests include speech recognition algorithms, statistical signal processing, pattern recognition and biomedical engineering.

Center for Language and Speech Processing