Searching for Information in Very Large Collections of Spoken Audio – Richard Rose (McGill University)
A vast number of applications require the ability to extract information from spoken audio. These include, for example, searching for segments of lectures or videos in large media repositories that may be relevant to a given query. Other examples include topic classification, segmentation, and clustering in audio recordings of meetings, conversations, and broadcast news. While there has been a great deal of work in these areas, these applications impose constraints that limit the range of approaches that can be considered practical. Users demand sub-second response latencies to queries when searching collections that may contain thousands of hours of speech. System designers demand that the search engines be configured using few or no resources drawn from the target domain. This presentation will begin with an introduction to the important problems in search and classification of spoken audio material. An approach involving open-vocabulary indexing and search of lattices generated offline by a large vocabulary continuous speech recognition engine will then be presented. A key aspect of the approach is its scalability to extremely large collections. Performance will be presented for a task involving recorded lectures taken from an online media server.
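To make the indexing-and-search idea concrete, the sketch below shows a minimal inverted index over word hypotheses extracted from recognition lattices, where each hypothesis carries a posterior probability and utterances are ranked by summed posteriors of matched query words. This is only an illustrative toy under assumed data structures (flat hypothesis tuples rather than true lattices), not the system described in the talk; all names here are hypothetical.

```python
from collections import defaultdict

def build_index(hypotheses):
    """Build an inverted index from (utt_id, word, start_time, posterior) tuples.

    Real systems index full recognition lattices offline; here each lattice
    has already been flattened into a list of time-stamped word hypotheses.
    """
    index = defaultdict(list)
    for utt_id, word, start, posterior in hypotheses:
        index[word].append((utt_id, start, posterior))
    return index

def search(index, query_words):
    """Rank utterances by the summed posteriors of matched query words.

    Because lookups touch only the posting lists for the query terms,
    query latency is independent of total collection size.
    """
    scores = defaultdict(float)
    for word in query_words:
        for utt_id, _start, posterior in index.get(word, []):
            scores[utt_id] += posterior
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy collection: two hypothetical lecture segments.
hyps = [
    ("lec1", "speech", 0.5, 0.9),
    ("lec1", "recognition", 1.0, 0.8),
    ("lec2", "speech", 2.0, 0.3),
]
idx = build_index(hyps)
results = search(idx, ["speech", "recognition"])
```

Ranking by accumulated posteriors rather than 1-best transcripts is what lets lattice-based search recover terms the recognizer's top hypothesis missed, at the cost of a larger index.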
Richard Rose received B.S. and M.S. degrees in Electrical Engineering from the University of Illinois and a Ph.D. in Electrical Engineering from the Georgia Institute of Technology. He served on the technical staff at MIT Lincoln Laboratory, working on speech recognition and speaker recognition. He was with AT&T for ten years, first at AT&T Bell Laboratories and then in the Speech and Image Processing Services Laboratory at AT&T Labs – Research. Currently, he is an Associate Professor of Electrical and Computer Engineering at McGill University in Montreal, Quebec. Professor Rose has served in various roles in the IEEE Signal Processing Society. He was a member of the society's Technical Committee on Digital Signal Processing and was elected as an at-large member of its Board of Governors. He has served as an associate editor for the IEEE Transactions on Speech and Audio Processing and later for the IEEE Transactions on Audio, Speech, and Language Processing, and he is currently a member of the editorial board of the Speech Communication journal. He was a member of the IEEE SPS Speech Technical Committee (STC) and the founding editor of the STC Newsletter. He also served as co-chair of the IEEE 2005 Workshop on Automatic Speech Recognition and Understanding.