Transcribing Speech for Language Processing
Mari Ostendorf, University of Washington
September 18, 2007
With recent advances in automatic speech recognition, there are growing opportunities for natural language processing of speech, including applications such as information extraction, summarization and translation. As speech processing moves from simple word transcription to document processing and analyses of human interactions, it becomes increasingly important to represent structure in spoken language and to incorporate that structure in performance optimization. In this talk, we consider two types of structure: segmentation and syntax. Virtually all language processing technology, having been developed on written text, assumes knowledge of sentence boundaries; hence, sentence segmentation is critical for spoken document processing. Experiments show that sentence segmentation has a significant impact on the performance of tasks such as parsing, translation and information extraction. However, optimizing for downstream task performance leads to different operating points for different tasks, which we claim argues for the additional use of subsentence prosodic structure. Parsing itself is an important analysis tool used in many human language technologies, and jointly optimizing speech recognition for both parse accuracy and word error rate benefits these applications. Moreover, we show that optimizing recognition for parsing performance can benefit subsequent language processing (e.g., translation) even when parse structure is not explicitly used, because of the increased importance placed on constituent headwords. Of course, if parsing is part of the ultimate objective, recognition benefits even more from parsing language models than from simple word error rate criteria. A complication arises in working with conversational speech due to the presence of disfluencies, which reinforces the argument for subsentence prosodic modeling and for explicit representation of disfluencies in parsing models.
Mari Ostendorf received her Ph.D. in electrical engineering from Stanford University in 1985. After working at BBN Laboratories (1985-1986) and Boston University (1987-1999), she joined the University of Washington (UW) in 1999. She has also served as a visiting researcher at the ATR Interpreting Telecommunications Laboratory in Japan in 1995 and at the University of Karlsruhe in 2005-2006. At UW, she is currently an Endowed Professor of System Design Methodologies in Electrical Engineering and an Adjunct Professor in Computer Science and Engineering and in Linguistics. She teaches undergraduate and graduate courses in signal processing and statistical learning, including a project-oriented freshman course that introduces students to signal processing and communications. Prof. Ostendorf's research interests are in dynamic and linguistically motivated statistical models for speech and language processing. Her work has resulted in over 160 publications and two paper awards. Prof. Ostendorf has served on numerous technical and advisory committees, as co-Editor of Computer Speech and Language (1998-2003), and currently as Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing. She is a Fellow of the IEEE and a member of ISCA, ACL, SWE and Sigma Xi.