Developing Efficient Models of Intrinsic Speech Variability – Richard Rose (McGill University)
A variety of modeling techniques in automatic speech recognition have been developed with the goal of representing potential sources of intrinsic speech variability in a low-dimensional subspace. Much of the research in this area has focused on "speaker space" based approaches, where it is assumed that statistical models for an unknown speaker lie in a space whose basis vectors represent the relevant variation among a set of reference speakers. As an alternative to these largely data-driven approaches, more structured feature and model representations have been developed that are based on theories of speech production and acoustic phonetics. Performance improvements from speaker space approaches, including eigenvoice modeling, cluster adaptive training, and several others, have been reported for speaker adaptation in many ASR task domains where only small amounts of adaptation data are available. The potential of systems based on phonological distinctive features has also been demonstrated, though on far more constrained task domains. This talk presents discussion and experimental results exploring the potential advantages of both classes of techniques. We will also focus on the limitations of these techniques in addressing some of the basic problems that still exist in state-of-the-art ASR systems.
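The speaker-space idea above can be illustrated with a minimal numerical sketch of eigenvoice-style adaptation: an unknown speaker's model (a "supervector" of stacked Gaussian means) is constrained to lie in the span of a small set of basis vectors derived from reference speakers, so adaptation reduces to estimating a handful of weights. All names, dimensions, and the least-squares fit below are illustrative assumptions; real systems typically estimate the weights by maximum likelihood from adaptation data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration: D-dimensional supervector of
# stacked Gaussian means, K eigenvoice basis vectors (K << D).
D, K = 50, 4

mean_voice = rng.normal(size=D)         # average-speaker supervector
eigenvoices = rng.normal(size=(K, D))   # basis spanning speaker variation


def adapt(obs_supervector, mean_voice, eigenvoices):
    """Project an unknown speaker onto the speaker space: solve a
    least-squares problem for the K weights, then reconstruct the
    adapted model inside the low-dimensional subspace."""
    E = eigenvoices.T                                       # D x K
    w, *_ = np.linalg.lstsq(E, obs_supervector - mean_voice, rcond=None)
    return mean_voice + E @ w, w


# Simulate an unknown speaker who truly lies in the span of the basis,
# so only K = 4 numbers (not D = 50) must be estimated from data.
true_w = np.array([0.8, -1.2, 0.3, 0.5])
speaker = mean_voice + eigenvoices.T @ true_w

adapted, w_hat = adapt(speaker, mean_voice, eigenvoices)
```

The payoff motivating these methods is exactly this dimensionality reduction: with only a few seconds of adaptation data, estimating K weights is far more robust than re-estimating the full model.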
Richard Rose received B.S. and M.S. degrees in Electrical Engineering from the University of Illinois and the Ph.D. degree in Electrical Engineering from the Georgia Institute of Technology. He served on the technical staff at MIT Lincoln Laboratory, working on speech recognition and speaker recognition. He was with AT&T for ten years, first at AT&T Bell Laboratories and then in the Speech and Image Processing Services Laboratory at AT&T Labs – Research. Currently, he is an Associate Professor of Electrical and Computer Engineering at McGill University in Montreal, Quebec. Professor Rose has served in various roles in the IEEE Signal Processing Society. He was a member of the society's Technical Committee on Digital Signal Processing and was elected as an at-large member of its Board of Governors. He served as an associate editor for the IEEE Transactions on Speech and Audio Processing and later for the IEEE Transactions on Audio, Speech, and Language Processing, and is currently a member of the editorial board of the Speech Communication journal. He was a member of the IEEE SPS Speech Technical Committee (STC) and was founding editor of the STC Newsletter. He also served as co-chair of the IEEE 2005 Workshop on Automatic Speech Recognition and Understanding.