Exploiting Latent Semantic Mapping for Generic Feature Extraction – Jerome R. Bellegarda (Apple Inc.)
Originally formulated in the context of information retrieval, latent semantic analysis exhibits three main characteristics: (i) words and documents (i.e., discrete entities) are mapped onto a continuous vector space; (ii) this mapping is determined by global correlation patterns; and (iii) dimensionality reduction is an integral part of the process. Because such fairly generic properties may be advantageous in a variety of contexts, they have sparked interest in a more inclusive interpretation of the underlying paradigm. The outcome is latent semantic mapping, a data-driven framework for modeling global relationships implicit in large volumes of (not necessarily textual) data. The purpose of this talk is to give a broad overview of the framework, highlight the attendant shift in focus from semantic classification to more general feature extraction, and underscore the multi-faceted benefits it can bring to a number of problems in speech and language processing. We conclude with a discussion of the inherent trade-offs associated with the approach, and some perspectives on its likely role in information extraction going forward.
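The three characteristics above can be illustrated with the classical LSA recipe: build a word-by-document count matrix, factor it with a truncated singular value decomposition (SVD), and read off word and document vectors in the same low-dimensional space. The following is a minimal NumPy sketch under stated assumptions, not the talk's own implementation: it uses raw counts with no term weighting (practical systems typically weight the matrix first), a toy four-word, four-document corpus, and hypothetical helper names `latent_semantic_map` and `cosine`.

```python
import numpy as np


def latent_semantic_map(counts: np.ndarray, k: int):
    """Map the words (rows) and documents (columns) of a word-document
    count matrix into a shared k-dimensional space via truncated SVD.

    Sketch only: no term weighting is applied to `counts`.
    """
    # Full (thin) SVD: counts = U @ diag(s) @ Vt, singular values in s descending.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    # Keep only the top-k singular triplets: this is the dimensionality reduction.
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]
    # Word i is represented by row i of U_k S_k.
    word_vecs = u_k * s_k
    # Document j is represented by row j of V_k S_k.
    doc_vecs = vt_k.T * s_k
    return word_vecs, doc_vecs, s_k


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, the usual closeness measure in the latent space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    # Hypothetical toy corpus: rows are words, columns are documents.
    words = ["speech", "recognition", "apple", "fruit"]
    counts = np.array(
        [
            [2, 1, 0, 0],
            [1, 2, 0, 0],
            [0, 0, 1, 2],
            [0, 0, 2, 1],
        ],
        dtype=float,
    )
    word_vecs, doc_vecs, s_k = latent_semantic_map(counts, k=2)
    print("retained singular values:", s_k)
    print("speech vs recognition:", cosine(word_vecs[0], word_vecs[1]))
    print("speech vs apple:", cosine(word_vecs[0], word_vecs[2]))
```

On this toy matrix, "speech" and "recognition" land on essentially the same direction (cosine close to 1) because they co-occur across the same documents, while "speech" and "apple" end up orthogonal. The similarity is driven by global co-occurrence patterns rather than by any per-word rule, which is the property the framework generalizes beyond text.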
Jerome R. Bellegarda received the Ph.D. degree in Electrical Engineering from the University of Rochester, Rochester, New York, in 1987. From 1988 to 1994 he worked on automatic speech and handwriting recognition at the IBM T.J. Watson Research Center, Yorktown Heights, New York. In 1994 he joined Apple Inc., Cupertino, California, where he is currently Apple Distinguished Scientist in Speech & Language Technologies. His general interests span voice-driven man-machine communications, multiple input/output modalities, and multimedia knowledge management. In these areas he has authored approximately 150 publications and holds over 40 U.S. and foreign patents. He has also served on many international scientific committees, review panels, and editorial boards. In particular, he has worked as Expert Adviser on speech technology for both the National Science Foundation and the European Commission (DGXIII), was Associate Editor for the IEEE Transactions on Audio, Speech and Language Processing, served on the IEEE Signal Processing Society Speech Technical Committee, and is currently a member of the Speech Communication Editorial Advisory Board. He is a Fellow of the IEEE.