Deterministic Annealing for Clustering, Compression, Classification, Regression, and Speech Recognition – Kenneth Rose (Department of Electrical and Computer Engineering, University of California at Santa Barbara)

April 10, 2001 (all day)

The deterministic annealing approach to clustering and its extensions have demonstrated substantial performance improvement over standard supervised and unsupervised learning methods on a variety of important problems including compression, estimation, pattern recognition and classification, and statistical regression. It has found applications in a broad spectrum of disciplines ranging from various engineering fields to physics, biology, medicine and economics. The method offers three important features: the ability to avoid many poor local optima; applicability to many different structures/architectures; and the ability to minimize the right cost function even when its gradients vanish almost everywhere, as in the case of the empirical classification error. It is derived within a probabilistic framework from basic information-theoretic principles. The application-specific cost is minimized subject to a constraint on the randomness (Shannon entropy) of the solution, which is gradually lowered. We emphasize intuition gained from analogy to statistical physics, where this is an annealing process that avoids many shallow local minima of the specified cost and, in the limit of zero "temperature", produces a non-random (hard) solution. Phase transitions in the process correspond to a controlled increase in model complexity. Alternatively, the method is derived within rate-distortion theory, where the annealing process is equivalent to computation of Shannon's rate-distortion function, and the annealing temperature is inversely proportional to the slope of the rate-distortion curve. This provides new insights into the method and its performance, as well as new insights into rate-distortion theory itself. The basic algorithm is extended by incorporating structural constraints to allow optimization of numerous popular structures including vector quantizers, decision trees, multilayer perceptrons, radial basis functions, mixtures of experts and hidden Markov models.
Experimental results show considerable performance gains over standard structure-specific and application-specific training methods. After covering the basics, the talk will emphasize the speech recognition applications of the approach, including a brief summary of work in progress.
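For the clustering case, the mechanism described in the abstract can be illustrated in a few lines. The following is a minimal sketch, not the speaker's implementation: it assumes squared-error distortion, soft (Gibbs) associations at temperature T, centroid updates by probability-weighted means, and a geometric cooling schedule; the function name, parameters, and the small symmetry-breaking perturbation are all illustrative choices.

```python
import numpy as np

def da_cluster(X, k, T0=10.0, Tmin=0.01, alpha=0.9, iters=50, seed=0):
    """Sketch of deterministic-annealing clustering (squared-error distortion).

    At temperature T, each point x is softly associated with codevector y_j
    via a Gibbs distribution p(j|x) ~ exp(-||x - y_j||^2 / T); codevectors
    are then re-estimated as probability-weighted means. Lowering T hardens
    the solution toward a k-means-like (non-random) partition.
    """
    rng = np.random.default_rng(seed)
    # start all codevectors near the data mean, lightly perturbed
    Y = X.mean(axis=0) + 1e-3 * rng.standard_normal((k, X.shape[1]))
    T = T0
    while T > Tmin:
        # a tiny perturbation lets codevectors split at phase transitions
        Y = Y + 1e-6 * rng.standard_normal(Y.shape)
        for _ in range(iters):
            # (n, k) matrix of squared distances from each point to each codevector
            d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
            # numerically stable Gibbs weights at temperature T
            g = np.exp(-(d - d.min(axis=1, keepdims=True)) / T)
            P = g / g.sum(axis=1, keepdims=True)  # soft associations p(j|x)
            # centroid update: probability-weighted means
            Y = (P.T @ X) / np.maximum(P.sum(axis=0), 1e-12)[:, None]
        T *= alpha  # geometric cooling schedule
    return Y, P
```

The phase transitions mentioned in the abstract appear here as the temperatures at which effectively merged codevectors separate; below Tmin the associations are essentially hard, recovering a standard quantizer/clustering solution.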

Kenneth Rose received the Ph.D. degree in electrical engineering from Caltech in 1991. He then joined the faculty of the Department of Electrical and Computer Engineering, University of California at Santa Barbara. His research interests are in information theory, source and channel coding, speech and general pattern recognition, image coding and processing, and nonconvex optimization in general. He currently serves as the editor for source-channel coding for the IEEE Transactions on Communications. In 1990 he received the William R. Bennett Prize Paper Award from the IEEE Communications Society for the best paper in the Transactions.

Center for Language and Speech Processing