Approximate Distance Clustering – Lenore Cowen (Johns Hopkins University, Department of Mathematical Sciences)
Abstract
This talk is concerned with the design of algorithms for finding clusters in high-dimensional space. We develop and implement a clustering algorithm appropriate for problems in which a small number of observations are embedded in high-dimensional space. The approach, based on the approximate distance reduction methods of Linial, London and Rabonivitch, involves non-linear projections based on inter-point distances to random subsets. Algorithms are presented for both a presence and an absence of fully-classified training data. Theoretical, simulation, and experimental results are presented indicating that the `approximate distance clustering’ methodology compares favorably to conventional statistical approaches in many challenging scenarios. Many applications in language and speech processing involve clustering distance matrices embedded in 10,000+ dimensional space. It is hoped that the audience may find these techniques helpful in attacking the computational bottlenecks in their clustering problems. (joint work with Carey E. Priebe)