Approximate Distance Clustering – Lenore Cowen (Johns Hopkins University, Department of Mathematical Sciences)

September 24, 1996 all-day

This talk is concerned with the design of algorithms for finding clusters in high-dimensional space. We develop and implement a clustering algorithm appropriate for problems in which a small number of observations are embedded in high-dimensional space. The approach, based on the approximate distance reduction methods of Linial, London and Rabonivitch, involves non-linear projections based on inter-point distances to random subsets. Algorithms are presented for both a presence and an absence of fully-classified training data. Theoretical, simulation, and experimental results are presented indicating that the `approximate distance clustering’ methodology compares favorably to conventional statistical approaches in many challenging scenarios. Many applications in language and speech processing involve clustering distance matrices embedded in 10,000+ dimensional space. It is hoped that the audience may find these techniques helpful in attacking the computational bottlenecks in their clustering problems. (joint work with Carey E. Priebe)

Center for Language and Speech Processing