CUR Matrix Decompositions for Improved Data Analysis – Michael Mahoney (Yahoo Research)

March 28, 2006

Much recent work in theoretical computer science, linear algebra, and machine learning has considered matrix decompositions of the following form: given an m-by-n matrix A, decompose it as a product of three matrices, C, U, and R, where C consists of a few (typically a constant number of) columns of A, R consists of a few (typically a constant number of) rows of A, and U is a small, carefully constructed matrix that guarantees that the product CUR is "close" to A. Applications of such decompositions include the computation of compressed "sketches" for large matrices in a pass-efficient manner, matrix reconstruction, speeding up kernel-based statistical learning computations, sparsity preservation in low-rank approximations, and improved interpretability of data analysis methods. In this talk we shall discuss various choices for the matrices C, U, and R that are appropriate in different application domains. The main result will be an algorithm that computes matrices C, U, and R such that the Frobenius norm of the error matrix A – CUR is almost minimal. We will also discuss applications of this algorithm and related heuristics in the bioinformatics of human genetics, in recommendation system analysis, and in medical imaging.
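The CUR construction described in the abstract can be sketched in a few lines of NumPy. The squared-norm sampling probabilities and the pseudoinverse-based choice of U below are illustrative assumptions, not necessarily the specific algorithm presented in the talk; they do, however, give the optimal U for the sampled C and R in the Frobenius-norm sense.

```python
import numpy as np

def cur_decomposition(A, c, r, rng=None):
    """Sketch of a CUR decomposition: sample c columns and r rows of A
    (here by squared-norm importance sampling) and build U so that
    C @ U @ R approximates A."""
    rng = np.random.default_rng(rng)
    total = (A ** 2).sum()
    col_probs = (A ** 2).sum(axis=0) / total   # column importance weights
    row_probs = (A ** 2).sum(axis=1) / total   # row importance weights
    cols = rng.choice(A.shape[1], size=c, replace=False, p=col_probs)
    rows = rng.choice(A.shape[0], size=r, replace=False, p=row_probs)
    C = A[:, cols]   # a few actual columns of A
    R = A[rows, :]   # a few actual rows of A
    # U minimizing ||A - C U R||_F for the fixed C and R:
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

# Example: a rank-2 matrix is recovered almost exactly from 3 columns and 3 rows.
A = np.outer(np.arange(1, 7), np.ones(5)) + np.outer(np.ones(6), np.arange(5))
C, U, R = cur_decomposition(A, c=3, r=3, rng=0)
err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
```

Because C and R are actual columns and rows of the data, the factors keep the sparsity and the interpretability of the original entries, which is the point emphasized in the abstract.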

Center for Language and Speech Processing