CUR Matrix Decompositions for Improved Data Analysis – Michael Mahoney (Yahoo Research)

March 28, 2006

Much recent work in theoretical computer science, linear algebra, and machine learning has considered matrix decompositions of the following form: given an m-by-n matrix A, decompose it as a product of three matrices, C, U, and R, where C consists of a few (typically a constant number of) columns of A, R consists of a few (typically a constant number of) rows of A, and U is a small, carefully constructed matrix that guarantees that the product CUR is "close" to A. Applications of such decompositions include the computation of compressed "sketches" for large matrices in a pass-efficient manner, matrix reconstruction, speeding up kernel-based statistical learning computations, sparsity preservation in low-rank approximations, and improved interpretability of data analysis methods. In this talk we shall discuss various choices for the matrices C, U, and R that are appropriate in different application domains. The main result will be an algorithm that computes matrices C, U, and R such that the Frobenius norm of the error matrix A – CUR is almost minimal. We will also discuss applications of this algorithm and related heuristics in the bioinformatics of human genetics, in recommendation system analysis, and in medical imaging.
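The CUR construction described in the abstract can be sketched in a few lines of NumPy. The squared-norm sampling probabilities and the pseudoinverse-based choice of U below are illustrative assumptions, not necessarily the specific algorithm presented in the talk; they do, however, give the optimal U for the sampled C and R in the Frobenius-norm sense.

```python
import numpy as np

def cur_decomposition(A, c, r, rng=None):
    """Sketch of a CUR decomposition: sample c columns and r rows of A
    (here by squared-norm importance sampling) and build U so that
    C @ U @ R approximates A."""
    rng = np.random.default_rng(rng)
    total = (A ** 2).sum()
    col_probs = (A ** 2).sum(axis=0) / total   # column importance weights
    row_probs = (A ** 2).sum(axis=1) / total   # row importance weights
    cols = rng.choice(A.shape[1], size=c, replace=False, p=col_probs)
    rows = rng.choice(A.shape[0], size=r, replace=False, p=row_probs)
    C = A[:, cols]   # a few actual columns of A
    R = A[rows, :]   # a few actual rows of A
    # U minimizing ||A - C U R||_F for the fixed C and R:
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

# Example: a rank-2 matrix is recovered almost exactly from 3 columns and 3 rows.
A = np.outer(np.arange(1, 7), np.ones(5)) + np.outer(np.ones(6), np.arange(5))
C, U, R = cur_decomposition(A, c=3, r=3, rng=0)
err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
```

Because C and R are actual columns and rows of the data, the factors keep the sparsity and the interpretability of the original entries, which is the point emphasized in the abstract.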

Center for Language and Speech Processing