NO.080 Recent Advances in Randomized Numerical Linear Algebra

Shonan Village Center

July 25 - 28, 2016 (Check-in: July 24, 2016)

Organizers

  • Ravindran Kannan
    • Microsoft Research India, India
  • Michael Mahoney
    • University of California Berkeley, USA
  • David Woodruff
    • IBM Research Almaden, USA

Overview

Description of the meeting

Introduction and overview

Randomized Numerical Linear Algebra (RandNLA) is an interdisciplinary research area that exploits randomization as a computational resource to develop improved algorithms for large-scale linear algebra problems. From a foundational perspective, RandNLA has its roots in theoretical computer science (TCS), with deep connections to mathematics (convex analysis, probability theory, metric embedding theory) and applied mathematics (scientific computing, signal processing, numerical linear algebra). From an applied perspective, RandNLA is a vital new tool for machine learning, statistics, and data analysis. Well-engineered implementations have already outperformed highly-optimized software libraries for ubiquitous problems such as least squares (LS), with good scalability in parallel and distributed environments. RandNLA promises a sound algorithmic and statistical foundation for modern large-scale data analysis.

The great interdisciplinary strength of RandNLA is also one of the main challenges to its future progress. Researchers from diverse areas do not speak a common language, do not publish in the same venues, and often cannot evaluate the significance of important contributions from other areas. For this workshop, we have assembled an interdisciplinary group of researchers who have made fundamental contributions to different aspects of RandNLA.

Our main goal is to advance mathematical aspects of RandNLA and to solve fundamental algorithmic and statistical challenges in making this theory useful in large-scale machine learning and data analysis applications. In particular, by bringing together these researchers in this workshop, we would like to develop a multi-faceted common theoretical foundation for state-of-the-art RandNLA methodologies and a detailed understanding of the complementary algorithmic and statistical issues that arise in their application to modern large-scale data analysis.

Background

Matrices are ubiquitous in computer science, statistics, and applied mathematics. An m x n matrix can encode information about m objects (each described by n features), or the behavior of a discretized differential operator on a finite element mesh; an n x n positive-definite matrix can encode the correlations between all pairs of n objects, or the edge-connectivity between all pairs of nodes in a social network; and so on. Motivated largely by technological developments that generate extremely large scientific and Internet data sets, recent years have witnessed exciting developments in the theory and practice of matrix algorithms. Particularly remarkable is the use of randomization (typically assumed to be a property of the input data due to, e.g., noise in the data-generation mechanism) as an algorithmic or computational resource for the development of improved algorithms for fundamental matrix problems such as matrix multiplication, LS approximation, low-rank matrix approximation, Laplacian-based solvers, etc.
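As a concrete illustration of randomization as a computational resource, the following minimal NumPy sketch (our own illustration, not code from any of the works discussed here) implements the classical sampling-based estimator for approximate matrix multiplication: sample s column/row pairs with probabilities proportional to the products of their Euclidean norms, and rescale each sampled term so that the estimate is unbiased. The function name sampled_matmul and all sizes are illustrative.

    import numpy as np

    def sampled_matmul(A, B, s, rng=None):
        """Approximate A @ B by sampling s column/row pairs.

        Columns of A (and the matching rows of B) are drawn with
        probabilities proportional to ||A[:, k]|| * ||B[k, :]||, and
        each sampled term is rescaled by 1 / (s * p_k) so that the
        estimate is unbiased.
        """
        rng = np.random.default_rng(rng)
        norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
        p = norms / norms.sum()
        idx = rng.choice(A.shape[1], size=s, p=p)
        scale = 1.0 / (s * p[idx])
        # Rescaled sum of s rank-one terms A[:, k] B[k, :]
        return (A[:, idx] * scale) @ B[idx, :]

    # Illustrative usage: the relative error shrinks as s grows
    A = np.random.default_rng(1).standard_normal((500, 1000))
    B = np.random.default_rng(2).standard_normal((1000, 300))
    exact = A @ B
    approx = sampled_matmul(A, B, s=200, rng=0)
    print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))

Sampling proportionally to the norm products is the data-dependent, non-uniform choice that controls the expected Frobenius-norm error of this one-pass estimator.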

Each of the three organizers has written a separate NOW Publishers monograph on different aspects of randomized numerical linear algebra, including: “Spectral Algorithms” by Kannan and Vempala in 2009, “Randomized Algorithms for Matrices and Data” by Mahoney in 2011, and “Sketching as a Tool for Numerical Linear Algebra” by Woodruff in 2014.

These monographs explain how sampling, sketching, and other randomized algorithmic techniques can be used for speeding up classical solutions to numerical linear algebra problems. Since it is an inherently interdisciplinary area, RandNLA can be approached from several different, yet complementary, perspectives.

Pure and applied mathematics. RandNLA has depended on progress in convex analysis and probability theory, making heavy use of, e.g., measure-concentration inequalities for matrices such as the matrix Chernoff and matrix Bernstein bounds. Dimensionality-reduction ideas (such as fast random projections) have also been critical in RandNLA theory and practice, and have been used in NLA applications to construct highly efficient preconditioners for LS problems (a minimal sketch of this idea appears below) and to approximately solve low-rank approximation problems. Not surprisingly, there has been two-way traffic between RandNLA and the aforementioned areas, with progress in one area motivating and inspiring new results in the other.
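To make the preconditioning idea concrete, here is a minimal NumPy/SciPy sketch, loosely in the style of Blendenpik-type solvers: sketch A with a random matrix, QR-factor the small sketch, and use the resulting R factor as a right preconditioner inside LSQR. We use a dense Gaussian sketch for simplicity (practical implementations use fast structured transforms); the function name sketch_preconditioned_ls and the test problem are illustrative.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, lsqr

    def sketch_preconditioned_ls(A, b, s, rng=None):
        """Solve min_x ||A x - b||_2 with a sketch-based preconditioner.

        A Gaussian sketch S (s x m, with n <= s << m) is applied to A,
        the small sketched matrix S @ A is QR-factored, and R serves as
        a right preconditioner: A @ inv(R) is well conditioned with
        high probability, so LSQR converges in few iterations.
        """
        rng = np.random.default_rng(rng)
        m, n = A.shape
        S = rng.standard_normal((s, m)) / np.sqrt(s)
        _, R = np.linalg.qr(S @ A)          # QR of a small s x n matrix
        Rinv = np.linalg.inv(R)
        # Apply A @ inv(R) matrix-free rather than forming it explicitly.
        op = LinearOperator((m, n),
                            matvec=lambda v: A @ (Rinv @ v),
                            rmatvec=lambda u: Rinv.T @ (A.T @ u))
        y = lsqr(op, b, atol=1e-12, btol=1e-12)[0]
        return Rinv @ y                     # undo the change of variables

    # Tall, moderately ill-conditioned test problem
    rng = np.random.default_rng(0)
    m, n = 20_000, 100
    A = rng.standard_normal((m, n)) * np.logspace(0, -4, n)
    b = A @ np.ones(n) + 1e-3 * rng.standard_normal(m)
    x_direct = np.linalg.lstsq(A, b, rcond=None)[0]
    x_rand = sketch_preconditioned_ls(A, b, s=400, rng=1)
    print(np.linalg.norm(x_rand - x_direct) / np.linalg.norm(x_direct))

The preconditioner is computed from a sketch whose size depends only on n, not m, which is what makes this approach competitive with (and often faster than) direct dense solvers on tall problems.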

Computer science. RandNLA initially grew out of work in TCS that analyzed the performance of algorithms that sample columns, rows, or elements from a matrix according to data-dependent non-uniform sampling probabilities. In parallel, but largely independently, ground-breaking progress was made on the theory and practice of solvers for systems of linear equations involving Laplacian matrices. Finally, sparse subspace-preserving embeddings have made RandNLA algorithms extremely efficient for massive sparse matrices.
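The sparse subspace-preserving embeddings mentioned above can be illustrated with a CountSketch-style construction: each column of the embedding matrix has a single random sign in a single uniformly random row, so the sketch S @ A can be applied in time proportional to the number of nonzeros of A. The following minimal sketch (illustrative names and sizes, not code from any cited work) builds such an embedding and checks that it approximately preserves singular values.

    import numpy as np
    from scipy.sparse import csr_matrix

    def countsketch(m, s, rng=None):
        """Build a CountSketch-style sparse subspace embedding S (s x m).

        Each column of S has exactly one nonzero entry: a random sign
        placed in a uniformly random row. Applying S to a matrix A
        therefore costs time proportional to nnz(A).
        """
        rng = np.random.default_rng(rng)
        rows = rng.integers(0, s, size=m)          # target row per column
        signs = rng.choice([-1.0, 1.0], size=m)    # random sign per column
        return csr_matrix((signs, (rows, np.arange(m))), shape=(s, m))

    m, n = 100_000, 20
    A = np.random.default_rng(0).standard_normal((m, n))
    S = countsketch(m, s=2_000, rng=1)
    SA = S @ A
    # A subspace embedding approximately preserves all singular values
    print(np.linalg.svd(A, compute_uv=False)[:3])
    print(np.linalg.svd(SA, compute_uv=False)[:3])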

Statistics and machine learning. RandNLA has received much attention in machine learning and statistics, despite the substantial “impedance mismatch” in problem formulation, parameterization, and objectives of interest. An obvious reason is that data sets are often modeled as matrices or graphs, and thus low-rank approximation, Laplacian-based problems, etc. arise naturally; in many such cases running time is a concern, and RandNLA methods have been applied, for example, to CUR and Nyström-based low-rank approximations in large-scale environments. A more subtle reason is that the randomization in RandNLA often “denoises” the input, and thus one observes implicit statistical regularization.
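As one concrete instance of the low-rank approximations referred to above, here is a minimal Nyström sketch for a positive semi-definite kernel matrix. We use uniform landmark sampling for simplicity, whereas RandNLA analyses often use non-uniform (e.g., leverage-score-based) sampling; the function name nystrom and all sizes are illustrative.

    import numpy as np

    def nystrom(K, c, rng=None):
        """Rank-c Nystrom approximation of a PSD matrix K.

        Sample c landmark columns uniformly at random and approximate
        K ~ C @ pinv(W) @ C.T, where C = K[:, idx] and W = K[idx][:, idx].
        Only c columns of K are touched, so the full kernel matrix never
        has to be formed in a real application.
        """
        rng = np.random.default_rng(rng)
        idx = rng.choice(K.shape[0], size=c, replace=False)
        C = K[:, idx]
        W = C[idx, :]
        return C @ np.linalg.pinv(W) @ C.T

    # RBF kernel matrix on random points
    rng = np.random.default_rng(0)
    X = rng.standard_normal((2_000, 10))
    sq = (X**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2 * X @ X.T
    K = np.exp(-0.5 * np.maximum(sq, 0.0))
    K_ny = nystrom(K, c=200, rng=1)
    print(np.linalg.norm(K - K_ny) / np.linalg.norm(K))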

RandNLA in the field. Much of the interest in RandNLA has come from research areas that employ matrix algorithms as “black boxes.” RandNLA has had important successes in such application domains for small- and medium-scale data, ranging from population genetics and biology to astronomy and mass spectrometry imaging. RandNLA has thus far been studied from each of these very different perspectives largely in isolation; an important aspect of this workshop will be to address RandNLA in a more unified manner.

Report

No-080.pdf