Below are links to representative publications of my work (with a preference for journal publications), organized by topic.
Each topic also comes with a short discussion introducing the corresponding problem, along with links to relevant publications written by others.

Column Subset Selection

The Column Subset Selection Problem (CSSP) is the following combinatorial optimization problem on matrices: given an m times n matrix A and an integer r < n, select r columns of A, forming a matrix C, such that || A - C C^+ A ||_F (or || A - C C^+ A ||_2) is minimized over all (n choose r) possible choices for C. CSSP is important because a solution to it reveals the most "influential" columns of A. My work on CSSP has focused on the algorithmic aspects of the problem, providing polynomial-time methods (see FOCS paper) as well as input-sparsity-time methods (see STOC paper) with optimal approximation bounds. A representative result is the following: given an m times n matrix A, an integer k < rank(A), and an accuracy parameter epsilon > 0, there exists a randomized algorithm that selects c = 2k/epsilon columns of A and forms a matrix C such that the following bound holds in expectation: || A - C C^+ A ||_F^2 <= (1 + epsilon) || A - A_k ||_F^2; here, A_k in R^{m times n} is the best rank-k approximation of A computed via the SVD: A_k = arg min_{rank(X) <= k} || A - X ||_F^2.
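
To make the objective concrete, here is a minimal numpy sketch that evaluates a column subset against the SVD benchmark. It uses uniform random sampling purely as a stand-in for the carefully designed sampling distributions of the actual algorithms, so no approximation guarantee is claimed for this toy selection rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a 50 x 20 matrix with approximately rank-5 structure plus noise.
A = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 20)) \
    + 0.01 * rng.standard_normal((50, 20))

k = 5                            # target rank
eps = 1.0                        # accuracy parameter
c = int(np.ceil(2 * k / eps))    # number of columns, as in the stated bound

# Stand-in selection rule: uniform sampling without replacement.
# (The real algorithms use data-dependent sampling probabilities.)
cols = rng.choice(A.shape[1], size=c, replace=False)
C = A[:, cols]

# Residual of projecting A onto the span of the chosen columns: A - C C^+ A.
resid = A - C @ np.linalg.pinv(C) @ A
cssp_err = np.linalg.norm(resid, "fro") ** 2

# Benchmark: best rank-k approximation error via the SVD (tail singular values).
s = np.linalg.svd(A, compute_uv=False)
svd_err = np.sum(s[k:] ** 2)
```

Since the projection onto span(C) has rank at most c, the CSSP error can never beat the best rank-c approximation error, which the test below checks.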

Principal Component Analysis

Given an m times n matrix A and an integer k < rank(A), the principal component analysis (PCA) problem asks to find a k times n matrix B and an m times k matrix U with orthonormal columns that together minimize the error || A - U B ||_F. An optimal solution is obtained by setting U = U_k and B = U_k^T A, where U_k is the m times k matrix containing the top k left singular vectors of A. I have studied principal component analysis from different perspectives:
1) fast principal component analysis: we are interested in computing approximations to the matrix B in o(SVD) time, i.e., asymptotically faster than computing the SVD.
2) online principal component analysis: the columns of A are presented to the algorithm one by one; for each presented column the algorithm must immediately return a k-dimensional column, which becomes the corresponding column of a k times n matrix B. The algorithm should guarantee that, for some m times k orthonormal matrix U, the error || A - U B ||_F is small.
3) distributed principal component analysis: the columns of A are given to the algorithm in a distributed fashion, meaning that a number of machines each hold a column submatrix of A. The goal is to compute a PCA of A while minimizing the communication between the machines and a central processor (MPI model).
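
The optimal (batch) solution described above is a few lines of numpy; this sketch just verifies the textbook fact that the factorization U_k, B = U_k^T A achieves exactly the tail-singular-value error.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 15))
k = 3

# Thin SVD; U_k holds the top-k left singular vectors of A.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]          # m x k, orthonormal columns
B = U_k.T @ A           # k x n

# The error of the optimal factorization equals the sum of the
# squared tail singular values (Eckart-Young).
pca_err = np.linalg.norm(A - U_k @ B, "fro") ** 2
tail = np.sum(s[k:] ** 2)
```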

K-means Clustering

In the k-means clustering problem we are given a set of points a_1, a_2, ..., a_n in R^d and the number of clusters k. The problem asks for a partition of the points into k disjoint sets (clusters) C_1, C_2, ..., C_k minimizing the function sum_{j=1}^{k} sum_{a_i in C_j} || a_i - mu_j(C_j) ||_2^2 over all possible partitions, where mu_j(C_j) = (1/|C_j|) sum_{a_i in C_j} a_i in R^d is the centroid of cluster C_j. My work on this problem has focused on designing provably accurate dimensionality reduction methods: feature selection and feature extraction methods. Such methods can be used to solve the k-means problem in two steps: 1) "project" the input points to some r-dimensional space (r << d); 2) use a standard k-means algorithm on the low-dimensional points to find the desired partition C_1, C_2, ..., C_k. A highlight of the results I have obtained for this problem is the following: given a point set a_1, a_2, ..., a_n in R^d, a number of clusters k, and epsilon > 0, there is a randomized algorithm that selects r = O(k log(k) / epsilon^2) features from the input points such that solving the k-means problem on the resulting r-dimensional points gives a (3+epsilon)-approximation to the optimal k-means partition. This was the first provably accurate feature selection method for k-means clustering.
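
The two-step pipeline can be sketched as follows. Uniform feature selection stands in for the provably accurate sampling of the stated result, and a short Lloyd's-algorithm loop stands in for a full k-means solver; the key point is that the partition found in the reduced space is evaluated against the objective in the original d-dimensional space.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy point set: n points in d dimensions with k planted clusters.
n, d, k = 90, 30, 3
centers = 5 * rng.standard_normal((k, d))
labels_true = rng.integers(k, size=n)
X = centers[labels_true] + rng.standard_normal((n, d))

def kmeans_cost(X, labels, k):
    # Sum over clusters of squared distances to the cluster centroid.
    return sum(np.sum((X[labels == j] - X[labels == j].mean(axis=0)) ** 2)
               for j in range(k) if np.any(labels == j))

# Step 1: "project" -- uniform feature selection as a stand-in for the
# provably accurate sampling distribution.
r = 10
feats = rng.choice(d, size=r, replace=False)
X_r = X[:, feats]

# Step 2: a few Lloyd iterations on the r-dimensional points.
cent = X_r[rng.choice(n, size=k, replace=False)]
for _ in range(20):
    dists = ((X_r[:, None, :] - cent[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    cent = np.array([X_r[labels == j].mean(axis=0) if np.any(labels == j)
                     else cent[j] for j in range(k)])

# The returned partition is scored in the ORIGINAL d-dimensional space.
cost = kmeans_cost(X, labels, k)
```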

Least-squares Problems

A linear algebraic view of a least-squares problem is: given A in R^{m times n} and b in R^m, find x in R^n to minimize || A x - b ||_2. In this generic form, this problem has found a number of applications in numerical linear algebra, machine learning and statistical data analysis. My work on least-squares problems/algorithms spans different topics:
1) fast approximation algorithms for constrained least-squares problems
2) sampling algorithms to find "coresets" in least-squares problems
3) algorithms to compute sparse solutions to least-squares problems
4) fast algorithms to solve regularized least-squares problems
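
For reference, the unconstrained and the (ridge-)regularized variants of the problem are both direct numpy one-liners; this sketch contrasts them. The regularization parameter lam is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 8))
b = rng.standard_normal(100)

# Ordinary least squares: minimize || A x - b ||_2.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Ridge-regularized least squares: minimize || A x - b ||_2^2 + lam || x ||_2^2,
# solved via the normal equations (A^T A + lam I) x = A^T b.
lam = 0.5
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(8), A.T @ b)
```

Two standard facts make good sanity checks: the unregularized solution has the smallest residual, and regularization can only shrink the solution norm.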

Canonical Correlation Analysis

The goal of Canonical Correlation Analysis (CCA) is to analyze the relation between a pair of datasets (each in matrix form, say A and B). From a statistical point of view, CCA finds the direction of maximal correlation between A and B. From a linear algebraic point of view, CCA measures the similarities between span(A) and span(B). From a geometric point of view, CCA computes the cosine of the principal angles between span(A) and span(B). In the paper below, we presented and analyzed a fast approximation algorithm for CCA. Our algorithm is based on dimensionality reduction via fast random projections, meaning that given two tall-and-skinny matrices A and B, our algorithm first multiplies the two matrices from the left with some random matrix to significantly reduce the number of rows in A and B, and then employs some standard deterministic CCA algorithm on the "small" matrices.
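
The geometric view translates directly into code: the canonical correlations are the singular values of Q_A^T Q_B, for orthonormal bases Q_A, Q_B. The sketch below uses a dense Gaussian projection as a stand-in for the fast (structured) random projection of the paper; the dimensions and the correlation level of the toy data are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
m, na, nb = 2000, 5, 4
A = rng.standard_normal((m, na))
# Make B correlated with A so the top canonical correlation is large.
B = A[:, :nb] + 0.5 * rng.standard_normal((m, nb))

def canonical_correlations(A, B):
    # Cosines of the principal angles between span(A) and span(B):
    # singular values of Q_A^T Q_B for orthonormal bases Q_A, Q_B.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

exact = canonical_correlations(A, B)

# Sketch step: multiply both matrices from the left by a random matrix,
# shrinking the number of rows m -> t, then run the deterministic method.
t = 200
S = rng.standard_normal((t, m)) / np.sqrt(t)
approx = canonical_correlations(S @ A, S @ B)
```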

Determinant Computation

The determinant of a matrix is one of the most important quantities associated with it and has countless applications in scientific computing. In the paper below, we designed, analyzed, and implemented a fast approximation algorithm for the logarithm of the determinant of an arbitrary Symmetric Positive Definite (SPD) matrix. To the best of our knowledge, our algorithm is the first that comes with strong worst-case theoretical bounds and works for all SPD matrices.
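
A minimal sketch of the general idea behind such randomized log-determinant estimators (not necessarily the exact algorithm of the paper): write logdet(A) = trace(log(A)), expand the matrix logarithm in a truncated power series after rescaling A so its eigenvalues lie in (0, 1], and estimate the trace with Hutchinson's method using Rademacher probe vectors.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy SPD matrix with eigenvalues bounded away from zero.
n = 20
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)

# logdet(A) = n*log(alpha) + trace(log(A/alpha)) for any alpha >= lambda_max(A),
# so that B = A/alpha has eigenvalues in (0, 1].  The matrix 1-norm (max column
# sum) upper-bounds the spectral norm of a symmetric matrix and is cheap.
alpha = np.linalg.norm(A, 1)
B = A / alpha

# trace(log(B)) = -trace(sum_{j>=1} (I - B)^j / j), with the trace estimated
# via Hutchinson's method: E[z^T M z] = trace(M) for Rademacher probes z.
num_probes, num_terms = 200, 200
total = 0.0
for _ in range(num_probes):
    z = rng.choice([-1.0, 1.0], size=n)
    v = z.copy()
    acc = 0.0
    for j in range(1, num_terms + 1):
        v = v - B @ v              # v = (I - B)^j z
        acc -= (z @ v) / j
    total += acc

logdet_approx = n * np.log(alpha) + total / num_probes
logdet_exact = np.linalg.slogdet(A)[1]
```

Only matrix-vector products with B are needed, which is what makes this family of estimators fast for large sparse SPD matrices.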

Spectral Clustering

Spectral clustering is the process of clustering a given set of points using the eigenvectors of the Laplacian matrix of the similarity graph corresponding to those points. My work on this topic has focused on fast approximation algorithms for spectral clustering. The first paper below presents a survey of such algorithms, including several numerical experiments. The more recent paper presents a rigorous theoretical analysis of the use of the so-called power method to speed up spectral clustering.
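
A sketch of the power-method idea on a toy graph, under the usual normalized-Laplacian setup: the bottom eigenvectors of the normalized Laplacian coincide with the top eigenvectors of the normalized adjacency matrix, which block power (subspace) iteration approximates using only matrix-vector products. The planted two-group graph and iteration counts are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy similarity graph: two planted groups with dense intra-group edges.
n = 40
W = np.zeros((n, n))
g = np.array([0] * 20 + [1] * 20)
for i in range(n):
    for j in range(i + 1, n):
        p = 0.9 if g[i] == g[j] else 0.05
        if rng.random() < p:
            W[i, j] = W[j, i] = 1.0

d = W.sum(axis=1)
# Normalized adjacency D^{-1/2} W D^{-1/2}; its top eigenvectors match the
# bottom eigenvectors of the normalized Laplacian I - D^{-1/2} W D^{-1/2}.
Dn = np.diag(1.0 / np.sqrt(d))
S = Dn @ W @ Dn

# Block power (subspace) iteration: repeated multiplication by S followed
# by orthonormalization approximates the dominant k-dimensional eigenspace.
k, iters = 2, 50
V = rng.standard_normal((n, k))
for _ in range(iters):
    V, _ = np.linalg.qr(S @ V)

# Full spectral clustering would now run k-means on the rows of V.
```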

Support Vector Machines

A support vector machine (SVM) is a classification algorithm for high-dimensional data. I have worked on the topic of dimensionality reduction for SVM classification. The paper below provides a provably accurate dimensionality reduction algorithm for SVMs based on random projections. There are also similar results on support vector regression.
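
The core phenomenon such results rely on can be illustrated directly: a scaled Gaussian random projection approximately preserves inner products, and hence the margins y_i (x_i . w) that drive SVM classification. The toy data below has a planted separating direction with a large margin; all sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy linearly separable data: labels y, separating direction w, margin ~5.
d, n = 1000, 100
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
y = rng.choice([-1.0, 1.0], size=n)
X = rng.standard_normal((n, d)) + 5 * np.outer(y, w)

# Gaussian random projection to r dimensions, scaled so inner products
# (and hence margins) are preserved in expectation.
r = 200
R = rng.standard_normal((r, d)) / np.sqrt(r)
Xr = X @ R.T
wr = R @ w

# Margins before and after projection.
marg = y * (X @ w)
marg_r = y * (Xr @ wr)
```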

Nonnegative Matrix Factorization

In Nonnegative Matrix Factorization (NMF), we are given an m times n matrix A with nonnegative entries and an integer k < rank(A); the problem asks to find an m times k matrix W with nonnegative entries and a k times n matrix H with nonnegative entries such that the error || A - W H ||_F is as small as possible. During my undergraduate studies, I developed an algorithm to find matrices W and H that are "good" starting points for the iterative algorithms used to solve the NMF problem.
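
To show where the starting points enter, here is a minimal sketch of the classic Lee-Seung multiplicative updates for the Frobenius objective. Plain random nonnegative initialization is used below; a structured initialization (the subject of the work above) serves exactly as a better choice for the initial W and H.

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy nonnegative data with planted rank-k structure.
m, n, k = 30, 25, 4
W_true = rng.random((m, k))
H_true = rng.random((k, n))
A = W_true @ H_true

# Random nonnegative starting points; this is the slot a good
# initialization scheme fills.
W = rng.random((m, k)) + 0.1
H = rng.random((k, n)) + 0.1

err0 = np.linalg.norm(A - W @ H, "fro")

# Lee-Seung multiplicative updates; each step keeps W, H nonnegative
# and does not increase || A - W H ||_F.
eps = 1e-12
for _ in range(200):
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(A - W @ H, "fro")
```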