Matrix Completion with Selective Sampling

Deanna Needell was partially supported by NSF CAREER DMS #1348721 and NSF BIGDATA DMS #1740325.
Matrix completion is a classical problem in data science wherein one attempts to reconstruct a low-rank matrix while only observing some subset of the entries. Previous authors have phrased this problem as a nuclear norm minimization problem. Almost all previous work assumes no explicit structure of the matrix and uses uniform sampling to decide the observed entries. We suggest methods for selective sampling in the case where we have some knowledge about the structure of the matrix and are allowed to design the observation set.
Although large-scale data is easily acquired and accessible, it is often highly incomplete. For example, data is often missing in surveys in which participants only answer a subset of questions, or sensor systems in which malfunctions or power/memory restrictions are common. Even more familiar may be the collaborative filtering problem—a problem of keen interest for companies, such as Netflix or Amazon—in which systems are tasked with recommending a subset of the vast catalogue of products to users based on sparse user histories.
Mathematically, this is formulated as a matrix completion problem. The goal is to reconstruct a large, low-rank matrix having observed only a few entries. Let $M$ be a real-valued matrix and $\Omega$ be a set of observed entries. That is, we assume that we only know the entry $M_{ij}$ when the pair $(i,j)$ is in $\Omega$. From this incomplete data, we would like to reconstruct the matrix $M$. If the matrix is known to be inherently low rank, it may seem wise to look for the lowest rank representation of the observed data. That is, one may want to solve the problem

$$\min_{X} \ \operatorname{rank}(X) \quad \text{subject to} \quad X_{ij} = M_{ij} \ \text{ for all } (i,j) \in \Omega. \tag{1}$$

Rank minimization is NP-hard in general, so it is commonly replaced by the convex relaxation

$$\min_{X} \ \|X\|_{*} \quad \text{subject to} \quad X_{ij} = M_{ij} \ \text{ for all } (i,j) \in \Omega, \tag{2}$$

where $\|X\|_{*}$ is the nuclear norm of $X$: the sum of the singular values of $X$. In doing so, the problem is re-phrased as an $\ell_1$-minimization problem using the singular values of $X$. Since $\ell_1$-minimization lends itself to sparse solutions, solving this problem results in a low rank approximation to $M$. Several authors have proven that if the observation set $\Omega$, which is typically generated uniformly at random, is large enough, then (2) leads to exact reconstruction with high probability [1, 2, 3].
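The link between the nuclear norm and the singular values can be checked numerically. The sketch below is illustrative only: the helper names and the small random test matrix are assumptions, not part of the original experiments. It computes the nuclear norm as the sum of the singular values and the rank as the number of singular values above a tolerance, which is the $\ell_1$ versus $\ell_0$ analogy described above.

```python
import numpy as np

def nuclear_norm(X):
    """Sum of the singular values of X (an l1-type quantity)."""
    singular_values = np.linalg.svd(X, compute_uv=False)
    return float(singular_values.sum())

def numerical_rank(X, tol=1e-10):
    """Number of singular values above tol (an l0-type count)."""
    singular_values = np.linalg.svd(X, compute_uv=False)
    return int((singular_values > tol).sum())

rng = np.random.default_rng(0)
# Hypothetical 6 x 5 test matrix of rank 2, built as a product of thin factors.
M = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

print(numerical_rank(M))
# The hand-rolled value should agree with NumPy's built-in nuclear norm.
print(np.isclose(nuclear_norm(M), np.linalg.norm(M, ord="nuc")))
```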
Recently, Molitor and Needell [6] adapted the ordinary nuclear norm minimization method to account for structure in the observed and unobserved entries, but most current methods for matrix completion assume little about the structure of the matrix and take the observed entries from a uniform random distribution. We propose a situation where the entries need not be observed at random, but can be chosen to account for the relationships between the columns. In application, this could be thought of as designing a survey where important questions are listed first, so that even if a user does not complete the entire survey, their answers to these questions can be used to intuit their answers to other related questions.
II Selective Sampling Strategies
Consider a scenario where is assumed to have some special structure, and need not be drawn uniformly at random from , but can be designed. Specifically, let for denote the columns of . For a set of size , we define to be the matrix whose columns are for . Assume for a particular set and that the corresponding matrix has some known structure.
As a first idea, we could assume that we know the correlation matrix for . However, since the map is non-convex, this information is difficult to incorporate into a tractable minimization problem. Instead, if we assume that the pairwise correlations between the columns of are near , then there is a strong possibility that is very low rank. Accordingly, rather than assume we have information about , we assume that we know . This assumption is slightly stronger than assuming that the columns of are well correlated. With this assumption, we can find a basis for the column space of along with the coordinates of the columns in this basis. Once we have identified these, we can use them as an additional constraint. Thus we propose the minimization problem:
where . It remains to design a strategy for sampling entries of so that we can recover the basis and the coordinates of the columns in this basis.
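For orientation, one plausible way to write a constrained problem of this kind is sketched below. The notation is an assumption rather than a quotation of the authors' formula: $X_S$ stands for the columns of the unknown $X$ that belong to the structured block, $B$ for a basis of that block's column space, and $C$ for the coordinates of its columns in that basis.

```latex
% Assumed notation: X_S = structured columns of X, B = basis matrix,
% C = coordinate matrix, Omega = set of observed index pairs.
\begin{equation*}
\min_{X} \ \|X\|_{*}
\quad \text{subject to} \quad
X_{ij} = M_{ij} \ \text{ for all } (i,j) \in \Omega,
\qquad X_S = B\,C .
\end{equation*}
```

Under this reading, the extra equality constraint fixes the structured columns completely once $B$ and $C$ are known, so the nuclear norm objective only has to account for the remaining columns.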
II-A Optimal Sampling
Assuming that , we consider the problem of explicitly determining the relationship between the columns of while using as few observations as possible. That is, our goal is to find a collection of columns of (we will call this collection ) and a matrix such that
That is, the matrix in (3) will consist of columns of .
The question is how to find and while observing as little of as possible. Notice, it suffices to extract an invertible submatrix from . The columns corresponding to this submatrix will define , whence we can solve for all the coefficients in with only observations. This suggests the algorithm:
1. Randomly sample and .
2. If the matrix is invertible, then
   (a) Sample the remaining entries of the rows corresponding to
   (b) Solve for using (4)
3. If you reach this step, save the already observed entries and return to step 1.
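The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `observe(i, j)` is a hypothetical oracle that reveals one entry of the structured block, the block is assumed to have exactly rank `r`, and invertibility is tested numerically with `np.linalg.matrix_rank`. The function also counts how many distinct entries were observed.

```python
import numpy as np

def optimal_sampling(observe, n_rows, n_cols, r, rng, max_tries=100):
    """Reconstruct a rank-r block from selectively observed entries.

    `observe(i, j)` is a hypothetical oracle returning entry (i, j).
    Returns the reconstruction and the number of distinct entries observed.
    """
    seen = {}  # cache of observed entries, so no entry is counted twice

    def get(i, j):
        if (i, j) not in seen:
            seen[(i, j)] = observe(i, j)
        return seen[(i, j)]

    for _ in range(max_tries):
        # Step 1: sample r row indices and r column indices.
        rows = rng.choice(n_rows, size=r, replace=False).tolist()
        cols = rng.choice(n_cols, size=r, replace=False).tolist()
        A = np.array([[get(i, j) for j in cols] for i in rows])
        # Step 2: continue only if the r x r submatrix is numerically invertible.
        if np.linalg.matrix_rank(A) < r:
            continue  # Step 3: keep what was observed and resample.
        # Step 2(a): observe the remaining entries of the sampled rows.
        R = np.array([[get(i, j) for j in range(n_cols)] for i in rows])
        # Step 2(b): solve A @ C = R for the coordinates C of every column.
        C = np.linalg.solve(A, R)
        # Observe the remaining entries of the r basis columns.
        B = np.array([[get(i, j) for j in cols] for i in range(n_rows)])
        return B @ C, len(seen)
    raise RuntimeError("no invertible submatrix found")

rng = np.random.default_rng(1)
# Hypothetical 8 x 5 block of rank 2 with continuous random entries.
block = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 5))
recovered, n_observed = optimal_sampling(lambda i, j: block[i, j], 8, 5, 2, rng)
print(np.allclose(recovered, block), n_observed)
```

If the first sampled submatrix is invertible, the number of observed entries is `r * (n_rows + n_cols - r)`; every failed attempt adds wasted observations on top of that.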
If is densely defined with entries coming from a continuous probability distribution, then a random submatrix will almost surely be invertible, and the loop will terminate after one step (this may not be realistic with discrete data, which could result in wasting observations while looking for an invertible submatrix). Counting the observed entries, step 2a will require observations. Determining the basis coordinates requires an additional observations in step 2b. Then we simply need the remaining elements of the columns of to perfectly reconstruct this portion of the matrix—this requires observations. Thus we will have observed total entries; this number of observations is necessary and sufficient for perfect reconstruction of , which is why we refer to this as optimal sampling. After having used these observations, we assume that the remaining observations are taken uniformly at random from . Since we are not assuming that has any special structure, we do not expect that there would be any advantage to selectively sampling the entries. Note, the optimization problem (3) can actually be ‘de-coupled’ at this point: simply setting and performing nuclear norm minimization only on which will simplify the computations.
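The observation count in this paragraph can be worked out explicitly. The symbols below are assumptions introduced for illustration: $M_S$ is the structured block with $n$ rows, $s$ columns and rank $r$, while $I$ and $J$ are the sampled row and column index sets, each of size $r$.

```latex
% Assumed notation: M_S is n x s with rank r; |I| = |J| = r;
% B holds the r basis columns and C their coordinate matrix.
\begin{align*}
M_S &= B\,C, \qquad B = M_S(:,J) \in \mathbb{R}^{n \times r},
      \quad C \in \mathbb{R}^{r \times s}, \\
M_S(I,:) &= M_S(I,J)\,C
      \quad\Longrightarrow\quad C = M_S(I,J)^{-1}\, M_S(I,:), \\
\underbrace{r^2}_{\text{submatrix } M_S(I,J)}
  + \underbrace{r\,(s-r)}_{\text{rest of rows } I}
  + \underbrace{r\,(n-r)}_{\text{rest of columns } J}
  &= r\,(n+s-r).
\end{align*}
```

The total $r(n+s-r)$ equals the number of degrees of freedom of an $n \times s$ matrix of rank $r$, which is consistent with the claim that this many observations are both necessary and sufficient.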
There are two potential ways in which we can gain accuracy using this strategy: we may gain accuracy by perfectly reconstructing , and we may gain accuracy by using fewer observations while reconstructing , thus saving additional observations for the rest of the matrix. However, in application, it may not be realistic to sample entire rows or columns of the matrix.
II-B Finding Basis Coordinates from Random Sampling
Even if is constructed uniformly at random, there will likely be some invertible submatrices within , which can be used to intuit some relationships between the columns of without sampling full rows or columns, which may be unrealistic in practice. If we cannot sample full rows or columns, we could still attempt to find a set of basis matrices , each having the same column space as , and the coordinates of a particular column in the basis , so that . This suggests the algorithm:
1. Set . Repeat steps 2-4 until the desired number of basis matrices and basis coordinates are found.
2. Randomly sample and .
3. If the matrix is invertible, then
   (a) Choose , and add into .
   (b) Sample the entries in column from each of the rows corresponding to .
   (c) Solve for using .
   (d) Save and to use as a constraint.
4. When you reach this step, save the already observed entries and return to step 2.
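A minimal NumPy sketch of this loop is given below. It is an illustration under assumptions rather than the authors' code: `observe(i, j)` is a hypothetical entry oracle for the structured block, the block is assumed to have exactly rank `r` with more than `r` columns, and each saved relation records the sampled rows, the basis columns, the target column and its coordinates.

```python
import numpy as np

def find_column_relations(observe, n_rows, n_cols, r, n_relations, rng,
                          max_tries=1000):
    """Collect relations of the form column k = (basis columns) @ coords.

    `observe(i, j)` is a hypothetical oracle returning entry (i, j).
    Only r * r + r entries are observed per successful attempt.
    """
    relations = []
    for _ in range(max_tries):
        if len(relations) == n_relations:
            break
        # Step 2: sample r rows and r columns of the block.
        rows = rng.choice(n_rows, size=r, replace=False).tolist()
        cols = rng.choice(n_cols, size=r, replace=False).tolist()
        A = np.array([[observe(i, j) for j in cols] for i in rows])
        # Step 3: use the submatrix only if it is numerically invertible.
        if np.linalg.matrix_rank(A) < r:
            continue  # Step 4: resample.
        # Step 3(a): pick a target column outside the basis columns.
        others = [j for j in range(n_cols) if j not in cols]
        k = int(rng.choice(others))
        # Step 3(b): observe column k on the sampled rows only.
        b = np.array([observe(i, k) for i in rows])
        # Step 3(c): coordinates of column k in the basis given by `cols`.
        coords = np.linalg.solve(A, b)
        # Step 3(d): save the relation for use as a constraint.
        relations.append((rows, cols, k, coords))
    return relations

rng = np.random.default_rng(2)
# Hypothetical 8 x 5 block of rank 2.
block = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 5))
found = find_column_relations(lambda i, j: block[i, j], 8, 5, 2, 3, rng)
for rows, cols, k, coords in found:
    # Each relation holds on every row, although only r rows were sampled.
    print(k, np.allclose(block[:, k], block[:, cols] @ coords))
```

Because the target column is drawn at random, different attempts can return the same relation twice, which matches the remark that this approach may discover redundant relationships.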
Having done this, we will have uncovered several relationships , and we can solve the minimization problem
As we have designed it here, we are still selectively sampling the matrix, so we refer to this as selective sampling. However, if the observations were made uniformly at random, we could search the observed entries of for invertible submatrices, and perform the same steps. This formulation will not be as effective as optimal sampling, since it uses more observations and it can discover redundant relationships between the columns, but it may be more realistic in practice.
Note that in the selective sampling algorithm, we do not know the full matrix at each step. However, we do know the indices which were used to construct . Accordingly, in the sub-step of step 3 that saves the constraint data, we save all of the entries of that we know; these are the entries of which are observed. Likewise, the constraint in (5) should actually read for . We are enforcing that each of these specific relationships between the columns of must hold.
We implemented the ordinary matrix completion with uniform sampling, as well as the optimal sampling method and the selective sampling method. We tested these methods on matrices where is simply the first columns of the matrix and has rank . We tested the methods across several different values of , rank and observation rate .
In Figure 2, we see the results of the nuclear norm minimization with uniform sampling, optimal sampling or selective sampling. Here is as described in Figure 1 and the observation rate is , meaning that of entries are observed. The relative error is measured in the operator norm. We report the average error over trials where the observation indices are chosen independently in each trial. The optimal sampling strategy led to an average accuracy gain of nearly and the selective sampling strategy led to an average accuracy gain of roughly .
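The error metric described here can be computed as in the sketch below. The function names and the small diagonal example are assumptions made for illustration; the operator norm is taken to be the spectral norm, that is, the largest singular value.

```python
import numpy as np

def relative_operator_error(M, X):
    """Relative error ||M - X|| / ||M|| in the operator (spectral) norm."""
    return float(np.linalg.norm(M - X, ord=2) / np.linalg.norm(M, ord=2))

def average_error(errors):
    """Mean relative error over independent trials."""
    return float(np.mean(errors))

# Hypothetical example: a true matrix and two candidate reconstructions.
M = np.diag([3.0, 4.0])
X1 = np.diag([3.0, 2.0])
X2 = np.diag([3.0, 4.0])
trial_errors = [relative_operator_error(M, X1), relative_operator_error(M, X2)]
print(trial_errors, average_error(trial_errors))
```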
Next, we explored how the reconstruction errors compare when different parameters are adjusted. Recall, the optimal sampling method requires observations to perfectly reconstruct . We should observe accuracy gains proportional to how much smaller this number is than the expected number of observations from using uniform sampling (which is ). Thus treating , the size of the matrix, as fixed, we should see the largest accuracy gains when is large, is large, or is small. First, fixing and again working with a matrix , we computed the gain in reconstruction accuracy when and . The results are displayed in Figure 3. This figure aligns fairly well with our expectations. Here has rank 4 in each case.
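The comparison of observation counts can be made concrete with a short calculation. The symbols are assumptions introduced here: the structured block has `n` rows, `s` columns and rank `r`, and uniform sampling observes each entry with probability `p`, so the expected number of uniform observations inside the block is taken to be `p * n * s`.

```python
def optimal_block_budget(n, s, r):
    """Observations needed to recover an n x s block of rank r exactly."""
    return r * (n + s - r)

def expected_uniform_block_observations(n, s, p):
    """Expected number of block entries seen under uniform sampling at rate p."""
    return p * n * s

def observation_savings(n, s, r, p):
    """Observations freed up for the rest of the matrix (may be negative)."""
    return expected_uniform_block_observations(n, s, p) - optimal_block_budget(n, s, r)

# Hypothetical sizes, chosen only to illustrate the trends.
base = observation_savings(n=100, s=20, r=4, p=0.5)
wider_block = observation_savings(n=100, s=40, r=4, p=0.5)
lower_rank = observation_savings(n=100, s=20, r=2, p=0.5)
higher_rate = observation_savings(n=100, s=20, r=4, p=0.8)
print(base, wider_block, lower_rank, higher_rate)
```

Under these assumptions the savings grow with `s` whenever `p * n > r`, grow with `p`, and shrink as `r` increases (for `r` below `(n + s) / 2`), which is the trend described in the text.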
Finally, we fix (the size of ) and vary the observation rate and the rank of . The results are shown in Figure 4. In these simulations, is a matrix and so that comprises the first of the columns. The observation rate is allowed to vary from to , though in the optimal sampling case, the results are not meaningful until the total number of observations is larger than the amount needed to construct and , which is . Again, this figure aligns with our intuition. For larger , optimal sampling requires a larger observation rate in order to see accuracy gains over uniform sampling.
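The threshold below which optimal sampling cannot yet be carried out can be estimated as follows. This is a sketch under assumed notation: the full matrix has `n` rows and `m` columns, the structured block consists of `s` of those columns with rank `r`, and the observation rate is the total number of observations divided by `n * m`.

```python
def minimum_observation_rate(n, m, s, r):
    """Smallest overall rate that covers the r * (n + s - r) entries
    needed to determine the basis and coordinates of the block."""
    return r * (n + s - r) / (n * m)

# Hypothetical sizes: a 100 x 100 matrix whose first 25 columns are structured.
for rank in (2, 4, 8):
    print(rank, minimum_observation_rate(n=100, m=100, s=25, r=rank))
```

With these illustrative sizes the threshold rises with the rank, in line with the observation that a larger rank requires a larger observation rate before optimal sampling can improve on uniform sampling.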
The matrix completion problem is at the forefront of big data analysis. In application, there are often intuitive correlations between columns of the incomplete matrix: the answers to questions on a medical survey may be predictive of each other, or viewers may have similar opinions regarding movies in a given genre. Most of the previous work on this problem has focused on the general case, neglecting to consider any structure within the matrix. Building on this work, we have suggested two methods for the matrix completion problem under the assumption that some portion of the matrix is known to be very low rank and we are allowed to design the observation set. The first method, which we termed optimal sampling, attempts to perfectly represent the structured portion of the matrix using the minimum number of observations. In certain scenarios, this sampling strategy led to large gains in accuracy, but it may be unrealistic in practice. Accordingly, we described a second method, selective sampling, which forsakes perfect reconstruction of the structured portion of the matrix while still uncovering some of the structure. This method, too, led to accuracy gains in certain regimes.
[1] Candès, E. and Y. Plan. 2010. “Matrix Completion With Noise.” Proceedings of the IEEE 98 (6): 925-36.
[2] Candès, E. and B. Recht. 2009. “Exact Matrix Completion via Convex Optimization.” Foundations of Computational Mathematics 9 (6): 717-72. https://doi.org/10.1007/s10208-009-9045-5.
[3] Candès, E. and T. Tao. 2010. “The Power of Convex Relaxation: Near-Optimal Matrix Completion.” IEEE Transactions on Information Theory 56 (5): 2053-80. https://doi.org/10.1109/TIT.2010.2044061.
[4] Grant, M. and S. Boyd. 2013. “CVX: Matlab Software for Disciplined Convex Programming, Version 2.0.” September 2013. http://cvxr.com/cvx.
[5] Grant, M. and S. Boyd. 2008. “Graph Implementations for Nonsmooth Convex Programs.” In Recent Advances in Learning and Control (a tribute to M. Vidyasagar), edited by V. Blondel, S. Boyd, and H. Kimura, 95-110. Lecture Notes in Control and Information Sciences. Springer.
[6] Molitor, D., and D. Needell. 2018. “Matrix Completion for Structured Observations.” http://arxiv.org/abs/1801.09657.