Deterministic Completion of Rectangular Matrices Using Ramanujan Bigraphs – I: Error Bounds and Exact Recovery

Shantanu Prasad Burnwal and Mathukumalli Vidyasagar

The authors are with the Indian Institute of Technology Hyderabad, Kandi, Telangana 502285, India. Emails: ee16resch11019@iith.ac.in, m.vidyasagar@iith.ac.in. This research was supported by the Department of Science and Technology, and the Science and Engineering Research Board, Government of India.
Abstract

In this paper we study the matrix completion problem: Suppose $X \in \mathbb{R}^{n_r \times n_c}$ is unknown except for an upper bound $r$ on its rank. By measuring a small number of the elements of $X$, is it possible to recover $X$ exactly, or at least, to construct a reasonable approximation of $X$? At present there are two approaches to choosing the sample set, namely probabilistic and deterministic. Probabilistic methods can guarantee the exact recovery of the unknown matrix, but only with high probability. At present there are very few deterministic methods, and they mostly apply only to square matrices. The focus in the present paper is on deterministic methods that work for rectangular as well as square matrices, and where possible, can guarantee exact recovery of the unknown matrix. We achieve this by choosing the elements to be sampled as the edge set of an asymmetric Ramanujan graph or Ramanujan bigraph. For such a measurement matrix, we (i) derive bounds on the error between a scaled version of the sampled matrix and the unknown matrix; (ii) derive bounds on the recovery error when max norm minimization is used; and (iii) present suitable conditions under which the unknown matrix can be recovered exactly via nuclear norm minimization. In the process we streamline some existing proofs and improve upon them, and also make the results applicable to rectangular matrices.

This raises two questions: (i) How can Ramanujan bigraphs be constructed? (ii) How close are the sufficient conditions derived in this paper to being necessary? Both questions are studied in a companion paper.

1 Introduction

1.1 General Statement

Compressed sensing refers to the recovery of high-dimensional but low-complexity objects from a small number of linear measurements. Recovery of sparse (or nearly sparse) vectors, and recovery of high-dimensional but low-rank matrices are the two most popular applications of compressed sensing. The object of study in the present paper is the matrix completion problem, which is a special case of low-rank matrix recovery. The matrix completion problem has been getting a lot of attention because of its application to different areas such as image processing, sketching, quantum tomography, and recommendation systems (e.g., the Netflix problem). An excellent survey of the matrix completion problem can be found in [1].

1.2 Problem Definition

The matrix completion problem can be stated formally as follows: Suppose $X \in \mathbb{R}^{n_r \times n_c}$ is an unknown matrix that we wish to recover, whose rank is bounded by a known integer $r$. Let $[n]$ denote the set $\{1, \ldots, n\}$ for each integer $n$. In the matrix completion problem, a set $\Omega \subseteq [n_r] \times [n_c]$ is specified, known as the sample set or measurement set. To be specific, suppose $|\Omega| = m$, where $m$ is the total number of samples. We are able to measure the values $X_{ij}$ for all $(i, j) \in \Omega$. Equivalently, the set of measurements can be expressed as the Hadamard product $X \circ E_\Omega$ (recall that the Hadamard product $A \circ B$ of two matrices of equal dimensions is defined by $(A \circ B)_{ij} = A_{ij} B_{ij}$ for all $(i, j)$), where $E_\Omega \in \{0, 1\}^{n_r \times n_c}$ is defined by

$(E_\Omega)_{ij} = 1$ if $(i, j) \in \Omega$, and $(E_\Omega)_{ij} = 0$ otherwise.

From these measurements, and the information that $\operatorname{rank}(X) \leq r$, we aim to recover $X$ completely, or at least to construct a good approximation of $X$.

One possible approach to the matrix completion problem is to set

$\hat{X} := \arg\min_{Z \in \mathbb{R}^{n_r \times n_c}} \operatorname{rank}(Z)$ subject to $Z \circ E_\Omega = X \circ E_\Omega$. (1)

The above problem is a special case of minimizing the rank of an unknown matrix subject to linear constraints, and is therefore NP-hard [2]. Since the problem is NP-hard, a logical approach is to replace the rank function by its convex relaxation, which is the nuclear norm, or the sum of the singular values of a matrix, as shown in [3]. Therefore the convex relaxation of (1) is

$\hat{X} := \arg\min_{Z \in \mathbb{R}^{n_r \times n_c}} \|Z\|_*$ subject to $Z \circ E_\Omega = X \circ E_\Omega$, (2)

where $\|Z\|_*$ denotes the nuclear norm of $Z$.

It is known that, when the elements of $\Omega$ are selected at random, the unique solution to (2) is the true but unknown matrix $X$, with high probability. Such results are reviewed in Section 2.
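As a concrete illustration of (2), the following sketch (not the paper's method) sets up nuclear norm minimization subject to the sampling constraint in an off-the-shelf convex solver. The dimensions, the rank, and the randomly generated sampling mask are all illustrative assumptions; the paper itself advocates a deterministic, biregular choice of the mask.

```python
# Minimal sketch (not the paper's method): nuclear norm minimization (2) via cvxpy.
# Dimensions, rank, and the random sampling mask are illustrative assumptions.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n_r, n_c, r = 20, 15, 2
X = rng.standard_normal((n_r, r)) @ rng.standard_normal((r, n_c))   # unknown low-rank matrix

# Sampling mask E_Omega as a 0/1 matrix (chosen at random here purely for illustration;
# the paper advocates a deterministic biregular choice).
E = (rng.random((n_r, n_c)) < 0.6).astype(float)

Z = cp.Variable((n_r, n_c))
constraints = [cp.multiply(E, Z) == E * X]           # agree with X on the sampled entries
problem = cp.Problem(cp.Minimize(cp.normNuc(Z)), constraints)
problem.solve()

print("relative error:", np.linalg.norm(Z.value - X, "fro") / np.linalg.norm(X, "fro"))
```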

Another emerging trend is to use the so-called “max-norm” introduced in [4]. To define this norm, we begin by recalling that, if $A \in \mathbb{R}^{n \times k}$, then an induced matrix norm is given by

$\|A\|_{2 \to \infty} := \max_{1 \le i \le n} \|A^i\|_2,$

where $A^i$ denotes the $i$-th row of the matrix $A$. The max-norm of a matrix $X$ is defined as

$\|X\|_{\max} := \min_{X = U V^\top} \|U\|_{2 \to \infty} \|V\|_{2 \to \infty}.$ (3)

With this definition, an alternate approach to matrix completion is

$\hat{X} := \arg\min_{Z \in \mathbb{R}^{n_r \times n_c}} \|Z\|_{\max}$ subject to $Z \circ E_\Omega = X \circ E_\Omega$. (4)
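The sketch below (again not from the paper) illustrates definition (3) numerically: any particular factorization $X = U V^\top$ yields an upper bound on $\|X\|_{\max}$, and the balanced SVD factorization used here is one convenient, though generally suboptimal, choice. Solving the minimization (4) in practice requires a semidefinite reformulation of the max norm, which is not shown.

```python
# Minimal sketch (not from the paper): the max-norm of definition (3) is a minimum
# over all factorizations X = U V^T of the product of the largest row norms of U
# and V.  Any particular factorization therefore gives an upper bound; the balanced
# SVD factorization below is an illustrative (not optimal) choice.
import numpy as np

def two_to_inf_norm(A):
    """Induced 2 -> infinity norm: the largest Euclidean row norm of A."""
    return np.max(np.linalg.norm(A, axis=1))

def max_norm_upper_bound(X):
    """Upper bound on ||X||_max from the factorization X = (U sqrt(S)) (V sqrt(S))^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    L = U * np.sqrt(s)          # shape (n_r, k)
    R = Vt.T * np.sqrt(s)       # shape (n_c, k), so that X = L @ R.T
    return two_to_inf_norm(L) * two_to_inf_norm(R)

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # a rank-2 matrix
print("upper bound on ||X||_max:", max_norm_upper_bound(X))
```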

1.3 Contributions of the Present Paper

In the literature to date, most of the papers assume that the sample set $\Omega$ is chosen at random from $[n_r] \times [n_c]$, either without replacement as in [5], or with replacement [6]. The authors are aware of only two papers [7, 8] in which a deterministic procedure is suggested for choosing the sample set $\Omega$, namely as the edge set of a Ramanujan graph. (This concept is defined below.)

In case $\Omega$ is chosen at random, it makes little difference whether the unknown matrix is square or rectangular. However, if $\Omega$ is to be chosen in a deterministic fashion, then the approach suggested in [7, 8] requires that the unknown matrix be square. (Though the paper [7] uses the notation $n_r \times n_c$, in the theorems it is assumed that $n_r = n_c$.) The reason for this is that, while it is possible to define the notion of a Ramanujan bigraph (which would be required in the case of rectangular matrices), until now there is not a single explicit construction of such a graph, only some abstract formulas that are not explicitly computable [9, 10]. One of the main contributions of a companion paper is to present an infinite family of Ramanujan bigraphs; this is the first such explicit construction. In the present paper, we prove bounds on how close the solution of (2) is to the true but unknown matrix. These bounds are an improvement on the available bounds in two different ways. First, these bounds are applicable to rectangular matrices, while existing deterministic methods do not apply to this case. Second, even in the case of square matrices, our bounds improve currently available bounds. These improvements are achieved through modifying the so-called “expander mixing lemma” for bipartite graphs, a result that is possibly of independent interest. Finally, we derive sufficient conditions under which the unique solution of (2) is the true but unknown matrix.

2 Literature Review

In [5], the authors point out that the formulations (1) or (2) do not always recover an unknown matrix. They illustrate this by taking $X$ to be the matrix with a $1$ in a single position and zeros elsewhere. In this case, unless that position belongs to $\Omega$, the solution to both (1) and (2) is the zero matrix, which does not equal $X$. The difficulty in this case is that the matrix $X$ has high “coherence,” as defined next.

Definition 1.

Suppose $X \in \mathbb{R}^{n_r \times n_c}$ has rank $r$ and the reduced singular value decomposition $X = U \Sigma V^\top$, where $U \in \mathbb{R}^{n_r \times r}$, $V \in \mathbb{R}^{n_c \times r}$, and $\Sigma \in \mathbb{R}^{r \times r}$ is the diagonal matrix of the nonzero singular values of $X$. Let $P_U = U U^\top$ denote the orthogonal projection of $\mathbb{R}^{n_r}$ onto the column span of $U$. Finally, let $e_i$ denote the $i$-th canonical basis vector. Then we define

$\mu(U) := \frac{n_r}{r} \max_{i \in [n_r]} \|P_U e_i\|_2^2 = \frac{n_r}{r} \max_{i \in [n_r]} \|u^i\|_2^2,$ (5)

where $u^i$ is the $i$-th row of $U$. The quantity $\mu(V)$ is defined analogously, and

$\mu_0 := \max\{\mu(U), \mu(V)\}.$ (6)

Next, define

(7)

It is shown in [5] that $1 \leq \mu(U) \leq n_r/r$. The upper bound is achieved if some canonical basis vector is a column of $U$. (This is what happens with the matrix with all but one element equal to zero.) The lower bound is achieved if every element of $U$ has the same magnitude $1/\sqrt{n_r}$, that is, if $U$ is a submatrix of a (normalized) Walsh-Hadamard matrix.
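The coherence of Definition 1 is straightforward to compute numerically. The sketch below (an illustration, not part of the paper) evaluates $\mu(U)$ and $\mu(V)$ for the two extreme examples just discussed: the single-spike matrix attains the upper bound $n/r$, while a Walsh-Hadamard matrix attains the lower bound $1$.

```python
# Minimal sketch (not from the paper): the coherences mu(U), mu(V) of Definition 1,
# computed from the reduced SVD.  Function and variable names are illustrative.
import numpy as np

def coherences(X, tol=1e-10):
    """Return (mu_U, mu_V), the scaled largest squared row norms of U and V."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))                 # numerical rank
    U, V = U[:, :r], Vt[:r, :].T
    n_r, n_c = X.shape
    mu_U = (n_r / r) * np.max(np.sum(U ** 2, axis=1))
    mu_V = (n_c / r) * np.max(np.sum(V ** 2, axis=1))
    return mu_U, mu_V

# The "single nonzero entry" matrix discussed above: maximal coherence n/r = 8.
X_spike = np.zeros((8, 8)); X_spike[0, 0] = 1.0
print(coherences(X_spike))

# A Walsh-Hadamard matrix: every entry has the same magnitude, so the coherence is 1.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
print(coherences(np.kron(H2, H2)))
```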

To facilitate the statement of some known results in matrix completion, we reproduce from the literature two standard coherence assumptions on the unknown matrix $X$.

  1. There are known upper bounds on $\mu(U)$ and $\mu(V)$ respectively.

  2. There is a constant $\mu_1$ such that

    (8)
    (9)

    where is shorthand for .

Assumption (A2) can be interpreted as follows: The relationship can be expressed as

Therefore, if is sufficiently large, it can be expected that

would be small.

2.1 Probabilistic Sampling

There are two approaches to choosing the sample set $\Omega$, namely probabilistic and deterministic. In the probabilistic approach the elements of $\Omega$ are chosen at random from $[n_r] \times [n_c]$. In this setting one can further distinguish between two distinct situations, namely sampling from $[n_r] \times [n_c]$ with replacement or without replacement. If one were to sample $m$ out of the $n_r n_c$ elements of the unknown matrix without replacement, then one is guaranteed that exactly $m$ distinct elements of $X$ are measured. However, the disadvantage is that the locations of the samples are not independent, which makes the analysis quite complex. This is the approach adopted in [5].

Theorem 1.

(See [5, Theorem 1.1].) Draw

(10)

samples from $[n_r] \times [n_c]$ without replacement. Then with probability at least

(11)

the true matrix $X$ is the unique solution of (2). Here the remaining unspecified quantities are universal constants.

An alternative is to sample the elements of $X$ with replacement. In this case the locations of the samples are indeed independent. However, the price to be paid is that, with some small probability, there would be duplicate samples, so that after $m$ random draws, the number of elements of $X$ that are measured could be smaller than $m$. This is the approach adopted in [6]. On balance, the approach of sampling with replacement is easier to analyze.
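The distinction between the two sampling schemes can be seen in a few lines of code. The sketch below is illustrative only (the sizes are arbitrary assumptions): without replacement the number of distinct sampled positions is exactly $m$, while with replacement it is typically slightly smaller.

```python
# Minimal sketch (not from the paper): drawing a sample set Omega of size m both ways.
# Sizes are arbitrary illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
n_r, n_c, m = 100, 80, 500

# Without replacement: exactly m distinct index pairs.
flat = rng.choice(n_r * n_c, size=m, replace=False)
omega_without = {(k // n_c, k % n_c) for k in flat}

# With replacement: m i.i.d. draws, so duplicates may occur.
flat = rng.integers(0, n_r * n_c, size=m)
omega_with = {(int(k) // n_c, int(k) % n_c) for k in flat}

print(len(omega_without), "distinct samples without replacement")
print(len(omega_with), "distinct samples with replacement (typically < m)")
```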

Theorem 2.

(See [6, Theorem 2].) Assume without loss of generality that . Choose some constant , and draw

(12)

samples from $[n_r] \times [n_c]$ with replacement. Define $\hat{X}$ as in (2). Then, with probability at least equal to

(13)

the true matrix $X$ is the unique solution to the optimization problem, so that $\hat{X} = X$.

2.2 Basic Concepts from Graph Theory

In contrast with probabilistic sampling, known deterministic approaches to sampling make use of the concept of Ramanujan graphs. For this reason, we introduce a bare minimum of graph theory. Further details about Ramanujan graphs can be found in [11, 12].

Suppose $E \in \{0, 1\}^{n_r \times n_c}$. Then $E$ can be interpreted as the biadjacency matrix of a bipartite graph with $n_r$ vertices on one side and $n_c$ vertices on the other. If $n_r = n_c$, then the bipartite graph is said to be balanced, and it is said to be unbalanced if $n_r \neq n_c$. The prevailing convention is to refer to the side with the larger number of vertices as the “left” side and the other as the “right” side. A bipartite graph is said to be left-regular if every left vertex has the same degree, and right-regular if every right vertex has the same degree. It is said to be $(d_r, d_c)$-biregular if it is both left- and right-regular with row-degree $d_r$ and column-degree $d_c$. Obviously, in this case we must have that $n_r d_r = n_c d_c$. It is convenient to say that a matrix is “$(d_r, d_c)$-biregular” to mean that the associated bipartite graph is $(d_r, d_c)$-biregular. The bipartite graph corresponding to $E$ is defined to be a Ramanujan bigraph if

$\sigma_2(E) \leq \sqrt{d_r - 1} + \sqrt{d_c - 1},$ (14)

where $\sigma_2(E)$ denotes the second largest singular value of $E$.
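Given a candidate biadjacency matrix, both biregularity and the Ramanujan condition (14) are easy to verify numerically. The sketch below is an illustration only; the example graph (a complete bipartite graph) is a trivial Ramanujan bigraph and is not one of the constructions from the companion paper.

```python
# Minimal sketch (not from the paper): verify biregularity and the Ramanujan
# condition (14) for a 0/1 biadjacency matrix E.  The example graph is an
# illustrative assumption, not a construction from the companion paper.
import numpy as np

def is_ramanujan_bigraph(E):
    row_deg, col_deg = E.sum(axis=1), E.sum(axis=0)
    if not (np.all(row_deg == row_deg[0]) and np.all(col_deg == col_deg[0])):
        return False, None                      # not biregular
    d_r, d_c = row_deg[0], col_deg[0]           # row degree and column degree
    sigma = np.linalg.svd(E, compute_uv=False)  # sigma[0] equals sqrt(d_r * d_c)
    return bool(sigma[1] <= np.sqrt(d_r - 1) + np.sqrt(d_c - 1)), sigma[1]

# Complete bipartite graph K_{4,6}: trivially Ramanujan, since sigma_2 = 0.
print(is_ramanujan_bigraph(np.ones((4, 6))))
```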

2.3 Deterministic Sampling

The following result is claimed in [7].

Theorem 3.

(See [7, Theorem 4.2].) Suppose Assumptions (A1) and (A2) hold. Choose $E_\Omega$ to be the adjacency matrix of a $d$-regular graph such that , and . Define $\hat{X}$ as in (2). With these assumptions, if

(15)

then the true matrix $X$ is the unique solution to the optimization problem (2).

However, there is one step in the proffered proof of the above theorem that does not appear to be justified. More details are given in the Appendix.

Theorems 1 and 2 pertain to nuclear norm minimization as in (2). In [8], an alternate set of bounds is obtained for max norm minimization as in (4). The matrix $X$ is assumed to be square, with $n_r = n_c = n$.

Theorem 4.

(See [8, Theorem 2].) Suppose $E_\Omega$ is the adjacency matrix of a $d$-regular graph with second largest (in magnitude) eigenvalue equal to $\lambda$. Define $\hat{X}$ as in (4). Then

(16)

where $K_G$ is Grothendieck’s constant, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.

There is no closed-form formula for this constant, but it is known that

$1.67 < K_G < 1.79$.

See [13] for this and other useful properties of Grothendieck’s constant.

Theorems 1 and 2 on the one hand, and Theorem 4 on the other hand, have complementary strengths and weaknesses. Theorems 1 and 2 ensure the exact recovery of the unknown matrix via nuclear norm minimization. However, the bounds involve the coherence of the unknown matrix as well as its rank. In contrast, the bound in Theorem 4 is “universal” in that it does not involve either the rank or the coherence of the unknown matrix $X$, just its max norm. Moreover, the bound is on the Frobenius norm of the difference $X - \hat{X}$, and thus provides an “element by element” bound. On the other hand, there are no known results under which max norm minimization exactly recovers the unknown matrix.

3 New Results

In this section we state without proof the principal new results in the paper. The proofs are given in subsequent sections.

3.1 Rationale of Using Ramanujan Bigraphs

We begin by giving a rationale for why biadjacency matrices of Ramanujan bigraphs are useful as measurement matrices. Suppose we could choose $E_\Omega = \mathbf{1}_{n_r \times n_c}$, the matrix of all ones. Then $X \circ E_\Omega = X$, and we could recover $X$ exactly from the measurements. However, this choice of $E_\Omega$ corresponds to measuring every element of $X$, and there would be nothing “compressed” about this sensing. Now suppose that $E_\Omega$ is the biadjacency matrix of a $(d_r, d_c)$-biregular graph. Then $\sqrt{d_r d_c}$ is the largest singular value of $E_\Omega$, with corresponding row and column singular vectors $\mathbf{1}_{n_r}/\sqrt{n_r}$ and $\mathbf{1}_{n_c}/\sqrt{n_c}$. Let $\sigma_2$ denote the second largest singular value of $E_\Omega$. Then

where $\|\cdot\|_2$ denotes the spectral norm of a matrix (i.e., its largest singular value). Using the formulas for the largest singular value of $E_\Omega$ and its singular vectors, and rescaling, shows that

This formula can be expressed more compactly by defining the constant $p := d_r/n_c = d_c/n_r$, as

where the various equalities follow from the fact that $n_r d_r = n_c d_c$. One can think of $p$ as the fraction of elements of the unknown matrix that are sampled. Since $|\Omega| = n_r d_r = n_c d_c$, we see that

where . Therefore

(17)

Now note that

Therefore, the smaller $\sigma_2$ is compared to $\sqrt{d_r d_c}$, the better the approximation of the unknown matrix $X$ by the rescaled sampled matrix $(1/p)(X \circ E_\Omega)$. (Note that $n_r$ and $n_c$ are the dimensions of the unknown matrix and are therefore fixed.) Now, a Ramanujan bigraph is one for which this ratio is as small as possible. It is shown in [14] that, if $d_r$ and $d_c$ are kept fixed while $n_r$ and $n_c$ are increased, subject of course to the constraint that $n_r d_r = n_c d_c$, then (14) gives the best possible upper bound on $\sigma_2$.
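The rationale above suggests the estimator $(1/p)(X \circ E_\Omega)$, which requires no optimization at all. The sketch below (illustrative; the cyclic-shift biregular mask and all sizes are assumptions for the example) forms this rescaled sampled matrix and reports the spectral-norm error that Theorem 5 bounds.

```python
# Minimal sketch (not from the paper): the rescaled sampled matrix (1/p)(X o E_Omega),
# with p = d_r/n_c = d_c/n_r, compared to X in the spectral norm as in Theorem 5.
# The cyclic-shift biregular mask and all sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_r, n_c, r, d_r = 16, 8, 2, 4
X = rng.standard_normal((n_r, r)) @ rng.standard_normal((r, n_c))

# A (d_r, d_c)-biregular mask: row i samples d_r cyclically consecutive columns.
# Since n_r is a multiple of n_c, every column ends up with degree d_c = n_r*d_r/n_c.
E = np.zeros((n_r, n_c))
for i in range(n_r):
    E[i, [(i + k) % n_c for k in range(d_r)]] = 1.0

p = d_r / n_c                                   # fraction of entries that are sampled
X_hat = (X * E) / p                             # rescaled measurement matrix
print("spectral-norm error:", np.linalg.norm(X - X_hat, 2))
```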

3.2 Error bounds using deterministic sampling

Theorem 5 below provides an upper bound on the error between a scaled version of the sampled matrix $X \circ E_\Omega$ and the true matrix $X$. It extends [7, Theorem 4.1] to rectangular matrices while at the same time providing a simpler proof. Note that there is no optimization involved in applying this bound.

Theorem 5.

Suppose the sampling set $\Omega$ comes from a $(d_r, d_c)$-biregular bipartite graph with biadjacency matrix $E_\Omega$, and let $\sigma_2$ denote the second largest singular value of $E_\Omega$ (and of course $\sqrt{d_r d_c}$ is the largest singular value of $E_\Omega$). Suppose $X$ is a matrix of rank $r$ or less, and let $\mu_0$ denote its coherence as defined in (6). Then

(18)

where $\|\cdot\|_2$ denotes the spectral norm (largest singular value) of a matrix.

Remark: Observe that the bound in (18) is a product of two terms: one that depends on the measurement matrix $E_\Omega$, and one that depends on the unknown matrix $X$.

Corollary 1.

Suppose the sampling set $\Omega$ comes from a $(d_r, d_c)$-biregular asymmetric Ramanujan graph. Then

(19)

Theorem 6 below extends [8, Theorem 2] to rectangular matrices. (Note that the same theorem was also independently discovered in [15, Theorem 22].) Even for square matrices, the bound in Theorem 6 is smaller by a factor of two compared to that in [8, Theorem 2], stated here as Theorem 4. Note that, similarly to Theorem 4 but in contrast with Theorem 5, the bound in Theorem 6 does not involve the coherence of the unknown matrix, nor its rank. Moreover, the bound is on the Frobenius norm of the difference, and is therefore an “element by element” bound, unlike in Theorem 5.

Theorem 6.

Suppose the sampling set $\Omega$ comes from a $(d_r, d_c)$-biregular bipartite graph, and let $\sigma_2$ denote the second largest singular value of its biadjacency matrix. (Note that biregularity implies that the largest singular value is $\sqrt{d_r d_c}$.) Suppose $\hat{X}$ is a solution of (4). Then

(20)

where $\|\cdot\|_F$ is the Frobenius norm, $\|\cdot\|_{\max}$ is the max norm, and $K_G$ is Grothendieck’s constant.

Corollary 2.

Suppose the sampling set $\Omega$ comes from a $(d_r, d_c)$-biregular asymmetric Ramanujan graph. Then

(21)

3.3 Sufficient Condition for Exact Recovery

The next theorem presents a sufficient condition under which nuclear norm minimization as in (2), with a sampling matrix derived from a Ramanujan bigraph, leads to exact recovery of the unknown matrix. Note that [7, Theorem 4.2] claims to provide such a sufficient condition for square matrices. However, in the opinion of the authors, there is a gap in the proof, as discussed in the Appendix. Therefore Theorem 7 can be thought of as the first result to prove exact recovery using nuclear norm minimization and a deterministic sampling matrix.

Theorem 7.

Suppose $X$ is a matrix of rank $r$ or less, and satisfies the incoherence assumptions (A1) and (A2) with constants $\mu_0$ and $\mu_1$. (Note that, unlike [5, 6], we place no additional requirement on the constant $\mu_1$.) Suppose $E_\Omega$ is the biadjacency matrix of a $(d_r, d_c)$-biregular graph, and let $\sigma_2$ denote the second largest singular value of this matrix. Define

(22)

where , and suppose that

(23)
(24)

Then $X$ is the unique solution of (2).

4 Proofs

In this section we give the proofs of the various theorems stated in the previous section. Due to its length, the proof of Theorem 7 is given separately in the Appendix. We begin with a couple of preliminary results that are used repeatedly in the sequel. Throughout we use the notation that, if $A$ is a matrix, then $A^i$ and $A_j$ denote the $i$-th row and $j$-th column of $A$ respectively. The $(i, j)$-th element of $A$ is denoted by $A_{ij}$.

4.1 Some Preliminary Results

Theorem 8.

Suppose , , and . Suppose further that . Then

(25)
Proof.

The proof follows readily by expanding the triple product. Note that

Therefore

as desired. ∎

Theorem 9.

Suppose are as in Theorem 8. Suppose further that

(26)

Then

(27)
Proof.

Recall that, for any matrix , we have that

In particular

where the last step follows from Theorem 8. Now fix such that . Then

Therefore (27) is proved once it is established that, whenever , it follows that

(28)

To prove (28), apply Schwarz’ inequality to deduce that

(29)

Now

By entirely similar reasoning, we get

Substituting these two bounds into (29) establishes (28) and completes the proof. ∎

4.2 Proof of Theorem 5

Proof.

As before, define

and recall that

Now suppose is a singular value decomposition of , where . Define . Then . Moreover

because , and the definition of the coherence . Similarly

Now apply Theorem 9 with

and note that . Then (27) becomes

as desired. ∎

4.3 Proof of Theorem 6

The proof of Theorem 6 is based on the following extension of the expander mixing lemma from [16] for rectangular expander graphs, which might be of independent interest.

Lemma 1.

Let $E$ be the biadjacency matrix of an asymmetric $(d_r, d_c)$-biregular graph with $n_r + n_c$ vertices, so that $E \in \{0, 1\}^{n_r \times n_c}$, and $\sqrt{d_r d_c}$ is the largest singular value of $E$. Let $\sigma_2$ denote the second largest singular value of $E$. Then for all $S \subseteq [n_r]$ and $T \subseteq [n_c]$, we have:

(30)

where $e(S, T)$ is the number of edges between the two vertex sets $S$ and $T$, and $n_r d_r = n_c d_c$ is the total number of edges in the graph.

Remark: First we explain why this result is called the “expander mixing lemma.” Note that $|S|/n_r$ is the fraction of rows that are in $S$, while $|T|/n_c$ is the fraction of columns that are in $T$. If the total number of edges were to be uniformly distributed, then the term on the left side of (30) would equal zero. Therefore the bound (30) estimates the extent to which the distribution of edges deviates from being uniform.

The above result extends [8, Theorem 8], which is adapted from [16, Lemma 2.5], to biregular bipartite graphs. Moreover, the bound given here is tighter, because of the presence of the two square-root terms on the right side. As $|S|$ and $|T|$ become larger, the square-root terms tend to zero. No such term is present in [16, Lemma 2.5].
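The mixing bound can be checked numerically on small examples. In the sketch below, the precise constants on the right-hand side are an assumed (Haemers-type) form of the bound, not necessarily identical to (30); the graph is the same illustrative cyclic-shift biregular mask used earlier.

```python
# Minimal sketch (not from the paper): numerical check of a bipartite expander
# mixing bound of the assumed (Haemers-type) form
#   |e(S,T) - (d_r/n_c)|S||T||  <=  sigma_2 * sqrt(|S||T|(1 - |S|/n_r)(1 - |T|/n_c)),
# on random vertex subsets of the same illustrative cyclic-shift biregular graph.
import numpy as np

rng = np.random.default_rng(3)
n_r, n_c, d_r = 16, 8, 4
E = np.zeros((n_r, n_c))
for i in range(n_r):
    E[i, [(i + k) % n_c for k in range(d_r)]] = 1.0
sigma2 = np.linalg.svd(E, compute_uv=False)[1]

for _ in range(5):
    S = rng.choice(n_r, size=int(rng.integers(1, n_r)), replace=False)
    T = rng.choice(n_c, size=int(rng.integers(1, n_c)), replace=False)
    e_ST = E[np.ix_(S, T)].sum()                          # edges between S and T
    expected = d_r * len(S) * len(T) / n_c                # perfectly uniform edge count
    bound = sigma2 * np.sqrt(len(S) * len(T) * (1 - len(S) / n_r) * (1 - len(T) / n_c))
    print(abs(e_ST - expected) <= bound + 1e-9, round(abs(e_ST - expected), 3), round(bound, 3))
```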

Proof.

Let $\mathbf{1}_S \in \{0, 1\}^{n_r}$ and $\mathbf{1}_T \in \{0, 1\}^{n_c}$ denote the characteristic vectors of the sets $S$ and $T$ respectively. Then

Write , and note that, due to the biregularity of , we have that , , and . Next, write and , where belong to the row null space and column null space of respectively. Note that , and similarly . Then

Rearranging the above gives

(31)

Next, by Schwarz’ inequality, it follows that

Now note that

and similarly

This implies that

(32)

Substituting this into (31), dividing both sides by gives the first expression in (30). The second expression follows from . ∎

Theorem 10.

Suppose $X \in \mathbb{R}^{n_r \times n_c}$ and $\Omega$ is the edge set of an asymmetric $(d_r, d_c)$-biregular graph. Then

(33)

where $K_G$ is Grothendieck’s constant.

Note that the same result was independently discovered in [15, Theorem 22].

Proof.

Let $Y$ be a rank-one sign matrix with $\pm 1$ entries, and define its corresponding binary matrix by $B = \frac{1}{2}(Y + J)$, where $J$ is a matrix with all ones. Because $Y$ is a rank-one sign matrix, it can be expressed as $Y = u v^\top$, where $u \in \{-1, +1\}^{n_r}$ and $v \in \{-1, +1\}^{n_c}$. Define

Let represent the characteristic vector of set . Let and , and let denote the complements of in the sets respectively. Then , , and . Therefore

(34)

Here, the inequality comes from Lemma 1 and the inequality comes from , where equality holds when and .

Any real matrix can be expressed as a sum of rank-one sign matrices in the form . Define

where the number of terms in the summation is unspecified. As stated in [8, Theorem 7] the max-norm can be related to this new norm via

(35)

Therefore