Quantum-inspired classical sublinear-time algorithm for solving low-rank semidefinite programming via sampling approaches

Semidefinite programming (SDP) is a central topic in mathematical optimization with extensive studies on its efficient solvers. Recently, quantum algorithms with superpolynomial speedups for solving SDPs have been proposed assuming access to their constraint matrices in quantum superposition. Mutually inspired by both classical and quantum SDP solvers, in this paper we present a sublinear classical algorithm for solving low-rank SDPs which is asymptotically as good as existing quantum algorithms. Specifically, given an SDP with m constraint matrices, each of dimension n×n and rank at most r, our algorithm gives a succinct description and any entry of the solution matrix in time O(m·poly(log n, r, 1/ε)) given access to a sample-based low-overhead data structure of the constraint matrices, where ε is the precision of the solution. In addition, we apply our algorithm to a quantum state learning task as an application.

Technically, our approach aligns with both the SDP solvers based on the matrix multiplicative weight (MMW) framework and the recent studies of quantum-inspired machine learning algorithms. The cost of solving SDPs by MMW mainly comes from the exponentiation of Hermitian matrices, and we propose two new technical ingredients (compared to previous sample-based algorithms) for this task that may be of independent interest:

• Weighted sampling: assuming sampling access to each individual constraint matrix , we propose a procedure that gives a good approximation of .

• Symmetric approximation: we propose a sampling procedure that gives low-rank spectral decomposition of a Hermitian matrix . This improves upon previous sampling procedures that only give low-rank singular value decompositions, losing the signs of eigenvalues.

1 Introduction

1.1 Motivations

Semidefinite programming (SDP) is a central topic in the studies of mathematical optimization and theoretical computer science, with a wide range of applications including algorithm design, machine learning, operations research, etc. The importance of SDP stems both from its generality, as it contains the better-known linear programming (LP) as a special case, and from the fact that it admits polynomial-time solvers. Mathematically, an SDP is defined as follows:

 max Tr[CX] (1)
 s.t. Tr[A_i X] ≤ b_i ∀i∈[m]; (2)
 X ⪰ 0, (3)

where m is the number of constraints in the SDP, C, A_1, …, A_m are n×n Hermitian matrices, and b_1, …, b_m ∈ ℝ; Eq. 3 restricts the variable matrix X to be positive semidefinite (PSD), i.e., X is an n×n Hermitian matrix with non-negative eigenvalues. An ε-approximate solution of this SDP is an X that satisfies Eqs. 3 and 2 while its objective value Tr[CX] is smaller than the maximum in Eq. 1 by at most ε.

There is rich literature on solving SDPs. Ellipsoid methods gave the first polynomial-time SDP solvers [Kha80, GLS81]. The complexities of SDP solvers were subsequently improved by the interior-point method [NN92] and the cutting-plane method [Ans00, Mit03]; see also the survey paper [VB96]. Currently, the state-of-the-art SDP solver [LSW15] improves upon the previous cutting-plane methods with running time Õ(m(m² + n^ω + mns)), where ω is the current exponent of matrix multiplication and s is the row sparsity of the input matrices. (Footnote 1: Throughout the paper, Õ hides poly-logarithmic factors.) However, if we tolerate polynomial dependence in 1/ε, Ref. [AK07] gave an SDP solver with better complexities in m and n: Õ(mns(Rr/ε)⁴ + ns(Rr/ε)⁷), where R and r are upper bounds on the trace norms of the optimal primal and dual solutions, respectively.

However, these SDP solvers all use the standard, entry-wise access to the matrices C, A_1, …, A_m. In contrast, a common methodology in algorithm design is to assume a certain natural preprocessed data structure such that the problem can be solved in sublinear time, perhaps even poly-logarithmic time, given queries to the preprocessed data structure (e.g., see the examples discussed in Section 1.4). Such methodology is extensively exploited in quantum algorithms, where we are given entry-wise access to matrices in superposition, a fundamental feature in quantum mechanics and the essence of quantum speedups. In particular, quantum algorithms for solving SDPs have been studied in [BS17, AGGW17, BKL17, AG18], and the state-of-the-art quantum SDP solver [AG18] runs in time sublinear in both m and n, polynomially faster than the classical counterparts.

Mutually inspired by both quantum and classical SDP solvers, in this paper we study the impact of alternative classical models on solving SDPs. Specifically, we ask:

Can we solve SDP with sublinear time and queries to a reasonable classical data structure?

1.2 Contributions

We give an affirmative answer to the above question under a low-overhead data structure based on sampling.

Definition 1 (Sampling access).

Let M ∈ ℂ^{m×n} be a matrix. We say that we have the sampling access to M if we can

1. sample a row index i ∈ [m] of M where the probability of row i being chosen is

 ∥M(i,·)∥² / ∥M∥²_F, and
2. for all i ∈ [m], sample an index j ∈ [n] where the probability of j being chosen is

 |M(i,j)|² / ∥M(i,·)∥²,

with poly-logarithmic time and query complexity for each sampling.
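To make the two distributions in Definition 1 concrete, here is a minimal sketch in Python. It stores the matrix densely (so building it is linear-time; the class name and layout are illustrative, not the paper's data structure), and exposes exactly the two sampling operations of the definition:

```python
import random

class SampleAccess:
    """Toy model of the sampling access in Definition 1.

    Stores the matrix densely, so construction costs linear time; the
    point here is only to make the two sampling distributions explicit.
    """

    def __init__(self, M):
        self.M = M
        # ||M(i,.)||^2 for each row i, and ||M||_F^2.
        self.row_norms_sq = [sum(abs(x) ** 2 for x in row) for row in M]
        self.frob_sq = sum(self.row_norms_sq)

    def sample_row(self, rng):
        # Row i is chosen with probability ||M(i,.)||^2 / ||M||_F^2.
        return rng.choices(range(len(self.M)), weights=self.row_norms_sq)[0]

    def sample_in_row(self, i, rng):
        # Index j is chosen with probability |M(i,j)|^2 / ||M(i,.)||^2.
        w = [abs(x) ** 2 for x in self.M[i]]
        return rng.choices(range(len(w)), weights=w)[0]

sa = SampleAccess([[3.0, 4.0], [0.0, 5.0]])
print(sa.frob_sq)          # 50.0
print(sa.row_norms_sq[0])  # 25.0
```

The preprocessed binary-tree structure of Section 2.1 replaces the linear-time scans here with logarithmic-time walks.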

A low-overhead data structure that allows for this sampling access is shown in Section 2.1. Our main result is as follows.

Theorem 2 (informal; see Theorems 5 and 9).

Assume rank(A_i) ≤ r for all i ∈ [m]. Given the sampling access to A_1, …, A_m of Definition 1, there is an algorithm that gives a succinct description and any entry of an ε-approximate solution of the SDP in Eqs. 1 to 3 with probability at least 2/3; the algorithm runs in time O(m·poly(log n, r, 1/ε)).

Our result aligns with the studies of sample-based algorithms for solving linear algebraic problems. In particular, [FKV04] gave low-rank approximations of a matrix in poly-logarithmic time with query access to the matrix as in Definition 1. Recently, Tang extended the idea of [FKV04] to give a poly-logarithmic time algorithm for solving recommendation systems [Tan18a]. Subsequently, still under the same sampling assumption, Ref. [Tan18b] gave poly-logarithmic algorithms for principal component analysis and clustering assignments, and Refs. [GLT18, CLW18] gave poly-logarithmic algorithms for solving low-rank linear systems. However, all these sample-based sublinear algorithms directly exploit the sampling approach in [FKV04] (see Section 1.3 for details); to solve SDPs, we derive an augmented sampling toolbox which includes two novel techniques: weighted sampling and symmetric approximation.

As a corollary, our SDP solver can be applied to learning quantum states (Footnote 2: A quantum state is a PSD matrix with trace one.) efficiently. A particular task of learning quantum states is shadow tomography [Aar18], where we are asked to find a description of an unknown quantum state ρ ∈ ℂ^{n×n} such that we can approximate Tr[ρE_i] up to error ϵ for a specific collection of Hermitian matrices E_1, …, E_m where E_i ⪰ 0 and ∥E_i∥ ≤ 1 for all i ∈ [m] (such E_i are also known as POVM measurements in quantum computing). Mathematically, shadow tomography can be formulated as the following SDP feasibility problem:

 Find σ such that |Tr[σE_i] − Tr[ρE_i]| ≤ ϵ ∀i∈[m]; (4)
 σ ⪰ 0, Tr[σ] = 1. (5)

Under a quantum model proposed by [BKL17] where the E_i are stored as quantum states, the state-of-the-art quantum algorithm [AG18] solves shadow tomography in time Õ(√m·poly(log n, r, 1/ϵ)), where r = max_{i∈[m]} rank(E_i); in other words, quantum algorithms achieve poly-logarithmic complexity in n for low-rank shadow tomography. We observe the same phenomenon under our classical sample-based model:

Corollary 1 (informal; see Corollary 12).

Given the sampling access to E_1, …, E_m as in Definition 1 and the real numbers Tr[ρE_1], …, Tr[ρE_m], there is a classical algorithm that gives a succinct description and any entry of an ϵ-approximate solution σ of the shadow tomography problem in Eqs. 5 and 4 with probability at least 2/3; the algorithm runs in time O(m·poly(log n, r, 1/ϵ)), where r = max_{i∈[m]} rank(E_i).

1.3 Techniques

Matrix multiplicative weight method (MMW).

We study a normalized SDP feasibility testing problem defined as follows:

Definition 3 (Feasibility of SDP).

Given an ϵ > 0, real numbers a_1, …, a_m, and Hermitian matrices A_1, …, A_m where ∥A_i∥ ≤ 1 for all i ∈ [m], define S_ϵ as the set of all X such that

 Tr[A_i X] ≤ a_i + ϵ ∀i∈[m]; (6)
 X ⪰ 0; (7)
 Tr[X] = 1. (8)

For -approximate feasibility testing of the SDP, we require that:

• If S_ϵ = ∅, output “infeasible”;

• If S_0 ≠ ∅, output an X ∈ S_ϵ. (Footnote 3: If S_0 = ∅ and S_ϵ ≠ ∅, either output is acceptable.)

It is a well-known fact that one can use binary search to reduce ε-approximation of the SDP in Eqs. 1 to 3 to O(log(1/ε)) calls of the feasibility problem in Definition 3. (Footnote 4: For the normalized case, we first guess a candidate value c for the objective function, and add the corresponding constraint to the optimization problem. If this problem is feasible, the optimum is larger than c and we accordingly raise the guess; if this problem is infeasible, the optimum is smaller than c and we accordingly lower the guess; we proceed similarly for all subsequent guesses. As a result, we could solve the optimization problem with precision ε using O(log(1/ε)) calls to the feasibility problem in Definition 3. For renormalization, it suffices to rescale the precision by the trace bound of the solution.) Therefore, throughout the paper we focus on solving feasibility testing of SDPs.
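The binary-search reduction above can be sketched in a few lines. The "SDP" here is replaced by a hypothetical one-dimensional toy problem (max x subject to x ≤ 0.7, 0 ≤ x ≤ 1) so that the feasibility oracle is trivial to write; only the search structure is the point:

```python
# Hedged illustration of the reduction: binary search on a guessed
# objective value c, using a feasibility oracle.  `feasible(c)` stands
# in for one call to the feasibility problem of Definition 3 and
# answers whether some solution attains objective value at least c.

def feasible(c):
    # Oracle for the toy problem max x s.t. x <= 0.7, 0 <= x <= 1.
    return c <= 0.7

def binary_search_opt(eps):
    lo, hi = 0.0, 1.0
    # O(log(1/eps)) iterations, i.e. O(log(1/eps)) oracle calls.
    while hi - lo > eps:
        c = (lo + hi) / 2
        if feasible(c):
            lo = c   # the optimum is at least c: raise the guess
        else:
            hi = c   # the optimum is below c: lower the guess
    return lo

print(abs(binary_search_opt(1e-6) - 0.7) < 1e-5)  # True
```

Each iteration halves the search interval, which is exactly why O(log(1/ε)) feasibility calls suffice.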

To solve the feasibility testing problem in Definition 3, we follow the matrix multiplicative weight (MMW) framework [AHK12]. To be more specific, we follow the approach of regarding MMW as a zero-sum game with two players (see, e.g., [Haz06, Wu10, GW12, LRS15, BKL17]), where the first player wants to provide a feasible X, and the second player wants to find a violation of any proposed X, i.e., an i ∈ [m] such that Tr[A_i X] > a_i + ϵ. At the t-th round of the game, if the second player points out a violation j_t for the current proposed solution X_t, the first player proposes a new solution

 X_{t+1} ← exp[−(A_{j_1} + ⋯ + A_{j_t})] (9)

(up to normalization); a solution of this matrix-exponential form is formally called a Gibbs state. A feasible solution is actually an equilibrium point of the zero-sum game, which can also be approximated by the MMW method [AHK12]; formal discussions are given in Section 2.2.
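The game dynamics above can be sketched for the special case of *diagonal* constraint matrices, where the Gibbs state of Eq. 9 reduces to a softmax over the diagonal and no matrix exponentiation machinery is needed. The learning rate 0.1 and the specific constraints are illustrative choices, not part of the paper's algorithm:

```python
import math

# MMW as a two-player game, restricted to diagonal 2x2 matrices.
# A violation index j feeds back additively into the exponent of the
# Gibbs state, exactly as in Eq. 9 (with an illustrative step size).

def gibbs_diag(diag_sum):
    # X = exp(-H)/Tr[exp(-H)] for a diagonal H, with a stability shift.
    m = min(diag_sum)
    w = [math.exp(-(h - m)) for h in diag_sum]
    z = sum(w)
    return [x / z for x in w]

A = [[1.0, 0.0], [0.0, 1.0]]   # diagonals of two constraint matrices
a = [0.4, 0.7]                 # thresholds: want Tr[A_i X] <= a_i

H = [0.0, 0.0]                 # accumulated exponent sum_t A_{j_t}
X = gibbs_diag(H)
for _ in range(100):
    viol = [i for i in range(2)
            if sum(A[i][k] * X[k] for k in range(2)) > a[i]]
    if not viol:               # no violation: X is feasible
        break
    j = viol[0]                # second player reports a violation j_t
    H = [H[k] + 0.1 * A[j][k] for k in range(2)]
    X = gibbs_diag(H)          # first player re-proposes a Gibbs state

print(all(sum(A[i][k] * X[k] for k in range(2)) <= a[i] + 1e-9
          for i in range(2)))  # True
```

Note that Tr[X] = 1 and X ⪰ 0 hold automatically for every Gibbs state, so the game only has to fight the linear constraints.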

Improved sampling tools.

Before we describe our improved sampling tools, let us give a brief review of the techniques introduced by [FKV04]. The basic idea of [FKV04] comes from the fact that a low-rank matrix can be well-approximated by sampling a small number of its rows. More precisely, suppose that A is an n×n matrix with rank r, where r ≪ n. Because n is large, it is preferable to obtain a “description” of A without using poly(n) resources. If we have the sampling access to A in the form of Definition 1, we can sample p = poly(r) rows from A according to statement 1 of Definition 1. Suppose S is the p×n submatrix of A formed by sampling p rows from A (suitably rescaled); it is shown in [FKV04] that S†S ≈ A†A with high probability. As a result, we can record the p indices sampled from A as a succinct description of S. Note that the size of this description is independent of n. This description of S is made useful by utilizing statement 2 of Definition 1 in sample-based calculations. For example, it is possible to efficiently calculate the p×p matrix SS†. Building on the fact that S†S ≈ A†A, it is possible to efficiently calculate a description of two matrices D and V such that A ≈ UDV† for some isometry U, where D is an r×r real positive diagonal matrix, and V is an n×r matrix described by linear combinations of the rows of S. D corresponds to the singular values of A. (Footnote 5: If we have the description of all three matrices U, D, and V such that A ≈ UDV†, we have an approximate singular value decomposition. However, the method in [FKV04] gives the description of either U or V, not both.)
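The key fact behind the FKV row sampling, that rescaled length-squared row samples give an unbiased estimate of A†A, can be checked exactly, without any randomness, by enumerating all rows. Below is a sketch for a small real matrix (so † is just transpose) and a single sampled row, p = 1:

```python
# Numerical check of the FKV fact used above: if row i is drawn with
# probability P_i = ||A(i,.)||^2 / ||A||_F^2 and rescaled by
# 1/sqrt(p*P_i), then S^T S is an unbiased estimator of A^T A.
# We compute the expectation exactly by enumerating the row choice.

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
row_sq = [sum(x * x for x in r) for r in A]
frob_sq = sum(row_sq)
P = [s / frob_sq for s in row_sq]          # length-squared distribution

n = len(A[0])
# E[S^T S] for p = 1: sum_i P_i * (A_i^T A_i) / P_i, which telescopes.
expected = [[sum(P[i] * A[i][r] * A[i][c] / P[i] for i in range(len(A)))
             for c in range(n)] for r in range(n)]
AtA = [[sum(A[i][r] * A[i][c] for i in range(len(A)))
        for c in range(n)] for r in range(n)]

ok = all(abs(expected[r][c] - AtA[r][c]) < 1e-9
         for r in range(n) for c in range(n))
print(ok)  # True: the estimator is unbiased
```

Unbiasedness is only half the story; the variance bound (which controls how large p must be) is exactly what Lemma 2 below establishes in the weighted multi-matrix setting.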

To implement the MMW framework, we need an approximate description of the matrix exponentiation

 A := exp[−∑_{t=1}^{τ} A_{j_t}]

in Eq. 9. We achieve this in two steps. First, we get an approximate description of the spectral decomposition of B := −∑_{t=1}^{τ} A_{j_t}: B ≈ UDU†, where U is an n×r matrix with (nearly) orthonormal columns and D is an r×r real diagonal matrix. Then, we approximate the matrix exponentiation by exp[B] ≈ I_n + U(exp[D] − I_r)U†.
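The second step rests on a standard identity: if B = UDU† with U an isometry, then exp[B] = I + U(exp[D] − I)U†, so a low-rank spectral description of B suffices to describe the full n×n Gibbs weight matrix. A small self-contained check on a rank-1 real example, comparing against a Taylor series for exp[B]:

```python
import math

# Check exp(B) = I + U(e^D - I)U^T on a rank-1 symmetric B = lam*u*u^T,
# against a truncated Taylor series sum_k B^k / k! as ground truth.

n = 3
u = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]   # unit eigenvector
lam = -0.8                                       # eigenvalue
B = [[lam * u[i] * u[j] for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Taylor series for exp(B), 30 terms (plenty for ||B|| < 1).
expB = [[float(i == j) for j in range(n)] for i in range(n)]
term = [row[:] for row in expB]
for k in range(1, 30):
    term = mat_mul(term, B)
    term = [[x / k for x in row] for row in term]
    expB = [[expB[i][j] + term[i][j] for j in range(n)] for i in range(n)]

# Closed form: I + (e^lam - 1) u u^T.
closed = [[float(i == j) + (math.exp(lam) - 1) * u[i] * u[j]
           for j in range(n)] for i in range(n)]

err = max(abs(expB[i][j] - closed[i][j]) for i in range(n) for j in range(n))
print(err < 1e-9)  # True
```

The identity holds because B acts as D on the column span of U and as zero on its orthogonal complement, where the exponential is the identity.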

There are two main technical difficulties that we overcome with new tools while following the above strategy. First, since the sum ∑_t A_{j_t} changes dynamically throughout the MMW method, we cannot assume the sampling access to it; a more reasonable assumption is to have sampling access to each individual constraint matrix, but it is hard to directly sample from ∑_t A_{j_t} by sampling from each individual A_{j_t}. (Footnote 6: For example, assume we have A_1, A_2 such that A_1 = −A_2 + B, where B is a matrix with small entries. In this case, A_1 and A_2 mostly cancel out each other and leave A_1 + A_2 = B. Since ∥B∥_F can be arbitrarily small compared to ∥A_1∥_F and ∥A_2∥_F, it is hard to sample from B by sampling from A_1 and A_2.) (Footnote 7: Gilyén [Gil19] and Tang [Tan19] pointed out to us that one might be able to sample from the sum by lower-bounding the cancellation and doing a rejection sampling. We did not explore this approach in detail, but this is a possible alternative to weighted sampling.) In Section 3, we sidestep this difficulty by devising the weighted sampling procedure, which gives a succinct description of a low-rank approximation of ∑_t A_{j_t} by sampling each individual A_{j_t}. In other words, we cannot sample according to ∑_t A_{j_t}, but we can still find a succinct description of a low-rank approximation of it.

Second, the original sampling procedure of [FKV04] gives a singular value decomposition A ≈ UDV† instead of a spectral decomposition A ≈ UDU†, even if A is Hermitian. For our purpose of matrix exponentiation, singular value decomposition is problematic because the singular values ignore the signs of the eigenvalues; specifically, we get a large error if we approximate exp[A] by naively exponentiating the singular value decomposition. Note that this issue of missing negative signs is intrinsic to the tools in [FKV04] because they are built upon the approximation S†S ≈ A†A; suppose that A has the decomposition A = UΛV†, where Λ is a diagonal matrix and U and V are isometries. Then A†A = VΛ†ΛV†, cancelling out any phase on Λ. We resolve this issue by a novel approximation procedure, symmetric approximation, which builds on the approximations established by weighted sampling. It then holds that SS† ≈ WW†; since WW† is a small matrix of size p×p, we can calculate it explicitly and diagonalize it, getting approximate eigenvalues of A. Together with the description of S, we get the desired description of the spectral decomposition of A. See Section 4 for more details. (Footnote 8: It might be illustrative to describe some of our failed attempts before achieving symmetric approximation. We tried to separate the exponential function into even and odd parts; unfortunately, that decomposes exp[−A] into cosh(A) − sinh(A), resulting in large cancellation and unbounded error. We also tried to obtain the eigenvectors of A from A†A; this approach faces multiple difficulties, the most serious one being the “fake degeneracy” shown by the following example. Suppose A = diag(1, −1); A has two distinct eigenvectors. However, A†A = I can be satisfied by taking A = U for any unitary U. In this case, A†A does not give any information about the eigenvectors of A.)
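The sign issue can be seen in one line on the diagonal example from Footnote 8: for A = diag(1, −1) the singular values are (1, 1), so exponentiating them yields diag(e, e), while the true exp(A) = diag(e, 1/e); the error is of constant order:

```python
import math

# Singular values lose the signs of the eigenvalues, so naive
# exponentiation of an SVD is off by e - 1/e ~ 2.35 on A = diag(1, -1).

eig = [1.0, -1.0]            # eigenvalues of A = diag(1, -1)
sv = [abs(x) for x in eig]   # singular values: the signs are gone
naive = [math.exp(s) for s in sv]   # diag(e, e)
true = [math.exp(x) for x in eig]   # diag(e, 1/e)

gap = max(abs(a - b) for a, b in zip(naive, true))
print(gap > 2)  # True: an O(1) error, independent of any precision knob
```

This is why the symmetric approximation must recover signed eigenvalues rather than singular values before exponentiating.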

1.4 Related work

Our work follows the general methodology of leveraging preprocessed data structures; more specifically, we use sample-based data structures to fulfill the MMW framework in our SDP solver. In this subsection, we delve into related work on preprocessed data structures and MMW-based SDP solvers.

Preprocessed data structures.

Preprocessed data structures are ubiquitous in algorithm design; they enable further computation tasks to be completed within sublinear or even poly-logarithmic time. For the task of nearest neighbor search, we are given a set P of points in ℝ^d and the goal is to preprocess a data structure such that given any point q ∈ ℝ^d, it returns a point in P that is closest to q. In the case d = 2, there exists a data structure using O(n) space with O(log n) time for each query [LT77]; more general cases are discussed in the survey paper [AIR18]. A related problem is orthogonal range search, where the goal is to preprocess a data structure such that one can efficiently report the points contained in an axis-aligned query box. When d = 2, Ref. [CLP11] preprocessed a data structure with near-linear space and doubly-logarithmic query time; for larger d, the query time can be kept poly-logarithmic with a slight overhead on the space complexity. If preprocessed data structures are not exploited for these problems, we have to check all points by brute force, which is inefficient for applications in data analytics, machine learning, computer vision, etc.

This methodology is also widespread in graph problems. One example concerns fully dynamic graphs, where we start from an empty graph on a fixed set of vertices and maintain a data structure such that edge insertions and deletions only take sublinear update time for specific graph properties. For instance, the data structure in [AOSS18] deterministically maintains a maximal independent set of the graph in amortized update time O(m^{3/4}) (m being the dynamic number of edges). There also exist data structures with sublinear update time for minimum vertex cover size [ORRR12] and all-pairs shortest paths [Tho05, ACK17]; furthermore, data structures with poly-logarithmic update time can be constructed for connectivity, minimum spanning tree, and bipartiteness [HKK99, HLT01], maximum matching [Sol16, BHN17], graph coloring [BCHN18], etc.

Solving SDPs by the MMW framework.

As introduced in Section 1.1, many SDP solvers use cutting-plane methods or interior-point methods with poly-logarithmic dependence on 1/ε but larger complexities in m and n. In contrast, our SDP solver follows the MMW framework, and we briefly summarize such SDP solvers in the existing literature. They mainly fall into the following two categories.

First, MMW is adopted in solvers for positive SDPs, i.e., SDPs in which the input matrices are all PSD. In this case, the power of MMW lies in its efficiency of having only poly-logarithmically many iterations and the fact that it admits width-independent solvers whose complexities are independent of the width parameters R and r. Ref. [LN93] first gave a width-independent positive LP solver that runs in poly-logarithmically many iterations; [JY11] subsequently generalized this result to give the first width-independent positive SDP solver, though with a significantly larger iteration count. The state-of-the-art positive SDP solver was proposed by [AZLO16], which further reduces the number of iterations.

Second, as far as we know, the vast majority of quantum SDP solvers use the MMW framework. The first quantum SDP solver was proposed by [BS17] with worst-case running time Õ(√(mn)·s²·poly(Rr/ε)), where s is the sparsity of the input matrices, i.e., every row or column of each input matrix has at most s nonzero elements. Subsequently, the quantum complexity of solving SDPs was improved by [AGGW17, BKL17], and the state-of-the-art quantum SDP solver [AG18] runs in time sublinear in both m and n. This is optimal in the dependence on m and n because [BS17] proved a quantum lower bound of Ω(√m + √n) for constant s, R, r, and ε.

1.5 Open questions

Our paper raises a few natural open questions for future work. For example:

• Can we give faster sample-based algorithms for solving LPs? Note that a recent breakthrough by [CLS18] solves LPs with complexity n^{ω+o(1)} (Footnote 9: Without loss of generality, we can assume m = O(n) for LPs by deleting overcomplete constraints. The stated result holds for the current matrix multiplication exponent ω; when ω = 2, the complexity becomes n^{2+1/6}.), significantly faster than the state-of-the-art SDP solver [LSW15] discussed in Section 1.1.

• Can we prove lower bounds on sample-based methods? In particular, a lower bound in the rank r would help us understand the limits of our current approach. It is also of interest to prove a lower bound in m; if one can prove a lower bound linear in m for sample-based SDP solvers, then the answer to the first open question becomes negative and there must be a trade-off between the dependences on m and n.

• How is the empirical performance of our sample-based method? Admittedly, the exponents of our poly-logarithmic factors are large; nevertheless, it is common that numerical experiments perform better than theoretical guarantees, and we wonder if this phenomenon can be observed when applying our method.

Notations.

Throughout the paper, we denote by m and n the number of constraints and the dimension of constraint matrices in SDPs, respectively. We denote by R and r the upper bounds on the trace norms of the optimal primal and dual solutions of the SDP in Eqs. 3, 2 and 1, respectively, and denote by ε the precision of its solution. We use ϵ to denote the precision of the solution of the SDP feasibility problem in Definition 3 (see Footnote 4).

We let [n] denote the set {1, 2, …, n}. For a vector v ∈ ℂ^n, we use D_v to denote the probability distribution on [n] where the probability of i being chosen is |v(i)|²/∥v∥² for all i ∈ [n]. When it is clear from the context, a sample from D_v is often referred to as a sample from v. For a matrix M ∈ ℂ^{m×n}, we use ∥M∥ and ∥M∥_F to denote its spectral norm and Frobenius norm, respectively; we use M(i,·) and M(·,j) to denote the i-th row and the j-th column of M, respectively. We use rows(M) to denote the m-dimensional vector formed by the norms of its row vectors, i.e., rows(M)(i) = ∥M(i,·)∥ for all i ∈ [m].

Organization.

The rest of the paper is organized as follows. We formulate the sample-based data structure and the SDP feasibility problem in Section 2. Our two techniques, weighted sampling and symmetric approximation, are presented in Section 3 and Section 4, respectively. Subsequently, we apply these techniques to estimate traces with respect to Gibbs states in Section 5. Our main results on solving SDP with the application to shadow tomography are proven in Section 6.

2 Preliminaries

2.1 Sample-based data structure

As we develop sublinear-time algorithms for solving SDPs in this paper, the constraint matrices cannot be loaded into memory in full, since storing them requires at least linear space and time. Instead, we assume the sampling access to each constraint matrix as defined in Definition 1. This sampling access relies on a natural probability distribution that arises in many machine learning applications [CLW18, GLT18, KP17, KP18, Tan18a, Tan18b].

Technically, Ref. [FKV04] used this sampling access to develop sublinear algorithms for low-rank matrix approximation. It is well-known (as pointed out by [KP17] and also used in [CLW18, GLT18, KP18, Tan18a, Tan18b]) that there exist low-overhead preprocessed data structures that allow for the sampling access. More precisely, the existence of the data structures for the sampling access defined in Definition 1 is stated as follows.

Theorem 4 ([KP17]).

Given a matrix M ∈ ℂ^{m×n} with w non-zero entries, there exists a data structure storing M in space O(w log(mn)), which supports the following operations:

1. Reading and writing an entry M(i,j) in O(log(mn)) time.

2. Evaluating ∥M(i,·)∥² in O(log(mn)) time.

3. Evaluating ∥M∥²_F in O(log(mn)) time.

4. Sampling a row index of M according to statement 1 of Definition 1 in O(log(mn)) time.

5. For each row, sampling an index j according to statement 2 of Definition 1 in O(log(mn)) time.

Readers may refer to [KP17, Theorem A.1] for the proof of Theorem 4. In the following, we give the intuition of the data structure, which is illustrated in Fig. 1. We show how to sample from a row vector: we use a binary tree to store the data of each row. The square of the absolute value of each entry, along with its original value, is stored in the leaf nodes. Each internal node contains the sum of the values of its two immediate children. It is easy to see that the root node contains the squared norm of this row vector. To sample an index or to query an entry from this row, logarithmically many steps suffice. To fulfill statement 1 of Definition 1, we treat the norms of the rows as a vector and organize the data of this vector in another binary tree.
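The per-row tree just described can be sketched directly as an array-based segment tree; both the update and the sample walk one root-to-leaf path. This is an illustrative single-row version (assuming, for simplicity, that the row length is a power of two), not the full two-level structure of Theorem 4:

```python
import random

# Binary-tree layout for one row: leaves hold |v_j|^2, internal nodes
# hold the sum of their children, the root holds ||v||^2.  Updates and
# samples both take O(log n) steps.  Assumes n is a power of two.

class RowSampler:
    def __init__(self, n):
        self.n = n
        self.tree = [0.0] * (2 * n)    # tree[n + j] is the leaf for index j

    def update(self, j, value):
        k = self.n + j
        self.tree[k] = abs(value) ** 2
        k //= 2
        while k >= 1:                  # re-sum the path to the root
            self.tree[k] = self.tree[2 * k] + self.tree[2 * k + 1]
            k //= 2

    def norm_sq(self):
        return self.tree[1]            # root node: squared row norm

    def sample(self, rng):
        x = rng.random() * self.tree[1]
        k = 1
        while k < self.n:              # descend toward a leaf
            if x < self.tree[2 * k]:
                k = 2 * k
            else:
                x -= self.tree[2 * k]
                k = 2 * k + 1
        return k - self.n

s = RowSampler(4)
for j, v in enumerate([3.0, 0.0, 0.0, 4.0]):
    s.update(j, v)
print(s.norm_sq())  # 25.0
```

Indices with zero weight (here 1 and 2) are never returned, and each sample lands on index j with probability |v_j|²/∥v∥², matching statement 2 of Definition 1.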

2.2 Feasibility testing of SDPs

We adopt the MMW framework to solve SDPs under the zero-sum approach [Haz06, Wu10, GW12, LRS15, BKL17]. This is formulated as the following theorem:

Theorem 5 (Master algorithm).

Feasibility of the SDP in Eqs. 6 to 8 can be tested by Algorithm 1.

This theorem is proved in [BKL17, Theorem 2.3]; note that the weight matrix therein differs from ours by a multiple of the identity, but this gives the same Gibbs state as in Algorithm 1 since for any Hermitian matrix H and real number c,

 exp[−H] / Tr[exp[−H]] = exp[−H − cI] / Tr[exp[−H − cI]]. (10)

It should also be understood that this master algorithm is not the final algorithm; the step of trace estimation with respect to the Gibbs state (Algorithm 1) will be fulfilled by our sample-based approach.

3 Weighted sampling

The objective of this section is to provide a method for sampling a small submatrix of A = A_1 + ⋯ + A_τ, given the sampling access to each A_ℓ. Note that the standard FKV sampling method [FKV04] is not capable of this task, as the sampling access to each A_ℓ does not trivially imply the sampling access to A. In the following, we propose the weighted sampling method. The intuition is to assign each A_ℓ a different weight when computing the probability distribution, and then to sample a row/column index of A according to this distribution.

We first give the method for sampling row indices of A (the row sampling procedure). The objective of this procedure is to sample a submatrix S such that S†S ≈ A†A.


After applying the row sampling procedure, we obtain the row indices i_1, …, i_p. Let S_1, …, S_τ be p×n matrices such that S_ℓ(t,·) = A_ℓ(i_t,·)/√(pP_{i_t}) for all t ∈ [p] and ℓ ∈ [τ]. Define the matrix S as

 S = S_1 + ⋯ + S_τ. (11)

Next, we give the method for sampling column indices of S (the column sampling procedure): we need to sample a submatrix W from S such that WW† ≈ SS†.


Now we have obtained column indices j_1, …, j_p. Let W_1, …, W_τ be p×p matrices such that W_ℓ(·,t) = S_ℓ(·,j_t)/√(pP′_{j_t}) for all t ∈ [p] and ℓ ∈ [τ], where P′_j is the probability of column index j being chosen. Define the matrix W as

 W = W_1 + ⋯ + W_τ. (12)

Before showing S†S ≈ A†A and WW† ≈ SS†, we first prove the following general result.

Lemma 2.

Let M_1, …, M_τ ∈ ℂ^{n×d} be matrices and let M = M_1 + ⋯ + M_τ. Independently sample p row indices i_1, …, i_p from [n] according to a probability distribution (P_i)_{i∈[n]} where

 P_i ≥ ∑_{j=1}^{τ} D_{rows(M_j)}(i) ∥M_j∥²_F / ((τ+1) ∑_{ℓ=1}^{τ} ∥M_ℓ∥²_F). (13)

Let N_1, …, N_τ be p×d matrices with

 N_ℓ(t,·) = M_ℓ(i_t,·) / √(pP_{i_t}), (14)

for all t ∈ [p] and ℓ ∈ [τ]. Define N = N_1 + ⋯ + N_τ. Then for all θ > 0, it holds that

 Pr(∥M†M − N†N∥_F ≥ θ ∑_{ℓ=1}^{τ} ∥M_ℓ∥²_F) ≤ (τ+1)² / (θ²p). (15)
Proof.

We first show that the expected value of each entry of N†N is the corresponding entry of M†M:

 E[(N†N)(i,j)] = ∑_{t=1}^{p} E[N*(t,i) N(t,j)] (16)
 = ∑_{t=1}^{p} ∑_{k=1}^{n} P_k · M*(k,i) M(k,j) / (pP_k) (17)
 = (M†M)(i,j). (18)

Furthermore, we have

 E(|(N†N)(i,j) − (M†M)(i,j)|²) ≤ ∑_{t=1}^{p} E(|N*(t,i) N(t,j)|²) (19)
 = ∑_{t=1}^{p} ∑_{k=1}^{n} P_k |M(k,i)|² |M(k,j)|² / (p²P_k²) (20)
 ≤ ((τ+1) ∑_{ℓ=1}^{τ} ∥M_ℓ∥²_F / p) ∑_{k=1}^{n} |M(k,i)|² |M(k,j)|² / (∑_{ℓ′=1}^{τ} D_{rows(M_{ℓ′})}(k) ∥M_{ℓ′}∥²_F) (21)
 = ((τ+1) ∑_{ℓ=1}^{τ} ∥M_ℓ∥²_F / p) ∑_{k=1}^{n} |M(k,i)|² |M(k,j)|² / ∑_{ℓ′=1}^{τ} ∥M_{ℓ′}(k,·)∥². (22)

Now, we bound the expected distance between M†M and N†N:

 E(∥M†M − N†N∥²_F) = ∑_{i,j=1}^{n} E(|(N†N)(i,j) − (M†M)(i,j)|²) (23)
 ≤ ((τ+1) ∑_{ℓ=1}^{τ} ∥M_ℓ∥²_F / p) ∑_{k=1}^{n} ∑_{i,j=1}^{n} |M(k,i)|² |M(k,j)|² / ∑_{ℓ′=1}^{τ} ∥M_{ℓ′}(k,·)∥² (24)
 = ((τ+1) ∑_{ℓ=1}^{τ} ∥M_ℓ∥²_F / p) ∑_{k=1}^{n} ∥M(k,·)∥⁴ / ∑_{ℓ′=1}^{τ} ∥M_{ℓ′}(k,·)∥² (25)
 ≤ ((τ+1) ∑_{ℓ=1}^{τ} ∥M_ℓ∥²_F / p) ∑_{k=1}^{n} ∥M(k,·)∥² (τ ∑_{ℓ′=1}^{τ} ∥M_{ℓ′}(k,·)∥²) / ∑_{ℓ″=1}^{τ} ∥M_{ℓ″}(k,·)∥² (26)
 = (τ(τ+1) ∑_{ℓ=1}^{τ} ∥M_ℓ∥²_F / p) ∥M∥²_F (27)
 ≤ τ(τ+1) (∑_{ℓ=1}^{τ} ∥M_ℓ∥²_F)² / p (28)
 ≤ (τ+1)² (∑_{ℓ=1}^{τ} ∥M_ℓ∥²_F)² / p, (29)

where (26) uses the Cauchy–Schwarz inequality ∥M(k,·)∥² ≤ τ ∑_{ℓ′=1}^{τ} ∥M_{ℓ′}(k,·)∥².

Consequently, the result of this lemma follows from Markov’s inequality. ∎
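The bound of Lemma 2 before the final Markov step can be checked numerically without simulation: for p = 1 the row choice can be enumerated exactly, so E∥M†M − N†N∥²_F is an exact finite sum. The sketch below does this for τ = 2 random real matrices with P_i proportional to ∑_ℓ ∥M_ℓ(i,·)∥² (which satisfies Eq. 13); the matrices themselves are seeded random data, not from the paper:

```python
import random

rng = random.Random(7)
n = 5
M1 = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
M2 = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
M = [[M1[i][j] + M2[i][j] for j in range(n)] for i in range(n)]
tau = 2

# P_i proportional to sum_l ||M_l(i,.)||^2, which satisfies Eq. 13.
w = [sum(M1[i][j] ** 2 + M2[i][j] ** 2 for j in range(n)) for i in range(n)]
tot = sum(w)                       # = sum_l ||M_l||_F^2
P = [x / tot for x in w]

MtM = [[sum(M[i][a] * M[i][b] for i in range(n)) for b in range(n)]
       for a in range(n)]

# Exact E||M^T M - N^T N||_F^2 for p = 1, enumerating the row choice.
expect = 0.0
for i in range(n):
    NtN = [[M[i][a] * M[i][b] / P[i] for b in range(n)] for a in range(n)]
    expect += P[i] * sum((MtM[a][b] - NtN[a][b]) ** 2
                         for a in range(n) for b in range(n))

bound = (tau + 1) ** 2 * tot ** 2  # right-hand side of Eq. 29 with p = 1
print(expect <= bound)             # True
```

The exact expectation sits well below the bound, which is expected: the chain (23)-(29) is loose by design so that it holds uniformly over all admissible distributions P.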

The following technical claim will be used multiple times in this paper. It relates the three quantities ∑_ℓ ∥A_ℓ∥²_F, ∑_ℓ ∥S_ℓ∥²_F, and ∑_ℓ ∥W_ℓ∥²_F:

Claim 3.

Let A = A_1 + ⋯ + A_τ be an n×n matrix with the sampling access to each A_ℓ as in Definition 1. Let S_ℓ and W_ℓ be defined by Eqs. 11 and 12. Then, with probability at least 1 − O(τ³/p), it holds that

 (1/(τ+1)) ∑_{ℓ=1}^{τ} ∥A_ℓ∥²_F ≤ ∑_{ℓ=1}^{τ} ∥S_ℓ∥²_F ≤ ((2τ+1)/(τ+1)) ∑_{ℓ=1}^{τ} ∥A_ℓ∥²_F, (30)

and

 (1/(τ+1)) ∑_{ℓ=1}^{τ} ∥S_ℓ∥²_F ≤ ∑_{ℓ=1}^{τ} ∥W_ℓ∥²_F ≤ ((2τ+1)/(τ+1)) ∑_{ℓ=1}^{τ} ∥S_ℓ∥²_F. (31)
Proof.

We first evaluate E(∥S_ℓ∥²_F) as follows. For all ℓ ∈ [τ],

 E(∥S_ℓ∥²_F) = ∑_{i=1}^{p} E(∥S_ℓ(i,·)∥²) = ∑_{i=1}^{p} ∑_{j=1}^{n} P_j ∥A_ℓ(j,·)∥² / (pP_j) = ∑_{j=1}^{n} ∥A_ℓ(j,·)∥² = ∥A_ℓ∥²_F. (32)

Then we have

 ∥S_ℓ(i,·)∥² = ∑_{j=1}^{n} |A_ℓ(i,j)|² / (pP_i) ≤ ∑_{j=1}^{n} 2|A_ℓ(i,j)|² ∑_{ℓ′=1}^{τ} ∥A_{ℓ′}∥²_F / (p ∑_{j′=1}^{τ} ∥A_{j′}(i,·)∥²) (33)
 = 2∥A_ℓ(i,·)∥² ∑_{ℓ′=1}^{τ} ∥A_{ℓ′}∥²_F / (p ∑_{j′=1}^{τ} ∥A_{j′}(i,·)∥²) ≤ 2 ∑_{ℓ′=1}^{τ} ∥A_{ℓ′}∥²_F / p. (34)

Note that the quantity ∥S_ℓ∥²_F can be viewed as a sum of p independent random variables ∥S_ℓ(1,·)∥², …, ∥S_ℓ(p,·)∥². As a result,

 Var(∥S_ℓ∥²_F) = p Var(∥S_ℓ(i,·)∥²) ≤ p E(∥S_ℓ(i,·)∥⁴) (35)
 ≤ p · (2 ∑_{ℓ′=1}^{τ} ∥A_{ℓ′}∥²_F / p) · E(∥S_ℓ(i,·)∥²) = 2 ∑_{ℓ′=1}^{τ} ∥A_{ℓ′}∥²_F · ∥A_ℓ∥²_F / p ≤ 2 (∑_{ℓ′=1}^{τ} ∥A_{ℓ′}∥²_F)² / p, (36)

where we used Eq. 34 and the fact that E(∥S_ℓ(i,·)∥²) = ∥A_ℓ∥²_F / p by Eq. 32.

According to Chebyshev’s inequality, we have

 Pr(|∥S_ℓ∥²_F − ∥A_ℓ∥²_F| ≥ ∑_{ℓ′=1}^{τ} ∥A_{ℓ′}∥²_F / (τ+1)) ≤ 2(τ+1)² / p. (37)

Therefore, by a union bound over ℓ ∈ [τ], with probability at least 1 − 2τ(τ+1)²/p, it holds for all ℓ ∈ [τ] that

 −(1/(τ+1)) ∑_{j≠ℓ} ∥A_j∥²_F + (τ/(τ+1)) ∥A_ℓ∥²_F ≤ ∥S_ℓ∥²_F ≤ (1/(τ+1)) ∑_{j≠ℓ} ∥A_j∥²_F + ((τ+2)/(τ+1)) ∥A_ℓ∥²_F, (38)

which, after summing over ℓ ∈ [τ], implies that

 (1/(τ+1)) ∑_{ℓ=1}^{τ} ∥A_ℓ∥²_F ≤ ∑_{ℓ=1}^{τ} ∥S_ℓ∥²_F ≤ ((2τ+1)/(τ+1)) ∑_{ℓ=1}^{τ} ∥A_ℓ∥²_F. (39)

Eq. 31 can be proven in a similar way. ∎

Now, the main results of the weighted sampling method, namely S†S ≈ A†A and WW† ≈ SS†, follow as a consequence of Lemma 2:

Corollary 4.

Let A = A_1 + ⋯ + A_τ be an n×n matrix with the sampling access to each A_ℓ as in Definition 1. Let S_ℓ, W_ℓ, S, and W be defined by Eqs. 11 and 12. Then, for any θ > 0, with probability at least 1 − O((τ+1)²/(θ²p)), the following holds:

 ∥A†A − S†S∥_F ≤ θ ∑_{ℓ=1}^{τ} ∥A_ℓ∥²_F, and (40)
 ∥SS† − WW†∥_F ≤ θ ∑_{ℓ=1}^{τ} ∥S_ℓ∥²_F ≤ 2θ ∑_{ℓ=1}^{τ} ∥A_ℓ∥²_F. (41)
Proof.

First note that Eq. 40 follows from Lemma 2 applied with M_ℓ = A_ℓ. For the second statement, we need the column-sampling probabilities P′_j to satisfy Eq. 13 in Lemma 2; here Q_{j|i_t} denotes the probability of picking column j conditioned on the sampled row i_t in the column sampling procedure. In fact,

 P′_j = (1/p) ∑_{t=1}^{p} Q_{j|i_t} = (1/p) ∑_{t=1}^{p} ∑_{k=1}^{τ} D_{A_k(i_t,·)}(j) ∥A_k(i_t,·)∥² / ∑_{ℓ=1}^{τ} ∥A_ℓ(i_t,·)∥² (42)
 = (1/p) ∑_{t=1}^{p} ∑_{k=1}^{τ} |A_k(i_t,j)|² / ∑_{ℓ=1}^{τ} ∥A_ℓ(i_t,·)∥² (43)
 = (1/p) ∑_{t=1}^{p} pP_{i_t} ∑_{k=1}^{τ} |S_k(t,j)|² / ∑_{ℓ=1}^{τ} ∥A_ℓ(i_t,·)∥² (44)
 = ∑_{t=1}^{p} (∑_{j′=1}^{τ} ∥A_{j′}(i_t,·)∥² / ∑_{ℓ=1}^{τ} ∥A_ℓ∥²_F) · (∑_{k=1}^{τ} |S_k(t,j)|² / ∑_{ℓ=1}^{τ} ∥A_ℓ(i_t,·)∥²) (45)
 = ∑_{t=1}^{p} ∑_{k=1}^{τ} |S_k(t,j)|² / ∑_{ℓ=1}^{τ} ∥A_ℓ∥²_F (46)
 = ∑_{k=1}^{τ} ∥S_k(·,j)∥² / ∑_{ℓ=1}^{τ} ∥A_ℓ∥²_F (47)
 ≥ ∑_{k=1}^{τ} ∥S_k(·,j)∥² / ((τ+1) ∑_{ℓ=1}^{τ} ∥S_ℓ∥²_F), (48)

where the last inequality follows from Claim 3. The probability P′_j thus satisfies Eq. 13; as a result of Lemma 2, Eq. 41 holds. ∎

With the weighted sampling method, we have obtained a small submatrix W of A. Now, we use the singular values and singular vectors of W to approximate those of A. This is shown in Algorithm 2.