Semidefinite programming (SDP) is a central topic in mathematical optimization with extensive studies on its efficient solvers. Recently, quantum algorithms with superpolynomial speedups for solving SDPs have been proposed assuming access to its constraint matrices in quantum superposition. Mutually inspired by both classical and quantum SDP solvers, in this paper we present a sublinear classical algorithm for solving lowrank SDPs which is asymptotically as good as existing quantum algorithms. Specifically, given an SDP with constraint matrices, each of dimension and rank , our algorithm gives a succinct description and any entry of the solution matrix in time given access to a samplebased lowoverhead data structure of the constraint matrices, where is the precision of the solution. In addition, we apply our algorithm to a quantum state learning task as an application.
Technically, our approach aligns with both the SDP solvers based on the matrix multiplicative weight (MMW) framework and the recent studies of quantuminspired machine learning algorithms. The cost of solving SDPs by MMW mainly comes from the exponentiation of Hermitian matrices, and we propose two new technical ingredients (compared to previous samplebased algorithms) for this task that may be of independent interest:

Weighted sampling: assuming sampling access to each individual constraint matrix , we propose a procedure that gives a good approximation of .

Symmetric approximation: we propose a sampling procedure that gives lowrank spectral decomposition of a Hermitian matrix . This improves upon previous sampling procedures that only give lowrank singular value decompositions, losing the signs of eigenvalues.
1 Introduction
1.1 Motivations
Semidefinite programming (SDP) is a central topic in the studies of mathematical optimization and theoretical computer science, with a wide range of applications including algorithm design, machine learning, operations research, etc. The importance of SDP mutually comes from its generality that contains the betterknown linear programming (LP) and the fact that it admits polynomialtime solvers. Mathematically, an SDP is defined as follows:
(1)  
s.t.  (2)  
(3) 
where is the number of constraints in the SDP, are Hermitian matrices, and ; Eq. 3 restricts the variable matrix to be positive semidefinite (PSD), i.e., is an Hermitian matrix with nonnegative eigenvalues. An approximate solution of this SDP is an that satisfies Eqs. 3 and 2 while the maximum in Eq. 1 is smaller than .
There is rich literature on solving SDPs. Ellipsoid methods gave the first polynomialtime SDP solvers [Kha80, GLS81]. Then, the complexities of the SDP solvers had been subsequently improved by the interiorpoint method [NN92] and the cuttingplane method [Ans00, Mit03]; see also the survey paper [VB96]. Currently, the stateoftheart SDP solver [LSW15] improved the previous cuttingplane methods with running time , where is the current exponent of matrix multiplication.^{1}^{1}1Throughout the paper, hides factors that are polynomial in . However, if we tolerate polynomial dependence in , Ref. [AK07] gave an SDP solver with better complexities in and : , where , are upper bounds on the norm of the optimal primal and dual solutions, respectively.
However, these SDP solvers all use the standard, entrywise access to matrices . In contrast, a common methodology in algorithm design is to assume a certain natural preprocessed data structure such that the problem can be solved in sublinear time, perhaps even polylogarithmic time given queries to the preprocessed data structure (e.g., see the examples discussed in Section 1.4). Such methodology is extensively exploited in quantum algorithms, where we are given entrywise access to matrices in superposition, a fundamental feature in quantum mechanics and the essence of quantum speedups. In particular, quantum algorithms for solving SDPs have been studied in [BS17, AGGW17, BKL17, AG18], and the stateoftheart quantum SDP solver runs in time [AG18], which is sublinear in and and polynomially faster than the classical counterparts.
Mutually inspired by both quantum and classical SDP solvers, in this paper we study the impact of alternative classical models on solving SDPs. Specifically, we ask:
Can we solve SDP with sublinear time and queries to a reasonable classical data structure?
1.2 Contributions
We give an affirmative answer to the above question under a lowoverhead data structure based on sampling.
Definition 1 (Sampling access).
Let be a matrix. We say that we have the sampling access to if we can

sample a row index of where the probability of row being chosen is

for all , sample an index where the probability of being chosen is
with time and query complexity for each sampling.
A lowoverhead data structure that allows for this sampling access is shown in Section 2.1. Our main result is as follows.
Theorem 2 (informal; see Theorems 5 and 9).
Assume . Given the sampling access of in Definition 1, there is an algorithm that gives a succinct description and any entry of an approximate solution of the SDP in Eqs. 1 to 3 with probability at least ; the algorithm runs in time .
Our result aligns with the studies of samplebased algorithms for solving linear algebraic problems. In particular, [FKV04] gave lowrank approximations of a matrix in polylogarithmic time with query access to the matrix as in Definition 1. Recently, Tang extended the idea of [FKV04] to give a polylogarithmic time algorithm for solving recommendation systems [Tan18a]. Subsequently, still under the same sampling assumption, Ref. [Tan18b] gave polylogarithmic algorithms for principal component analysis and clustering assignments, and Refs. [GLT18, CLW18] gave polylogarithmic algorithms for solving lowrank linear systems. However, all these samplebased sublinear algorithms directly exploit the sampling approach in [FKV04] (see Section 1.3 for details); to solve SDPs, we derive an augmented sampling toolbox which includes two novel techniques: weighted sampling and symmetric approximation.
As a corollary, our SDP solver can be applied to learning quantum states^{2}^{2}2A quantum state is a PSD matrix with trace one. efficiently. A particular task of learning quantum states is shadow tomography [Aar18], where we are asked to find a description of an unknown quantum state such that we can approximate up to error for a specific collection of Hermitian matrices where and for all (such is also known as a POVM measurement in quantum computing). Mathematically, shadow tomography can be formulated as the following SDP feasibility problem:
(4)  
(5) 
Under a quantum model proposed by [BKL17] where are stored as quantum states, the stateoftheart quantum algorithm [AG18] solves shadow tomography with time where ; in other words, quantum algorithms achieve polylogarithmic complexity in for lowrank shadow tomography. We observe the same phenomenon under our classical samplebased model:
Corollary 1 (informal; see Corollary 12).
Given the sampling access of as in Definition 1 and real numbers , there is a classical algorithm that gives a succinct description and any entry of an approximate solution of the shadow tomography problem in Eqs. 5 and 4 with probability at least ; the algorithm runs in time .
1.3 Techniques
Matrix multiplicative weight method (MMW).
We study a normalized SDP feasibility testing problem defined as follows:
Definition 3 (Feasibility of SDP).
Given an , real numbers , and Hermitian matrices where , define as the set of all such that
(6)  
(7)  
(8) 
For approximate feasibility testing of the SDP, we require that:

If , output “infeasible”;

If , output an .^{3}^{3}3If and , either output is acceptable.
It is a wellknown fact that one can use binary search to reduce approximation of the SDP in Eqs. 1 to 3 to calls of the feasibility problem in Definition 3 with .^{4}^{4}4For the normalized case , we first guess a candidate value for the objective function, and add that as a constraint to the optimization problem. If this problem is feasible, the optimum is larger than and we accordingly take ; if this problem is infeasible, the optimum is smaller than and we accordingly take ; we proceed similarly for all . As a result, we could solve the optimization problem with precision using calls to the feasibility problem in Definition 3. For renormalization, it suffices to take . Therefore, throughout the paper we focus on solving feasibility testing of SDPs.
To solve the feasibility testing problem in Definition 3, we follow the matrix multiplicative weight (MMW) framework [AHK12]. To be more specific, we follow the approach of regarding MMW as a zerosum game with two players (see, e.g., [Haz06, Wu10, GW12, LRS15, BKL17]), where the first player wants to provide a feasible , and the second player wants to find any violation of any proposed , i.e., . At the round of the game, if the second player points out a violation for the current proposed solution , the first player proposes a new solution
(9) 
(up to normalization); such solution by matrix exponentiation is formally named as a Gibbs state. A feasible solution is actually an equilibrium point of the zerosum game, which can also be approximated by the MMW method [AHK12]; formal discussions are given in Section 2.2.
Improved sampling tools.
Before we describe our improved sampling tools, let us give a brief review of the techniques introduced by [FKV04]. The basic idea of [FKV04] comes from the fact that a lowrank matrix can be wellapproximated by sampling a small number of its rows. More precisely, suppose that is an matrix with rank , where . Because is large, it is preferable to obtain a “description” of without using resources. If we have the sampling access to in the form of Definition 1, we can sample rows from according to statement 1 of Definition 1. Suppose is the submatrix of formed by sampling rows from ; it is shown in [FKV04] that with high probability. As a result, we can record these indices sampled from as a succinct description of . Note that the size of this description, , is independent of . This description of is made useful by utilizing statement 2 of Definition 1 in samplebased calculations. For example, it is possible to efficiently calculate the matrix . Building on the fact that , it is possible to efficiently calculate a description of the two matrices and such that , where is an real positive diagonal matrix, and is an matrix described by the linear combinations of the rows of . corresponds to the singular values of .^{5}^{5}5If we have the description of the three matrices , , and such that , we have an approximate singular value decomposition. However, the method in [FKV04] gives the description of either or , not both.
To implement the MMW framework, we need an approximate description of the matrix exponentiation
in Eq. 9. We achieve this in two steps. First, we get an approximate description of the spectral decomposition of : , where is an matrix and is an real diagonal matrix. Then, we approximate the matrix exponentiation by .
There are two main technical difficulties that we overcome with new tools while following the above strategy. First, since changes dynamically throughout the MMW method, we cannot assume the sampling access to ; a more reasonable assumption is to have sampling access to each individual constraint matrix , but it is hard to directly sample from by sampling from each individual ^{6}^{6}6For example, assume we have such that , where is a matrix with small entries. In this case, and mostly cancel out each other and leave . Since can be arbitrarily small compared to and , it is hard to sample from by sampling from and .^{7}^{7}7Gilyén [Gil19] and Tang [Tan19] pointed out to us that one might be able to sample from by lowerbounding the cancellation and doing a rejection sampling. We did not explore this approach in detail, but this is a possible alternative to weighted sampling. In Section 3, we sidestep this difficulty by devising the weighted sampling procedure which gives a succinct description of a lowrank approximation of by sampling each individual . In other words, We cannot sample according to , but we can still find a succinct description of a lowrank approximation of .
Second, the original sampling procedure of [FKV04] gives instead of a spectral decomposition , even if is Hermitian. For our purpose of matrix exponentiation, singular value decomposition is problematic because the singular values ignore the signs of the eigenvalues; specifically, we get a large error if we approximate by naively exponentiate the singular value decomposition: . Note that this issue of missing negative signs is intrinsic to the tools in [FKV04] because they are built upon the approximation ; Suppose that has the decomposition , where is a diagonal matrix, and and are isometries. Then , cancelling out any phase on . We resolve this issue by a novel approximation procedure, symmetric approximation. Symmetric approximation is based on the result shown by [FKV04]. It then holds that , since is a small matrix of size , we can calculate it explicitly and diagonalize it, getting approximate eigenvalues of A. Together with the description of , we get the desired description of the spectral decomposition of . See Section 4 for more details.^{8}^{8}8It might be illustrative to describe some of our failed attempts before achieving symmetric approximation. We tried to separate the exponential function into even and odd parts; unfortunately that decomposes into , resulting in large cancellation and unbounded error. We also tried to obtain the eigenvectors of from ; this approach faces multiple difficulties, the most serious one being the “fake degeneracy” as shown by the following example. Suppose . has two distinct eigenvectors. However, can be satisfied by taking together with any unitary . In this case, does not give any information about the eigenvectors.
1.4 Related work
Our work follows the general methodology of leveraging preprocessed data structures; more specifically, we use samplebased data structures to fulfill the MMW framework in our SDP solver. In this subsections, we delve into related works about preprocessed data structures and MMWbased SDP solvers.
Preprocessed data structures.
Preprocessed data structures are ubiquitous in algorithm design, which enable further computation tasks to be completed within sublinear or even polylogarithmic time. For the task of nearest neighbor search, we are given a set of points in and the goal is to preprocess a data structure such that given any point , it returns a point in that is closest to . In the case , there exists a data structure using space with time for each query [LT77]; more general cases are discussed in the survey paper [AIR18]. A related problem is orthogonal range search, where the goal is to preprocess a data structure such that one can efficiently report the points contained in an axisaligned query box. When , Ref. [CLP11] preprocessed a data structure with space and only query time; for larger , the query time can be kept with a slight overhead on its space complexity. If preprocessed data structures are not exploited for these problems, we have to check all points in bruteforce, inefficient for applications in data analytics, machine learning, computer vision, etc.
This methodology is also widespread in graph problems. It concerns fully dynamic graphs, where we start from an empty graph on fixed vertices and maintain a data structure such that edge insertions and deletions only take sublinear update time for specific graph properties. For instance, the data structure in [AOSS18] maintains the maximal independent set of the graph deterministically in amortized update time ( being the dynamic number of edges). There also exist data structures with sublinear update time for minimum vertex cover size [ORRR12] and allpairs shortest paths [Tho05, ACK17]; furthermore, data structures with polylogarithmic update time can be constructed for connectivity, minimum spanning tree, and bipartiteness [HKK99, HLT01]; maximum matching [Sol16, BHN17], graph coloring [BCHN18], etc.
Solving SDPs by the MMW framework.
As introduced in Section 1.1, many SDP solvers use cuttingplane methods or interiorpoint methods with complexity but larger complexities in and . In contrast, our SDP solver follows the MMW framework, and we briefly summarize such SDP solvers in existing literature. They mainly fall into two categories as follows.
First, MMW is adopted in solvers for positive SDPs, i.e., . In this case, the power of MMW lies in its efficiency of having only iterations and the fact that it admits widthindependent solvers whose complexities are independent of and . Ref. [LN93] first gave a widthindependent positive LP solver that runs in iterations; [JY11] subsequently generalized this result to give the first widthindependent positive SDP solver, but the number of iterations can be as large as . The stateoftheart positive SDP solver was proposed by [AZLO16] with only iterations.
Second, as far as we know, the vast majority of quantum SDP solvers use the MMW framework. The first quantum SDP solver was proposed by [BS17] with worstcase running time , where is the sparsity of input matrices, i.e., every row or column of has at most nonzero elements. Subsequently, the quantum complexity of solving SDPs was improved by [AGGW17, BKL17], and the stateoftheart quantum SDP solver runs in time [AG18]. This is optimal in the dependence of and because [BS17] proved a quantum lower bound of for constant , and .
1.5 Open questions
Our paper raises a few natural open questions for future work. For example:

Can we give faster samplebased algorithms for solving LPs? Note that a recent breakthrough by [CLS18] solves LPs with complexity^{9}^{9}9Without loss of generality, we can assume for LPs by deleting overcomplete constraints. The result only holds for the current matrix multiplication exponent ; when , the complexity becomes . , significantly faster than the stateoftheart SDP solver [LSW15] with complexity .

Can we prove lower bounds on samplebased methods? In particular, a lower bound in the rank can help us understand the limit of our current approach. It is also of interest to prove a lower bound in ; if one can prove a lower bound for samplebased SDP solvers, then the answer to the first open question becomes negative and there must be a tradeoff between and .

How is the empirical performance of our samplebased method? Admittedly, the exponents of our polylogarithmic factors are large; nevertheless, it is common that numerical experiments perform better than theoretical guarantees, and we wonder if this phenomenon can be observed when applying our method.
Notations.
Throughout the paper, we denote by and the number of constraints and the dimension of constraint matrices in SDPs, respectively. We denote by the upper bounds on the norm of the optimal primal and dual solutions of the SDP in Eqs. 3, 2 and 1 respectively, and denote by the precision of its solution. We use to denote the precision of the solution of the SDP feasiblity problem in Definition 3; (see Footnote 4).
We let denote the set . For a vector , we use to denote the probability distribution on where the probability of being chosen is for all . When it is clear from the context, a sample from is often referred to as a sample from . For a matrix , we use and to denote its spectral norm and Frobenius norm, respectively; we use and to denote the row and column of , respectively. We use to denote the dimensional vector formed by the norms of its row vectors, i.e., , for all .
Organization.
The rest of the paper is organized as follows. We formulate the samplebased data structure and the SDP feasibility problem in Section 2. Our two techniques, weighted sampling and symmetric approximation, are presented in Section 3 and Section 4, respectively. Subsequently, we apply these techniques to estimate traces with respect to Gibbs states in Section 5. Our main results on solving SDP with the application to shadow tomography are proven in Section 6.
2 Preliminaries
2.1 Samplebased data structure
As we develop sublineartime algorithms for solving SDP in this paper, the whole constraint matrices cannot be loaded into memory since storing them requires at least linear space and time. Instead, we assume the sampling access of each constraint matrix as defined in Definition 1. This sampling access relies on a natural probability distribution that arises in many machine learning applications [CLW18, GLT18, KP17, KP18, Tan18a, Tan18b].
Technically, Ref. [FKV04] used this sampling access to develop sublinear algorithms for lowrank matrix approximation. It is wellknown (as pointed out by [KP17] and also used in [CLW18, GLT18, KP18, Tan18a, Tan18b]) that there exist lowoverhead preprocessed data structures that allow for the sampling access. More precisely, the existence of the data structures for the sampling access defined in Definition 1 is stated as follows.
Theorem 4 ([Kp17]).
Given a matrix with nonzero entries, there exists a data structure storing in space , which supports the following operators:

Reading and writing in time.

Evaluating in time.

Evaluating in time.

Sampling a row index of according to statement 1 of Definition 1 in time.

For each row, sampling an index according to statement 2 of Definition 1 in time.
Readers may refer to [KP17, Theorem A.1] for the proof of Theorem 4. In the following, we give the intuition of the data structure, which is demonstrated in Fig. 1. We show how to sample from a row vector: we use a binary tree to store the date of each row. The square of the absolute value of each entry, along with its original value are store in the leaf nodes. Each internode contains the sum of the values of its two immediate children. It is easy to see that the root node contains the square of the norm of this row vector. To sample an index and to query an entry from this row, logarithmic steps suffice. To fulfill statement 1 of Definition 1, we treat the norms of rows as a vector and organize the data of this vector in a binary tree.
2.2 Feasibility testing of SDPs
We adopt the MMW framework to solve SDPs under the zerosum approach [Haz06, Wu10, GW12, LRS15, BKL17]. This is formulated as the following theorem:
Theorem 5 (Master algorithm).
Feasibility of the SDP in Eqs. 6 to 8 can be tested by Algorithm 1.
This theorem is proved in [BKL17, Theorem 2.3]; note that the weight matrix therein is where , but this gives the same Gibbs state as in Algorithm 1 since for any Hermitian matrix and real number ,
(10) 
It should also be understood that this master algorithm is not the final algorithm; the step of trace estimation with respect to the Gibbs state (Algorithm 1) will be fulfilled by our samplebased approach.
3 Weighted sampling
The objective of this section is to provide a method for sampling a small submatrix of of the form where the sampling access of each is given. Note that the standard FKV sampling method [FKV04] is not capable of this task, as the sampling access of each does not trivially imply the sampling access of . In the following, we propose the weighted sampling method. The intuition is assigning each a different weight when computing the probability distribution, and then sampling a row/column index of according to this probability distribution.
We first give the method for sampling row indices of as in LABEL:proc:samp_row. The objective of this procedure is to sample a submatrix such that .
algocf[htbp] \end@float
After applying LABEL:proc:samp_row, we obtain the row indices . Let be matrices such that for all and . Define the matrix as
(11) 
Next, we give the method for sampling column indices of as in LABEL:proc:samp_col: we need to sample a submatrix from such that .
algocf[htbp] \end@float
Now, we obtained column indices . Let be matrices such that for all and , where for . Define the matrix as
(12) 
Before showing and , we first prove the following general result.
Lemma 2.
Let be a matrices. Independently sample rows indices from according to the probability distribution where
(13) 
Let be matrices with
(14) 
for and . Define . Then for all , it holds that
(15) 
Proof.
We first show that the expected value of each entry of is the corresponding entry of as follows.
(16)  
(17)  
(18) 
Furthermore, we have
(19)  
(20)  
(21)  
(22) 
Now, we bound the expected distance between and :
(23)  
(24)  
(25)  
(26)  
(27)  
(28)  
(29) 
Consequently, the result of this lemma follows from Markov’s inequality. ∎
The following technical claim will be used multiple times in this paper. It relates the three quantities: , , and :
Claim 3.
Let be a matrix with the sampling access for each as in Definition 1. Let and be defined by Eqs. 12 and 11. Then, with probability at least it holds that
(30) 
and
(31) 
Proof.
We first evaluate as follows. For all ,
(32) 
Then we have
(33)  
(34) 
Note that the quantity can be viewed as a sum of independent random variables . As a result,
(35)  
(36) 
According to Chebyshev’s inequality, we have
(37) 
Therefore, with probability at least , it holds that
(38) 
which implies that
(39) 
Eq. 31 can be proven in a similar way. ∎
Now, the main result of the weighted sampling method, namely, and is a consequence of Lemma 2:
Corollary 4.
Let be a matrix with the sampling access for each as in Definition 1. Let and be defined by Eqs. 12 and 11. Letting , then, with probability at least , the following holds:
(40)  
(41) 
Proof.
First note that Eq. 40 follows from Lemma 2. For the second statement, we need the probability to satisfy Eq. 13 in Lemma 2. In fact,
(42)  
(43)  
(44)  
(45)  
(46)  
(47)  
(48) 
where the last inequality follows from Claim 3. Note that the probability satisfies Eq. 13; as a result of Lemma 2, Eq. 41 holds. ∎
With the weighted sampling method, we obtained a small submatrix from . Now, we use the singular values and singular vectors of to approximate the ones of . This is shown in Algorithm 2.