Approximations of Schatten Norms via Taylor Expansions
In this paper we consider a symmetric, positive semidefinite (SPSD) matrix $A$ and present two algorithms for computing the $p$-Schatten norm $\|A\|_p$. The first algorithm works for any SPSD matrix $A$. The second algorithm works for non-singular SPSD matrices and runs in time that depends on the ratio $\lambda_1(A)/\lambda_n(A)$, where $\lambda_i(A)$ is the $i$-th largest eigenvalue of $A$. Our methods are simple and easy to implement and can be extended to general matrices. Our algorithms improve, for a range of parameters, recent results of Musco, Netrapalli, Sidford, Ubaru and Woodruff (ITCS 2018) and match the running time of the methods by Han, Malioutov, Avron, and Shin (SISC 2017) while avoiding computation of the coefficients of Chebyshev polynomials.
In many applications of data science and machine learning, data is represented by large matrices. Fast and accurate analysis of such matrices is a challenging task that is of paramount importance for the aforementioned applications. Randomized numerical linear algebra (RNLA) is a popular area of research that often provides such fast and accurate algorithmic methods for massive matrix computations. Many critical problems in RNLA boil down to approximating spectral functions, and one of the most fundamental examples of such spectral functions is the Schatten norm. The $p$-th Schatten norm of a matrix $A$ is defined as the $\ell_p$ norm of the vector comprised of the singular values of $A$, i.e.,
$$\|A\|_p = \Big(\sum_{i} \sigma_i^p(A)\Big)^{1/p},$$
where $\sigma_i(A)$ is the $i$-th singular value of $A$.
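As a concrete reference point, the definition above can be evaluated directly from a full SVD. This is a minimal numpy sketch (the function name is ours), fine for small matrices, whereas the algorithms in this paper are designed to avoid the cubic cost of a full decomposition:

```python
import numpy as np

def schatten_norm(A, p):
    """l_p norm of the vector of singular values of A (direct O(n^3) reference)."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values of A
    return np.sum(s ** p) ** (1.0 / p)

# For the identity I_4 every singular value equals 1, so ||I_4||_p = 4^(1/p).
print(schatten_norm(np.eye(4), 2.0))  # 4^(1/2) = 2.0
```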
1.1 Our Results and Related Work
In this paper we consider a symmetric, positive semidefinite (SPSD) matrix $A$ and present two algorithms for computing $\|A\|_p$. The first algorithm, in Section 3, works for any SPSD matrix $A$. The second algorithm, in Section 4, works for non-singular SPSD matrices and runs in time that depends on the ratio $\lambda_1(A)/\lambda_n(A)$, where $\lambda_i(A)$ is the $i$-th largest eigenvalue of $A$. The table below summarizes our results. Our methods are simple and easy to implement and can be extended to general matrices. Indeed, to compute $\|B\|_{2p}$ for a general matrix $B$, one can apply our methods to the SPSD matrix $B^TB$ and note that $\|B\|_{2p}^{2p} = \|B^TB\|_p^p$. It is well known that, for an SPSD matrix $A$, $\|A\|_p^p = \operatorname{tr}(A^p)$, and thus our algorithms provide multiplicative approximations for $\operatorname{tr}(A^p)$.
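The reduction to the SPSD case can be checked numerically: the eigenvalues of $B^TB$ are the squared singular values of $B$, so the two sides of the identity agree. A small numpy sanity check (matrix size and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))   # a general (non-symmetric) matrix
p = 1.5

# ||B||_{2p}^{2p} computed directly from the singular values of B ...
s = np.linalg.svd(B, compute_uv=False)
direct = np.sum(s ** (2 * p))

# ... equals ||B^T B||_p^p = tr((B^T B)^p), since the eigenvalues of the
# SPSD matrix B^T B are exactly sigma_i(B)^2.
lam = np.linalg.eigvalsh(B.T @ B)
via_spsd = np.sum(lam ** p)

assert np.isclose(direct, via_spsd)
```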
Musco, Netrapalli, Sidford, Ubaru and Woodruff  (see also the full version in ) provided a general approach for approximating spectral functions that works for general matrices. The table below summarizes some of their results for Schatten norms of general matrices. Our result in Theorem 3.1 improves the bounds in [5, 4] for a range of parameters, for example, when and or when and .
In , Han, Malioutov, Avron, and Shin proposed to use general Chebyshev polynomials to approximate a wide class of functions, and Schatten norms in particular. The methods in  work for invertible matrices, and the running time depends on the condition number $\lambda_1/\lambda_n$. The table below summarizes their results for Schatten norms. Under the assumptions from , the running time in Remark 4.2 matches the running time of the algorithm in  for constant $p$. In addition, our methods do not need to compute or store the coefficients of Chebyshev polynomials.
A straightforward computation of the Chebyshev coefficients for the Schatten norm requires  time, where
Our work has been inspired by the approach of Boutsidis, Drineas, Kambadur, Kontopoulou, and Zouzias  that uses the Taylor expansion of $\log(1-x)$ to approximate the log-determinant. The coefficients of this expansion all have the same sign, and thus the Hutchinson estimator  can be applied to each partial sum of the expansion. This is not the case for the Taylor expansion of $(1-x)^p$. The key idea of our approach is to find a constant $c$ such that $c$ minus the partial sum of the first $m$ terms is positive. Thus, we can apply the Hutchinson estimator to the corresponding matrix polynomial.
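To see the sign obstruction concretely, here is a small sketch contrasting the two expansions. We assume the expansion in question is the binomial series $(1-x)^p = \sum_k (-1)^k \binom{p}{k} x^k$; the helper name is ours:

```python
def taylor_signs(p, m):
    """Coefficients a_k in (1 - x)^p = sum_k a_k x^k, i.e. a_k = (-1)^k C(p, k),
    computed via the recurrence a_0 = 1, a_k = a_{k-1} * (k - 1 - p) / k."""
    a = [1.0]
    for k in range(1, m + 1):
        a.append(a[-1] * (k - 1 - p) / k)
    return a

# log(1 - x) = -sum_{k>=1} x^k / k has coefficients of one sign, so the
# Hutchinson estimator applies to each partial sum directly.  For a
# non-integer p, the binomial coefficients mix signs near the front:
print(taylor_signs(1.5, 5))  # [1.0, -1.5, 0.375, 0.0625, 0.0234375, 0.01171875]
```

Since the tail coefficients eventually share one sign, shifting the partial sum by a suitable constant restores positivity, which is the role of the constant $c$ above.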
In Section 1.3 we introduce the necessary notation. Section 2 provides a Hutchinson estimator for a special matrix polynomial that will be used in our algorithms. Section 3 describes an algorithm for approximating $\|A\|_p^p$ that does not require knowledge of $\lambda_n(A)$. Section 4 describes an algorithm for approximating $\|A\|_p^p$ that depends on $\lambda_n(A)$. Section 5 contains the necessary technical claims.
We use the following symbols: $\lfloor x \rfloor$ denotes the integer part of $x$,
We use $\log$ to denote the natural logarithm, and we dedicate lower-case Latin letters to constants and real variables and upper-case Latin letters to matrices. Consider the Taylor expansion
$$(1-x)^{p} = \sum_{k=0}^{\infty} (-1)^k \binom{p}{k}\, x^k, \qquad |x| < 1.$$
It follows from that for :
According to ,
Denote by $H_q(A)$ the Hutchinson estimator  for the trace:
$$H_q(A) = \frac{1}{q}\sum_{i=1}^{q} g_i^{T} A\, g_i,$$
where $g_1, \dots, g_q$ are i.i.d. vectors whose entries are independent Rademacher variables. Note that $\mathbb{E}\left[H_q(A)\right] = \operatorname{tr}(A)$. We will be using the following well-known results.
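The estimator just defined is a few lines of numpy. A minimal sketch (the probe count, test matrix, and function name are ours):

```python
import numpy as np

def hutchinson(A, q, rng):
    """H_q(A) = (1/q) * sum_i g_i^T A g_i with i.i.d. Rademacher vectors g_i.
    Unbiased for the trace: E[g^T A g] = tr(A)."""
    n = A.shape[0]
    G = rng.choice([-1.0, 1.0], size=(n, q))  # q Rademacher probe vectors
    return np.einsum('ij,ij->', G, A @ G) / q

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T                                   # SPSD test matrix
est = hutchinson(A, 4000, rng)                # close to tr(A) with high probability
```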
(Roosta-Khorasani and Ascher , Theorem ) Let $A$ be an SPSD matrix and let $q \ge 6\varepsilon^{-2}\ln(2/\delta)$. Then, with probability at least $1-\delta$:
$$(1-\varepsilon)\operatorname{tr}(A) \le H_q(A) \le (1+\varepsilon)\operatorname{tr}(A).$$
(Boutsidis, Drineas, Kambadur, Kontopoulou, and Zouzias , Lemma ) Let $A$ be a symmetric positive semidefinite matrix. There exists an algorithm that runs in  time and outputs an estimate $\tilde{\lambda}_1$ of the largest eigenvalue such that, with probability at least :
2 Hutchinson Estimator for a Matrix Polynomial
The following simple algorithm computes the Hutchinson estimator of a matrix polynomial.
Let $A$ be a matrix and let $m$ be an integer. Then the output of Algorithm 1 is  and the running time of the algorithm is
For denote and note that and , by . Fix and note that after the -th iteration we have
Thus,  Denote ; then the algorithm outputs the estimator  and the theorem follows. The bound on the running time follows from direct computation. ∎
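The proof above evaluates the polynomial probe-by-probe with repeated matrix-vector products. A hedged sketch of that loop (the test polynomial and all names are ours; the exact eigenvalue computation appears only as a sanity check):

```python
import numpy as np

def hutchinson_poly(A, coeffs, q, rng):
    """Estimate tr(sum_k coeffs[k] * A^k) with q Rademacher probes.
    Each probe costs len(coeffs) - 1 matrix-vector products, so powers
    of A are never formed: total cost O(q * m * nnz(A))."""
    n = A.shape[0]
    total = 0.0
    for _ in range(q):
        g = rng.choice([-1.0, 1.0], size=n)
        v = g.copy()                   # v = A^0 g
        acc = coeffs[0] * (g @ v)      # the k = 0 term equals coeffs[0] * n
        for c in coeffs[1:]:
            v = A @ v                  # now v = A^k g
            acc += c * (g @ v)
        total += acc
    return total / q

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = M @ M.T / 6.0
est = hutchinson_poly(A, [0.0, 1.0, 0.5, 0.25], 3000, rng)
```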
3 Algorithm for estimates without $\lambda_n$
We have, for , Let $A$ be an SPSD matrix with . Denote . Since  we have that  is also an SPSD matrix and
According to Claim 5.4 the matrix is SPSD. So, we may apply estimator to this matrix. We also have
Below we will bound each separately.
According to Theorem 1.1 for we have with probability at least :
Note that .
Denoting we have, using Claim 5.3:
The final bound for
Thus, if , i.e., then
with probability at least
Back to matrix
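The pieces of this section can be assembled into a toy end-to-end estimator for $\|A\|_p^p$. This is only an illustrative sketch under stated assumptions, not the algorithm of this section: we assume the binomial series for $(1-x)^p$, use the Frobenius norm as a cheap upper bound on $\lambda_1$ (so no eigenvalue information is required), and omit the positivity-shift bookkeeping from Section 2; all names are ours.

```python
import numpy as np

def taylor_coeffs(p, m):
    # a_k = (-1)^k C(p, k):  (1 - x)^p = sum_{k>=0} a_k x^k for |x| < 1
    a = [1.0]
    for k in range(1, m + 1):
        a.append(a[-1] * (k - 1 - p) / k)
    return a

def schatten_pp_sketch(A, p, m, q, rng):
    """Estimate ||A||_p^p = tr(A^p) for SPSD A using only matvecs with A."""
    n = A.shape[0]
    u = np.linalg.norm(A, 'fro')       # cheap upper bound on lambda_1(A)
    a = taylor_coeffs(p, m)
    total = 0.0
    for _ in range(q):
        g = rng.choice([-1.0, 1.0], size=n)
        v = g.copy()                   # v = B^0 g for B = I - A / u
        acc = a[0] * (g @ v)
        for k in range(1, m + 1):
            v = v - (A @ v) / u        # v = B^k g, never forming B explicitly
            acc += a[k] * (g @ v)
        total += acc
    return (u ** p) * total / q        # tr(A^p) = u^p * tr((I - B)^p)

rng = np.random.default_rng(2)
lam_true = np.linspace(0.5, 1.0, 5)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag(lam_true) @ Q.T        # SPSD with known eigenvalues
est = schatten_pp_sketch(A, p=1.5, m=60, q=2000, rng=rng)
```

The truncation degree $m$ here is chosen generously; the point of the analysis in this section is to quantify how large $m$ and $q$ must be for a multiplicative guarantee.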
4 Algorithm for estimates with $\lambda_n$
Let $A$ be an SPSD matrix such that  and let  be the output of Algorithm 3 on input $A$. Then, with probability at least , we have:
In  the authors assume that they are given  and . This assumption is stronger than ours, since we only assume knowledge of . Under the stronger assumption of  we do not need to estimate  using Lemma 1.2, and thus we do not need to perform Step  of Algorithm 3. Hence the running time becomes
which, for constant , is the same as in .
so for . From here and Claim 5.5, with ,
From here we conclude that for such the matrix is SPSD. In addition,
To conclude, we will provide estimates for and .
Estimate for 
According to Theorem 1.1, with probability at least ,  But  and so (14)
We have . Applying Claim 5.6 with we get:
if satisfies . For such :
The final estimate for matrix
then satisfies , so for such from and :
Back to matrix
We have  and . Recall that  and . So we get from , with probability at least :
for satisfying . Since and by substituting with and with the theorem follows. ∎
5 Technical Lemmas
There exists a constant  (defined in ) that depends only on  such that
for any .
If then . Thus, for :
Since for and , we have:
Further, it is known from the Lagrange formula that  and thus
From here and we obtain
The claim follows. ∎
For all ,
Assume first that Then and so . Let now . Then and . Assume now . Then and
For we have
where the last inequality holds because . From here and the claim follows. ∎
For and :
where the first inequality follows from the definition of  in , the second inequality follows since , and the third inequality follows from Claim 5.1. Let us estimate the last sum
The claim follows from here and the above bound. ∎
For any and :
We have , and, by Claim 5.3, . So . The claim follows. ∎
Let . If
then for .
Claim 5.3 yields
Solving the inequality with respect to we get
Now we use the inequality
To check it, note that this inequality is equivalent to