Quantum query complexity of entropy estimation

Estimation of Shannon and Rényi entropies of unknown discrete distributions is a fundamental problem in statistical property testing and an active research topic in both theoretical computer science and information theory. Tight bounds on the number of samples needed to estimate these entropies have been established in the classical setting, while little is known about their quantum counterparts. In this paper, we give the first quantum algorithms for estimating α-Rényi entropies (Shannon entropy being the 1-Rényi entropy). In particular, we demonstrate a quadratic quantum speedup for Shannon entropy estimation and a generic quantum speedup for α-Rényi entropy estimation for all α ≥ 0, including a tight bound for the collision entropy (2-Rényi entropy). We also provide quantum upper bounds for extreme cases such as the Hartley entropy (i.e., the logarithm of the support size of a distribution, corresponding to α = 0) and the min-entropy (i.e., α = ∞), as well as for the Kullback-Leibler divergence between two distributions. Moreover, we complement our results with quantum lower bounds on α-Rényi entropy estimation for all α ≥ 0.

Our approach is inspired by the pioneering work of Bravyi, Harrow, and Hassidim (BHH) [13] on quantum algorithms for distributional property testing, but it requires many new technical ingredients. For Shannon entropy and 0-Rényi entropy estimation, we improve the performance of the BHH framework, especially its error dependence, by using Montanaro’s approach to estimating the expected output value of a quantum subroutine with bounded variance [41] and by giving a fine-tuned error analysis. For general α-Rényi entropy estimation, we further develop a procedure that recursively approximates the α-Rényi entropy for a sequence of orders α, which is similar in spirit to a cooling schedule in simulated annealing. For the special cases of integer α and α = ∞ (i.e., the min-entropy), we reduce the entropy estimation problem to suitable instances of the distinctness problem. We exploit various techniques to obtain our lower bounds for different ranges of α, including reductions to (variants of) existing lower bounds in quantum query complexity as well as the polynomial method inspired by the celebrated quantum lower bound for the collision problem.

1 Introduction

Motivations. Property testing is a rapidly developing field in theoretical computer science (see, e.g., the survey [55]). It aims to determine properties of an object from as few independent samples of the object as possible. Property testing is a theoretically appealing topic with intimate connections to statistics, learning theory, and algorithm design. One important topic in property testing is estimating statistical properties of unknown distributions (e.g., [61]), a fundamental question in statistics and information theory, given that much of science relies on samples furnished by nature. The Shannon [56] and Rényi [54] entropies are central measures of randomness and compressibility. In this paper, we focus on estimating these entropies for an unknown distribution.

Specifically, given a distribution p = (p_1, ..., p_n) over a set of size n (w.l.o.g. taken to be [n] = {1, ..., n}), where p_i denotes the probability of outcome i, the Shannon entropy of this distribution is defined by

H(p) = -\sum_{i=1}^{n} p_i \log p_i.   (1.1)

A natural question is to determine the sample complexity (i.e., the necessary number of independent samples from p) to estimate H(p) within a given error with high probability. This problem has been intensively studied in the classical literature. For multiplicative error, Batu et al. [7, Theorem 2] provided an upper bound, while an almost matching lower bound was shown by Valiant [61, Theorem 1.3]. For additive error, Paninski gave a nonconstructive proof of the existence of sublinear estimators in [49, 50], while an explicit construction was given by Valiant and Valiant in [60] for one regime of the error parameter; for the remaining regime, Wu and Yang [64] and Jiao et al. [34] gave optimal estimators. A sequence of works in information theory [34, 64, 33] studied the minimax mean-squared error, which is achieved with the same sample complexity.

One important generalization of Shannon entropy is the Rényi entropy of order α, denoted H_α(p), which is defined by

H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_{i=1}^{n} p_i^{\alpha}.   (1.2)

The Rényi entropy of order 1 is simply the Shannon entropy, i.e., H_1(p) := lim_{α→1} H_α(p) = H(p). General Rényi entropies can be used to bound the Shannon entropy, making them useful in many applications (e.g., [6, 17]). Rényi entropy is also of interest in its own right. One prominent example is the Rényi entropy of order 2, H_2(p) (also known as the collision entropy), which measures the quality of random number generators (e.g., [62]) and of key derivation in cryptographic applications (e.g., [11, 32]). Motivated by these and other applications, the estimation of Rényi entropy has also been actively studied [4, 34, 33]. In particular, Acharya et al. [4] have shown almost tight bounds on the classical query complexity of computing Rényi entropies. Specifically, for any non-integer α > 1, almost matching upper and lower bounds on the classical query complexity of α-Rényi entropy are known. Surprisingly, for any integer α ≥ 2, the classical query complexity is sublinear in n. When α < 1, the classical query complexity is always superlinear in n.
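To make these quantities concrete, here is a small worked example (ours, not from the original text), computed in bits for the distribution p = (1/2, 1/4, 1/4); it also illustrates that H_α(p) is non-increasing in α, which is why higher-order Rényi entropies can be used to bound the Shannon entropy from below:

\[
H_{1/2}(p) = 2\log_2\!\Big(\tfrac{1}{\sqrt{2}} + \tfrac12 + \tfrac12\Big) \approx 1.54, \qquad
H_1(p) = H(p) = 1.5, \qquad
H_2(p) = \log_2\!\frac{1}{\tfrac14 + \tfrac1{16} + \tfrac1{16}} = \log_2\!\tfrac{8}{3} \approx 1.42,
\]

so H_2(p) ≤ H(p) ≤ H_{1/2}(p).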

The extreme case (α = ∞) is known as the min-entropy, denoted H_∞(p), which is defined by

H_\infty(p) = \lim_{\alpha \to \infty} H_\alpha(p) = \log \frac{1}{\max_{i \in [n]} p_i}.   (1.3)

Min-entropy plays an important role in randomness extraction (e.g., [59]) and characterizes the maximum number of uniformly random bits that can be extracted from a given distribution. Classically, the query complexity of min-entropy estimation follows directly from [60].

Another extreme case (α = 0), also known as the Hartley entropy [29], is the logarithm of the support size of a distribution, where the support of a distribution p is defined by

\mathrm{supp}(p) := \{\, i \in [n] : p_i > 0 \,\}.   (1.4)

The support size is a natural and fundamental quantity of a distribution with various applications (e.g., [20, 58, 26, 22, 36, 51, 31]). However, estimating it is impossible in general because elements with negligible but nonzero probability, which are very unlikely to be sampled, still contribute to |supp(p)|. Two related quantities, support coverage and support size, have hence been considered as alternatives to 0-Rényi entropy. (See details in Section 8.)
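As a concrete illustration of this impossibility (an example we add here for the reader; it is not taken from the original text), consider, for small δ > 0, the distribution

\[
p = \Big(1-\delta,\ \tfrac{\delta}{n-1},\ \dots,\ \tfrac{\delta}{n-1}\Big).
\]

Its support size is n for every δ > 0, yet as δ → 0 any fixed number of samples is, with probability close to 1, indistinguishable from samples of the point mass on the first element, whose support size is 1. Hence no estimator using finitely many samples can be accurate for all distributions simultaneously.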

Besides entropic measures of a single discrete distribution, we also briefly discuss an entropic measure between two distributions, namely the Kullback-Leibler (KL) divergence. Given two discrete distributions p and q on a set of cardinality n, the KL divergence is defined as

D_{\mathrm{KL}}(p \,\|\, q) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i}.   (1.5)

KL divergence is a key measure with many applications in information theory [37, 18], data compression [15], and learning theory [35]. Classically, under the assumption that the probability ratios p_i/q_i are bounded by some function of n, D_KL(p||q) can be approximated within constant additive error with high success probability given sufficiently many samples from p and from q.

Main question. In this paper, we study the impact of quantum computation on estimation of general Rényi entropies. Specifically, we aim to characterize quantum speed-ups for estimating Shannon and Rényi entropies.

Our question aligns with the emerging topic of “quantum property testing” (see the survey [43]) and focuses on investigating the quantum advantage in testing classical statistical properties. To the best of our knowledge, the first research paper on distributional quantum property testing is by Bravyi, Harrow, and Hassidim (BHH) [13], who discovered quantum speedups for testing uniformity, orthogonality, and statistical difference of unknown distributions. Some of these results were subsequently improved by Chakraborty et al. [16]. Reference [13] also claimed that Shannon entropy could be estimated with a smaller quantum query complexity, but without giving details or an explicit error dependence. Indeed, our framework is inspired by [13], but significant new ingredients are needed to achieve our results. There is also a related line of research on spectrum testing and tomography of quantum states [45, 46, 25, 47]. However, these works aim to test properties of general quantum states, while we focus on using quantum algorithms to test properties of classical distributions (i.e., diagonal quantum states). (One can also leverage the results of [45, 46, 25, 47] to test properties of classical distributions; however, this is less efficient because those works deal with a much harder problem involving general quantum states.)

Distributions as oracles. The sampling model in the classical literature assumes that a tester is presented with independent samples from an unknown distribution. One of the contributions of BHH is an alternative model that allows coherent quantum access to unknown distributions. Specifically, BHH model a discrete distribution p on [n] by an oracle O : S → [n] for some finite set S. The probability p_i (i ∈ [n]) is proportional to the size of the pre-image of i under O. Namely, an oracle O generates p if and only if for all i ∈ [n],

p_i = \frac{|\{\, s \in S : O(s) = i \,\}|}{|S|}.   (1.6)

(note that this assumes the p_i's are rational numbers). If one samples s uniformly from S, then the output O(s) is distributed according to p. Instead of considering sample complexity (that is, the number of samples used), we consider the query complexity in the oracle model, which counts the number of oracle uses. Note that a tester interacting with an oracle can potentially be more powerful than one in the sampling model, due to the possibility of learning the internal structure of the oracle. However, it is shown in [13] that the query complexity in the oracle model and the sample complexity in the sampling model are in fact the same classically.
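For concreteness, the following small classical sketch (our own illustration; the helper names are ours) builds such an oracle table for a distribution with rational probabilities and shows that one classical query, i.e., evaluating O at a uniformly random s ∈ S, produces a sample from p as in (1.6).

from fractions import Fraction
from math import lcm
import random

def build_oracle(probs):
    # Encode a rational distribution p on {1, ..., n} as a table O: {0, ..., |S|-1} -> {1, ..., n}
    # with p_i = |O^{-1}(i)| / |S|, matching (1.6).
    fracs = [Fraction(x) for x in probs]
    assert sum(fracs) == 1
    size_S = lcm(*(f.denominator for f in fracs))   # |S| = least common denominator
    table = []
    for i, f in enumerate(fracs, start=1):
        table.extend([i] * int(f * size_S))         # outcome i occupies a p_i fraction of S
    return table

def classical_query(table):
    # One classical query: sample s uniformly from S and return O(s).
    return table[random.randrange(len(table))]

# Example: p = (1/2, 1/4, 1/4) gives |S| = 4 and the table [1, 1, 2, 3].
oracle = build_oracle([Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)])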

A significant advantage of the oracle model is that it naturally allows coherent access when extended to the quantum case, where we turn O into a unitary operator acting jointly on the query and answer registers such that

O\,|s\rangle|t\rangle = |s\rangle|t \oplus O(s)\rangle \qquad \text{for all } s \in S,   (1.7)

where \oplus denotes addition in the answer register.

Moreover, this oracle model can also be readily obtained in some algorithmic settings, e.g., when distributions are generated by some classical or quantum sampling procedure. Thus, statistical property testing results in this oracle model can be potentially leveraged in algorithm design.

Our Results. Our main contribution is a systematic study of both upper and lower bounds on the quantum query complexity of estimating Rényi entropies (including Shannon entropy as a special case). Specifically, we obtain the following quantum speedups for different ranges of α.

Theorem 1.1.

There are quantum algorithms that approximate H_α(p) of a distribution p on [n] within an additive error with success probability at least 2/3 using the following numbers of quantum queries. (The success probability can be boosted to close to 1 without much overhead; e.g., see Lemma 5.5 in Section 5.1.5.)

  • quantum queries when α = 0, i.e., Hartley entropy. See Theorem 8.2. (0-Rényi entropy estimation is intractable without any assumption, both classically and quantumly; here, the result is based on the assumption that the nonzero probabilities are not too small. See Section 8 for the precise assumption and more information.)

  • quantum queries when 0 < α < 1 (the tilde in the bound hides factors that are polynomial in the remaining parameters). See Theorem 5.2.

  • quantum queries when α = 1, i.e., Shannon entropy. See Theorem 3.1.

  • quantum queries when α = k for some integer k ≥ 2. See Theorem 6.1.

  • quantum queries when α > 1. See Theorem 5.1.

  • quantum queries when α = ∞, i.e., min-entropy, where the bound is expressed in terms of the quantum query complexity of the k-distinctness problem for an appropriate k. See Theorem 7.1.

Our quantum testers demonstrate advantages over classical ones for all α ≥ 0; in particular, our quantum tester achieves a quadratic speedup in the case of Shannon entropy. When α = ∞, our quantum upper bound depends on the quantum query complexity of the k-distinctness problem for super-constant k, which is open to the best of our knowledge (existing quantum algorithms for the k-distinctness problem, such as [5] and [9], do not behave well for super-constant values of k) and might demonstrate a quantum advantage.

As a corollary, we also obtain a quadratic quantum speedup for estimating the KL divergence:

Corollary 1.1 (see Theorem 4.1).

Assuming p and q satisfy p_i/q_i ≤ f(n) for all i ∈ [n], for some function f, there is a quantum algorithm that approximates D_KL(p||q) within an additive error with success probability at least 2/3 using quantum queries to the oracle for p and quantum queries to the oracle for q.

We also obtain corresponding quantum lower bounds on entropy estimation as follows. We summarize both bounds in Table 1 and visualize them in Figure 1.

Theorem 1.2 (See Theorem 9.1).

Any quantum algorithm that approximates H_α(p) of a distribution p on [n] within additive error with success probability at least 2/3 must use

  • quantum queries when , assuming .

  • quantum queries when .

  • quantum queries when , assuming .

  • quantum queries when .

  • quantum queries when .

Table 1: Summary of the classical and quantum query complexity of estimating H_α(p) for different ranges of α. The classical bounds are due to [63, 48], [4], [60, 34, 64], and [60]; the quantum bounds are from this paper.
Figure 1: Visualization of the classical and quantum query complexity of estimating H_α(p). The horizontal axis represents α and the vertical axis represents the exponent of n in the query complexity. Red curves and points represent quantum upper bounds, green curves and points represent classical tight bounds, and the blue curve represents quantum lower bounds.

Techniques. At a high level, our upper bounds are inspired by BHH [13]: we formulate a framework (in Section 2) that generalizes the technique in BHH and makes it applicable to our setting. Let F(p) = ∑_{i ∈ [n]} p_i f(p_i) for a function f and a distribution p. Similarly to BHH, we design a master algorithm that samples i from p, uses the quantum counting primitive [12] to obtain an estimate p̃_i of p_i, and outputs f(p̃_i). It is easy to see that the expectation of the output of the master algorithm is roughly F(p); the exact expectation is ∑_i p_i E[f(p̃_i)], and intuitively we expect f(p̃_i) to be a good estimate of f(p_i). By choosing appropriate functions f, one can recover the Shannon entropy and the power sums underlying Rényi entropies, as well as the quantities used in BHH, as illustrated below. It then suffices to obtain a good estimate of the output expectation of the master algorithm, which in BHH was achieved by multiple independent runs of the master algorithm.
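For concreteness (this instantiation is ours, using what we regard as the natural choices of f; the paper's exact notation may differ), taking f(x) = -log x recovers the Shannon entropy, while taking f(x) = x^{α-1} recovers the power sum from which the α-Rényi entropy is computed:

\[
\sum_{i=1}^{n} p_i \,(-\log p_i) = H(p), \qquad
\sum_{i=1}^{n} p_i \, p_i^{\alpha-1} = \sum_{i=1}^{n} p_i^{\alpha} =: P_\alpha(p), \qquad
H_\alpha(p) = \frac{\log P_\alpha(p)}{1-\alpha}.
\]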

The performance of the above framework (and of its analysis) critically depends on how close the expectation of the algorithm's output is to F(p) and on how concentrated the output distribution is around its expectation, which in turn depends heavily on the specific f in use. Our first contribution is a fine-tuned error analysis for specific choices of f, such as f(x) = -log x in the case of Shannon entropy, whose value can be large for boundary values of the estimate p̃_i. Instead of only considering the event that p̃_i is a good estimate of p_i, as in BHH, we need to analyze the entire output distribution of quantum counting. We also leverage a generic quantum speedup for estimating, with additive error, the expectation of the output of any quantum procedure [41], which significantly improves our error dependence compared to BHH. These improvements already give a quadratic quantum speedup for Shannon (Section 3) and 0-Rényi (Section 8) entropy estimation. As an application, this also gives a quadratic speedup for estimating the KL divergence between two distributions (see Section 4).

For general α-Rényi entropy H_α(p), we choose f(x) = x^{α-1} and let P̃ denote the resulting estimate of the power sum P_α(p), from which H_α(p) is computed. Instead of estimating with additive error, as in the case of Shannon entropy, we now switch to working with multiplicative error, which is harder since the aforementioned quantum algorithm [41] is much weaker in this setting. Indeed, by following the same technique, we can only obtain quantum speedups for α-Rényi entropy when α lies in a limited range.

For general α, our first observation is that if one knew in advance a (multiplicative) range containing the output expectation, then one could slightly modify the technique in [41] (as shown in Theorem 2.2) and obtain a quadratic quantum speedup similar to that in the additive-error setting. This approach, however, seems circular, since it is unclear how to obtain such a range in advance. Our second observation is that for any α' close enough to α, P_{α'}(p) can be used to bound P_α(p) (see Lemma 5.3). As a result, when estimating P_α(p), we can first estimate P_{α'}(p) to provide a bound on P_α(p), where α' and α differ by a controlled factor and α' moves toward 1. We apply this strategy recursively, estimating power sums with orders ever closer to 1, until the order is very close to 1 from above (when the initial α > 1) or from below (when the initial α < 1), where a quantum speedup is already known. At a high level, we recursively estimate such power sums for a sequence of orders that eventually converges to 1, where each iteration yields some quantum speedup, leading to an overall quantum speedup. We remark that our approach is similar in spirit to the cooling schedules used in simulated annealing (e.g., [57]). (See Section 5.)
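The "cooling schedule" analogy can be pictured with a toy sketch (ours; the ratio and stopping tolerance below are assumed illustrative parameters, not the ones chosen by the analysis in Section 5): the distance of the order to 1 shrinks geometrically, so only logarithmically many intermediate orders are needed.

def alpha_schedule(alpha0, ratio=0.5, tol=1e-3):
    # Illustrative geometric "cooling" of the Renyi order toward 1.
    # Each step shrinks the distance to 1 by `ratio`; stops once within `tol`.
    alphas = [alpha0]
    while abs(alphas[-1] - 1.0) > tol:
        alphas.append(1.0 + (alphas[-1] - 1.0) * ratio)
    return alphas

# alpha_schedule(2.0) -> [2.0, 1.5, 1.25, 1.125, ...] converging to 1 from above
# alpha_schedule(0.5) -> [0.5, 0.75, 0.875, ...]      converging to 1 from below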

For integer α ≥ 2, we observe a connection between P_α(p) and the α-distinctness problem which leads to a more significant quantum speedup. Precisely, letting O be the oracle in (1.7), we observe that P_α(p) is proportional to the α-th frequency moment of O's truth table, which can be computed quantumly [42] based on any quantum algorithm for the α-distinctness problem (e.g., [9]). However, there is a catch: a direct application of [42] would lead to a dependence on the size of the oracle's domain S rather than on n. We remedy this by tweaking the algorithm and its analysis in [42] to remove this dependence in our specific setting. (See Section 6.)
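The underlying identity is standard and worth recording (the notation F_k for the k-th frequency moment is ours): with the oracle O : S → [n] of (1.6),

\[
F_k(O) \;=\; \sum_{i=1}^{n} \big|O^{-1}(i)\big|^{k} \;=\; |S|^{k} \sum_{i=1}^{n} p_i^{k} \;=\; |S|^{k}\, P_k(p),
\]

and P_k(p) = ∑_i p_i^k is exactly the probability that k independent samples from p all take the same value, which is the kind of k-wise collision that a k-distinctness subroutine detects.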

The integer-α algorithm fails to extend to the min-entropy case (i.e., α = ∞) because the hidden constant in its complexity has a poor dependence on α (see Remark 6.1). Instead, we develop another reduction to the distinctness problem by exploiting the so-called “Poissonized sampling” technique [39, 60, 34]. At a high level, we construct Poisson distributions parameterized by the probabilities p_i and leverage the “threshold” behavior of Poisson distributions (see Lemma 7.1). Roughly, if the largest probability passes some threshold, then with high probability these parameterized Poisson distributions lead to a large collision that is caught by the distinctness algorithm; otherwise, we run the procedure again with a lower threshold, until the threshold becomes trivial. (See Section 7.)
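Two standard facts behind Poissonized sampling, recalled here in our own words for context (the precise threshold statement used by the algorithm is Lemma 7.1): drawing a Poisson number of samples decouples the per-symbol counts into independent Poisson variables, and a Poisson variable rarely exceeds a level far above its mean.

\[
N \sim \mathrm{Poisson}(\lambda) \text{ samples from } p \;\Longrightarrow\; N_i \sim \mathrm{Poisson}(\lambda p_i) \text{ independently over } i \in [n],
\qquad
\Pr\big[\mathrm{Poisson}(\mu) \ge k\big] \;\le\; e^{-\mu}\Big(\frac{e\mu}{k}\Big)^{k} \quad (k > \mu).
\]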

Some of our lower bounds come from reductions to existing ones in quantum query complexity, such as the quantum-classical separation for symmetric Boolean functions [1], the collision problem [2, 38], and the Hamming weight problem [44], for different ranges of α. We also obtain lower bounds with a better error dependence via the polynomial method, inspired by the celebrated quantum lower bound for the collision problem [2, 38]. (See Section 9.)

Open questions. Our paper raises a few open questions. A natural one is to close the gaps between our quantum upper and lower bounds. Our quantum techniques on both ends are quite different from the state-of-the-art classical ones (e.g., [60]); it would be interesting to see whether classical ideas can be incorporated to improve our quantum results. It might also be possible to achieve better lower bounds by improving our application of the polynomial method or by exploiting the quantum adversary method (e.g., [30, 10]). Finally, our results motivate the study of quantum algorithms for the k-distinctness problem with super-constant k, which might also be interesting in its own right.

Notations. Throughout the paper, we consider a discrete distribution p = (p_1, ..., p_n) on [n], and P_α(p) = ∑_{i=1}^n p_i^α denotes the α-power sum of p. In the analyses of our algorithms, ‘log’ denotes the natural logarithm, and asymptotic notation is used that omits lower-order terms.

2 Master algorithm

Let p be a discrete distribution on [n] encoded by the quantum oracle defined in (1.7). Inspired by BHH, we develop the following master algorithm to estimate a property of the form F(p) = ∑_{i=1}^n p_i f(p_i) for a function f.

1 Set ;
2 Regard the following as the subroutine:
3      Draw a sample i ∈ [n] according to p;
4      Use EstAmp or EstAmp′ with queries to obtain an estimate p̃_i of p_i;
5      Output f(p̃_i);
6
7 Use the subroutine for the executions prescribed in Theorem 2.1 or Theorem 2.2 and output the result as the estimate of F(p);
Algorithm 1: Estimate F(p) of a discrete distribution p on [n].
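As a purely classical illustration of the plug-in structure of Algorithm 1 (our own sketch, not the quantum algorithm: the quantum counting step EstAmp/EstAmp′ and the mean estimation of Theorems 2.1 and 2.2 are replaced by empirical frequencies and a plain sample average), the framework looks as follows.

import math
import random

def master_estimate(sampler, f, num_outer=2000, num_inner=2000):
    # Classical analogue of Algorithm 1: draw i ~ p, crudely estimate p_i,
    # and average f(estimate of p_i) over many repetitions.
    total = 0.0
    for _ in range(num_outer):
        i = sampler()
        # crude stand-in for EstAmp: estimate p_i by an empirical frequency
        hits = sum(1 for _ in range(num_inner) if sampler() == i)
        p_est = max(hits, 1) / num_inner   # avoid an estimate of 0, mimicking EstAmp'
        total += f(p_est)
    return total / num_outer

# Example: Shannon entropy of p = (1/2, 1/4, 1/4) via f(x) = -log x;
# the true value is H(p) = (3/2) * log 2, approximately 1.04 (natural log).
p = [0.5, 0.25, 0.25]
sampler = lambda: random.choices(range(len(p)), weights=p)[0]
print(master_estimate(sampler, lambda x: -math.log(x)))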

Compared to BHH, we introduce a few new technical ingredients in the design of Algorithm 1 and its analysis, which significantly improve its performance, especially for the specific functions f arising in our setting, e.g., f(x) = -log x (Shannon entropy) and f(x) = x^{α-1} (Rényi entropy).

The first ingredient is a generic quantum speedup for Monte Carlo methods [41]: in particular, a quantum algorithm that approximates the output expectation of a subroutine with additive error, with a quadratically better sample complexity than the one implied by Chebyshev's inequality.

Theorem 2.1 (Additive error; Theorem 5 of [41]).

Let be a quantum algorithm with output such that . Then for where , by using executions of and , Algorithm 3 in [41] outputs an estimate of such that

(2.1)

It is worth mentioning that classically one needs a quadratically larger number of executions of the subroutine [19] to estimate its output expectation to comparable accuracy; Theorem 2.1 thus demonstrates a quadratic improvement in the error dependence. In the case of approximating the power sum P_α(p), we need to work with multiplicative error, and existing results (e.g., [41]) have a worse error dependence in this setting, which is insufficient for our purposes. Instead, inspired by [41], we prove the following theorem (our second ingredient), which takes auxiliary information about the range of the expectation into consideration and might be of independent interest.
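Schematically, with σ² denoting the variance of the subroutine's output and ε the target additive error (a standard comparison stated in our own notation; the exact logarithmic factors in [41] are omitted):

\[
N_{\text{classical}} \;=\; O\!\Big(\frac{\sigma^{2}}{\varepsilon^{2}}\Big) \ \text{(via Chebyshev's inequality)}
\qquad\text{versus}\qquad
N_{\text{quantum}} \;=\; \widetilde{O}\!\Big(\frac{\sigma}{\varepsilon}\Big) \ \text{(Theorem 2.1, [41])}
\]

executions of the subroutine suffice to estimate its mean to within ε.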

Theorem 2.2 (Multiplicative error; Appendix A).

Let be a quantum algorithm with output such that for a known . Assume that . Then for where , by using and for executions, Algorithm 10 (given in Appendix A) outputs an estimate of such that

(2.2)

The third ingredient is a fine-tuned error analysis for the specific functions f in use. Similarly to BHH, we rely on quantum counting (named EstAmp) [12] to estimate the pre-image size of a Boolean function, which provides another source of quantum speedup. In particular, we approximate any probability p_i in the query model of (1.7) by estimating the size of the pre-image of the Boolean function that equals 1 on inputs s with O(s) = i and 0 otherwise. However, for the cases in BHH it suffices to consider only the event that the estimate of p_i is close to p_i, while in our case we need to analyze the whole output distribution of quantum counting. Specifically, letting and for some , we have

Theorem 2.3 ([12]).

For any , there is a quantum algorithm (named EstAmp) with quantum queries to that outputs for some such that

(2.3)

where . This promises with probability at least for and with probability greater than for . If then with certainty.

Moreover, we also need to slightly modify EstAmp so that it never outputs 0 when estimating Shannon entropy, because the logarithm is not well-defined at 0. Let EstAmp′ denote the modified algorithm: EstAmp′ outputs a fixed nonzero value whenever EstAmp would output 0, and outputs EstAmp's output otherwise.
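A minimal sketch of this wrapper (ours; `est_amp` stands in for the quantum counting routine, and `floor` is an assumed placeholder for the nonzero value substituted for 0, whose exact choice belongs to the paper's analysis):

def est_amp_prime(est_amp, floor):
    # Wrap EstAmp so that an output of 0 is replaced by a fixed nonzero value,
    # keeping expressions like log(estimate) well-defined.
    def wrapped(*args, **kwargs):
        value = est_amp(*args, **kwargs)
        return value if value > 0 else floor
    return wrapped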

By leveraging Theorem 2.1, Theorem 2.2, and Theorem 2.3, and by carefully setting the parameters in Algorithm 1, we obtain the following corollaries describing the complexity of estimating any F(p).

Corollary 2.1 (additive error).

Given . If where and is large enough such that , then Algorithm 1 approximates with an additive error and success probability using quantum queries to .

Corollary 2.2 (multiplicative error).

Assume a procedure using quantum queries that returns an estimated range , and that with probability at least 0.9. Let where and . For large enough such that , Algorithm 1 estimates with a multiplicative error and success probability with queries.

3 Shannon entropy estimation

We develop Algorithm 2 for Shannon entropy estimation, which uses the modified estimator EstAmp′ (Line 4) and provides a quadratic quantum speedup in n.

1 Set ;
2 Regard the following as the subroutine:
3      Draw a sample i ∈ [n] according to p;
4      Use EstAmp′ with queries to obtain an estimate p̃_i of p_i;
5      Output -log p̃_i;
6
7 Use the subroutine for the executions prescribed in Theorem 2.1 and output the result as an estimate of H(p);
Algorithm 2: Estimate the Shannon entropy of p on [n].
Theorem 3.1.

Algorithm 2 approximates H(p) within an additive error with success probability at least 2/3 using quantum queries to the oracle.

Proof.

We prove this theorem in two steps. The first step is to show that the expectation of the subroutine's output (denoted ) is close to H(p). To that end, we partition [n] into groups according to the corresponding probabilities. Let and , , . For convenience, denote . Then

(3.1)

Our main technical contribution is the following upper bound on the expected difference between and in terms of the partition , :

(3.2)

By linearity of expectation, we have

(3.3)

As a result, by applying (3.1) and the Cauchy-Schwarz inequality to (3.3), we have

(3.4)

Because a constant overhead does not influence the query complexity, we may rescale Algorithm 2 by a large enough constant so that .

The second step is to bound the variance of the random variable, which is

(3.5)

Since for any , EstAmp outputs such that , we have . As a result, by Corollary 2.1 we can approximate up to additive error with failure probability at most using

(3.6)

quantum queries. Together with , Algorithm 2 approximates up to additive error with failure probability at most . ∎

It remains to prove (3.2). We prove:

(3.7)

For in (3.2), the proof is similar because in the dominating term the angles of and fall into the same interval of length , and as a result .

Proof of (3.7).

For convenience, consider the function x ↦ x log(1/x) on (0, 1]. Its derivative is log(1/x) - 1, which is positive when x < 1/e and negative when x > 1/e; hence the function is increasing on (0, 1/e), decreasing on (1/e, 1], and at x = 1/e it attains its maximum value 1/e.
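For completeness, the elementary computation behind these monotonicity claims (restated here in display form, assuming, as the surrounding discussion indicates, that the function in question is x ↦ x log(1/x) with the natural logarithm):

\[
\frac{d}{dx}\Big(x\log\frac{1}{x}\Big) = \log\frac{1}{x} - 1
\;\begin{cases} > 0, & 0 < x < 1/e,\\ = 0, & x = 1/e,\\ < 0, & 1/e < x \le 1,\end{cases}
\qquad
\max_{0 < x \le 1}\, x\log\frac{1}{x} = \frac{1}{e}\log e = \frac{1}{e}.
\]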

Since , we can write where . By Theorem 2.3, for any , the output of EstAmp when taking queries satisfies

(3.8)
(3.9)

Combining (3.8), (3.9), and the properties of the function discussed above, for any we have

(3.10)
(3.11)
(3.12)
(3.13)
(3.14)

where (3.10) comes from (3.8) and (3.9), (3.11) comes from the properties of the function discussed above, (3.12) holds because , (3.13) holds because , and (3.14) holds because . Consequently,

(3.15)

4 Application: KL divergence estimation

Classically, there is no consistent estimator that guarantees asymptotically small error over the set of all pairs of distributions [27, 14]. These two papers therefore consider pairs of distributions with bounded probability ratios, as specified by a function of n, namely all pairs of distributions in the following set:

(4.1)

Denote the numbers of samples taken from p and q by and , respectively. References [27, 14] show that, classically, D_KL(p||q) can be approximated within constant additive error with high success probability if and only if and .

Quantumly, we are given unitary oracles for p and q as defined in (1.7). Algorithm 3 below estimates the KL divergence between p and q; it is similar to Algorithm 2 in its use of EstAmp′, except that the subroutine's output is now defined in terms of both p and q.

1 Set ;
2 Regard the following as the subroutine:
3      Draw a sample i ∈ [n] according to p;
4      Use the modified amplitude estimation procedure EstAmp′ with and quantum queries to the oracles for p and q to obtain estimates p̃_i and q̃_i, respectively;
5      Output log(p̃_i / q̃_i);
6
7 Use the subroutine for the executions prescribed in Theorem 2.1 and output the result as an estimate of D_KL(p||q);
Algorithm 3: Estimate the KL divergence of p and q on [n].
Theorem 4.1.

For , Algorithm 3 approximates D_KL(p||q) within an additive error with success probability at least 2/3 using quantum queries to the oracle for p and quantum queries to the oracle for q, where the tilde notation hides polynomial terms in , , and .

Proof.

If the estimates p̃_i and q̃_i were precisely accurate, the expectation of the subroutine's output would be D_KL(p||q). On the one hand, we bound how far the actual expectation of the subroutine's output is from this exact value. By linearity of expectation,

(4.2)