Minimax optimal goodness-of-fit testing for densities under a local differential privacy constraint

Abstract

Finding anonymization mechanisms to protect personal data is at the heart of machine learning research. Here we consider the consequences of local differential privacy constraints on goodness-of-fit testing, i.e. the statistical problem of assessing whether sample points are generated from a fixed density $f_0$ or not. The observations are hidden and replaced by a stochastic transformation satisfying the local differential privacy constraint. In this setting, we propose a new testing procedure which is based on an estimation of the quadratic distance between the density $f$ of the unobserved sample and $f_0$. We establish minimax separation rates for our test over Besov balls. We also provide a lower bound, proving the optimality of our result. To the best of our knowledge, we provide the first minimax optimal test and associated private transformation under a local differential privacy constraint, quantifying the price to pay for data privacy.

1 Literature

Ensuring user privacy is at the core of the development of Artificial Intelligence. In particular, someone with access to the training set or the outcome of the algorithm should not be able to retrieve the original dataset. However, classical anonymization and cryptographic approaches fail to prevent the disclosure of sensitive information in the context of learning. Hence differential privacy mechanisms were developed to cope with such issues.

Such considerations can be traced back to a few major papers. In particular, [WAR65] presents the first privacy mechanism, which is now a baseline method for binary data: randomized response. Another important result is presented in the works of [DL86, DL89, FS98], which expose a trade-off between statistical utility, in other terms performance, and privacy.
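As a concrete illustration, here is a minimal sketch of randomized response for one binary attribute; the function and the privacy parameter $\epsilon$ are our own notation, anticipating the formal definition given in Section 2.1.

import math
import random

def randomized_response(x: bool, epsilon: float) -> bool:
    """Report the true bit with probability e^eps / (1 + e^eps), flip it
    otherwise. The likelihood ratio of any output under two different
    inputs is at most e^eps, which is the local differential privacy
    condition of Section 2.1."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return x if random.random() < p_truth else (not x)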

Differential privacy as expressed in [DMN+06b, DKM+06a] is the most common formalization of the problem of privacy. It can be summed up as the following condition: altering a single data point of the training set only affects the probability of an outcome to a limited degree. One main advantage of such a definition of privacy is that it can be parametrized by some $\epsilon > 0$, where low values of $\epsilon$ correspond to a more restrictive privacy condition. Such a definition is global to the private dataset.

Now we consider a stronger privacy condition, where the analyst himself is not trusted with the data: local differential privacy. This has been extensively studied through the concept of local algorithms, especially in the context of privacy-preserving data mining [WAR65, AS00, AA01, vv02, EGS03, AH05, MS06, JSW08, KLN+11]. Recent results detailed in [DJW13a, DWJ13c, DJW13b] give information processing inequalities in which $\epsilon$ appears. Those can be used to obtain Fano or Le Cam-type inequalities in order to derive minimax lower bounds for estimation or testing problems.

Testing problems have appeared as crucial tools in machine learning since they enable one to assess whether a model fits the observations, hence enabling anomalies or novelties to be detected. In particular, goodness of fit measures the discrepancy between observed values and a known density provided by the expected model for the behavior of the data. This motivates our study of goodness-of-fit testing under a local differential privacy constraint. Goodness-of-fit testing is a classical hypothesis testing problem in statistics. It consists in testing whether the common density $f$ of independent and identically distributed (i.i.d.) observations $X_1, \dots, X_n$ equals a specified density $f_0$ or not. Assuming that $f$ and $f_0$ belong to $\mathbb{L}_2([0,1])$, it is natural to propose a test based on an estimation of the squared $\mathbb{L}_2$-distance between $f$ and $f_0$. In order to test uniformity over $[0,1]$ of the samples $X_1, \dots, X_n$, [NEY37] introduces an orthonormal basis $(\phi_k)_{k \ge 0}$ of $\mathbb{L}_2([0,1])$ where $\phi_0 = \mathbf{1}_{[0,1]}$. The uniformity assumption is rejected if the estimator of $\sum_{k=1}^{D} \langle f, \phi_k \rangle^2$ exceeds some threshold, where $D$ is a given integer. Data-driven versions of this test, where the parameter $D$ is chosen to minimize some penalized criterion, have been introduced by [BR92, LED94, KAL95, ING96].
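As an illustration of this classical construction, the following sketch (our own, using the cosine system $\phi_k(x) = \sqrt{2}\cos(\pi k x)$ as one concrete choice of orthonormal basis) computes a smooth-test statistic for uniformity.

import numpy as np

def neyman_statistic(x: np.ndarray, D: int) -> float:
    """Smooth-test statistic n * sum_{k=1}^{D} a_k^2, where a_k is the
    empirical coefficient of the sample on phi_k(x) = sqrt(2) cos(pi k x).
    Under uniformity it is approximately chi-square with D degrees of
    freedom, so one rejects when it exceeds the (1 - alpha)-quantile."""
    n = len(x)
    ks = np.arange(1, D + 1)
    a = np.sqrt(2.0) * np.cos(np.pi * np.outer(ks, x)).mean(axis=1)
    return n * float(np.sum(a ** 2))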

We want to design our tests so that they reject the null hypothesis, with a given confidence level, if the data was not actually generated from the given model. Additionally, we want to find the limitations of the test by determining how close the two hypotheses can get while remaining separated by the testing procedure. This classical problem has been studied under the lens of minimax optimality in the seminal works [ING87, ING93]. Non-asymptotic performances and an extension to composite null hypotheses are provided in [FL06]. In order to introduce the notion of minimax optimality for a testing procedure, let us recall some definitions. We consider the uniform separation rate as defined in [BAR02]. Let $\Delta_\alpha$ be an $\alpha$-level test with values in $\{0, 1\}$, where $1$ corresponds to the decision of rejecting the null hypothesis $(H_0): f = f_0$. The uniform separation rate of the test $\Delta_\alpha$, over a class $\mathcal{F}$ of alternatives such that $f - f_0$ satisfies smoothness assumptions, with respect to the $\mathbb{L}_2$-norm, is defined for all $\beta$ in $(0,1)$ by

$$\rho\big(\Delta_\alpha, \mathcal{F}, \beta\big) = \inf\Big\{ \rho > 0 :\ \sup_{f \in \mathcal{F},\, \|f - f_0\|_2 \ge \rho} P_f\big(\Delta_\alpha = 0\big) \le \beta \Big\}, \quad (1)$$

where $P_f$ denotes the distribution of the i.i.d. sample $(X_1, \dots, X_n)$ with common density $f$.

The uniform separation rate is then the smallest value, in the sense of the $\mathbb{L}_2$-norm of $f - f_0$, for which the second kind error of the test is uniformly controlled by $\beta$ over $\mathcal{F}$. This definition extends the notion of critical radius introduced in [ING93] to the non-asymptotic framework. Note that in general, minimax separation rates are different from minimax estimation rates since testing and estimation are problems of different kinds.

A test of level $\alpha$ with optimal performance should then have the smallest possible uniform separation rate (up to a multiplicative constant) over $\mathcal{F}$. To quantify this, [BAR02] introduces the non-asymptotic minimax rate of testing defined by

$$\rho\big(\mathcal{F}, \alpha, \beta\big) = \inf_{\Delta_\alpha} \rho\big(\Delta_\alpha, \mathcal{F}, \beta\big), \quad (2)$$

where the infimum is taken over all tests $\Delta_\alpha$ of level $\alpha$. A test is optimal in the minimax sense over the class $\mathcal{F}$ if its uniform separation rate is upper-bounded, up to some constant, by the non-asymptotic minimax rate of testing.

Assuming that $\mathcal{F}$ is some Hölder class with smoothness parameter $s$, [ING93] establishes the asymptotic minimax rate of testing $n^{-2s/(4s+1)}$. The test proposed in his paper is not adaptive since it makes use of the known smoothness parameter $s$. Adaptive goodness-of-fit tests are provided in [ING00] and [FL06]. These tests achieve the separation rate $\big(\sqrt{\log\log n}/n\big)^{2s/(4s+1)}$ over a wide range of regular classes with smoothness parameter $s$, the $\sqrt{\log\log n}$ term being the price to pay for adaptation to the unknown parameter $s$.

A few problems have already been tackled in order to obtain minimax rates under a local privacy constraint. The main question is whether the minimax rates are affected by the local privacy constraint and, in that case, to quantify the degradation of the rate. For a few problems, a degradation of the effective sample size by a multiplicative factor is found. In [DWJ13c], the authors obtain minimax estimation rates for multinomial distributions with $d$ dimensions and find a sample degradation of order $d/\epsilon^2$. That is, if $n$ is the necessary and sufficient number of samples in order to solve the classical problem, the $\epsilon$-local differentially private problem is solved with of order $n d/\epsilon^2$ samples, where $d$ is the number of dimensions. In [DJW18], they also find multiplicative sample degradations for generalized linear models and for median estimation. However, in other problems, a polynomial degradation is noted. For one-dimensional mean estimation, the usual minimax rate is $n^{-1}$, whereas the private rate is $(n\epsilon^2)^{-1/2}$ for random variables $X$ such that $\mathbb{E}[X^2] \le 1$. As for the problem of nonparametric density estimation, the rate goes from $n^{-2s/(2s+1)}$ to $(n\epsilon^2)^{-2s/(2s+2)}$ over an elliptical Sobolev space with smoothness $s$. This result was extended in [BDK+19] to Besov ellipsoids. The classical minimax mean squared errors were presented in [YU97, YB99, TSY04].

We list our contributions:

  • We provide the first minimax lower bound for the problem of goodness-of-fit testing under a local privacy constraint over Besov balls.

  • We present the first minimax optimal test with the associated local differentially private channel in this setting.

  • The test is made adaptive to the smoothness parameter $s$ up to a logarithmic term.

In a setting very similar to ours, [GLR+16] tackles the problems of independence testing and identity testing. More precisely, they test whether sample points were drawn from a known multinomial distribution, whereas we consider densities instead. Besides, they work under global differential privacy constraints, whereas we enforce local privacy. Note that they apply a Laplace perturbation to the frequencies, whereas we apply the perturbation to the coefficients of a wavelet basis, and the choice of the basis is crucial in obtaining the optimal rate. Finally and most importantly, they do not provide guarantees on the convergence rates. [ADK+19] tackle two-sample equivalence testing with unequal sized samples and independence testing under a global differential privacy constraint. In particular, their novel privatization method maintains the sample efficiency of the testing method presented in [DK16].

The rest of the paper is articulated as follows. In Section 2, we detail our setting and sum up our results. We introduce a test and a privacy mechanism in Section 3. This leads to an upper bound on the minimax separation rate for identity testing. However, the proposed test depends on the smoothness parameter $s$, which is unknown in general. That is the reason why we present a version of the test in Section 4 that is adaptive to $s$. A lower bound that matches the upper bound is introduced in Section 5. Afterwards, we conclude the paper with a final discussion in Section 6. Finally, in the Supplementary Material, the proofs of all the results presented in this paper are contained in Section A, and discussions on possible alternatives for the proof of the lower bound in Section B.

All along the paper, $C$ will denote some absolute constant and $C(\cdot)$ constants depending only on their arguments. The constants may vary from line to line.

2 Setting

2.1 Local differential privacy

Let $n$ be some positive integer. Consider unobserved random variables $X_1, \dots, X_n$, which are independent and identically distributed (i.i.d.) with density $f$ with respect to the Lebesgue measure on $[0,1]$.

Let $\epsilon > 0$. We observe $Z_1, \dots, Z_n$, which are $\epsilon$-local differentially private views of $X_1, \dots, X_n$. That is, there exist channels (conditional distributions) $Q_1, \dots, Q_n$ such that for all $i$, $Z_i$ is a stochastic transformation of $X_i$ by the channel $Q_i$ and

$$\sup_{S}\ \sup_{x, x', z_1, \dots, z_{i-1}} \frac{Q_i\big(Z_i \in S \mid X_i = x,\ Z_1 = z_1, \dots, Z_{i-1} = z_{i-1}\big)}{Q_i\big(Z_i \in S \mid X_i = x',\ Z_1 = z_1, \dots, Z_{i-1} = z_{i-1}\big)} \le e^{\epsilon}. \quad (3)$$

Equation (3) represents the $\epsilon$-local differential privacy assumption in the general interactive case. The stronger assumption corresponding to the non-interactive case (see [WAR65] and [EGS03]) is expressed, for all $i$, as

$$\sup_{S}\ \sup_{x, x'} \frac{Q_i\big(Z_i \in S \mid X_i = x\big)}{Q_i\big(Z_i \in S \mid X_i = x'\big)} \le e^{\epsilon}. \quad (4)$$

Our results focus on the non-interactive case. Let $\mathcal{Q}_\epsilon$ be the set of channels satisfying the condition in Equation (4).

2.2 Separation rates over Besov balls

The aim of the paper is to provide optimal separation rates for goodness-of-fit tests over Besov balls under privacy constraints. We first recall the definition of Besov balls and we define the uniform separation rates of testing in the private setting.

We consider a pair $(\phi, \psi)$ of compactly supported wavelets such that, for all $J$ in $\mathbb{N}$,
$$\big\{\phi_{J,k},\ 0 \le k < 2^J\big\} \cup \big\{\psi_{j,k},\ j \ge J,\ 0 \le k < 2^j\big\}$$
is an orthonormal basis of $\mathbb{L}_2([0,1])$. For the sake of simplicity, we consider the Haar basis, where $\phi = \mathbf{1}_{[0,1)}$ and $\psi = \mathbf{1}_{[0,1/2)} - \mathbf{1}_{[1/2,1)}$. In this case, for all $j \in \mathbb{N}$ and $0 \le k < 2^j$, $\phi_{j,k} = 2^{j/2}\phi(2^j \cdot - k)$ and $\psi_{j,k} = 2^{j/2}\psi(2^j \cdot - k)$.
We denote, for all $j \in \mathbb{N}$ and $0 \le k < 2^j$, $\alpha_{j,k}(f) = \int_0^1 f\,\phi_{j,k}$ and $\beta_{j,k}(f) = \int_0^1 f\,\psi_{j,k}$.
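Here is a small numerical sketch of the Haar scaling functions at resolution $J$; the helper name `haar_features` is our own, and it is reused in the sketches of Section 3.

import numpy as np

def haar_features(x: np.ndarray, J: int) -> np.ndarray:
    """Evaluate the 2^J Haar scaling functions phi_{J,k} at the points x.
    Row i contains (phi_{J,0}(x_i), ..., phi_{J,2^J-1}(x_i)); exactly one
    entry per row is nonzero and equals 2^{J/2}, since the phi_{J,k} are
    rescaled indicators of the dyadic intervals [k 2^{-J}, (k+1) 2^{-J})."""
    m = 2 ** J
    k = np.minimum((x * m).astype(int), m - 1)  # dyadic bin of each point
    feats = np.zeros((len(x), m))
    feats[np.arange(len(x)), k] = np.sqrt(m)    # value 2^{J/2}
    return feats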

Let $s > 0$ and $R > 0$. The Besov ball $\mathcal{B}_{s,2,\infty}(R)$ with radius $R$ associated with the Haar basis is defined as
$$\mathcal{B}_{s,2,\infty}(R) = \Big\{ f \in \mathbb{L}_2([0,1]) :\ \forall j \in \mathbb{N},\ \sum_{k=0}^{2^j - 1} \beta_{j,k}^2(f) \le R^2\, 2^{-2js} \Big\}.$$

Fix a density $f_0$. We want to test the hypotheses
$$(H_0):\ f = f_0 \quad \text{against} \quad (H_1):\ f \ne f_0.$$
The twist on classical goodness-of-fit testing is in the fact that the samples from $f$ are unobserved; we only observe their private views.

For $\alpha \in (0,1)$ and $\epsilon > 0$, we construct an $\epsilon$-local differentially private channel $Q$ and an $\alpha$-level test $\Delta_\alpha$ based on the private views $Z_1, \dots, Z_n$, such that
$$P_{Qf_0}\big(\Delta_\alpha = 1\big) \le \alpha,$$
where
$$Qf(S) = \int Q\big(S \mid x\big) f(x)\, dx, \quad \text{for any measurable set } S,$$
is the marginal distribution of a private view $Z_i$ when $X_i$ has density $f$, and $P_{Qf}$ denotes the joint distribution of $(Z_1, \dots, Z_n)$, the channel being such that Equation (4) holds (we assume here that $Q_i = Q$ for all $i$).

We then define the uniform separation rate of the test $\Delta_\alpha$ over the class $\mathcal{F}$, given the channel $Q$, as
$$\rho\big(\Delta_\alpha, \mathcal{F}, \beta, Q\big) = \inf\Big\{ \rho > 0 :\ \sup_{f \in \mathcal{F},\, \|f - f_0\|_2 \ge \rho} P_{Qf}\big(\Delta_\alpha = 0\big) \le \beta \Big\}. \quad (5)$$

A good channel and a good test are characterized by a small uniform separation rate. This leads us to the definition of the $\epsilon$-private minimax separation rate over the class $\mathcal{F}$:
$$\rho^*\big(\mathcal{F}, \alpha, \beta, \epsilon\big) = \inf_{Q \in \mathcal{Q}_\epsilon}\ \inf_{\Delta_\alpha} \rho\big(\Delta_\alpha, \mathcal{F}, \beta, Q\big), \quad (6)$$
where the infimum is taken over every possible $\epsilon$-private channel $Q$ and all $\alpha$-level tests $\Delta_\alpha$ based on the private observations $Z_1, \dots, Z_n$.

2.3 Overview of the results

We introduce the following classes of alternatives: for any $s > 0$ and $R > 0$, we define the set $\mathcal{F}_s(R)$ as follows:
$$\mathcal{F}_s(R) = \big\{ f \text{ density on } [0,1] :\ f - f_0 \in \mathcal{B}_{s,2,\infty}(R) \big\}. \quad (7)$$
We also assume that $f_0$ is uniformly bounded. Note that the class $\mathcal{F}_s(R)$ depends on $f_0$, since regularity is only required for the difference $f - f_0$ in order to establish the separation rates. Nevertheless, for the sake of simplicity we omit $f_0$ in the notation of this set.

The results presented in Theorems 3.3 and 5.2 can be condensed into the following conclusion, which holds for any $\alpha, \beta \in (0,1)$, $s > 0$, $R > 0$ and $\epsilon > 0$ such that $\alpha + \beta < 1$:
$$\rho^*\big(\mathcal{F}_s(R), \alpha, \beta, \epsilon\big) \asymp \big(n\epsilon^2\big)^{-2s/(4s+3)} \vee n^{-2s/(4s+1)}. \quad (8)$$
Remark 2.1.

Since we obtain matching bounds, up to a logarithmic term, in Equation (8), we can deduce the minimax separation rate for goodness-of-fit testing under a local privacy constraint. It can be decomposed into two different regimes. When $\epsilon$ is larger than $n^{1/(4s+1)}$, the rates of our upper and lower bounds match exactly. The minimax rate is then of order $n^{-2s/(4s+1)}$, which coincides with the rate obtained in the non-private case in [ING87]. However, when $\epsilon$ is smaller than $n^{1/(4s+1)}$, the rates of our upper and lower bounds only match in $n$. The minimax rate is then of order $(n\epsilon^2)^{-2s/(4s+3)}$, and so we show a polynomial degradation of the rate due to the privacy constraints. Such a degradation has also been discovered in the related problems of second moment estimation and mean estimation, as well as for density estimation in [BDK+19]. Our bounds do not match in $\epsilon$, however, and this leads to untight bounds when $\epsilon$ is large. This is not an issue in practice, since $\epsilon$ will be taken small in order to guarantee privacy.
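For concreteness, the threshold between the two regimes is obtained by equating the two rates appearing in Equation (8):
$$\big(n\epsilon^2\big)^{-\frac{2s}{4s+3}} = n^{-\frac{2s}{4s+1}} \iff n\epsilon^2 = n^{\frac{4s+3}{4s+1}} \iff \epsilon^2 = n^{\frac{2}{4s+1}} \iff \epsilon = n^{\frac{1}{4s+1}},$$
so the private rate dominates exactly when $\epsilon \le n^{1/(4s+1)}$.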

3 Definition of a test and a privacy mechanism

We first define a testing procedure coupled with a privacy mechanism for which an upper bound on the uniform separation rate matches the right-hand side of Equation (8).

Let $X_1, \dots, X_n$ be i.i.d. with common density $f$. We assume that $f$ and $f_0$ are supported on $[0,1]$ and belong to $\mathbb{L}_2([0,1])$. We want to test
$$(H_0):\ f = f_0 \quad \text{against} \quad (H_1):\ f \ne f_0 \quad (9)$$
from $\epsilon$-local differentially private views of $X_1, \dots, X_n$. Let us first propose a transformation of the data satisfying the differential privacy constraints.

3.1 Privacy mechanism

We consider the privacy mechanism introduced in [BDK+19]. Let us fix some integer $J \ge 0$. We consider, for all $k \in \{0, \dots, 2^J - 1\}$, the functions $\phi_{J,k}$ introduced in Section 2.2. We define, for all $i \in \{1, \dots, n\}$, the vector $Z_i = (Z_{i,k})_{0 \le k \le 2^J - 1}$ by
$$Z_{i,k} = \phi_{J,k}(X_i) + \sigma\, W_{i,k}, \quad 0 \le k \le 2^J - 1, \quad (10)$$
where the $(W_{i,k})$ are i.i.d. Laplace distributed random variables with parameter $1$ and
$$\sigma = \frac{2^{J/2 + 1}}{\epsilon}.$$
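A minimal sketch of this mechanism, reusing the hypothetical `haar_features` helper from Section 2.2 (the implementation is our own illustration of Equation (10)):

import numpy as np

def privatize(x: np.ndarray, J: int, epsilon: float,
              rng: np.random.Generator) -> np.ndarray:
    """Return the n x 2^J matrix Z of Equation (10): the Haar coefficients
    of each sample point plus i.i.d. Laplace noise of scale
    sigma = 2^{J/2 + 1} / epsilon, calibrated to the l1-sensitivity
    2 * 2^{J/2} of the feature vector, which yields epsilon-LDP."""
    sigma = 2.0 * np.sqrt(2.0 ** J) / epsilon
    feats = haar_features(x, J)  # one nonzero entry of size 2^{J/2} per row
    return feats + rng.laplace(loc=0.0, scale=sigma, size=feats.shape)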

Lemma 3.1.

To each random variable $X_i$ of the sample set, we associate the vector $Z_i = (Z_{i,k})_{0 \le k \le 2^J - 1}$. The random vectors $Z_1, \dots, Z_n$ are non-interactive $\epsilon$-local differentially private views of the samples $X_1, \dots, X_n$. Namely, they satisfy the condition in Equation (4).

The proof is due to [BDK+19]. We recall here the main arguments.

Proof.

The random vectors $Z_1, \dots, Z_n$ are i.i.d. by definition. Let us denote by $q_x$ the density of the vector $Z_i$, conditionally on $X_i = x$. For any $x, x'$ in $[0,1]$,
$$\frac{q_x(z)}{q_{x'}(z)} = \prod_{k=0}^{2^J - 1} \exp\Big( \frac{|z_k - \phi_{J,k}(x')| - |z_k - \phi_{J,k}(x)|}{\sigma} \Big) \le \exp\Big( \frac{1}{\sigma} \sum_{k=0}^{2^J - 1} \big|\phi_{J,k}(x) - \phi_{J,k}(x')\big| \Big).$$
Since $\|\phi_{J,k}\|_\infty = 2^{J/2}$ and since $\phi_{J,k}(x) \ne 0$ for a single value of $k$, we get
$$\frac{q_x(z)}{q_{x'}(z)} \le \exp\Big( \frac{2 \cdot 2^{J/2}}{\sigma} \Big) = e^{\epsilon}$$
by definition of $\sigma$, which concludes the proof by application of Lemma 3.2.

Lemma 3.2.

Denote by $q_x$ the density of the vector $Z_i$, conditionally on $X_i = x$, with respect to some dominating measure $\mu$. Then Equation (4) holds if and only if there exists a measurable set $A$ with $\mu(A^c) = 0$ such that $q_x(z) \le e^{\epsilon} q_{x'}(z)$ for any $z \in A$ and any $x, x'$.

Proof.

Assume there exists a measurable set $A$ with $\mu(A^c) = 0$ such that $q_x(z) \le e^{\epsilon} q_{x'}(z)$ for any $z \in A$ and any $x, x'$.

Let $S$ be some measurable subset of the support of $Z_i$. Let $x, x' \in [0,1]$.

Then
$$Q\big(Z_i \in S \mid X_i = x\big) = \int_{S \cap A} q_x \, d\mu \le e^{\epsilon} \int_{S \cap A} q_{x'} \, d\mu = e^{\epsilon}\, Q\big(Z_i \in S \mid X_i = x'\big).$$
So Equation (4) holds.

Conversely, assume that Equation (4) holds. Then for any measurable $S$, we have $\int_S q_x \, d\mu \le e^{\epsilon} \int_S q_{x'} \, d\mu$. That is, for any measurable $S$,
$$\int_S \big( q_x - e^{\epsilon} q_{x'} \big)\, d\mu \le 0.$$
So $q_x \le e^{\epsilon} q_{x'}$ holds $\mu$-almost everywhere, that is, there exists a measurable set $A$ with $\mu(A^c) = 0$ such that $q_x(z) \le e^{\epsilon} q_{x'}(z)$ for any $z \in A$.

3.2 Definition of the test

Our aim is now to define a testing procedure for the testing problem defined in Equation (9) from the observation of the vectors $Z_1, \dots, Z_n$. Our test statistic is defined as
$$T_J = \frac{1}{n(n-1)} \sum_{i \ne i'} \sum_{k=0}^{2^J - 1} \big(Z_{i,k} - \alpha_{J,k}(f_0)\big)\big(Z_{i',k} - \alpha_{J,k}(f_0)\big), \quad (11)$$
where $\alpha_{J,k}(f_0) = \int_0^1 \phi_{J,k}(x) f_0(x)\, dx$.

We consider the test function
$$\Delta_\alpha = \mathbf{1}_{T_J > t_{1-\alpha}(J)}, \quad (12)$$
where $t_{1-\alpha}(J)$ denotes the $(1-\alpha)$-quantile of $T_J$ under the hypothesis $(H_0)$. Note that this quantile can be estimated by simulations under the hypothesis $(H_0)$: we can indeed simulate the vectors $Z_1, \dots, Z_n$ if the density of the $X_i$ is assumed to be $f_0$. Hence the test rejects the null hypothesis if
$$T_J > t_{1-\alpha}(J).$$
The test is obviously of level $\alpha$ by definition of the threshold.
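The following sketch combines the statistic of Equation (11) with a Monte Carlo estimate of the threshold of Equation (12). The routine `sample_f0` is a hypothetical user-supplied sampler drawing $n$ points from $f_0$, and `privatize` is the sketch of Equation (10) given above.

import numpy as np

def test_statistic(Z: np.ndarray, a0: np.ndarray) -> float:
    """U-statistic of Equation (11), an unbiased estimate of the squared
    L2 norm of the projection of f - f_0 onto the Haar space of level J.
    Z is the n x 2^J matrix of private views; a0[k] = alpha_{J,k}(f_0)."""
    n = Z.shape[0]
    centered = Z - a0
    col_sums = centered.sum(axis=0)
    # sum over i != i' equals (sum_i)^2 minus the diagonal, column-wise
    off_diag = np.sum(col_sums ** 2) - np.sum(centered ** 2)
    return float(off_diag / (n * (n - 1)))

def mc_threshold(n, J, epsilon, a0, alpha, n_sim, rng):
    """Estimate t_{1-alpha}(J), the (1 - alpha)-quantile of the statistic
    under H0, by simulating private views of samples drawn from f_0."""
    stats = [test_statistic(privatize(sample_f0(n, rng), J, epsilon, rng), a0)
             for _ in range(n_sim)]
    return float(np.quantile(stats, 1.0 - alpha))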

Comments:

In a similar way as in [FL06], the test is based on an estimation of the quantity $\|\Pi_J(f - f_0)\|_2^2$. Note indeed that $T_J$ is an unbiased estimator of $\|\Pi_J(f - f_0)\|_2^2$, where $\Pi_J$ denotes the orthogonal projection in $\mathbb{L}_2([0,1])$ onto the space generated by the functions $(\phi_{J,k})_{0 \le k \le 2^J - 1}$: since the Laplace variables are centered and independent of the sample, $\mathbb{E}[Z_{i,k}] = \alpha_{J,k}(f)$, and the independence of $Z_i$ and $Z_{i'}$ for $i \ne i'$ gives $\mathbb{E}[T_J] = \sum_{k=0}^{2^J - 1} \big(\alpha_{J,k}(f) - \alpha_{J,k}(f_0)\big)^2$.

In the next section, we provide non-asymptotic theoretical results for the power of this test.

3.3 Upper bound on the minimax separation rate

We provide an upper bound on the uniform separation rate for our test and privacy channel over Besov balls in Theorem 3.3. It also constitutes an upper bound on the minimax separation rate.

Theorem 3.3.

Let $X_1, \dots, X_n$ be i.i.d. with common density $f$ on $[0,1]$. Let $f_0$ be some given density on $[0,1]$. From the observation of the random vectors $Z_1, \dots, Z_n$ defined by Equation (10), for a given $J$, we want to test the hypotheses
$$(H_0):\ f = f_0 \quad \text{against} \quad (H_1):\ f \ne f_0.$$
We assume that $f_0$ is uniformly bounded by $M$ and that $n\epsilon^2 \ge 1$.

We consider the test $\Delta_\alpha$ defined by Equation (12) with $J = J^*$, where $J^*$ is the smallest integer such that $2^{J^*} \ge (n\epsilon^2)^{2/(4s+3)}$.

The uniform separation rate, defined by Equation (5), of the test $\Delta_\alpha$ over the class $\mathcal{F}_s(R)$ defined by Equation (7) satisfies, for all $\alpha, \beta \in (0,1)$ and $s, R > 0$ such that $\alpha + \beta < 1$,
$$\rho\big(\Delta_\alpha, \mathcal{F}_s(R), \beta, Q\big) \le C(\alpha, \beta, R, M) \Big( \big(n\epsilon^2\big)^{-2s/(4s+3)} \vee n^{-2s/(4s+1)} \Big).$$

The proof of this result is in Section A.1 of the Supplementary Material.

Comments:

When the sample $X_1, \dots, X_n$ is directly observed, [FL06] propose a testing procedure with uniform separation rate over the set $\mathcal{F}_s(R)$ controlled by
$$C(\alpha, \beta, R, M)\, n^{-2s/(4s+1)},$$
which is an optimal result. Hence we obtain here a loss in the uniform separation rate, due to the fact that we only observe $\epsilon$-differentially private views of the original sample. This loss occurs when $\epsilon < n^{1/(4s+1)}$; otherwise, we get the same rate as when the original sample is observed. We will see in Section 5 that this result is optimal.

Finally, having $n\epsilon^2 < 1$ represents an extreme case, where the sample size is really low in conjunction with a very strict privacy condition. In such a range of $n\epsilon^2$, $J^*$ is taken equal to $0$, but this does not lead to optimal rates.

The test proposed in Theorem 3.3 depends (via the parameter $J$) on the smoothness parameter $s$ of the Besov ball $\mathcal{B}_{s,2,\infty}(R)$. In a second step, we will propose a test adaptive to $s$. In Section 4, we construct an aggregated testing procedure which is independent of the smoothness parameter $s$ and achieves the minimax separation rates established in Equation (8) over a wide range of Besov balls simultaneously, up to a logarithmic term.

4 Adaptive tests

In Section 3.2, we have defined a testing procedure which depends on the parameter $J$. The performances of the test depend on this parameter. We have optimized the choice of $J$ to obtain the smallest possible upper bound for the separation rate over the set $\mathcal{F}_s(R)$. Nevertheless, the test is not adaptive since this optimal choice of $J$ depends on the smoothness parameter $s$.

In order to obtain an adaptive procedure, we propose, as in [FL06], to aggregate a collection of tests. For this, we introduce the set
$$\mathcal{J} = \Big\{ J \in \mathbb{N} :\ 2^J \le \big(n\epsilon^2\big)^{2/3} \Big\}.$$
For a given level $\alpha \in (0,1)$, the aggregated testing procedure rejects the hypothesis $(H_0)$ if
$$\sup_{J \in \mathcal{J}} \big( T_J - t_{1 - u_\alpha}(J) \big) > 0,$$
where $u_\alpha$ is defined by
$$u_\alpha = \sup\Big\{ u > 0 :\ P_{Qf_0}\Big( \sup_{J \in \mathcal{J}} \big( T_J - t_{1-u}(J) \big) > 0 \Big) \le \alpha \Big\}. \quad (13)$$
Hence $u_\alpha$ is the least conservative choice leading to an $\alpha$-level test. We easily notice that $u_\alpha \ge \alpha/|\mathcal{J}|$. Indeed, by the union bound,
$$P_{Qf_0}\Big( \sup_{J \in \mathcal{J}} \big( T_J - t_{1 - \alpha/|\mathcal{J}|}(J) \big) > 0 \Big) \le \sum_{J \in \mathcal{J}} P_{Qf_0}\big( T_J > t_{1 - \alpha/|\mathcal{J}|}(J) \big) \le \alpha.$$

Let us now consider the second kind error of the aggregated test, which is the probability of incorrectly accepting the null hypothesis. This quantity is upper bounded by the smallest second kind error of the tests of the collection, at the price that $\alpha$ has been replaced by $u_\alpha \ge \alpha/|\mathcal{J}|$. Indeed,
$$P_{Qf}\Big( \sup_{J \in \mathcal{J}} \big( T_J - t_{1 - u_\alpha}(J) \big) \le 0 \Big) \le \inf_{J \in \mathcal{J}} P_{Qf}\big( T_J \le t_{1 - u_\alpha}(J) \big). \quad (14)$$
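A sketch of the Monte Carlo calibration of $u_\alpha$ in Equation (13); the grid search and helper names are our own, and `sample_f0`, `privatize` and `test_statistic` are the hypothetical routines from the previous sketches.

import numpy as np

def calibrate_u_alpha(n, epsilon, Js, a0s, alpha, n_sim, rng):
    """Approximate u_alpha of Equation (13): the largest u such that
    rejecting when sup_J (T_J - t_{1-u}(J)) > 0 has level at most alpha
    under H0. Thresholds and levels are both estimated by simulation."""
    sims = np.empty((n_sim, len(Js)))
    for b in range(n_sim):
        x0 = sample_f0(n, rng)
        for j, J in enumerate(Js):
            sims[b, j] = test_statistic(privatize(x0, J, epsilon, rng), a0s[j])
    # Scan u downwards from alpha (no correction) to alpha/|J| (Bonferroni).
    for u in np.linspace(alpha, alpha / len(Js), num=50):
        t = np.quantile(sims, 1.0 - u, axis=0)   # t_{1-u}(J) for each J
        level = np.mean((sims > t).any(axis=1))  # aggregated level under H0
        if level <= alpha:
            return float(u)
    return alpha / len(Js)  # the union bound guarantees this choice is valid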

We obtain the following theorem for the aggregated procedure.

Theorem 4.1.

Let $X_1, \dots, X_n$ be i.i.d. with common density $f$ in $\mathbb{L}_2([0,1])$. Let $f_0$ be some given density in $\mathbb{L}_2([0,1])$. From the observation of the random vectors $Z_1, \dots, Z_n$ defined by Equation (10) for the levels $J \in \mathcal{J}$, we want to test the hypotheses
$$(H_0):\ f = f_0 \quad \text{against} \quad (H_1):\ f \ne f_0.$$
We assume that $f$ and $f_0$ are uniformly bounded by $M$ and we assume that $n\epsilon^2 \ge 1$.

We consider the set $\mathcal{J}$ and the aggregated test $\Delta_\alpha^{\mathcal{J}}$ which rejects $(H_0)$ when
$$\sup_{J \in \mathcal{J}} \big( T_J - t_{1 - u_\alpha}(J) \big) > 0,$$
where $u_\alpha$ is defined by Equation (13). The uniform separation rate, defined by Equation (5), of the test $\Delta_\alpha^{\mathcal{J}}$ over the set $\mathcal{F}_s(R)$ defined by Equation (7) satisfies, for all $\alpha, \beta \in (0,1)$ and $s, R > 0$ such that $\alpha + \beta < 1$,
$$\rho\big(\Delta_\alpha^{\mathcal{J}}, \mathcal{F}_s(R), \beta, Q\big) \le C(\alpha, \beta, R, M) \bigg( \Big(\frac{\sqrt{\log(n\epsilon^2)}}{n\epsilon^2}\Big)^{2s/(4s+3)} \vee \Big(\frac{\sqrt{\log n}}{n}\Big)^{2s/(4s+1)} \bigg).$$

The proof of this result is in Section A.2 of the Supplementary Material.

Comments: Comparing this result with the rates obtained in Theorem 3.3, which will be proved to be optimal in the next section, we have here a logarithmic loss due to the adaptation. We recall the separation rates in the non-private setting obtained by [ING00] and [FL06] for adaptive procedures over Besov balls: they were of order $\big(\sqrt{\log\log n}/n\big)^{2s/(4s+1)}$. We do not know whether the logarithmic term that we obtain here is optimal or not.

5 Lower bound on the minimax separation rate

We consider, for any $s, R > 0$, the classes of alternatives $\mathcal{F}_s(R)$ defined by Equation (7).

This section focuses on the presentation of a lower bound on the minimax separation rate over Besov balls, defined in Equation (6), for the problem of identity testing under a local differential privacy constraint. The test and privacy mechanism showcased in Section 3 will turn out to be minimax optimal, since the lower bound will match the upper bound obtained in Theorem 3.3.

Let us apply a Bayesian approach, where we define a prior distribution which corresponds to a mixture of densities satisfying $(H_1)$. Such a proof technique is classical for lower bounds in minimax testing, as described in [BAR02]. Its application is mainly due to [ING93] and to inequalities on the total variation distance from [LE 86]. The result of this approach is summarized in the following lemma.

Lemma 5.1.

Let $\alpha, \beta \in (0,1)$ be such that $\alpha + \beta < 1$. Let $\rho > 0$ and $s, R > 0$. We define
$$\mathcal{F}_\rho = \big\{ f \in \mathcal{F}_s(R) :\ \|f - f_0\|_2 \ge \rho \big\}.$$
Let $\epsilon > 0$ and let $Q$ be some $\epsilon$-private channel. Let $\nu$ be some probability measure such that $\nu(\mathcal{F}_\rho) = 1$, and let $Q\nu$ be defined, for all measurable sets $S$, by
$$Q\nu(S) = \int (Qf)^{\otimes n}(S)\, d\nu(f).$$
We note the total variation distance between two probability measures $P_1$ and $P_2$ as $\mathrm{TV}(P_1, P_2)$.

Then if
$$\mathrm{TV}\big( (Qf_0)^{\otimes n},\ Q\nu \big) < 1 - \alpha - \beta,$$
we have
$$\inf_{\Delta_\alpha} \sup_{f \in \mathcal{F}_\rho} P_{Qf}\big( \Delta_\alpha = 0 \big) > \beta,$$
where the infimum is taken over all possible $\alpha$-level tests, hence satisfying $P_{Qf_0}(\Delta_\alpha = 1) \le \alpha$.

The idea is to establish the connection between the second kind error and the total variation distance between distributions with respective supports in $(H_0)$ and $(H_1)$. It turns out that the closer the distributions from $(H_0)$ and $(H_1)$ can be, the higher the second kind error. So if we are able to provide distributions from $(H_0)$ and $(H_1)$ which are close to one another, we can guarantee that the second kind error of any test will have to be high.

Theorem 5.2.

Let $\alpha, \beta \in (0,1)$ be such that $\alpha + \beta < 1$. Let $s, R > 0$.

We obtain the following lower bound for the $\epsilon$-private minimax separation rate defined by Equation (6), for non-interactive channels in $\mathcal{Q}_\epsilon$, over the class of alternatives $\mathcal{F}_s(R)$:
$$\rho^*\big(\mathcal{F}_s(R), \alpha, \beta, \epsilon\big) \ge C(\alpha, \beta, s, R) \Big( \big(n (e^{\epsilon} - 1)^2\big)^{-2s/(4s+3)} \vee n^{-2s/(4s+1)} \Big).$$

Note that the result only holds for non-interactive channels. The details of the proof can be found in Section A.4 of the Supplementary Material. We outline the intuition behind the main arguments in the following sketch.

Sketch of proof. We want to find the largest $\mathbb{L}_2$-distance between the initial density $f_0$ under the null hypothesis and the densities $f$ in the alternative hypothesis such that their transformed counterparts by an $\epsilon$-private channel $Q$ cannot be discriminated by a test. We rely on the singular vectors of the operator associated with $Q$ in order to define densities and their private counterparts with ease. We define a mixture of densities in the private space such that they have a fixed $\mathbb{L}_2$-distance to $Qf_0$, which is the private transformation of $f_0$ by $Q$. We obtain a sufficient condition for the total variation distance between the mixture and $(Qf_0)^{\otimes n}$ to be small enough for both hypotheses to be indistinguishable. Then we ensure that the densities that we have considered in the private set are associated with densities for the original sample that belong to the regularity class $\mathcal{F}_s(R)$. Employing bounds on the singular values of $Q$, we obtain sufficient conditions for the original densities to have the right regularity. Collecting all these elements, the conclusion relies on Lemma 5.1.

Remark 5.3.

The total variation distance is a good criterion to determine whether two distributions are distinguishable. Another natural idea to prove Theorem 5.2 is to bound the total variation distance between two private densities by the total variation distance between the densities of the original samples, up to some constants depending on the privacy constraints. Following this intuitive approach, we can derive a lower bound using Theorem 1 in [DJW13b] combined with Pinsker's inequality. However, the resulting lower bound does not match the upper bound for the separation rates of goodness-of-fit testing presented in Section 3. Details on this approach are provided in Section B of the Supplementary Material.

6 Discussion

We provided the first minimax optimal test and local differentially private channel for the problem of goodness-of-fit testing over Besov balls. Besides, the test and channel remain optimal, up to a logarithmic factor, even if the smoothness parameter $s$ is unknown. Among our technical contributions, note that we used a proof technique for the lower bound that does not involve Theorem 1 from [DJW13b]. The minimax separation rate turns out to suffer from a polynomial degradation in the private case. However, we point out an elbow effect, where the rate is the same as in the usual case, up to some constant, if $\epsilon$ is large enough. Future work could extend our results to larger Besov classes and to the discrete case. Besides, a lower bound including the study of interactive channels is open for further research.

Acknowledgements. B. Laurent and J-M. Loubes acknowledge funding by ANITI ANR-19-PI3A-0004.

Appendix A Proofs

In the following, $\mathrm{Im}$ denotes the image of a function and $\dim$ the dimension of a vector space.

A.1 Upper bound: proof of Theorem 3.3

We want to establish a condition on $\|f - f_0\|_2$ under which the second kind error of the test is controlled by $\beta$, namely
$$P_{Qf}\big( T_J \le t_{1-\alpha}(J) \big) \le \beta. \quad (15)$$
Denoting by $t_\beta(J)$ the $\beta$-quantile of $T_J$ under $P_{Qf}$, the condition in Equation (15) holds as soon as $t_{1-\alpha}(J) < t_\beta(J)$. Hence, we provide an upper bound for $t_{1-\alpha}(J)$ and a lower bound for $t_\beta(J)$.

Upper bound for $t_{1-\alpha}(J)$.

Since $\mathbb{E}_{Qf_0}[T_J] = 0$, we have by Markov's inequality, for all $u > 0$,
$$P_{Qf_0}\Big( |T_J| > \sqrt{\mathrm{Var}_{Qf_0}(T_J)/u} \Big) \le u.$$
So, considering the $(1-\alpha)$-quantile of $T_J$ under $(H_0)$, we have
$$P_{Qf_0}\Big( T_J > \sqrt{\mathrm{Var}_{Qf_0}(T_J)/\alpha} \Big) \le \alpha.$$
So
$$t_{1-\alpha}(J) \le \sqrt{\mathrm{Var}_{Qf_0}(T_J)/\alpha}. \quad (16)$$

Note that one can rewrite $T_J$ as
$$T_J = \frac{1}{n(n-1)} \sum_{i \ne i'} h(Z_i, Z_{i'}),$$
where
$$h(z, z') = \sum_{k=0}^{2^J - 1} \big( z_k - \alpha_{J,k}(f_0) \big)\big( z'_k - \alpha_{J,k}(f_0) \big).$$

In order to provide an upper bound for the variance $\mathrm{Var}_{Qf}(T_J)$, let us first state a lemma controlling the variance of a $U$-statistic of order $2$. This result is a particular case of Lemma 8 in [MAL+19].