Gradual Release of Sensitive Data under Differential Privacy

Fragkiskos Koufogiannis, Shuo Han, George J. Pappas
Department of Electrical and Systems Engineering, University of Pennsylvania. Emails: {fkouf,hanshuo,pappasg}. This work was supported in part by the TerraSwarm Research Center, one of six centers supported by the STARnet phase of the Focus Center Research Program (FCRP), a Semiconductor Research Corporation program sponsored by MARCO and DARPA.

We introduce the problem of releasing sensitive data under differential privacy when the privacy level is subject to change over time. Existing work assumes that the privacy level is determined by the system designer as a fixed value before sensitive data is released. For certain applications, however, users may wish to relax the privacy level for subsequent releases of the same data after either a re-evaluation of the privacy concerns or the need for better accuracy. Specifically, given a database containing sensitive data, we assume that a response $y_1$ that preserves $\epsilon_1$-differential privacy has already been published. Then, the privacy level is relaxed to $\epsilon_2$, with $\epsilon_2 > \epsilon_1$, and we wish to publish a more accurate response $y_2$ while the joint response $(y_1, y_2)$ preserves $\epsilon_2$-differential privacy. How much accuracy is lost in the scenario of gradually releasing two responses $y_1$ and $y_2$ compared to the scenario of releasing a single response that is $\epsilon_2$-differentially private? Our results show that there exists a composite mechanism that achieves no loss in accuracy.

We consider the case in which the private data lies within $\mathbb{R}^n$ with an adjacency relation induced by the $\ell_1$-norm, and we focus on mechanisms that approximate identity queries. We show that the same accuracy can be achieved in the case of gradual release through a mechanism whose outputs can be described by a lazy Markov stochastic process. This stochastic process has a closed-form expression and can be efficiently sampled. Our results are applicable beyond identity queries. To this end, we demonstrate that our results can be applied in several cases, including Google’s RAPPOR project, trading of sensitive data, and controlled transmission of private data in a social network. Finally, we conjecture that gradual release of data without performance loss is an intrinsic property of differential privacy and, thus, holds in more general settings.

1 Introduction

Differential privacy is a framework that provides rigorous privacy guarantees for the release of sensitive data. The intrinsic trade-off between the privacy guarantees and the accuracy of the privacy-preserving mechanism is controlled by the privacy level $\epsilon$; smaller values of $\epsilon$ imply stronger privacy and less accuracy. Specifically, end users, who are interested in the output of the mechanism, demand acceptable accuracy of the privacy-preserving mechanism, whereas owners of sensitive data are interested in strong enough privacy guarantees.

Existing work on differential privacy assumes that the privacy level is determined prior to the release of any data and remains constant throughout the life of the privacy-preserving mechanism. However, for certain applications, the privacy level may need to be revised after data has been released, due to either users’ need for improved accuracy or owners’ re-evaluation of the privacy concerns. One such application is trading of private data, where the owners re-evaluate their privacy concerns after monetary payments. Specifically, the end users initially access private data under $\epsilon_1$-privacy guarantees and they later decide to “buy” more accurate data, relax the privacy level to $\epsilon_2$, and enjoy better accuracy. Furthermore, the need for more accurate responses may dictate a change in the privacy level. In particular, a database containing sensitive data is persistent over time; e.g. a database of health records contains the same patients with the same health history over several years. Future uses of the database may require better accuracy, especially after a threat is suspected (e.g. virus spread, security breach). These two example applications share the same core questions.

Is it possible to release a preliminary response with $\epsilon_1$-privacy guarantees and, later, release a more accurate and less private response with overall $\epsilon_2$-privacy guarantees? How does this scenario compare to publishing a single response under $\epsilon_2$-privacy guarantees? In fact, is the performance of the second response damaged by the preliminary one?

Composition theorems [1] provide a simple, but suboptimal, solution to gradually releasing sensitive data. Given an initial privacy level $\epsilon_1$, a noisy, privacy-preserving response $y_1$ is generated. Later, the privacy level is increased to a new value $\epsilon_2$ and a new response $y_2$ is published. For an overall privacy level of $\epsilon_2$, the second response $y_2$ needs to be $(\epsilon_2 - \epsilon_1)$-private, according to the composition theorem. Therefore, the accuracy of the second response deteriorates because of the initial release $y_1$.
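To make the cost of composition concrete, the following minimal sketch compares the mean-squared error of the second response in the two scenarios. It assumes scalar additive noise with density $\frac{\epsilon}{2} e^{-\epsilon |v|}$ (the Laplace noise discussed later in the paper), whose variance is $2/\epsilon^2$; the function name and the numeric privacy levels are illustrative choices, not taken from the paper.

```python
def laplace_mse(eps: float) -> float:
    """Variance of noise with density (eps/2) * exp(-eps * |v|), i.e. 2 / eps^2."""
    if eps <= 0:
        raise ValueError("privacy level must be positive")
    return 2.0 / eps**2


eps1, eps2 = 0.5, 1.0  # illustrative privacy levels with eps2 > eps1

# Composition: the second response may only spend the remaining budget eps2 - eps1.
mse_composition = laplace_mse(eps2 - eps1)
# Benchmark: a single response released directly at level eps2.
mse_single = laplace_mse(eps2)

print(f"second response under composition: MSE = {mse_composition}")
print(f"single eps2-private response:      MSE = {mse_single}")
```

With these numbers the second response under composition is four times noisier than a direct release at level $\epsilon_2$, which is the gap the paper aims to close.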

In this work, we derive a composite mechanism which exhibits no loss in accuracy after the privacy level is relaxed. This mechanism employs correlation between successive responses, and, to the best of our knowledge, is the first mechanism that performs gradual release of sensitive data.

1.1 Our Results

This work introduces the problem of gradually releasing sensitive data. Our results focus on the case of vector-valued sensitive data $u \in \mathbb{R}^n$ with an $\ell_1$-norm adjacency relation. Our first result states that, for the one-dimensional ($n = 1$) identity query, there is an algorithm which relaxes privacy in two steps without sacrificing any accuracy. Although our technical treatment focuses on identity queries, our results are applicable to a broader family of queries. We also prove the Markov property for this algorithm and, thus, we can relax privacy in any number of steps without additional computational complexity. These two results provide a different perspective on differential privacy, and lead to the definition of a lazy Markov stochastic process indexed by the privacy level $\epsilon$. Gradually releasing sensitive data is performed by sampling once from this stochastic process. We also extend the results to the high-dimensional case.

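On a theoretical level, our contributions add a whole new dimension to differential privacy: that of a varying parameter $\epsilon$. We focus on the mechanism that adds Laplace-distributed noise to the private data $u \in \mathbb{R}^n$:

$$Q_\epsilon u := u + \left[V_\epsilon^{(1)}, \ldots, V_\epsilon^{(n)}\right]^T, \tag{1}$$

where $\epsilon$ is the privacy level with respect to the $\ell_1$-norm $\|\cdot\|_1$, and $V_\epsilon^{(1)}, \ldots, V_\epsilon^{(n)}$ are independent and identically distributed samples from the stochastic process $\{V_\epsilon\}_{\epsilon > 0}$, which has the following properties:

  1. $\{V_\epsilon\}_{\epsilon > 0}$ is Markov: for $\epsilon_1 < \epsilon_2 < \epsilon_3$, $V_{\epsilon_1} \perp V_{\epsilon_3} \mid V_{\epsilon_2}$.

  2. $V_\epsilon$ is Laplace-distributed: $\mathbb{P}(V_\epsilon = x) = \frac{\epsilon}{2} e^{-\epsilon |x|}$.

  3. $\{V_\epsilon\}_{\epsilon > 0}$ is lazy, i.e. there is positive probability of not changing value; for $\epsilon_1 < \epsilon_2$:

    $$\mathbb{P}(V_{\epsilon_1} = x \mid V_{\epsilon_2} = y) = \left(\frac{\epsilon_1}{\epsilon_2}\right)^2 \delta(x - y) + \left(1 - \left(\frac{\epsilon_1}{\epsilon_2}\right)^2\right) \frac{\epsilon_1}{2} e^{-\epsilon_1 |x - y|},$$

    where $\delta$ is Dirac’s delta function. For a fixed $\epsilon$, mechanism (1) reduces to the Laplace mechanism.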

Mechanism (1) has the following properties and, thus, performs gradual release of private data:

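  • Privacy: For any set of privacy levels $\{\epsilon_i\}_{i=1}^m$, the mechanism that responds with $\{Q_{\epsilon_i} u\}_{i=1}^m$ is $(\max_i \epsilon_i)$-private.

  • Accuracy: For a fixed $\epsilon$, the mechanism $Q_\epsilon$ is the optimal $\epsilon$-private mechanism.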

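In practice, gradual release of private data is achieved by sampling the stochastic process $\{V_\epsilon\}_{\epsilon > 0}$:

  1. Draw a single sample $\{v_\epsilon\}_{\epsilon > 0}$ from the stochastic process $\{V_\epsilon\}_{\epsilon > 0}$.

  2. Compute the signal $y_\epsilon = u + v_\epsilon$, $\epsilon > 0$.

  3. For $\epsilon_1$-privacy guarantees, release the random variable $y_{\epsilon_1}$.

  4. Once the privacy level is relaxed from $\epsilon_1$ to $\epsilon_2$, where $\epsilon_2 \geq \epsilon_1$, release the random variable $y_{\epsilon_2}$.

  5. In order to relax the privacy level arbitrarily many times, $\epsilon_1 \to \epsilon_2 \to \cdots \to \epsilon_m$, repeat the last step.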
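The steps above can be sketched in code under two assumptions that are ours rather than the paper's: the grid of privacy levels is known in advance, and the sample path is generated from the least private level downwards. Each downward step keeps the current value with probability $(\epsilon_i/\epsilon_{i+1})^2$ and otherwise adds fresh Laplace noise with parameter $\epsilon_i$, which yields Laplace marginals at every level. The function name `sample_noise_path` and the private value `u` are hypothetical.

```python
import numpy as np


def sample_noise_path(eps_levels, rng):
    """Sample (V_eps1, ..., V_epsm) for strictly increasing levels eps1 < ... < epsm."""
    eps = np.asarray(eps_levels, dtype=float)
    if np.any(eps <= 0) or np.any(np.diff(eps) <= 0):
        raise ValueError("levels must be positive and strictly increasing")
    path = np.empty(len(eps))
    # Least private level: Laplace noise with density (eps/2) * exp(-eps * |v|).
    path[-1] = rng.laplace(scale=1.0 / eps[-1])
    # Walk towards stronger privacy. Laziness: the value is kept with
    # probability (eps_lo / eps_hi)^2, otherwise Laplace(eps_lo) noise is added.
    for i in range(len(eps) - 2, -1, -1):
        if rng.random() < (eps[i] / eps[i + 1]) ** 2:
            path[i] = path[i + 1]
        else:
            path[i] = path[i + 1] + rng.laplace(scale=1.0 / eps[i])
    return path


rng = np.random.default_rng(0)
levels = [0.1, 0.5, 1.0, 2.0]   # illustrative privacy levels
u = 3.0                         # hypothetical private value
responses = u + sample_noise_path(levels, rng)
# responses[i] is released once the privacy level has been relaxed to levels[i].
```

When the future privacy levels are not known in advance, the same process has to be sampled in the forward direction, conditioning each new noise value on the previous one.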

More formally, our main result derives a composite mechanism that gradually releases private data by relaxing the privacy level in an arbitrary number of steps.

Theorem 1 (A. Gradual Privacy as a Composite Mechanism).

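Let $\mathbb{R}^n$ be the space of private data equipped with an $\ell_1$-norm adjacency relation. Consider $m$ privacy levels $\{\epsilon_i\}_{i=1}^m$ such that $0 \leq \epsilon_1 \leq \cdots \leq \epsilon_m$, which successively relax the privacy level. Then, there exists a composite mechanism $Q$ of the form

$$Q u := \left[u + V_1, \ldots, u + V_m\right], \tag{2}$$

such that:

  1. The restriction of the mechanism $Q$ to the first $j$ coordinates is $\epsilon_j$-private, for any $j \in \{1, \ldots, m\}$.

  2. Each coordinate $i$ of the mechanism $Q$ achieves the optimal mean-squared error $\mathbb{E}\|V_i\|_2^2$.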

The mechanism that satisfies Theorem 1 has a closed-form expression and provides a new perspective on differential privacy. Instead of designing composite mechanisms of the form (2), we consider the continuum of privacy levels $\epsilon \in (0, \infty)$. Our results are more succinctly stated in terms of a stochastic process $\{V_\epsilon\}_{\epsilon > 0}$. A composite mechanism is recovered from the stochastic process by sampling the process at a finite set of privacy levels $\{\epsilon_i\}_{i=1}^m$.

Theorem 1 (B. Gradual Privacy as a Stochastic Process).

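Let $\mathbb{R}^n$ be the space of private data equipped with the $\ell_1$-norm. Then, there exists a stochastic process $\{V_\epsilon\}_{\epsilon > 0}$ that defines the family of mechanisms $Q_\epsilon$ parametrized by $\epsilon$:

$$Q_\epsilon u := u + V_\epsilon, \quad \epsilon > 0, \tag{3}$$

such that:

  • Privacy: For any $\epsilon > 0$, the mechanism that releases the signal $\{u + V_\sigma\}_{\sigma \in (0, \epsilon]}$ is $\epsilon$-private.

  • Accuracy: The mechanism $Q_\epsilon$ that releases the random variable $u + V_\epsilon$ is the optimal $\epsilon$-private mechanism, i.e. the noise sample $V_\epsilon$ achieves the optimal mean-squared error $\mathbb{E}\|V_\epsilon\|_2^2$.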

Figure 1: Gradual release of identity queries is achieved with the use of the stochastic process $V_\epsilon$ for $\epsilon > 0$. For tight values of privacy ($\epsilon \to 0$), high values of noise ($|V_\epsilon| \to \infty$) are returned, whereas almost zero samples ($V_\epsilon \approx 0$) are returned for large privacy budgets ($\epsilon \to \infty$). The process is Markov; future samples depend only on the current value of the process, which eases implementation. Furthermore, the process is lazy; the value of the process changes only a few times.

From a more practical point of view, our results are applicable to cases beyond identity queries. Specifically, our results are directly applicable to a broad family of privacy-preserving mechanisms that are built upon the Laplace mechanism and, informally, have the following form. The sensitive data is initially preprocessed, then the Laplace mechanism is invoked, and, finally, a post-processing step occurs. Under the assumption that the preprocessing step does not depend on the privacy level, gradual release of sensitive data is possible. We demonstrate the applicability of our results on Google’s RAPPOR project [2], which analyzes software features that individuals use while respecting their privacy. In particular, if a software feature is suspected to be malicious, the privacy level can be gradually relaxed and a more accurate analysis can be performed. In another direction, our results broaden the spectrum of applications of differential privacy. To this end, we present an application to social networks where users have different privacy concerns against close friends, acquaintances, and strangers.

We conclude our paper with a conjecture. Although the present work focuses on mechanisms that add Laplace-distributed noise, we conjecture that the feasibility of gradually releasing sensitive data is a more general property of differential privacy. In particular, we formulate the conjecture that repeatedly relaxing the privacy level without loss of accuracy is possible for a larger family of privacy-aware mechanisms.

1.2 Previous Work

Differential privacy is an active field of research and a rich spectrum of differentially private mechanisms has appeared in the literature. The exponential mechanism [1] is a powerful and generic tool for building differentially private mechanisms. In particular, mechanisms that efficiently approximate linear (counting) queries have received a lot of attention [3], [4], [5]. Besides counting queries, privacy-aware versions of more complex quantities have been introduced, such as signal filtering [6], optimization problems [7], [8], and allocation problems [9]. In addition to the theoretical work, differential privacy has been deployed in software tools [10].

The aforementioned work assumes that the privacy level $\epsilon$ is a designer’s choice that is held fixed throughout the life of the privacy-aware mechanism. To the best of our knowledge, our work is the first approach that considers privacy-aware mechanisms with a varying privacy level $\epsilon$. Gradually releasing private data resembles the setting of differential privacy under continuous observation, which was first studied in [11]. In the setting of [11], the privacy level remains fixed while more sensitive data is being added to the database and more responses are released. In contrast, our setting assumes that both the sensitive data and the quantity of interest are fixed and the privacy level $\epsilon$ is varying.

Gradual release of sensitive data is closely related to optimality results. Work in [3] established optimality results in an asymptotic sense (with the size of the database). Instead, our work requires exact optimality results and, therefore, is presented within a tighter version of differential privacy that was explored in [12], [13], where exact optimality results exist. This tighter notion, which is targeted at metric spaces and which we call Lipschitz privacy, allows for the use of optimization techniques and calculus tools. Prior work on Lipschitz privacy includes the exact optimality of the Laplace mechanism, which is established under Lipschitz privacy in [13], [14].

On a more technical level, most prior work on differential privacy [7], [8], [9] introduces differentially private mechanisms that are built upon the Laplace mechanism and variations of it. Although building upon the Laplace mechanism limits the solution space, there is a good reason for doing so. Specifically, for non-trivial applications, the space of probability measures can be extremely rich and hard to deal with. Technically, our approach deviates from prior work by searching over the whole space of differentially private mechanisms. Work in [15] is another example that proposes a non-Laplace distribution in order to achieve better performance on subsequent queries while satisfying overall differential privacy constraints. The Laplace mechanism, then, naturally emerges as the optimal mechanism.

2 Background Information

2.1 Differential Privacy

The framework of differential privacy [16], [17] dictates that, whenever sensitive data is accessed, a noisy response is returned. The statistics of the injected noise are deliberately designed to ensure two things. First, an adversary that observes the noisy response cannot confidently infer the original sensitive data. The privacy level is parametrized by $\epsilon$, where smaller values of $\epsilon$ imply stronger privacy guarantees. Second, the noisy response can still be used as a surrogate of the exact response without severe performance degradation. The accuracy of the noisy response is quantified by the mean-squared error from the exact response.

Work in [16] defined differential privacy, which provides strong privacy guarantees against a powerful adversary.

Definition 2 (Differential Privacy).

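Let $U$ be a set of private data, $\mathcal{A} \subseteq U^2$ be a symmetric binary relation (called adjacency relation) and $Y$ be a set of possible responses. For $\epsilon > 0$, the randomized mapping $Q : U \to \Delta(Y)$ (called mechanism) is $\epsilon$-differentially private if

$$\mathbb{P}(Qu \in S) \leq e^{\epsilon}\, \mathbb{P}(Qu' \in S), \quad \forall (u, u') \in \mathcal{A},\ \forall S \subseteq Y. \tag{4}$$

Remark 1.

We assume the existence of a rich-enough $\sigma$-algebra on the set of possible responses $Y$. Then, $\Delta(Y)$ denotes the set of probability measures over $Y$.

Let $y \sim Qu$ be a noisy response produced by the $\epsilon$-differentially private mechanism $Q$. For brevity, we say that “output $y$ preserves $\epsilon$-privacy of the input $u$”.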

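The adjacency relation $\mathcal{A}$ captures the aspects of the private data $u$ that are deemed sensitive. Consider a scheme with $n$ users, where each user $i$ contributes her real-valued private data $u_i \in \mathbb{R}$, and a private database $u = [u_1, \ldots, u_n] \in \mathbb{R}^n$ is composed. For $\alpha > 0$, an adjacency relation that captures the participation of a single individual in the aggregating scheme is defined as:

$$(u, u') \in \mathcal{A}_{\ell_0} \iff \exists j \text{ s.t. } u_i = u'_i \ \forall i \neq j \text{ and } |u_j - u'_j| \leq \alpha. \tag{5}$$

Adjacency relation $\mathcal{A}_{\ell_0}$ can be relaxed to $\mathcal{A}_{\ell_1}$, which is induced by the $\ell_1$-norm and is defined as:

$$(u, u') \in \mathcal{A}_{\ell_1} \iff \|u - u'\|_1 \leq \alpha, \tag{6}$$

where it holds that $\mathcal{A}_{\ell_0} \subseteq \mathcal{A}_{\ell_1}$.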

Resilience to post-processing establishes that any post-processing on the output of an -differentially private mechanism cannot hurt the privacy guarantees.

Proposition 3 (Resilience to Post-Processing).

Let $Q : U \to \Delta(Y)$ be an $\epsilon$-differentially private mechanism and $g : Y \to Z$ be a possibly randomized function. Then, the mechanism $g \circ Q$ is also $\epsilon$-differentially private.

More complicated mechanisms can be defined from simple ones using the composition theorem.

Proposition 4 (Composition).

Let mechanisms $Q_1, Q_2 : U \to \Delta(Y)$ respectively satisfy $\epsilon_1$- and $\epsilon_2$-differential privacy. Then, the composite mechanism $Q : U \to \Delta(Y^2)$ defined by $Q = (Q_1, Q_2)$ is $(\epsilon_1 + \epsilon_2)$-differentially private.

Proposition 4 provides privacy guarantees whenever the same sensitive data is repeatedly used. However, the resulting privacy level $\epsilon_1 + \epsilon_2$ given by Proposition 4 is an upper bound and can severely over-estimate the actual privacy level. The mechanism presented in this paper introduces correlation between mechanisms $Q_1$ and $Q_2$, so that it provides much stronger privacy guarantees.

2.2 Lipschitz Privacy

Lipschitz privacy [12], [13] is a slightly stronger version of differential privacy and is often used when the data is defined on metric spaces.

Definition 5 (Lipschitz Privacy).

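Let $(U, d)$ be a metric space and $Y$ be the set of possible responses. For $\epsilon > 0$, the mechanism $Q$ is $\epsilon$-Lipschitz private if the following Lipschitz condition holds:

$$\left|\ln \mathbb{P}(Qu \in S) - \ln \mathbb{P}(Qu' \in S)\right| \leq \epsilon\, d(u, u'), \quad \forall u, u' \in U,\ \forall S \subseteq Y. \tag{7}$$

Lipschitz privacy is closely related to the original definition of differential privacy, where the adjacency relation $\mathcal{A}$ in differential privacy is defined through the metric $d$. In fact, any Lipschitz private mechanism is also differentially private.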

Proposition 6.

For any $\alpha > 0$, an $\epsilon$-Lipschitz private mechanism $Q$ is $\alpha\epsilon$-differentially private under the adjacency relation $\mathcal{A}$:

$$(u, u') \in \mathcal{A} \iff d(u, u') \leq \alpha. \tag{8}$$

Adjacency relation $\mathcal{A}_{\ell_1}$ defined in (6) can be captured by the $\ell_1$-norm under the notion of Lipschitz privacy; the metric is $d(u, u') = \|u - u'\|_1$.

Our results are stated within the Lipschitz privacy framework. Proposition 6 implies that our privacy results remain valid within the framework of differential privacy. For brevity, we refer to an $\epsilon$-Lipschitz private mechanism as $\epsilon$-private, with the understanding that a differentially private mechanism can be derived from it.

Similar to differential privacy, Lipschitz privacy is preserved under post-processing (Proposition 3) and composition of mechanisms (Proposition 4). Compared to differential privacy, Lipschitz privacy is more convenient to work with when the data and adjacency relation are defined on a metric space, which allows for the use of calculus tools. Under mild assumptions, the Lipschitz constraint (7) is equivalent to a derivative bound. In particular, for $U = \mathbb{R}^n$ equipped with the metric induced by the norm $\|\cdot\|$, a mechanism $Q$ is $\epsilon$-Lipschitz private if

$$\left\|\nabla_u \ln \mathbb{P}(Qu = y)\right\|_* \leq \epsilon, \tag{9}$$

where $\|\cdot\|_*$ is the dual norm of $\|\cdot\|$. In practice, we check condition (9) to establish the privacy properties of mechanism $Q$.
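As a short worked check of condition (9), consider the scalar case $U = \mathbb{R}$ with additive noise of density $\frac{\epsilon}{2} e^{-\epsilon |v|}$. The derivation below is an illustrative sketch; the noise density is assumed here to be the Laplace density used later in the paper, and the bound is evaluated wherever the derivative exists ($y \neq u$).

```latex
\begin{align*}
\mathbb{P}(Qu = y) &= \frac{\epsilon}{2}\, e^{-\epsilon |y - u|}, \\
\ln \mathbb{P}(Qu = y) &= \ln\frac{\epsilon}{2} - \epsilon\, |y - u|, \\
\left| \frac{\partial}{\partial u} \ln \mathbb{P}(Qu = y) \right|
  &= \left| \epsilon \, \operatorname{sgn}(y - u) \right| = \epsilon
  \quad \text{for } y \neq u .
\end{align*}
```

The bound holds with equality, so this noise spends exactly the privacy budget $\epsilon$ at every output.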

2.3 Optimality of the Laplace Mechanism

Computing the optimal private mechanism for a fixed privacy level is considered an open problem. Specifically, let $U$ be the space of private data, $\mathcal{A}$ be an adjacency relation, $q : U \to Y$ be a query, and $\epsilon$ be a fixed privacy level. The exponential mechanism [1] is a popular technique for constructing private mechanisms.

Proposition 7 (Exponential Mechanism).

Let be -Lipschitz in . Consider the mechanism whose output satisfies


Then, is -Lipschitz private.

The Laplace mechanism is a special instance of the exponential mechanism for real spaces $U = \mathbb{R}^n$.

Definition 8 (Laplace Mechanism).

Let $U = \mathbb{R}^n$ be the space of private data. The Laplace mechanism is defined as:

$$Qu = u + V, \quad \text{where } \mathbb{P}(V = v) \propto e^{-\epsilon \|v\|_1}. \tag{11}$$

The Laplace mechanism can be shown to be $\epsilon$-differentially private. In general, however, the Laplace mechanism is suboptimal in the sense of minimum mean-squared error. For the single-dimensional case, the staircase mechanism [18] is the optimal $\epsilon$-differentially private mechanism; it adds noise whose distribution is shown in Figure 2. However, the Laplace mechanism is proven to be the optimal $\epsilon$-Lipschitz private mechanism in the sense of both minimum entropy [14] and minimum mean-squared error [13], whereas the staircase mechanism fails to satisfy Lipschitz privacy due to its discontinuous probability density function.
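A minimal sketch of the Laplace mechanism for the identity query follows. It assumes the $\ell_1$-norm setting, so that each coordinate receives independent noise with density $\frac{\epsilon}{2} e^{-\epsilon |v|}$ (NumPy's `laplace` with `scale = 1/eps`); the function name and the example data are illustrative.

```python
import numpy as np


def laplace_mechanism(u, eps, rng):
    """Return u + V with i.i.d. coordinates V_i of density (eps/2) * exp(-eps * |v|)."""
    if eps <= 0:
        raise ValueError("privacy level must be positive")
    u = np.asarray(u, dtype=float)
    return u + rng.laplace(scale=1.0 / eps, size=u.shape)


rng = np.random.default_rng(0)
u = np.array([1.0, -2.0, 0.5])   # hypothetical private data
y = laplace_mechanism(u, eps=1.0, rng=rng)
print(y)  # a noisy surrogate of u; smaller eps gives noisier output
```

Each coordinate has noise variance $2/\epsilon^2$, so the total mean-squared error of this sketch is $2n/\epsilon^2$.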

Figure 2: The staircase mechanism is the optimal $\epsilon$-differentially private mechanism, whereas the Laplace mechanism is the optimal $\epsilon$-Lipschitz private mechanism.

Theorem 9 ([13] Optimality of Laplace).

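Consider the $\epsilon$-Lipschitz private (in the $\ell_1$-norm) mechanism $Q : \mathbb{R}^n \to \Delta(\mathbb{R}^n)$ of the form $Qu = u + V$, with $V \sim g \in \Delta(\mathbb{R}^n)$. Then, the Laplace mechanism that adds noise with density $l(v) = \left(\frac{\epsilon}{2}\right)^n e^{-\epsilon \|v\|_1}$ minimizes the mean-squared error. Namely, for any density $g$, we have:

$$\mathbb{E}_{V \sim g} \|V\|_2^2 \geq \mathbb{E}_{V \sim l} \|V\|_2^2 = \frac{2n}{\epsilon^2}. \tag{12}$$

The optimal private mechanism characterizes the privacy-performance trade-off and is required for gradually releasing sensitive data. Thus, optimality of the Laplace mechanism in Theorem 9 is a key ingredient in our results and renders the problem tractable.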

3 Gradual Release of Private Data

The problem of gradually releasing private data is now formulated. Initially, we focus on a single privacy level relaxation from $\epsilon_1$ to $\epsilon_2$ and a single-dimensional space of private data $U = \mathbb{R}$. Subsections 3.2 and 3.3 present extensions to high-dimensional spaces and multiple rounds of privacy level relaxations, respectively.

Consider two privacy levels $\epsilon_1$ and $\epsilon_2$ with $\epsilon_2 \geq \epsilon_1 \geq 0$. We wish to design a composite mechanism $Q_{\epsilon_1 \to \epsilon_2} : U \to \Delta(Y \times Y)$ that performs gradual release of data. The first and second coordinates respectively refer to the initial $\epsilon_1$-private and the subsequent $\epsilon_2$-private responses. In practice, given privacy levels $\epsilon_1$ and $\epsilon_2$ and an input $u \in U$, we sample $(y_1, y_2)$ from the distribution $Q_{\epsilon_1 \to \epsilon_2} u$. Initially, only coordinate $y_1$ is published, satisfying $\epsilon_1$-privacy guarantees. Once the privacy level is relaxed to $\epsilon_2$, response $y_2$ is released as a more accurate response of the same query on the same private data.

An adversary that wishes to infer the private input $u$ eventually has access to both responses $y_1$ and $y_2$. Therefore, the pair $(y_1, y_2)$ needs to satisfy $\epsilon_2$-privacy. On the other hand, an honest user wishes to maximize the accuracy of the response and, therefore, she is tempted to use an estimator $\theta(y_1, y_2)$ and infer a more accurate response $\hat{y}_2 = \theta(y_1, y_2)$. In order to relieve honest users from any computational burden, we wish the best estimator to be the truncation:

$$\theta(y_1, y_2) = y_2. \tag{13}$$

The composition theorem [1] provides a trivial, yet highly conservative, approach. Specifically, compositional rules imply that, if $y_1$ satisfies $\epsilon_1$-privacy and $(y_1, y_2)$ satisfies $\epsilon_2$-privacy, coordinate $y_2$ itself should be $(\epsilon_2 - \epsilon_1)$-private. In the extreme case that $\epsilon_2 - \epsilon_1$ is small, response $y_2$ alone is expected to be $(\epsilon_2 - \epsilon_1)$-private and, therefore, is highly corrupted by noise. This is unacceptable, since estimator (13) yields an even noisier response than the initial response $y_1$. Even if honest users are expected to compute more complex estimators than the truncation one in (13), the approach dictated by the composition theorem can still be unsatisfactory.

Specifically, consider the following two scenarios:

  1. An $\epsilon_1$-private response $y_1$ is initially released. Once the privacy level is relaxed from $\epsilon_1$ to $\epsilon_2$, a supplementary response $y_2$ is released.

  2. No response is initially released. Response $y_2$ is released as soon as the privacy level is relaxed to $\epsilon_2$.

Then, there is no guarantee that the best estimator in Scenario 1 will match the accuracy of the response in Scenario 2. An accuracy gap between the two scenarios would severely impact differential privacy. Specifically, the system designer needs to be strategic when choosing a privacy level. Differently stated, a market of private data based on composition theorems would exhibit friction.

The key idea to overcome this friction is to introduce correlation between responses $y_1$ and $y_2$. In this work, we focus on Euclidean spaces $U = \mathbb{R}^n$ and mechanisms $Q : U \to \Delta(U)$ that approximate the identity query $q(u) = u$. Our main result states that a frictionless market of private data is feasible and Scenarios 1 and 2 are equivalent. This result has multi-fold implications:

  • A system designer is not required to be strategic with the choice of the privacy level. Specifically, she can initially under-estimate the required privacy level with $\epsilon_1$ and she can later fine-tune it to $\epsilon_2$ without hurting the accuracy of the final response.

  • A private data market can exist and private data can be traded “by the pound”. An $\epsilon_1$-private response $y_1$ can be initially purchased. Next, a supplementary payment can be made in return for a privacy level relaxation to $\epsilon_2$ and a refined response $y_2$. The accuracy of the refined response $y_2$ is, then, unaffected by the initial transaction and is controlled only by the final privacy level $\epsilon_2$.

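More concretely, given privacy levels $\epsilon_1$ and $\epsilon_2$ with $\epsilon_2 \geq \epsilon_1 \geq 0$, we wish to design a composite mechanism $Q_{\epsilon_1 \to \epsilon_2}$, with $(y_1, y_2) \sim Q_{\epsilon_1 \to \epsilon_2} u$, that has the following properties:

  1. The restriction of $Q_{\epsilon_1 \to \epsilon_2}$ to the first coordinate should match the performance of the optimal $\epsilon_1$-private mechanism $Q_{\epsilon_1}$. More restrictively, the first coordinate of the composite mechanism should be distributed identically to the optimal $\epsilon_1$-private mechanism $Q_{\epsilon_1}$:

    $$\mathbb{P}(y_1 \in S) = \mathbb{P}(Q_{\epsilon_1} u \in S), \quad \forall u \in U,\ \forall S \subseteq Y. \tag{14}$$

  2. The restriction of $Q_{\epsilon_1 \to \epsilon_2}$ to the first coordinate should be $\epsilon_1$-private. This property is implied by constraint 1.

  3. The restriction of $Q_{\epsilon_1 \to \epsilon_2}$ to the second coordinate should match the performance of the optimal $\epsilon_2$-private mechanism $Q_{\epsilon_2}$. Similarly to the first coordinate, the second coordinate of the composite mechanism must be distributed identically to the optimal $\epsilon_2$-private mechanism $Q_{\epsilon_2}$:

    $$\mathbb{P}(y_2 \in S) = \mathbb{P}(Q_{\epsilon_2} u \in S), \quad \forall u \in U,\ \forall S \subseteq Y. \tag{15}$$

  4. Once both coordinates are published, $\epsilon_2$-privacy should be guaranteed. According to Lipschitz privacy, the requirement is stated as follows:

    $$\left|\ln \mathbb{P}(Q_{\epsilon_1 \to \epsilon_2} u \in S) - \ln \mathbb{P}(Q_{\epsilon_1 \to \epsilon_2} u' \in S)\right| \leq \epsilon_2\, d(u, u'), \quad \forall u, u' \in U,\ \forall S \subseteq Y \times Y. \tag{16}$$

Equations (14) and (15) require knowledge of the optimal $\epsilon$-private mechanism. In general, computing the $\epsilon$-private mechanism that maximizes a reasonable performance criterion is still an open problem. Theorem 9 establishes the optimality of the Laplace mechanism as the optimal private approximation of the identity query.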

3.1 Single-Dimensional Case

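Initially, we consider the single-dimensional case where $U = \mathbb{R}$ is equipped with the absolute value. Theorem 9 establishes the optimal $\epsilon$-private mechanism that is required by Equations (14) and (15):

$$Q_\epsilon u = u + V, \quad \text{where } V \sim \frac{\epsilon}{2} e^{-\epsilon |V|}. \tag{17}$$

Mechanism (17) minimizes the mean-squared error from the identity query among all $\epsilon$-private mechanisms that use additive noise $V$:

$$\mathbb{E}(Q_\epsilon u - u)^2 = \mathbb{E} V^2 = \frac{2}{\epsilon^2}. \tag{18}$$

Theorem 10 establishes the existence of a composite mechanism that relaxes privacy from $\epsilon_1$ to $\epsilon_2$ without any loss of performance.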

Theorem 10.

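Consider privacy levels $\epsilon_1$ and $\epsilon_2$ with $\epsilon_2 > \epsilon_1 > 0$, and mechanisms of the form:

$$Q_1 u := u + V_1 \quad \text{and} \quad Q_2 u := u + V_2, \quad \text{with } (V_1, V_2) \sim l \in \Delta(\mathbb{R}^2). \tag{19}$$

Then, for density $l = l_{\epsilon_1, \epsilon_2}$ with:

$$l_{\epsilon_1, \epsilon_2}(x, y) = \frac{\epsilon_1^2}{2 \epsilon_2}\, e^{-\epsilon_2 |y|}\, \delta(x - y) + \frac{\epsilon_1 (\epsilon_2^2 - \epsilon_1^2)}{4 \epsilon_2}\, e^{-\epsilon_1 |x - y| - \epsilon_2 |y|}, \tag{20}$$

where $\delta$ is the Dirac delta function, the following properties hold:

  1. The mechanism $Q_1$ is $\epsilon_1$-private.

  2. The mechanism $Q_1$ is optimal, i.e. $Q_1$ minimizes the mean-squared error $\mathbb{E} V_1^2$.

  3. The mechanism $(Q_1, Q_2)$ is $\epsilon_2$-private.

  4. The mechanism $Q_2$ is optimal, i.e. $Q_2$ minimizes the mean-squared error $\mathbb{E} V_2^2$.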


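Proof. Consider the mechanism $Q = (Q_1, Q_2)$ induced by the noise density (20). We prove that this mechanism satisfies all the desired properties:

  1. The first coordinate is Laplace-distributed with parameter $\epsilon_1$. For $x \geq 0$, we get:

    $$\int_{\mathbb{R}} l(x, y)\, dy = \frac{\epsilon_1^2}{2 \epsilon_2} e^{-\epsilon_2 x} + \frac{\epsilon_1 (\epsilon_2^2 - \epsilon_1^2)}{4 \epsilon_2} \left( \frac{2 \epsilon_2\, e^{-\epsilon_1 x} - 2 \epsilon_1\, e^{-\epsilon_2 x}}{\epsilon_2^2 - \epsilon_1^2} \right) = \frac{\epsilon_1}{2} e^{-\epsilon_1 x}. \tag{21}$$

    The case $x < 0$ follows from the symmetry $l(x, y) = l(-x, -y)$. Therefore, the first coordinate is $\epsilon_1$-private and achieves optimal performance.

  2. The second coordinate is Laplace-distributed with parameter $\epsilon_2$. We have:

    $$\int_{\mathbb{R}} l(x, y)\, dx = \frac{\epsilon_1^2}{2 \epsilon_2} e^{-\epsilon_2 |y|} + \frac{\epsilon_1 (\epsilon_2^2 - \epsilon_1^2)}{4 \epsilon_2} \cdot \frac{2}{\epsilon_1}\, e^{-\epsilon_2 |y|} = \frac{\epsilon_2}{2} e^{-\epsilon_2 |y|}. \tag{22}$$

    Thus, the second coordinate achieves optimal performance.

  3. Lastly, we need to prove that the composite mechanism is $\epsilon_2$-private. We handle the delta part separately by defining $D = \{ y : (y, y) \in S \}$ for a measurable $S \subseteq \mathbb{R}^2$. The probability of landing in set $S$ is:

    $$\mathbb{P}(Qu \in S) = \frac{\epsilon_1^2}{2 \epsilon_2} \int_D e^{-\epsilon_2 |y - u|}\, dy + \frac{\epsilon_1 (\epsilon_2^2 - \epsilon_1^2)}{4 \epsilon_2} \iint_S e^{-\epsilon_1 |x - y| - \epsilon_2 |y - u|}\, dx\, dy. \tag{23}$$

    We take the derivative and use Fubini’s theorem to exchange the derivative with the integral:

    $$\left| \frac{d}{du} \mathbb{P}(Qu \in S) \right| \leq \epsilon_2 \left( \frac{\epsilon_1^2}{2 \epsilon_2} \int_D e^{-\epsilon_2 |y - u|}\, dy + \frac{\epsilon_1 (\epsilon_2^2 - \epsilon_1^2)}{4 \epsilon_2} \iint_S e^{-\epsilon_1 |x - y| - \epsilon_2 |y - u|}\, dx\, dy \right) = \epsilon_2\, \mathbb{P}(Qu \in S). \tag{24}$$

This completes the proof. ∎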

3.1.1 Single Round of Privacy Relaxation

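Theorem 10 achieves gradual release of sensitive data in two steps, first with $\epsilon_1$-privacy and, then, with $\epsilon_2$-privacy. In practice, Theorem 10 can be used as follows:

  • Given the private value $u \in \mathbb{R}$, sample noise $V_1 \sim \frac{\epsilon_1}{2} e^{-\epsilon_1 |V_1|}$ and release response $y_1 = u + V_1$, which is optimal and respects $\epsilon_1$-privacy.

  • Once the privacy level is relaxed from $\epsilon_1$ to $\epsilon_2$, sample noise $V_2$ from the distribution conditioned on $V_1$:

    $$\mathbb{P}(V_2 = y \mid V_1 = x) = \frac{\epsilon_1}{\epsilon_2}\, e^{-(\epsilon_2 - \epsilon_1) |x|}\, \delta(y - x) + \frac{\epsilon_2^2 - \epsilon_1^2}{2 \epsilon_2}\, e^{-\epsilon_1 |x - y| - \epsilon_2 |y| + \epsilon_1 |x|}, \tag{25}$$

    and release response $y_2 = u + V_2$. Distribution (25) is derived from the joint distribution (20) and ensures both that $(y_1, y_2)$ is $\epsilon_2$-private and that $V_2$ is optimally distributed.

Conditional distribution (25) is shown in Figure 3. Note that for $\epsilon_2 = \epsilon_1$, Distribution (25) is reduced to a delta function:

$$\mathbb{P}(V_2 = y \mid V_1 = x) = \delta(y - x). \tag{26}$$

In words, for $\epsilon_2 = \epsilon_1$ no privacy relaxation effectively happens and, thus, no updated response is practically released. Moreover, for $\epsilon_2 \to \infty$, a limiting argument shows that Distribution (25) is reduced to:

$$\mathbb{P}(V_2 = y \mid V_1 = x) = \delta(y). \tag{27}$$

Practically, letting $\epsilon_2 \to \infty$ cancels any privacy constraints and the exact value of the private data can be released, $y_2 = u$. For general values of $\epsilon_1$ and $\epsilon_2$, Pearson’s correlation coefficient decreases for more aggressive privacy relaxations, $\rho_{V_1, V_2} = \frac{\epsilon_1}{\epsilon_2}$. Algorithm 1 provides a simple and efficient way to sample $V_2$ given $V_1$.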

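Figure 3: Gradual release of private data is performed in the following way. First, the $\epsilon_1$-private response $y_1 = u + V_1$ is released, where $V_1 \sim \frac{\epsilon_1}{2} e^{-\epsilon_1 |V_1|}$. Once the privacy level is relaxed from $\epsilon_1$ to $\epsilon_2$, the supplementary response $y_2 = u + V_2$ is released, where $V_2$ is distributed as shown above. The composite mechanism that releases $(y_1, y_2)$ is $\epsilon_2$-private and $V_2$ is optimally distributed.

Require: Privacy levels $\epsilon_1$ and $\epsilon_2$, such that $\epsilon_2 > \epsilon_1 > 0$, and noise sample $V_1$.
function RelaxPrivacy($V_1$, $\epsilon_1$, $\epsilon_2$)
     switch randomly
         case  with probability $\frac{\epsilon_1}{\epsilon_2} e^{-(\epsilon_2 - \epsilon_1) |V_1|}$:
              return $V_2 = V_1$.
         case  with probability $\frac{\epsilon_2 - \epsilon_1}{2 \epsilon_2}$:
              return $V_2 = -\mathrm{sgn}(V_1)\, E$, with $E$ exponentially distributed with rate $\epsilon_1 + \epsilon_2$.
         case  with probability $\frac{\epsilon_1 + \epsilon_2}{2 \epsilon_2} \left(1 - e^{-(\epsilon_2 - \epsilon_1) |V_1|}\right)$:
              return $V_2 = \mathrm{sgn}(V_1)\, T$, with $T$ exponentially distributed with rate $\epsilon_2 - \epsilon_1$ and truncated to $[0, |V_1|]$.
         case  with probability $\frac{\epsilon_2 - \epsilon_1}{2 \epsilon_2} e^{-(\epsilon_2 - \epsilon_1) |V_1|}$:
              return $V_2 = \mathrm{sgn}(V_1) \left(|V_1| + E\right)$, with $E$ exponentially distributed with rate $\epsilon_1 + \epsilon_2$.
     end switch
end function
Algorithm 1 Sampling from Distribution (25) for the second noise sample $V_2$ given the first noise sample $V_1$ can be efficiently performed.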
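A runnable sketch of this sampling step is given below. It is our own derivation, not a transcription of Algorithm 1: it assumes the joint noise density places mass $\frac{\epsilon_1^2}{2\epsilon_2} e^{-\epsilon_2 |y|}$ on the diagonal $x = y$ and density proportional to $e^{-\epsilon_1 |x - y| - \epsilon_2 |y|}$ elsewhere, and it splits the conditional law of $V_2$ into four regions (equal to $V_1$, opposite sign, between $0$ and $V_1$, beyond $V_1$). The function name `relax_privacy` is illustrative.

```python
import numpy as np


def relax_privacy(v1, eps1, eps2, rng):
    """Sample V2 given V1 = v1 for privacy levels 0 < eps1 <= eps2."""
    if not 0 < eps1 <= eps2:
        raise ValueError("need 0 < eps1 <= eps2")
    a = abs(v1)
    s = 1.0 if v1 >= 0 else -1.0
    decay = np.exp(-(eps2 - eps1) * a)
    p_stay = (eps1 / eps2) * decay                          # V2 equals V1
    p_flip = (eps2 - eps1) / (2 * eps2)                     # V2 on the other side of 0
    p_between = (eps1 + eps2) / (2 * eps2) * (1 - decay)    # V2 between 0 and V1
    # The remaining probability mass puts V2 beyond V1 (same sign, larger magnitude).
    r = rng.random()
    z = 1.0 - rng.random()  # uniform on (0, 1], safe inside log
    if r < p_stay:
        return v1
    if r < p_stay + p_flip:
        return s * np.log(z) / (eps1 + eps2)        # log(z) <= 0 flips the sign
    if r < p_stay + p_flip + p_between:
        # Inverse CDF of an exponential with rate eps2 - eps1 truncated to [0, a].
        t = -np.log(1.0 - z * (1.0 - decay)) / (eps2 - eps1)
        return s * t
    return s * (a - np.log(z) / (eps1 + eps2))


rng = np.random.default_rng(0)
eps1, eps2, u = 0.5, 1.0, 3.0            # illustrative levels and private value
v1 = rng.laplace(scale=1.0 / eps1)       # first noise sample
v2 = relax_privacy(v1, eps1, eps2, rng)  # correlated second noise sample
y1, y2 = u + v1, u + v2
```

Under these assumptions $V_2$ is marginally Laplace with parameter $\epsilon_2$, and the value is left unchanged with overall probability $(\epsilon_1/\epsilon_2)^2$.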

3.1.2 Single Round of Privacy Tightening

Tightening the privacy level is impossible, since it implies revoking already released data. Nonetheless, generating a more private version of the same data is still useful in cases such as private data trading. In that case, distribution (20) can be sampled in the opposite direction. Specifically, noise $V_2$ is initially sampled, $V_2 \sim \frac{\epsilon_2}{2} e^{-\epsilon_2 |V_2|}$, and the $\epsilon_2$-private response $y_2 = u + V_2$ is released. Next, private data is traded to a different agent under the stronger $\epsilon_1$-privacy guarantees. Noise sample $V_1$ is drawn from distribution

$$\mathbb{P}(V_1 = x \mid V_2 = y) = \left(\frac{\epsilon_1}{\epsilon_2}\right)^2 \delta(x - y) + \left(1 - \left(\frac{\epsilon_1}{\epsilon_2}\right)^2\right) \frac{\epsilon_1}{2} e^{-\epsilon_1 |x - y|}, \tag{28}$$

and the $\epsilon_1$-private response $y_1 = u + V_1$ is released. Remarkably, response $y_1$ can be generated conditioning only on $y_2$:

$$y_1 = y_2 + (V_1 - V_2),$$

where $V_1 - V_2$ is independent of $V_2$, $(V_1 - V_2) \perp V_2$. In words, tightening privacy under the Laplace mechanism does not require access to the original data and can be performed by an agent other than the private data owner. Proposition 3 suggests that the randomized post-processing $y_1$ of the $\epsilon_2$-private response $y_2$ is at least $\epsilon_2$-private. For $V_1$ given by distribution (28), this tightening of the privacy level is precisely quantified, i.e. $\epsilon_2 \to \epsilon_1$. Recall that our results are tight; no excessive accuracy is sacrificed in the process.
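Because the increment added to the released response is independent of it, tightening can be sketched as pure post-processing. The snippet below assumes the increment is zero with probability $(\epsilon_1/\epsilon_2)^2$ and Laplace with parameter $\epsilon_1$ otherwise; this choice makes the resulting noise marginally Laplace with parameter $\epsilon_1$. The function name `tighten_privacy` and the example values are illustrative.

```python
import numpy as np


def tighten_privacy(y2, eps1, eps2, rng):
    """Post-process an eps2-private response into an eps1-private one (eps1 <= eps2)."""
    if not 0 < eps1 <= eps2:
        raise ValueError("need 0 < eps1 <= eps2")
    # Lazy step: with probability (eps1/eps2)^2 the response is passed on unchanged.
    if rng.random() < (eps1 / eps2) ** 2:
        return y2
    # Otherwise add independent noise with density (eps1/2) * exp(-eps1 * |w|).
    return y2 + rng.laplace(scale=1.0 / eps1)


rng = np.random.default_rng(0)
u, eps1, eps2 = 3.0, 0.5, 1.0                # hypothetical private value and levels
y2 = u + rng.laplace(scale=1.0 / eps2)       # response released by the data owner
y1 = tighten_privacy(y2, eps1, eps2, rng)    # computed without access to u
```

Note that `tighten_privacy` never touches the private value `u`, which mirrors the remark that an agent other than the data owner can perform this step.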

3.2 High-Dimensional Case

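Theorem 10 can be generalized to the case where the space of private data is the Euclidean space $\mathbb{R}^n$ equipped with the $\ell_1$-norm. Theorem 9 establishes that the Laplace mechanism:

$$Q_\epsilon u = u + V, \quad \text{where } V \sim \left(\frac{\epsilon}{2}\right)^n e^{-\epsilon \|V\|_1},$$

minimizes the mean-squared error from the identity query among all $\epsilon$-private mechanisms that use additive noise $V$:

$$\mathbb{E}\|Q_\epsilon u - u\|_2^2 = \mathbb{E}\|V\|_2^2 = \frac{2n}{\epsilon^2}.$$

Theorem 9 shows that each coordinate of $V$ is independently sampled. This observation implies that Theorem 10 can be applied to the $n$ dimensions independently.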

Theorem 11.

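Consider privacy levels $\epsilon_1$, $\epsilon_2$ with $\epsilon_2 > \epsilon_1 > 0$. Let $Q_1$ be an $\epsilon_1$-private mechanism and $Q_2$ an $\epsilon_2$-private mechanism of the form:

$$Q_1 u := u + V_1 \quad \text{and} \quad Q_2 u := u + V_2,$$

where $(V_1, V_2) \sim l_n \in \Delta(\mathbb{R}^{2n})$. Then, gradual release of sensitive data from $\epsilon_1$ to $\epsilon_2$ is achieved by the probability distribution $l_n$:

$$l_n(x, y) = \prod_{i=1}^n l_{\epsilon_1, \epsilon_2}(x_i, y_i),$$

where $x, y \in \mathbb{R}^n$ and $l_{\epsilon_1, \epsilon_2}$ is the density (20). Namely:

  • Mechanism $Q_1$ is $\epsilon_1$-private and optimal.

  • Mechanism $Q_2$ is the optimal $\epsilon_2$-private mechanism.

  • Mechanism $(Q_1, Q_2)$ is $\epsilon_2$-private.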


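Proof. Let $x_i$ denote the coordinates of a vector $x \in \mathbb{R}^n$. The desired probability distribution is defined by independently sampling each coordinate using Theorem 10. Let:

$$l_n(x, y) = \prod_{i=1}^n l_{\epsilon_1, \epsilon_2}(x_i, y_i).$$

The probability distribution $l_n$ satisfies the required marginal distributions:

$$\int_{\mathbb{R}^n} l_n(x, y)\, dy = \left(\frac{\epsilon_1}{2}\right)^n e^{-\epsilon_1 \|x\|_1}, \qquad \int_{\mathbb{R}^n} l_n(x, y)\, dx = \left(\frac{\epsilon_2}{2}\right)^n e^{-\epsilon_2 \|y\|_1}.$$

Moreover, it satisfies the $\epsilon_2$-privacy constraints:

$$\left\| \nabla_u \ln \mathbb{P}\big((Q_1 u, Q_2 u) = (z, w)\big) \right\|_\infty = \max_i \left| \frac{\partial}{\partial u_i} \ln l_{\epsilon_1, \epsilon_2}(z_i - u_i, w_i - u_i) \right| \leq \epsilon_2,$$

where in the last line we used the fact that $l_{\epsilon_1, \epsilon_2}$ is $\epsilon_2$-private. This completes the proof. ∎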

3.3 Multiple Privacy Relaxations

Theorems 10 and 11 perform privacy relaxation from $\epsilon_1$ to $\epsilon_2$. However, the privacy level is possibly updated multiple times. Theorem 12 handles the case where the privacy level is successively relaxed from $\epsilon_1$ to $\epsilon_2$, to $\epsilon_3$, until $\epsilon_m$. Specifically, Theorem 12 enables the use of Theorem 10 multiple times while relaxing the privacy level from $\epsilon_i$ to $\epsilon_{i+1}$ for $i \in \{1, \ldots, m - 1\}$. We call this statement the Markov property of the Laplace mechanism.

Theorem 12.

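Consider $m$ privacy levels $\{\epsilon_i\}_{i=1}^m$ with $0 < \epsilon_1 < \cdots < \epsilon_m$ and mechanisms of the form:

$$Q_i u := u + V_i, \quad i \in \{1, \ldots, m\}.$$

Consider the distribution $(V_1, \ldots, V_m) \sim l \in \Delta(\mathbb{R}^m)$, with:

$$l(v_1, \ldots, v_m) = l_{\epsilon_1}(v_1) \prod_{i=1}^{m-1} \frac{l_{\epsilon_i, \epsilon_{i+1}}(v_i, v_{i+1})}{l_{\epsilon_i}(v_i)},$$

where $l_\epsilon(v) = \frac{\epsilon}{2} e^{-\epsilon |v|}$ and $l_{\epsilon_i, \epsilon_{i+1}}$ is the joint density (20). Then, distribution $l$ has the following properties:

  1. Each prefix mechanism $(Q_1, \ldots, Q_i)$ is $\epsilon_i$-private, for $i \in \{1, \ldots, m\}$.

  2. Each mechanism $Q_i$ is the optimal $\epsilon_i$-private mechanism, i.e. it minimizes the mean-squared error $\mathbb{E} V_i^2$.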


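Proof. The proof uses induction on $m$. The case $m = 2$ is handled by Theorem 10. For brevity, we prove the statement for $m = 3$. Let $\epsilon_3 > \epsilon_2 > \epsilon_1 > 0$. Consider the joint probability $l(x, y, z)$:

$$l(x, y, z) = \frac{l_{\epsilon_1, \epsilon_2}(x, y)\; l_{\epsilon_2, \epsilon_3}(y, z)}{l_{\epsilon_2}(y)},$$

where $l_{\epsilon_2}(y) = \frac{\epsilon_2}{2} e^{-\epsilon_2 |y|}$. Measure $l$ possesses all the properties that perform gradual release of private data:

  • All marginal distributions of measure $l$ are Laplace with parameters $\epsilon_1$, $\epsilon_2$, and $\epsilon_3$, respectively.

  • Mechanism $Q_1$ is $\epsilon_1$-private since $V_1$ is Laplace-distributed with parameter $\epsilon_1$.

  • Mechanism $(Q_1, Q_2)$ is $\epsilon_2$-private. Marginalizing out $V_3$ shows that $(V_1, V_2) \sim l_{\epsilon_1, \epsilon_2}$, which guarantees $\epsilon_2$-privacy according to Theorem 10.

  • Mechanism $(Q_1, Q_2, Q_3)$ is $\epsilon_3$-private. It holds that: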


    Algebraic manipulation of the last expression establishes the result: