Truncated Laplacian Mechanism for Approximate Differential Privacy

Quan Geng, Wei Ding, Ruiqi Guo, and Sanjiv Kumar
Google AI
New York, NY 10011
Email: {qgeng, wei, guorq, sanjivk}@google.com
Abstract

We derive a class of noise probability distributions to preserve $(\epsilon, \delta)$-differential privacy for a single real-valued query function. The proposed noise distribution has a truncated exponential probability density function, which can be viewed as a truncated Laplacian distribution. We show the near-optimality of the proposed truncated Laplacian mechanism in various privacy regimes in the context of minimizing the noise amplitude and noise power. Numeric experiments show the improvement of the truncated Laplacian mechanism over the optimal Gaussian mechanism by significantly reducing the noise amplitude and noise power in various privacy regions.

I Introduction

Differential privacy, introduced by Dwork et al. (2006b), is a framework to quantify to what extent individual privacy in a statistical dataset is preserved while releasing useful aggregate information about the dataset. Differential privacy provides strong privacy guarantees by requiring the near-indistinguishability of whether an individual is in the dataset or not based on the released information. For more motivation and background of differential privacy, we refer the readers to the survey by Dwork (2008) and the book by Dwork and Roth (2014).

The classic notion of differential privacy is $\epsilon$-differential privacy, which imposes an upper bound on the multiplicative distance between the probability distributions of the randomized query outputs for any two neighboring datasets, and the standard approach for preserving $\epsilon$-differential privacy is to add Laplacian noise to the query output. Since its introduction, differential privacy has spawned a large body of research in differentially private data-releasing mechanism design, and the noise-adding mechanism has been applied in many machine learning algorithms to preserve differential privacy, e.g., logistic regression (Chaudhuri and Monteleoni, 2008), empirical risk minimization (Chaudhuri et al., 2011), online learning (Jain et al., 2012), statistical risk minimization (Duchi et al., 2012), deep learning (Shokri and Shmatikov, 2015; Abadi et al., 2016; Phan et al., 2016; Agarwal et al., 2018), hypothesis testing (Sheffet, 2018), matrix completion (Jain et al., 2018), expectation maximization (Park et al., 2017), and principal component analysis (Chaudhuri et al., 2012; Ge et al., 2018).

To fully make use of the randomized query outputs, it is important to understand the fundamental trade-off between privacy and utility (accuracy). Ghosh et al. (2009) studied a very general utility-maximization framework for a single count query with sensitivity one under $\epsilon$-differential privacy. Gupte and Sundararajan (2010) derived the optimal noise probability distributions for a single count query with sensitivity one for minimax (risk-averse) users. Geng and Viswanath (2016b) derived the optimal $\epsilon$-differentially private noise-adding mechanism for a single real-valued query function with arbitrary query sensitivity, and showed that the optimal noise distribution has a staircase-shaped probability density function. Geng et al. (2015) generalized the result in Geng and Viswanath (2016b) to the two-dimensional query output space for the $\ell^1$ cost function, and showed the optimality of a two-dimensional staircase-shaped probability density function. Soria-Comas and Domingo-Ferrer (2013) also independently derived the staircase-shaped noise probability distribution under a different optimization framework.

A relaxed notion of $\epsilon$-differential privacy is $(\epsilon, \delta)$-differential privacy, introduced by Dwork et al. (2006a). The common interpretation of $(\epsilon, \delta)$-differential privacy is that it is $\epsilon$-differential privacy “except with probability $\delta$” (Mironov, 2017). The standard approach for preserving $(\epsilon, \delta)$-differential privacy is the Gaussian mechanism, which adds Gaussian noise to the query output. Geng and Viswanath (2016a) studied the trade-off between utility and privacy for a single integer-valued query function under $(\epsilon, \delta)$-differential privacy, and showed that for $\ell^1$ and $\ell^2$ cost functions, the discrete uniform noise distribution is optimal when the query sensitivity is one and is asymptotically optimal for arbitrary query sensitivity in the high privacy regime. Balle and Wang (2018) improved the classic analysis of the Gaussian mechanism for $(\epsilon, \delta)$-differential privacy in the high privacy regime ($\epsilon \to 0$), and developed an optimal Gaussian mechanism whose variance is calibrated directly using the Gaussian cumulative density function instead of a tail bound approximation. Geng et al. (2018) derived the optimal noise-adding mechanism for a single real-valued query function under $(0, \delta)$-differential privacy, and showed that a uniform noise distribution with a probability mass at the origin is optimal for a large class of cost functions.

I-A Our Contributions

In this work, we derive a class of noise probability distributions to preserve $(\epsilon, \delta)$-differential privacy for a single real-valued query function. The proposed noise distribution has a truncated exponential probability density function, which can be viewed as a truncated Laplacian distribution. We show the near-optimality of the proposed truncated Laplacian mechanism in various privacy regimes in the context of minimizing the noise amplitude and noise power. Our result closes the multiplicative gap between the lower bound and the upper bound (using the uniform distribution) in the analysis of Geng and Viswanath (2016a). Numeric experiments show the improvement of the truncated Laplacian mechanism over the optimal Gaussian mechanism by significantly reducing the noise amplitude and noise power.

I-B Organization

The paper is organized as follows. In Section II, we give some preliminaries on differential privacy, and formulate the trade-off between privacy and utility under $(\epsilon, \delta)$-differential privacy for a single real-valued query function as a functional optimization problem. Section III presents the truncated Laplacian mechanism for preserving $(\epsilon, \delta)$-differential privacy. Section IV applies the truncated Laplacian mechanism to derive new upper bounds for the $\ell^1$ and $\ell^2$ cost functions, corresponding to noise amplitude and noise power. Section V derives new lower bounds on the minimum noise amplitude and noise power, and shows that the lower bounds and upper bounds are close in various privacy regimes, which establishes the (asymptotic) optimality of the truncated Laplacian mechanism. Section VI conducts numeric experiments to compare the performance of the truncated Laplacian mechanism with the optimal Gaussian mechanism in the context of minimizing noise amplitude and noise power.

II Problem Formulation

In this section, we first give some preliminaries on differential privacy, and then formulate the trade-off between privacy and utility under $(\epsilon, \delta)$-differential privacy for a single real-valued query function as a functional optimization problem.

II-A Background on Differential Privacy

Consider a real-valued query function

$$q : \mathcal{D} \to \mathbb{R},$$

where $\mathcal{D}$ is the set of all possible datasets. The real-valued query function $q$ will be applied to a dataset, and the query output is a real number. Two datasets $D_1, D_2 \in \mathcal{D}$ are called neighboring datasets if they differ in at most one element, i.e., one is a proper subset of the other and the larger dataset contains just one additional element (Dwork, 2008). A randomized query-answering mechanism $\mathcal{K}$ for the query function $q$ will randomly output a number with a probability distribution that depends on the query output $q(D)$, where $D$ is the dataset.

Definition 1 ($(\epsilon, \delta)$-differential privacy (Dwork et al., 2006a)).

A randomized mechanism $\mathcal{K}$ gives $(\epsilon, \delta)$-differential privacy if for all datasets $D_1$ and $D_2$ differing on at most one element, and all measurable sets $S \subseteq \operatorname{Range}(\mathcal{K})$,

$$\Pr[\mathcal{K}(D_1) \in S] \le e^{\epsilon} \Pr[\mathcal{K}(D_2) \in S] + \delta. \tag{1}$$

II-B $(\epsilon, \delta)$-Differential Privacy Constraint on the Noise Probability Distribution

The sensitivity of a real-valued query function is defined as:

Definition 2 (Query Sensitivity (Dwork, 2008)).

For a real-valued query function $q : \mathcal{D} \to \mathbb{R}$, the sensitivity of $q$ is defined as

$$\Delta := \sup_{D_1, D_2 \in \mathcal{D}} |q(D_1) - q(D_2)|,$$

for all $D_1, D_2$ differing in at most one element.

A standard approach for preserving differential privacy is query-output independent noise-adding mechanisms, where a random noise is added to the query output. Given a dataset $D$, a query-output independent noise-adding mechanism $\mathcal{K}$ will release the query output $q(D)$ corrupted by an additive random noise $X$ with probability distribution $P$:

$$\mathcal{K}(D) = q(D) + X.$$

The $(\epsilon, \delta)$-differential privacy constraint (1) on $\mathcal{K}$ is that for any $t_1, t_2 \in \mathbb{R}$ such that $|t_1 - t_2| \le \Delta$ (corresponding to the query outputs for two neighboring datasets), and for any measurable set $S \subseteq \mathbb{R}$,

$$P(S) \le e^{\epsilon} P(S + d) + \delta, \tag{2}$$

where $d := t_1 - t_2$, and $S + d$ is defined as the set $\{ s + d \mid s \in S \}$.

Equivalently, the $(\epsilon, \delta)$-differential privacy constraint on the noise probability distribution $P$ is

$$P(S) \le e^{\epsilon} P(S + d) + \delta, \quad \forall \text{ measurable } S \subseteq \mathbb{R},\ \forall\, |d| \le \Delta. \tag{3}$$
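For concreteness, constraint (3) can be checked numerically for a candidate noise density: for a fixed shift $d$, the set $S$ that maximizes $P(S) - e^{\epsilon} P(S + d)$ is $\{x : f(x) > e^{\epsilon} f(x + d)\}$, so it suffices to verify that $\int_{\mathbb{R}} \max\{f(x) - e^{\epsilon} f(x + d), 0\}\,dx \le \delta$ for all $|d| \le \Delta$. The following sketch (illustrative only; the function names are ours, not from the paper) performs this check on a discretization grid.

```python
import numpy as np

def worst_case_delta(density, sensitivity, eps, lo=-50.0, hi=50.0, n=200001, n_shifts=51):
    """Numerically estimate the supremum over sets S and shifts |d| <= sensitivity of
    P(S) - e^eps * P(S + d), by integrating max(f(x) - e^eps * f(x + d), 0) on a grid.
    An illustrative check of constraint (3), not an exact certificate."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    worst = 0.0
    for d in np.linspace(-sensitivity, sensitivity, n_shifts):
        gap = np.maximum(density(x) - np.exp(eps) * density(x + d), 0.0)
        worst = max(worst, gap.sum() * dx)
    return worst

# Example: the (untruncated) Laplace density with scale Delta/eps satisfies the
# constraint with delta essentially 0, i.e., it gives pure eps-differential privacy.
eps, Delta = 1.0, 1.0
laplace_pdf = lambda x: (eps / (2.0 * Delta)) * np.exp(-eps * np.abs(x) / Delta)
print(worst_case_delta(laplace_pdf, Delta, eps))  # close to 0
```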

II-C Utility Model

Consider a cost function $\mathcal{L}(\cdot) : \mathbb{R} \to \mathbb{R}$, which is a function of the additive noise in the query-output noise-adding mechanism. Given an additive noise $x$, the cost is $\mathcal{L}(x)$, and thus the expectation of the cost over $P$ is

$$\int_{x \in \mathbb{R}} \mathcal{L}(x)\, P(dx). \tag{4}$$

Our objective is to minimize the expectation of the cost over the noise probability distribution $P$ for preserving $(\epsilon, \delta)$-differential privacy.
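As an illustration of (4) (the snippet and its names are ours, not part of the formulation), the expected cost of a given noise density can be approximated by simple numerical integration:

```python
import numpy as np

def expected_cost(density, cost, lo=-50.0, hi=50.0, n=200001):
    """Approximate the expected cost (4), i.e. the integral of cost(x) * density(x) dx,
    with a Riemann sum on a uniform grid; illustrative only."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    return np.sum(cost(x) * density(x)) * dx

# Example: the l1 cost of a unit-scale Laplace density is E|X| = 1.
laplace_pdf = lambda x: 0.5 * np.exp(-np.abs(x))
print(expected_cost(laplace_pdf, np.abs))  # approximately 1.0
```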

II-D Optimization Problem

Combining the differential privacy constraint (3) and the objective function (4), we formulate a functional optimization problem:

$$\inf_{P} \int_{x \in \mathbb{R}} \mathcal{L}(x)\, P(dx) \tag{5}$$

subject to

$$P(S) \le e^{\epsilon} P(S + d) + \delta, \quad \forall \text{ measurable } S \subseteq \mathbb{R},\ \forall\, |d| \le \Delta. \tag{6}$$

III Truncated Laplacian Mechanism

In this section, we present a class of noise probability distributions that satisfy the $(\epsilon, \delta)$-differential privacy constraint. The probability distribution can be viewed as a truncated Laplacian distribution.

Given $\epsilon > 0$, $\delta \in (0, 1/2)$, and the query sensitivity $\Delta > 0$, consider a probability distribution with a symmetric probability density function $f(x)$ defined as:

$$f(x) = \begin{cases} B\, e^{-\frac{\epsilon |x|}{\Delta}}, & x \in [-A, A], \\ 0, & \text{otherwise}, \end{cases} \tag{7}$$

where

$$A = \frac{\Delta}{\epsilon} \log\Bigl(1 + \frac{e^{\epsilon} - 1}{2\delta}\Bigr), \qquad B = \frac{\epsilon}{2\Delta \bigl(1 - e^{-\frac{\epsilon A}{\Delta}}\bigr)} = \frac{\epsilon}{2\Delta} \cdot \frac{2\delta + e^{\epsilon} - 1}{e^{\epsilon} - 1}.$$

Fig. 1: Noise probability density function of the truncated Laplacian mechanism.

We discuss some properties of the probability density function $f(x)$:

  • $f(x)$ is symmetric and monotonically decreasing in $|x|$.

  • $f(x)$ is exponentially decaying in $|x|$ with rate $\epsilon/\Delta$, proportional to $\epsilon$.

  • $P([A - \Delta, A]) = \delta$, i.e., $\int_{A-\Delta}^{A} f(x)\,dx = \delta$. Indeed,
    $$\int_{A-\Delta}^{A} B\, e^{-\frac{\epsilon x}{\Delta}}\,dx = \frac{B\Delta}{\epsilon}\, e^{-\frac{\epsilon A}{\Delta}} \bigl(e^{\epsilon} - 1\bigr) = \delta.$$

  • $f(x)$ is a valid probability density function, and $\int_{-A}^{A} f(x)\,dx = 1$. Indeed,
    $$\int_{-A}^{A} B\, e^{-\frac{\epsilon |x|}{\Delta}}\,dx = \frac{2B\Delta}{\epsilon} \bigl(1 - e^{-\frac{\epsilon A}{\Delta}}\bigr) = 1.$$

Definition 3 (Truncated Laplacian mechanism).

Given the query sensitivity $\Delta$ and the privacy parameters $\epsilon > 0$ and $\delta \in (0, 1/2)$, the truncated Laplacian mechanism adds a noise with the probability density function $f(x)$ defined in (7) to the query output.
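For concreteness, a minimal implementation sketch of the mechanism is given below, assuming the parametrization of $A$ and $B$ in (7); the function names are ours and illustrative. Noise is drawn by inverting the CDF of $|X|$, so every sample is bounded by $A$.

```python
import numpy as np

def truncated_laplacian_noise(eps, delta, sensitivity, size=1, rng=None):
    """Draw noise from the truncated Laplacian density (7):
    f(x) = B * exp(-eps * |x| / Delta) on [-A, A], and 0 otherwise,
    with A = (Delta / eps) * log(1 + (e^eps - 1) / (2 * delta)).
    A sketch assuming this parametrization; sampling inverts the CDF of |X|."""
    rng = np.random.default_rng() if rng is None else rng
    lam = sensitivity / eps                              # Laplacian scale Delta / eps
    A = lam * np.log1p(np.expm1(eps) / (2.0 * delta))    # truncation point
    u = rng.uniform(size=size)
    # P(|X| <= t) = (1 - exp(-t / lam)) / (1 - exp(-A / lam)) for t in [0, A].
    magnitude = -lam * np.log1p(-u * (-np.expm1(-A / lam)))
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

def release(query_value, eps, delta, sensitivity, rng=None):
    """Truncated Laplacian mechanism: query output plus truncated Laplacian noise."""
    return query_value + truncated_laplacian_noise(eps, delta, sensitivity, rng=rng)[0]

print(release(42.0, eps=0.5, delta=1e-5, sensitivity=1.0))
```

In contrast to the Gaussian and standard Laplacian mechanisms, the added noise has bounded support, which is what allows the mass $\delta$ to be budgeted at the edges of the support.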

Theorem 1.

The truncated Laplacian mechanism preserves $(\epsilon, \delta)$-differential privacy.

Proof.

Equivalently, we need to show that the truncated Laplacian distribution defined in (7) satisfies the $(\epsilon, \delta)$-differential privacy constraint (6).

We are interested in maximizing $P(S) - e^{\epsilon} P(S + d)$ in (6) over all measurable sets $S$ and all shifts $|d| \le \Delta$, and show that the maximum is upper bounded by $\delta$. Since $f(x)$ is symmetric and monotonically decreasing in $|x|$, without loss of generality we can assume $d \in (0, \Delta]$.

To maximize $P(S) - e^{\epsilon} P(S + d)$, $S$ shall not contain points in $[-A, A - d]$, as for such points $x + d \in [-A, A]$ and $|x + d| \le |x| + d$, so that $f(x + d) \ge e^{-\frac{\epsilon d}{\Delta}} f(x) \ge e^{-\epsilon} f(x)$, i.e., $f(x) - e^{\epsilon} f(x + d) \le 0$;

$S$ shall not contain points in $\mathbb{R} \setminus [-A, A]$, as $f(x) = 0$ for such points.

Therefore, $P(S) - e^{\epsilon} P(S + d)$ is maximized for some set $S \subseteq (A - d, A]$, where $f(x + d) = 0$. Since $f(x)$ is monotonically decreasing in $|x|$ and $d \le \Delta$, the maximum is attained at $S = (A - \Delta, A]$ with $d = \Delta$, and the maximum value is

$$\int_{A - \Delta}^{A} f(x)\,dx = \delta.$$

We conclude that $P$ satisfies the $(\epsilon, \delta)$-differential privacy constraint (6). ∎
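The key step of the proof, namely that the worst case of $P(S) - e^{\epsilon} P(S + d)$ equals the tail mass $\int_{A-\Delta}^{A} f(x)\,dx = \delta$, can also be observed numerically. The snippet below is an illustrative sanity check under the parametrization of (7), not part of the formal argument.

```python
import numpy as np

eps, delta, Delta = 0.5, 1e-3, 1.0
lam = Delta / eps
A = lam * np.log1p(np.expm1(eps) / (2.0 * delta))
B = eps / (2.0 * Delta * (-np.expm1(-A / lam)))

def f(x):
    # Truncated Laplacian density (7).
    return np.where(np.abs(x) <= A, B * np.exp(-np.abs(x) / lam), 0.0)

x = np.linspace(-A - 2.0 * Delta, A + 2.0 * Delta, 400001)
dx = x[1] - x[0]
# Worst-case leakage at shift d = Delta: integrate max(f(x) - e^eps * f(x + Delta), 0).
leakage = np.maximum(f(x) - np.exp(eps) * f(x + Delta), 0.0).sum() * dx
# Tail mass of f over [A - Delta, A].
tail_mass = f(x[(x >= A - Delta) & (x <= A)]).sum() * dx
print(leakage, tail_mass, delta)  # the three values agree up to discretization error
```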

IV Upper Bound on Noise Amplitude and Noise Power

In this section, we apply the truncated Laplacian mechanism to derive new upper bounds on the minimum noise amplitude and noise power, corresponding to the $\ell^1$ and $\ell^2$ cost functions, under $(\epsilon, \delta)$-differential privacy.

Let $\mathcal{P}_{\epsilon,\delta}$ denote the set of noise probability distributions satisfying the $(\epsilon, \delta)$-differential privacy constraint (6).

Theorem 2 (Upper Bound on Minimum Noise Amplitude).

For the $\ell^1$ cost function, i.e., $\mathcal{L}(x) = |x|$,

$$\inf_{P \in \mathcal{P}_{\epsilon,\delta}} \int_{x \in \mathbb{R}} |x|\, P(dx) \;\le\; \frac{\Delta}{\epsilon} \left( 1 - \frac{2\delta}{e^{\epsilon} - 1} \log\Bigl(1 + \frac{e^{\epsilon} - 1}{2\delta}\Bigr) \right). \tag{8}$$
Proof.

We can compute the $\ell^1$ cost for the truncated Laplacian distribution defined in (7) via

$$\int_{x \in \mathbb{R}} |x|\, f(x)\,dx = 2 \int_{0}^{A} B\, x\, e^{-\frac{\epsilon x}{\Delta}}\,dx = \frac{\Delta}{\epsilon} \left( 1 - \frac{2\delta}{e^{\epsilon} - 1} \log\Bigl(1 + \frac{e^{\epsilon} - 1}{2\delta}\Bigr) \right).$$

Since the noise with probability density function $f(x)$ preserves $(\epsilon, \delta)$-differential privacy (Theorem 1), this gives an upper bound on $\inf_{P \in \mathcal{P}_{\epsilon,\delta}} \int_{x \in \mathbb{R}} |x|\, P(dx)$. ∎
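As a sanity check on (8), the closed-form expression (stated here under the parametrization of (7)) can be compared against direct numerical integration of $\int |x| f(x)\,dx$; the snippet and its names are illustrative.

```python
import numpy as np

def l1_upper_bound(eps, delta, Delta):
    # Closed-form expected noise amplitude of the truncated Laplacian, as in (8).
    C = np.log1p(np.expm1(eps) / (2.0 * delta))
    return (Delta / eps) * (1.0 - 2.0 * delta * C / np.expm1(eps))

def l1_numeric(eps, delta, Delta, n=2_000_001):
    # Riemann-sum evaluation of the integral of |x| * f(x) dx for the density (7).
    lam = Delta / eps
    A = lam * np.log1p(np.expm1(eps) / (2.0 * delta))
    B = eps / (2.0 * Delta * (-np.expm1(-A / lam)))
    x = np.linspace(-A, A, n)
    dx = x[1] - x[0]
    return np.sum(np.abs(x) * B * np.exp(-np.abs(x) / lam)) * dx

eps, delta, Delta = 0.5, 1e-3, 1.0
print(l1_upper_bound(eps, delta, Delta), l1_numeric(eps, delta, Delta))  # should agree
```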

Note that in Theorem 2:

  • Given $\epsilon$, when $\delta \to 0$, the upper bound converges to $\frac{\Delta}{\epsilon}$, and the truncated Laplacian mechanism reduces to the standard Laplacian mechanism.

  • Given $\delta$, when $\epsilon \to 0$, the upper bound converges to $\frac{\Delta}{4\delta}$. Indeed, when $\epsilon \to 0$, $A \to \frac{\Delta}{2\delta}$ and $B \to \frac{\delta}{\Delta}$, and thus

    the truncated Laplacian mechanism reduces to a uniform distribution on the interval $[-\frac{\Delta}{2\delta}, \frac{\Delta}{2\delta}]$ with probability density $\frac{\delta}{\Delta}$, whose expected noise amplitude is $\frac{\Delta}{4\delta}$.

  • In the regime , the upper bound is

    (9)

Similarly, we can derive an upper bound on the minimum noise power for the $\ell^2$ cost function.

Theorem 3 (Upper Bound on Minimum Noise Power).

For the $\ell^2$ cost function, i.e., $\mathcal{L}(x) = x^2$,

$$\inf_{P \in \mathcal{P}_{\epsilon,\delta}} \int_{x \in \mathbb{R}} x^2\, P(dx) \;\le\; \frac{\Delta^2}{\epsilon^2} \left( 2 - \frac{2\delta}{e^{\epsilon} - 1} \Bigl( \log^2\bigl(1 + \tfrac{e^{\epsilon} - 1}{2\delta}\bigr) + 2 \log\bigl(1 + \tfrac{e^{\epsilon} - 1}{2\delta}\bigr) \Bigr) \right). \tag{10}$$
Proof.

We can compute the $\ell^2$ cost for the truncated Laplacian distribution defined in (7) via

$$\int_{x \in \mathbb{R}} x^2\, f(x)\,dx = 2 \int_{0}^{A} B\, x^2\, e^{-\frac{\epsilon x}{\Delta}}\,dx = \frac{\Delta^2}{\epsilon^2} \left( 2 - \frac{2\delta}{e^{\epsilon} - 1} \Bigl( \log^2\bigl(1 + \tfrac{e^{\epsilon} - 1}{2\delta}\bigr) + 2 \log\bigl(1 + \tfrac{e^{\epsilon} - 1}{2\delta}\bigr) \Bigr) \right).$$

Since the noise with probability density function $f(x)$ preserves $(\epsilon, \delta)$-differential privacy (Theorem 1), this gives an upper bound on $\inf_{P \in \mathcal{P}_{\epsilon,\delta}} \int_{x \in \mathbb{R}} x^2\, P(dx)$. ∎
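Analogously, the closed form in (10) (again stated under the parametrization of (7)) can be checked against numerical integration of $\int x^2 f(x)\,dx$; the snippet is illustrative.

```python
import numpy as np

def l2_upper_bound(eps, delta, Delta):
    # Closed-form noise power of the truncated Laplacian, as in (10).
    C = np.log1p(np.expm1(eps) / (2.0 * delta))
    return (Delta / eps) ** 2 * (2.0 - (2.0 * delta / np.expm1(eps)) * (C ** 2 + 2.0 * C))

def l2_numeric(eps, delta, Delta, n=2_000_001):
    # Riemann-sum evaluation of the integral of x^2 * f(x) dx for the density (7).
    lam = Delta / eps
    A = lam * np.log1p(np.expm1(eps) / (2.0 * delta))
    B = eps / (2.0 * Delta * (-np.expm1(-A / lam)))
    x = np.linspace(-A, A, n)
    dx = x[1] - x[0]
    return np.sum(x ** 2 * B * np.exp(-np.abs(x) / lam)) * dx

eps, delta, Delta = 0.5, 1e-3, 1.0
print(l2_upper_bound(eps, delta, Delta), l2_numeric(eps, delta, Delta))  # should agree
```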

Note that in Theorem 3:

  • Given $\epsilon$, when $\delta \to 0$, the upper bound converges to $\frac{2\Delta^2}{\epsilon^2}$, and the truncated Laplacian mechanism reduces to the standard Laplacian mechanism.

  • Given $\delta$, when $\epsilon \to 0$, the upper bound converges to $\frac{\Delta^2}{12\delta^2}$. Indeed, when $\epsilon \to 0$, $A \to \frac{\Delta}{2\delta}$ and $B \to \frac{\delta}{\Delta}$, and thus

    the truncated Laplacian mechanism reduces to a uniform distribution on the interval $[-\frac{\Delta}{2\delta}, \frac{\Delta}{2\delta}]$ with probability density $\frac{\delta}{\Delta}$, whose noise power is $\frac{\Delta^2}{12\delta^2}$ (both limiting regimes are illustrated numerically after this list).

  • In the regime , the right hand side of (10) is

    (11)
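The limiting behaviors described in the lists above for Theorems 2 and 3 can be observed numerically. The snippet below evaluates the upper bounds (8) and (10) (as stated above, under the parametrization of (7)) in the two regimes $\delta \to 0$ and $\epsilon \to 0$; it is illustrative only.

```python
import numpy as np

def upper_bounds(eps, delta, Delta=1.0):
    # Upper bounds (8) and (10) of the truncated Laplacian mechanism.
    C = np.log1p(np.expm1(eps) / (2.0 * delta))
    l1 = (Delta / eps) * (1.0 - 2.0 * delta * C / np.expm1(eps))
    l2 = (Delta / eps) ** 2 * (2.0 - (2.0 * delta / np.expm1(eps)) * (C ** 2 + 2.0 * C))
    return l1, l2

# delta -> 0 with eps fixed: approaches the standard Laplacian values Delta/eps and 2*Delta^2/eps^2.
print(upper_bounds(eps=1.0, delta=1e-12), (1.0, 2.0))
# eps -> 0 with delta fixed: approaches the uniform-noise values Delta/(4*delta) and Delta^2/(12*delta^2).
print(upper_bounds(eps=1e-6, delta=0.05), (1.0 / (4 * 0.05), 1.0 / (12 * 0.05 ** 2)))
```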

V Lower Bound on Noise Amplitude and Noise Power

In this section, we derive new lower bounds on the minimum noise amplitude and noise power, and show that they match the upper bounds in Section IV in various privacy regimes, hence establishing the (asymptotic) optimality of the truncated Laplacian mechanism in these privacy regimes.

Geng and Viswanath (2016a) derived a lower bound for $(\epsilon, \delta)$-differential privacy for an integer-valued query function. Extending this result to the continuous setting, we show a similar lower bound for a real-valued query function under $(\epsilon, \delta)$-differential privacy. In particular, we show that the lower bound matches the upper bound (achieved by the truncated Laplacian mechanism) in the high privacy regime. Our result closes the multiplicative gap between the lower bound and the upper bound (using the uniform distribution) in the analysis of Geng and Viswanath (2016a).

First, we give a lower bound for $(\epsilon, \delta)$-differential privacy for an integer-valued query function due to Geng and Viswanath (2016a).

Define

To avoid integer rounding issues, assume that there exists an integer such that

Theorem 4 (Theorem 8 in Geng and Viswanath (2016a)).

Consider a symmetric cost function $\mathcal{L}(\cdot)$. Given the privacy parameters $\epsilon, \delta$ and the discrete query sensitivity, if a discrete probability distribution satisfies that

(12)

and the cost function satisfies that

(13)

then we have

(14)
Theorem 5 (Lower Bound on Minimum Noise Amplitude).

For the $\ell^1$ cost function, i.e., $\mathcal{L}(x) = |x|$,

(15)
Proof.

Given $P \in \mathcal{P}_{\epsilon,\delta}$, we can derive a lower bound on the cost $\int_{x \in \mathbb{R}} |x|\, P(dx)$ by discretizing the probability distribution and applying the lower bound for integer-valued query functions in Theorem 4.

We first discretize the probability distributions . Given a positive integer , define a discrete probability distribution via

Define the corresponding discrete cost function via

It is easy to see that

As the continuous probability distribution $P$ satisfies the $(\epsilon, \delta)$-differential privacy constraint (6) with query sensitivity $\Delta$, the discrete probability distribution satisfies the discrete $(\epsilon, \delta)$-differential privacy constraint (12) with the corresponding discrete query sensitivity.

We can verify that the condition (13) in Theorem 4 holds for and with query sensitivity when is sufficiently large. Indeed,

where the last step holds when .

The lower bound in (14) is

Therefore, for any , we have

and thus

Next we compare the lower bound (15) in Theorem 5 with the upper bound (8) in Theorem 2 on the minimum noise amplitude under $(\epsilon, \delta)$-differential privacy, and show that they are close in various privacy regimes, which establishes the optimality of the truncated Laplacian mechanism in these privacy regimes. More precisely,

Corollary 6 (Comparison of Lower bound and Upper bound on Minimum Noise Amplitude).
(16)
(17)
(18)
Proof.
  • Case : When , the upper bound (8) converges to , and the lower bound converges to , which matches the upper bound as . Therefore,

  • Case : When , the upper bound (8) converges to . For the lower bound, we have

    and thus the lower bound converges to as . Therefore,

  • Case : The upper bound (8) converges to . For the lower bound, since , we have

    As , , and thus

    Note that as .

    Therefore, as ,

    Therefore, is lower bounded by in the regime . Since it is also upper bounded by by the truncated Laplacian mechanism (see Equation (9)), we conclude that

    and thus the truncated Laplacian mechanism is asymptotically optimal in this regime in the context of minimizing noise amplitude.

Theorem 7 (Lower Bound on Minimum Noise Power).

For the $\ell^2$ cost function, i.e., $\mathcal{L}(x) = x^2$,

(19)
Proof.

Similar to the proof of Theorem 5, given $P \in \mathcal{P}_{\epsilon,\delta}$, we can derive a lower bound on the cost $\int_{x \in \mathbb{R}} x^2\, P(dx)$ by discretizing the probability distribution and applying the lower bound for integer-valued query functions in Theorem 4.

We first discretize the probability distribution . Given a positive integer , define a discrete probability distribution via

Define the corresponding discrete cost function via

It is easy to see that

As the continuous probability distribution $P$ satisfies the $(\epsilon, \delta)$-differential privacy constraint (6) with continuous query sensitivity $\Delta$, the discrete probability distribution