Strong Hardness of Privacy from Weak Traitor Tracing


Lucas Kowalczyk (Columbia University, Department of Computer Science; luke@cs.columbia.edu) · Tal Malkin (Columbia University, Department of Computer Science; tal@cs.columbia.edu) · Jonathan Ullman (Northeastern University, College of Computer and Information Science; jullman@ccs.neu.edu) · Mark Zhandry (MIT EECS; mzhandry@gmail.com)
Abstract

Despite much study, the computational complexity of differential privacy remains poorly understood. In this paper we consider the computational complexity of accurately answering a family $\mathcal{Q}$ of statistical queries over a data universe $\mathcal{X}$ under differential privacy. A statistical query on a dataset $D \in \mathcal{X}^n$ asks “what fraction of the elements of $D$ satisfy a given predicate $p$ on $\mathcal{X}$?” Dwork et al. (STOC’09) and Boneh and Zhandry (CRYPTO’14) showed that if both $\mathcal{Q}$ and $\mathcal{X}$ are of polynomial size, then there is an efficient differentially private algorithm that accurately answers all the queries, and if both $\mathcal{Q}$ and $\mathcal{X}$ are of exponential size, then under a plausible assumption, no efficient algorithm exists.

We show that, under the same assumption, if either the number of queries or the data universe is of exponential size, then there is no efficient differentially private algorithm that answers all the queries. Specifically, we prove that if one-way functions and indistinguishability obfuscation exist, then:

  1. For every $n$, there is a family $\mathcal{Q}$ of $\tilde{O}(n^7)$ queries on a data universe $\mathcal{X}$ of size $2^d$ such that no $\mathrm{poly}(n, d)$ time differentially private algorithm takes a dataset $D \in \mathcal{X}^n$ and outputs accurate answers to every query in $\mathcal{Q}$.

  2. For every $n$, there is a family $\mathcal{Q}$ of $2^d$ queries on a data universe $\mathcal{X}$ of size $\tilde{O}(n^7)$ such that no $\mathrm{poly}(n, d)$ time differentially private algorithm takes a dataset $D \in \mathcal{X}^n$ and outputs accurate answers to every query in $\mathcal{Q}$.

In both cases, the result is nearly quantitatively tight, since there is an efficient differentially private algorithm that answers $\tilde{\Omega}(n^2)$ queries on an exponential size data universe, and one that answers exponentially many queries on a data universe of size $\tilde{\Omega}(n^2)$.

Our proofs build on the connection between hardness results in differential privacy and traitor-tracing schemes (Dwork et al., STOC’09; Ullman, STOC’13). We prove our hardness result for a polynomial size query set (resp., data universe) by showing that it follows from the existence of a special type of traitor-tracing scheme with very short ciphertexts (resp., secret keys), but very weak security guarantees, and then constructing such a scheme.

1 Introduction

The goal of privacy-preserving data analysis is to release rich statistical information about a sensitive dataset while respecting the privacy of the individuals represented in that dataset. The past decade has seen tremendous progress towards understanding when and how these two competing goals can be reconciled, including surprisingly powerful differentially private algorithms as well as computational and information-theoretic limitations. In this work, we further this agenda by showing a strong new computational bottleneck in differential privacy.

Consider a dataset $D \in \mathcal{X}^n$ where each of the $n$ elements is one individual’s data, and each individual’s data comes from some data universe $\mathcal{X}$. We would like to be able to answer sets of statistical queries on $D$, which are queries of the form “What fraction of the individuals in $D$ satisfy some property $q$?” However, differential privacy [DMNS06] requires that we do so in such a way that no individual’s data has significant influence on the answers.

If we are content answering a relatively small set of queries $\mathcal{Q}$, then it suffices to perturb the answer to each query with independent noise from an appropriate distribution. This algorithm is simple, very efficient, differentially private, and ensures good accuracy—say, within $\pm .01$ of the true answer—as long as $|\mathcal{Q}| \lesssim n^2$ [DN03, DN04, BDMN05, DMNS06].
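To make this baseline concrete, here is a minimal Python sketch of the independent-perturbation approach (our own illustration, not code from the paper; all names are ours). It uses Laplace noise with basic composition, under which the per-query error scales like $|\mathcal{Q}|/(\varepsilon n)$; the $n^2$ bound quoted above requires Gaussian noise together with advanced composition.

```python
import random

def laplace(scale):
    # Laplace(0, scale) noise, sampled as the difference of two
    # independent exponentials with mean `scale`.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def answer_queries(dataset, predicates, epsilon):
    """Answer each statistical query q(D) = (1/n) * sum_i q(x_i) with
    independent Laplace noise. Each query has sensitivity 1/n, so by
    basic composition the whole release is epsilon-differentially private."""
    n = len(dataset)
    per_query_eps = epsilon / len(predicates)  # split the privacy budget
    return [sum(1 for x in dataset if q(x)) / n + laplace(1.0 / (n * per_query_eps))
            for q in predicates]
```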

Remarkably, the work of Blum, Ligett, and Roth [BLR13] showed that it is possible to output a summary that allows accurate answers to an exponential number of queries—nearly $2^n$—while ensuring differential privacy. However, neither their algorithm nor the subsequent improvements [DNR09, DRV10, RR10, HR10, GRU12, NTZ13, Ull15] are computationally efficient. Specifically, they all require time at least $\min\{|\mathcal{X}|, |\mathcal{Q}|\} \cdot \mathrm{poly}(n)$ to privately and accurately answer a family of statistical queries $\mathcal{Q}$ on a dataset $D \in \mathcal{X}^n$. Note that the size of the input is $n \log_2 |\mathcal{X}|$ bits, so a computationally efficient algorithm runs in time $\mathrm{poly}(n, \log|\mathcal{X}|)$.¹ For example, in the common setting where each individual’s data consists of $d$ binary attributes, namely $\mathcal{X} = \{0,1\}^d$, the size of the input is $nd$ but $|\mathcal{X}| = 2^d$. As a result, all known private algorithms for answering arbitrary sets of statistical queries are inefficient if either the number of queries or the size of the data universe is superpolynomial.

¹It may require exponential time just to describe and evaluate an arbitrary counting query, which would rule out efficiency for reasons that have nothing to do with privacy. In this work, we restrict attention to queries that are efficiently computable in time $\mathrm{poly}(n, \log|\mathcal{X}|)$, so they are not the bottleneck in the computation.

This accuracy vs. computation tradeoff has been the subject of extensive study. Dwork et al. [DNR09] showed that the existence of cryptographic traitor-tracing schemes [CFN94] yields a family of statistical queries that cannot be answered accurately and efficiently with differential privacy. Applying recent traitor-tracing schemes [BZ14], we conclude that, under plausible cryptographic assumptions (discussed below), if both the number of queries and the data universe can be superpolynomial, then there is no efficient differentially private algorithm. Ullman [Ull13] used variants of traitor-tracing schemes to show that in the interactive setting, where the queries are not fixed but are instead given as input to the algorithm, assuming one-way functions exist, there is no private and efficient algorithm that accurately answers more than $\tilde{O}(n^2)$ statistical queries. All of the algorithms mentioned above work in this interactive setting, but for many applications we only need to answer a fixed family of statistical queries.

Despite the substantial progress, there is still a basic gap in our understanding. The hardness results of Dwork et al. apply if both the number of queries and the universe are large. But the known algorithms require exponential time if either of these sets is large. Is this necessary? Are there algorithms that run in time $\mathrm{poly}(n, |\mathcal{Q}|, \log|\mathcal{X}|)$ or $\mathrm{poly}(n, \log|\mathcal{Q}|, |\mathcal{X}|)$?

Our main result shows that under the same plausible cryptographic assumptions, the answer is no—if either the data universe or the set of queries can be superpolynomially large, then there is some family of statistical queries that cannot be accurately and efficiently answered while ensuring differential privacy.

1.1 Our Results

Our first result shows that if the data universe can be of superpolynomial size then there is some fixed family of polynomially many queries that cannot be efficiently answered under differential privacy. This result shows that the efficient algorithm for answering an arbitrary family of $\tilde{O}(n^2)$ queries by adding independent noise is optimal up to the specific constant in the exponent.

Theorem 1.1 (Hardness for small query sets).

Assume the existence of indistinguishability obfuscation and one-way functions. Let $\lambda$ be a computation parameter. For any polynomial $n = n(\lambda)$, there is a sequence of pairs $\{(\mathcal{X}_\lambda, \mathcal{Q}_\lambda)\}_{\lambda \in \mathbb{N}}$ with $|\mathcal{X}_\lambda| = 2^{\lambda}$ and $|\mathcal{Q}_\lambda| = \tilde{O}(n^7)$ such that there is no polynomial time differentially private algorithm that takes a dataset $D \in \mathcal{X}_\lambda^n$ and outputs an accurate answer to every query in $\mathcal{Q}_\lambda$ up to an additive error of $\pm 1/3$.

Our second result shows that, even if the data universe is required to be of polynomial size, there is a fixed set of superpolynomially many queries that cannot be answered efficiently under differential privacy. When we say that an algorithm efficiently answers a set of superpolynomially many queries, we mean that it efficiently outputs a summary such that there is an efficient algorithm for obtaining an accurate answer to any query in the set. For comparison, if $|\mathcal{X}| = \mathrm{poly}(n)$, then there is a simple $\mathrm{poly}(n)$-time differentially private algorithm that accurately answers superpolynomially many queries.² Our result shows that this efficient algorithm is optimal up to the specific constant in the exponent.

²The algorithm, sometimes called the noisy histogram algorithm, works as follows. First, convert the dataset $D$ to a vector $v \in \mathbb{R}^{|\mathcal{X}|}$ where $v_x$ is the fraction of $D$’s elements that are equal to $x$. Then, output a vector $\tilde{v} = v + z$ where $z$ is a vector of independent noise from an appropriately scaled Gaussian distribution. To answer a statistical query $q$ defined by a predicate $p$, construct the vector $p = (p(x))_{x \in \mathcal{X}} \in \{0,1\}^{|\mathcal{X}|}$ and compute the answer $\langle \tilde{v}, p \rangle$. One can show that this algorithm is differentially private and, for any fixed set of statistical queries $\mathcal{Q}$, with high probability the maximum error is $O(\sqrt{|\mathcal{X}| \log |\mathcal{Q}|}/n)$. The running time is $\mathrm{poly}(n)$ to construct $\tilde{v}$ and $O(|\mathcal{X}|)$ to evaluate each query.
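The footnote's algorithm is easy to make concrete. Below is a minimal Python sketch (our own illustration; the constant in the Gaussian scale follows the standard analysis of the Gaussian mechanism, and all names are ours).

```python
import math
import random

def noisy_histogram(dataset, universe, epsilon, delta):
    """Summary: the empirical frequency of each universe element plus
    Gaussian noise. Substituting one row moves two entries by 1/n, so
    the histogram vector has L2 sensitivity sqrt(2)/n."""
    n = len(dataset)
    sigma = (math.sqrt(2) / n) * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    freq = {x: 0.0 for x in universe}
    for row in dataset:
        freq[row] += 1.0 / n
    return {x: f + random.gauss(0.0, sigma) for x, f in freq.items()}

def evaluate(summary, predicate):
    """Evaluator: inner product of the noisy histogram with the 0/1
    vector (predicate(x))_{x in universe}."""
    return sum(v for x, v in summary.items() if predicate(x))

# Usage: any fixed query set is answered with error O(sqrt(|X| log|Q|)/(eps*n))
# with high probability, while the summary is built once in time O(n + |X|).
summary = noisy_histogram([0, 1, 1, 2], universe=range(8), epsilon=1.0, delta=1e-6)
print(evaluate(summary, lambda x: x % 2 == 1))  # approx. fraction of odd rows
```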

Theorem 1.2 (Hardness for small data universes).

Assume the existence of indistinguishability obfuscation and one-way functions. Let $\lambda$ be a computation parameter. For any polynomial $n = n(\lambda)$, there is a sequence of pairs $\{(\mathcal{X}_\lambda, \mathcal{Q}_\lambda)\}_{\lambda \in \mathbb{N}}$ with $|\mathcal{X}_\lambda| = \tilde{O}(n^7)$ and $|\mathcal{Q}_\lambda| = 2^{\lambda}$ such that there is no polynomial time differentially private algorithm that takes a dataset $D \in \mathcal{X}_\lambda^n$ and outputs an accurate answer to every query in $\mathcal{Q}_\lambda$ up to an additive error of $\pm 1/3$.

Before we proceed to describe our techniques, we make a few remarks about these results. In both of these results, the constant $1/3$ is arbitrary, and can be replaced with any constant smaller than $1/2$. We also remark that, when we informally say that an algorithm is differentially private, we mean that it satisfies $(\varepsilon, \delta)$-differential privacy for some $\varepsilon = O(1)$ and $\delta = o(1/n)$. These are effectively the largest parameters for which differential privacy is a meaningful notion of privacy. That our hardness results apply to these parameters only makes our results stronger.

On Indistinguishability Obfuscation.

Indistinguishability obfuscation (iO) has recently become a central cryptographic primitive. The first candidate construction, proposed just a couple years ago [GGH13], was followed by a flurry of results demonstrating the extreme power and wide applicability of iO (cf., [GGH13, SW14, BZ14, HSW14, BPW16]). However, the assumption that iO exists is currently poorly understood, and the debate over the plausibility of iO is far from settled. While some specific proposed iO schemes have been attacked [CGH15, MSZ16], other schemes seem to resist all currently known attacks [BMSZ16, GMS16]. We also do not know how to base iO on a solid, simple, natural computational assumption (some attempts based on multilinear maps have been made [GLSW15], but they were broken with respect to all current multilinear map constructions).

Nevertheless, our results are meaningful whether or not iO exists. If iO exists, our results show that certain tasks in differential privacy are intractable. Interestingly, unlike many previous results relying on iO, these conclusions were not previously known to follow from even the much stronger (and in fact, false) assumption of virtual black-box obfuscation. If, on the other hand, iO does not exist, then our results still demonstrate a barrier to progress in differential privacy—such progress would need to prove that iO does not exist. Alternatively, our results highlight a possible path toward proving that iO does not exist. We note that other “incompatibility” results are known for iO; for example, iO and certain types of hash functions cannot simultaneously exist [BFM14, BST16].

1.2 Techniques

We prove our results by building on the connection between differentially private algorithms for answering statistical queries and traitor-tracing schemes discovered by Dwork et al. [DNR09]. Traitor-tracing schemes were introduced by Chor, Fiat, and Naor [CFN94] for the purpose of identifying pirates who violate copyright restrictions. Roughly speaking, a (fully collusion-resilient) traitor-tracing scheme allows a sender to generate $n$ keys for $n$ users so that 1) the sender can broadcast encrypted messages that can be decrypted by any user, and 2) any efficient pirate decoder capable of decrypting messages can be traced to at least one of the users who contributed a key to it, even if an arbitrary coalition of the users combined their keys in an arbitrary efficient manner to construct the decoder.

Dwork et al. show that the existence of traitor-tracing schemes implies hardness results for differential privacy. Very informally, they argue as follows. Suppose a coalition of users takes their keys and builds a dataset $D$ where each element of the dataset contains one of their user keys. The family $\mathcal{Q}$ will contain a query $q_c$ for each possible ciphertext $c$. The query $q_c$ asks “What fraction of the elements (user keys) in $D$ would decrypt the ciphertext $c$ to the message $1$?” Every user can decrypt, so if the sender encrypts a message $b \in \{0,1\}$ as a ciphertext $c$, then every user will decrypt $c$ to $b$. Thus, the answer to the statistical query $q_c$ will be $b$.

Suppose there were an efficient algorithm that outputs an accurate answer to each query $q_c$ in $\mathcal{Q}$ under differential privacy. Then the coalition could use it to efficiently produce a summary $S$ of the dataset $D$ that enables one to efficiently compute an approximate answer to every query $q_c$, which would also allow one to efficiently decrypt the ciphertext $c$. Such a summary can be viewed as an efficient pirate decoder, and thus the tracing algorithm can use the summary to trace one of the users in the coalition. However, if there is a way to identify one of the users in the dataset from the summary, then the summary is not differentially private.
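The reduction is mechanical enough to sketch in code. The following Python outline is our own (and hypothetical: `tt` stands for some traitor-tracing implementation with `setup`, `enc`, and `dec`, and `evaluate` is the summary evaluator of Section 2.2); it shows how a dataset of keys and a query per ciphertext turn an accurate summary into a pirate decoder.

```python
def build_hard_instance(tt, lam, n):
    # Dataset: one row per user key. Queries: one per ciphertext c, with
    # q_c(sk) = Dec(sk, c), so q_c(D) is the fraction of keys decrypting c to 1.
    user_keys, master_key = tt.setup(lam, n)
    dataset = list(user_keys)
    make_query = lambda c: (lambda sk: tt.dec(sk, c))
    return dataset, master_key, make_query

def pirate_decoder(summary, evaluate, make_query):
    # An accurate summary answers q_c close to the true plaintext bit, so
    # rounding its answer decrypts; tracing the summary then identifies a
    # user in the coalition, contradicting differential privacy.
    return lambda c: 1 if evaluate(summary, make_query(c)) >= 0.5 else 0
```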

To instantiate this result, they need a traitor-tracing scheme. Observe that the data universe $\mathcal{X}$ contains one element for every possible user key, and the set of queries $\mathcal{Q}$ contains one query for every ciphertext, and we want to minimize the size of these sets. Boneh and Zhandry [BZ14] constructed a traitor-tracing scheme where both the keys and the ciphertexts have length equal to the security parameter $\lambda$, which yields hardness for a data universe and query set each of size $2^{\lambda}$. The main contribution of this work is to show that we can reduce either the number of possible ciphertexts or the number of possible keys to $\mathrm{poly}(n)$ while the other remains of size $2^{\mathrm{poly}(\lambda)}$.

Suppose we want to reduce the number of possible ciphertexts to $\mathrm{poly}(n)$. How can we possibly have a secure traitor-tracing scheme with only polynomially many ciphertexts, when even a semantically secure private-key encryption scheme requires superpolynomially many ciphertexts? The answer lies in an observation from [Ull13] that in order to show hardness for differential privacy, it suffices to have a traitor-tracing scheme with extremely weak security. First, in the reduction from differential privacy to breaking traitor-tracing, the adversary has to produce the pirate decoder using only the coalition’s user keys and does not have access to an encryption oracle. Second, the probability that tracing fails only needs to be $O(1/n)$, rather than negligible. Both of these relaxations of the standard definition of traitor-tracing are crucial to making the ciphertext length $O(\log n)$, and as we show, these two relaxations are in fact sufficient. Alternatively, these relaxations also allow us to reduce the key length to $O(\log n)$. We refer the reader to the constructions of Sections 6 and 7 for more details about how we achieve this goal.

1.3 Related Work

Theorem 1.1 should be contrasted with the line of work on answering width-$w$ marginal queries under differential privacy [GHRU13, HRS12, TUV12, CTUW14, DNT14]. A width-$w$ marginal query is defined on the data universe $\mathcal{X} = \{0,1\}^d$. It is specified by a set $T \subseteq [d]$ of positions with $|T| \le w$, and a pattern $t \in \{0,1\}^{|T|}$, and asks “What fraction of elements of the dataset have each coordinate $j \in T$ set to $t_j$?” Specifically, Thaler, Ullman, and Vadhan [TUV12], building on the work of Hardt, Rothblum, and Servedio [HRS12], gave an efficient differentially private algorithm for answering width-$w$ marginal queries up to an additive error of $\pm .01$. There are also computationally efficient algorithms that answer exponentially many queries from even simpler families like point queries and threshold queries [BNS13, BNSV15].

There have been several other attempts to explain the accuracy vs. computation tradeoff in differential privacy by considering restricted classes of algorithms. For example, Ullman and Vadhan [UV11] (building on Dwork et al. [DNR09]) show that, assuming one-way functions, no differentially private and computationally efficient algorithm that outputs a synthetic dataset can accurately answer even the very simple family of $2$-way marginals. This result is incomparable to ours, since it applies to a very small and simple family of statistical queries, but necessarily only applies to algorithms that output synthetic data.

Gupta et al. [GHRU13] showed that no algorithm can obtain accurate answers to all marginal queries just by asking a polynomial number of statistical queries on the dataset. Thus, any algorithm that can be implemented using only a polynomial number of statistical queries, even one that is not differentially private, cannot accurately answer all marginal queries.

Bun and Zhandry [BZ16] considered the incomparable problem of differentially private PAC learning and showed that there is a concept class that is efficiently PAC learnable without privacy and inefficiently PAC learnable under differential privacy, but is not efficiently PAC learnable under differential privacy, settling an open question of Kasiviswanathan et al. [KLN11], who introduced the model of differentially private PAC learning.

There is also a line of work using fingerprinting codes to prove information-theoretic lower bounds on differentially private mechanisms [BUV14, SU15a, DSS15]. Namely, these results show that if the data universe is of size $2^{\tilde{O}(n^2)}$, then there is no differentially private algorithm, even a computationally unbounded one, that can answer more than $\tilde{O}(n^2)$ statistical queries. Fingerprinting codes are essentially the information-theoretic analogue of traitor-tracing schemes, and thus these results are technically related, although the models are incomparable.

Finally, we remark that techniques for proving hardness results in differential privacy have also found applications to the problem of interactive data analysis [HU14, SU15b]. The technical core of these results is to show that if an adversary is allowed to ask an online sequence of adaptively chosen statistical queries, then he can not only recover one element of the dataset, but can actually recover every element of the dataset. Doing so rules out any reasonable notion of privacy, and makes many non-private learning tasks impossible. The results are proven using variants of the sorts of traitor-tracing schemes that we study in this work.

2 Differential Privacy Preliminaries

2.1 Differentially Private Algorithms

A dataset $D \in \mathcal{X}^n$ is an ordered set of $n$ rows, where each row corresponds to an individual, and each row is an element of some data universe $\mathcal{X}$. We write $D = (x_1, \dots, x_n)$ where $x_i$ is the $i$-th row of $D$. We will refer to $n$ as the size of the dataset. We say that two datasets $D, D'$ are adjacent if $D'$ can be obtained from $D$ by the addition, removal, or substitution of a single row, and we denote this relation by $D \sim D'$. In particular, if we remove the $i$-th row of $D$ then we obtain a new dataset $D_{-i} \sim D$. Informally, an algorithm $A$ is differentially private if it is randomized and for any two adjacent datasets $D \sim D'$, the distributions of $A(D)$ and $A(D')$ are similar.

Definition 2.1 (Differential Privacy [DMNS06]).

Let $A : \mathcal{X}^n \to \mathcal{S}$ be a randomized algorithm. We say that $A$ is $(\varepsilon, \delta)$-differentially private if for every two adjacent datasets $D \sim D'$ and every subset $T \subseteq \mathcal{S}$,
$$\Pr[A(D) \in T] \le e^{\varepsilon} \cdot \Pr[A(D') \in T] + \delta.$$

In this definition, $\varepsilon$ and $\delta$ may be functions of $n$.

2.2 Algorithms for Answering Statistical Queries

In this work we study algorithms that answer statistical queries (which are also sometimes called counting queries, predicate queries, or linear queries in the literature). For a data universe $\mathcal{X}$, a statistical query on $\mathcal{X}$ is defined by a predicate $q : \mathcal{X} \to \{0,1\}$. Abusing notation, we define the evaluation of a query $q$ on a dataset $D = (x_1, \dots, x_n) \in \mathcal{X}^n$ to be
$$q(D) = \frac{1}{n} \sum_{i=1}^{n} q(x_i).$$
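In code, evaluating a statistical query is just an average of a 0/1 predicate; a two-line Python illustration (names ours) follows.

```python
def evaluate_statistical_query(q, dataset):
    # q(D) = (1/n) * sum_{i=1}^{n} q(x_i)
    return sum(1 for x in dataset if q(x)) / len(dataset)

# Example: the fraction of rows whose first binary attribute is set.
assert evaluate_statistical_query(lambda x: x[0] == 1, [(1, 0), (0, 1), (1, 1)]) == 2 / 3
```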
A single statistical query does not provide much useful information about the dataset. However, a sufficiently large and rich set of statistical queries is sufficient to implement many natural machine learning and data mining algorithms [Kea98], thus we are interested in differentially private algorithms to answer such sets. To this end, let $\mathcal{Q} = \{q : \mathcal{X} \to \{0,1\}\}$ be a set of statistical queries on a data universe $\mathcal{X}$.

Informally, we say that a mechanism is accurate for a set $\mathcal{Q}$ of statistical queries if it answers every query in the family to within error $\pm \alpha$ for some suitable choice of $\alpha > 0$. Note that $q(D) \in [0,1]$, so this definition of accuracy is meaningful only when $\alpha < 1/2$.

Before we define accuracy, we note that the mechanism may represent its answers in any form. That is, the mechanism may output a summary $S \in \mathcal{S}$ that somehow represents the answers to every query in $\mathcal{Q}$. We then require that there is an evaluator $\mathrm{Eval} : \mathcal{S} \times \mathcal{Q} \to [0,1]$ that takes the summary and a query and outputs an approximate answer to that query. That is, we think of $\mathrm{Eval}(S, q)$ as the mechanism’s answer to the query $q$. We will abuse notation and simply write $q(S)$ to mean $\mathrm{Eval}(S, q)$.³

³If we do not restrict the running time of the algorithm, then it is without loss of generality for the algorithm to simply output a list of real-valued answers to each query by computing $\mathrm{Eval}(S, q)$ for every $q \in \mathcal{Q}$. However, this transformation makes the running time of the algorithm at least $|\mathcal{Q}|$. The additional generality of this framework allows the algorithm to run in time sublinear in $|\mathcal{Q}|$. Using this framework is crucial, since some of our results concern settings where the number of queries is exponential in the size of the dataset.

Definition 2.2 (Accuracy).

For a family of statistical queries $\mathcal{Q}$ on $\mathcal{X}$, a dataset $D \in \mathcal{X}^n$, and a summary $S$, we say that $S$ is $\alpha$-accurate for $\mathcal{Q}$ on $D$ if
$$\forall q \in \mathcal{Q}, \quad |q(D) - q(S)| \le \alpha.$$

For a family of statistical queries $\mathcal{Q}$ on $\mathcal{X}$, we say that an algorithm $A$ is $(\alpha, \beta)$-accurate for $\mathcal{Q}$ given a dataset of size $n$ if for every $D \in \mathcal{X}^n$,
$$\Pr\left[A(D) \text{ is } \alpha\text{-accurate for } \mathcal{Q} \text{ on } D\right] \ge 1 - \beta.$$

In this work we are typically interested in mechanisms that satisfy the very weak notion of $(1/3, 1/3)$-accuracy, where the constant $1/3$ could be replaced with any constant smaller than $1/2$. Most differentially private mechanisms satisfy quantitatively much stronger accuracy guarantees. Since we are proving hardness results, this choice of parameters makes our results stronger.

2.3 Computational Efficiency

Since we are interested in asymptotic efficiency, we introduce a computation parameter $\lambda \in \mathbb{N}$. We then consider a sequence of pairs $\{(\mathcal{X}_\lambda, \mathcal{Q}_\lambda)\}_{\lambda \in \mathbb{N}}$ where $\mathcal{Q}_\lambda$ is a set of statistical queries on $\mathcal{X}_\lambda$. We consider databases of size $n = n(\lambda)$ where $n$ is a polynomial. We then consider algorithms $A$ that take as input a dataset $D \in \mathcal{X}_\lambda^n$ and output a summary in $\mathcal{S}_\lambda$, where $\{\mathcal{S}_\lambda\}_{\lambda \in \mathbb{N}}$ is a sequence of output ranges. There is an associated evaluator $\mathrm{Eval}$ that takes a query $q \in \mathcal{Q}_\lambda$ and a summary $s \in \mathcal{S}_\lambda$ and outputs a real-valued answer. The definitions of differential privacy and accuracy extend straightforwardly to such sequences.

We say that such an algorithm is computationally efficient if the algorithm and the associated evaluator run in time polynomial in the computation parameter $\lambda$.⁴ We remark that, in principle, it could require as many as $|\mathcal{X}|$ bits even to specify a statistical query, in which case we cannot hope to answer the query efficiently, even ignoring privacy constraints. In this work we restrict attention exclusively to statistical queries that are specified by a circuit of size $\mathrm{poly}(\lambda)$, and thus can be evaluated in time $\mathrm{poly}(\lambda)$, and so are not the bottleneck in computation. To remind the reader of this fact, we will often say that $\mathcal{Q}$ is a family of efficiently computable statistical queries.

⁴The constraint that the evaluator run in polynomial time sounds academic, but is surprisingly crucial. For any $\mathcal{Q}$ on $\mathcal{X}$, there is an extremely simple differentially private algorithm that runs in time $\mathrm{poly}(n)$ and outputs a summary that is accurate for $\mathcal{Q}$, yet the summary takes superpolynomial time to evaluate [NTZ13].

2.4 Notational Conventions

Given a boolean predicate $P$, we will write $\mathbb{I}\{P\}$ to denote the value $1$ if $P$ is true and $0$ if $P$ is false. Also, given a vector $v = (v_1, \dots, v_n)$ and an index $j \in [n]$, we will use $v_{-j}$ to denote the vector in which the $j$-th element of $v$ is replaced by some unspecified fixed element denoted $\bot$. We also say that a function $f : \mathbb{N} \to \mathbb{R}$ is negligible, and write $f(\lambda) = \mathrm{negl}(\lambda)$, if $f(\lambda) = O(1/\lambda^c)$ for every constant $c > 0$.

3 Weakly Secure Traitor-Tracing Schemes

In this section we describe a very relaxed notion of traitor-tracing schemes whose existence will imply the hardness of differentially private data release.

3.1 Syntax and Correctness

For a function $n : \mathbb{N} \to \mathbb{N}$ and a sequence of pairs $\{(\mathcal{K}_\lambda, \mathcal{C}_\lambda)\}_{\lambda \in \mathbb{N}}$, an $(n, \{\mathcal{K}_\lambda, \mathcal{C}_\lambda\})$-traitor-tracing scheme is a tuple of efficient algorithms $\Pi = (\mathsf{Setup}, \mathsf{Enc}, \mathsf{Dec})$ with the following syntax.

  • $\mathsf{Setup}$ takes as input a security parameter $\lambda$, runs in time $\mathrm{poly}(\lambda)$, and outputs $n = n(\lambda)$ secret user keys $sk_1, \dots, sk_n \in \mathcal{K}_\lambda$ and a secret master key $msk$. We will write $\vec{k} = (sk_1, \dots, sk_n, msk)$ to denote the set of keys.

  • $\mathsf{Enc}$ takes as input a master key $msk$ and an index $j \in \{0, 1, \dots, n\}$, and outputs a ciphertext $c \in \mathcal{C}_\lambda$. If $c \gets \mathsf{Enc}(j, msk)$ then we say that $c$ is encrypted to index $j$.

  • $\mathsf{Dec}$ takes as input a ciphertext $c$ and a user key $sk_i$ and outputs a single bit $b \in \{0,1\}$. We assume for simplicity that $\mathsf{Dec}$ is deterministic.

Correctness of the scheme asserts that if $\vec{k}$ is generated by $\mathsf{Setup}$, then for any pair $i \in [n]$ and $j \in \{0, 1, \dots, n\}$, $\mathsf{Dec}(sk_i, \mathsf{Enc}(j, msk)) = \mathbb{I}\{i \le j\}$. For simplicity, we require that this property holds with probability $1$ over the coins of $\mathsf{Setup}$ and $\mathsf{Enc}$, although it would not affect our results substantively if we required only correctness with high probability.

Definition 3.1 (Perfect Correctness).

An $(n, \{\mathcal{K}_\lambda, \mathcal{C}_\lambda\})$-traitor-tracing scheme $\Pi$ is perfectly correct if for every $\lambda \in \mathbb{N}$, every $i \in [n]$, and every $j \in \{0, 1, \dots, n\}$,
$$\Pr_{\vec{k} \gets \mathsf{Setup}(\lambda)}\left[\mathsf{Dec}(sk_i, \mathsf{Enc}(j, msk)) = \mathbb{I}\{i \le j\}\right] = 1.$$
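In code form, the syntax and the correctness requirement look as follows. This is a Python interface sketch of our own (method bodies are placeholders, not a construction), together with a randomized spot-check of Definition 3.1.

```python
class TraitorTracingScheme:
    """Interface sketch matching the syntax above (not a secure scheme)."""

    def setup(self, lam):
        """Return (user_keys, master_key), with n = n(lam) user keys."""
        raise NotImplementedError

    def enc(self, index, master_key):
        """Encrypt to an index j in {0, 1, ..., n}; return a ciphertext."""
        raise NotImplementedError

    def dec(self, user_key, ciphertext):
        """Deterministically return a single bit."""
        raise NotImplementedError

def check_perfect_correctness(scheme, lam, n, trials=5):
    # Definition 3.1: user i decrypts an encryption to index j as 1 iff i <= j.
    for _ in range(trials):
        user_keys, master_key = scheme.setup(lam)
        for j in range(n + 1):
            c = scheme.enc(j, master_key)
            for i, sk in enumerate(user_keys, start=1):
                assert scheme.dec(sk, c) == (1 if i <= j else 0)
```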

3.2 Index-Hiding Security

Intuitively, the security property we want is that any computationally efficient adversary who is missing one of the user keys $sk_{j^*}$ cannot distinguish ciphertexts encrypted to index $j^*$ from ciphertexts encrypted to index $j^* - 1$, even if that adversary holds all other keys $sk_{-j^*}$. In other words, an efficient adversary cannot infer anything about the encrypted index beyond what is implied by the correctness of decryption and the set of keys he holds.

More precisely, consider the following two-phase experiment. First the adversary is given every key except for $sk_{j^*}$, and outputs a decryption program $S$. Then, a challenge ciphertext is encrypted either to $j^*$ or to $j^* - 1$. We say that the traitor-tracing scheme is secure if, for every polynomial time adversary, with high probability over the setup and the decryption program chosen by the adversary, the decryption program has small advantage in distinguishing the two possible indices.

Definition 3.2 (Index Hiding).

A traitor-tracing scheme $\Pi$ satisfies (weak) index-hiding security if for every sufficiently large $\lambda \in \mathbb{N}$, every $j^* \in [n(\lambda)]$, and every adversary $A$ with running time $\mathrm{poly}(\lambda)$,

$$\Pr_{\vec{k} \gets \mathsf{Setup}(\lambda),\; S \gets A(sk_{-j^*})}\left[\Pr\left[S(\mathsf{Enc}(j^*, msk)) = 1\right] - \Pr\left[S(\mathsf{Enc}(j^* - 1, msk)) = 1\right] > \frac{1}{2en}\right] \le \frac{1}{2en} \tag{1}$$

In the above, the inner probabilities are taken over the coins of $\mathsf{Enc}$ and $S$.

Note that in the above definition we have fixed the success probability of the adversary for simplicity. Moreover, we have fixed these probabilities to relatively large ones. Requiring only a polynomially small advantage is crucial to achieving the key and ciphertext lengths we need to obtain our results, while still being sufficient to establish the hardness of differential privacy.

3.2.1 The Index-Hiding and Two-Index-Hiding Games

While Definition 3.2 is the most natural, in this section we consider some related ways of defining security that will be easier to work with when we construct and analyze our schemes. Consider the following game.

  The challenger generates keys $\vec{k} = (sk_1, \dots, sk_n, msk) \gets \mathsf{Setup}(\lambda)$.
  The adversary $A$ is given keys $sk_{-j^*}$ and outputs a decryption program $S$.
  The challenger chooses a bit $b \gets \{0,1\}$.
  The challenger generates an encryption to index $j^* - b$, $c \gets \mathsf{Enc}(j^* - b, msk)$.
  The adversary makes a guess $b' = S(c)$.
Figure 1: The game $\mathsf{IndexHiding}[j^*]$.

Let $\mathsf{IndexHiding}[j^*, \vec{k}, S]$ be the game $\mathsf{IndexHiding}[j^*]$ where we fix the choices of $\vec{k}$ and $S$. Also, define

$$\mathsf{Adv}[j^*, \vec{k}, S] = \Pr_{\mathsf{IndexHiding}[j^*, \vec{k}, S]}\left[b' = b\right] - \frac{1}{2},$$

so that

$$\mathsf{Adv}[j^*] = \Pr_{\mathsf{IndexHiding}[j^*]}\left[b' = b\right] - \frac{1}{2} = \mathbb{E}_{\vec{k} \gets \mathsf{Setup},\; S \gets A(sk_{-j^*})}\left[\mathsf{Adv}[j^*, \vec{k}, S]\right].$$

Then the following is equivalent to (1) in Definition 3.2:

$$\Pr_{\vec{k} \gets \mathsf{Setup},\; S \gets A(sk_{-j^*})}\left[\mathsf{Adv}[j^*, \vec{k}, S] > \frac{1}{4en}\right] \le \frac{1}{2en} \tag{2}$$

In order to prove that our schemes satisfy weak index-hiding security, we will go through an intermediate notion that we call two-index-hiding security. To see why this is useful, note that in our constructions it will be fairly easy to prove that $\mathsf{Adv}[j^*]$ is small, but because $\mathsf{Adv}[j^*, \vec{k}, S]$ can be positive or negative, that alone is not enough to establish (2). Thus, in order to establish (2) we will analyze the following variant of the index-hiding game.

  The challenger generates keys $\vec{k} = (sk_1, \dots, sk_n, msk) \gets \mathsf{Setup}(\lambda)$.
  The adversary $A$ is given keys $sk_{-j^*}$ and outputs a decryption program $S$.
  Choose $b_0 \gets \{0,1\}$ and $b_1 \gets \{0,1\}$ independently.
  Let $c_0 \gets \mathsf{Enc}(j^* - b_0, msk)$ and $c_1 \gets \mathsf{Enc}(j^* - b_1, msk)$.
  Let $b' = S(c_0, c_1)$.
Figure 2: The game $\mathsf{TwoIndexHiding}[j^*]$.

Analogous to what we did with $\mathsf{IndexHiding}$, we can define $\mathsf{TwoIndexHiding}[j^*, \vec{k}, S]$ to be the game $\mathsf{TwoIndexHiding}[j^*]$ where we fix the choices of $\vec{k}$ and $S$, and define

$$\mathsf{TwoAdv}[j^*, \vec{k}, S] = \Pr_{\mathsf{TwoIndexHiding}[j^*, \vec{k}, S]}\left[b' = b_0 \oplus b_1\right] - \frac{1}{2},$$

so that

$$\mathsf{TwoAdv}[j^*] = \Pr_{\mathsf{TwoIndexHiding}[j^*]}\left[b' = b_0 \oplus b_1\right] - \frac{1}{2} = \mathbb{E}_{\vec{k}, S}\left[\mathsf{TwoAdv}[j^*, \vec{k}, S]\right].$$

The crucial feature is that if we can bound the expectation of $\mathsf{TwoAdv}[j^*, \vec{k}, S]$ then we get a bound on the expectation of $\mathsf{Adv}[j^*, \vec{k}, S]^2$. Since $\mathsf{Adv}[j^*, \vec{k}, S]^2$ is always positive, we can apply Markov’s inequality to establish (2). Formally, we have the following claim.

Claim 3.3.

Suppose that for every efficient adversary $A$, every $\lambda \in \mathbb{N}$, and every index $j^* \in [n]$,
$$\mathsf{TwoAdv}[j^*] \le \varepsilon.$$

Then for every efficient adversary $A$, every $\lambda \in \mathbb{N}$, and every index $j^* \in [n]$,

$$\mathbb{E}_{\vec{k} \gets \mathsf{Setup},\; S \gets A(sk_{-j^*})}\left[\mathsf{Adv}[j^*, \vec{k}, S]^2\right] \le \frac{\varepsilon}{2}. \tag{3}$$

Using this claim we can prove the following lemma.

Lemma 3.4.

Let $\Pi$ be a traitor-tracing scheme such that for every efficient adversary $A$, every $\lambda \in \mathbb{N}$, and every index $j^* \in [n]$,
$$\mathsf{TwoAdv}[j^*] \le \frac{1}{16 e^3 n^3}.$$

Then $\Pi$ satisfies weak index-hiding security.

Proof.

By applying Claim 3.3 to the assumption of the lemma, we have that for every efficient adversary $A$,

$$\mathbb{E}_{\vec{k} \gets \mathsf{Setup},\; S \gets A(sk_{-j^*})}\left[\mathsf{Adv}[j^*, \vec{k}, S]^2\right] \le \frac{1}{32 e^3 n^3}.$$

Now we have

$$\Pr_{\vec{k}, S}\left[\mathsf{Adv}[j^*, \vec{k}, S] > \frac{1}{4en}\right] \le \Pr_{\vec{k}, S}\left[\mathsf{Adv}[j^*, \vec{k}, S]^2 > \frac{1}{16 e^2 n^2}\right] \le 16 e^2 n^2 \cdot \mathbb{E}_{\vec{k}, S}\left[\mathsf{Adv}[j^*, \vec{k}, S]^2\right] \le \frac{1}{2en} \qquad \text{(Markov’s Inequality)}$$

To complete the proof, observe that this final condition is equivalent to the definition of weak index-hiding security (Definition 3.2). ∎

In light of this lemma, we will focus on proving that the schemes we construct in the following sections satisfy the condition
$$\mathsf{TwoAdv}[j^*] \le \frac{1}{16 e^3 n^3},$$
which will be easier than directly establishing Definition 3.2.

4 Hardness of Differential Privacy from Traitor Tracing

In this section we prove that a traitor-tracing scheme satisfying perfect correctness and index-hiding security yields a family of statistical queries that cannot be answered accurately by an efficient differentially private algorithm. The proof is a fairly straightforward adaptation of the proofs in Dwork et al. [DNR09] and Ullman [Ull13] that various sorts of traitor-tracing schemes imply hardness results for differential privacy. We include the result for completeness, and to verify that our very weak definition of traitor-tracing is sufficient to prove hardness of differential privacy.

Theorem 4.1.

Suppose there is an $(n, \{\mathcal{K}_\lambda, \mathcal{C}_\lambda\})$-traitor-tracing scheme that satisfies perfect correctness (Definition 3.1) and weak index-hiding security (Definition 3.2). Then there is a sequence of pairs $\{(\mathcal{X}_\lambda, \mathcal{Q}_\lambda)\}_{\lambda \in \mathbb{N}}$, where $\mathcal{Q}_\lambda$ is a set of statistical queries on $\mathcal{X}_\lambda$, $|\mathcal{Q}_\lambda| = |\mathcal{C}_\lambda|$, and $|\mathcal{X}_\lambda| = |\mathcal{K}_\lambda|$, such that there is no algorithm $A$ that is simultaneously

  1. $(1, o(1/n))$-differentially private,

  2. $(1/3, 1/3)$-accurate for $\mathcal{Q}_\lambda$ on datasets $D \in \mathcal{X}_\lambda^n$, and

  3. computationally efficient.

Theorems 1.1 and 1.2 in the introduction follow by combining Theorem 4.1 above with the constructions of traitor-tracing schemes in Sections 6 and 7. The proof of Theorem 4.1 closely follows the proofs in Dwork et al. [DNR09] and Ullman [Ull13]. We give the proof both for completeness and to verify that our definition of traitor-tracing suffices to establish the hardness of differential privacy.

Proof.

Let $\Pi = (\mathsf{Setup}, \mathsf{Enc}, \mathsf{Dec})$ be the promised traitor-tracing scheme. For every $\lambda$, we can define a distribution on datasets as follows. Run $\mathsf{Setup}(\lambda)$ to obtain $n$ secret user keys $sk_1, \dots, sk_n$ and a master secret key $msk$. Let the dataset be $D = (sk_1, \dots, sk_n)$, where we define the data universe to be $\mathcal{X}_\lambda = \mathcal{K}_\lambda$. Abusing notation, we’ll write $D \gets \mathsf{Setup}(\lambda)$ to denote sampling the dataset this way.

Now we define the family of queries $\mathcal{Q}_\lambda$ on $\mathcal{X}_\lambda$ as follows. For every ciphertext $c \in \mathcal{C}_\lambda$, we define the predicate $q_c$ to take as input a user key $sk$ and output $\mathsf{Dec}(sk, c)$. That is,
$$q_c(sk) = \mathsf{Dec}(sk, c).$$

Recall that, by the definition of a statistical query, for a dataset $D = (sk_1, \dots, sk_n)$, we have
$$q_c(D) = \frac{1}{n} \sum_{i=1}^{n} \mathsf{Dec}(sk_i, c).$$

Suppose there is an algorithm $A$ that is computationally efficient and is $(1/3, 1/3)$-accurate for $\mathcal{Q}_\lambda$ given a dataset $D \in \mathcal{X}_\lambda^n$. We will show that $A$ cannot satisfy $(1, o(1/n))$-differential privacy. By accuracy, for every $\lambda$ and every fixed dataset $D$, with probability at least $2/3$, $A(D)$ outputs a summary $S$ that is $1/3$-accurate for $\mathcal{Q}_\lambda$ on $D$. That is, with probability at least $2/3$,

$$\forall c \in \mathcal{C}_\lambda, \quad |q_c(S) - q_c(D)| \le \frac{1}{3}. \tag{4}$$

Suppose that $S$ is indeed $1/3$-accurate. By perfect correctness of the traitor-tracing scheme (Definition 3.1) and the definition of $q_c$, since $\mathsf{Dec}(sk_i, \mathsf{Enc}(j, msk)) = \mathbb{I}\{i \le j\}$, we have that for every $j \in \{0, 1, \dots, n\}$ and every $c \gets \mathsf{Enc}(j, msk)$,

$$q_c(D) = \frac{j}{n}. \tag{5}$$

Combining Equations (4) and (5), we have that if $c_0 \gets \mathsf{Enc}(0, msk)$, $c_n \gets \mathsf{Enc}(n, msk)$, and $S$ is $1/3$-accurate, then we have both
$$q_{c_0}(S) \le \frac{1}{3} \qquad \text{and} \qquad q_{c_n}(S) \ge \frac{2}{3}.$$

Thus, for every $S$ that is $1/3$-accurate, there exists an index $j^* \in [n]$ such that

$$\mathbb{E}_{c \gets \mathsf{Enc}(j^*, msk)}\left[q_c(S)\right] - \mathbb{E}_{c \gets \mathsf{Enc}(j^* - 1, msk)}\left[q_c(S)\right] \ge \frac{1}{3n}. \tag{6}$$

By averaging, using the fact that $A(D)$ is $1/3$-accurate with probability at least $2/3$, there must exist an index $j^* \in [n]$ such that

$$\Pr\left[S = A(D) \text{ satisfies } (6)\right] \ge \frac{2}{3n}. \tag{7}$$

Assume, for the sake of contradiction, that $A$ is $(1, o(1/n))$-differentially private. For a given $j^* \in [n]$, let $T_{j^*}$ be the set of summaries $S$ such that (6) holds. Then, by (7), we have
$$\Pr\left[A(D) \in T_{j^*}\right] \ge \frac{2}{3n}.$$

By differential privacy of $A$, we have
$$\Pr\left[A(D_{-j^*}) \in T_{j^*}\right] \ge \frac{1}{e}\left(\frac{2}{3n} - o(1/n)\right) \ge \frac{1}{2en}$$
for sufficiently large $n$. Thus, by our definition of $T_{j^*}$, and by averaging over $\vec{k} \gets \mathsf{Setup}(\lambda)$, we have

$$\Pr_{\vec{k},\; S \gets A(D_{-j^*})}\left[\mathbb{E}_{c \gets \mathsf{Enc}(j^*, msk)}\left[q_c(S)\right] - \mathbb{E}_{c \gets \mathsf{Enc}(j^* - 1, msk)}\left[q_c(S)\right] \ge \frac{1}{3n}\right] \ge \frac{1}{2en}. \tag{8}$$

But this violates the index-hiding property of the traitor-tracing scheme. Specifically, consider an adversary for the traitor-tracing scheme that runs $A$ on the dataset $D_{-j^*}$ formed from the keys $sk_{-j^*}$ to obtain a summary $S$, and then decrypts a ciphertext $c$ by computing $q_c(S)$ and rounding the answer to $\{0,1\}$. By (8), and since $\frac{1}{3n} > \frac{1}{2en}$, this adversary violates index-hiding security (Definition 3.2).

Thus we have obtained a contradiction showing that $A$ is not $(1, o(1/n))$-differentially private. This completes the proof. ∎

5 Cryptographic Primitives

5.1 Standard Tools

We will make use of a few standard cryptographic and information-theoretic primitives. We will define these primitives for completeness and to set notation and terminology.

Almost Pairwise Independent Hash Families.

A hash family is a family of functions $\mathcal{H} = \{h : [N] \to [M]\}$. To avoid notational clutter, we will use the notation $h \gets \mathcal{H}$ to denote the operation of choosing a random function from $\mathcal{H}$, and will not explicitly write the seed for the function. We will use $|h|$ to denote the seed length of the function and require that $h$ can be evaluated in time $\mathrm{poly}(|h|)$.

Definition 5.1.

A family of functions $\mathcal{H} = \{h : [N] \to [M]\}$ is $\gamma$-almost pairwise independent if for every two distinct points $x \ne x' \in [N]$ and every $y, y' \in [M]$,
$$\left|\Pr_{h \gets \mathcal{H}}\left[h(x) = y \wedge h(x') = y'\right] - \frac{1}{M^2}\right| \le \gamma.$$

For every $N$, $M$, and $\gamma > 0$, there exists a $\gamma$-almost pairwise independent hash family $\mathcal{H}$ such that $|h| = O(\log N + \log M + \log(1/\gamma))$ for every $h \in \mathcal{H}$.
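For instance, the classical construction over a prime field gives such a family. The Python sketch below (our own illustration) samples $h(x) = ((ax + b) \bmod p) \bmod M$, which is exactly pairwise independent over $\mathbb{Z}_p$; the final truncation to $[M]$ introduces a small bias that shrinks as $p$ grows relative to $M$.

```python
import random

def sample_hash(p, M):
    """h(x) = ((a*x + b) mod p) mod M for uniformly random a, b in Z_p.
    For prime p >= N, (a*x + b) mod p is pairwise independent on the
    domain; reducing mod M adds a small bias, giving almost pairwise
    independence. The seed (a, b) has length O(log p)."""
    a, b = random.randrange(p), random.randrange(p)
    return lambda x: ((a * x + b) % p) % M

h = sample_hash(p=2**61 - 1, M=256)  # 2^61 - 1 is a Mersenne prime
print(h(17), h(18))
```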

Pseudorandom Generators.

A pseudorandom generator is a function $\mathrm{PRG} : \{0,1\}^{\lambda} \to \{0,1\}^{2\lambda}$ such that $\mathrm{PRG}(U_{\lambda})$ is computationally indistinguishable from $U_{2\lambda}$. In this definition, $U_m$ denotes the uniform distribution on $\{0,1\}^m$. Pseudorandom generators exist under the minimal assumption that one-way functions exist.

Pseudorandom Function Families.

A pseudorandom function family is a family of functions $\mathcal{F} = \{F : [N] \to [M]\}$. To avoid notational clutter, we will use the notation $F \gets \mathcal{F}$ to denote the operation of choosing a random function from $\mathcal{F}$ and not explicitly write the seed for the function. We will use $|F|$ to denote the description length of the function. We require that $|F| = \mathrm{poly}(\lambda)$ and that $F$ can be evaluated in time $\mathrm{poly}(|F|)$.

Security requires that oracle access to $F \gets \mathcal{F}$ is indistinguishable from oracle access to a random function. Specifically, for all probabilistic polynomial-time algorithms $A$,

$$\left|\Pr_{F \gets \mathcal{F}}\left[A^{F}(1^{\lambda}) = 1\right] - \Pr_{f \gets \mathrm{Rand}}\left[A^{f}(1^{\lambda}) = 1\right]\right| \le \mathrm{negl}(\lambda),$$

where $\mathrm{Rand}$ denotes the set of all functions from $[N]$ to $[M]$, for some negligible function $\mathrm{negl}(\lambda)$.

Under the minimal assumption that one-way functions exist, for every pair of functions $N(\lambda), M(\lambda)$ that are at most exponential in $\lambda$, there is a family of pseudorandom functions $\mathcal{F} = \{F : [N] \to [M]\}$ such that $|F| = \mathrm{poly}(\lambda)$.

A pseudorandom function family is $\gamma$-almost pairwise independent for some $\gamma = \mathrm{negl}(\lambda)$.

5.2 Puncturable Pseudorandom Functions

A pseudorandom function family $\mathcal{F}$ is puncturable if there is a deterministic procedure $\mathsf{Puncture}$ that takes as input $F \in \mathcal{F}$ and $x^* \in [N]$ and outputs a new function $F_{\{x^*\}}$ such that
$$F_{\{x^*\}}(x) = \begin{cases} F(x) & \text{if } x \ne x^* \\ \bot & \text{if } x = x^* \end{cases}$$
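Puncturable PRFs follow from one-way functions via the standard GGM tree construction, as is common in this literature. The Python sketch below is our own illustration (SHA-256 stands in for the length-doubling PRG, for illustration only): puncturing at $x^*$ releases the sibling seed at every level of the path to $x^*$, which determines $F$ everywhere except at $x^*$. Puncturing at two points, as in Section 5.3, releases the sibling seeds along both paths.

```python
import hashlib

def prg_half(seed, bit):
    # Left/right half of a length-doubling PRG, modeled with SHA-256.
    return hashlib.sha256(seed + bytes([bit])).digest()

def ggm_eval(key, x, depth):
    """GGM PRF: walk the binary tree along the bits of x (MSB first)."""
    seed = key
    for i in reversed(range(depth)):
        seed = prg_half(seed, (x >> i) & 1)
    return seed

def ggm_puncture(key, x_star, depth):
    """Return the sibling seed at each level of the path to x_star."""
    punctured, seed = [], key
    for i in reversed(range(depth)):
        bit = (x_star >> i) & 1
        punctured.append((i, 1 - bit, prg_half(seed, 1 - bit)))
        seed = prg_half(seed, bit)
    return punctured

def punctured_eval(punctured, x_star, x, depth):
    """Evaluate the punctured key on any x != x_star."""
    assert x != x_star
    for i, bit, seed in punctured:
        # The first level (scanning from the top) where x's bit matches the
        # stored sibling bit is exactly where x leaves x_star's path.
        if (x >> i) & 1 == bit:
            for j in reversed(range(i)):
                seed = prg_half(seed, (x >> j) & 1)
            return seed
    raise AssertionError("unreachable for x != x_star")

# Usage: the punctured key agrees with the full key off x_star.
key, depth, x_star = b"\x00" * 32, 8, 0b10110001
pk = ggm_puncture(key, x_star, depth)
assert punctured_eval(pk, x_star, 5, depth) == ggm_eval(key, 5, depth)
```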

The definition of security for a punctured pseudorandom function states that for any $x^*$, given the punctured function $F_{\{x^*\}}$, the missing value $F(x^*)$ is computationally unpredictable. Specifically, we define the following game to capture the desired security property.

  The challenger chooses $F \gets \mathcal{F}$.
  The challenger chooses a uniformly random bit $b \in \{0,1\}$ and samples $y^*$: if $b = 0$ then $y^* = F(x^*)$, and if $b = 1$ then $y^* \gets [M]$ uniformly.
  The challenger punctures $F$ at $x^*$, obtaining $F_{\{x^*\}}$.
  The adversary is given $(F_{\{x^*\}}, y^*)$ and outputs a bit $b'$.
Figure 3: The puncturing security game.
Definition 5.2 (Puncturing Secure PRF).

A pseudorandom function family $\mathcal{F}$ is $\varepsilon$-puncturing secure if for every $x^* \in [N]$ and every probabilistic polynomial-time adversary,
$$\Pr\left[b' = b\right] \le \frac{1}{2} + \varepsilon.$$

5.3 Twice Puncturable PRFs

A twice puncturable PRF is a pair of algorithms $(\mathsf{PRFSetup}, \mathsf{Puncture})$.

  • $\mathsf{PRFSetup}$ is a randomized algorithm that takes a security parameter $\lambda$ and outputs a function $F : [N] \to [M]$, where $N = N(\lambda)$ and $M = M(\lambda)$ are parameters of the construction. Technically, the function is parameterized by a seed of length $\mathrm{poly}(\lambda)$; however, for notational simplicity we will ignore the seed and simply use $F$ to denote this function. Formally, $F \gets \mathsf{PRFSetup}(1^{\lambda})$.

  • $\mathsf{Puncture}$ is a deterministic algorithm that takes a function $F$ and a pair of inputs $x_0, x_1 \in [N]$ and outputs a new function $F_{\{x_0, x_1\}}$ such that
$$F_{\{x_0, x_1\}}(x) = \begin{cases} F(x) & \text{if } x \notin \{x_0, x_1\} \\ \bot & \text{if } x \in \{x_0, x_1\} \end{cases}$$

    Formally, $F_{\{x_0, x_1\}} = \mathsf{Puncture}(F, x_0, x_1)$.

In what follows we will always assume that $N$ and $M$ are polynomial in the security parameter $\lambda$ and that $N \ge \lambda M$.

In addition to requiring that this family of functions satisfies the standard notion of cryptographic pseudorandomness, we will now define a new security property for twice puncturable PRFs, called input-matching indistinguishability. For any two distinct outputs $y_0 \ne y_1 \in [M]$, consider the following game.

  The challenger chooses $F \gets \mathsf{PRFSetup}(1^{\lambda})$ such that every $y \in [M]$ has a preimage under $F$.
  The challenger chooses independent random bits $b_0, b_1 \in \{0,1\}$, and samples random preimages $x_0 \gets F^{-1}(y_{b_0})$ and $x_1 \gets F^{-1}(y_{b_1})$.
  The challenger punctures $F$ at $\{x_0, x_1\}$, obtaining $F_{\{x_0, x_1\}}$.
  The adversary is given $(F_{\{x_0, x_1\}}, x_0, x_1)$ and outputs a bit $b'$.
Figure 4: The input-matching game for outputs $y_0, y_1$.

Notice that in this game we have ensured that every $y \in [M]$ has a preimage under $F$. We need this condition to make the next step of sampling random preimages well defined. Technically, it would suffice to have a preimage only for $y_0$ and $y_1$, but for simplicity we will assume that every possible output has a preimage. When $F$ is a random function, the probability that some output has no preimage is at most $M(1 - 1/M)^N \le M e^{-N/M}$, which is negligible when $N \ge \lambda M$. Since $N$ and $M$ are assumed to be polynomial in the security parameter, we can efficiently check whether every output has a preimage; thus, if $F$ is pseudorandom, it must also be the case that every output has a preimage with high probability. Since we can efficiently check whether or not every output has a preimage under $F$, and this event occurs with all but negligible probability, we can efficiently sample the pseudorandom function in the first step of the game.

Definition 5.3 (Input-Matching Secure PRF).

A function family $\mathcal{F}$ is $\varepsilon$-input-matching secure if the function family is a secure pseudorandom function family and, additionally, for every $y_0 \ne y_1 \in [M]$ and every probabilistic polynomial-time adversary,
$$\Pr\left[b' = b_0 \oplus b_1\right] \le \frac{1}{2} + \varepsilon.$$

In Appendix A we will show that input-matching secure twice puncturable pseudorandom functions exist with suitable parameters.

Theorem 5.4.

Assuming the existence of one-way functions, if $N$ and $M$ are polynomials such that $N \ge \lambda M$, then there exists a pseudorandom function family $\mathcal{F} = \{F : [N] \to [M]\}$ that is twice puncturable and is $\varepsilon$-input-matching secure for some $\varepsilon = O(\sqrt{M/N})$.

5.4 Indistinguishability Obfuscation

We use the following formulation of Garg et al.  [GGH13] for indistinguishability obfuscation:

Definition 5.5 (Indistinguishability Obfuscation).

An indistinguishability obfuscator $\mathcal{O}$ for a circuit class $\{\mathcal{C}_\lambda\}_{\lambda \in \mathbb{N}}$ is a probabilistic polynomial-time uniform algorithm satisfying the following conditions:

  1. $\mathcal{O}(\lambda, C)$ preserves the functionality of $C$. That is, for any $C \in \mathcal{C}_\lambda$, if we compute $C' = \mathcal{O}(\lambda, C)$, then $C'(x) = C(x)$ for all inputs $x$.

  2. For any $\lambda$ and any two circuits $C_0, C_1 \in \mathcal{C}_\lambda$ with the same functionality, the circuits $\mathcal{O}(\lambda, C_0)$ and $\mathcal{O}(\lambda, C_1)$ are indistinguishable. More precisely, for all pairs of probabilistic polynomial-time adversaries $(\mathsf{Samp}, D)$, if
$$\Pr_{(C_0, C_1, \sigma) \gets \mathsf{Samp}(\lambda)}\left[\forall x,\; C_0(x) = C_1(x)\right] > 1 - \mathrm{negl}(\lambda),$$

    then
$$\left|\Pr\left[D(\sigma, \mathcal{O}(\lambda, C_0)) = 1\right] - \Pr\left[D(\sigma, \mathcal{O}(\lambda, C_1)) = 1\right]\right| < \mathrm{negl}(\lambda).$$

The circuit classes we are interested in are polynomial-size circuits, that is, when $\mathcal{C}_\lambda$ is the collection of all circuits of size at most $\lambda$.

When clear from context, we will often drop $\lambda$ as an input to $\mathcal{O}$ and as a subscript for $\mathcal{C}$.

6 A Weak Traitor-Tracing Scheme with Very Short Ciphertexts

In this section we construct a traitor-tracing scheme for $n$ users where the key length is polynomial in the security parameter $\lambda$ and the ciphertext length is only $O(\log n)$. This scheme will be used to establish our hardness result for differential privacy when the data universe can be exponentially large but the family of queries has only polynomial size.

6.1 Construction

Let $n = n(\lambda)$ denote the number of users for the scheme. Let