Profile-Based Privacy for Locally Private Computations
Abstract
Differential privacy has emerged as a gold standard in privacy-preserving data analysis. A popular variant is local differential privacy, where each data holder acts as their own trusted curator. A major barrier to wider adoption of this model, however, is that it offers a poor privacy-utility tradeoff.
In this work, we address this problem by introducing a new variant of local privacy called profile-based privacy. The central idea is that the problem setting comes with a graph of data-generating distributions, whose edges encode sensitive pairs of distributions that should be made indistinguishable. This provides higher utility because, unlike local differential privacy, we no longer need to make every pair of private values in the domain indistinguishable, and instead only protect the identity of the underlying distribution. We establish privacy properties of the profile-based privacy definition, such as post-processing invariance and graceful composition. Finally, we provide mechanisms that are private in this framework, and show via simulations that they achieve higher utility than the corresponding local differential privacy mechanisms.
1 Introduction
A great deal of machine learning in the 21st century is carried out on sensitive data, and hence the field of privacy-preserving data analysis is of increasing importance. Differential privacy [5], introduced in 2006, has become the dominant paradigm for specifying data privacy. A body of compelling results [1, 2, 8, 6, 11] has been achieved in the "centralized" model, in which a trusted data curator has raw access to the data while performing the privacy-preserving operations. However, such trust is not always easy to achieve, especially when the trust must also extend to all future uses of the data.
An implementation of differential privacy that has been particularly popular in industrial applications makes each user into their own trusted curator. Commonly referred to as local differential privacy [3], this model consists of users locally privatizing their own data before submission to an aggregate data curator. Due to the strong robustness of differential privacy under further computations, this model preserves privacy regardless of the trust placed in the aggregate curator, now or in the future. Two popular industrial systems implementing local differential privacy are Google's RAPPOR and Apple's iOS data collection system.
However, a major barrier for the local model is the undesirable utility sacrifice in the submitted data. A local differential privacy implementation achieves much lower utility than a similar method that assumes trust in the curator. Strong lower bounds have been found for the local framework [4], leading to pessimistic results requiring massive amounts of data to achieve both privacy and utility.
In this work, we address this challenge by proposing a new restricted privacy definition, called profile-based privacy. The central idea relies on specifying a graph of data-generating distributions, where edges encode sensitive pairs of distributions that should be made indistinguishable. Our framework does not require that all features of the observed data be obscured; instead, only the information connected to identifying the distributions must be perturbed. This sidesteps the utility costs of local differential privacy, where every possible pair of observations must be indistinguishable.
2 Preliminaries: Privacy Definitions
We begin by defining local differential privacy – a prior privacy framework that is related to our definition.
Definition 1.
A mechanism $M$ achieves $\epsilon$-local differential privacy if for every pair of individuals' private records $x, x' \in \mathcal{X}$ and every output $y$, we have:

(1) $\Pr[M(x) = y] \le e^{\epsilon} \Pr[M(x') = y].$
Concretely, local differential privacy limits the ability of an adversary to increase their confidence in whether an individual's private value is $x$ versus $x'$, even with arbitrary prior knowledge. These protections are robust to any further computation performed on the mechanism output.
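As a concrete illustration of Definition 1, the following Python sketch (the helper names are ours, for illustration) verifies that Warner's randomized response – keep the bit with probability $e^{\epsilon}/(1+e^{\epsilon})$, flip it otherwise – attains a worst-case log-ratio of exactly $\epsilon$ over all input pairs and outputs:

```python
import math

def randomized_response_probs(x, eps):
    """Warner's randomized response on a bit x: keep it with probability
    e^eps / (1 + e^eps), flip it otherwise."""
    keep = math.exp(eps) / (1 + math.exp(eps))
    return {x: keep, 1 - x: 1 - keep}

def worst_log_ratio(eps):
    """Worst-case |log| ratio of output probabilities over all input pairs."""
    worst = 0.0
    for x in (0, 1):
        for xp in (0, 1):
            dx = randomized_response_probs(x, eps)
            dxp = randomized_response_probs(xp, eps)
            for y in (0, 1):
                worst = max(worst, abs(math.log(dx[y] / dxp[y])))
    return worst

# The worst-case log-ratio equals eps, so the mechanism is exactly eps-LDP.
```

Any mechanism with a smaller worst-case log-ratio would satisfy the definition with a smaller $\epsilon$; randomized response is tight.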
3 Profile-Based Privacy Definition
Before we present the definition and discuss its implications, it is helpful to have a specific problem in mind. We present one possible setting in which our profiles have a clear interpretation.
3.1 Example: Resource Usage Problem Setting
Imagine a shared workstation with access to several resources, such as network bandwidth, specialized hardware, or electricity. Different users might use this workstation, coming from a diverse pool of job titles and roles. An analyst wishes to collect and analyze the metrics of resource usage, but also wishes to respect the privacy of the workstation users. One choice of privacy framework is local differential privacy, in which every value of a resource usage metric is considered sensitive and privatized. Under our alternative profile-based framework, one can instead designate only the user identities as the sensitive information to protect. This shifts the goal away from hiding all features of the resource usage, and permits measurements to be released more faithfully when they are not indicative of a user's identity.
3.2 Definition of Local Profile-Based Differential Privacy
Our privacy definition revolves around a notion of profiles, which represent distinct potential data-generating distributions. To preserve privacy, the mechanism's release must not give too much of an advantage in guessing the release's underlying profile. However, other facets of the observed data can (and should) be preserved, permitting greater utility than local differential privacy.
Definition 2.
Given a graph $G = (\mathcal{P}, E)$ consisting of a collection of data-generating profiles $\mathcal{P} = \{P_1, \ldots, P_n\}$ over the space $\mathcal{X}$ and a collection of edges $E$, a mechanism $M$ achieves $\epsilon$-profile-based differential privacy if for every edge $(P_i, P_j) \in E$ connecting profiles $P_i$ and $P_j$, and for all outputs $y$, we have:

(2) $\Pr_{X \sim P_i}[M(X) = y] \le e^{\epsilon} \Pr_{X \sim P_j}[M(X) = y].$
Inherent in this definition is an assumption on adversarial prior knowledge: the adversary knows each profile distribution, but has no further auxiliary information about the observation $X$. The protected secrets are the identities of the source distributions, and are not directly related to particular features of the data $X$. These additional assumptions in the problem setting, however, open up avenues for increased performance. By not attempting to completely privatize the raw observations, information that is less relevant for guessing the sensitive profile identity can be preserved for downstream tasks.
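To make the edge-wise constraint of Definition 2 concrete, the sketch below (our own illustrative helpers, not part of any library) checks it on the output distributions induced by a simple bit-flip mechanism applied to Bernoulli profiles:

```python
import math

def flipped(p, g):
    """Output distribution of a bit-flip mechanism (flip chance g) applied to
    a bit drawn from a Bernoulli(p) profile."""
    q1 = p * (1 - g) + (1 - p) * g
    return {0: 1 - q1, 1: q1}

def profile_privacy_ok(output_dists, edges, eps, tol=1e-9):
    """Check Definition 2: for every sensitive edge (i, j) and every output y,
    the induced output distributions differ by at most e^eps multiplicatively."""
    for i, j in edges:
        for y in output_dists[i]:
            if abs(math.log(output_dists[i][y] / output_dists[j][y])) > eps + tol:
                return False
    return True

# Two similar Bernoulli profiles: a mild flip chance already suffices.
close = [flipped(0.4, 0.2), flipped(0.5, 0.2)]
assert profile_privacy_ok(close, [(0, 1)], eps=0.5)

# Two very different profiles released without noise violate the definition.
far = [flipped(0.1, 0.0), flipped(0.9, 0.0)]
assert not profile_privacy_ok(far, [(0, 1)], eps=0.5)
```

Close profiles pass with little noise while distant profiles released raw do not, which is exactly the structure the definition is meant to exploit.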
The flexible specification of sensitive pairs via edges in the graph permits privacy design decisions that also impact the privacy-utility tradeoff. When particular profile pairs can be declared less sensitive, the perturbations required to blur those profiles can be avoided. Such design decisions would be impractical in the data-oriented local differential privacy setting, where the space of pairs of data sets is intractably large.
The local profile-based differential privacy framework acts as an inverse to the goals seen in maximal-leakage-constrained hypothesis testing [9], where their hypotheses act similarly to our profiles as data distributions. While they focus on protecting the observations and maintaining distribution-level utility, we focus on maintaining observation-level utility and protecting the distributions. Both settings are interesting and situational.
3.3 Discussion of the Resource Usage Problem
This privacy framework is quite general, and as such it helps to discuss its meaning in more concrete terms. Let us return to the resource usage setting. We'll assume that each user has a personal resource usage profile known prior to the data collection process. The choice of edges in the graph opens up flexibility over what inferences are sensitive. If the graph has many edges, the broad identity of the workstation user will be hidden by forbidding many potential inferences. However, even with this protection, not all the information about resource usage must be perturbed. For example, if all users require roughly the same amount of electricity at the workstation, then electrical usage metrics will not require much obfuscation. Contrast this with the standard local differential privacy scheme, in which every pair of distinct observed values must be obscured.
A sparser graph might connect only profiles with the same job title or role. These sensitive pairs will prevent inferences about particular identities within each role. However, without connections across job titles, no protection is enforced against inferring the job title of the current workstation user. Thus such a graph declares user identities sensitive, while a user's role is not sensitive. This permits the released data to be more faithful to the raw observations, since only the peculiarities of resource usage among users of the same role must be obscured, rather than the peculiarities of the different roles.
One important caveat of this definition is that the profile distributions must be known and are assumed to be released a priori, i.e., they are not considered privacy-sensitive. If the user profiles cannot all be released, this can be mitigated somewhat by reducing the granularity of the graph. A graph consisting only of profiles for each distinct job role can still encode meaningful protections, since limiting inferences on job role can also limit inferences on highly correlated information like the user's identity.
The tradeoff in profile granularity is subtle. More profiles permit more structure and opportunities for our definition to achieve better utility than local differential privacy, but also require a greater level of a priori knowledge.
4 Properties
Our privacy definition enjoys several properties similar to those of other differential-privacy-inspired frameworks. The post-processing and composition properties are recognized as highly desirable traits for privacy definitions [7].
Post-Processing
The post-processing property specifies that any additional computation (without access to the private information) on the released output cannot result in worse privacy. Following standard techniques, our definition also shares this data processing inequality.
Theorem 3.
If a data sample $X$ is drawn from profile $P_i$, and $M(X)$ preserves $\epsilon$-profile-based privacy, then for any (potentially randomized) function $F$, the release $F(M(X))$ preserves $\epsilon$-profile-based privacy.
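The data processing inequality behind Theorem 3 can also be observed numerically. The Python sketch below (with an arbitrary, hypothetical post-processing channel of our own choosing) checks that post-processing never increases the worst-case log-ratio between the output distributions of two adjacent profiles:

```python
import math

def pushforward(dist, channel):
    """Apply a randomized post-processing channel: channel[y][z] = Pr[F(y) = z]."""
    out = {}
    for y, py in dist.items():
        for z, pz in channel[y].items():
            out[z] = out.get(z, 0.0) + py * pz
    return out

def max_log_ratio(d1, d2):
    """Worst-case |log| ratio between two distributions over the same outputs."""
    return max(abs(math.log(d1[z] / d2[z])) for z in d1)

# Mechanism output distributions under two adjacent profiles:
d_i = {0: 0.45, 1: 0.55}
d_j = {0: 0.55, 1: 0.45}
eps = max_log_ratio(d_i, d_j)

# An arbitrary (even lossy) channel cannot increase the worst-case ratio:
channel = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.2, "b": 0.8}}
assert max_log_ratio(pushforward(d_i, channel),
                     pushforward(d_j, channel)) <= eps + 1e-12
```

The mixing performed by the channel can only average the per-output ratios together, never push them past the original extremes.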
Composition
The composition property allows for multiple privatized releases to still offer privacy even when witnessed together. Our definition also admits a compositional property, although not all possible compositional settings behave nicely. We mitigate the need for composition by focusing on a local model where the data mainly undergoes one privatization.
Profile-based differential privacy enjoys additive composition if truly independent samples are drawn from the same profile. The proof of this follows the same reasoning as the additive composition of differential privacy.
Theorem 4.
If two independent samples $X_1$ and $X_2$ are drawn from profile $P_i$, and $M_1(X_1)$ preserves $\epsilon_1$-profile-based privacy and $M_2(X_2)$ preserves $\epsilon_2$-profile-based privacy, then the combined release $(M_1(X_1), M_2(X_2))$ preserves $(\epsilon_1 + \epsilon_2)$-profile-based privacy.
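A quick numerical sanity check of additive composition, using bit-flip releases of a Bernoulli pair (the parameter choices are ours, for illustration): since the two releases are conditionally independent given the profile, the joint distribution factorizes and the joint log-ratio is bounded by the sum of the individual ones.

```python
import math

def flipped(p, g):
    """Output distribution when a Bernoulli(p) bit is flipped with chance g."""
    q1 = p * (1 - g) + (1 - p) * g
    return {0: 1 - q1, 1: q1}

def eps_of(d1, d2):
    """Privacy level of a release: worst-case |log| output-probability ratio."""
    return max(abs(math.log(d1[y] / d2[y])) for y in d1)

p, q = 0.3, 0.5                                    # one sensitive edge
m1_p, m1_q = flipped(p, 0.25), flipped(q, 0.25)    # first release
m2_p, m2_q = flipped(p, 0.10), flipped(q, 0.10)    # second release (less noise)
e1, e2 = eps_of(m1_p, m1_q), eps_of(m2_p, m2_q)

# The joint distribution of two independent releases factorizes, so the
# worst-case joint log-ratio is at most e1 + e2.
joint = max(
    abs(math.log((m1_p[a] * m2_p[b]) / (m1_q[a] * m2_q[b])))
    for a in (0, 1) for b in (0, 1)
)
assert joint <= e1 + e2 + 1e-12
```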
A notion of parallel composition can also be applied if two data sets come from two independent processes of selecting a profile. In this setting, information about one instance has no bearing on the other. This matches the parallel composition of differential privacy when applied to multiple independent individuals.
Theorem 5.
If two independent profiles $P_a$ and $P_b$ are selected, and two observations $X_1 \sim P_a$ and $X_2 \sim P_b$ are drawn, and $M_1(X_1)$ preserves $\epsilon_1$-profile-based privacy and $M_2(X_2)$ preserves $\epsilon_2$-profile-based privacy, then the combined release $(M_1(X_1), M_2(X_2))$ preserves $\max(\epsilon_1, \epsilon_2)$-profile-based privacy.
However, this framework cannot offer meaningful protections against adversaries that know about correlations in the profile selection process. This matches the failure of differential privacy to handle correlations across individuals. The definition also does not compose if the same observation is reprocessed, as this introduces correlations unaccounted for in the privacy analysis. Although such compositions would be valuable, they are less important when the privatization occurs locally at the time of data collection.
Placing these results in the context of reporting resource usage, we can bound the total privacy loss across multiple releases in two cases. Additive composition applies if a single user emits multiple independent measurements and each measurement is separately privatized. When two users independently release measurements, each has no bearing on the other and parallel composition applies. If correlations exist across measurements (or across the selection of users), no compositional result is provided.
5 Mechanisms
We now provide mechanisms to implement the profilebased privacy definition. Before getting into specifics, let us first consider the kind of utility goals that we can hope to achieve with our privacy definition.
First, because we are only making data from sensitive pairs of profiles indistinguishable, instead of making every individual private value indistinguishable, we expect better utility than local differential privacy – particularly when the sensitive pairs are close. A second important goal is that the output of our mechanism also needs to remain correlated with the input private value; thus, solutions that throw away the input and generate independent data from the profiles are undesirable, as they do not preserve any correlation with the input.
5.1 The One-Bit Setting
We begin with a one-bit setting – where the input to the mechanism is a single private bit – and build up to the more general discrete setting.
The simplest case of the one-bit setting is where we have two profiles $P$ and $Q$ represented by Bernoulli distributions with parameters $p$ and $q$ respectively. Here, our goal is to design a mechanism $M$ that makes a bit $X$ drawn from $P$ or $Q$ indistinguishable; i.e., what we want is a mechanism $M$ such that for any output $y \in \{0, 1\}$,

(3) $e^{-\epsilon} \le \dfrac{\Pr_{X \sim P}[M(X) = y]}{\Pr_{X \sim Q}[M(X) = y]} \le e^{\epsilon}.$
We observe that one easy way to achieve this is to draw a bit from a Bernoulli distribution that is independent of the original bit $X$. However, this is not desirable, as the output bit would lose all correlation with the input, and all information in the bit would be discarded.
We will instead use a mechanism that flips the bit with some probability $\gamma$. Lower values of $\gamma$ improve the correlation between the output and the initial bit. The flip probability $\gamma$ is obtained by solving the following optimization problem:
(4) $\min_{\gamma \in [0, 1/2]} \ \gamma$

subject to $e^{-\epsilon} \le \dfrac{p(1-\gamma) + (1-p)\gamma}{q(1-\gamma) + (1-q)\gamma} \le e^{\epsilon}$ and $e^{-\epsilon} \le \dfrac{(1-p)(1-\gamma) + p\gamma}{(1-q)(1-\gamma) + q\gamma} \le e^{\epsilon}$.
When $p = 1$ and $q = 0$ (or vice versa), this reduces to the standard randomized response mechanism [12]; however, $\gamma$ may be lower if $p$ and $q$ are closer – a situation where our utility is better than local differential privacy's.
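A small Python sketch makes this concrete: below, the optimization (4) is solved by a brute-force sweep over $\gamma$ rather than by a convex solver (an illustrative simplification of ours; the helper names are not from the paper).

```python
import math

def out1(p, g):
    """Pr[output = 1] when a Bernoulli(p) bit is flipped with probability g."""
    return p * (1 - g) + (1 - p) * g

def min_flip_prob(p, q, eps, step=1e-4):
    """Smallest flip probability in [0, 1/2] making Bernoulli(p) and
    Bernoulli(q) eps-indistinguishable, found by a brute-force sweep."""
    for k in range(int(0.5 / step) + 1):
        g = k * step
        a, b = out1(p, g), out1(q, g)
        if min(a, b, 1 - a, 1 - b) > 0 and \
           abs(math.log(a / b)) <= eps and abs(math.log((1 - a) / (1 - b))) <= eps:
            return g
    return 0.5

# Close profiles need no noise at all; even distant ones need less than the
# 1 / (1 + e^eps) flip probability of eps-LDP randomized response.
eps = 0.5
rr = 1 / (1 + math.exp(eps))
assert min_flip_prob(0.45, 0.55, eps) == 0.0
assert min_flip_prob(0.1, 0.9, eps) < rr
```

For the close pair $(0.45, 0.55)$ the constraints already hold at $\gamma = 0$, while the extreme pair $(0.1, 0.9)$ still needs slightly less flipping than randomized response.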
The mechanism described above only addresses two profiles. If we have a cluster of profiles representing a connected component of the profile graph, we can compute the necessary flipping probabilities across all edges in the cluster. To satisfy all the privacy constraints, it suffices to always use the maximum of all of these flipping probabilities when creating a release. This results in a naive method we will call the One Bit Cluster mechanism, which can be shown to preserve profile-based privacy.
Theorem 6.
The One Bit Cluster mechanism achieves $\epsilon$-profile-based privacy.
The One Bit Cluster mechanism has two limitations. The first, of course, is that it applies only to single-bit settings and Bernoulli profiles, instead of general categorical profile distributions. The second limitation is that by treating all pairs of connected profiles similarly, it is often overly conservative, as it always perturbs according to the most challenging edge to privatize. When profiles are distant in the graph from a costly edge, it is generally possible to satisfy the privacy definition with lesser perturbations for these distant profiles.
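The cluster construction can be sketched as follows (sweep-based and illustrative; the function names are ours). Note how the hardest edge dictates the noise level for the entire connected component:

```python
import math

def out1(p, g):
    """Pr[output = 1] when a Bernoulli(p) bit is flipped with probability g."""
    return p * (1 - g) + (1 - p) * g

def edge_gamma(p, q, eps, step=1e-4):
    """Flip probability needed to blur one sensitive pair (sweep-based sketch)."""
    for k in range(int(0.5 / step) + 1):
        g = k * step
        a, b = out1(p, g), out1(q, g)
        if min(a, b, 1 - a, 1 - b) > 0 and \
           abs(math.log(a / b)) <= eps and abs(math.log((1 - a) / (1 - b))) <= eps:
            return g
    return 0.5

def one_bit_cluster_gamma(profiles, edges, eps):
    """One Bit Cluster: a single flip probability for a connected component,
    the maximum of the per-edge requirements."""
    return max(edge_gamma(profiles[i], profiles[j], eps) for i, j in edges)

# The easy (0.5, 0.55) edge alone needs no noise, but the hard (0.1, 0.5)
# edge forces the whole chain to use its larger flip probability.
gamma = one_bit_cluster_gamma([0.1, 0.5, 0.55], [(0, 1), (1, 2)], eps=0.5)
assert edge_gamma(0.5, 0.55, 0.5) == 0.0
assert 0.25 < gamma < 0.26
```

This conservativeness on easy edges is exactly the second limitation discussed above, and motivates the smooth variant next.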
We address the second drawback while remaining in the one-bit setting with the Smooth One Bit mechanism, which uses ideas inspired by the smoothed sensitivity mechanism in differential privacy [10]. However, rather than smoothly calibrating the perturbations across the entire space of data sets, a profile-based privacy mechanism needs only to smoothly calibrate over the specified profile graph. This presents a far more tractable task than smoothly handling all possible data sets in differential privacy.
This involves additional optimization variables, $\gamma_i$, one for each of the profiles in $G$. Thus each profile is permitted its own chance of inverting the released bit. This task remains convex as before, and is still tractably optimized.
(5) $\min_{\gamma_1, \ldots, \gamma_n \in [0, 1/2]} \ \sum_{i} \gamma_i$

subject to, for every edge $(P_i, P_j) \in E$: $e^{-\epsilon} \le \dfrac{p_i(1-\gamma_i) + (1-p_i)\gamma_i}{p_j(1-\gamma_j) + (1-p_j)\gamma_j} \le e^{\epsilon}$ and $e^{-\epsilon} \le \dfrac{(1-p_i)(1-\gamma_i) + p_i\gamma_i}{(1-p_j)(1-\gamma_j) + p_j\gamma_j} \le e^{\epsilon}$.
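A brute-force sketch of this idea follows; we use a coarse grid instead of a convex solver, and minimizing the total flip probability is just one possible objective choice (an assumption of ours, not necessarily the paper's objective):

```python
import math
from itertools import product

def out1(p, g):
    """Pr[output = 1] when a Bernoulli(p) bit is flipped with probability g."""
    return p * (1 - g) + (1 - p) * g

def feasible(profiles, gammas, edges, eps):
    """Check the edge constraints when each profile uses its own flip chance."""
    for i, j in edges:
        a, b = out1(profiles[i], gammas[i]), out1(profiles[j], gammas[j])
        if min(a, b, 1 - a, 1 - b) <= 0:
            return False
        if abs(math.log(a / b)) > eps or abs(math.log((1 - a) / (1 - b))) > eps:
            return False
    return True

def smooth_one_bit(profiles, edges, eps, step=0.01):
    """Grid-search sketch of the Smooth One Bit idea: one flip chance per
    profile, minimizing the total noise (one possible objective choice)."""
    grid = [k * step for k in range(int(0.5 / step) + 1)]
    best = None
    for gammas in product(grid, repeat=len(profiles)):
        if feasible(profiles, gammas, edges, eps) and \
           (best is None or sum(gammas) < sum(best)):
            best = gammas
    return best

# On the chain 0.1 - 0.5 - 0.55, only the profile on the hard edge pays:
g = smooth_one_bit([0.1, 0.5, 0.55], [(0, 1), (1, 2)], eps=0.5)
assert g is not None and g[1] == 0.0 and g[2] == 0.0
```

The profiles attached only to the easy edge escape with no noise at all, which is precisely the improvement over the One Bit Cluster mechanism on the same chain.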
Theorem 7.
The Smooth One Bit mechanism achieves $\epsilon$-profile-based privacy.
5.2 The Categorical Setting
We now show how to generalize this model into the categorical setting. This involves additional constraints, as well as a (possibly) domain specific objective that maximizes some measure of fidelity between the input and the output.
Specifically, suppose we have $n$ categorical profiles, each with $k$ categories; we introduce $n k^2$ variables to optimize, with each profile receiving its own transition matrix. To keep track of these variables, we introduce the following notation:

$\mathcal{P} = \{P_1, \ldots, P_n\}$: a set of $n$ categorical profiles in $k$ dimensions. $P_i(1), \ldots, P_i(k)$ denote the components of $P_i$.

$\{T_1, \ldots, T_n\}$: a set of $k$-by-$k$ transition matrices, where $T_i$ represents the mechanism's release probabilities for profile $P_i$. $T_i[x, y]$ represents the $(x, y)$-th element of the matrix $T_i$: the probability of releasing $y$ upon observing $x$.

In an abuse of notation, $P_i T_i \le e^{\epsilon} P_j T_j$ is a constraint that applies elementwise to all components of the resulting vectors on each side (treating each $P_i$ as a row vector).
With this notation, we can express our optimization task:
(6) $\min_{T_1, \ldots, T_n} \ \max_{i} \max_{x \ne y} T_i[x, y]$

subject to, for every edge $(P_i, P_j) \in E$: $e^{-\epsilon} P_j T_j \le P_i T_i \le e^{\epsilon} P_j T_j$, and for every $i$: $T_i[x, y] \ge 0$ and $\sum_{y} T_i[x, y] = 1$.
To address the tractability of the optimization, we note that each of the privacy constraints is a linear constraint over our optimization variables. We further know the feasible solution set is non-empty, as trivial non-informative mechanisms achieve privacy. All that is left is to choose a suitable objective function to make this a readily solved convex problem.
To settle onto an objective will require some domain-specific knowledge of the tradeoffs between choosing which profiles and which categories to report more faithfully. Our general choice is a maximum across the off-diagonal elements, which attempts to uniformly minimize the probability of any data corruptions. This can be further refined in the presence of a prior distribution over profiles, to give more importance to the profiles more likely to be used.
Once the optimization is solved, the Smooth Categorical Mechanism merely applies the appropriate transition probabilities to the observed input.
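The constraint structure can be checked directly. The sketch below (helper names ours) evaluates the elementwise constraint $e^{-\epsilon} P_j T_j \le P_i T_i \le e^{\epsilon} P_j T_j$ on the three 4-category profiles used in the evaluation of Section 6:

```python
import math

def output_dist(profile, T):
    """Pr[release = y | profile]: mix the rows of transition matrix T by the
    profile's category probabilities."""
    k = len(T[0])
    return [sum(px * T[x][y] for x, px in enumerate(profile)) for y in range(k)]

def categorical_privacy_ok(profiles, Ts, edges, eps, tol=1e-9):
    """Check the edge-wise output-distribution constraints of Definition 2."""
    for i, j in edges:
        di, dj = output_dist(profiles[i], Ts[i]), output_dist(profiles[j], Ts[j])
        if any(abs(math.log(a / b)) > eps + tol for a, b in zip(di, dj)):
            return False
    return True

profiles = [[0.2, 0.3, 0.4, 0.1], [0.3, 0.3, 0.3, 0.1], [0.4, 0.4, 0.1, 0.1]]
edges = [(0, 1), (1, 2)]

# The non-informative mechanism (all rows uniform) is always feasible...
uniform = [[0.25] * 4 for _ in range(4)]
assert categorical_privacy_ok(profiles, [uniform] * 3, edges, eps=0.5)

# ...while releasing the raw category (identity matrices) violates privacy here.
identity = [[1.0 if x == y else 0.0 for y in range(4)] for x in range(4)]
assert not categorical_privacy_ok(profiles, [identity] * 3, edges, eps=0.5)
```

The optimization (6) searches between these two extremes for matrices that are as close to the identity as the constraints allow.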
Theorem 8.
The Smooth Categorical mechanism achieves $\epsilon$-profile-based privacy.
5.3 Utility Theorems
The following theorems present utility bounds which illustrate potential improvements upon local differential privacy; a more detailed numerical simulation is presented in Section 6.
Theorem 9.
If $M$ is a mechanism that preserves $\epsilon$-local differential privacy, then for any graph $G$ of sensitive profiles, $M$ also preserves $\epsilon$-profile-based differential privacy.
An immediate result of Theorem 9 is that, in general and for any measure of utility on mechanisms, the profile-based differential privacy framework never requires a worse privacy-utility tradeoff than a local differential privacy approach. However, in specific cases, stronger results can be shown.
Theorem 10.
Suppose we are in the single-bit setting with two Bernoulli profiles $P$ and $Q$ with parameters $p$ and $q$ respectively. If $p \ge q$, then the solution to (4) satisfies $\gamma \le \frac{1}{e^{\epsilon} + 1}$.
Observe that to attain $\epsilon$-local differential privacy by a similar bit-flipping mechanism, we need a flipping probability of $\frac{1}{e^{\epsilon} + 1}$, while the profile-based solution is never larger, and is strictly smaller except in the degenerate case $p = 1, q = 0$. Thus, profile-based privacy does improve over local differential privacy in this simple case. The proof of Theorem 10 follows from observing that this value of $\gamma$ satisfies all constraints in the optimization problem (4).
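This comparison can also be verified numerically. The sweep below (an illustrative check of ours, not the paper's proof) confirms that the optimal flip probability never exceeds the randomized response rate $\frac{1}{e^{\epsilon}+1}$ across a grid of non-degenerate Bernoulli pairs:

```python
import math

def out1(p, g):
    """Pr[output = 1] when a Bernoulli(p) bit is flipped with probability g."""
    return p * (1 - g) + (1 - p) * g

def min_flip_prob(p, q, eps, step=1e-3):
    """Smallest feasible flip probability for one Bernoulli pair (sweep)."""
    for k in range(int(0.5 / step) + 1):
        g = k * step
        a, b = out1(p, g), out1(q, g)
        if min(a, b, 1 - a, 1 - b) > 0 and \
           abs(math.log(a / b)) <= eps and abs(math.log((1 - a) / (1 - b))) <= eps:
            return g
    return 0.5

# Sweep a grid of non-degenerate Bernoulli pairs: the profile-based flip
# probability never exceeds the randomized-response rate 1 / (1 + e^eps).
eps = 0.5
rr = 1 / (1 + math.exp(eps))
worst = max(min_flip_prob(i / 20, j / 20, eps)
            for i in range(1, 20) for j in range(1, 20))
assert worst <= rr + 1e-3
```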
6 Evaluation
We next evaluate our privacy mechanisms and compare them against each other and the corresponding local differential privacy alternatives. In order to understand the privacyutility tradeoff unconfounded by model specification issues, we consider synthetic data in this paper.
6.1 Experimental Setup
We look at three experimental settings – Bernoulli-Couplet, Bernoulli-Chain, and Categorical-Chain.
Settings.
In Bernoulli-Couplet, the profile graph consists of two nodes connected by a single edge. Each profile $P_i$ is a Bernoulli distribution with a parameter $p_i$.
In Bernoulli-Chain, the profile graph consists of a chain of nodes, where successive nodes in the chain are connected by an edge. Each profile is still a Bernoulli distribution with parameter $p_i$. We consider two experiments in this category – Bernoulli-Chain-6, where there are six profiles corresponding to six values of $p$ spaced uniformly across an interval, and Bernoulli-Chain-21, where there are 21 profiles corresponding to values of $p$ spaced uniformly on the same interval.
Finally, in Categorical-Chain, the profile graph comprises three nodes connected into a chain. Each profile, however, corresponds to a 4-dimensional categorical distribution, given in the table below, instead of a Bernoulli distribution.
Category:  1    2    3    4
$P_1$:     0.2  0.3  0.4  0.1
$P_2$:     0.3  0.3  0.3  0.1
$P_3$:     0.4  0.4  0.1  0.1
Baselines.
For Bernoulli-Couplet and Bernoulli-Chain, we use Warner's randomized response mechanism [12] as a local differentially private baseline. For Categorical-Chain, the corresponding baseline is the $k$-ary version of randomized response.
For Bernoulli-Couplet, we use our Smooth One Bit mechanism to evaluate our framework. For Categorical-Chain, we use the Smooth Categorical mechanism.
6.2 Results
Figure 2(a) plots the flipping probability $\gamma$ for Bernoulli-Couplet as a function of the difference between the profile parameters $p_1$ and $p_2$. We find that, as expected, as the difference between the profile parameters grows, so does the flipping probability and hence the noise added. However, in all cases, this probability stays below the corresponding value for local differential privacy – the horizontal black line – thus showing that profile-based privacy is an improvement.
Figure 3 plots the probability that the output is $1$ as a function of the privacy parameter $\epsilon$ for each profile in Bernoulli-Chain-6 and Bernoulli-Chain-21. We find that, as expected, for low $\epsilon$ the output probabilities are close together for both the local differential privacy baseline and our method, whereas for higher $\epsilon$ they are spread out more evenly (which indicates higher correlation with the input and higher utility). Additionally, we find that our Smooth One Bit mechanism performs better than the baseline in both cases.
Figures 2(b) and 2(c) plot the utility across different outputs in the Categorical-Chain setting. We illustrate its behavior through a small setting with 3 profiles, each with 4 categories. We can no longer plot the entirety of these profiles, so at each privacy level we measure the maximum absolute error for each output. Thus, in this setting, each privacy level is associated with 4 costs of the form given in (7). This permits the orthogonal information property to be seen.

(7) $\mathrm{cost}(y) = \max_{i} \left| \Pr[M(X) = y \mid P_i] - \Pr[X = y \mid P_i] \right|$
Our experiments show that the categories less associated with the profile selection have lower associated costs than the more informative categories. However, the local differential privacy baseline fails to exploit any of this structure and performs worse.
7 Conclusion
In conclusion, we provide a novel definition of local privacy – profile-based privacy – that can achieve better utility than local differential privacy. We prove properties of this privacy definition, and provide mechanisms for two discrete settings. Simulations show that our mechanisms offer better privacy-utility tradeoffs than standard local differential privacy.
Acknowledgements.
We thank ONR under N00014161261, UC Lab Fees under LFR 18548554 and NSF under 1804829 for research support.
References
 [1] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. J. Mach. Learn. Res., 12:1069–1109, July 2011.
 [2] Kamalika Chaudhuri, Anand Sarwate, and Kaushik Sinha. Near-optimal differentially private principal components. In Advances in Neural Information Processing Systems, pages 989–997, 2012.
 [3] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Local privacy and statistical minimax rates. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 429–438. IEEE, 2013.
 [4] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Local privacy and statistical minimax rates. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 429–438. IEEE, 2013.
 [5] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265–284. Springer, 2006.
 [6] James Foulds, Joseph Geumlek, Max Welling, and Kamalika Chaudhuri. On the theory and practice of privacy-preserving Bayesian data analysis. In UAI, 2016.
 [7] Daniel Kifer and Ashwin Machanavajjhala. A rigorous and customizable framework for privacy. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 77–88. ACM, 2012.
 [8] Daniel Kifer, Adam Smith, Abhradeep Thakurta, Shie Mannor, Nathan Srebro, and Robert C. Williamson. Private convex empirical risk minimization and high-dimensional regression. In COLT, pages 94–103, 2012.
 [9] Jiachun Liao, Lalitha Sankar, Flavio P. Calmon, and Vincent Y. F. Tan. Hypothesis testing under maximal leakage privacy constraints. In Information Theory (ISIT), 2017 IEEE International Symposium on, pages 779–783. IEEE, 2017.
 [10] Kobbi Nissim, Sofya Raskhodnikova, and Adam D. Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11–13, 2007, pages 75–84, 2007.
 [11] Yu-Xiang Wang, Stephen Fienberg, and Alex Smola. Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In International Conference on Machine Learning, pages 2493–2502, 2015.
 [12] Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
Appendix A Proofs of Theorems
Theorem 11.
If a data sample $X$ is drawn from profile $P_i$, and $M(X)$ preserves $\epsilon$-profile-based privacy, then for any (potentially randomized) function $F$, the release $F(M(X))$ preserves $\epsilon$-profile-based privacy.
Proof:
Let $(P_i, P_j)$ be any edge in $E$, and let $z$ be any output of $F$. This is a result following from a standard data processing inequality.

(8) $\Pr[F(M(X)) = z \mid P_i] = \sum_{y} \Pr[F(y) = z] \Pr[M(X) = y \mid P_i]$

(9) $\le \sum_{y} \Pr[F(y) = z] \cdot e^{\epsilon} \Pr[M(X) = y \mid P_j]$

(10) $= e^{\epsilon} \sum_{y} \Pr[F(y) = z] \Pr[M(X) = y \mid P_j]$

(11) $= e^{\epsilon} \Pr[F(M(X)) = z \mid P_j].$
Theorem 12.
If two independent samples $X_1$ and $X_2$ are drawn from profile $P_i$, and $M_1(X_1)$ preserves $\epsilon_1$-profile-based privacy and $M_2(X_2)$ preserves $\epsilon_2$-profile-based privacy, then the combined release $(M_1(X_1), M_2(X_2))$ preserves $(\epsilon_1 + \epsilon_2)$-profile-based privacy.
Proof: The proof of this statement relies on the conditional independence of the two releases $M_1(X_1)$ and $M_2(X_2)$, conditioned on being from the same profile $P_i$. Let $P_j$ be another profile such that there is an edge $(P_i, P_j)$ in $E$. By marginalizing over the two independent variables $X_1, X_2$, we may bound the combined privacy loss. For any pair of outputs $(y_1, y_2)$:

(12) $\Pr[(M_1(X_1), M_2(X_2)) = (y_1, y_2) \mid P_i]$

(13) $= \Pr[M_1(X_1) = y_1 \mid P_i] \cdot \Pr[M_2(X_2) = y_2 \mid P_i]$

(14) $\le e^{\epsilon_1} \Pr[M_1(X_1) = y_1 \mid P_j] \cdot e^{\epsilon_2} \Pr[M_2(X_2) = y_2 \mid P_j]$

(15) $= e^{\epsilon_1 + \epsilon_2} \Pr[M_1(X_1) = y_1 \mid P_j] \cdot \Pr[M_2(X_2) = y_2 \mid P_j]$

(16) $= e^{\epsilon_1 + \epsilon_2} \Pr[(M_1(X_1), M_2(X_2)) = (y_1, y_2) \mid P_j].$
Observation 1.
This proof does not hold if and are not independent given . This may occur if the same observational data is privatized twice. We do not provide a composition result for this case.
Theorem 13.
If two independent profiles $P_a$ and $P_b$ are selected, and two observations $X_1 \sim P_a$ and $X_2 \sim P_b$ are drawn, and $M_1(X_1)$ preserves $\epsilon_1$-profile-based privacy and $M_2(X_2)$ preserves $\epsilon_2$-profile-based privacy, then the combined release $(M_1(X_1), M_2(X_2))$ preserves $\max(\epsilon_1, \epsilon_2)$-profile-based privacy.
Proof: For the purposes of this setting, let $A$ and $B$ be two random variables representing the choice of profile in the first and second selections.

Since the two profiles and their observations are independent, the two releases $M_1(X_1)$ and $M_2(X_2)$ contain no information about each other. That is, $\Pr[M_2(X_2) = y_2 \mid A, B] = \Pr[M_2(X_2) = y_2 \mid B]$. Similarly we have $\Pr[M_1(X_1) = y_1 \mid A, B] = \Pr[M_1(X_1) = y_1 \mid A]$.

Let $P_{a'}$ be a profile such that the edge $(P_a, P_{a'})$ is in $E$. Conditioning on the first selection:

(17) $\dfrac{\Pr[(M_1(X_1), M_2(X_2)) = (y_1, y_2) \mid A = P_a, B = P_b]}{\Pr[(M_1(X_1), M_2(X_2)) = (y_1, y_2) \mid A = P_{a'}, B = P_b]}$

(18) $= \dfrac{\Pr[M_1(X_1) = y_1 \mid A = P_a] \cdot \Pr[M_2(X_2) = y_2 \mid B = P_b]}{\Pr[M_1(X_1) = y_1 \mid A = P_{a'}] \cdot \Pr[M_2(X_2) = y_2 \mid B = P_b]}$

(19) $= \dfrac{\Pr[M_1(X_1) = y_1 \mid A = P_a]}{\Pr[M_1(X_1) = y_1 \mid A = P_{a'}]}$

(20) $\le e^{\epsilon_1}.$

A similar derivation conditioning on a change in $B$ across an edge $(P_b, P_{b'})$ results in a ratio bounded by $e^{\epsilon_2}$. Thus to get a single bound for the combined release $(M_1(X_1), M_2(X_2))$, we take the maximum $\max(\epsilon_1, \epsilon_2)$.
Observation 2.
This proof does not hold if the profile selection process is not independent. We do not provide a composition result for this case.
Theorem 1.
The One Bit Cluster mechanism achieves $\epsilon$-profile-based privacy.
Proof: By direct construction, it is known that the flipping probabilities generated for single edges will satisfy the privacy constraints. What remains to be shown for the privacy analysis is that taking the maximum of these flipping probabilities will satisfy the privacy constraints for all of the edges simultaneously.

To show this, we will demonstrate a monotonicity property: if a flipping probability $\gamma_0$ guarantees a certain privacy level, then so too do all the probabilities in the interval $[\gamma_0, 1/2]$. By taking the maximum across all edges, this algorithm exploits the monotonicity to ensure all the constraints are met simultaneously.
When computing the privacy level, we have two output values and thus two output ratios to consider. Writing $f(\gamma) = p(1-\gamma) + (1-p)\gamma$ and $g(\gamma) = q(1-\gamma) + (1-q)\gamma$ for the output-1 probabilities under the two profiles of an edge:

(23) $R_1(\gamma) = \dfrac{f(\gamma)}{g(\gamma)}$

(24) $R_0(\gamma) = \dfrac{1 - f(\gamma)}{1 - g(\gamma)}$

Without loss of generality, assume $p > q$. (If they are equal, then all possible privacy levels are achieved trivially.) This means the following two quantities are positive and equal to the absolute values of the log-ratios when $\gamma < 1/2$:

(25) $\log R_1(\gamma)$

(26) $-\log R_0(\gamma)$

Our next task is to show that these quantities are monotonically decreasing as the flipping chance $\gamma$ is varied. Taking just the $\log R_1$ term for now, we compute the derivative, using $f(\gamma) = p + \gamma(1 - 2p)$ and $g(\gamma) = q + \gamma(1 - 2q)$:

(27) $\dfrac{d}{d\gamma} \log R_1(\gamma) = \dfrac{1 - 2p}{f(\gamma)} - \dfrac{1 - 2q}{g(\gamma)}$

(28) $= \dfrac{(1 - 2p)\,g(\gamma) - (1 - 2q)\,f(\gamma)}{f(\gamma)\,g(\gamma)}$

(29) $= \dfrac{q - p}{f(\gamma)\,g(\gamma)}$

(30) $< 0.$

The final inequality arises from our assumption that $p > q$. A similar computation on the $-\log R_0$ term also finds that its derivative is always negative.

This monotonicity implies that increasing $\gamma$ (up to $1/2$ at most) only decreases the probability ratios. In the limit when $\gamma = 1/2$, the ratios are precisely 1 (and the logarithm is 0). Thus if $\gamma_0$ achieves a certain privacy level, all $\gamma$ satisfying $\gamma_0 \le \gamma \le 1/2$ achieve a privacy level at least as strong.

The One Bit Cluster mechanism takes the maximum across all edges, ensuring the final flipping probability is no less than the value needed by each edge to achieve probability ratios within $[e^{-\epsilon}, e^{\epsilon}]$. Therefore each edge constraint is satisfied by the final choice of flipping probability, and the mechanism satisfies the privacy requirements.
Theorem 14.
The Smooth One Bit mechanism achieves $\epsilon$-profile-based privacy.
Theorem 15.
The Smooth Categorical mechanism achieves $\epsilon$-profile-based privacy.
The Smooth One Bit and Smooth Categorical mechanisms satisfy the privacy analysis directly from the constraints of their optimization problems. These optimizations are performed without any access to a sensitive observation, and as such pose no privacy risk. Implicitly, the solution to the optimization problem is verified to satisfy the constraints before being used.
Theorem 16.
If $M$ is a mechanism that preserves $\epsilon$-local differential privacy, then for any graph $G$ of sensitive profiles, $M$ also preserves $\epsilon$-profile-based differential privacy.
Proof: The proof of this theorem lies in the fact that the strong protections given by local differential privacy to the observed data also extend to protecting the profile identities. Let $Y = M(X)$, the output of a locally differentially private algorithm that protects any two distinct data observations $x$ and $x'$. As local differential privacy mechanisms do not use profile information, the distribution of $Y$ depends only on $X$ and ignores the profile. To prove the generality of this analysis over any graph $G$, we will show the privacy constraint is satisfied for any possible edge of two arbitrary profiles $P_i$ and $P_j$. For any output $y$:

(31) $\Pr[M(X) = y \mid P_i] = \sum_{x} P_i(x) \Pr[M(x) = y]$

(32) $\le \sum_{x} P_i(x) \cdot e^{\epsilon} \min_{x'} \Pr[M(x') = y]$

(33) $\le e^{\epsilon} \sum_{x'} P_j(x') \Pr[M(x') = y]$

(34) $= e^{\epsilon} \Pr[M(X) = y \mid P_j].$

If the inequality in (32) did not hold, one would be able to find two values $x$ and $x'$ such that the output $y$ violates the local differential privacy constraint, which contradicts our assumption on $M$. The inequality in (33) holds because a minimum is no more than any weighted average.
Theorem 17.
Suppose we are in the single-bit setting with two Bernoulli profiles $P$ and $Q$ with parameters $p$ and $q$ respectively. If $p \ge q$, then the solution to (35) satisfies $\gamma \le \frac{1}{e^{\epsilon} + 1}$.
Proof:
Direct computation shows the desired constraints are met with the value $\gamma = \frac{1}{e^{\epsilon} + 1}$, so the optimal solution can be no larger. Recall the optimization problem:

(35) $\min_{\gamma \in [0, 1/2]} \ \gamma$

subject to $e^{-\epsilon} \le R_1(\gamma) \le e^{\epsilon}$ and $e^{-\epsilon} \le R_0(\gamma) \le e^{\epsilon}$, where $R_1(\gamma) = \dfrac{p(1-\gamma) + (1-p)\gamma}{q(1-\gamma) + (1-q)\gamma}$ and $R_0(\gamma) = \dfrac{(1-p)(1-\gamma) + p\gamma}{(1-q)(1-\gamma) + q\gamma}$.

First, we note that by our assumption $p \ge q$, we immediately have two of our constraints trivially satisfied for any $\gamma \le 1/2$, since $R_1(\gamma) \ge 1 \ge e^{-\epsilon}$ and $R_0(\gamma) \le 1 \le e^{\epsilon}$.

Two constraints of interest remain:

(36) $R_1(\gamma) \le e^{\epsilon}$

(37) $R_0(\gamma) \ge e^{-\epsilon}$

We know that these ratios are monotonic in $\gamma$, so to solve these inequalities, it suffices to find the values of $\gamma$ where we have equality on these two constraints. Any value of $\gamma$ larger than these (and less than $1/2$) will therefore satisfy the inequalities. Substituting $\gamma = \frac{1}{e^{\epsilon} + 1}$ into (36) reduces it to $(e^{\epsilon} - 1)(1 - p + e^{\epsilon} q) \ge 0$, and substituting into (37) reduces it to $(e^{\epsilon} - 1)(e^{\epsilon}(1 - p) + q) \ge 0$; both hold since $0 \le p, q \le 1$.

Since both constraints must be satisfied simultaneously, we can complete our statement by taking the maximum of the two equality points, which is therefore at most $\frac{1}{e^{\epsilon} + 1}$, along with knowing $\gamma \ge 0$.