On the Relationship Between Inference and Data Privacy in Decentralized IoT Networks


Meng Sun and Wee Peng Tay. This research is supported by the Singapore Ministry of Education Academic Research Fund Tier 1 grant RG20/17 (S). The authors are with the Department of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. E-mails: MSUN002@e.ntu.edu.sg, wptay@ntu.edu.sg
Abstract

In a decentralized Internet of Things (IoT) network, a fusion center receives information from multiple sensors to infer a public hypothesis of interest. To prevent the fusion center from abusing the sensor information, each sensor sanitizes its local observation using a local privacy mapping, which is designed to achieve both inference privacy of a private hypothesis and data privacy of the sensor raw observations. Various inference and data privacy metrics have been proposed in the literature. We introduce the concepts of privacy implication and non-guarantee to study the relationships between these privacy metrics. We propose an optimization framework in which both local differential privacy (data privacy) and information privacy (inference privacy) metrics are incorporated. In the parametric case where sensor observations’ distributions are known a priori, we propose a two-stage local privacy mapping at each sensor, and show that such an architecture is able to achieve information privacy and local differential privacy to within the predefined budgets. For the nonparametric case where sensor distributions are unknown, we adopt an empirical optimization approach. Simulation and experiment results demonstrate that our proposed approaches allow the fusion center to accurately infer the public hypothesis while protecting both inference and data privacy.

Inference privacy, data privacy, information privacy, local differential privacy, decentralized detection, Internet of Things

I Introduction

With the proliferation of Internet of Things (IoT) devices like smartphones and home voice recognition assistants, protecting the privacy of users has attracted considerable attention in recent years [1, 2, 3, 4, 5]. Data collected by IoT devices to provide services that lead to better healthcare, more efficient air conditioning, and safer cities [6, 7] may also be used for more nefarious purposes, such as tracking an individual without her explicit consent. An individual’s privacy has been enshrined as a fundamental right in the laws of many countries [8, 9], and privacy protection mechanisms are increasingly being adopted by IoT product makers. For example, Apple Inc. has recently started to implement local differential privacy mechanisms in its iCloud product [10].

We consider an IoT network consisting of multiple sensors, each making a private observation, which is first distorted through a privacy mapping before being sent to a fusion center. The information received from all the sensors is used by the fusion center to perform inference on a public hypothesis of interest. Privacy for this IoT network can be categorized into two classes: data privacy and inference privacy. Data privacy refers to the protection of each sensor’s raw private observation from the fusion center, i.e., upon receiving information from all the sensors, it is difficult for the fusion center to infer the original sensor observations. Protecting data privacy alone is not sufficient to prevent privacy leakage. A data privacy mechanism obfuscates the raw data while still allowing statistical information to be extracted from the data. Given multiple information sources, each with its local data privacy mechanism, it is possible to perform a correlation attack [11] leading to de-anonymization and other types of privacy leakage as shown in the examples in [12].

Inference privacy refers to preventing the fusion center from making certain statistical inferences it has not been authorized to perform. For example, when on-body wearables are used for fall detection, the fusion center is authorized to perform fall detection, but not to detect if a person is exercising or performing another activity. Preventing statistical inference of the latter activities is inference privacy, while preventing the fusion center from reconstructing the raw sensor data up to a certain fidelity is data privacy. It can be seen from this example that distorting the raw sensor data to achieve data privacy does not necessarily remove all statistical information required to infer whether the person is performing a private activity, unless the sensor data is so heavily distorted that even fall detection becomes difficult. On the other hand, inference privacy also does not guarantee data privacy, as inference privacy mechanisms aim to protect only specific statistical inferences. For example, blurring certain parts of an image may prevent inference of certain objects in the image, but does not necessarily distort the whole image significantly.

The main focus of this paper is to derive insights into the relationships between various data and inference privacy metrics, and to design a privacy-preserving decentralized detection architecture for IoT networks where the level of data and inference privacy can be chosen. We aim to achieve a good tradeoff between data privacy, inference privacy and the detection accuracy of the public hypothesis at the fusion center.

I-A Related Work

Data privacy in cloud services and applications can be achieved using homomorphic encryption [13, 14, 15, 16], which allows a cloud server to compute on encrypted data without decryption. The encrypted result is then made available to the requester, who is able to decrypt it. By comparison, in decentralized detection, the fusion center needs to play the roles of both the cloud server and the requester, making it impossible to apply homomorphic encryption techniques here. Other data privacy works consider local differential privacy [17, 18, 19, 20], which corrupts each sensor’s local observation so that the fusion center cannot infer it. In [21], the authors analyzed the tradeoff between the local differential privacy budget and the utility of statistical estimators used at the fusion center. The paper [22] analyzed the tradeoff between utility and data privacy, and compared the performance of different data privacy metrics, including local differential privacy, identifiability, and mutual information.

We call a hypothesis a public hypothesis if its inference or detection is to be achieved by the fusion center. We call a hypothesis a private hypothesis, if its true state is not authorized to be inferred by the fusion center. The authors of [23] proposed three inference privacy metrics to measure the exposure of the private hypothesis: information privacy, differential privacy (as applied to the private hypothesis instead of the sensor data), and average information leakage. They showed that information privacy is the strongest among the three, while differential privacy does not guarantee information privacy. Methods using the information privacy metric, both nonparametric [24, 5, 25, 4, 26, 27] and parametric [23], have been proposed in the literature. Average information leakage is used by [28] and [29] to restrict the leakage of sensitive information. The references [30, 31] consider the tradeoff between prediction accuracy of sensitive information or parameters and data utility.

Inference privacy and data privacy generally do not imply each other. In [20], maximum leakage is used as the privacy metric to limit inference privacy leakage. With the Type II error exponent as the utility metric, the authors conclude that it leads to data privacy leakage. On the other hand, data privacy constraints do not prevent the fusion center from making statistical inferences. This is because data privacy metrics do not distinguish between the public and private hypotheses. If the data privacy budget is chosen in such a way that the private hypothesis is difficult to infer, the utility of inferring the public hypothesis will also be severely impacted. A more technical discussion of the relationship between inference and data privacy metrics is provided in Section III. The reference [22] studied the relationship between various data privacy metrics under a distortion utility but did not consider any inference privacy metrics, whereas [23] compared only inference privacy metrics. In this paper, we attempt to bridge this gap by studying the relationship between different data and inference privacy metrics.

Several works have considered both inference and data privacy constraints. The paper [32] proposed an iterative optimization method to protect against average information leakage (inference privacy) and mutual information privacy (data privacy). However, it is unclear if these are the best inference and data privacy metrics for a decentralized IoT network. For a decentralized sensor network, [33] proposed the use of local differential privacy to achieve both data and inference privacy (which they call inherent and latent privacy, respectively). However, the proposed approach is computationally expensive as it involves a brute-force search. In [27], the author proposed a two-stage approach, with one stage implementing an inference privacy mechanism, and the other stage a local differential privacy mechanism. In this paper, we adopt a similar two-stage approach. In addition, we study the relationship between possible data and inference privacy metrics, which was not done in [27].

I-B Our Contributions

In this paper, we develop a joint inference and data privacy-preserving framework for a decentralized IoT network [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45]. Our main contributions are as follows.

  1. Despite the many privacy metrics proposed in the literature, to the best of our knowledge, the interplay between inference privacy and data privacy and the relationship between different privacy metrics have not been adequately investigated. In this paper, we introduce the concept of privacy implication and non-guarantee, and show how one privacy metric is related to another in this framework. We argue that in a practical IoT network, both information privacy and local differential privacy metrics should be incorporated in each sensor’s privacy mapping to provide suitable inference and data privacy guarantees, respectively.

  2. In the parametric case where the sensor observations’ distributions are known a priori, we propose a local privacy mapping for each sensor that consists of two local privacy mappings concatenated together. One local privacy mapping implements an information privacy mechanism while the other implements a local differential privacy mechanism. We show that both information privacy and local differential privacy are preserved in post-processing, and local differential privacy is immune to pre-processing. Simulations demonstrate that the proposed algorithms can protect both information privacy and local differential privacy, while maximizing the detection accuracy of the public hypothesis.

  3. In the nonparametric case where the sensor observations’ distributions are unknown a priori, we adopt an empirical risk optimization framework modified from [24] to now include both information privacy and local differential privacy constraints. Simulation results demonstrate that our proposed approach can achieve a good utility-privacy tradeoff.

This paper is an extension of our conference paper [26], which utilized a nonparametric approach to learn sensor decision rules with both local differential privacy and information privacy constraints. In this paper, we consider both the parametric and nonparametric cases, and provide further theoretical insights into the relationships between different privacy metrics.

The rest of this paper is organized as follows. In Section II, we present our system model. In Section III, we introduce the concept of privacy implication and non-guarantee, review the definition of various privacy metrics, and show the relationships between them. We propose a parametric approach with local differential privacy and information privacy constraints in Section IV, while a non-parametric approach is discussed in Section V. Simulation results are shown in Section VI, and we conclude in Section VII.

Notations: We use capital letters like $X$ to denote random variables or vectors, lowercase letters like $x$ for deterministic scalars, and boldface lowercase letters like $\mathbf{x}$ for deterministic vectors. The vector $\mathbf{0}$ has all zero entries, and $\mathbf{1}$ has all ones. We use $A^c$ to denote the complement of the set $A$. We assume that all random variables are defined on the same underlying probability measure space with probability measure $\mathbb{P}$. We use $p_X$ to denote the probability mass function of $X$, and $p_{X|Y}$ to denote the conditional probability mass function of $X$ given $Y$. We use $I(\cdot\,;\cdot)$ to denote mutual information. We use $\log$ to denote the natural logarithm. We say that two vectors $x$ and $x'$ are neighbors if they differ in only one of their vector components [17, 18, 19, 20], and we denote this by $x \sim x'$.

II System Model

Fig. 1: An IoT network with public hypothesis $H$ and private hypothesis $G$.

We consider $s$ sensors making observations generated by a public hypothesis $H$ and a private hypothesis $G$, as shown in Fig. 1. Each sensor $k$, $1 \leq k \leq s$, makes a noisy observation $X_k \in \mathcal{X}$. Each sensor then summarizes its observation using a local decision rule or privacy mapping and transmits $Z_k \in \mathcal{Z}$ to a fusion center, where $Z_k = z$ with probability $p_{Z_k|X_k}(z \mid X_k)$. Both $\mathcal{X}$ and $\mathcal{Z}$ are assumed to be discrete alphabets. Let $X = (X_1, \ldots, X_s)$ denote the observations of all sensors, and $Z = (Z_1, \ldots, Z_s)$ denote the transmitted information from all sensors.

The fusion center infers the public hypothesis $H$ from $Z$. However, it can also use $Z$ to infer $G$, even though it has not been authorized to do so. At the same time, it may also try to recover $X$ from $Z$. In this paper, for simplicity, we consider the case where $H$ is a binary hypothesis (our work is easily extended to the multiple hypothesis case), and $G = (G_1, \ldots, G_m)$ is a random vector where each component is binary, i.e., $G$ is a $2^m$-ary hypothesis. Our goal is to design privacy mappings at the sensors in order to make it difficult for the fusion center to both infer $G$ (inference privacy) and recover $X$ (data privacy), while allowing it to infer $H$ with reasonable accuracy. In this paper, we do not make any assumptions regarding the conditional independence of sensor observations, which is common in many of the works in decentralized detection [39, 40, 41, 42, 43, 44, 45].

We consider $H$ to be the public hypothesis and $G$ to be the private hypothesis. In the example of fall detection, whether a fall happens is the public hypothesis $H$. Each binary component $G_i$, $1 \leq i \leq m$, of the private hypothesis can correspond to detecting if the person is performing a different activity like running, climbing stairs, squatting, and so on.

The utility of the network is the probability of the fusion center inferring $H$ correctly. Inference privacy is measured by the “difficulty” of inferring $G$. One of our objectives is to determine which inference privacy metric is most suitable for the IoT network in Fig. 1. Furthermore, since some sensors’ observations may be uncorrelated with $G$, the raw observations from these sensors would be transmitted to the fusion center to maximize the utility, leading to data privacy leakage for these sensors. Therefore, we also require that the local privacy mapping at each sensor incorporate a data privacy mechanism.

III Relationships Between Privacy Metrics

In this section, we consider different privacy metrics proposed in the literature and study their relationships to provide insights into the best inference and data privacy metrics for a decentralized IoT network. A privacy budget $\epsilon \geq 0$ is associated with each type of privacy metric, with a smaller $\epsilon$ corresponding to a more stringent privacy guarantee. We consider the following inference and data privacy metrics. Note that we use the joint distribution $p_{G,H,X,Z}$ in Definitions 1 and 2 although Definition 1 (inference privacy) depends only on $p_{G,Z}$ while Definition 2 (data privacy) depends only on $p_{X,Z}$. This is done to make it easier to present Definitions 3 and 4, which allow us to relate inference and data privacy metrics.

Definition 1 (Inference privacy metrics).

Let $\epsilon \geq 0$. We say that $p_{G,H,X,Z}$ satisfies each of the following types of inference privacy if the corresponding conditions hold.

  • $\epsilon$-differential privacy [23]: for all $g, g'$ such that $g \sim g'$, and all $z$, $p_{Z|G}(z \mid g) \leq e^{\epsilon}\, p_{Z|G}(z \mid g')$.

  • $\epsilon$-average information leakage [23]: $I(G; Z) \leq \epsilon$.

  • $\epsilon$-information privacy [23]: for all $g$ and $z$, $e^{-\epsilon} \leq \dfrac{p_{G|Z}(g \mid z)}{p_G(g)} \leq e^{\epsilon}$.

Definition 2 (Data privacy metrics).

Let $\epsilon \geq 0$. We say that $p_{G,H,X,Z}$ satisfies each of the following types of data privacy if the corresponding conditions hold.

  • $\epsilon$-local differential privacy [21]: for each sensor $k$, and all $x, x' \in \mathcal{X}$ and $z \in \mathcal{Z}$, $p_{Z_k|X_k}(z \mid x) \leq e^{\epsilon}\, p_{Z_k|X_k}(z \mid x')$.

  • $\epsilon$-mutual information privacy [22]: $I(X; Z) \leq \epsilon$.

  • $\epsilon$-identifiability [22]: for all $x, x'$ such that $x \sim x'$, and all $z$, $p_{X|Z}(x \mid z) \leq e^{\epsilon}\, p_{X|Z}(x' \mid z)$.
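As a concrete illustration of Definitions 1 and 2, the following Python sketch computes the smallest budget for which a toy single-sensor privacy mapping satisfies local differential privacy, and the smallest budget for which the induced distribution of the private hypothesis and the sanitized output satisfies information privacy. The joint distribution and the mapping below are illustrative values only and are not taken from our model.

```python
import numpy as np

# Toy joint pmf p(G = g, X = x) for one sensor: rows index the private hypothesis g,
# columns index the observation x.  Illustrative values only.
p_gx = np.array([[0.30, 0.20],
                 [0.10, 0.40]])

# Privacy mapping p(Z = z | X = x): rows index x, columns index z (a randomized response).
p_z_given_x = np.array([[0.8, 0.2],
                        [0.3, 0.7]])

def ldp_budget(p_zx):
    """Smallest eps such that p(z|x) <= e^eps * p(z|x') for all z, x, x'."""
    ratios = p_zx[:, None, :] / p_zx[None, :, :]          # shape (x, x', z)
    return float(np.log(ratios.max()))

def info_privacy_budget(p_gx, p_zx):
    """Smallest eps such that e^{-eps} <= p(g|z) / p(g) <= e^{eps} for all g, z."""
    p_g = p_gx.sum(axis=1)                                # marginal of G
    p_gz = p_gx @ p_zx                                    # joint of (G, Z): Z depends on X only
    p_g_given_z = p_gz / p_gz.sum(axis=0, keepdims=True)
    return float(np.abs(np.log(p_g_given_z / p_g[:, None])).max())

print("local differential privacy budget:", round(ldp_budget(p_z_given_x), 3))
print("information privacy budget       :", round(info_privacy_budget(p_gx, p_z_given_x), 3))
```

Randomized mappings whose entries are bounded away from zero yield finite budgets for both metrics, whereas deterministic mappings drive the local differential privacy budget to infinity.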

To relate one privacy metric to another, we introduce the concept of privacy implication and non-guarantee in the following definitions.

Definition 3 (Privacy implication).

We say that Type A privacy implies Type B privacy if, for all sequences of probability distributions $\{p^{(n)}_{G,H,X,Z}\}_{n \geq 1}$ such that $p^{(n)}_{G,H,X,Z}$ satisfies $\epsilon_n$-Type A privacy with $\epsilon_n \to 0$, $p^{(n)}_{G,H,X,Z}$ also satisfies $\delta_n$-Type B privacy with $\delta_n \to 0$.

Definition 4 (Privacy non-guarantee).

We say that Type A privacy does not guarantee Type B privacy if there exists a sequence of probability distributions $\{p^{(n)}_{G,H,X,Z}\}_{n \geq 1}$ such that $p^{(n)}_{G,H,X,Z}$ satisfies $\epsilon_n$-Type A privacy with $\epsilon_n \to 0$, and $\delta_n$-Type B privacy with $\delta_n$ bounded away from zero.

Fig. 2: Relationships between different privacy metrics for an IoT network with a fixed number of sensors $s$. An arrow $\rightarrow$ means “implies” while $\nrightarrow$ means “does not guarantee”.

The following theorem elucidates the relationships between different privacy metrics, which are summarized in Fig. 2. Some of these relationships are results proven in [22, 23], and are reproduced here for completeness.

Theorem 1.

Consider the decentralized IoT network in Fig. 1 with $s$ sensors. Let $\epsilon \geq 0$. Then, the following holds.

  1. -information privacy implies -differential privacy for all .

  2. -information privacy implies -average information leakage for all .

  3. -differential privacy implies -information privacy. If , then differential privacy does not guarantee information privacy.

  4. -differential privacy implies -average information leakage. If , then differential privacy does not guarantee average information leakage.

  5. Average information leakage does not guarantee information privacy and differential privacy.

  6. -local differential privacy implies -information privacy.

  7. Information privacy does not guarantee local differential privacy.

  8. Information privacy does not guarantee mutual information privacy.

  9. Mutual information privacy does not guarantee information privacy.

  10. -mutual information privacy implies -average information leakage.

  11. -local differential privacy implies -mutual information privacy.

  12. Mutual information privacy does not guarantee local differential privacy.

  13. -local differential privacy yields -identifiability, where with the maximum taken over all neighboring . Therefore, -local differential privacy implies -identifiability if is restricted to have uniform distribution on . Otherwise, local differential privacy does not guarantee identifiability.

  14. -identifiability yields -local differential privacy. Therefore, -identifiability implies -local differential privacy if is restricted to have uniform distribution on . Otherwise, identifiability does not guarantee local differential privacy.

Proof:

See Appendix A. ∎

From Theorem 1, we see that information privacy implies the other types of inference privacy metrics in Definition 1, while local differential privacy implies the other types of data privacy metrics in Definition 2 if sensor observations are uniformly distributed. Although local differential privacy implies information privacy for a fixed number of sensors, this is no longer true if the number of sensors $s$ is not fixed or known in advance. Furthermore, even if $s$ is known a priori, Theorem 1 suggests that to achieve $\epsilon$-information privacy based solely on preserving local differential privacy, the local differential privacy budget has to scale inversely with $s$. Note that since the definition of local differential privacy does not distinguish between the public hypothesis $H$ and the private hypothesis $G$, this implies that $H$ also satisfies information privacy. If the resulting information privacy for $H$ is strong, [24, Theorem 1(i)] then implies that the Type I and II errors for detecting the public hypothesis $H$ also become large, which is therefore undesirable. Hence, we propose to design the sensors’ privacy mappings using both information privacy and local differential privacy constraints, where the local differential privacy budget can be chosen to be sufficiently large to achieve a reasonable utility for $H$ while maintaining strong information privacy for $G$.
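The following sketch gives a numerical illustration of this scaling under an assumed toy model: a uniform binary private hypothesis observed noisily by $s$ conditionally independent sensors, each applying a randomized response with the same local differential privacy budget. This model is ours and is not the setting analyzed above; it only illustrates that the resulting information privacy budget for the private hypothesis grows roughly linearly with $s$, which is why the per-sensor local differential privacy budget would have to shrink with $s$ if it were the only privacy mechanism.

```python
import numpy as np
from itertools import product

eps_ldp = 0.5                                 # per-sensor local differential privacy budget
p_flip = 1.0 / (1.0 + np.exp(eps_ldp))        # randomized-response flip probability for eps_ldp-LDP
p_correct = 0.9                               # P(X_k = G): each sensor observes G noisily
p_match = p_correct * (1 - p_flip) + (1 - p_correct) * p_flip   # P(Z_k = G)

def info_privacy_budget(num_sensors):
    """Information privacy budget of a uniform binary G given Z = (Z_1, ..., Z_s)."""
    worst = 0.0
    for z in product([0, 1], repeat=num_sensors):
        likelihood = {g: np.prod([p_match if zk == g else 1 - p_match for zk in z])
                      for g in (0, 1)}
        p_z = 0.5 * (likelihood[0] + likelihood[1])
        for g in (0, 1):
            # |log p(g|z)/p(g)| with a uniform prior on G
            worst = max(worst, abs(np.log(0.5 * likelihood[g] / p_z / 0.5)))
    return worst

for s in (1, 2, 4, 8):
    print(f"s = {s}: information privacy budget for G is about {info_privacy_budget(s):.3f}")
```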

In the subsequent sections, we propose frameworks for designing the local privacy mappings for sensors in a decentralized IoT network under both the parametric and nonparametric cases. These privacy mappings are designed to achieve both information privacy and local differential privacy at the fusion center.

IV Parametric Case: Concatenated Privacy Mappings

In this section, we consider the parametric case where the joint distribution of the hypotheses and the sensor observations is known a priori. We first study decentralized detection that preserves only data privacy using the local differential privacy metric. Then we include information privacy as an additional constraint to achieve inference privacy. We prove that information privacy is immune to post-processing, and local differential privacy is immune to both post-processing and pre-processing. Finally, we propose a local privacy mapping consisting of two concatenated privacy mappings that implement information privacy and local differential privacy mechanisms separately.

IV-A Data Privacy Using Local Differential Privacy

By Theorem 1, since local differential privacy is the strongest privacy metric amongst those we study, we first consider the case where local differential privacy is adopted as the privacy metric for the IoT network in Fig. 1. Let $\Gamma_\epsilon$ denote the set of privacy mappings $p_{Z|X}$ such that

(1a) $p_{Z|X}(z \mid x) = \prod_{k=1}^{s} p_{Z_k|X_k}(z_k \mid x_k)$ for all $x$ and $z$,
(1b) $p_{Z_k|X_k}(z \mid x) \leq e^{\epsilon}\, p_{Z_k|X_k}(z \mid x')$ for all $1 \leq k \leq s$, $z \in \mathcal{Z}$, and $x, x' \in \mathcal{X}$,
(1c) $\sum_{z \in \mathcal{Z}} p_{Z_k|X_k}(z \mid x) = 1$ and $p_{Z_k|X_k}(z \mid x) \geq 0$ for all $1 \leq k \leq s$, $z \in \mathcal{Z}$, and $x \in \mathcal{X}$.

Let $\gamma_0$ denote the decision rule used by the fusion center to infer the public hypothesis $H$ from the received sensor information $Z$. Our goal is to solve

$\displaystyle \min_{\gamma_0,\ p_{Z|X} \in \Gamma_\epsilon} \mathbb{P}\left(\gamma_0(Z) \neq H\right), \qquad (2)$

where $\epsilon$ is the local differential privacy budget.

To optimize (2), we use person-by-person optimization (PBPO). For a fixed privacy mapping $p_{Z|X}$, (2) is a convex optimization over $\gamma_0$ [46], which can be solved with standard approaches. Then for each sensor $k$, we fix $\gamma_0$ and $p_{Z_j|X_j}$, where $j \neq k$, and optimize for $p_{Z_k|X_k}$. This procedure is then repeated until a convergence criterion is met.
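The following Python sketch shows the structure of such a PBPO loop on a toy two-sensor instance with binary observations and reports. For simplicity, the fusion rule is taken to be the MAP rule at every step, and each sensor update is done by a grid search over feasible randomized binary channels instead of the linear program derived in Theorem 2 below; the distributions and the budget are illustrative assumptions.

```python
import numpy as np
from itertools import product

# Toy setup (illustrative, not the paper's model): binary H, two sensors with binary
# observations X_k and binary reports Z_k, conditionally independent given H.
p_h = np.array([0.5, 0.5])
# p_x_given_h[h, k, x] = P(X_k = x | H = h)
p_x_given_h = np.array([[[0.8, 0.2], [0.7, 0.3]],
                        [[0.3, 0.7], [0.2, 0.8]]])
eps_ldp = 1.0                                    # local differential privacy budget per sensor

def p_z_given_h(mapping, h, k):
    """mapping[k][x, z] = P(Z_k = z | X_k = x); returns P(Z_k = . | H = h)."""
    return p_x_given_h[h, k] @ mapping[k]

def bayes_error(mapping):
    """Error probability of the MAP fusion rule for H based on Z = (Z_1, Z_2)."""
    err = 0.0
    for z in product([0, 1], repeat=2):
        joint = [p_h[h] * np.prod([p_z_given_h(mapping, h, k)[z[k]] for k in range(2)])
                 for h in (0, 1)]
        err += min(joint)                        # MAP picks the larger joint probability
    return err

def satisfies_ldp(m):
    """Check P(z|x) <= e^eps * P(z|x') for all z, x, x'."""
    return np.all(m[:, None, :] <= np.exp(eps_ldp) * m[None, :, :] + 1e-12)

# PBPO: cycle through sensors, grid-searching each sensor's randomized binary channel
# while the other sensor's mapping is held fixed.
grid = np.linspace(0.0, 1.0, 51)
mapping = [np.array([[0.5, 0.5], [0.5, 0.5]]) for _ in range(2)]
for _ in range(10):                              # a few PBPO sweeps
    for k in range(2):
        best = (bayes_error(mapping), mapping[k])
        for a, b in product(grid, grid):         # a = P(Z=1|X=0), b = P(Z=1|X=1)
            cand = np.array([[1 - a, a], [1 - b, b]])
            if not satisfies_ldp(cand):
                continue
            trial = mapping.copy()
            trial[k] = cand
            e = bayes_error(trial)
            if e < best[0]:
                best = (e, cand)
        mapping[k] = best[1]
print("PBPO Bayes error for H:", round(bayes_error(mapping), 4))
```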

Theorem 2.

Suppose . Consider optimizing (2) over $p_{Z_k|X_k}$ with $\gamma_0$ and $p_{Z_j|X_j}$, $j \neq k$, fixed. The optimal solution is

(3)

where

with .

Proof:

Let . We have

(4)

We rewrite (2) as the following linear programming problem:

(5)

Without loss of generality, assume satisfy the constraints of (5). From (4), to minimize , we have for and for . Thus, we can simplify (5) to

s.t.

It can be shown that the solution to the above linear program is

which proves the theorem. ∎

IV-B Joint Inference and Data Privacy

Fig. 3: Each sensor’s privacy mapping consists of two privacy mappings concatenated together.

From Theorem 1, as information privacy is one of the strongest inference privacy metrics, we adopt the information privacy metric when designing our privacy mechanism. To achieve joint inference and data privacy, we consider

$\displaystyle \min_{\gamma_0,\ p_{Z|X}} \mathbb{P}\left(\gamma_0(Z) \neq H\right)$ subject to $p_{G,Z}$ satisfying $\epsilon_I$-information privacy and each $p_{Z_k|X_k}$ satisfying $\epsilon_L$-local differential privacy, (P0)

where $\epsilon_I$ and $\epsilon_L$ are the information privacy budget and the local differential privacy budget, respectively.

Since (P0) is an NP-complete problem [47], we seek suboptimal solutions rather than solving (P0) directly. Similar to the work in [27], we break the privacy mapping in (P0) into two concatenated stages as shown in Fig. 3, where each sensor observation $X_k$ is first mapped to an intermediate variable $Y_k \in \mathcal{Y}$, which is then mapped to $Z_k$, i.e., the mappings $p_{Y_k|X_k}$ and $p_{Z_k|Y_k}$ satisfy

$p_{Z_k \mid Y_k, X_k}(z \mid y, x) = p_{Z_k \mid Y_k}(z \mid y) \quad \text{for all } x \in \mathcal{X},\ y \in \mathcal{Y},\ z \in \mathcal{Z}. \qquad (6)$

The local privacy mapping for each sensor $k$ is then given by

$p_{Z_k|X_k}(z \mid x) = \sum_{y \in \mathcal{Y}} p_{Z_k|Y_k}(z \mid y)\, p_{Y_k|X_k}(y \mid x). \qquad (7)$
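The composition in (7) is simply a product of row-stochastic matrices, as the following sketch illustrates for one sensor with illustrative stage mappings. It also shows that the local differential privacy budget of the overall mapping with respect to the sensor observation never exceeds that of its last stage, which is the pre-processing immunity property used later in this section.

```python
import numpy as np

# Two-stage privacy mapping for one sensor (illustrative values).
# Rows index the input symbol, columns the output symbol; each row sums to one.
p_y_given_x = np.array([[0.6, 0.3, 0.1],     # stage 1: X_k -> Y_k
                        [0.2, 0.5, 0.3]])
p_z_given_y = np.array([[0.7, 0.3],          # stage 2: Y_k -> Z_k
                        [0.4, 0.6],
                        [0.3, 0.7]])

# Equation (7): the overall mapping is the composition of the two stages.
p_z_given_x = p_y_given_x @ p_z_given_y
assert np.allclose(p_z_given_x.sum(axis=1), 1.0)   # still a valid conditional pmf

def ldp_budget(m):
    """Smallest eps such that m[x, z] <= e^eps * m[x', z] for all x, x', z."""
    return float(np.log((m[:, None, :] / m[None, :, :]).max()))

print("stage-2 LDP budget :", round(ldp_budget(p_z_given_y), 3))
print("overall LDP budget :", round(ldp_budget(p_z_given_x), 3))   # never exceeds the stage-2 budget
```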

We propose the following two architectures:

  1. Information-LocaL differential privacy (ILL): the mapping from $X_k$ to $Y_k$ preserves information privacy, while the mapping from $Y_k$ to $Z_k$ preserves local differential privacy.

  2. Local differential-Information Privacy (LIP): the mapping from $X_k$ to $Y_k$ preserves local differential privacy, while the mapping from $Y_k$ to $Z_k$ preserves information privacy.

In the following Propositions 1 and 2, we show that this two-stage approach achieves joint inference and data privacy. But first, we discuss how to optimize the privacy mappings in practice.

In the ILL architecture, we find mappings $p_{Y_k|X_k}$ and $p_{Z_k|Y_k}$ satisfying

(P1)

To solve the problem (P1), we first consider the information privacy subproblem:

(8a)
(8b)

From [24, Theorem 2], to meet the constraint (8b), it suffices to ensure that

(9)

where with

By using the constraint (9), we reduce the constraints in (8b) to a single (but weaker) constraint, which is easier to optimize in practice. A PBPO variant of (8) similar to that used for solving (2) can then be used to find the first-stage privacy mapping. We refer the reader to [48, Section IV] for more details on the optimization procedure for a similar formulation.

In the second stage, we consider the local differential privacy subproblem:

(10)

If the conditions of Theorem 2 hold, the PBPO solution follows from Theorem 2. Otherwise, we can use a standard linear program solver [49] for (10) (see the discussion leading to (5) on how to formulate this linear program).
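The following sketch shows how such a second-stage subproblem can be posed as a linear program and solved with an off-the-shelf solver. The linear cost coefficients stand in for the detection-error objective, which becomes linear in the second-stage mapping once the fusion rule and the other sensors' mappings are fixed; the coefficients, alphabet sizes and budget below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Choose q(z | y) for one sensor to minimize a linear cost subject to eps_L-local
# differential privacy and valid-pmf constraints.
num_y, num_z = 3, 2
eps_L = 0.8
cost = np.array([[0.1, 0.9],      # cost[y, z]; illustrative coefficients only
                 [0.5, 0.5],
                 [0.8, 0.2]])

c = cost.flatten()                # decision variable: the flattened matrix q[y, z]

# LDP constraints: q[y, z] - e^{eps_L} * q[y', z] <= 0 for all z and y != y'.
rows = []
for z in range(num_z):
    for y in range(num_y):
        for yp in range(num_y):
            if y == yp:
                continue
            row = np.zeros(num_y * num_z)
            row[y * num_z + z] = 1.0
            row[yp * num_z + z] = -np.exp(eps_L)
            rows.append(row)
A_ub = np.array(rows)
b_ub = np.zeros(len(rows))

# Each row of q must sum to one.
A_eq = np.zeros((num_y, num_y * num_z))
for y in range(num_y):
    A_eq[y, y * num_z:(y + 1) * num_z] = 1.0
b_eq = np.ones(num_y)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * (num_y * num_z), method="highs")
print("optimal q(z|y):\n", np.round(res.x.reshape(num_y, num_z), 3))
```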

Similarly, for the LIP architecture, we consider the following optimization problem:

(P2)

Solving (P2) can be done in a similar fashion as (P1).

We next show that the concatenation of an information privacy mapping with a local differential privacy mapping achieves joint information privacy and local differential privacy in both the ILL and LIP architectures.

Proposition 1.

Let $\epsilon_I, \epsilon_L \geq 0$. Suppose that $p_{G,Y}$ satisfies $\epsilon_I$-information privacy and $p_{Z_k|Y_k}$ satisfies $\epsilon_L$-local differential privacy for each sensor $k$. Then, the following holds.

  1. For any randomized mapping $p_{Z|Y}$, $p_{G,Z}$ satisfies $\epsilon_I$-information privacy.

  2. For any randomized mapping $p_{Y|X}$, $p_{Z_k|X_k}$ satisfies $\epsilon_L$-local differential privacy for each sensor $k$.

Proof:
  1. For any $g$ and $z$, we have $p_{G|Z}(g \mid z) = \sum_{y} p_{G|Y}(g \mid y)\, p_{Y|Z}(y \mid z)$, since $G \to Y \to Z$ forms a Markov chain.

    Since $e^{-\epsilon_I} p_G(g) \leq p_{G|Y}(g \mid y) \leq e^{\epsilon_I} p_G(g)$ for all $y$, we obtain $e^{-\epsilon_I} p_G(g) \leq p_{G|Z}(g \mid z) \leq e^{\epsilon_I} p_G(g)$.

  2. Consider any sensor $k$. For any $y$ and $z$, we have $p_{Z_k|Y_k}(z \mid y) \leq e^{\epsilon_L}\, p_{Z_k|Y_k}(z \mid y_0)$ and $p_{Z_k|Y_k}(z \mid y) \geq p_{Z_k|Y_k}(z \mid y_0)$, where $y_0 \in \arg\min_{y'} p_{Z_k|Y_k}(z \mid y')$. Therefore, for any $x, x' \in \mathcal{X}$, we then have
    $p_{Z_k|X_k}(z \mid x) = \sum_{y} p_{Z_k|Y_k}(z \mid y)\, p_{Y_k|X_k}(y \mid x) \leq e^{\epsilon_L}\, p_{Z_k|Y_k}(z \mid y_0) \leq e^{\epsilon_L} \sum_{y} p_{Z_k|Y_k}(z \mid y)\, p_{Y_k|X_k}(y \mid x') = e^{\epsilon_L}\, p_{Z_k|X_k}(z \mid x')$

    for the fixed $y_0$ above.

The proposition is now proved. ∎

Proposition 1 shows that joint information privacy for $G$ and local differential privacy for the sensor observations are preserved in the ILL architecture. In the LIP architecture, it is clear that information privacy for $G$ is preserved since this is an explicit constraint in (P2). Local differential privacy preservation follows from [50, Proposition 2.1], which is reproduced below for completeness.

Proposition 2.

Let $\epsilon_L \geq 0$. Suppose that $p_{Y_k|X_k}$ satisfies $\epsilon_L$-local differential privacy for each sensor $k$. Then for any randomized mapping $p_{Z|Y}$, $p_{Z_k|X_k}$ satisfies $\epsilon_L$-local differential privacy for each sensor $k$.

Proof:

For any sensor $k$, $x, x' \in \mathcal{X}$, and $z \in \mathcal{Z}$, we have

$p_{Z_k|X_k}(z \mid x) = \sum_{y} p_{Z_k|Y_k}(z \mid y)\, p_{Y_k|X_k}(y \mid x) \leq e^{\epsilon_L} \sum_{y} p_{Z_k|Y_k}(z \mid y)\, p_{Y_k|X_k}(y \mid x') = e^{\epsilon_L}\, p_{Z_k|X_k}(z \mid x')$, since $p_{Y_k|X_k}(y \mid x) \leq e^{\epsilon_L}\, p_{Y_k|X_k}(y \mid x')$ for all $y$. The proposition is now proved. ∎
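A quick numerical check of this post-processing property is given below: random first-stage and second-stage channels are composed, and the local differential privacy budget of the composition never exceeds that of the first stage. The channels are randomly generated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def ldp_budget(m):
    """Smallest eps such that m[x, z] <= e^eps * m[x', z] for all x, x', z."""
    return float(np.log((m[:, None, :] / m[None, :, :]).max()))

def random_channel(n_in, n_out):
    m = rng.random((n_in, n_out)) + 0.1      # keep entries bounded away from zero
    return m / m.sum(axis=1, keepdims=True)

# Composing an LDP first stage with any randomized second stage (post-processing)
# never increases the LDP budget with respect to the sensor observation.
for _ in range(5):
    p_y_given_x = random_channel(3, 3)       # plays the role of the LDP first stage
    p_z_given_y = random_channel(3, 2)       # arbitrary post-processing
    before = ldp_budget(p_y_given_x)
    after = ldp_budget(p_y_given_x @ p_z_given_y)
    assert after <= before + 1e-9
    print(f"first-stage budget {before:.3f} -> composed budget {after:.3f}")
```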

V Nonparametric Case: Empirical Risk Optimization

In many IoT applications, knowing the joint distribution of the hypotheses $(H, G)$ and the sensor observations is impractical due to difficulties in accurately modeling this distribution. Therefore, we adopt a nonparametric approach to design the privacy mapping based on a given set of independent and identically distributed (i.i.d.) training data points.

Following [24], let $\ell$ be a loss function and $\mathcal{H}_K$ be a reproducing kernel Hilbert space with kernel $K$, kernel inner product $\langle \cdot, \cdot \rangle_K$, and associated norm $\|\cdot\|_K$. We restrict the rules used by the fusion center to infer $H$ and $G$ based on $Z$ to be of the form $\langle w, \Phi(Z) \rangle_K$, where $\Phi$ is the feature map associated with $K$. We seek to minimize the empirical $\ell$-risk of deciding $H$ while preserving information privacy.

We consider the following optimization problem:

(11a)
s.t. (11b)
(11c)

where

, is called the information privacy threshold,

and

Note that (11a) is the empirical $\ell$-risk of detecting $H$, while the quantity constrained in (11b) is the empirical (normalized) $\ell$-risk of distinguishing between the states of $G$. For convenience, we call (11) the Empirical information and local differential PrIvaCy (EPIC) optimization.

For a detailed explanation of how the above optimization framework is derived, we refer the reader to [24]. Briefly, we seek a privacy mapping $p_{Z|X}$ such that the empirical risk for detecting $G$ under any decision rule adopted by the fusion center is above the information privacy threshold. The mapping is also required to satisfy $\epsilon_L$-local differential privacy in the constraint (11c).

From [24, Theorem 2], by choosing the information privacy threshold appropriately, we can achieve $\epsilon$-information privacy for any $\epsilon > 0$ under mild technical assumptions. However, this trades off against the detection error rate for $H$. Therefore, we adopt the same two-step procedure as in [24]:

  1. Determine the largest information privacy threshold achievable under additional constraints on the privacy mapping that ensure the error rate of inferring $H$ remains reasonable. This is achieved through an iterative block Gauss-Seidel method.

  2. Choose an information privacy threshold ratio, set the threshold in (11b) to be this ratio times the largest achievable threshold found in the first step, and use an iterative block Gauss-Seidel method to solve (11).

For the details of this two-step procedure, we again refer the reader to [24]. The only difference with the procedure in [24] is that we now have the additional linear inequality constraints (11c), which can be easily handled since each step in the block Gauss-Seidel method remains a convex optimization problem.
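The following sketch illustrates the block Gauss-Seidel structure on a toy convex problem with two blocks and linear inequality constraints on one block, mirroring how the constraints (11c) enter each sweep. The quadratic objective, matrices and constraint values are stand-ins and are not the EPIC risk.

```python
import numpy as np
from scipy.optimize import minimize

# Minimal block Gauss-Seidel sketch: two blocks (u, v), a convex objective, and linear
# inequality constraints on v (analogous to the linear LDP constraints in (11c)).
rng = np.random.default_rng(1)
A1, A2 = rng.standard_normal((30, 5)), rng.standard_normal((30, 4))
b = rng.standard_normal(30)
C, d = rng.standard_normal((6, 4)), np.ones(6)        # linear constraints: C v <= d

u = np.zeros(5)
v = np.zeros(4)
for sweep in range(20):
    # Block 1: with v fixed, this is an unconstrained (convex) least-squares problem.
    u, *_ = np.linalg.lstsq(A1, b - A2 @ v, rcond=None)
    # Block 2: with u fixed, this is a convex problem with linear inequality constraints.
    obj = lambda v_: 0.5 * np.sum((A1 @ u + A2 @ v_ - b) ** 2)
    cons = [{"type": "ineq", "fun": lambda v_: d - C @ v_}]   # d - C v >= 0
    v = minimize(obj, v, constraints=cons, method="SLSQP").x
print("objective after block Gauss-Seidel:", round(0.5 * np.sum((A1 @ u + A2 @ v - b) ** 2), 4))
```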

VI Numerical Results

In this section, we carry out simulations to verify the performance of our proposed approaches. We also perform experiments and comparisons on the OPPORTUNITY Activity Recognition Data Set [51]. In our simulations, we consider a binary public hypothesis $H$ and a binary private hypothesis $G$. To evaluate the performance, we compute the Bayes error probabilities for detecting $H$ and $G$, since these are the minimum detection errors any detector can achieve, so that our results are oblivious to the choice of learning method adopted by the fusion center. The Bayes error of detecting $H$ reflects the utility of our method, while the Bayes error of detecting $G$ reflects the inference privacy of the private hypothesis. Data privacy of sensor $k$’s observation is quantified by the mutual information $I(X_k; Z_k)$.
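For reference, the sketch below shows how these evaluation quantities are computed from a joint probability mass function. The joint distribution used here is randomly generated for illustration, and the mutual information for data privacy is obtained in the same way from the joint distribution of a sensor's observation and its sanitized output.

```python
import numpy as np

# Toy joint pmf p(H, G, Z) over binary H, binary G, and a binary summary Z (randomly
# generated for illustration).  The same functions apply to p(X_k, Z_k) for I(X_k; Z_k).
p_hgz = np.random.default_rng(2).dirichlet(np.ones(8)).reshape(2, 2, 2)

def bayes_error(p_label_z):
    """p_label_z[label, z]: Bayes error of detecting the label from Z."""
    return float(np.sum(p_label_z.sum(axis=0) - p_label_z.max(axis=0)))

def mutual_information(p_xy):
    """Mutual information (in nats) of a joint pmf given as a 2-D array."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (px @ py)[mask])))

p_hz = p_hgz.sum(axis=1)        # joint pmf of (H, Z): utility
p_gz = p_hgz.sum(axis=0)        # joint pmf of (G, Z): inference privacy
print("Bayes error for H:", round(bayes_error(p_hz), 3))
print("Bayes error for G:", round(bayes_error(p_gz), 3))
print("I(G; Z) in nats  :", round(mutual_information(p_gz), 3))
```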

VI-A Parametric Case Study

Consider a network of sensors and a fusion center. Suppose that and . We set the correlation coefficient between the public hypothesis $H$ and the private hypothesis $G$ to a fixed value. We assume that each sensor has the identical joint distribution shown in Fig. 4.

Fig. 4: Joint distribution of the sensor observation, the public hypothesis $H$, and the private hypothesis $G$.
Fig. 5: Bayes errors for detecting $H$ and $G$ under LIP and ILL for a fixed privacy threshold ratio and varying local differential privacy budget $\epsilon_L$.

In Fig. 5, we fix the information privacy budget and vary the local differential privacy budget $\epsilon_L$. We see that if $\epsilon_L$ is small, ILL is better at inferring the public hypothesis $H$ while achieving a similar detection error for the private hypothesis $G$ when compared to LIP. This is because ILL first sanitizes the sensor observations for information privacy before applying a local differential privacy mapping, which gives it better control over removing the statistical information needed to infer $G$ while keeping the information needed to infer $H$. On the other hand, if $\epsilon_L$ is large, LIP infers $H$ with better accuracy. We also compare with the approach that uses only a local differential privacy constraint (i.e., the information privacy constraint in (P0) is removed), which we call LDP in the left plot of Fig. 5. Without any information privacy constraint, we see that LDP gives poor information privacy protection for $G$.

Fig. 6: Bayes errors for detecting $H$ and $G$ under LIP and ILL for a fixed local differential privacy budget $\epsilon_L$ and varying information privacy budget $\epsilon_I$.

In Fig. 6, we fix the local differential privacy budget $\epsilon_L$ while varying the information privacy budget $\epsilon_I$. We see that when $\epsilon_L$ is small, the Bayes error of detecting $H$ is large regardless of the value of $\epsilon_I$. This aligns with our discussion after Theorem 1 that we should not use local differential privacy to achieve inference privacy for the private hypothesis $G$, as this approach also leads to poor inference performance for the public hypothesis $H$.

(a) Mutual information between the sanitized information and the hypotheses $H$ and $G$.
(b) Mutual information between the sensor observations and the sanitized information.
Fig. 7: Mutual information with a fixed information privacy budget and varying $\epsilon_L$ for ILL, LIP, InP and LDP.

We next consider the case where sensor 1’s observations are independent of $G$, with the marginal conditional distribution given $H$ the same as that of the joint distribution shown in Fig. 4. All other sensors follow the distribution in Fig. 4. In Fig. 7, we fix the information privacy budget and vary $\epsilon_L$ to illustrate the mutual information between different quantities. We also compare with the approach that uses only an information privacy constraint (i.e., the local differential privacy constraint in (P0) is removed), which we call InP. From Fig. 7(a), we observe that both ILL and LIP yield sanitized information that has a high mutual information with the public hypothesis $H$, and low mutual information with the private hypothesis $G$. However, with LDP, the mutual information between the sanitized information and each of $H$ and $G$ is much higher compared to the other methods, since it does not protect the information privacy of $G$.

In Fig. 7(b), we compare the mutual information between the sensor observations and their sanitized outputs under different privacy architectures. We see that this mutual information under ILL and LIP is much lower than that under InP. In particular, InP does not achieve good data privacy for sensor 1 since the information privacy constraint only removes statistical information in $X_1$ related to $G$, which in this case is none as $X_1$ is independent of $G$. This example illustrates the need to include both inference and data privacy constraints in our privacy mapping design. We also see that $I(X_1; Z_1)$ under both ILL and LIP is lower than that under InP, but converges to that of InP as $\epsilon_L$ becomes bigger.

VI-B Nonparametric Case Study: Simulations

In this subsection, we consider the nonparametric case where the underlying sensor distributions are unknown. We perform simulations to provide insights into the performance of our proposed EPIC approach in (11).

For simplicity, we use the count kernel in our simulations, which can be computed with a time complexity linear in the number of sensors. We choose the logistic loss as the loss function in our simulations.
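A minimal sketch of these two ingredients is given below, under our reading that the count kernel counts the number of agreeing components between two decision vectors (hence the linear complexity in the number of sensors); the input vectors are illustrative.

```python
import numpy as np

def count_kernel(z1, z2):
    """K(z, z') = number of components in which the two decision vectors agree (O(s))."""
    return float(np.sum(np.asarray(z1) == np.asarray(z2)))

def logistic_loss(margin):
    """Logistic surrogate loss for a label in {-1, +1}: log(1 + exp(-y * f(z)))."""
    return float(np.log1p(np.exp(-margin)))

z_a = [0, 1, 1, 0, 2]
z_b = [0, 1, 0, 0, 2]
print("count kernel value       :", count_kernel(z_a, z_b))   # 4 agreeing components
print("logistic loss, margin 0.3:", round(logistic_loss(0.3), 4))
```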

Consider a network of sensors and a fusion center. Each sensor observation is generated according to Table I, where the hypothesis pair $(H, G)$ is uniformly distributed over its alphabet. The sensor observation space is $\mathcal{X}$, and the local decision space is chosen to be $\mathcal{Z}$. Conditioned on $(H, G)$, sensor observations are independent of each other. We generate i.i.d. training samples, and apply our proposed approach on the training data to learn the privacy mapping $p_{Z|X}$.

TABLE I: Sensor observation distributions for different realizations of $(H, G)$.
Fig. 8: Bayes errors for detecting $H$ and $G$, and mutual information between the sensor observations and their sanitized outputs, for different local differential privacy budgets $\epsilon_L$.

We first compare the performance of three methods: our proposed EPIC, a detection method with only the information privacy constraint [24] (referred to as NPO), and a detection method with only the local differential privacy constraint, i.e., solving (11a) without (11b) (referred to as E-LDP).

Fig. 8 demonstrates how $\epsilon_L$, the local differential privacy budget, affects the inference privacy, data privacy and utility of these methods. In the simulation, we fix the information privacy threshold ratio when setting the threshold in (11b), and fix the correlation coefficient between $H$ and $G$. We observe that when $\epsilon_L$ is small, the performance of EPIC is close to the performance of E-LDP, where the Bayes error rates of both hypotheses are close to that of a random guess. This is in line with item 6 of Theorem 1: a small local differential privacy budget implies information privacy for both hypotheses. As $\epsilon_L$ increases, the performance of EPIC approaches the performance of NPO, where the error rate for $H$ is low, while that for $G$ is high. However, with E-LDP, the error rate for $G$ also decreases with increasing $\epsilon_L$, which leads to inference privacy leakage. When analyzing the data privacy leakage, we find that the mutual information between the sensor observations and their sanitized outputs stays high with NPO, whereas EPIC achieves a reasonable value by choosing $\epsilon_L$ to be around 5.

Fig. 9: Bayes error probabilities of detecting $H$ and $G$ with varying correlation coefficient between $H$ and $G$.

Fig. 9 shows how the correlation between $H$ and $G$ affects their Bayes error detection rates. For EPIC, we fix its privacy parameters, and for LDP, we find a local differential privacy budget for each correlation coefficient tested that achieves the same error rate for $G$ as in EPIC. We observe that for the same correlation coefficient, the error rate for $H$ is higher in LDP compared to that in EPIC. This demonstrates our claim that local differential privacy should not be used to imply information privacy, as it can severely impact the detection error rate for $H$ as well.

VI-C Nonparametric Case Study: OPPORTUNITY Data Set

We test our nonparametric EPIC framework on the OPPORTUNITY Activity Recognition Data Set [51] available at the UCI Repository [52], and compare its performance with RUCA [25], DCA [53] and MDR [54]. In EPIC, we set the local decision space of each sensor to be a finite alphabet.

In the OPPORTUNITY Activity Recognition Data Set, measurements from motion sensors, including on-body sensors, sensors attached to objects, and ambient sensors like switches, are recorded while a person performs a series of typical daily activities. In this experiment, our public hypothesis is whether the person is standing or walking, while the private hypothesis is whether the person is touching a drawer or a dishwasher. We used data from the ‘S2-Drill’ dataset, and used sklearn [55] to select the sensors that are most correlated with our chosen labels. Since the sensor readings are continuous, unsupervised discretization was applied to quantize each continuous sensor reading into a finite number of levels. We randomly sampled instances of training data, and instances of testing data.
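A sketch of this preprocessing pipeline using sklearn is given below. The number of selected sensor channels, the number of quantization levels, and the stand-in data and labels are all assumptions for illustration and do not reproduce our experimental settings.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.preprocessing import KBinsDiscretizer

# Stand-in data in place of the actual 'S2-Drill' recordings and labels.
rng = np.random.default_rng(0)
readings = rng.standard_normal((500, 100))     # 500 time instances, 100 candidate sensor channels
labels_h = rng.integers(0, 2, 500)             # public label: standing vs. walking (stand-in)

# Select the sensor channels most informative about the chosen label.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
selected = selector.fit_transform(readings, labels_h)

# Unsupervised discretization of each continuous reading into a small number of levels.
discretizer = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="uniform")
quantized = discretizer.fit_transform(selected).astype(int)
print("quantized data shape:", quantized.shape, "levels:", np.unique(quantized))
```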

As comparison benchmarks, we compare our method to the following:

  1. NPO [24], which includes only the information privacy constraint.

  2. The centralized approaches RUCA [25], DCA [53] and MDR [54], which require that all sensors send their observations to a central data curator that then applies an overall privacy mapping. Note that since the mappings in RUCA, DCA and MDR are deterministic, they do not provide any local differential privacy protection.

  3. Sensors do not apply any privacy mapping and send their raw observations, i.e., $Z_k = X_k$ for each sensor $k$. In this case, no local differential privacy protection is available, while some information privacy may be possible depending on the underlying distribution. This serves as a benchmark to show the intrinsic error probability achievable.

Similar to [24], to estimate the privacy budgets achieved by each method, we compute

$\displaystyle \hat{\epsilon}_I = \max_{g, z} \left| \log \frac{\hat{\mathbb{P}}(G = g \mid Z = z)}{\hat{\mathbb{P}}(G = g)} \right|, \qquad (12)$
$\displaystyle \hat{\epsilon}_L = \max_{1 \leq k \leq s}\ \max_{x, x', z} \log \frac{\hat{\mathbb{P}}(Z_k = z \mid X_k = x)}{\hat{\mathbb{P}}(Z_k = z \mid X_k = x')}, \qquad (13)$

as estimates for the information privacy and local differential privacy budgets, respectively. Here, $\hat{\mathbb{P}}(\cdot)$ denotes the empirical probability of the event in its argument. Note that a smaller $\hat{\epsilon}_I$ implies stronger information privacy and a smaller $\hat{\epsilon}_L$ implies stronger local differential privacy. We see that $\hat{\epsilon}_L$ is unbounded for RUCA, MDR, and the case $Z = X$.
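The estimates in (12) and (13) can be computed from held-out samples as in the following sketch. The sampling model below is a toy stand-in, and the plug-in estimators shown may differ in minor details from the exact implementation used in the experiments.

```python
import numpy as np

# Toy samples of (G, X_k, Z_k) for one sensor (the observation and sanitization models
# below are stand-ins for illustration).
rng = np.random.default_rng(3)
n = 5000
g = rng.integers(0, 2, n)
x = np.where(rng.random(n) < 0.8, g, 1 - g)                     # noisy observation of G
z = np.where(rng.random(n) < 0.7, x, rng.integers(0, 2, n))     # randomized sanitization

def empirical_info_privacy(g, z):
    """Plug-in estimate of max_{g,z} |log P(G=g|Z=z) / P(G=g)|."""
    eps = 0.0
    for gv in np.unique(g):
        p_g = np.mean(g == gv)
        for zv in np.unique(z):
            p_g_given_z = np.mean(g[z == zv] == gv)
            if p_g_given_z == 0:
                return np.inf
            eps = max(eps, abs(np.log(p_g_given_z / p_g)))
    return eps

def empirical_ldp(x, z):
    """Plug-in estimate of max_{x,x',z} log P(Z_k=z|X_k=x) / P(Z_k=z|X_k=x')."""
    eps = 0.0
    for zv in np.unique(z):
        probs = [np.mean(z[x == xv] == zv) for xv in np.unique(x)]
        if min(probs) == 0:
            return np.inf            # deterministic mappings give an unbounded budget
        eps = max(eps, float(np.log(max(probs) / min(probs))))
    return eps

print("estimated information privacy budget       :", round(empirical_info_privacy(g, z), 3))
print("estimated local differential privacy budget:", round(empirical_ldp(x, z), 3))
```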

From Table II, we observe that EPIC achieves the lowest information privacy and local differential privacy budgets compared to all the other benchmarks. Compared to NPO, it has a similar information privacy budget but a significantly lower local differential privacy budget, since NPO does not consider any data privacy constraints. Due to the local differential privacy constraint, we see that EPIC has the highest error rate for detecting $H$ amongst all the methods, which is the price it pays for having the least privacy leakage.

Detection Method
EPIC ()
NPO (