Alleviating Privacy Attacks via Causal Learning

Shruti Tople (Microsoft Research)
Amit Sharma (Microsoft Research)
Aditya Nori (Microsoft Research)

Machine learning models, especially deep neural networks, have been shown to reveal membership information of inputs in the training data. Such membership inference attacks are a serious privacy concern. For example, patients providing medical records to build a model that detects HIV would not want their identity to be leaked. Further, we show that the attack accuracy amplifies when the model is used to predict samples that come from a different distribution than the training set, which is often the case in real-world applications. Therefore, we propose the use of causal learning approaches, where a model learns the causal relationship between the input features and the outcome. Causal models are known to be invariant to the training distribution and hence generalize well both to new samples from the same distribution and to shifts across different distributions. First, we prove that models learned using causal structure provide stronger differential privacy guarantees than associational models under reasonable assumptions. Next, we show that causal models trained on sufficiently large samples are robust to membership inference attacks across different distributions of datasets, and that those trained on smaller sample sizes always have lower attack accuracy than corresponding associational models. Finally, we confirm our theoretical claims with experimental evaluation on datasets with moderately complex Bayesian networks. We observe that neural network-based associational models exhibit up to 80% attack accuracy under different test distributions and sample sizes, whereas causal models exhibit attack accuracy close to a random guess. Our results confirm the value of the generalizability of causal models in reducing susceptibility to privacy attacks.

1 Introduction

Machine learning algorithms, especially deep neural networks (DNNs), have found diverse applications in fields such as healthcare (Esteva et al., 2019), gaming (Mnih et al., 2013), and finance (Tsantekidis et al., 2017; Fischer and Krauss, 2018). However, a line of recent research has shown that deep learning algorithms are susceptible to privacy attacks that leak information about the training dataset (Fredrikson et al., 2015; Rahman et al., 2018; Song and Shmatikov, 2018; Hayes et al., 2017). In particular, one such attack, called membership inference, reveals whether a particular data sample was present in the training dataset (Shokri et al., 2017). The privacy risks due to membership inference increase when the DNNs are trained on sensitive data, such as in healthcare applications. For example, patients providing medical records to build a model that detects HIV would not want to reveal their participation in the training dataset.

Membership inference attacks have been shown to exploit overfitting of the model on the training dataset (Yeom et al., 2018). Existing defenses propose the use of generalization techniques such as learning rate decay, dropout, or adversarial regularization (Nasr et al., 2018b; Salem et al., 2018). All these approaches assume that the test data is from the same distribution as the training dataset. In practice, a model trained using data from one distribution is often used on a (slightly) different distribution. For example, hospitals in one region may train a model to detect HIV and share it with hospitals in different regions. However, generalizing to a new context is a challenge for any machine learning model. We extend the scope of membership privacy to different distributions and show that the risk from membership attacks on DNNs increases further as the test distribution is changed. That is, the ability of an adversary to distinguish a member from a non-member improves with a change in the test distribution.

To alleviate privacy attacks, we propose using models that depend on the causal relationship between input features and the output. Causal learning has been extensively used to guarantee fairness and explainability properties of the predicted output (Kusner et al., 2017; Nabi and Shpitser, 2018; Datta et al., 2016). However, the connection of causality to privacy is yet unexplored. To the best of our knowledge, we provide the first analysis of privacy benefits of causal models. By definition, causal relationships are invariant across input distributions (Peters et al., 2016), and hence make the predictions of causal models independent of the observed data distribution, let alone the observed dataset. Hence, causal models generalize better even with change in the distributions.

In this paper, we show that the generalizability property of causal models directly ensures better privacy guarantees for the input data. Concretely, we prove that with reasonable assumptions, a causal model always provides stronger (i.e., smaller $\epsilon$ value) differential privacy guarantees than a corresponding associational model trained on the same features and with the same amount of noise added to the training dataset. Consequently, we show that membership inference attacks are ineffective (equivalent to a random guess) on causal models trained on infinite samples. Empirical attack accuracies on four different datasets confirm our theoretical claims. We find that K training samples are sufficient to reduce the attack accuracy of a causal model to a random guess. In contrast, membership attack accuracy for neural network-based associational models increases as test distributions are changed. The attack accuracy reaches nearly 80% when the target associational model is trained on K training samples and used to predict test data that belong to a different distribution than the training data. Our results show that causal learning approaches are a promising direction for training models on sensitive data. Section 2 describes the properties of causal models. Section 3 proves the connections of causality to differential privacy and robustness to membership attacks. Section 4 provides empirical results. To summarize, we make the following contributions:

  • For the same amount of added noise, models learned using causal structure provide stronger $\epsilon$-differential privacy guarantees than corresponding associational models.

  • Causal models are provably more robust to membership inference attacks than typical associational models such as neural networks.

  • We simulate practical settings where the test distribution may not be the same as the training distribution and find that the membership inference attack accuracy of causal models is close to a “random guess” (i.e., 50%) while associational models exhibit up to 80% attack accuracy.

2 Properties of Causal Models

Causal models have been shown to generalize well since the output of these models depends only on the causal relationship between the input features and the outcomes instead of the associations between them. From prior work, we know that the causal relationship between the features is invariant to their distribution (Peters et al., 2016). Using this property, we study its effects on the privacy of data.

2.1 Background: Causal Model

Intuitively, a causal model identifies a subset of features that have a causal relationship with the outcome and learns a function from the subset to the outcome. To construct a causal model, one may use a structural causal graph based on domain knowledge that defines causal features as parents of the outcome under the graph. Alternatively, one may exploit the strong relevance property from Pellet and Elisseeff (2008), use score-based learning algorithms (Scutari, 2009) or recent methods for learning invariant relationships from training datasets from different distributions (Peters et al., 2016; Bengio et al., 2019), or learn based on a combination of randomized experiments and observed data. Note that this is different from training probabilistic graphical models, wherein an edge conveys an associational relationship. Further details on causal models are in Pearl (2009); Peters et al. (2017).

For ease of exposition, we assume the structural causal graph framework throughout. Consider data from a distribution $(X, Y) \sim P$, where $X$ is a $k$-dimensional vector and $Y$ is the outcome. Our goal is to learn a function $h(X)$ that predicts $Y$. Figure 1 shows causal graphs that denote the different relationships between $X$ and $Y$. Nodes of the graph represent variables and a directed edge represents a direct causal relationship from a source to a target node. Denote by $X_{PA} \subseteq X$ the parents of $Y$ in the causal graph. Figure 1a shows the scenario where $X$ contains variables $X_{S0}$ that are correlated to $X_{PA}$ in $P$, but not necessarily connected to either $X_{PA}$ or $Y$. These correlations may change in the future, thus a generalizable model should not include these features. Similarly, Figure 1b shows parents and children of $X_{PA}$, denoted $X_{S1}$ and $X_{S2}$. The d-separation principle states that a node is independent of its ancestors conditioned on all its parents (Pearl, 2009). Thus, $Y$ is independent of $X_{S1}$ and $X_{S2}$ conditional on $X_{PA}$. Therefore, including them in a model does not add predictive value (and excluding them further avoids problems when the relationships between $X_{S1}$ and $X_{S2}$ change). Finally, for completeness, the exhaustive set of variables to include is known as the Markov Blanket, $X_C$, which includes $Y$'s parents ($X_{PA}$), children ($Y_{CH}$) and parents of children. Conditioned on its Markov blanket (Figure 1c), $Y$ is independent of all other variables in the causal graph. When $Y$ has no descendants in the graph, the effective Markov blanket includes only its parents, $X_{PA}$.

Figure 1: A causal predictive model includes only the parents of $Y$, $X_{PA}$, in panels (a) and (b). Panel (c) shows the generalization to a Markov Blanket.

The key insight is that building a model for predicting $Y$ using the Markov Blanket $X_C$ ensures that the model generalizes to other distributions of $X$, and also to changes in other causal relationships between the features in $X$, as long as the causal relationship of $X_C$ to $Y$ is stable. We call such a model a causal model and the features in $X_C$ ($X_C \subseteq X$) the causal features, and we assume that all the causal features for $Y$ are observed. In contrast, we call a model that uses all available features an associational model.
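As a minimal sketch of how the causal features could be read off a known graph, the snippet below collects the parents, children, and parents of children of an outcome node. The graph encoding (a dictionary from each node to its list of parents), the function name `markov_blanket`, and the toy node names are assumptions made for illustration; they are not taken from the benchmark networks.

```python
from typing import Dict, List, Set


def markov_blanket(parents: Dict[str, List[str]], target: str) -> Set[str]:
    """Return parents, children, and parents-of-children of `target`.

    `parents` maps every node of a DAG to the list of its direct causes.
    """
    blanket: Set[str] = set(parents.get(target, []))  # parents of the target
    for node, node_parents in parents.items():
        if target in node_parents:          # `node` is a child of the target
            blanket.add(node)
            blanket.update(node_parents)    # other parents of that child
    blanket.discard(target)                 # the target is not its own feature
    return blanket


# Hypothetical toy graph (illustration only).
toy_graph = {
    "S0": [],
    "S1": [],
    "PA": ["S1"],
    "Y": ["PA"],
    "CH": ["Y", "SP"],
    "SP": [],
}
causal_features = markov_blanket(toy_graph, "Y")
```

When the outcome has no descendants, the same function returns only its parents, which matches the reduced Markov blanket described above.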

2.2 Generalization to new distributions

We state the generalization property of causal models and show how it results in a stronger differential privacy guarantee. We first define the in-distribution and out-of-distribution generalization error. Throughout, $L_x(h, h')$ refers to the loss on a single input $x$ and $\mathcal{L}_P(h, h') = \mathbb{E}_{x \sim P}[L_x(h, h')]$ refers to the expected value of the loss over a distribution $P$. We refer to $f: X \to Y$ as the ground-truth labeling function and to $h: X \to Y$ as the hypothesis function or simply the model. Here, $L$ is any loss function quantifying the difference between two models $h$ and $h'$.

Definition 1.

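In-Distribution Generalization Error ($\text{IDE}$). Consider a dataset $S \sim P(X, Y)$. Then for a model $h: X \to Y$ trained on $S$, the in-distribution generalization error is given by:

$$\text{IDE}_P(h, f) = \mathcal{L}_P(h, f) - \mathcal{L}_S(h, f)$$

where $\mathcal{L}_P$ is the expected loss under $P$ and $\mathcal{L}_S$ is the average loss over the samples in $S$.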

Definition 2.

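Out-of-Distribution Generalization Error ($\text{ODE}$). Consider a dataset $S$ sampled from a distribution $P(X, Y)$. Then for a model $h: X \to Y$ trained on $S$, the out-of-distribution generalization error with respect to another distribution $P^*(X, Y)$ is given by:

$$\text{ODE}_{P,P^*}(h, f) = \mathcal{L}_{P^*}(h, f) - \mathcal{L}_S(h, f)$$

where $\mathcal{L}_{P^*}$ is the expected loss under $P^*$ and $\mathcal{L}_S$ is the average loss over the samples in $S$.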

Definition 3.

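Discrepancy Distance ($\text{disc}_L$) (Def. 4 in Mansour et al. (2009)). Let $\mathcal{H}$ be a set of hypotheses, $h: X \to Y$. Let $L: Y \times Y \to \mathbb{R}_+$ define a loss function over $Y$ for any such hypothesis $h$. Then the discrepancy distance $\text{disc}_L$ over any two distributions $P(X, Y)$ and $P^*(X, Y)$ is given by:

$$\text{disc}_L(P, P^*) = \max_{h, h' \in \mathcal{H}} \left| \mathcal{L}_P(h, h') - \mathcal{L}_{P^*}(h, h') \right|$$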

Intuitively, the term $\text{disc}_L(P, P^*)$ denotes the distance between the two distributions. The higher the distance, the higher the chance of an error when transferring from one distribution to another. We now state the theorem on the generalization property of causal models.
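To make the discrepancy distance concrete, the following is a minimal sketch that estimates it from two finite samples over a small, finite hypothesis set with a 0-1 loss. The threshold classifiers, the sample values, and all function names are hypothetical choices for illustration; the definition itself ranges over the full hypothesis class and the true distributions.

```python
from itertools import combinations
from typing import Callable, Sequence

Hypothesis = Callable[[float], int]


def zero_one_loss(a: int, b: int) -> float:
    """Symmetric 0-1 loss between two predicted labels."""
    return float(a != b)


def expected_loss(h: Hypothesis, h2: Hypothesis, xs: Sequence[float]) -> float:
    """Average loss between two hypotheses over a sample of inputs."""
    return sum(zero_one_loss(h(x), h2(x)) for x in xs) / len(xs)


def discrepancy(hyps: Sequence[Hypothesis],
                xs_p: Sequence[float],
                xs_q: Sequence[float]) -> float:
    """Largest gap in pairwise loss between the two samples over pairs in `hyps`.

    Pairs (h, h) always give a gap of zero and the loss is symmetric,
    so unordered pairs of distinct hypotheses are enough.
    """
    return max(
        (abs(expected_loss(h, h2, xs_p) - expected_loss(h, h2, xs_q))
         for h, h2 in combinations(hyps, 2)),
        default=0.0,
    )


# Hypothetical threshold classifiers and toy samples (illustration only).
hypotheses = [lambda x: int(x > 0.0), lambda x: int(x > 1.0)]
sample_p = [0.5, 0.5, 2.0, -1.0]
sample_p_star = [0.5, 0.5, 0.5, 0.5]
disc = discrepancy(hypotheses, sample_p, sample_p_star)
```

Here the two classifiers disagree on half of the first sample and on all of the second, so the estimated discrepancy is 0.5; identical samples give zero.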

Theorem 1.

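Consider a dataset $S \sim P(X, Y)$, a structural causal graph $G$ that connects $X$ to $Y$, and causal features $X_C \subseteq X$ where $X_C$ is a Markov Blanket of $Y$ under $G$. Let $P^*(X, Y)$ be a distribution with arbitrary $P^*(X)$ such that the causal relationship from $X_C$ to $Y$ is preserved, i.e., $P(Y|X_C) = P^*(Y|X_C)$. Let $\mathcal{H}_C$ represent the set of causal models $h_c: X_C \to Y$ that use only causal features and $\mathcal{H}_A$ represent the set of associational models $h_a: X \to Y$ that use all the features, such that $\mathcal{H}_C \subseteq \mathcal{H}_A$. Then, for any symmetric loss function $L$ that obeys the triangle inequality, the upper bound of $\text{ODE}$ from $P$ to $P^*$ (called $\text{ODE-Bound}$) for a causal model $h_c \in \mathcal{H}_C$ is less than or equal to the upper bound $\text{ODE-Bound}$ of an associational model $h_a \in \mathcal{H}_A$, with probability at least $(1-\delta)^2$:

$$\text{ODE-Bound}_{P,P^*}(h_c, f; \delta) \le \text{ODE-Bound}_{P,P^*}(h_a, f; \delta)$$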

Proof. Since $P(Y|X_C) = P^*(Y|X_C)$, the optimal causal model that minimizes loss over $P$ is the same as the loss-minimizing model over $P^*$. That is, $h^{OPT}_{c,P} = h^{OPT}_{c,P^*}$. However, for an associational model, $h^{OPT}_{a,P} \neq h^{OPT}_{a,P^*}$ in general, and thus there is an additional loss term when generalizing to data from $P^*$. The rest of the proof follows from the triangle inequality of the loss function and the standard bounds for $\text{IDE}$ from past work. The detailed proof is in Appendix Section A.1. ∎

Corollary 1.

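For a model trained on a dataset $S \sim P(X, Y)$, and for any two input instances $x$ and $x'$ such that $x \in S$ and $x' \notin S$, the worst-case generalization error for a causal model $h_c$ on any $x'$ is less than or equal to that for an associational model $h_a$. [Proof in Appendix Section A.2]

$$\max_{x \in S,\, x' \notin S} \left[ L_{x'}(h_c, f) - L_x(h_c, f) \right] \le \max_{x \in S,\, x' \notin S} \left[ L_{x'}(h_a, f) - L_x(h_a, f) \right]$$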

3 Main Result: Privacy Guarantees with Causality

We now present our main result on the privacy guarantees and attack robustness of causal models.

3.1 Differential Privacy Guarantees

Differential privacy (Dwork et al., 2014) provides one of the strongest notions of privacy, hiding the participation of an individual sample in the dataset. Informally, it ensures that the presence or absence of a single data point in the input dataset does not change the output by much.

Definition 4 (Differential Privacy).

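A mechanism $\mathcal{M}$ with domain $\mathcal{I}$ and range $\mathcal{O}$ satisfies $\epsilon$-differential privacy if for any two datasets $S, S' \in \mathcal{I}$ that differ only in one input and for any set $R \subseteq \mathcal{O}$, the following holds:

$$\Pr(\mathcal{M}(S) \in R) \le e^{\epsilon} \Pr(\mathcal{M}(S') \in R)$$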

Based on the generalization property, we show that causal models provide stronger differential privacy guarantees than corresponding associational models. The standard approach to designing a differentially private algorithm is to calculate the sensitivity of that algorithm and add noise proportional to the sensitivity. Sensitivity captures the change in the output of an algorithm due to a change in a single data point in the input. The higher the sensitivity, the larger the amount of noise required to make an algorithm differentially private with reasonable guarantees. We first provide the formal definition of sensitivity and then show that the sensitivity of causal models is lower than or equal to that of associational models.

Definition 5 (Sensitivity; from Def. 3.1 in Dwork et al. (2014)).

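Let $F$ be a function that maps a dataset to a vector in $\mathbb{R}^d$. Let $S$, $S'$ be two datasets such that $S'$ differs from $S$ in one data point. Then the $\ell_1$-sensitivity of a function $F$ is defined as:

$$\Delta F = \max_{S, S'} \left\| F(S) - F(S') \right\|_1$$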

Lemma 1.

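Consider two neighboring datasets $S$ and $S'$ such that $S' = (S \setminus \{(x, y)\}) \cup \{(x', y')\}$, where $(x, y) \in S$ and $(x', y') \notin S$. Let a model $h$ be specified by a set of parameters $\theta \in \Omega$. Let $h^{\min}_S$ be a model learnt using $S$ as training data and $h^{\min}_{S'}$ be the model learnt using $S'$ as training data. Then, the sensitivity of a causal learning function $F_c$ that outputs the learnt empirical hypotheses $h^{\min}_{c,S}$ and $h^{\min}_{c,S'}$ is lower than or equal to the sensitivity of an associational learning function $F_a$ that outputs $h^{\min}_{a,S}$ and $h^{\min}_{a,S'}$, using a loss function $L$ that is strongly convex over $\Omega$, symmetric and obeys the triangle inequality:

$$\Delta F_c \le \Delta F_a$$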

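Proof. We can write the empirical loss minimizers for the datasets $S$ and $S'$ as:

$$h^{\min}_S = \arg\min_{h} \mathcal{L}_S(h, f), \qquad h^{\min}_{S'} = \arg\min_{h} \mathcal{L}_{S'}(h, f)$$

From Corollary 1, for $x \in S$ and $x' \in S'$ with $x' \notin S$, we have:

$$\max_{x, x'} \left[ L_{x'}(h_c, f) - L_x(h_c, f) \right] \le \max_{x, x'} \left[ L_{x'}(h_a, f) - L_x(h_a, f) \right]$$

Since $S$ and $S'$ differ only in the data points $x$ and $x'$, and the above is true for any $x$ and $x'$,

$$\max_{S, S'} \left[ \mathcal{L}_{S'}(h_c, f) - \mathcal{L}_S(h_c, f) \right] \le \max_{S, S'} \left[ \mathcal{L}_{S'}(h_a, f) - \mathcal{L}_S(h_a, f) \right]$$

Since $L$ is a strongly convex function over $\Omega$, and since the empirical losses on $S$ and $S'$ are closer to each other for the causal model than for the associational model, $h^{\min}_{c,S}$ and $h^{\min}_{c,S'}$, which minimize $\mathcal{L}_S$ and $\mathcal{L}_{S'}$ respectively, should also be closer to each other than $h^{\min}_{a,S}$ and $h^{\min}_{a,S'}$ (Boyd and Vandenberghe, 2004), using the inequality above:

$$\max_{S, S'} \left\| h^{\min}_{c,S} - h^{\min}_{c,S'} \right\|_1 \le \max_{S, S'} \left\| h^{\min}_{a,S} - h^{\min}_{a,S'} \right\|_1$$

Hence, the sensitivity of a causal model is lower than or equal to that of an associational model, i.e., $\Delta F_c \le \Delta F_a$. ∎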

Theorem 2.

Let $\hat{F}_c$ and $\hat{F}_a$ be the differentially private algorithms corresponding to the causal learning and associational learning algorithms $F_c$ and $F_a$ respectively. Let $\hat{F}_c$ and $\hat{F}_a$ provide $\epsilon_c$-DP and $\epsilon_a$-DP guarantees respectively. Then, for noise sampled from the same distribution, $\text{Lap}(Z)$, for both algorithms, we have $\epsilon_c \le \epsilon_a$.


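Proof. According to Def. 3.3 of the Laplace mechanism from Dwork et al. (2014), we have,

$$\hat{F}_c = F_c + \mathcal{K}_c, \quad \mathcal{K}_c \sim \text{Lap}\!\left(\frac{\Delta F_c}{\epsilon_c}\right), \qquad \hat{F}_a = F_a + \mathcal{K}_a, \quad \mathcal{K}_a \sim \text{Lap}\!\left(\frac{\Delta F_a}{\epsilon_a}\right)$$

Since the noise is sampled from the same noise distribution $\text{Lap}(Z)$ in both cases,

$$Z = \frac{\Delta F_c}{\epsilon_c} = \frac{\Delta F_a}{\epsilon_a}$$

From Lemma 1, $\Delta F_c \le \Delta F_a$ and hence $\epsilon_c \le \epsilon_a$. ∎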
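The argument above can be sketched numerically: with the Laplace mechanism, a fixed noise scale and a smaller sensitivity translate directly into a smaller $\epsilon$. The sensitivities, the noise scale, and the parameter vector below are hypothetical values chosen only for illustration, and the helper names are not from the paper.

```python
import random
from typing import List


def epsilon_for_noise_scale(sensitivity: float, scale: float) -> float:
    """Laplace mechanism: scale = sensitivity / epsilon, so epsilon = sensitivity / scale."""
    return sensitivity / scale


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(params: List[float], scale: float, rng: random.Random) -> List[float]:
    """Add independent Laplace noise to every learnt parameter."""
    return [p + laplace_noise(scale, rng) for p in params]


# Hypothetical l1-sensitivities for a causal and an associational learner.
delta_causal, delta_assoc = 0.5, 2.0
noise_scale = 1.0  # same noise distribution for both learners

eps_causal = epsilon_for_noise_scale(delta_causal, noise_scale)
eps_assoc = epsilon_for_noise_scale(delta_assoc, noise_scale)
noisy_params = privatize([0.1, 0.7, 0.2], noise_scale, random.Random(0))
```

With these assumed numbers the causal learner obtains $\epsilon_c = 0.5$ against $\epsilon_a = 2.0$ for the same added noise.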

While we prove the general result above, our central claim comparing differential privacy for causal and associational models also holds true for models developed using recent work (Papernot et al., 2017) that provides a tighter data-dependent differential privacy guarantee. The key idea is to produce an output label based on voting from teacher models, each trained on a disjoint subset of the training data. We state the theorem below and provide the proof in Appendix B. Given datasets from different domains, the below theorem provides a constructive proof to generate a differentially private causal algorithm, following the method from Papernot et al. (2017).

Theorem 3.

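Let $D$ be a dataset generated from possibly a mixture of different distributions $P(X, Y)$ such that $P(Y|X_C)$ remains the same. Let $n_j$ be the votes for the $j$th class from M teacher models. Let $\mathcal{M}$ be the mechanism that produces a noisy max over the vote counts, i.e., the arg max over classes $j$ of $n_j$ after adding Laplace noise to each count, as in Papernot et al. (2017). Then the privacy budget $\epsilon$ for a causal model is lower than that for the associational model with the same accuracy.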
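A minimal sketch of the noisy-max aggregation referred to in the theorem is given below. It counts teacher votes per class, perturbs each count with Laplace noise, and returns the class with the largest noisy count. The noise scale is left as a free parameter because its exact value is not stated here, and the teacher labels are hypothetical.

```python
import random
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def vote_counts(teacher_labels: List[int], num_classes: int) -> List[int]:
    """Count how many teachers voted for each class."""
    counts = [0] * num_classes
    for label in teacher_labels:
        counts[label] += 1
    return counts


def noisy_max(votes: List[int], scale: float, rng: random.Random) -> int:
    """Return the class whose noise-perturbed vote count is largest."""
    noisy = [v + laplace_noise(scale, rng) for v in votes]
    return max(range(len(noisy)), key=noisy.__getitem__)


# Hypothetical labels predicted by five teachers for one query input.
counts = vote_counts([0, 1, 1, 2, 1], num_classes=3)
# A tiny scale keeps the example effectively deterministic; real use needs larger noise.
winner = noisy_max(counts, scale=1e-9, rng=random.Random(0))
```

Intuitively, when teachers agree strongly the winning class rarely changes under noise, which is what makes the data-dependent privacy analysis tighter.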

3.2 Robustness to Membership Attacks

Deep learning models have been shown to memorize or overfit on the training data during the learning process (Carlini et al., 2018). Such overfitted models are susceptible to membership inference attacks that can accurately predict whether a target input belongs to the training dataset or not (Shokri et al., 2017). There are multiple variants of the attack depending on the information accessible to the adversary. An adversary with access to a black-box model only sees the confidence scores for the predicted output, whereas one with white-box access has access to the model parameters and observes the output at each layer in the model (Nasr et al., 2018a). In the black-box setting, a membership attack is possible whenever the distribution of output scores for training data is different from that for the test data, and this has been connected to model overfitting (Yeom et al., 2018). For the white-box setting, if an adversary knows the true label for the target input, then they may guess the input to be a member of the training set whenever the loss is lower, and vice versa. Alternatively, if the adversary knows the distribution of the training inputs, they may learn a “shadow” model based on synthetic inputs and use the shadow model’s output to build a membership classifier for any new input (Salem et al., 2018).

Most of the existing membership inference attacks have been demonstrated for test inputs from the same data distribution as the training set. When test inputs are expected from the same distribution, methods to reduce overfitting (such as adversarial regularization) can help reduce privacy risks (Nasr et al., 2018b). However, in practice, this is seldom the case. For instance, in our example of a model trained to detect HIV, the test inputs may come from different hospitals. Models trained to reduce the generalization error for a specific test distribution are still susceptible to membership inference when the distribution of features is changed. This is due to the problem of covariate shift, which introduces a domain adaptation error term (Mansour et al., 2009). That is, the loss-minimizing model that predicts $Y$ changes with a different distribution, and thus allows the adversary to detect differences in losses for the test versus training datasets. As we show below, causal models alleviate the risk of membership inference attacks. Following Yeom et al. (2018), we first define a membership attack as:

Definition 6.

Let model $h$ be trained on a dataset $S(X, Y)$ of size $N$. Let $\mathcal{A}$ be an adversary with access to $h$ and an input $x$. The advantage of an adversary in membership inference is the difference between the true and false positive rates in guessing whether the input belongs to the training set:

$$\text{Adv}(\mathcal{A}, h) = \Pr[\mathcal{A} = 1 \mid b = 1] - \Pr[\mathcal{A} = 1 \mid b = 0]$$

where $b = 1$ if the input is in the training set and $b = 0$ otherwise.
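As an illustration of this definition, the sketch below computes the membership advantage from an adversary's guesses and relates it to attack accuracy. When members and non-members are equally represented, accuracy equals $(1 + \text{Adv})/2$, so an advantage of 0 corresponds to the 0.5 random-guess baseline. The ground-truth and guess vectors are made-up values, and the function names are hypothetical.

```python
from typing import Sequence


def membership_advantage(is_member: Sequence[int], guessed_member: Sequence[int]) -> float:
    """True positive rate minus false positive rate of the membership guesses.

    Assumes both members (1) and non-members (0) are present in `is_member`.
    """
    n_members = sum(is_member)
    n_non_members = len(is_member) - n_members
    true_pos = sum(1 for m, g in zip(is_member, guessed_member) if m == 1 and g == 1)
    false_pos = sum(1 for m, g in zip(is_member, guessed_member) if m == 0 and g == 1)
    return true_pos / n_members - false_pos / n_non_members


def attack_accuracy(is_member: Sequence[int], guessed_member: Sequence[int]) -> float:
    """Fraction of inputs whose membership was guessed correctly."""
    correct = sum(1 for m, g in zip(is_member, guessed_member) if m == g)
    return correct / len(is_member)


# Hypothetical balanced evaluation set: four members, four non-members.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
guess = [1, 1, 1, 0, 1, 0, 0, 0]
adv = membership_advantage(truth, guess)  # TPR 3/4 minus FPR 1/4
acc = attack_accuracy(truth, guess)       # 6 of 8 guesses are correct
```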

Lemma 2.

[From Yeom et al. (2018)] Let $\mathcal{M}$ be an $\epsilon$-differentially private mechanism based on a model $h$. The membership advantage for an adversary is bounded by $e^{\epsilon} - 1$.

Theorem 4.

Given a structural causal network $G$ that connects $X$ to $Y$, let $S$ be a dataset sampled from $P(X, Y)$, and let $P^*(X, Y)$ be any distribution such that $P(Y|X_C) = P^*(Y|X_C)$. Then, a causal model $h_c$ trained on $S$ yields lower membership advantage than an associational model $h_a$ even when the test dataset is from a different distribution $P^*$.


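Proof. From Theorem 2 above, we can construct an $\epsilon_c$-DP mechanism based on a causal model, and an $\epsilon_a$-DP mechanism based on an associational model, where $\epsilon_c \le \epsilon_a$. Further, this construction works for different input distributions. From Lemma 2, the membership advantage of an adversary is bounded as,

$$\text{Adv}(\mathcal{A}, h_c) \le e^{\epsilon_c} - 1, \qquad \text{Adv}(\mathcal{A}, h_a) \le e^{\epsilon_a} - 1, \qquad e^{\epsilon_c} - 1 \le e^{\epsilon_a} - 1$$

Thus, the worst-case advantage for a causal model is always lower than or equal to that of an associational model. ∎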

Corollary 2.

Let $h_c$ be a causal model trained using empirical risk minimization on a dataset $S \sim P(X, Y)$ with sample size $N$. As $N \to \infty$, membership advantage $\text{Adv}(\mathcal{A}, h_c) \to 0$.

The proof is based on the result from Theorem 1 that $h^{OPT}_{c,P} = h^{OPT}_{c,P^*}$ for a causal model. Crucially, membership advantage does not go to zero as $N \to \infty$ for associational models, since $h^{OPT}_{a,P} \neq h^{OPT}_{a,P^*}$ in general. The detailed proof is in Appendix Section C.

Attribute Inference attacks. We prove similar results on the benefits of causal models for attribute inference attacks in Appendix Section D.

4 Implementation and Evaluation

Benchmark Datasets.

Dataset          Child        Alarm   Sachs   Water
Output variable  XrayReport   BP      Akt     CKNI_12_45
No. of classes   5            3       3       3
Nodes            20           37      11      32
Arcs             25           46      17      66
Parameters       230          509     178     10083
Table 1: Details of the benchmark datasets

To avoid errors in learning causal structure from data, we perform evaluation on datasets for which the causal structure and the true conditional probabilities of the variables are known from prior research. We select four Bayesian network datasets, Child, Sachs, Alarm and Water, that range from 230 to 10K parameters (Table 1) [1]. Nodes represent the number of input features and arcs denote the causal connections between these features in the network. Each causal connection is specified using a conditional probability table that gives the distribution of a node given its parents; we consider these probability values as the parameters in our models. To create a prediction task, we select a variable in each of these networks as the output $Y$. The number of classes in Table 1 denotes the number of possible values of the output variable. For example, the variable BP (blood pressure) in the Alarm dataset takes 3 values, i.e., LOW, NORMAL, HIGH. The causal model uses only the parents of $Y$, whereas the associational model (DNN) uses all nodes except $Y$ as features.

Implementation. We sample data using the causal structure and probabilities from the Bayesian network, and use a 60:40 split for the train and test datasets. We learn a causal model and a deep neural network (DNN) on each training dataset. We implement the attacker model to perform the membership inference attack using the output confidences of both these models, based on past work (Salem et al., 2018). The input features for the attacker model comprise the output confidences from the target model, and the output is the membership prediction (member / non-member) in the training dataset of the target model. In both the train and the test data for the attacker model, the numbers of members and non-members are equal. The creation of the attacker dataset is described in Figure 4 in the Appendix. Note that the attack accuracies reported are an upper bound since we assume that the adversary has white-box access to the ML model.
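The construction of a balanced attacker dataset can be sketched as follows. This is only an illustration under stated assumptions: `predict_proba` stands in for whatever function returns the target model's output confidences, the toy target model and input values are made up, and the exact procedure used in the experiments is the one described in the Appendix figure.

```python
import random
from typing import Callable, List, Sequence, Tuple

Confidences = List[float]


def build_attack_dataset(
    predict_proba: Callable[[float], Confidences],
    members: Sequence[float],
    non_members: Sequence[float],
    rng: random.Random,
) -> List[Tuple[Confidences, int]]:
    """Pair target-model confidences with a membership label (1 = member).

    Members and non-members are subsampled to the same count so that
    a random guess yields 0.5 accuracy.
    """
    n = min(len(members), len(non_members))
    chosen_in = rng.sample(list(members), n)
    chosen_out = rng.sample(list(non_members), n)
    rows = [(predict_proba(x), 1) for x in chosen_in]
    rows += [(predict_proba(x), 0) for x in chosen_out]
    rng.shuffle(rows)
    return rows


# Hypothetical target model with two output classes (illustration only).
def toy_target(x: float) -> Confidences:
    return [x, 1.0 - x]


attack_rows = build_attack_dataset(
    toy_target,
    members=[0.9, 0.8, 0.95],      # inputs seen during target training
    non_members=[0.6, 0.55],       # held-out inputs
    rng=random.Random(0),
)
```

The attacker model is then trained on the confidence vectors as features and the 0/1 membership flag as the label.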

To train the causal model, we use the bnlearn library in the R language, which supports maximum likelihood estimation of the parameters in $Y$’s conditional probability table. For prediction, we use the parents method to predict the class of any specific variable. To train the DNN model and the attacker model, we build custom estimators in Python using TensorFlow v1.2. The DNN model is a multilayer perceptron (MLP) with 3 hidden layers of 128, 512 and 128 nodes respectively. The learning rate is set to 0.0001 and the model is trained for 10000 steps. The attacker model has 2 hidden layers with 5 nodes each, a learning rate of 0.001, and is trained for 5000 steps. Both models use the Adam optimizer, ReLU as the activation function, and cross entropy as the loss function. We chose these parameters to ensure model convergence.

4.1 Experimental Setup

We evaluate the DNN and the causal model on total dataset sizes ranging from 1K to 1M samples. We refer to the test dataset drawn from the same distribution as the training data as Test 1, and to the test dataset generated from a completely different distribution, except for the relationship of the output class to its parents, as Test 2. To generate Test 2, we alter the true probabilities uniformly at random (later, we consider adding noise to the original values). Our goal in generating Test 2 is to capture realistic shifts in the distribution of input features. We refer to the causal and DNN models as the targets on which the attack is perpetrated.

Figure 2: Results for the Child dataset with XrayReport as the output. (a) is the target model accuracy, (b) is the attack accuracy for different dataset sizes on which the target model is trained, and (c) is the attack accuracy for test distributions with varying amounts of noise for a total dataset size of 100K samples.
Figure 3: (a) Target accuracy, (b) attack accuracy, (c) attack accuracy for the true causal model, the learned causal model and the DNN.

4.2 Results

We present results on the accuracy of target models (causal and DNN models) and the membership attack accuracy for different dataset sizes and test distributions.
Accuracy comparison of DNN and Causal models. Figure 2(a) shows the target model accuracy comparison for the DNN and the causal model trained on the Child dataset with XrayReport as the output variable. We report the accuracy of the target models only for a single run since in practice the attacker would have access to the outputs of only a single model. We observe that the DNN model has a large difference between the train and the test accuracy (both Test 1 and Test 2) for smaller dataset sizes (1K and 2K). This indicates that the model overfits on the training data for these dataset sizes. However, after K samples, the model converges such that the train and Test 1 datasets have the same accuracy. The accuracy for the Test 2 distribution stabilizes for a total dataset size of K samples. In contrast, for the causal model, the train and Test 1 accuracies are similar even on smaller dataset sizes. However, after convergence at around K samples, the gap between the accuracy on the train and Test 2 datasets is the same for both the DNN and the causal model. Figure 3(a) shows the accuracy comparison for all four datasets, with similar results.
Attack Accuracy comparison of DNN and Causal models. A naive attacker classifier would predict all the samples to be members and therefore achieve 0.5 prediction accuracy. Thus, we consider 0.5 as the baseline attack accuracy, which is equal to a random guess. Figure 2(b) shows the attack accuracy comparison for the Test 1 (same distribution) and Test 2 (different distribution) datasets. Attack accuracy on the Test 1 dataset for the causal model is slightly above a random guess for smaller dataset sizes, and then converges to 0.5. In comparison, attack accuracy for the DNN on the Test 1 dataset is over 0.6 for smaller sample sizes and reaches 0.5 after 10K datapoints. This confirms past work showing that an overfitted DNN is susceptible to membership inference attacks even for test data generated from the same distribution as the training data (Yeom et al., 2018). On Test 2, the attack accuracy is always higher for the DNN than for the causal model, supporting our main result that associational models “overfit” to the training distribution, in addition to the training dataset. Membership inference accuracy for DNNs is as high as 0.8 for a total dataset size of 50K, while that of causal models is below 0.6. Further, attack accuracy for the DNN increases with sample size, whereas attack accuracy for the causal model reduces to 0.5 for total dataset sizes over 100K, even when the gap between the train and test accuracies is the same as for DNNs (as shown in Figure 2(a)). These results show that causal models generalize better than DNNs across input distributions. Figure 3(b) shows a similar result for all four datasets. The attack accuracy for DNNs and the causal model is close to 0.5 for the Test 1 dataset, while for the Test 2 dataset the attack accuracy is significantly higher for DNNs than for the causal model. This empirically confirms our claim that, in general, causal models are more robust to membership inference attacks across test distributions than associational models.
Attack Accuracy for Different Test Distributions. To understand the change in attack accuracy as the test distribution changes, we also generate test data from different distributions by adding varying amounts of noise to the true probabilities. We vary the noise value between 0 and 2 and add it to the individual probabilities, which are then normalized to sum to 1. Figure 2(c) shows the comparison of attack accuracy for the causal model and the DNN on the Child dataset for a total sample size of 100K samples. We observe that the attack accuracy for the DNN increases as the noise values increase. Even for a small amount of noise, attack accuracies increase sharply. In contrast, attack accuracies stay close to 0.5 for the causal model, demonstrating its robustness to membership attacks.
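
The perturbation step described above can be sketched as follows for a single row of a conditional probability table. How the noise is drawn is not specified in the text, so sampling it uniformly from [0, noise level] is an assumption made here for illustration, as are the function name and the example probabilities. To preserve the causal relationship, the conditional distribution of the output given its parents would be left untouched.

```python
import random
from typing import List


def perturb_distribution(probs: List[float], noise_level: float,
                         rng: random.Random) -> List[float]:
    """Add non-negative noise to each probability and renormalize to sum to 1.

    Assumption: the noise for each entry is uniform on [0, noise_level].
    """
    noisy = [p + rng.uniform(0.0, noise_level) for p in probs]
    total = sum(noisy)
    return [p / total for p in noisy]


rng = random.Random(0)
original = [0.2, 0.3, 0.5]  # hypothetical row of a conditional probability table
unchanged = perturb_distribution(original, 0.0, rng)  # noise level 0 keeps the row
shifted = perturb_distribution(original, 2.0, rng)    # largest noise level considered
```

Larger noise levels push each row toward an arbitrary distribution, which is what moves the test data further from the training distribution.
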
Results with learnt causal model. Finally, we perform experiments to understand how privacy guarantees are affected when the causal structure is learned from data and might differ from the true causal structure. We evaluate the attack accuracy for learned causal models on the Sachs, Child and Alarm datasets (we exclude the Water dataset because the bnlearn library gives an error due to its extreme probabilities). For these datasets, a simple hill-climbing algorithm returned the true causal parents. Hence, we evaluated attack accuracy for models with hand-crafted errors in learning the structure, i.e., misestimation of causal parents; see Figure 3(c). Specifically, we include two non-causal features as parents of the output variable along with the true causal features. The attack risk increases as a learnt model deviates from the true causal structure; however, it still exhibits lower attack accuracy than the corresponding associational model. Table 2 in the Appendix gives a fine-grained analysis.

5 Related Work

Privacy attacks and defenses on ML models. Shokri et al. (2017) demonstrate the first membership inference attacks on black-box neural network models with access only to the confidence values. Similar attacks have been shown on several other models such as GANs (Hayes et al., 2017), text prediction generative models (Carlini et al., 2018; Song and Shmatikov, 2018) and federated learning models (Nasr et al., 2018b). However, prior research does not focus on the severity of these attacks when the distribution of the test dataset changes. We discussed in Section 2 that existing defenses based on regularization (Nasr et al., 2018b) are not practical when models are evaluated on test inputs from different distributions. Another line of defense is to add differentially private noise while training the model. However, the $\epsilon$ values necessary to mitigate membership inference attacks in deep neural networks require the addition of a large amount of noise, which degrades the accuracy of the output model (Rahman et al., 2018). Thus, there is a trade-off between privacy and utility when using differential privacy for neural networks. In contrast, we show that causal models require a lower amount of noise to achieve the same differential privacy guarantees and hence retain accuracy closer to the original model. Further, as training sample sizes become sufficiently large, as shown in Section 4, causal models are, by definition, robust to membership inference attacks across distributions.
Causal learning and privacy. There is a substantial literature on learning causal models from data; for a review see Peters et al. (2017) and Pearl (2009). Kusner et al. (2015) proposed a method to privately reveal parameters from a causal learning algorithm, using the framework of differential privacy. Instead of a specific causal algorithm, our focus is on the privacy benefits of causal models for general predictive tasks. While recent work applies causal models to study properties of machine learning models such as providing explanations (Datta et al., 2016) or fairness (Kusner et al., 2017), the relation of causality to privacy is as yet unexplored. With this paper, we present the first result showing the privacy benefits of causal models.

6 Conclusion and Future Work

We conclude that causal learning is a promising approach for training models that are robust to privacy attacks such as membership inference and model inversion. In future work, we want to relax our assumption of a known causal structure and investigate the privacy guarantees of causal models where the causal features and the relationships between them are not known a priori (Peters et al., 2017).

References

  • [1] Bayesian Network Repository. Cited by: §4.
  • Y. Bengio, T. Deleu, N. Rahaman, R. Ke, S. Lachapelle, O. Bilaniuk, A. Goyal, and C. Pal (2019) A meta-transfer objective for learning to disentangle causal mechanisms. arXiv preprint arXiv:1901.10912. Cited by: §2.1.
  • S. Boyd and L. Vandenberghe (2004) Convex optimization. Cambridge University Press. Cited by: §3.1.
  • N. Carlini, C. Liu, J. Kos, Ú. Erlingsson, and D. Song (2018) The secret sharer: measuring unintended neural network memorization & extracting secrets. arXiv preprint arXiv:1802.08232. Cited by: §3.2, §5.
  • A. Datta, S. Sen, and Y. Zick (2016) Algorithmic transparency via quantitative input influence: theory and experiments with learning systems. In Security and Privacy (SP), 2016 IEEE Symposium on, pp. 598–617. Cited by: §1, §5.
  • C. Dwork, A. Roth, et al. (2014) The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9 (3–4), pp. 211–407. Cited by: Appendix B, §3.1, §3.1, Definition 5.
  • A. Esteva, A. Robicquet, B. Ramsundar, V. Kuleshov, M. DePristo, K. Chou, C. Cui, G. Corrado, S. Thrun, and J. Dean (2019) A guide to deep learning in healthcare. Nature Medicine 25 (1), pp. 24. Cited by: §1.
  • T. Fischer and C. Krauss (2018) Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research 270 (2), pp. 654–669. Cited by: §1.
  • M. Fredrikson, S. Jha, and T. Ristenpart (2015) Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333. Cited by: Appendix D, §1.
  • J. Hamm, Y. Cao, and M. Belkin (2016) Learning privately from multiparty data. In International Conference on Machine Learning, pp. 555–563. Cited by: Appendix B.
  • J. Hayes, L. Melis, G. Danezis, and E. De Cristofaro (2017) LOGAN: membership inference attacks against generative models. arXiv preprint arXiv:1705.07663. Cited by: §1, §5.
  • M. J. Kusner, J. Loftus, C. Russell, and R. Silva (2017) Counterfactual fairness. In Advances in Neural Information Processing Systems, pp. 4066–4076. Cited by: §1, §5.
  • M. J. Kusner, Y. Sun, K. Sridharan, and K. Q. Weinberger (2015) Private causal inference. arXiv preprint arXiv:1512.05469. Cited by: §5.
  • Y. Mansour, M. Mohri, and A. Rostamizadeh (2009) Domain adaptation: learning bounds and algorithms. arXiv preprint arXiv:0902.3430. Cited by: §3.2, Definition 3.
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller (2013) Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602. Cited by: §1.
  • R. Nabi and I. Shpitser (2018) Fair inference on outcomes. In Proceedings of the… AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence, Vol. 2018, pp. 1931. Cited by: §1.
  • M. Nasr, R. Shokri, and A. Houmansadr (2018a) Comprehensive privacy analysis of deep learning: stand-alone and federated learning under passive and active white-box inference attacks. arXiv preprint arXiv:1812.00910. Cited by: §3.2.
  • M. Nasr, R. Shokri, and A. Houmansadr (2018b) Machine learning with membership privacy using adversarial regularization. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 634–646. Cited by: §1, §3.2, §5.
  • N. Papernot, M. Abadi, U. Erlingsson, I. Goodfellow, and K. Talwar (2017) Semi-supervised knowledge transfer for deep learning from private training data. In ICLR, Cited by: Appendix B, Appendix B, §3.1.
  • J. Pearl (2009) Causality. Cambridge University Press. Cited by: §A.1, §2.1, §2.1, §5.
  • J. Pellet and A. Elisseeff (2008) Using Markov blankets for causal structure learning. Journal of Machine Learning Research 9 (Jul), pp. 1295–1342. Cited by: §A.1, §A.2, §2.1.
  • J. Peters, P. Bühlmann, and N. Meinshausen (2016) Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78 (5), pp. 947–1012. Cited by: §1, §2.1, §2.
  • J. Peters, D. Janzing, and B. Schölkopf (2017) Elements of causal inference: foundations and learning algorithms. MIT Press. Cited by: §2.1, §5, §6.
  • M. A. Rahman, T. Rahman, R. Laganiere, N. Mohammed, and Y. Wang (2018) Membership inference attack against differentially private deep learning model. Transactions on Data Privacy. Cited by: §1, §5.
  • A. Salem, Y. Zhang, M. Humbert, M. Fritz, and M. Backes (2018) ML-Leaks: model and data independent membership inference attacks and defenses on machine learning models. arXiv preprint arXiv:1806.01246. Cited by: §1, §3.2, §4.
  • M. Scutari (2009) Learning Bayesian networks with the bnlearn R package. arXiv preprint arXiv:0908.3817. Cited by: §2.1.
  • S. Shalev-Shwartz and S. Ben-David (2014) Understanding machine learning: from theory to algorithms. Cambridge University Press. Cited by: item 1.
  • R. Shokri, M. Stronati, C. Song, and V. Shmatikov (2017) Membership inference attacks against machine learning models. In Security and Privacy (SP), 2017 IEEE Symposium on, pp. 3–18. Cited by: §1, §3.2, §5.
  • C. Song and V. Shmatikov (2018) The natural auditor: how to tell if someone used your words to train their model. arXiv preprint arXiv:1811.00513. Cited by: §1, §5.
  • [30] TensorFlow. Cited by: §4.
  • A. Tsantekidis, N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj, and A. Iosifidis (2017) Using deep learning to detect price change indications in financial markets. In 2017 25th European Signal Processing Conference (EUSIPCO), pp. 2511–2515. Cited by: §1.
  • S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha (2018) Privacy risk in machine learning: analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp. 268–282. Cited by: §1, §3.2, §3.2, §4.2, Definition 7, Lemma 2.

Appendix A Generalization Properties of Causal Model

A.1 Generalization over Different Distributions

See 1


For any function that was trained on , from Def. 2 we write:


where the last equation is due to Def. 1 of the in-distribution generalization error.

Let us denote the optimal loss-minimizing hypotheses over and as and .


Using the triangle inequality of the loss function, we can write:




Thus, combining Eqs. 15, 17, and 18, we obtain,


where the last inequality is due to the definition of discrepancy distance (Definition 3). Equation 19 divides the out-of-distribution generalization error of a hypothesis into four parts:

  1. denotes the in-distribution error of . This can be bounded by typical generalization bounds, such as the uniform error bound that depends only on the VC dimension and sample size of  (Shalev-Shwartz and Ben-David, 2014). Using a uniform error bound based on the VC dimension, we obtain, with probability at least ,


    Since , VC-dimension of causal models is not greater than that of associational models. Thus,

  2. denotes the distance between the two distributions. Given two distributions, the discrepancy distance does not depend on , but only on the hypothesis class .

  3. and measure the error due to the true labeling function being outside the hypothesis class .

  4. denotes the loss (or difference) between the loss-minimizing function over and the loss-minimizing hypothesis over .
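The second term above can be illustrated numerically. The sketch below estimates a discrepancy distance in the sense of Mansour et al. (2009), that is, the largest gap between the expected disagreement of any two hypotheses under the two distributions. The finite class of one-dimensional threshold classifiers, the 0-1 loss, the Gaussian samples, and the function names are all assumptions made for illustration and are not the setup used in this paper.

```python
import itertools

import numpy as np


def threshold(t):
    """A 1-D threshold classifier x -> 1[x > t]."""
    return lambda x: (x > t).astype(int)


def empirical_discrepancy(x_p, x_q, hypotheses):
    """max over pairs (h, h') of |disagreement on x_p - disagreement on x_q|."""
    preds_p = np.array([h(x_p) for h in hypotheses])
    preds_q = np.array([h(x_q) for h in hypotheses])
    best = 0.0
    for i, j in itertools.combinations(range(len(hypotheses)), 2):
        loss_p = np.mean(preds_p[i] != preds_p[j])  # 0-1 loss between h and h'
        loss_q = np.mean(preds_q[i] != preds_q[j])
        best = max(best, abs(loss_p - loss_q))
    return float(best)


# Hypothetical finite hypothesis class (assumption).
hypotheses = [threshold(t) for t in np.linspace(-3.0, 3.0, 25)]

rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, 2000)   # sample from the training distribution
x_same = rng.normal(0.0, 1.0, 2000)    # fresh sample, same distribution
x_shift = rng.normal(1.5, 1.0, 2000)   # sample from a shifted distribution

disc_same = empirical_discrepancy(x_train, x_same, hypotheses)
disc_shift = empirical_discrepancy(x_train, x_shift, hypotheses)

print(f"discrepancy, same distribution:    {disc_same:.3f}")
print(f"discrepancy, shifted distribution: {disc_shift:.3f}")
```

Note that `empirical_discrepancy` takes the hypothesis class but no trained model as input, which mirrors the remark that this term depends only on the two distributions and the class.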

We next show that for a given distribution and another distribution such that where is the Markov Blanket for , the last term of Equation 19 vanishes for a causal model but may remain non-zero for an associational model.

Causal Model.

Given a structural causal network, let us first construct a model using only variables in ’s Markov Blanket. By property of the structural causal network, includes all parents of and thus there are no backdoor paths. Using Rule 2 of the do-calculus from Pearl (2009):


where the last equality is since is also the Markov Blanket in , ( is independent of given in ), and thus includes all parents of . Since the conditional distribution of given features is the same across and , the loss-minimizing models should also be the same. Defining and , we obtain,


And therefore, the term .

Associational Model.

In contrast, an associational model may use a subset of , , that may not include all variables in the Markov Blanket, or may include the Markov Blanket but also include other extraneous variables. When at least one parent is excluded, the subset no longer corresponds to the causal do distribution.


In such a case, . Since , it is possible in some cases that an associational model includes all the causal features of . Therefore, .
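The contrast between the two kinds of model can be seen in a small simulation. The sketch below is an illustrative toy example under assumed linear-Gaussian mechanisms, not an experiment from this paper: the outcome depends only on a parent feature `x_c`, while a second feature `x_a` is merely correlated with `x_c`, and that correlation flips sign in the shifted distribution. The coefficients, sample sizes, and helper names are hypothetical.

```python
import numpy as np


def sample(n, sign, rng):
    """Draw n points; `sign` controls how the proxy feature tracks the parent."""
    x_c = rng.normal(size=n)                      # parent (causal feature) of y
    y = 2.0 * x_c + 0.5 * rng.normal(size=n)      # P(y | x_c) is the same everywhere
    x_a = sign * x_c + 0.3 * rng.normal(size=n)   # proxy of x_c, not a parent of y
    return x_c, x_a, y


def fit_slope(x, y):
    """Least-squares slope of y on one feature (no intercept; data are zero-mean)."""
    coef, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
    return float(coef[0])


def mse(slope, x, y):
    return float(np.mean((y - slope * x) ** 2))


rng = np.random.default_rng(0)
xc_tr, xa_tr, y_tr = sample(5000, +1.0, rng)   # training distribution
xc_te, xa_te, y_te = sample(5000, -1.0, rng)   # shifted test distribution

w_causal = fit_slope(xc_tr, y_tr)   # model built on the parent of y
w_assoc = fit_slope(xa_tr, y_tr)    # model built on the proxy only

causal_in, causal_out = mse(w_causal, xc_tr, y_tr), mse(w_causal, xc_te, y_te)
assoc_in, assoc_out = mse(w_assoc, xa_tr, y_tr), mse(w_assoc, xa_te, y_te)

print(f"causal model:        train MSE {causal_in:.3f}, shifted MSE {causal_out:.3f}")
print(f"associational model: train MSE {assoc_in:.3f}, shifted MSE {assoc_out:.3f}")
```

In this toy setting the model fitted on the parent keeps roughly the same loss on both distributions, whereas the model that excludes the parent and relies on the proxy degrades sharply once the proxy's relationship to the parent changes.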

Loss Bounds.

Further, since is the Markov Blanket and thus the set of strongly relevant features as well (Pellet and Elisseeff, 2008). From the literature we know that, , hence,


And similarly,


Hence, using Eqs. 23, 25, and 26, we write Equation 19 for a causal model as:


Using Eqns. 20 and 21, we obtain, with probability at least :


Similarly, for an associational model, we obtain, with probability at least :


Finally, using Definition 3 for discrepancy distance, we can state . Therefore, from Eqs. 27 and 31, we claim with probability ,


A.2 Generalization over a Single Datapoint

See 1


Using the triangle inequality for the loss function, we obtain,


Combining the two inequalities,


where , analogous to the for distributions.

For a causal model, we know that . In addition, since is also the strongly relevant feature set (Pellet and Elisseeff, 2008), and same for and loss on .



where since .

Next, we show that these bounds are tight. That is, there exists an associational model whose generalization error on is exactly the RHS of Eqn. 37 and thus higher than the bound for any causal model. We prove this below by constructing one such .

For simplicity in construction, let us select and . Thus whereas . We obtain the -generalization bound for a causal model,


Let us now construct an associational model, , such that:


A simple construction for the above equalities is to select and , and correspondingly and such that and . Further, can be selected such that and .

Then, using Eqn. 39,