Differential Privacy-enabled Federated Learning for Sensitive Health Data

Olivia Choudhury
IBM Research Cambridge
Cambridge, MA, USA

Aris Gkoulalas-Divanis
IBM Watson
Cambridge, MA, USA

Theodoros Salonidis
IBM T.J. Watson Research Center
Yorktown Heights, NY, USA

Issa Sylla
IBM Research Cambridge
Cambridge, MA, USA

Yoonyoung Park
IBM Research Cambridge
Cambridge, MA, USA

Grace Hsu
Massachusetts Institute of Technology
Cambridge, MA, USA

Amar Das
IBM Research Cambridge
Cambridge, MA, USA
Research performed during internship at IBM Research Cambridge

Leveraging real-world health data for machine learning tasks requires addressing many practical challenges, such as distributed data silos, privacy concerns with creating a centralized database from person-specific sensitive data, resource constraints for transferring and integrating data from multiple sites, and risk of a single point of failure. In this paper, we introduce a federated learning framework that can learn a global model from distributed health data held locally at different sites. The framework offers two levels of privacy protection. First, it does not move or share raw data across sites or with a centralized server during the model training process. Second, it uses a differential privacy mechanism to further protect the model from potential privacy attacks. We perform a comprehensive evaluation of our approach on two healthcare applications, using real-world electronic health data of 1 million patients. We demonstrate the feasibility and effectiveness of the federated learning framework in offering an elevated level of privacy and maintaining utility of the global model.

1 Introduction

Learning from real-world health data has proven effective in multiple healthcare applications, resulting in improved quality of care Gultepe et al. (2013); Johnson et al. (2016a); Chakrabarty et al. (2019), generating medical image diagnostic tools Erickson et al. (2017); Menze et al. (2014); Gibson et al. (2018), predicting disease risk factors Poplin et al. (2018); Zheng et al. (2017); Kourou et al. (2015), and analyzing genomic data for personalized medicine Libbrecht and Noble (2015); Choudhury et al. (2018); Beam and Kohane (2018). Healthcare data is often segregated across data silos, which limits its potential for insightful analytics. Data access is further restricted by regulatory policies such as the US Health Insurance Portability and Accountability Act (HIPAA) (https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html) and the EU General Data Protection Regulation (GDPR) Agencia Espanola Proteccion Datos (2019). Traditional, centralized machine learning algorithms require aggregating such distributed data into a central repository to train a model. However, this incurs practical challenges, such as regulatory restrictions on sharing patient-level sensitive data, the substantial resources required for transferring and aggregating the data, and the high risk of introducing a single point of failure. Leveraging such data while complying with data protection policies requires rethinking data analytics methods for healthcare applications.

Federated learning (FL) offers a new paradigm for training a global machine learning model from data distributed across multiple data silos, eliminating the need for raw data sharing McMahan et al. (2016). Once a global model is shared across sites, each site trains the model based on its local data. The parameter updates of the local models are subsequently sent to an aggregation server and incorporated into the global model. This is repeated until a convergence criterion of the global model is satisfied. The merit of FL has been recently demonstrated in several real-world applications, including image classification Wang et al. (2019) and language modeling McMahan et al. (2016). It is especially relevant in healthcare applications, where data is rife with personal, highly-sensitive information, and data analysis methods must comply with regulatory requirements Brisimi et al. (2018).

In this paper, we propose, implement, and evaluate an FL framework for analyzing distributed healthcare data. FL provides a first level of privacy protection by training a global model without sharing raw data among sites. However, in certain cases, FL may be vulnerable to inference attacks Bagdasaryan et al. (2018); Bonawitz et al. (2017); Geyer et al. (2017). Although state-of-the-art approaches to address such attacks are based on differential privacy Dwork et al. (2006a), their performance has not been investigated for healthcare applications. To this end, we extend the FL framework with a distributed differential privacy mechanism and investigate its performance.

We perform a comprehensive empirical evaluation using two real-world health datasets comprising electronic health records and administrative claims data of 1 million patients. We analyze the effect of different levels of privacy on the performance of the FL model and derive insights about the effectiveness of FL with and without differential privacy in healthcare applications. More specifically, we show that FL without differential privacy can provide model performance close to the hypothetical case where all data is centralized. We then study how ε-differential privacy affects the performance of the resulting global FL model for a given level of privacy. The results show that although differential privacy offers a strong level of privacy, it degrades the predictive capability of the resulting global models due to the excessive amount of noise added during the distributed FL training process.

2 Methods

2.1 Federated learning

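For the purpose of illustration, we consider classification algorithms that are amenable to gradient descent optimization. For a general binary classification problem, let the features, denoted by $x_i$ (for the $i$-th sample), be drawn from a feature space $\mathcal{X}$. The corresponding labels $y_i$ are drawn from the label space $\mathcal{Y} = \{-1, +1\}$. Let the features corresponding to positive labels be denoted by $\mathcal{X}^{+}$ and those corresponding to negative labels by $\mathcal{X}^{-}$, that is,

$$\mathcal{X}^{+} = \{x_i \in \mathcal{X} : y_i = +1\}, \qquad \mathcal{X}^{-} = \{x_i \in \mathcal{X} : y_i = -1\}.$$

For any $x^{+} \in \mathcal{X}^{+}$ and $x^{-} \in \mathcal{X}^{-}$, the objective of this classification is to construct a function $f: \mathcal{X} \rightarrow \mathcal{Y}$ such that

$$f(x^{+}) = +1 \quad \text{and} \quad f(x^{-}) = -1.$$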

As described in Section 3.1, for our use cases of predicting adverse drug reactions (ADRs) and in-hospital mortality, we denote cases of ADR and in-hospital mortality by the label $y = +1$, and cases of non-ADR and non-mortality by $y = -1$. For an FL setup with $K$ sites, let $\mathcal{D}_k = (\mathcal{X}_k, \mathcal{Y}_k)$, with feature set $\mathcal{X}_k$ and corresponding label set $\mathcal{Y}_k$, be the local training data at site $k$, where $k \in \{1, \dots, K\}$. Based on the use case, a global model is shared with each site $k$, which trains the model on its local data $\mathcal{D}_k$. During local model training, using the given learning rate, number of epochs, and batch size, each site computes the average gradient $g_k$ of its local loss with respect to its current model parameters $w_k$. We then compute the weighted average $w = \sum_{k=1}^{K} \frac{n_k}{n} w_k$, where $n_k = |\mathcal{D}_k|$ and $n = \sum_{k=1}^{K} n_k$, to aggregate the parameter updates from the local models. The process is repeated until a convergence criterion, such as minimization of the loss function, is satisfied. For further details on implementing an FL model for healthcare applications, we refer the reader to Choudhury et al. (2019).
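The training loop described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes a logistic loss, full-batch gradient descent at each site, labels in {-1, +1}, and aggregation weights n_k / n given by local sample counts. The function names (`local_update`, `federated_average`, `federated_training`) and the toy `sites` data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def local_update(w, data, lr=0.5, epochs=5):
    """Full-batch gradient descent on the average logistic loss at one site.

    `data` is a list of (x, y) pairs with y in {-1, +1}.
    """
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient of log(1 + exp(-margin)) w.r.t. w is -y * sigmoid(-margin) * x.
            coef = -y * sigmoid(-margin) / n
            for j, xj in enumerate(x):
                grad[j] += coef * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def federated_average(local_weights, sizes):
    """Weighted average sum_k (n_k / n) * w_k of the local parameter vectors."""
    total = sum(sizes)
    dim = len(local_weights[0])
    return [sum(n_k * w_k[j] for w_k, n_k in zip(local_weights, sizes)) / total
            for j in range(dim)]


def federated_training(sites, dim, rounds=10):
    """Each round: share global w, train locally at every site, aggregate."""
    w = [0.0] * dim
    for _ in range(rounds):
        local = [local_update(w, data) for data in sites]
        w = federated_average(local, [len(data) for data in sites])
    return w


# Hypothetical toy data split across two sites; raw records never leave a site.
sites = [
    [([1.0, 0.5], 1), ([-1.0, 0.2], -1)],
    [([0.8, -0.3], 1), ([-0.9, -0.4], -1), ([0.7, 0.1], 1)],
]
w = federated_training(sites, dim=2)
print(w)
```

Only the parameter vectors returned by `local_update` are passed to `federated_average`, which mirrors the property that raw data is not shared with the aggregation server.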

2.2 Differential Privacy

Differential privacy Dwork et al. (2006b); Dwork (2011); Dwork et al. (2014) is a widely used standard for the privacy guarantees of algorithms operating on aggregated data. A randomized algorithm $\mathcal{A}$ satisfies $\epsilon$-differential privacy if, for all datasets $D$ and $D'$ that differ by a single record, and for all sets $S \subseteq \mathcal{R}$, where $\mathcal{R}$ is the range of $\mathcal{A}$,

$$\Pr[\mathcal{A}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{A}(D') \in S],$$

where $\epsilon$, a privacy parameter, is a non-negative number; smaller values of $\epsilon$ correspond to stronger privacy. This implies that any single record in the dataset does not have a significant impact on the output distribution of the algorithm.
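To make the definition concrete, the sketch below uses the Laplace mechanism of Dwork et al. (2006b), a standard output-perturbation method and not the objective-perturbation approach used in this paper. It releases a count with Laplace noise of scale sensitivity / epsilon and numerically checks that, for two neighboring datasets whose counts differ by one record, the ratio of output densities never exceeds e^epsilon. The function names and the example count of 42 are hypothetical.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value plus Laplace(0, sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def laplace_density(x, mu, scale):
    return math.exp(-abs(x - mu) / scale) / (2.0 * scale)


def worst_case_ratio(count, sensitivity, epsilon):
    """Largest density ratio between neighboring outputs over a grid of releases.

    For each output x the ratio is exp((|x - count - s| - |x - count|) / scale),
    which is at most exp(s / scale) = exp(epsilon).
    """
    scale = sensitivity / epsilon
    grid = [count + 0.1 * k for k in range(-200, 201)]
    return max(laplace_density(x, count, scale)
               / laplace_density(x, count + sensitivity, scale)
               for x in grid)


rng = random.Random(0)
print(laplace_mechanism(42, 1.0, 0.5, rng))
print(worst_case_ratio(42, 1.0, 0.5), math.exp(0.5))
```

The bound is tight: for released values at or below the true count, the density ratio equals e^epsilon exactly.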

There are several methods for generating an approximation of $f$ that satisfies differential privacy. Based on how noise is added, they can be categorized into input perturbation, output perturbation, the exponential mechanism, and objective perturbation Sarwate and Chaudhuri (2013). Existing work on applying differential privacy to machine learning has primarily focused on models trained on a centralized dataset Abadi et al. (2016); Ji et al. (2014); Sarwate and Chaudhuri (2013). As a recent advancement in privacy-preserving FL, Geyer et al. (2017) adopted output perturbation, also known as the sensitivity method. However, prior research has shown, with theoretical guarantees, that objective perturbation outperforms output perturbation Chaudhuri et al. (2011). Hence, in this work, we explore the potential of differential privacy based on objective perturbation in the context of FL. For the task of classification, we add noise to the objective function of the optimization to obtain a differentially private approximation: at each site, noise is added to the objective function of the local model, and the site uses the minimizer of the perturbed objective. For a comprehensive background on differential privacy with objective perturbation, we refer the reader to Chaudhuri and Monteleoni (2009); Chaudhuri et al. (2011).
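A minimal sketch of objective perturbation for L2-regularized logistic regression, following our reading of Algorithm 2 in Chaudhuri et al. (2011), is shown below. It is an illustration under stated assumptions rather than the implementation used in this paper's experiments: it assumes every feature vector has L2 norm at most 1, labels in {-1, +1}, a loss-curvature bound c = 1/4 (valid for the logistic loss), and plain gradient descent as the solver. The privacy guarantee in Chaudhuri et al. (2011) is stated for the exact minimizer, so an approximate solver only approximates it. All function names and default hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def perturbation_params(epsilon, n, lam, c=0.25):
    """Return (eps_prime, delta); c bounds the loss's second derivative."""
    eps_prime = epsilon - math.log(1 + 2 * c / (n * lam) + c ** 2 / (n * lam) ** 2)
    if eps_prime > 0:
        return eps_prime, 0.0
    # Small-epsilon branch: add extra regularization delta and halve epsilon.
    delta = c / (n * (math.exp(epsilon / 4) - 1)) - lam
    return epsilon / 2, delta


def sample_noise(dim, eps_prime, rng):
    """Draw b with density proportional to exp(-(eps_prime / 2) * ||b||)."""
    # The norm is Gamma(shape=dim, scale=2/eps_prime); the direction is uniform.
    norm = rng.gammavariate(dim, 2.0 / eps_prime)
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    length = math.sqrt(sum(v * v for v in direction))
    return [norm * v / length for v in direction]


def private_logistic_regression(X, y, epsilon, lam=0.1, steps=500, lr=0.1, seed=0):
    """Approximately minimize the perturbed objective
    (1/n) sum log(1 + exp(-y_i w.x_i)) + (lam/2)||w||^2 + (1/n) b.w + (delta/2)||w||^2.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    eps_prime, delta = perturbation_params(epsilon, n, lam)
    b = sample_noise(d, eps_prime, rng)
    w = [0.0] * d
    for _ in range(steps):
        # Gradient of the regularizers and the linear noise term.
        grad = [(lam + delta) * wj + bj / n for wj, bj in zip(w, b)]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            coef = -yi * sigmoid(-margin) / n
            for j in range(d):
                grad[j] += coef * xi[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


X = [[0.5, 0.1], [0.6, -0.2], [-0.5, 0.2], [-0.7, -0.1]]
y = [1, 1, -1, -1]
print(private_logistic_regression(X, y, epsilon=1.0))
```

In an FL setting, each site would run such a routine on its own data, so the noise vector b is drawn locally and never depends on other sites' records.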

3 Evaluation

3.1 Use cases and data preparation

Developing FL models in a privacy-preserving manner is especially important in healthcare, where patient data are extremely sensitive. To evaluate our proposed approach, we consider two major tasks for improving patient health outcomes: (a) prediction of adverse drug reactions (ADRs), and (b) prediction of in-hospital mortality. ADR is a major concern for medical practitioners, the pharmaceutical industry, and the healthcare system (https://www.fda.gov/drugs/informationondrugs/ucm135151.htm). Because healthcare data is distributed across data silos, obtaining a dataset large enough to detect such rare events poses a challenge for centralized learning models. For ADR prediction, we used the Limited MarketScan Explorys Claims-EMR Data (LCED), which comprises administrative claims and electronic health records (EHRs) of over 1 million commercially insured patients. It consists of patient-level sensitive features, such as demographics, habits, diagnosis codes, outpatient prescription fills, laboratory results, and inpatient admission records. We selected patients who received a nonsteroidal anti-inflammatory drug (NSAID) to predict the development of peptic ulcer disease following the initiation of the drug. The selected cohort comprised samples.

For the second use case, we considered the task of modeling in-hospital patient mortality. An accurate and timely prediction of this outcome, particularly for patients admitted to an intensive care unit (ICU), can significantly improve quality of care. For this task, we used the Medical Information Mart for Intensive Care (MIMIC-III) data Johnson et al. (2016b). MIMIC-III is a benchmark dataset, from which we derived multivariate time series from over ICU stays and labels to model mortality rate. As discussed in Harutyunyan et al. (2019), we selected physiological variables, including demographic details, each comprising different sample statistic features on different subsequences of a given time series, resulting in features per time series. The cohort consisted of ICU stays.

3.2 Experimental setup

For the tasks of ADR and mortality prediction, we used three classification algorithms that are amenable to a distributed solution using gradient descent, namely perceptron, support vector machine (SVM), and logistic regression. To evaluate the models before and after applying the privacy-preserving mechanism, we measured their utility in terms of F1 score. The models were trained on 70% of the data with 5-fold cross-validation. We considered 10 sites for the federated setup. All experiments were executed on an Intel(R) Xeon(R) E5-2683 v4 2.10 GHz CPU equipped with 16 cores and 64 GB of RAM. The results are reported after 10 rounds of iteration.
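For concreteness, the per-example (sub)gradients that a gradient-descent solver could use for the three classifiers, together with the F1 score used as the utility measure, can be written as below. This is a hedged sketch: the exact loss formulations, the SVM regularization weight `lam`, and all function names are assumptions rather than details reported in this paper.

```python
import math


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def perceptron_grad(w, x, y):
    """Subgradient of the perceptron loss max(0, -y * w.x)."""
    if y * dot(w, x) <= 0:
        return [-y * xj for xj in x]
    return [0.0] * len(x)


def svm_grad(w, x, y, lam=0.01):
    """Subgradient of the hinge loss max(0, 1 - y * w.x) + (lam / 2) * ||w||^2."""
    grad = [lam * wj for wj in w]
    if y * dot(w, x) < 1:
        grad = [gj - y * xj for gj, xj in zip(grad, x)]
    return grad


def logistic_grad(w, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * w.x))."""
    s = sigmoid(-y * dot(w, x))
    return [-y * s * xj for xj in x]


def sgd_epoch(w, data, grad_fn, lr=0.1):
    """One pass of stochastic gradient descent with any of the three losses."""
    for x, y in data:
        w = [wj - lr * gj for wj, gj in zip(w, grad_fn(w, x, y))]
    return w


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because all three classifiers differ only in `grad_fn`, the same local-training and aggregation loop can be reused for each of them.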

3.3 Comparative analysis

To establish benchmark results, we first evaluated the performance of the centralized learning models for the tasks of predicting ADR and in-hospital mortality. We then analyzed the performance of FL models trained on distributed data. For ε-differential privacy, we measured the privacy-utility trade-off over a range of values of the privacy parameter ε. Figure 1 (a) and (b) present the utility, measured by F1 score, when differential privacy is applied using typical values of ε, for LCED and MIMIC data, respectively. As ε increases, the level of privacy decreases, which in turn improves the utility, or predictive capability, of the models. This trend is consistent across all three classification algorithms and both datasets.
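One way to see why utility improves as ε grows is to look at the size of the objective-perturbation noise relative to the data term. The sketch below assumes the parameter choice of Chaudhuri et al. (2011) and a hypothetical site with 10,000 records, 20 features, and regularization weight 0.01; it computes the expected norm of the noise vector b scaled by 1/n, the factor with which b enters the perturbed objective. In the regime where no extra regularization is needed (delta = 0), this scale is inversely proportional to the adjusted parameter eps_prime. The function name and the site parameters are hypothetical.

```python
import math


def perturbation_summary(epsilon, n, dim, lam, c=0.25):
    """Return (E||b|| / n, delta) for objective perturbation.

    ||b|| ~ Gamma(shape=dim, scale=2/eps_prime), so E||b|| = 2 * dim / eps_prime.
    """
    eps_prime = epsilon - math.log(1 + 2 * c / (n * lam) + c ** 2 / (n * lam) ** 2)
    delta = 0.0
    if eps_prime <= 0:
        # Small-epsilon branch: extra regularization is added and epsilon is halved,
        # so the noise scale alone no longer tells the whole story.
        delta = c / (n * (math.exp(epsilon / 4) - 1)) - lam
        eps_prime = epsilon / 2
    return 2.0 * dim / (eps_prime * n), delta


# Hypothetical site: 10,000 records, 20 features, lambda = 0.01.
for eps in (0.1, 0.5, 1.0, 5.0, 10.0):
    scale, delta = perturbation_summary(eps, n=10_000, dim=20, lam=0.01)
    print(f"epsilon={eps:>5}: E||b||/n = {scale:.5f}, delta = {delta}")
```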

Figure 1: Effect of varying ε in ε-differential privacy for (a) ADR prediction using LCED, and (b) in-hospital mortality prediction using MIMIC data.
Figure 2: Comparison of F1 score with (a) LCED, and (b) MIMIC data, between centralized learning, FL, and FL with ε-differential privacy (Federated with DP), where .

We also compare the performance of centralized learning, FL, and FL with ε-differential privacy in terms of utility, for . This indicates the level of utility that can be attained for an acceptable range of the privacy parameter ε. As shown in Figure 2 (a) and (b), FL achieves performance comparable to centralized learning for both ADR and mortality prediction. Although differential privacy guarantees a given level of privacy, as set by the parameter ε, it leads to a significant deterioration in the utility of the federated model.

It must be noted that existing studies advocating the use of differential privacy in an FL setup have not been performed on real data. Moreover, as discussed in Geyer et al. (2017), model performance can only be preserved for a very large number of sites, on the order of 1000, and degrades severely with fewer sites. Such a large-scale setup is not realistic for healthcare applications, where sites are typically hospitals or providers, and each site may not have sufficient data for deep learning models to be applicable. This drives the need to leverage data from other sites to construct more accurate models. Hence, although differential privacy is widely adopted for preserving the privacy of machine learning models, it can yield lower utility in an FL setup, particularly in real-world healthcare applications. This calls for exploring alternative privacy-preserving approaches that can achieve high model performance while offering sufficient privacy.
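The dependence on the number of sites can be illustrated with a simplified client-level mechanism in the spirit of Geyer et al. (2017): each site's update is clipped to L2 norm at most S, the clipped updates are summed, Gaussian noise with standard deviation sigma * S is added to the sum, and the result is divided by the number of sites m. For a fixed noise multiplier sigma, the noise left in the averaged update has standard deviation sigma * S / m, so it is 100 times smaller with 1000 sites than with 10. This sketch omits privacy accounting (how sigma must be chosen for a target ε) and is not the exact algorithm of Geyer et al. (2017); the function names are hypothetical.

```python
import math
import random


def clip(update, bound):
    """Scale an update so that its L2 norm is at most `bound`."""
    norm = math.sqrt(sum(u * u for u in update))
    factor = max(1.0, norm / bound)
    return [u / factor for u in update]


def noisy_average(updates, bound, sigma, rng):
    """Average of clipped site updates, with N(0, (sigma * bound)^2) noise on the sum."""
    m = len(updates)
    total = [0.0] * len(updates[0])
    for update in updates:
        for j, value in enumerate(clip(update, bound)):
            total[j] += value
    return [(t + rng.gauss(0.0, sigma * bound)) / m for t in total]


def noise_std_of_average(sigma, bound, num_sites):
    """Per-coordinate standard deviation of the noise in the averaged update."""
    return sigma * bound / num_sites


for m in (10, 100, 1000):
    print(m, noise_std_of_average(sigma=1.0, bound=1.0, num_sites=m))
```

With only about 10 hospitals, the same per-round noise is spread over far fewer contributions, which is consistent with the utility loss discussed above.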

4 Conclusion

The availability of electronic health data offers many opportunities to leverage machine learning for insightful data analytics. In this paper, we implemented a federated learning approach to mitigate the challenges associated with centralized learning. We further explored the potential of differential privacy to preserve the privacy of federated learning models while maintaining high model performance in real-world health applications. Through experimental evaluation, we show that although differential privacy is being readily adopted in federated setups, it can lead to a significant loss in model performance for healthcare applications. This calls for research into alternative approaches for providing privacy in federated learning for healthcare applications.


References

  • M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang (2016) Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318.
  • Agencia Espanola Proteccion Datos (2019) K-Anonymity as a Privacy Measure. Note: https://www.aepd.es/media/notas-tecnicas/nota-tecnica-kanonimidad-en.pdf
  • E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov (2018) How to backdoor federated learning. arXiv preprint arXiv:1807.00459.
  • A. L. Beam and I. S. Kohane (2018) Big data and machine learning in health care. JAMA 319 (13), pp. 1317–1318.
  • K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth (2017) Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191.
  • T. S. Brisimi, R. Chen, T. Mela, A. Olshevsky, I. C. Paschalidis, and W. Shi (2018) Federated learning of predictive models from federated Electronic Health Records. International Journal of Medical Informatics 112, pp. 59–67.
  • A. Chakrabarty, S. Zavitsanou, T. Sowrirajan, F. J. Doyle III, and E. Dassau (2019) Getting IoT-ready: The face of next generation artificial pancreas systems. In The Artificial Pancreas, pp. 29–57.
  • K. Chaudhuri, C. Monteleoni, and A. D. Sarwate (2011) Differentially private empirical risk minimization. Journal of Machine Learning Research 12 (Mar), pp. 1069–1109.
  • K. Chaudhuri and C. Monteleoni (2009) Privacy-preserving logistic regression. In Advances in Neural Information Processing Systems, pp. 289–296.
  • O. Choudhury, A. Chakrabarty, and S. J. Emrich (2018) Hecil: A hybrid error correction algorithm for long reads with iterative learning. Scientific Reports 8 (1), pp. 9936.
  • O. Choudhury, Y. Park, T. Salonidis, A. Gkoulalas-Divanis, I. Sylla, and A. Das (2019) Predicting Adverse Drug Reactions on Distributed Health Data using Federated Learning. American Medical Informatics Association (AMIA).
  • C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor (2006a) Our data, ourselves: Privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 486–503.
  • C. Dwork, F. McSherry, K. Nissim, and A. Smith (2006b) Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pp. 265–284.
  • C. Dwork, A. Roth, et al. (2014) The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9 (3–4), pp. 211–407.
  • C. Dwork (2011) A firm foundation for private data analysis. Communications of the ACM 54 (1), pp. 86–95.
  • B. J. Erickson, P. Korfiatis, Z. Akkus, and T. L. Kline (2017) Machine learning for medical imaging. Radiographics 37 (2), pp. 505–515.
  • R. C. Geyer, T. Klein, and M. Nabi (2017) Differentially private federated learning: A client level perspective. arXiv preprint arXiv:1712.07557.
  • E. Gibson, W. Li, C. Sudre, L. Fidon, D. I. Shakir, G. Wang, Z. Eaton-Rosen, R. Gray, T. Doel, Y. Hu, et al. (2018) NiftyNet: a deep-learning platform for medical imaging. Computer Methods and Programs in Biomedicine 158, pp. 113–122.
  • E. Gultepe, J. P. Green, H. Nguyen, J. Adams, T. Albertson, and I. Tagkopoulos (2013) From vital signs to clinical outcomes for patients with sepsis: a machine learning basis for a clinical decision support system. Journal of the American Medical Informatics Association 21 (2), pp. 315–325.
  • H. Harutyunyan, H. Khachatrian, D. C. Kale, G. Ver Steeg, and A. Galstyan (2019) Multitask learning and benchmarking with clinical time series data. Scientific Data 6 (1), pp. 96.
  • Z. Ji, Z. C. Lipton, and C. Elkan (2014) Differential privacy and machine learning: a survey and review. arXiv preprint arXiv:1412.7584.
  • A. E. Johnson, M. M. Ghassemi, S. Nemati, K. E. Niehaus, D. A. Clifton, and G. D. Clifford (2016a) Machine learning and decision support in critical care. Proceedings of the IEEE. Institute of Electrical and Electronics Engineers 104 (2), pp. 444.
  • A. E. Johnson, T. J. Pollard, L. Shen, H. L. Li-wei, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi, and R. G. Mark (2016b) MIMIC-III, a freely accessible critical care database. Scientific Data 3, pp. 160035.
  • K. Kourou, T. P. Exarchos, K. P. Exarchos, M. V. Karamouzis, and D. I. Fotiadis (2015) Machine learning applications in cancer prognosis and prediction. Computational and Structural Biotechnology Journal 13, pp. 8–17.
  • M. W. Libbrecht and W. S. Noble (2015) Machine learning applications in genetics and genomics. Nature Reviews Genetics 16 (6), pp. 321.
  • H. B. McMahan, E. Moore, D. Ramage, S. Hampson, et al. (2016) Communication-efficient learning of deep networks from decentralized data. arXiv preprint arXiv:1602.05629.
  • B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest, et al. (2014) The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging 34 (10), pp. 1993–2024.
  • R. Poplin, A. V. Varadarajan, K. Blumer, Y. Liu, M. V. McConnell, G. S. Corrado, L. Peng, and D. R. Webster (2018) Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nature Biomedical Engineering 2 (3), pp. 158.
  • A. D. Sarwate and K. Chaudhuri (2013) Signal processing and machine learning with differential privacy: Algorithms and challenges for continuous data. IEEE Signal Processing Magazine 30 (5), pp. 86–94.
  • S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan (2019) Adaptive federated learning in resource constrained edge computing systems. IEEE Journal on Selected Areas in Communications 37 (6), pp. 1205–1221.
  • T. Zheng, W. Xie, L. Xu, X. He, Y. Zhang, M. You, G. Yang, and Y. Chen (2017) A machine learning-based framework to identify type 2 diabetes through electronic health records. International Journal of Medical Informatics 97, pp. 120–127.