Reliability Check via Weight Similarity in Privacy-Preserving Multi-Party Machine Learning

Abstract

Multi-party machine learning is a paradigm in which multiple participants collaboratively train a machine learning model to achieve a common learning objective without sharing their privately owned data. The paradigm has recently received a lot of attention from the research community aimed at addressing its associated privacy concerns. In this work, we focus on addressing the concerns of data privacy, model privacy, and data quality associated with privacy-preserving multi-party machine learning, i.e., we present a scheme for privacy-preserving collaborative learning that checks the participants’ data quality while guaranteeing data and model privacy. In particular, we propose a novel metric called weight similarity that is securely computed and used to check whether a participant can be categorized as a reliable participant (holds good quality data) or not. The problems of model and data privacy are tackled by integrating homomorphic encryption in our scheme and uploading encrypted weights, which prevent leakages to the server and malicious participants, respectively. The analytical and experimental evaluations of our scheme demonstrate that it is accurate and ensures data and model privacy.

Keywords: privacy preservation, machine learning, stochastic gradient descent, multi-party learning.

1 Introduction

Recently, machine learning techniques have achieved tremendous success in many applications (e.g., cancer detection [7, 18], face recognition [5, 9], speech recognition [21, 1], playing the game of Go [29], etc.), with their performance equaling or surpassing that of humans. Central to this success is the demand for large amounts of data for training the models. However, because of legal issues, competitive advantage, and privacy concerns, collecting data from multiple sources is a challenging task. For one, centrally gathered data might be permanently stored and used without the data owner’s knowledge, which is contrary, for example, to an EU regulation [17] that grants data owners the right to ask companies to permanently delete their data. As a result, data-holding institutions are reluctant to share their data, which greatly hinders the success of machine learning, as models get trained on limited amounts of data.

The recent advances in multi-party machine learning [28, 26, 2, 14, 34, 27] have shown promising potential in addressing the data scarcity challenge. In multi-party machine learning, multiple participants collaboratively train a common model while keeping their data private. The participants share the same model architecture and a common learning objective. The general training convention is that, first, global parameters are initialized and stored by a server. Each participant then downloads the global parameters, trains its local model based on its private data, and uploads the intermediate gradients or weights to the server for updating the global parameters. This is repeated until the common learning objective is achieved. Although the paradigm prevents an attacker from having direct access to participants’ data, there are still several ways it can be compromised to:

  1. Reveal a participant’s training data, e.g., through an attacker using a generative adversarial network to deceive the participant into revealing detailed information during the training process [16], through the gradients uploaded during the collaborative training process [26], or by reverse-engineering the participant’s local model [25],

  2. Reveal the trained model, which can lead to the recovery of information about the training data through model inversion attacks [33, 11], and

  3. Reduce the effectiveness and the training efficiency through injection of poor quality data during the training process.

In the literature, the training data and model leakage problems are mainly addressed through differential privacy and homomorphic encryption. Specifically, with differential privacy, leakage is prevented by adding noise to the data items. A parameter referred to as the privacy budget is used to control the amount of the added noise and the achieved privacy level. In [28], a differential privacy technique is used to add noise to the uploaded gradients during collaborative learning. To prevent privacy leakages due to high privacy budget consumption, Gong et al. [14] proposed a dynamic allocation of privacy budgets during multi-party learning. With homomorphic encryption, parameters are encrypted, and the algebraic operations for parameter updates are performed in an encrypted form. Additive homomorphic encryption algorithms are used to prevent information leakage to the central server in [26, 2]. Besides differential privacy and homomorphic encryption, Zhang et al. [32] utilized the threshold secret sharing technique during global parameter updates in collaborative learning, i.e., the global parameter updates are only effected when the number of uploaded intermediate parameters reaches a certain threshold. In [27], a symmetric encryption technique is used to prevent information leakage to the central server during multi-party training. Here, the central server simply acts as a relay point that conveys parameters to the next participant, which continues the training process using its local dataset. The same work proposed uploading intermediate weights instead of gradients and proved that, unlike gradients, weights reveal no information about the training data.

The performance challenge attributed to poor data quality in privacy-preserving collaborative learning has remained an open problem, and to the best of our knowledge, the only work that has attempted to address it was conducted by Zhao et al. in [34]. In their work, a participant is categorized as a reliable participant (RP) or as an unreliable participant (UP). RPs hold good quality and similar data while UPs, which are assumed to be few in number, hold poor quality data. During the learning process, intermediate parameters are uploaded by the participants, and the server generates a utility score for each participant using a common validation dataset. The utility score shows the accuracy of each participant’s parameters, and it is used to determine the participants whose parameters are included during the global parameter update. Although this approach is demonstrated to be effective in minimizing the influence of UPs, it has some limitations. First, the semi-honest server used in the work knows the validation dataset that is utilized to compute the utility scores for the participants and that can provide a clue about the general training dataset. Second, the semi-honest server is not prevented from accessing the trained model, which can lead to the recovery of information regarding the training data used to train the model. In a public setting, e.g., a cloud setting, these exposures can be detrimental. Thus, fixing these limitations through a novel scheme is a necessity.

Motivated by the preceding observations, we design a privacy-preserving multi-party machine learning scheme with the following features: (1) the proposed scheme leaks no private information to an honest-but-curious server or to any participant (RP or UP) involved in the learning process; (2) the proposed scheme leaks no information about the trained model to the honest-but-curious server; and (3) the proposed scheme minimizes the disruptions caused by UPs during the collaborative learning process. To defend against the honest-but-curious server and the participants while minimizing the effects of poor quality data from UPs, we utilize two main techniques. First, we employ the additively homomorphic Paillier algorithm [24], which allows algebraic addition and subtraction operations to be correctly performed on ciphertexts. This enables the server to update the global parameters using the ciphertexts uploaded by the participants. Second, we propose a metric called the weight similarity (more on this in Section 4.1). In order to compute the weight similarity scores, we introduce an additional entity in our scheme called the model initiator. The model initiator is simply an RP who initiates the collaborative learning process. Similarity scores are computed between the model initiator’s parameters and the other participants’ parameters. Depending on whether the score is above a threshold, a participant’s parameters might be included in or excluded from the global parameter updates. This way, the disruptive influence of the UPs can be minimized during multi-party machine learning. Finally, to defend against information leakage to malicious participants, each participant uploads encrypted intermediate weights instead of gradients in the proposed scheme; weights leak no information about the training data, as Phong and Phuong proved in [27]. We summarize our contributions as follows.

  • To the best of our knowledge, we are the first to investigate the similarities between intermediate weights uploaded by participants in multi-party machine learning.

  • We design a novel privacy-preserving multi-party machine learning scheme that integrates homomorphic encryption and weight similarity scores to prevent leakages to the central server and the participants while minimizing the disruptive influence of UPs during the collaborative learning process.

  • We evaluate the performance of our proposed scheme on real-world datasets. The results demonstrate that our proposed scheme guarantees privacy and achieves high accuracy while being robust to UPs.

The rest of the paper is organized as follows: Section 2 presents the other related works; Section 3 discusses the preliminary concepts used in the work; Section 4 presents our proposed multi-party machine learning system with reliability check, including the discussion of our proposed weight similarity metric; Section 5 presents the experiments; and Section 6 concludes the work.

2 Other Related Works

In [13], Gilad-Bachrach et al. proposed a system referred to as CryptoNets for performing predictions on encrypted data. To make a prediction on a data item, the homomorphically encrypted data item is fed to a model that is already trained. The fed data goes through the feed-forward process of machine learning and the prediction results are returned in an encrypted form. In [22], Ma et al. presented a scheme for predictions on encrypted data using non-interactive neural networks. In their scheme, an already-trained model is split into two parts and each part is given to a server. To perform a prediction on an encrypted data item, the data item is also split into two parts and each server receives a part. The two servers interact to generate a prediction for the fed data item. Our work differs from [13, 22] in that we aim to securely and accurately train the weights used to perform predictions, whereas in [13, 22] the weights are already trained and are simply used to perform predictions on encrypted data items.

Cao et al. [6] presented a scheme for synchronous and parallel privacy-preserving collaborative learning in which the participants only send their local cost values to the server during global parameter updates. The server utilizes the received cost values to identify the participant with the best local model at a training round and requests its parameters. The server then updates the global parameters using the requested parameters. The authors claim their scheme prevents information leakage. However, their work does not exploit the full benefits of collaborative learning, since it only depends on the best performer at each training round.

An integration of homomorphic encryption with proxy re-encryption for privacy-preserving multi-party learning is presented in [31]. In this work, each participant has a unique key, and every parameter encrypted by the participants during collaborative learning has to be transformed using a proxy key for updating the global parameters. Their system requires an additional server and incurs high communication overhead. Moreover, data quality is not considered in this work.

Bonawitz et al. [3] presented a scheme that securely aggregates data using a secret sharing scheme for privacy-preserving machine learning. However, their scheme incurs high communication overhead. Other related works such as [12, 30] have employed differential privacy to hide statistical information during multi-party training. However, UPs are not considered in their designs.

3 Preliminaries

3.1 Homomorphic Encryption (HE)

HE is a form of encryption that allows algebraic operations to be correctly performed on ciphertexts, with the result remaining in an encrypted form [14]. Several HE schemes have been proposed [10]; in this work, we adopt the Paillier scheme [24], an additive HE scheme that has been proved secure and has been widely used in privacy-preserving multi-party machine learning. We summarize its properties as follows:

The Paillier scheme comprises three (3) algorithms: the key generation, encryption, and decryption algorithms. The key generation algorithm generates the public and private keys. The public key is generated as $(N, g)$, where $N = pq$, $p, q$ are two large primes with $\gcd(pq, (p-1)(q-1)) = 1$, and $g \in \mathbb{Z}^*_{N^2}$. Meanwhile, the private key is generated as $(\lambda, \mu)$, where $\lambda = \mathrm{lcm}(p-1, q-1)$ and $\mu = \big(L(g^{\lambda} \bmod N^2)\big)^{-1} \bmod N$, with $L(x) = (x-1)/N$.

The encryption algorithm is used to generate a ciphertext $c$. For a message $m \in \mathbb{Z}_N$, $c$ is produced as $c = g^m \cdot r^N \bmod N^2$, where $r \in \mathbb{Z}^*_N$ is a random value.

The decryption algorithm recovers $m$ from $c$. Given $(\lambda, \mu)$, $m$ can be recovered as $m = L(c^{\lambda} \bmod N^2) \cdot \mu \bmod N$. The details can be viewed in [24].

The Paillier scheme supports unlimited homomorphic addition operations and limited multiplication operations (multiplication by unencrypted constants). Thus, an addition operation on two encrypted messages $m_1$ and $m_2$ results in an encrypted sum of the two messages, i.e.,

$$E(m_1) \cdot E(m_2) = E(m_1 + m_2), \qquad (1)$$

where $E(\cdot)$ represents an encryption operation. Also, an encrypted message $E(m)$ raised to the power of a constant $k$ produces an encrypted product of $k$ and $m$, i.e.,

$$E(m)^{k} = E(k \cdot m), \qquad (2)$$

where $k$ is an unencrypted constant.
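To make these properties concrete, the following minimal sketch exercises them with the python-paillier library [8], which we also use in our experiments (Section 5); the particular messages and key length are illustrative only.

```python
from phe import paillier  # python-paillier library [8]

# Illustrative key length; our experiments also use 1024-bit keys.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

m1, m2 = 0.25, -0.75
c1, c2 = public_key.encrypt(m1), public_key.encrypt(m2)

# Property (1): combining two ciphertexts decrypts to the sum of the plaintexts.
assert abs(private_key.decrypt(c1 + c2) - (m1 + m2)) < 1e-9

# Property (2): scaling a ciphertext by an unencrypted constant k decrypts to k * m1.
k = 3
assert abs(private_key.decrypt(c1 * k) - k * m1) < 1e-9
```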

3.2 Machine Learning

This section presents a brief review of the machine learning algorithms considered in this work: logistic regression and neural networks [15].

Logistic Regression

Given a data item $(x, y)$ with $x$ as the input and $y$ as the truth value, regression is learning a function $f$ such that $f(x) \approx y$ [23]. In logistic regression, for a binary classification problem, the output value is bound between 0 and 1 through an activation function $\sigma$. Therefore, the function can be represented as $f(x) = \sigma(W^{\top} x)$, where $W$ is the weight coefficient vector. The activation function used in logistic regression is defined as $\sigma(z) = 1/(1 + e^{-z})$, which is shown in Figure 1(a). The cost function used in logistic regression is the cross-entropy function [23]. In this case, we simply write the cost function over the data item $(x, y)$ as $C_{(x,y)}(W)$.

Figure 1: (a) A logistic activation function. (b) A neural network example.
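As an illustration of the above, the following sketch implements the logistic activation, the cross-entropy cost over a single data item, and its gradient with respect to the weight vector; the variable names are ours, and any bias term is assumed to be folded into the input vector.

```python
import numpy as np

def sigma(z):
    # Logistic activation bounding the output to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, x, y):
    # Cross-entropy cost over a single data item (x, y)
    p = sigma(np.dot(W, x))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(W, x, y):
    # Gradient of the cross-entropy cost with respect to W
    return (sigma(np.dot(W, x)) - y) * x

W = np.zeros(3)
x, y = np.array([1.0, 0.5, -0.2]), 1
print(cost(W, x, y), grad(W, x, y))
```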

Neural Networks

Neural networks generalize regressions to learn more complicated relationships in datasets. Figure 1(b) is an example neural network with four layers: an input layer, two hidden layers, and an output layer. Each node in the network is referred to as a neuron and is associated with an activation function; common activation functions used in neural networks include the sigmoid function $\sigma(z) = 1/(1 + e^{-z})$ and the rectified linear unit $\mathrm{ReLU}(z) = \max(0, z)$.

The $+1$ node represents the bias. The nodes in neural networks are connected through the weight vectors $W$. Neural networks have an additional function called the cost function; common examples of cost functions are the cross-entropy function and the squared-error cost function [27]. For a data item $(x, y)$, we use the same representation $C_{(x,y)}(W)$ for the cost function in neural networks.

Thus, in any machine learning algorithm, the task is to determine the weight parameters $W$ that minimize the cost function for a given dataset.

3.3 Stochastic Gradient Descent (SGD)

SGD is an algorithm widely used in machine learning for approaching the global minimum of a cost function [23]. Given the weight parameters $W$ and a data item $(x, y)$, SGD updates the parameters as:

$$W := W - \alpha \frac{\partial C_{(x,y)}(W)}{\partial W}, \qquad (3)$$

where $\alpha$ is the learning rate.

In practice, for efficiency, instead of selecting a single data item at a time, multiple data items are selected in the form of a matrix $(X, Y)$. The matrix $(X, Y)$ is referred to as a mini-batch. Thus, the parameter updates are based on mini-batches as:

$$W := W - \alpha \frac{\partial C_{(X,Y)}(W)}{\partial W}, \qquad (4)$$

where $C_{(X,Y)}(W)$ denotes the cost averaged over the mini-batch $(X, Y)$.
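The sketch below shows one mini-batch update in this spirit; it assumes the mini-batch gradient is the average of the per-item gradients, and the squared-error cost used in the example is chosen only for illustration.

```python
import numpy as np

def sgd_minibatch_step(W, X, Y, grad_fn, alpha=0.1):
    # One update of the form W <- W - alpha * (average gradient over the mini-batch (X, Y))
    grads = np.stack([grad_fn(W, x, y) for x, y in zip(X, Y)])
    return W - alpha * grads.mean(axis=0)

# Example with a squared-error cost C(W) = 0.5 * (W.x - y)^2, whose gradient is (W.x - y) * x
grad_fn = lambda W, x, y: (np.dot(W, x) - y) * x

W = np.zeros(3)
X = np.array([[1.0, 0.5, -0.2], [1.0, -0.3, 0.7]])
Y = np.array([1.0, 0.0])
W = sgd_minibatch_step(W, X, Y, grad_fn)
```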

3.4 Threat Model

In our proposed system, we assume that the model initiator is honest-but-curious, i.e., it follows the protocol steps faithfully but might attempt to infer information from the encrypted data it observes. The server is also honest-but-curious and non-colluding, i.e., on top of being honest-but-curious, it does not collude with any participant to reveal information. Finally, the participants are malicious, i.e., a participant might attempt to infer another participant’s private data or intentionally upload false parameters to the server.

3.5 Similarity Computation

Several similarity measurement techniques, such as Euclidean distance, Jaccard similarity, and cosine similarity, have already been used in different machine learning algorithms. In this section, we review the cosine similarity, which is of interest in this work. The cosine similarity between two $F$-dimensional vectors $A$ and $B$ is computed as:

$$\cos(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{F} a_i b_i}{\sqrt{\sum_{i=1}^{F} a_i^2}\,\sqrt{\sum_{i=1}^{F} b_i^2}}, \qquad (5)$$

with $A$ and $B$ in plaintext form. In an encrypted form, the cosine similarity can be computed as:

$$E\big(\cos(A, B)\big) = \prod_{i=1}^{F} E\!\left(\frac{a_i}{\|A\|}\right)^{\frac{b_i}{\|B\|}}, \qquad (6)$$

using the properties of the Paillier scheme discussed earlier. The cosine similarity computation outputs a value in the range $[-1, 1]$, with 1 indicating total similarity between the two vectors and $-1$ indicating total dissimilarity between the vectors.
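A minimal sketch of how Equations (5) and (6) could be realised with the python-paillier library [8] follows; which party normalises which vector, and the vectors themselves, are illustrative assumptions.

```python
import numpy as np
from phe import paillier

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

pub, priv = paillier.generate_paillier_keypair(n_length=1024)

a = np.array([0.3, -1.2, 0.8])   # vector held by the encrypting party ("base" part)
b = np.array([0.25, -1.0, 0.9])  # vector held by the other party ("power" part)

enc_a = [pub.encrypt(float(v)) for v in a / np.linalg.norm(a)]
b_norm = b / np.linalg.norm(b)

# Each E(a_i/||a||) is scaled by the plaintext constant b_i/||b|| (Property (2))
# and the results are combined (Property (1)), yielding E(cos(a, b)).
enc_sim = enc_a[0] * float(b_norm[0])
for c, w in zip(enc_a[1:], b_norm[1:]):
    enc_sim = enc_sim + c * float(w)

assert abs(priv.decrypt(enc_sim) - cosine_similarity(a, b)) < 1e-6
```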

4 Our Proposed Multi-Party Machine Learning System

In this section, we present our proposed privacy-preserving multi-party machine learning system with reliability check. We discuss the system entities and their roles and provide security, efficiency and effectiveness analysis of the system elements. But first, we discuss the weight similarity metric.

4.1 Weight Similarity

For reliability check, we propose a new metric called the weight similarity to measure the similarity between the participants’ and the model initiator’s datasets. To give a background on the weight similarity, let us look at the following example.

Consider two functions $f_1(x_1)$ and $f_2(x_2)$. The gradients of the two functions $f_1$ and $f_2$ can be computed as $g_1 = \nabla f_1(x_1)$ and $g_2 = \nabla f_2(x_2)$, respectively. Next, consider another function $h(W, g) = W - \alpha g$. Substituting $g$ with $g_1$ and $g_2$ gives us:

$$h(W, g_1) = W - \alpha g_1 \quad \text{and} \quad h(W, g_2) = W - \alpha g_2. \qquad (7)$$

Therefore, if $x_1 \approx x_2$, then $g_1 \approx g_2$ and $h(W, g_1) \approx h(W, g_2)$.

The above example depicts the relationship between the data items, gradients, and weights in multi-party machine learning systems. The parameters $x_1$ and $x_2$ depict the data items of, say, two participants $P_1$ and $P_2$. The functions $f_1$ and $f_2$ are the cost functions of $P_1$ and $P_2$, respectively. Also, the gradients $g_1$ and $g_2$ depict the gradients generated by the participants $P_1$ and $P_2$, respectively. The function $h$ depicts the weight update function of machine learning. Recall that, in a multi-party machine learning system, all the participants have the same model architecture. Thus, if the data items of the participants $P_1$ and $P_2$ are similar, there is a high likelihood that their generated gradients are similar. Consequently, a parallel weight update by the participants $P_1$ and $P_2$ (i.e., a weight parameter updated by two different participants independently) using their respective gradients results in two similar weights. We exploit this property to identify RPs and UPs in our proposed system, whose architecture is described in the next subsection.
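The toy experiment below illustrates this intuition (it is not the protocol itself): two parties start from the same weights, and the party whose data resembles the first party's data ends up with a far more similar weight vector than a party training on unrelated noise. The linear model, data sizes, and learning rate are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(W, X, Y):
    # Mean squared-error gradient of a linear model, used only for illustration
    return X.T @ (X @ W - Y) / len(Y)

def train(W, X, Y, alpha=0.1, epochs=200):
    for _ in range(epochs):
        W = W - alpha * grad(W, X, Y)
    return W

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

W0 = rng.normal(size=20)                                    # shared starting weights
X1, Y1 = rng.normal(size=(200, 20)), rng.normal(size=200)   # "model initiator" data
X2, Y2 = X1 + 0.01 * rng.normal(size=X1.shape), Y1          # RP: similar data
X3, Y3 = rng.normal(size=(200, 20)), rng.normal(size=200)   # UP: unrelated "noise" data

W1 = train(W0, X1, Y1)
print("RP weight similarity:", cos(W1, train(W0, X2, Y2)))  # typically close to 1
print("UP weight similarity:", cos(W1, train(W0, X3, Y3)))  # typically much lower
```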

4.2 System Architecture

Figure 2: A high-level architecture of our multi-party privacy-preserving machine learning with reliability check showing the involved entities.

The architecture of our proposed multi-party privacy-preserving machine learning with reliability check is depicted in Figure 2. The system consists of the following entities: a central server, a model initiator, and multiple participants (which includes RPs and UPs). The model initiator collaborates with the server to initialize the system. Thereafter, the privacy-preserving learning process during which the model initiator and participants upload their intermediate weight parameters to the server begins. The server uses these intermediate weights to update the earlier set global parameters. During the update, the server uses weight similarity scores to filter out (exclude) the weight parameters from UPs. The weight similarity score is collaboratively computed by the entities in a privacy-preserving manner. The description of the entities is as follows:

Server

The central server stores encrypted global parameters and makes them available to the model initiator and the participants for download. It then receives encrypted weight parameters from both the model initiator and the participants. It also receives encrypted weight similarity computation components and blinded similarity scores from the model initiator and the participants, respectively. The server updates the global parameters with the received weight parameters. Depending on the similarity score and the set threshold value, the server might include (or exclude) a participant’s weight parameters during the global parameter updates, i.e., the server filters out weight parameters from UPs when updating the global parameters. We assume the server to be honest-but-curious, i.e., it follows the algorithm faithfully but is curious about the data.

Model Initiator and Participants

The model initiator is an RP who sets the initial global parameters. The model initiator and the participants aim to learn a common model and thus share an identical model architecture and learning objective. They also share a homomorphic private key that is kept secret from the server; throughout, E(·) denotes a homomorphic encryption operation under the corresponding public key, which protects the privacy of the exchanged parameters. The model initiator and the participants each keep their local datasets and only exchange encrypted intermediate parameters with the server. The exchanges happen at every communication round, which is decided by the server. For example, the server might schedule the model initiator and the participants to upload their intermediate parameters after every 10 local epochs. Thus, the training is done synchronously.

The model initiator runs two phases (initialization phase and learning phase), while the participants run only one phase (learning phase) (see footnote 1). During the initialization phase, the model initiator sets the initial parameters W_init, which it encrypts as E(W_init) and sends to the server. Upon reception, the server sets E(W_init) as the global parameters E(W_g) and makes them available to the participants for download. The process is illustrated in Figure 3.

Figure 3: Interaction between the model initiator and the server during the initialization phase.

The model initiator and the participants then enter the learning phase, perform local training with their private datasets using the SGD, and upload their encrypted intermediate weight parameters to the server at every communication round. The model initiator and the participants compute an additional component used by the server to establish a weight similarity score for each participant’s uploaded weights. The server uses the similarity score to determine if a participant’s parameters should be included in the global parameter updates. Next, the server updates the global parameters and makes them available to the participants and the model initiator for the local training to continue. We present the detailed procedures in subsequent sections.

4.3 The Model Initiator Side Procedure

Figure 4: Learning process of the Model Initiator showing its interactions with the Server.

We show the pseudocode executed on the model initiator side in Algorithm 1. The model initiator has its own local dataset for conducting training using the standard SGD; however, this dataset alone is insufficient. As stated in the previous subsection, the model initiator first initializes the weight parameters as W_init. It then encrypts the parameters using the HE algorithm discussed in Section 3.1 as E(W_init) and sends them to the server, where they are globalized as E(W_g) and made available for download by the participants. Note that the initialization phase is executed only once.

Starting from W_init, the model initiator then trains its local model by running the standard SGD on its insufficient local dataset to generate the intermediate weight parameters W_I, which it encrypts as E(W_I). It also generates a weight similarity computation component Q_I (see footnote 2), which it encrypts as E(Q_I) and which forms the base part of Equation 6. The model initiator sends E(W_I) and E(Q_I) to the server at each communication round.

Next, the model initiator downloads the updated global parameters E(W_g) from the server during the subsequent communication rounds, decrypts them using its private key, and continues with the training process using its local dataset until the model improvement is minimal and all the participants have stopped. A diagrammatic illustration of the learning process (especially after the first communication round) is presented in Figure 4.

Security Analysis I

Theorem 1 (Security against an honest-but-curious model initiator): An honest-but-curious model initiator learns no information about the participants’ private data from the received global parameters in Algorithm 1 (line 5).

Proof: The updated global parameters are computed from the intermediate weight parameters of the participants, which, as discussed in Remark 1, cannot be used to retrieve the participants’ training data. Thus, an honest-but-curious model initiator learns no information about the private data of the participants.

Remark 1 (Regarding the privacy-preservation via weights): In [27], Phong and Phuong proved that participants’ private data cannot be retrieved from intermediate weight parameters. Therefore, since the model initiator only receives the global parameters computed from the intermediate weights of the participants, it cannot retrieve the private data of any participant.

1: Initialize the parameters as W_init
2: Encrypt the initial parameters as E(W_init) and send them to the server
3: Download the global parameters E(W_g) from the server and decrypt them as W_g (except at the beginning, when W_init is used), perform local SGD based on the local dataset to generate the parameters W_I and Q_I
4: Encrypt W_I as E(W_I) and Q_I as E(Q_I), and send them to the server
5: Receive the updated parameters E(W_g) from the server and decrypt them as W_g
6: Repeat steps 3-5 until the validation loss is acceptably small and the other participants have dropped out of the training
7: Stop the training process
Algorithm 1 Model Initiator Side
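A minimal sketch of one learning-phase round on the model initiator side (Algorithm 1, lines 3-4) is given below. It assumes that the similarity computation component is the intermediate weight vector normalised by its Euclidean norm (see footnote 2) and that weights are encrypted element-wise with the shared Paillier key; run_local_sgd stands in for the initiator's local training and is an illustrative name.

```python
import numpy as np
from phe import paillier

# Shared keypair: the private key is held by the model initiator and the
# participants and is kept secret from the server (illustrative key length).
pub, priv = paillier.generate_paillier_keypair(n_length=1024)

def initiator_round(run_local_sgd):
    W_I = run_local_sgd()                            # plaintext result of local SGD
    Q_I = W_I / np.linalg.norm(W_I)                  # similarity component (footnote 2)
    enc_W_I = [pub.encrypt(float(w)) for w in W_I]   # E(W_I), used for the global update
    enc_Q_I = [pub.encrypt(float(q)) for q in Q_I]   # E(Q_I), base part of Equation (6)
    return enc_W_I, enc_Q_I                          # both are uploaded to the server

enc_W_I, enc_Q_I = initiator_round(lambda: np.array([0.3, -1.2, 0.8]))
```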

4.4 The Server Side Procedure

The pseudocode of our scheme on the server side is shown in Algorithm 2. As mentioned in the preceding subsections, the server first receives the initial weight parameters E(W_init) from the model initiator. It then sets E(W_init) as the global parameters E(W_g), which it makes available to the participants for download. At each communication round, the server receives the intermediate parameters E(W_I) and E(Q_I) from the model initiator. E(W_I) is used during the global parameter update, while E(Q_I) is used for the secure weight similarity score computation.

The server then initiates a secure computation of the weight similarity score between the model initiator’s and the participants’ intermediate weight parameters. To achieve this, the server first blinds E(Q_I) by computing E(rQ_I) = E(Q_I)^r, where r is a random non-zero value. It then sends E(rQ_I) to the participants. The server waits for the participants’ computations and then receives an encrypted intermediate weight parameter E(W_k) and a blinded weight similarity score s_k' from each participant k.

The server then computes the final weight similarity score for each participant k as:

$$s_k = \frac{s_k'}{r}. \qquad (8)$$

After computing the weight similarity scores for all the participants, the server updates the global parameters E(W_g) by averaging using Equation 9:

$$E(W_g) = \Big(E(W_I) \cdot \prod_{k:\, s_k > \theta} E(W_k)\Big)^{\frac{1}{n+1}} = E\Big(\frac{1}{n+1}\big(W_I + \sum_{k:\, s_k > \theta} W_k\big)\Big), \qquad (9)$$

where n is the number of participants whose weight similarity scores are above the threshold value θ, i.e., E(W_k) is included in the parameter update if and only if the following condition holds for its corresponding weight similarity score:

$$s_k > \theta, \qquad (10)$$

where θ is a threshold value, which can be fixed or increased dynamically at each communication round. The server finally makes the updated global parameters available for download by the participants and the model initiator to continue with their training processes.

1: Receive the initial parameters E(W_init) from the model initiator, set E(W_init) as E(W_g), and make them available to the participants
2: Receive E(W_I) and E(Q_I) from the model initiator
3: Compute E(rQ_I) = E(Q_I)^r, where r is a random non-zero value
4: Send E(rQ_I) to the participants
5: Receive s_k' and E(W_k) from each participant k
6: Compute the similarity score for each participant as s_k = s_k'/r
7: Forall participants k with s_k > θ do
     Compute E(W_sum) = E(W_I)·E(W_1)·...·E(W_n) = E(W_I + Σ_k W_k)
    end Forall
8: Update the global parameters as E(W_g) = E(W_sum)^(1/(n+1)), where n is the number of participants whose weight similarity score with the model initiator’s is above the threshold θ
9: Send E(W_g) to all the participants including the model initiator
10: Repeat steps 2-9 until the model initiator stops the training process
Algorithm 2 Server Side
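The sketch below mirrors the server's per-round logic (Algorithm 2, lines 3-8) under the forms of Equations (8)-(10) assumed above. The participant.local_round call stands for the participant-side computation sketched in Section 4.5; the blinding factor, threshold, and element-wise averaging of the selected encrypted weights are illustrative assumptions.

```python
import random

def server_round(enc_W_I, enc_Q_I, participants, theta):
    # Blind E(Q_I) with a random non-zero factor r (ciphertext-constant multiplication),
    # so that participants only ever see E(r * Q_I).
    r = random.SystemRandom().uniform(1.0, 10.0)
    enc_Q_blinded = [c * r for c in enc_Q_I]

    selected = [enc_W_I]                            # the initiator's weights are always used
    for participant in participants:
        enc_W_k, blinded_score = participant.local_round(enc_Q_blinded)
        s_k = blinded_score / r                     # unblind the score (Equation (8))
        if s_k > theta:                             # reliability check (Equation (10))
            selected.append(enc_W_k)

    # Element-wise homomorphic averaging of the selected encrypted weight vectors (Equation (9)).
    n = len(selected)
    enc_W_g = []
    for coords in zip(*selected):                   # one coordinate across all selected parties
        acc = coords[0]
        for c in coords[1:]:
            acc = acc + c
        enc_W_g.append(acc * (1.0 / n))
    return enc_W_g                                  # made available to all parties for download
```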

Security Analysis II

Theorem 2 (Security against an honest-but-curious server): An honest-but-curious server learns no information about the trained model and the private datasets used in the training.

Proof: An honest-but-curious server only computes on the encrypted parameters from the model initiator and the participants. Therefore, it obtains no information about the model or the local datasets of the model initiator and the participants, since the encryption scheme used is secure.

Remark 2 (Regarding the Computation of Weight Similarity Score): The computation of weight similarity scores aims at identifying RPs. The computation involves an exchange of encrypted intermediate parameters between the model initiator and the participants through the server. Since the server does not have access to the private key, the exchanged intermediate parameters are kept secure from the server. It is only the final weight similarity score that gets revealed to the server.

4.5 The Participant Side Procedure

Algorithm 3 shows the pseudocode that the participants execute in our scheme. As with the model initiator, each participant runs the standard SGD on its own local dataset. Each participant first downloads the global parameters E(W_g) from the server and decrypts them with the shared homomorphic private key. Each participant then runs the standard SGD on its local training dataset to generate the intermediate weight parameters W_k at each communication round.

To compute the weight similarity score, each participant receives E(rQ_I) from the server. Next, each participant generates an encrypted and blinded weight similarity score E(s_k') by computing according to Equation 6 (see footnote 3). Each participant then encrypts W_k as E(W_k) and decrypts E(s_k') as s_k'. Next, each participant sends E(W_k) and s_k' to the server.

Each participant then waits for the server to update the global parameters and then downloads the updated global parameters E(W_g) to continue with the training process using its local dataset. This is repeated until the accuracy improvement is minimal. However, a participant can decide to quit at any time. An illustration of the process is shown in Figure 5.

Figure 5: Learning process of a participant showing its interactions with the Server.
1: Download the global parameters E(W_g) from the server and decrypt them as W_g
2: Perform local SGD based on the local dataset to generate W_k
3: Receive E(rQ_I) from the server and compute E(s_k') according to Equation 6
4: Encrypt W_k as E(W_k)
5: Decrypt E(s_k') as s_k'
6: Send E(W_k) and s_k' to the server
7: Repeat steps 1-6 until the validation loss is acceptably small
8: Drop out of the training process
Algorithm 3 Participant Side
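The following sketch corresponds to Algorithm 3, steps 2-6. It assumes the participant's similarity component is its normalised intermediate weight vector (footnote 3) and that the shared Paillier keys pub/priv and the local training routine are supplied by the caller; the class and method names are illustrative.

```python
import numpy as np

class Participant:
    """Illustrative participant; pub/priv are the shared Paillier keys."""

    def __init__(self, pub, priv, run_local_sgd):
        self.pub, self.priv = pub, priv
        self.run_local_sgd = run_local_sgd          # returns the plaintext weights W_k

    def local_round(self, enc_Q_blinded):
        W_k = self.run_local_sgd()                  # local SGD on the private dataset
        q_k = W_k / np.linalg.norm(W_k)             # power part of Equation (6), footnote 3

        # Combine the blinded initiator component with q_k homomorphically: the result
        # is E(r * cos(W_I, W_k)), i.e., an encrypted, blinded similarity score.
        acc = enc_Q_blinded[0] * float(q_k[0])
        for c, w in zip(enc_Q_blinded[1:], q_k[1:]):
            acc = acc + c * float(w)

        blinded_score = self.priv.decrypt(acc)      # only the blinded score is revealed
        enc_W_k = [self.pub.encrypt(float(w)) for w in W_k]
        return enc_W_k, blinded_score               # uploaded to the server
```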

Security Analysis III

Theorem 3 (Security against a malicious participant): A malicious participant learns no information about the model initiator’s parameters during the weight similarity score computation procedure in Algorithm 3 (line 3). Also, a malicious participant learns no information about the private data of the model initiator and the other participants from the updated global parameters received from the server.

Proof: The similarity computation component forwarded to the participants is blinded by the server with a non-zero random value in our system. Therefore, a malicious participant obtains no information regarding the true parameter values of the model initiator. The proof of the data privacy of the model initiator and the other participants is similar to that of Theorem 1.

Remark 3 (Regarding Computation of the Weight Similarity Score Component by the Participant): The computation and decryption of the blinded weight similarity scores by the participants aim at enabling the server to securely update the global parameters using intermediate weight parameters from only RPs without having access to the private key. However, this comes with additional computation and communication overhead. We leave the question of revealing the weight similarity score to the server with minimal computation and communication overhead open for future considerations.

4.6 Effectiveness and Efficiency Analysis

Effectiveness

Here, we analyze the effectiveness of our proposed metric, which minimizes the disruptions caused by participants with noisy data during privacy-preserving multi-party machine learning. Generally, the training of a machine learning model is a process of fine-tuning weights starting from random weights. The fine-tuning is guided by gradients that drive the learning process towards locally optimal solutions. For similar datasets, these gradients are similar and the update directions of the weights are almost the same. However, in the presence of noise, some update directions might be reversed, which contributes to the similarities and dissimilarities between the weight parameters of multiple participants in collaborative machine learning. Thus, setting a suitable weight similarity threshold from the beginning of the training minimizes the training disruption that would arise from the noisy data of UPs. This threshold value can be fixed or dynamically raised as the learning tends towards the optimal solution. Our proposed approach does not guarantee any accuracy improvement over centralized training, in which the datasets from all the participants are gathered centrally for training a model, but it guarantees improved convergence and reduced inaccuracies.

Efficiency

The efficiency of our proposed system can be analyzed from four perspectives. From the perspective of the model initiator, the privacy-preserving operation and the computation and encryption of the similarity component can be designed to run in parallel, especially after the generation of the intermediate weight parameters, to speed up the process. Thus, the similarity computation operation has a reduced impact on the training efficiency of the model initiator.

From the perspective of the server, we employ the additively homomorphic Paillier algorithm, which supports addition operations and multiplications by plaintext constants; it is thus more efficient than schemes that support arbitrary algebraic operations on ciphertexts.

From the perspective of the participants, similar to the model initiator, the privacy-preserving operation and the generation and decryption of the weight similarity score can be performed in parallel after the generation of the intermediate weight parameters. This minimizes the impact of the weight similarity computation on the training efficiency of the participants.

From the general view, all the participants and the model initiator can perform their local training using their local datasets simultaneously, which is equivalent to data parallelism. The main computational overhead only arises from the privacy-preserving operation using the Paillier algorithm. The computation demand for weight similarity score generation can be minimized through parallel computations.

5 Experiments

To demonstrate the applicability of our proposed system, in this section we present the performance evaluation of our scheme with experiments on real-world datasets (MNIST [20] and CIFAR-10 [19]). We employ a desktop computer with an Intel(R) Core(TM) i5-6500 3.20GHz CPU, a GeForce GT 710 GPU, and 16GB RAM, running the Ubuntu 20.04 operating system for all the experiments. The Paillier algorithm library in [8] with a key length of 1024 bits is used in the experiments.

We mainly compare our scheme with the Phong and Phuong scheme [27] (referred to as the PP system for simplicity in this section) and two baselines (centralized and stand-alone). In the centralized baseline, the model is trained on a centralized good-quality dataset and ought to achieve the best performance. Since there is no collaboration between different participants, no privacy-preserving mechanism is included during the training. Meanwhile, in the stand-alone baseline, the model is trained using only the noise-free local dataset. Since no collaboration is involved, no privacy-preserving mechanism is considered during the model training either. A reliability check is not considered in the PP system [27]; thus, during the experimentation, weight parameters from all the participants are considered indiscriminately.

There are two categories of participants simulated in our proposed system, i.e., reliable participants (RPs) and unreliable participants (UPs). The local dataset of an RP is similar to the local dataset of the model initiator, which is noise-free. Meanwhile, the local dataset of a UP is noisy, i.e., a fraction of the UP’s local dataset is filled with noise. In this case, all the participants and the model initiator execute the same model architecture.

5.1 Experiments with the MNIST Dataset

In this section, we perform experiments with the MNIST dataset using a logistic regression model and a multi-layer perceptron (MLP) model.

Datasets

The MNIST dataset comprises 28 x 28 gray-scale handwritten digit images with a training set of 60,000 images and a test set of 10,000 images. In our experiment, we simulate four (4) RPs, two (2) UPs, and a model initiator, and the training and test set images are divided amongst the participants and the model initiator. Each RP is allocated 10,000 samples of the training set and 1,660 samples of the test set. Similarly, the model initiator is allocated 10,000 samples of the training set and 1,660 samples of the test set. For the UPs, each participant’s training set comprises 5,000 samples of the MNIST training set and 5,000 samples of noise data, and the test set comprises 850 samples of the MNIST test set and 810 samples of noise data. In this case, we employ the notMNIST dataset [4] as the noise data. The notMNIST dataset consists of gray-scale images of “A” to “J” letters formatted as 28 x 28 images, with a training set of 500,000 images and a test set of 19,000 images. The 5,000 noise training samples and 810 noise test samples for each UP are randomly sampled from the notMNIST training and test sets, respectively. All the images are normalized and centered during the experiment.
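For reproducibility, the sketch below shows one way the UP training sets described above could be assembled by mixing MNIST samples with notMNIST noise samples; array names, sample counts, and the random seed are illustrative.

```python
import numpy as np

def make_up_training_set(mnist_x, mnist_y, notmnist_x, notmnist_y,
                         n_clean=5000, n_noise=5000, seed=0):
    # An unreliable participant mixes clean MNIST samples with notMNIST "noise"
    # samples whose labels are meaningless for the MNIST digit task.
    rng = np.random.default_rng(seed)
    clean = rng.choice(len(mnist_x), size=n_clean, replace=False)
    noise = rng.choice(len(notmnist_x), size=n_noise, replace=False)
    x = np.concatenate([mnist_x[clean], notmnist_x[noise]])
    y = np.concatenate([mnist_y[clean], notmnist_y[noise]])
    perm = rng.permutation(len(x))          # shuffle clean and noisy samples together
    return x[perm], y[perm]
```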

Using a Logistic Regression Model

We implemented a logistic regression model in Python using Theano 1.0.4. We set the random seeds as
numpy.random.seed(139) and random.seed(1234). The SGD is used with a fixed learning rate of 0.13 and a batch size of 128. All the participants and the model initiator run the same code. In a communication round, each participant and the model initiator runs 5 local epochs before encrypting and uploading intermediate parameters to the server. We set a fixed weight similarity score threshold of 0.05 during the training.

Results: The results of using the logistic regression model are shown in Figure 6. Figure 6(a) demonstrates the similarity between the participants’ and the model initiator’s weight parameters. The model initiator’s and the RPs’ weight parameters are highly similar, while the model initiator’s and the UPs’ weight parameters are less similar. Figure 6(b) and Figure 6(c) depict the training convergence against the number of communication rounds. Our proposed scheme achieves faster convergence as compared to the baseline schemes and the PP system which indiscriminately combines weight parameters from all the participants including the UPs during the global parameter updates. On the contrary, our system only considers weight parameters from RPs during the global parameter updates, i.e., it only uses the weight parameters from participants whose weight similarity score with the model initiator is above the threshold.

Figure 6: Results of logistic regression with MNIST dataset. (a) Weight similarities of the participants’ weights to the model initiator’s weight. (b) Test errors against communication round. (c) Accuracies against communication round.
(a) No. of local epochs = 5
(b) No. of local epochs = 10
(c) No. of local epochs = 50
Figure 7: Results of MLP with MNIST dataset showing convergence against the number of communication rounds at different number of local epochs.

Using an MLP

We also implemented an MLP model in Python and TensorFlow 2.1.0 with two hidden layers, each with 64 neurons. We used ReLU as the activation function for the hidden layers and the sigmoid for the output layer. We set the random seeds as numpy.random.seed(12), random.seed(1234) and tensorflow.set_random_seed(12345). We used the SGD with a fixed learning rate of and a batch size of 64. All the participants and the model initiator run the same code. In each communication round, we vary the number of epochs in the local training. We also dynamically varied the similarity score threshold in our experiment in the range of 0.1-0.7 in intervals of 0.1 after every 100 communication rounds. This is because, as the model tends towards convergence, the parameters from RPs become more similar.

Results: The results of using an MLP are shown in Figure 7. Our proposed system converges faster than the baseline schemes and the PP system. However, when the number of local epochs is increased to 50, there is a slight drop in the accuracy of our system. This could be due to the improper mixing of parameters. The PP system which does not filter out UPs converges slowly and it is the least accurate. The centralized baseline achieves the best accuracy as expected. However, the stand-alone baseline is slightly less accurate because of the inadequate amount of training data items. In terms of accuracy, our scheme is only slightly bettered by the centralized baseline scheme.

5.2 Experiments with the CIFAR-10 Dataset

We also perform experiments with the CIFAR-10 dataset using a logistic regression model and an MLP model.

Datasets

The CIFAR-10 dataset consists of 60,000 RGB images of 10 different classes formatted as 32 x 32 x 3. 50,000 of these images form the training set while 10,000 of the images form the test set. In this experiment, we simulate three (3) RPs, a single UP, and a model initiator. Each RP holds 11,000 samples of the CIFAR-10 training set as its local training set and 2,200 samples of the CIFAR-10 test set as its local test set. Similarly, the model initiator holds 11,000 samples of the CIFAR-10 training set as its local training set and 2,200 samples of the CIFAR-10 test set as its local test set. The UP holds the remaining 6,000 samples of the CIFAR-10 training set and 5,000 samples of a noise dataset as its local training set, and 1,200 samples of the CIFAR-10 test set and 1,000 samples of a noise dataset as its local test set. As in the first case, here, we employ the notMNIST dataset as the noise data. However, notMNIST dataset is formatted as 28 x 28, thus, to correctly use it with the CIFAR-10 dataset, we padded it with zeros to obtain the same dimensionality as that of CIFAR-10. Therefore, the noise data of 5,000 and 1,000 for training and test sets are extracted from the padded notMNIST dataset. All the images are normalized and centered.

Using a Logistic Regression Model

Using Python and Theano 1.0.4, we implemented a logistic regression model to demonstrate the applicability of our system on the CIFAR-10 dataset. In the experiment, random seeds are set as numpy.random.seed(15) and random.seed(123). We used the SGD in the learning process with a fixed learning rate of 0.01 and a batch size of 64 data items. All the participants and the model initiator execute the same model on their local datasets and upload their encrypted intermediate results to the server after every 5 local epochs. We set a fixed similarity score threshold of 0.03 for global parameter updates in this experiment.

Results: The results of the above experimental settings are shown in Figure 8. Figure 8(a) depicts the weight similarity score between participants’ and the model initiator’s weight parameters against the communication rounds. As expected, the UP’s and the model initiator’s weight parameters are the least similar. The RPs’ parameters are highly similar to the model initiator’s parameters since their local models generate similar gradients that affect the weight updates similarly. Figure 8(b) and Figure 8(c) depict the training convergence of the model against the number of communication rounds. Our proposed system converges faster and achieves an accuracy closer to the centralized baseline. The stand-alone baseline has limited data and hence lower accuracy. The PP system does not filter out UPs during the computation of global parameters and as a result, it converges slowly and achieves the least accuracy.

Figure 8: Results of logistic regression with CIFAR-10 dataset. (a) Weight similarities of the participants’ weights to the model initiator’s weight. (b) Test errors against communication round. (c) Accuracies against communication round.
(a) No. of local epochs = 5
(b) No. of local epochs = 10
(c) No. of local epochs = 50
Figure 9: Results of MLP with CIFAR-10 dataset showing convergence against the number of communication rounds at different number of local epochs.

Using an MLP

We also evaluated our system on the CIFAR-10 dataset by implementing an MLP model with two hidden layers of 64 and 128 neurons, respectively. The implementation was done with Python and TensorFlow 2.1.0. The ReLU activation function was used for the hidden layers and the sigmoid for the output layer. In the experiment, the random seeds are set as follows: numpy.random.seed(15), random.seed(123) and tensorflow.set_random_seed(12345). The SGD was used for learning with a batch size of 32 data items and a learning rate of . All the participants and the model initiator execute the same model on their local datasets and upload their encrypted intermediate results to the server with a varying number of local epochs. We dynamically varied the similarity score threshold in the range 0.05 to 0.95 in intervals of 0.15 after every 100 communication rounds during the global parameter updates by the server.

Results: The results of the MLP model with the CIFAR-10 dataset are presented in Figure 9. In Figure 9, we depict the training convergence of our system, PP system, and the baseline schemes against the communication rounds at different number of local epochs. In all the cases, our proposed system converges faster than all the other systems and attains an accuracy only bettered by the centralized baseline. However, at the local epoch of 50, the accuracy of our system slightly drops and as stated earlier, this could be due to improper mixing of parameters. The centralized baseline as expected achieves the best accuracy. The stand-alone baseline is less accurate as compared to ours and this is mainly because of the limited amount of the training data. The PP system does not filter out UPs and it indiscriminately includes parameters from all the participants during the global parameter updates which results in its slow convergence and the least accuracy.

5.3 Results of Similarity Computation

In Table 1, we present the execution time of the weight similarity score computation by the entities in our proposed scheme. We observe that the model initiator incurs the most computation overhead during the joint weight similarity score computation, which is mainly due to the encryption of its similarity computation component. The server simply performs a blinding operation and hence has the least computation overhead. The overhead for participants is associated with the multiplication operations they perform on the encrypted similarity computation components of the model initiator and the decryption of the final similarity score. Generally, the computation overhead is higher for the neural network models than for the logistic regression models because neural networks have a larger number of parameters.


Dataset and Model:       MNIST                                CIFAR-10
                         Logistic Regression  Neural Network  Logistic Regression  Neural Network
Model Initiator (s)      3.14                 8.33            3.14                 8.86
Participant (s)          2.36                 6.98            2.38                 7.52
Server (s)               0.19                 0.63            0.19                 0.65

Table 1: Run time, in seconds, for the computation of weight similarity scores

6 Conclusion

In this work, we propose a multi-party privacy-preserving machine learning scheme that takes into account the data quality of the participants. The scheme utilizes the proposed weight similarity metric to filter out unreliable participants and integrates homomorphic encryption to prevent leakages to the server. In addition, participants upload their intermediate weights instead of gradients to prevent leakages to malicious participants. Therefore, our scheme is beneficial for privacy-preserving machine learning in environments where data quality matters. This work opens up several possibilities for future investigation. For instance, this scheme is designed to run synchronously; an asynchronous design is a possible extension. Further investigation of the weight similarity under different untrainable parameters is another possibility. Reducing the computation burden on the model initiator and the participants during the weight similarity computation can also be given more attention in the future.

Footnotes

  1. However, the model initiator’s and the participants’ learning phases are not entirely identical.
  2. The model initiator’s weight similarity computation component is its intermediate weight vector normalized by its Euclidean norm, Q_I = W_I / ||W_I||; its encrypted elements form the base part of Equation 6.
  3. Where each element of Q_k = W_k / ||W_k||, the similarity computation component of participant k, forms a power part of Equation 6, and the corresponding base part is the blinded component E(rQ_I) received from the server.

References

  1. S. Agarwalla and K. K. Sarma (2016) Machine learning based sample extraction for automatic speech recognition using dialectal assamese speech. Neural Networks 78, pp. 97–111. Cited by: §1.
  2. Y. Aono, T. Hayashi, L. T. Phong and L. Wang (2016) Privacy-preserving logistic regression with distributed data sources via homomorphic encryption. IEICE Transactions on Information and Systems 99 (8), pp. 2079–2089. Cited by: §1, §1.
  3. K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal and K. Seth (2017) Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191. Cited by: §2.
  4. Y. Bulatov (2011) notMNIST dataset. Google (Books/OCR), Tech. Rep. [Online]. Available: http://yaroslavvb.blogspot.it/2011/09/notmnist-dataset.html. Cited by: §5.1.1.
  5. B. Cao, N. Wang, X. Gao and J. Li (2018) Asymmetric joint learning for heterogeneous face recognition. In Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, pp. 6682–6689. Cited by: §1.
  6. T. Cao, T. Truong-Huu, H. Tran and K. Tran (2020) A federated learning framework for privacy-preserving and parallel training. arXiv preprint arXiv:2001.09782. Cited by: §2.
  7. A. Cruz-Roa, H. Gilmore, A. Basavanhally, M. Feldman, S. Ganesan, N. N. Shih, J. Tomaszewski, F. A. González and A. Madabhushi (2017) Accurate and reproducible invasive breast cancer detection in whole-slide images: a deep learning approach for quantifying tumor extent. Scientific Reports 7, pp. 46450. Cited by: §1.
  8. CSIRO’s Data61 (2013) Python Paillier library. GitHub. Available: https://github.com/data61/python-paillier. Cited by: §5.
  9. J. Deng, J. Guo, N. Xue and S. Zafeiriou (2019) Arcface: additive angular margin loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4690–4699. Cited by: §1.
  10. C. Fontaine and F. Galand (2007) A survey of homomorphic encryption for nonspecialists. EURASIP Journal on Information Security 2007 (1), pp. 013801. Cited by: §3.1.
  11. M. Fredrikson, S. Jha and T. Ristenpart (2015) Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333. Cited by: item 2.
  12. R. C. Geyer, T. Klein and M. Nabi (2017) Differentially private federated learning: a client level perspective. arXiv preprint arXiv:1712.07557. Cited by: §2.
  13. R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig and J. Wernsing (2016) CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pp. 201–210. Cited by: §2.
  14. M. Gong, J. Feng and Y. Xie (2020) Privacy-enhanced multi-party deep learning. Neural Networks 121, pp. 484–496. Cited by: §1, §1, §3.1.
  15. T. Hastie, R. Tibshirani and J. Friedman (2009) The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media. Cited by: §3.2.
  16. B. Hitaj, G. Ateniese and F. Perez-Cruz (2017) Deep models under the gan: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, pp. 603–618. Cited by: item 1.
  17. Intersoft Consulting. General Data Protection Regulation (GDPR). Accessed: Jun. 29, 2020. Cited by: §1.
  18. S. Kharya and S. Soni (2016) Weighted naive bayes classifier: a predictive model for breast cancer detection. International Journal of Computer Applications 133 (9), pp. 32–37. Cited by: §1.
  19. A. Krizhevsky, V. Nair and G. Hinton (2014) The CIFAR-10 dataset. [Online]. Available: http://www.cs.toronto.edu/kriz/cifar.html. Cited by: §5.
  20. Y. LeCun, C. Cortes and C. Burges (2010) MNIST handwritten digit database. ATT Labs. [Online]. Available: http://yann.lecun.com/exdb/mnist. Cited by: §5.
  21. W. Lim, D. Jang and T. Lee (2016) Speech emotion recognition using convolutional and recurrent neural networks. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1–4. Cited by: §1.
  22. X. Ma, X. Chen and X. Zhang (2019) Non-interactive privacy-preserving neural network prediction. Information Sciences 481, pp. 507–519. Cited by: §2.
  23. P. Mohassel and Y. Zhang (2017) Secureml: a system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 19–38. Cited by: §3.2.1, §3.3.
  24. P. Paillier (1999) Public-key cryptosystems based on composite degree residuosity classes. In International Conference on the Theory and Applications of Cryptographic Techniques, pp. 223–238. Cited by: §1, §3.1, §3.1.
  25. M. Pathak, S. Rane and B. Raj (2010) Multiparty differential privacy via aggregation of locally trained classifiers. In Advances in Neural Information Processing Systems, Vancouver, British Colombia, Canada, pp. 1876–1884. Cited by: item 1.
  26. L. T. Phong, Y. Aono, T. Hayashi, L. Wang and S. Moriai (2018) Privacy-preserving deep learning via additively homomorphic encryption. IEEE Transactions on Information Forensics and Security 13 (5), pp. 1333–1345. Cited by: item 1, §1, §1.
  27. L. T. Phong and T. T. Phuong (2019) Privacy-preserving deep learning via weight transmission. IEEE Transactions on Information Forensics and Security 14 (11), pp. 3003–3015. Cited by: §1, §1, §1, §3.2.2, §4.3.1, §5.
  28. R. Shokri and V. Shmatikov (2015) Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, Colorado, USA, pp. 1310–1321. Cited by: §1, §1.
  29. D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai and A. Bolton (2017) Mastering the game of Go without human knowledge. Nature 550 (7676), pp. 354–359. Cited by: §1.
  30. K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, S. Jin, T. Q. Quek and H. V. Poor (2020) Federated learning with differential privacy: algorithms and performance analysis. IEEE Transactions on Information Forensics and Security. Cited by: §2.
  31. X. Zhang, X. Chen, J. K. Liu and Y. Xiang (2019) DeepPAR and deepdpa: privacy preserving and asynchronous deep learning for industrial iot. IEEE Transactions on Industrial Informatics 16 (3), pp. 2081–2090. Cited by: §2.
  32. X. Zhang, S. Ji, H. Wang and T. Wang (2017) Private, yet practical, multiparty deep learning. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, USA, pp. 1442–1452. Cited by: §1.
  33. Y. Zhang, R. Jia, H. Pei, W. Wang, B. Li and D. Song (2020) The secret revealer: generative model-inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 253–261. Cited by: item 2.
  34. L. Zhao, Q. Wang, Q. Zou, Y. Zhang and Y. Chen (2019) Privacy-preserving collaborative deep learning with unreliable participants. IEEE Transactions on Information Forensics and Security 15, pp. 1486–1500. Cited by: §1, §1.