Abstract
Multiparty machine learning is a paradigm in which multiple participants collaboratively train a machine learning model to achieve a common learning objective without sharing their privately owned data. The paradigm has recently received considerable attention from the research community, much of it aimed at addressing its associated privacy concerns. In this work, we address the concerns of data privacy, model privacy, and data quality in privacy-preserving multiparty machine learning, i.e., we present a scheme for privacy-preserving collaborative learning that checks the participants’ data quality while guaranteeing data and model privacy. In particular, we propose a novel metric called weight similarity that is securely computed and used to check whether a participant can be categorized as a reliable participant (one that holds good quality data) or not. The problems of model and data privacy are tackled by integrating homomorphic encryption into our scheme and uploading encrypted weights, which prevent leakage to the server and to malicious participants, respectively. The analytical and experimental evaluations of our scheme demonstrate that it is accurate and ensures data and model privacy.
Preprint, January 15, 2021
Reliability Check via Weight Similarity in Privacy-Preserving Multi-Party Machine Learning
Keywords: privacy preservation, machine learning, stochastic gradient descent, multiparty learning.
1 Introduction
Recently, machine learning techniques have achieved tremendous success in many applications (e.g., cancer detection [7, 18], face recognition [5, 9], speech recognition [21, 1], and playing the game of Go [29]), with performance equaling or surpassing that of humans. Central to this success is the large amount of data used to train the models. However, because of legal issues, competitive advantage, and privacy concerns, collecting data from multiple sources is a challenging task. For one, centrally gathered data might be permanently stored and used without the data owner’s knowledge, which is contrary to, for example, an EU regulation [17] that grants data owners the right to ask companies to permanently delete their data. This makes data-holding institutions reluctant to share their data, which greatly hinders the success of machine learning because models end up being trained on limited amounts of data.
Recent advances in multiparty machine learning [28, 26, 2, 14, 34, 27] have shown promising potential in addressing the data scarcity challenge. In multiparty machine learning, multiple participants collaboratively train a common model while keeping their data private. The participants share the same model architecture and a common learning objective. The general training convention is that, first, global parameters are initialized and stored by a server. Each participant then downloads the global parameters, trains its local model on its private data, and uploads the intermediate gradients or weights to the server for updating the global parameters. This is repeated until the common learning objective is achieved. Although the paradigm prevents an attacker from having direct access to participants’ data, there are still several ways in which it can be compromised to:

Reveal a participant’s training data through, e.g., an attacker using a generative adversarial network to deceive the participant into revealing detailed information during the training process [16], gradients uploaded during the collaborative training process [26], or reverse-engineering of the participant’s local model [25].

Reduce the effectiveness and the training efficiency through injection of poor quality data during the training process.
In the literature, the training data and model leakage problems are mainly addressed through differential privacy and homomorphic encryption. Specifically, with differential privacy, leakage is prevented by adding noise to the data item. A parameter referred to as the privacy budget controls the amount of added noise and the achieved privacy level. In [28], a differential privacy technique is used to add noise to the uploaded gradients during collaborative learning. To prevent privacy leakages due to high privacy budget consumption, Gong et al. [14] proposed a dynamic allocation of privacy budgets during multiparty learning. With homomorphic encryption, parameters are encrypted, and the algebraic operations for parameter updates are performed in encrypted form. Additive homomorphic encryption algorithms are used to prevent information leakage to the central server in [26, 2]. Besides differential privacy and homomorphic encryption, Zhang et al. [32] utilized a threshold secret sharing technique during global parameter updates in collaborative learning, i.e., the global parameter updates are only effected when the number of uploaded intermediate parameters reaches a certain threshold. In [27], a symmetric encryption technique is used to prevent information leakage to the central server during multiparty training. Here, the central server simply acts as a relay point that conveys parameters to the next participant, which continues the training process using its local dataset. The same work proposed the upload of intermediate weights instead of gradients and proved that, unlike gradients, weights reveal no information about the training data.
The performance challenge attributed to poor data quality in privacy-preserving collaborative learning has remained an open problem, and to the best of our knowledge, the only work that has attempted to address it was conducted by Zhao et al. in [34]. In their work, a participant is categorized as a reliable participant (RP) or as an unreliable participant (UP). RPs hold good quality and similar data while UPs, which are assumed to be few in number, hold poor quality data. During the learning process, intermediate parameters are uploaded by the participants, and the server generates a utility score for each participant using a common validation dataset. The utility score shows the accuracy of each participant’s parameters, and it is used to determine the participants whose parameters are included during the global parameter update. Although this approach is demonstrated to be effective in minimizing the influence of UPs, it has some limitations. First, the semi-honest server used in the work knows the validation dataset that is utilized to compute the utility scores for the participants, which can provide a clue about the general training dataset. Second, the semi-honest server is not prevented from accessing the trained model, which can lead to the recovery of information regarding the training data used to train the model. In a public setting, e.g., a cloud setting, these exposures can be detrimental. Thus, fixing these limitations through a novel scheme is a necessity.
Motivated by the preceding observations, we design a privacy-preserving multiparty machine learning scheme with the following features: (1) the proposed scheme leaks no private information to an honest-but-curious server or any participant (RP or UP) involved in the learning process; (2) the proposed scheme leaks no information about the trained model to the honest-but-curious server; and (3) the proposed scheme minimizes the disruptions caused by UPs during the collaborative learning process. To defend against the honest-but-curious server and the participants while minimizing the effects of poor quality data from UPs, we utilize two main techniques. First, we use the additively homomorphic Paillier algorithm [24], which allows addition and subtraction operations to be correctly performed on ciphertexts. This enables the server to update the global parameters using the ciphertexts uploaded by the participants. Next, we propose a metric called the weight similarity (more on this in section 4.1). In order to compute the weight similarity scores, we introduce an additional entity in our scheme called the model initiator. The model initiator is simply an RP who initiates the collaborative learning process. Similarity scores are computed between the model initiator’s parameters and the other participants’ parameters. Depending on whether the score is above a threshold, a participant’s parameters are included in or excluded from the global parameter updates. This way, the disruptive influence of the UPs can be minimized during multiparty machine learning. Finally, to defend against information leakage to malicious participants, each participant uploads encrypted intermediate weights instead of gradients in the proposed scheme. Weights leak no information, as proved by Phong and Phuong in [27]. We summarize our contributions as follows.

To the best of our knowledge, we are the first to investigate the similarities between intermediate weights uploaded by participants in multiparty machine learning.

We design a novel privacy-preserving multiparty machine learning scheme that integrates homomorphic encryption and weight similarity scores to prevent leakages to the central server and the participants while minimizing the disruptive influence of UPs during the collaborative learning process.

We evaluate the performance of our proposed scheme on real-world datasets. The results demonstrate that our proposed scheme guarantees privacy and achieves high accuracy while being robust to UPs.
The rest of the paper is organized as follows: in section 2, we present the other related works. Section 3 discusses the preliminary concepts used in the work. In section 4, we present our proposed multiparty machine learning system with reliability check. The section also contains the discussion of our proposed weight similarity metric. The experiments are presented in section 5 and section 6 concludes the work.
2 Other Related Works
In [13], Gilad-Bachrach et al. proposed a system referred to as CryptoNets for performing predictions on encrypted data. To make a prediction on a data item, the homomorphically encrypted data item is fed to a model that is already trained. The fed data goes through the feedforward process of machine learning and the prediction results are returned in an encrypted form. In [22], Ma et al. presented a scheme for predictions on encrypted data using non-interactive neural networks. In their scheme, an already-trained model is split into two parts and each part is given to a server. To perform a prediction on an encrypted data item, the data item is also split into two parts and each server receives a part. The two servers interact to generate a prediction for the fed data item. Our work differs from [13, 22] in that we aim at securely and accurately training the weights that can be used to perform predictions on data items, whereas in [13, 22] the weights are already trained and are simply used to perform predictions on encrypted data items.
Cao et al. [6] presented a scheme for synchronous and parallel privacy-preserving collaborative learning in which the participants only send their local cost values to the server during global parameter updates. The server utilizes the received cost values to identify the participant with the best local model at a training round and requests its parameters. The server then updates the global parameters using the requested parameters. The authors claim their scheme prevents information leakage. However, their work does not exploit the full benefits of collaborative learning, since it only depends on the best performer at each training round.
An integration of homomorphic encryption with proxy re-encryption for privacy-preserving multiparty learning is presented in [31]. In this work, each participant has a unique key, and every parameter encrypted by the participants during collaborative learning has to be transformed using a proxy key for updating the global parameters. Their system requires an additional server and incurs high communication overhead. Aspects of data quality are not considered in this work either.
Bonawitz et al. [3] presented a scheme that securely aggregates data using a secret sharing scheme for privacy-preserving machine learning. However, their scheme has challenges with communication overhead. Other related works such as [12, 30] have employed differential privacy to hide statistical information during multiparty training. However, UPs are not considered in their designs.
3 Preliminaries
3.1 Homomorphic Encryption (HE)
HE is a form of encryption that allows algebraic operations to be correctly performed on ciphertexts with the result remaining in an encrypted form [14]. Several HE schemes have already been proposed [10]; in this work, we adopt the Paillier scheme [24], which is an additive HE scheme. The scheme is proved to be secure and has been widely used in privacy-preserving multiparty machine learning works. We summarize its properties as follows:
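The Paillier scheme comprises three algorithms: key generation, encryption, and decryption. The key generation algorithm generates the public and private keys. The public key is generated as $(n, g)$, where $p$, $q$ are two large primes with $n = pq$, and $g \in \mathbb{Z}^{*}_{n^2}$. Meanwhile, the private key is generated as $(\lambda, \mu)$, where $\lambda = \mathrm{lcm}(p-1, q-1)$ and $\mu = \left(L(g^{\lambda} \bmod n^2)\right)^{-1} \bmod n$, with $L(u) = (u-1)/n$.
The encryption algorithm is used to generate a ciphertext $c$. For a message $m \in \mathbb{Z}_n$, $c$ is produced as $c = g^{m} \cdot r^{n} \bmod n^2$, where $r \in \mathbb{Z}^{*}_{n}$ is chosen at random.
The decryption algorithm recovers $m$ from $c$. Given $c$, $m$ can be recovered as $m = L(c^{\lambda} \bmod n^2) \cdot \mu \bmod n$. The details can be found in [24].
The Paillier scheme supports unlimited homomorphic addition operations and limited multiplication operations. Thus, an addition operation on two encrypted messages $E(m_1)$ and $E(m_2)$ results in an encrypted sum of the two messages, i.e.,
$$E(m_1) \cdot E(m_2) = E(m_1 + m_2) \qquad (1)$$
where $E(\cdot)$ represents an encryption operation. Also, an encrypted message $E(m)$ raised to a power $k$ produces an encrypted product of $k$ and $m$, i.e.,
$$E(m)^{k} = E(k \cdot m) \qquad (2)$$
where $k$ is an unencrypted constant.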
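The two homomorphic properties can be checked with a small sketch. It follows the textbook Paillier construction rather than the library used in our experiments; the tiny primes, the choice of g as n + 1, and all function names are assumptions made for illustration only, and the parameters are far too small to be secure.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy Paillier key generation with tiny primes (illustration only, not secure)."""
    n = p * q
    g = n + 1  # a standard valid choice of g (assumption; the text leaves g general)
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    l_value = (pow(g, lam, n * n) - 1) // n            # L(u) = (u - 1) / n
    mu = pow(l_value, -1, n)                           # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:          # r must be invertible modulo n
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Equation 1: multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def scale_encrypted(pub, c, k):
    """Equation 2: raising a ciphertext to k multiplies the plaintext by k."""
    n, _ = pub
    return pow(c, k, n * n)

pub, priv = keygen()
c1, c2 = encrypt(pub, 15), encrypt(pub, 27)
print(decrypt(pub, priv, add_encrypted(pub, c1, c2)))    # decrypts to 15 + 27
print(decrypt(pub, priv, scale_encrypted(pub, c1, 3)))   # decrypts to 3 * 15
```

Because encryption is randomized, the ciphertexts differ from run to run while the decrypted results stay the same.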
3.2 Machine Learning
This section presents a brief review of the machine learning algorithms considered in this work: logistic regression and neural networks [15].
Logistic Regression
Given a data item $(x, y)$ with $x$ as the input and $y$ as the truth value, regression is learning a function $f$ such that $f(x) \approx y$ [23]. In logistic regression, for a binary classification problem, the output value is bounded between 0 and 1 through an activation function $\sigma$. Therefore, the function can be represented as $f(x) = \sigma(W^{T}x)$, where $W$ is the weight coefficient vector. The activation function used in logistic regression is the sigmoid, defined as $\sigma(z) = 1/(1 + e^{-z})$, which is shown in Figure 1(a). The cost function used in logistic regression is the cross-entropy function [23]. In this case, we simply write the cost function over the data item $(x, y)$ as $J(W, x, y)$.
Neural Networks
Neural networks generalize regression to learn more complicated relationships in datasets. Figure 1(b) is an example neural network with four layers: an input layer, two hidden layers, and an output layer. Each node in the network is referred to as a neuron and is associated with an activation function $\sigma$. Common activation functions used in neural networks include the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU).
The +1 node represents the bias. The nodes in neural networks are connected through the weight vectors $W$. Neural networks also have a cost function. Common examples of cost functions are the cross-entropy function and the squared error cost function [27]. For a data item $(x, y)$, we use the same representation $J(W, x, y)$ for the cost function in neural networks.
Thus, in any machine learning algorithm, the task is to determine the weight parameter $W$ that minimizes the cost function $J$ for a given dataset.
3.3 Stochastic Gradient Descent (SGD)
SGD is an algorithm widely used in machine learning for approaching a minimum of a function [23]. Given the weight parameters $W$ and a data item $(x, y)$, SGD updates the parameters as:
$$W := W - \alpha \cdot \frac{\partial J(W, x, y)}{\partial W} \qquad (3)$$
where $\alpha$ is the learning rate and $J$ is the cost function.
In practice, for efficiency, instead of selecting a single data item at a time, multiple data items are selected in the form of a matrix (X,Y). The matrix (X,Y) is referred to as a mini-batch. Thus, the parameter updates are based on mini-batches as:
$$W := W - \alpha \cdot \frac{\partial J(W, X, Y)}{\partial W} \qquad (4)$$
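As an illustration of the mini-batch update, the following sketch trains a logistic regression model with SGD. It is a minimal example under our own assumptions: the helper names, the learning rate, the batch size, and the four-item toy dataset (whose first feature is a constant 1 acting as the bias) are hypothetical and are not the settings used in our experiments.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(W, x):
    return sigmoid(sum(w * xi for w, xi in zip(W, x)))

def cost(W, X, Y):
    """Mean cross-entropy cost J(W, X, Y) over a mini-batch."""
    total = 0.0
    for x, y in zip(X, Y):
        p = predict(W, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(X)

def gradient(W, X, Y):
    """Gradient of the mean cross-entropy cost with respect to W."""
    grad = [0.0] * len(W)
    for x, y in zip(X, Y):
        err = predict(W, x) - y
        for j, xi in enumerate(x):
            grad[j] += err * xi / len(X)
    return grad

def sgd(W, data, alpha=0.1, batch_size=2, epochs=50, seed=0):
    rng = random.Random(seed)
    W, data = list(W), list(data)       # work on copies
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            X = [x for x, _ in batch]
            Y = [y for _, y in batch]
            # Mini-batch update: W := W - alpha * dJ/dW
            W = [w - alpha * g for w, g in zip(W, gradient(W, X, Y))]
    return W

# Toy dataset: the label is 1 when the second feature is positive.
data = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1), ([1.0, 2.0], 1)]
W = sgd([0.0, 0.0], data)
X_all, Y_all = [x for x, _ in data], [y for _, y in data]
print(cost([0.0, 0.0], X_all, Y_all), cost(W, X_all, Y_all))
```

The printed pair shows the cost before and after training; the second value should be lower than the first.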
3.4 Threat Model
In our proposed system, we assume that the model initiator is honest-but-curious, i.e., it follows the protocol as specified but might attempt to infer information from the encrypted data. The server is also honest-but-curious and non-colluding, i.e., on top of being honest-but-curious, it does not collude with any participant to reveal information. The participants are malicious, i.e., a participant might attempt to infer another participant’s private data or intentionally upload false parameters to the server.
3.5 Similarity Computation
Several similarity measurement techniques, such as Euclidean distance, Jaccard similarity, and cosine similarity, have already been used in different machine learning algorithms. In this section, we review the cosine similarity, which is of interest in this work. The cosine similarity between two F-dimensional vectors $A$ and $B$ is computed as:
$$\cos(A, B) = \frac{\sum_{i=1}^{F} A_i B_i}{\|A\| \, \|B\|} \qquad (5)$$
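with $A$ and $B$ in plaintext form. In an encrypted form, the cosine similarity can be computed as:
$$E(\cos(A, B)) = \prod_{i=1}^{F} E\left(\frac{A_i}{\|A\|}\right)^{B_i / \|B\|} \qquad (6)$$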
using the properties of the Paillier scheme discussed earlier. The cosine similarity computation outputs a value in the range [-1, 1], with 1 indicating total similarity between the two vectors and -1 indicating total dissimilarity between the vectors.
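A plaintext sketch of the cosine similarity is given below. It also shows that the score equals the dot product of the two unit-normalized vectors, which is the form that suits an additively homomorphic scheme, since it needs only additions and multiplications by plaintext constants. The function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.5]
direct = cosine_similarity(a, b)
# Same value, computed as a dot product of unit vectors.
via_units = sum(x * y for x, y in zip(normalize(a), normalize(b)))
print(direct, via_units)
```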
4 Our Proposed MultiParty Machine Learning System
In this section, we present our proposed privacypreserving multiparty machine learning system with reliability check. We discuss the system entities and their roles and provide security, efficiency and effectiveness analysis of the system elements. But first, we discuss the weight similarity metric.
4.1 Weight Similarity
For reliability check, we propose a new metric called the weight similarity to measure the similarity between the participants’ and the model initiator’s datasets. To give a background on the weight similarity, let us look at the following example.
Consider two functions $f(x_1)$ and $f(x_2)$. The gradients of the two functions $f(x_1)$ and $f(x_2)$ can be computed as $g_1 = f'(x_1)$ and $g_2 = f'(x_2)$, respectively. Next, consider another function $w = h(g)$. Substituting $g$ with $g_1$ and $g_2$ gives us:
$$w_1 = h(g_1), \qquad w_2 = h(g_2) \qquad (7)$$
Therefore, if $x_1 = x_2$ then $g_1 = g_2$ and $w_1 = w_2$.
The above example depicts the relationship between the data items, gradients and weights in multiparty machine learning systems. The parameters $x_1$ and $x_2$ depict the data items of, say, two participants $P_1$ and $P_2$. The functions $f(x_1)$ and $f(x_2)$ are the cost functions of $P_1$ and $P_2$, respectively. Also, $g_1$ and $g_2$ depict the gradients generated by the participants $P_1$ and $P_2$, respectively. The function $h$ depicts the weight update function of machine learning. Recall that, in a multiparty machine learning system, all the participants have the same model architecture. Thus, if the data items of the participants $P_1$ and $P_2$ are similar, there is a high likelihood that their generated gradients are similar. Consequently, a parallel weight update by the participants $P_1$ and $P_2$ (i.e., a weight parameter updated by two different participants independently) using their respective gradients results in two similar weights. We exploit this property to identify RPs and UPs in our proposed system, whose architecture is described in the next subsection.
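The intuition can be illustrated with a small, deterministic sketch. Three hypothetical participants start from the same weights and each runs a few local gradient steps of logistic regression: the first two hold similar data, while the third holds the first participant's inputs with flipped labels as a stand-in for poor quality data. The datasets, step count, and learning rate are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_update(W, X, Y, alpha=0.1, steps=5):
    """Run a few full-batch gradient steps of logistic regression from W."""
    W = list(W)
    for _ in range(steps):
        grad = [0.0] * len(W)
        for x, y in zip(X, Y):
            err = sigmoid(sum(w * xi for w, xi in zip(W, x))) - y
            for j, xi in enumerate(x):
                grad[j] += err * xi / len(X)
        W = [w - alpha * g for w, g in zip(W, grad)]
    return W

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

W0 = [0.0, 0.0]                                   # shared starting weights
X1, Y1 = [[1, 2], [2, 1], [3, 4], [4, 3]], [0, 0, 1, 1]
X2, Y2 = [[1.2, 2], [2, 1.1], [3, 4.1], [4.3, 3]], [0, 0, 1, 1]   # similar data
X3, Y3 = X1, [1, 1, 0, 0]                         # same inputs, flipped labels

W1 = local_update(W0, X1, Y1)
W2 = local_update(W0, X2, Y2)
W3 = local_update(W0, X3, Y3)

sim_reliable = cosine_similarity(W1, W2)      # close to 1
sim_unreliable = cosine_similarity(W1, W3)    # close to -1
print(sim_reliable, sim_unreliable)
```

In this toy setting the flipped labels reverse the update direction, so the weights of the third participant point away from those of the first, while the two participants with similar data end up with nearly parallel weights.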
4.2 System Architecture
The architecture of our proposed multiparty privacy-preserving machine learning with reliability check is depicted in Figure 2. The system consists of the following entities: a central server, a model initiator, and multiple participants (which include RPs and UPs). The model initiator collaborates with the server to initialize the system. Thereafter, the privacy-preserving learning process begins, during which the model initiator and participants upload their intermediate weight parameters to the server. The server uses these intermediate weights to update the previously set global parameters. During the update, the server uses weight similarity scores to filter out (exclude) the weight parameters from UPs. The weight similarity score is collaboratively computed by the entities in a privacy-preserving manner. The description of the entities is as follows:
Server
The central server stores encrypted global parameters and makes them available to the model initiator and the participants for download. It then receives encrypted weight parameters from both the model initiator and the participants. It also receives encrypted weight similarity computation components and blinded similarity scores from the model initiator and the participants, respectively. The server updates the global parameters with the received weight parameters. Depending on the similarity score and the set threshold value, the server might include (or exclude) a participant’s weight parameters during the global parameter updates, i.e., the server filters out weight parameters from UPs when updating the global parameters. We assume the server to be honest-but-curious, i.e., it follows the algorithm as specified, but it is curious about the data.
Model Initiator and Participants
The model initiator is an RP who sets the initial global parameters. The model initiator and the participants aim to learn a common model and thus, they share an identical model architecture and learning objective. They also share a homomorphic private key that is kept secret from the server. Therefore, E(.) depicts a homomorphic encryption operation, which protects the privacy of the exchanged parameters. The model initiator and the participants each keep their local datasets but only exchange encrypted intermediate parameters with the server. The exchanges happen at every communication round, which is decided by the server. For example, the server might schedule the model initiator and the participants to upload their intermediate parameters after every 10 local epochs. Thus, the training is done synchronously.
The model initiator runs two phases (an initialization phase and a learning phase), while the participants run only the learning phase.
The model initiator and the participants then enter the learning phase and perform local training with their private datasets using the SGD and upload their encrypted intermediate weight parameters to the server at every communication round. The model initiator and the participants compute an additional component used by the server to establish a weight similarity score for each participant’s uploaded weights. The server uses the similarity score to determine if a participant’s parameters should be included in the global parameter updates. Next, the server updates the global parameters and makes them available to the participants, and the model initiator for the local training to continue. We present the detailed procedures in subsequent sections.
4.3 The Model Initiator Side Procedure
We show the pseudocode executed on the model initiator side in Algorithm 1. The model initiator has its own local dataset for conducting training using the standard SGD; however, the dataset is insufficient. As stated in the previous subsection, the model initiator first initializes the weight parameters as $W_0$. It then encrypts the parameters using the HE algorithm discussed in section 3.1 as $E(W_0)$ and sends them to the server, where they are set as the global parameters $E(W_g)$ and made available for download by the participants. Note that the initialization phase is executed once.
Starting from $W_0$, the model initiator then trains its local model by running the standard SGD on its insufficient local dataset to generate the intermediate weight parameter $W_{MI}$, which it encrypts as $E(W_{MI})$. It also generates a weight similarity computation component $C_{MI}$, which it encrypts as $E(C_{MI})$ and uploads to the server together with $E(W_{MI})$.
Next, the model initiator downloads the updated global parameters $E(W_g)$ from the server during the subsequent communication rounds, decrypts them using its private key and continues with the training process using its local dataset until the model improvement is minimal and all the participants have stopped. A diagrammatic illustration of the learning process (especially after the first communication round) is presented in Figure 4.
Security Analysis I
Theorem 1 (Security against an honest-but-curious model initiator): An honest-but-curious model initiator learns no information about the participants’ private data from the received global parameters in Algorithm 1 (line 5).
Proof: The updated global parameters are computed from the intermediate weight parameters of the participants and thus, an honest-but-curious model initiator learns no information about the private data of the participants.
Remark 1 (Regarding privacy preservation via weights): In [27], Phong and Phuong proved that participants’ private data cannot be retrieved from intermediate weight parameters. Therefore, since the model initiator only receives the global parameters computed from the intermediate weights of the participants, it cannot retrieve the private data of any participant.
4.4 The Server Side Procedure
The pseudocode of our scheme on the server side is shown in Algorithm 2. As mentioned in the preceding subsections, the server first receives the initial weight parameters $E(W_0)$ from the model initiator. It then sets $E(W_0)$ as the global parameter $E(W_g)$, which it makes available to the participants for download. At each communication round, the server receives the intermediate parameters $E(W_{MI})$ and $E(C_{MI})$ from the model initiator. $E(W_{MI})$ is used during the global parameter update while $E(C_{MI})$ is used for secure weight similarity score computation.
The server then initiates a secure computation of the weight similarity score between the model initiator’s and the participants’ intermediate weight parameters. To achieve this, the server first blinds $E(C_{MI})$ by computing $E(C'_{MI}) = E(C_{MI})^{r}$, where $r$ is a nonzero random value. It then sends $E(C'_{MI})$ to the participants. The server waits for the participants’ computation and then receives an encrypted intermediate weight parameter $E(W_i)$ and a blinded weight similarity score $s'_i$ from each participant $i$.
The server then computes the final weight similarity score as:
$$s_i = \frac{s'_i}{r} \qquad (8)$$
After computing the weight similarity scores for all the participants, the server updates the global parameters $E(W_g)$ by averaging using Equation 9.
$$E(W_g) = \frac{1}{K+1}\left(E(W_{MI}) + \sum_{i=1}^{K} E(W_i)\right) \qquad (9)$$
where $K$ is the number of participants whose weight similarity scores are above the threshold value $\tau$, i.e., $E(W_i)$ is included in the parameter update if and only if the following condition holds for its corresponding weight similarity score,
$$s_i > \tau \qquad (10)$$
where $\tau$ is a threshold value, which can be fixed or increased dynamically at each communication round. The server finally makes the updated global parameters available for download by the participants and the model initiator to continue with their training processes.
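The selection and averaging logic on the server can be sketched as follows. To keep the example short, plain floating-point lists stand in for the ciphertexts; in the scheme itself the additions are carried out homomorphically on encrypted weights. The function name is hypothetical, and two details are assumptions of this sketch rather than statements about Algorithm 2: the model initiator's weights are always included in the average, and the comparison with the threshold is strict.

```python
def update_global(initiator_weights, participant_weights, scores, threshold):
    """Average the initiator's weights with those of participants whose
    similarity score exceeds the threshold. Returns (new_global, num_selected)."""
    selected = [w for w, s in zip(participant_weights, scores) if s > threshold]
    contributors = [initiator_weights] + selected
    k = len(contributors)
    new_global = [sum(column) / k for column in zip(*contributors)]
    return new_global, len(selected)

initiator = [1.0, 1.0]
participants = [[3.0, 1.0], [-9.0, -9.0]]   # the second one behaves like a UP
scores = [0.9, -1.0]                        # weight similarity scores
new_global, num_selected = update_global(initiator, participants, scores, 0.5)
print(new_global, num_selected)
```

Here the second participant's weights are filtered out, so the average is taken over the model initiator and the first participant only.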
Security Analysis II
Theorem 2 (Security against an honest-but-curious server): An honest-but-curious server learns no information about the trained model and the private datasets used in the training.
Proof: An honest-but-curious server only computes on the encrypted parameters from the model initiator and the participants. Therefore, it obtains no information about the model or the local datasets of the model initiator and the participants, since the encryption scheme used is secure.
Remark 2 (Regarding the Computation of Weight Similarity Score): The computation of weight similarity scores aims at identifying RPs. The computation involves an exchange of encrypted intermediate parameters between the model initiator and the participants through the server. Since the server does not have access to the private key, the exchanged intermediate parameters are kept secure from the server. It is only the final weight similarity score that gets revealed to the server.
4.5 The Participant Side Procedure
Algorithm 3 shows the pseudocode that the participants execute. As with the model initiator, each participant runs the standard SGD on its own local dataset. Each participant first downloads the global parameter $E(W_g)$ from the server and decrypts it with the shared homomorphic private key. Each participant $i$ then runs the standard SGD on its local training dataset to generate the intermediate weight parameter $W_i$ at each communication round.
To compute the weight similarity score, each participant receives $E(C'_{MI})$ from the server. Next, each participant generates an encrypted and blinded weight similarity score $E(s'_i)$ by computing on $E(C'_{MI})$ and $W_i$ according to Equation 6. It then decrypts $E(s'_i)$ and sends the blinded score $s'_i$ to the server together with its encrypted intermediate weights $E(W_i)$.
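The blinded similarity exchange between the three entities can be sketched end to end as follows. This is an illustration under stated assumptions, not the implementation used in our experiments: the toy Paillier code uses two small, publicly known Mersenne primes and g = N + 1, the similarity computation component is assumed to be the model initiator's unit-normalized weight vector, real values are mapped to integers with a hypothetical fixed-point factor SCALE, and all function names are our own.

```python
import math
import random

P, Q = 2**31 - 1, 2**61 - 1           # known Mersenne primes, far too small for real use
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                  # valid because g = N + 1 (assumption)
SCALE = 1000                          # fixed-point precision (assumption)

def encrypt(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    m = ((pow(c, LAM, N2) - 1) // N * MU) % N
    return m - N if m > N // 2 else m     # map back to signed integers

def encode_unit(v):
    """Unit-normalize a weight vector and convert it to fixed-point integers."""
    norm = math.sqrt(sum(x * x for x in v))
    return [round(SCALE * x / norm) for x in v]

def initiator_component(w_mi):
    """Model initiator: encrypt the normalized weights element by element."""
    return [encrypt(c) for c in encode_unit(w_mi)]

def server_blind(enc_component, r):
    """Server: blind each ciphertext by raising it to the random value r."""
    return [pow(c, r, N2) for c in enc_component]

def participant_blinded_score(blinded_component, w_i):
    """Participant: homomorphic dot product with its own normalized weights,
    followed by decryption with the shared private key."""
    acc = 1
    for c, b in zip(blinded_component, encode_unit(w_i)):
        acc = (acc * pow(c, b % N, N2)) % N2
    return decrypt(acc)

def server_unblind(blinded_score, r):
    """Server: remove the blinding factor and the fixed-point scaling."""
    return blinded_score / (r * SCALE * SCALE)

w_mi = [0.5, -1.0, 2.0]               # model initiator's weights
w_good = [0.6, -0.9, 2.1]             # weights of a participant with similar data
w_bad = [-0.5, 1.0, -2.0]             # weights pointing in the opposite direction

r = random.randrange(2, 1000)
blinded = server_blind(initiator_component(w_mi), r)
print(server_unblind(participant_blinded_score(blinded, w_good), r))
print(server_unblind(participant_blinded_score(blinded, w_bad), r))
```

The recovered scores match the plaintext cosine similarity up to the fixed-point rounding error, while the participant only ever sees the blinded component and the server only ever sees the final score.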
Each participant then waits for the server to update the global parameters and then downloads the updated global parameters $E(W_g)$ to continue with the training process using its local dataset. This is repeated until the accuracy improvement is minimal. However, a participant can decide to quit at any time. An illustration of the process is shown in Figure 5.
Security Analysis III
Theorem 3 (Security against a malicious participant): A malicious participant learns no information about the model initiator’s parameters during the weight similarity score computation procedure in Algorithm 3 (line 3). Also, a malicious participant learns no information about the private data of the model initiator and the other participants from the updated global parameters received from the server.
Proof: The similarity computation component forwarded to the participant is blinded by the server with a nonzero random value in our system. Therefore, a malicious participant obtains no information regarding the true parameter values of the model initiator. The proof for the data privacy of the model initiator and the other participants is similar to the one of the model initiator procedure.
Remark 3 (Regarding Computation of the Weight Similarity Score Component by the Participant): The computation and decryption of the blinded weight similarity scores by the participants aim at enabling the server to securely update the global parameters using intermediate weight parameters from only RPs without having access to the private key. However, this comes with additional computation and communication overhead. We leave the question of revealing the weight similarity score to the server with minimal computation and communication overhead open for future considerations.
4.6 Effectiveness and Efficiency Analysis
Effectiveness
Here, we analyze the effectiveness of our proposed metric, which minimizes the disruptions caused by participants with noisy data during privacy-preserving multiparty machine learning. Generally, the training of a machine learning model is a process of fine-tuning weights, starting from random weights. The fine-tuning is guided by gradients that drive the learning process towards the local optimal solutions. In similar datasets, these gradients are similar and the update directions of the weights are almost the same. However, in the presence of noise, some update directions might be reversed, which is what makes the weight parameters of some participants similar and those of others dissimilar in collaborative machine learning. Thus, setting a suitable weight similarity threshold from the beginning of the training minimizes the training disruption that would arise from the noisy data of UPs. This threshold value can be fixed or dynamically raised as the learning tends towards the optimal solution. Our proposed approach does not guarantee any accuracy improvement over centralized training, in which the datasets from all the participants are gathered centrally for training a model, but it guarantees improved convergence and reduced inaccuracies.
Efficiency
The efficiency of our proposed system can be analyzed from four perspectives. From the perspective of the model initiator, the privacy-preserving operation and the encryption of the similarity computation component can be designed to run in parallel, especially after the generation of the intermediate weight parameters, to speed up the process. Thus, the similarity computation has a reduced impact on the training efficiency of the model initiator.
From the perspective of the server, we employ the additively homomorphic Paillier algorithm, which supports addition of ciphertexts and only limited multiplication (a ciphertext can be multiplied by a plaintext constant). It is therefore more efficient than schemes that support arbitrary algebraic operations.
From the perspective of the participants, as with the model initiator, the privacy-preserving operation and the generation and decryption of the weight similarity score can be performed in parallel after the generation of the intermediate weight parameters. This minimizes the impact of the weight similarity computation on the training efficiency of the participants.
From the general view, all the participants and the model initiator can perform their local training on their local datasets simultaneously, which is equivalent to data parallelism. The main computational overhead arises only from the privacy-preserving operation using the Paillier algorithm. The computational demand of the weight similarity score generation can be minimized through parallel computation.
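The two homomorphic operations that Paillier offers (adding ciphertexts and multiplying a ciphertext by a plaintext constant) can be sketched with a toy implementation. This is a minimal illustration under stated assumptions: the primes 1009 and 1013 are far too small to be secure, the generator is fixed to g = n + 1, and the helper names (`keygen`, `encrypt`, `decrypt`, `add`, `mul_const`) are hypothetical. It is not the library used in our experiments.

```python
import math
import random

def keygen(p, q):
    """Toy key generation from two small primes (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt an integer m in [0, n) using generator g = n + 1."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Recover the plaintext as L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def add(n, c1, c2):
    """Ciphertext of m1 + m2: multiply the two ciphertexts modulo n^2."""
    return c1 * c2 % (n * n)

def mul_const(n, c, k):
    """Ciphertext of k * m for a plaintext constant k: raise the ciphertext to the power k."""
    return pow(c, k, n * n)

rng = random.Random(0)
n, priv = keygen(1009, 1013)
c1, c2 = encrypt(n, 12, rng), encrypt(n, 30, rng)
sum_plain = decrypt(n, priv, add(n, c1, c2))         # decrypts to 12 + 30
scaled_plain = decrypt(n, priv, mul_const(n, c1, 5))  # decrypts to 5 * 12
print(sum_plain, scaled_plain)
```

Multiplying two encrypted values with each other is not available in this scheme, which is the sense in which multiplication is limited.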
5 Experiments
To demonstrate the applicability of our proposed system, in this section we present the performance evaluation of our scheme with experiments on real-world datasets (MNIST [20] and CIFAR-10 [19]). For all the experiments, we employ a desktop computer with an Intel(R) Core(TM) i5-6500 3.20GHz CPU, a GeForce GT 710 GPU, and 16GB RAM, running the Ubuntu 20.04 operating system. The Paillier algorithm library in [8], with a key length of 1024 bits, is used in the experiments.
We mainly compare our scheme with the Phong and Phuong scheme [27] (referred to as the PP system for simplicity in this section), and two baselines (centralized and standalone). In the centralized baseline, the model is trained on a centralized good-quality dataset and is expected to achieve the best performance. Since there is no collaboration between different participants, no privacy-preserving mechanism is included during the training. In the standalone baseline, the model is trained using only the noise-free local dataset. Again, since there is no collaboration involved, no privacy-preserving mechanism is considered during the model training. A reliability check is not considered in the PP system [27]. Thus, during the experimentation, weight parameters from all the participants are considered indiscriminately.
There are two categories of participants simulated in our proposed system, i.e., reliable participants (RP) and unreliable participants (UP). The local dataset of an RP is similar to the local dataset of the model initiator, which is noise-free. Meanwhile, the local dataset of a UP is noisy, i.e., a fraction of the UP's local dataset is filled with noise. In all cases, the participants and the model initiator execute the same model architecture.
5.1 Experiments with the MNIST Dataset
In this section, we perform experiments with the MNIST dataset using a logistic regression model and a multilayer perceptron (MLP) model.
Datasets
The MNIST dataset comprises 28 x 28 grayscale handwritten digit images, with a training set of 60,000 images and a test set of 10,000 images. In our experiment, we simulate four (4) RPs, two (2) UPs, and a model initiator, and the training and test set images are divided amongst the participants and the model initiator. Each RP is allocated 10,000 samples of the training set and 1,660 samples of the test set. Similarly, the model initiator is allocated 10,000 samples of the training set and 1,660 samples of the test set. For the UPs, each participant's training set comprises 5,000 samples of the MNIST training set and 5,000 samples of noise data, and its test set comprises 850 samples of the MNIST test set and 810 samples of noise data. In this case, we employ the notMNIST dataset [4] as the noise data. The notMNIST dataset consists of grayscale images of the letters "A" to "J" formatted as 28 x 28 images, with a training set of 500,000 images and a test set of 19,000 images. For each UP, the 5,000 training and 810 test noise samples are randomly drawn from the notMNIST training and test sets, respectively. All the images are normalized and centered during the experiment.
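As a quick consistency check, the allocation above can be tallied to confirm that it uses the MNIST training and test sets exactly once. The variable names are illustrative; the counts are those stated in the text.

```python
NUM_RP, NUM_UP = 4, 2

clean_party = {"train": 10_000, "test": 1_660}  # each RP and the model initiator
up_mnist = {"train": 5_000, "test": 850}        # MNIST part of each UP's data
up_noise = {"train": 5_000, "test": 810}        # notMNIST part of each UP's data

splits = ("train", "test")

# MNIST samples consumed by the initiator, the RPs and the UPs together.
mnist_used = {s: (NUM_RP + 1) * clean_party[s] + NUM_UP * up_mnist[s] for s in splits}

# Size of each UP's local dataset and the share of it that is noise.
up_local = {s: up_mnist[s] + up_noise[s] for s in splits}
noise_fraction = {s: up_noise[s] / up_local[s] for s in splits}

print(mnist_used, up_local, noise_fraction)
```

Each UP therefore holds a local dataset of the same size as an RP, with about half of it replaced by noise.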
Using a Logistic Regression Model
We implemented a logistic regression model in Python using Theano 1.0.4. We set the random seeds as numpy.random.seed(139) and random.seed(1234). SGD is used with a fixed learning rate of 0.13 and a batch size of 128. All the participants and the model initiator run the same code. In a communication round, each participant and the model initiator run 5 local epochs before encrypting and uploading their intermediate parameters to the server. We set a fixed weight similarity score threshold of 0.05 during the training.
Results: The results of using the logistic regression model are shown in Figure 6. Figure 6(a) demonstrates the similarity between the participants' and the model initiator's weight parameters. The model initiator's and the RPs' weight parameters are highly similar, while the model initiator's and the UPs' weight parameters are less similar. Figure 6(b) and Figure 6(c) depict the training convergence against the number of communication rounds. Our proposed scheme converges faster than the baseline schemes and the PP system, which indiscriminately combines weight parameters from all the participants, including the UPs, during the global parameter updates. In contrast, our system only considers weight parameters from RPs during the global parameter updates, i.e., it only uses the weight parameters of participants whose weight similarity score with the model initiator is above the threshold.
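The difference between filtered and indiscriminate aggregation can be sketched as follows. This is a plaintext illustration only: in our scheme the weights are uploaded encrypted, and element-wise averaging, the function name `aggregate`, and the toy weights and scores are assumptions made for the example. Only the threshold of 0.05 is taken from the experiment above.

```python
def aggregate(weights_by_party, scores, threshold=None):
    """Average weight vectors element-wise.

    With a threshold, only parties whose similarity score is above it are used;
    with threshold=None, every party is included (no reliability check).
    """
    if threshold is None:
        selected = list(weights_by_party)
    else:
        selected = [p for p in weights_by_party if scores[p] > threshold]
    if not selected:
        raise ValueError("no participant passed the threshold")
    dim = len(weights_by_party[selected[0]])
    averaged = [
        sum(weights_by_party[p][i] for p in selected) / len(selected)
        for i in range(dim)
    ]
    return averaged, selected

# Hypothetical two-parameter models and similarity scores.
weights = {"RP1": [1.0, 2.0], "RP2": [3.0, 4.0], "UP1": [-8.0, 10.0]}
scores = {"RP1": 0.90, "RP2": 0.80, "UP1": 0.01}

filtered, kept = aggregate(weights, scores, threshold=0.05)
unfiltered, everyone = aggregate(weights, scores)
print(filtered, kept)
print(unfiltered, everyone)
```

In this toy case the unreliable participant drags the unfiltered average far from the reliable participants' weights, whereas the filtered average is unaffected by it.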






Using an MLP
We also implemented an MLP model in Python with TensorFlow 2.1.0, using two hidden layers of 64 neurons each. We used ReLU as the activation function for the hidden layers and sigmoid for the output layer. We set the random seeds as numpy.random.seed(12), random.seed(1234) and tensorflow.set_random_seed(12345). We used SGD with a fixed learning rate of and a batch size of 64. All the participants and the model initiator run the same code. We vary the number of local epochs run in each communication round. We also dynamically varied the similarity score threshold in the range 0.1 to 0.7, in intervals of 0.1, after every 100 communication rounds. This is because, as the model tends towards convergence, the parameters from RPs become more similar.
Results: The results of using an MLP are shown in Figure 7. Our proposed system converges faster than the baseline schemes and the PP system. However, when the number of local epochs is increased to 50, there is a slight drop in the accuracy of our system. This could be due to the improper mixing of parameters. The PP system, which does not filter out UPs, converges slowly and is the least accurate. The centralized baseline achieves the best accuracy, as expected. The standalone baseline is slightly less accurate because of its limited amount of training data. In terms of accuracy, our scheme is only slightly bettered by the centralized baseline.
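A dynamically raised threshold of this kind can be written as a simple step schedule. The sketch below assumes that the threshold starts at the lower bound in the first round, rises by one interval after every 100 communication rounds, and stays at the upper bound once it is reached. The text does not specify these details, and the function name `threshold_at` is hypothetical.

```python
def threshold_at(round_idx, start=0.1, stop=0.7, step=0.1, every=100):
    """Similarity threshold for a 0-based communication round.

    The value rises by `step` after every `every` rounds and is capped at `stop`.
    """
    level = round_idx // every
    # Rounding removes floating-point drift such as 0.30000000000000004.
    return min(stop, round(start + level * step, 10))

schedule = [threshold_at(r) for r in (0, 99, 100, 250, 600, 1000)]
print(schedule)
```

Passing other bounds and intervals to the same function covers other schedules of the same shape.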
5.2 Experiments with the CIFAR-10 Dataset
We also perform experiments with the CIFAR-10 dataset using a logistic regression model and an MLP model.
Datasets
The CIFAR-10 dataset consists of 60,000 RGB images of 10 different classes formatted as 32 x 32 x 3. 50,000 of these images form the training set, while the remaining 10,000 form the test set. In this experiment, we simulate three (3) RPs, a single UP, and a model initiator. Each RP holds 11,000 samples of the CIFAR-10 training set as its local training set and 2,200 samples of the CIFAR-10 test set as its local test set. Similarly, the model initiator holds 11,000 samples of the CIFAR-10 training set as its local training set and 2,200 samples of the CIFAR-10 test set as its local test set. The UP holds the remaining 6,000 samples of the CIFAR-10 training set and 5,000 samples of a noise dataset as its local training set, and the remaining 1,200 samples of the CIFAR-10 test set and 1,000 samples of a noise dataset as its local test set. As in the first case, we employ the notMNIST dataset as the noise data. However, the notMNIST images are formatted as 28 x 28; thus, to use them with the CIFAR-10 dataset, we padded them with zeros to obtain the same dimensionality as that of CIFAR-10. The 5,000 training and 1,000 test noise samples are therefore extracted from the padded notMNIST dataset. All the images are normalized and centered.
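One way to perform such zero padding is sketched below in plain Python. The text only states that the images are padded with zeros to the CIFAR-10 dimensionality, so the centered placement and the copying of the grayscale values into all three channels are assumptions, as is the function name `pad_to_cifar_shape`.

```python
def pad_to_cifar_shape(image, size=32, channels=3):
    """Zero-pad a grayscale image (list of rows) to size x size x channels.

    Assumptions: the image is centered in the padded frame and its values are
    copied into every channel.
    """
    height, width = len(image), len(image[0])
    top = (size - height) // 2
    left = (size - width) // 2
    padded = [[0.0] * size for _ in range(size)]
    for r in range(height):
        for c in range(width):
            padded[top + r][left + c] = image[r][c]
    # Expand each pixel into a list of identical channel values.
    return [[[padded[r][c]] * channels for c in range(size)] for r in range(size)]

# Hypothetical 28 x 28 noise image with every pixel set to 1.0.
noise_image = [[1.0] * 28 for _ in range(28)]
padded_image = pad_to_cifar_shape(noise_image)
shape = (len(padded_image), len(padded_image[0]), len(padded_image[0][0]))
print(shape)
```

With centered padding, a 28 x 28 image receives a two-pixel border of zeros on every side.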
Using a Logistic Regression Model
Using Python and Theano 1.0.4, we implemented a logistic regression model to demonstrate the applicability of our system on the CIFAR-10 dataset. In the experiment, random seeds are set as numpy.random.seed(15) and random.seed(123). We used SGD in the learning process with a fixed learning rate of 0.01 and a batch size of 64 data items. All the participants and the model initiator execute the same model on their local datasets and upload their encrypted intermediate results to the server after every 5 local epochs. We set a fixed similarity score threshold of 0.03 for global parameter updates in this experiment.
Results: The results of the above experimental settings are shown in Figure 8. Figure 8(a) depicts the weight similarity score between the participants' and the model initiator's weight parameters against the communication rounds. As expected, the UP's and the model initiator's weight parameters are the least similar. The RPs' parameters are highly similar to the model initiator's parameters, since their local models generate similar gradients that affect the weight updates similarly. Figure 8(b) and Figure 8(c) depict the training convergence of the model against the number of communication rounds. Our proposed system converges faster and achieves an accuracy closer to that of the centralized baseline. The standalone baseline has limited data and hence lower accuracy. The PP system does not filter out UPs during the computation of global parameters and, as a result, it converges slowly and achieves the lowest accuracy.






Using an MLP
We also evaluated our system on the CIFAR-10 dataset by implementing an MLP model with two hidden layers of 64 and 128 neurons, respectively. The implementation was done with Python and TensorFlow 2.1.0. The ReLU activation function was used for the hidden layers and the sigmoid for the output layer. In the experiment, the random seeds are set as follows: numpy.random.seed(15), random.seed(123) and tensorflow.set_random_seed(12345). SGD was used for learning with a batch size of 32 data items and a learning rate of . All the participants and the model initiator execute the same model on their local datasets and upload their encrypted intermediate results to the server after a varying number of local epochs. During the global parameter updates by the server, we dynamically varied the similarity score threshold in the range 0.05 to 0.95, in intervals of 0.15, after every 100 communication rounds.
Results: The results of the MLP model with the CIFAR-10 dataset are presented in Figure 9, which depicts the training convergence of our system, the PP system, and the baseline schemes against the communication rounds for different numbers of local epochs. In all the cases, our proposed system converges faster than all the other systems and attains an accuracy bettered only by the centralized baseline. However, with 50 local epochs, the accuracy of our system drops slightly and, as stated earlier, this could be due to improper mixing of parameters. The centralized baseline, as expected, achieves the best accuracy. The standalone baseline is less accurate than ours, mainly because of its limited amount of training data. The PP system does not filter out UPs and indiscriminately includes parameters from all the participants during the global parameter updates, which results in its slow convergence and the lowest accuracy.
5.3 Results of Similarity Computation
In Table 1, we present the execution time of the weight similarity score computation by the entities in our proposed scheme. We observe that the model initiator incurs the most computation overhead during the joint weight similarity score computation, which is mainly due to the encryption of its similarity computation component. The server simply performs a blinding operation and hence has the least computation overhead. The overhead for participants is associated with the multiplication operations they perform on the encrypted similarity computation components of the model initiator and the decryption of the final similarity score. Generally, the computation overhead is more for the neural network models as compared to the logistic regression models. This is because of the larger number of parameters associated with neural network models as compared to the logistic regression models.
Dataset and Model      MNIST                                    CIFAR-10
                       Logistic Regression   Neural Network     Logistic Regression   Neural Network
Model Initiator (s)    3.14                  8.33               3.14                  8.86
Participant (s)        2.36                  6.98               2.38                  7.52
Server (s)             0.19                  0.63               0.19                  0.65

6 Conclusion
In this work, we propose a multiparty privacy-preserving machine learning scheme that takes into account the data quality of the participants. The scheme utilizes the proposed weight similarity metric to filter out unreliable participants and integrates homomorphic encryption to prevent leakages to the server. In addition, participants upload their intermediate weights instead of gradients to prevent leakages to malicious participants. Therefore, our scheme is beneficial for privacy-preserving machine learning in environments where data quality matters. This work opens several directions for future investigation. For instance, the scheme is designed to run synchronously; an asynchronous design is a possible extension. Further investigation of the weight similarity under different untrainable parameters is another. Reducing the computation burden on the model initiator and the participants during the weight similarity computation also deserves more attention.
Footnotes
 However, the model initiator’s and the participants’ learning phases are not entirely identical.
 Where, which forms the power part of Equation 6 is an element of , is the similarity computation component of a participant and .
References
[1] (2016) Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech. Neural Networks 78, pp. 97–111.
[2] (2016) Privacy-preserving logistic regression with distributed data sources via homomorphic encryption. IEICE Transactions on Information and Systems 99 (8), pp. 2079–2089.
[3] (2017) Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191.
[4] (2011) notMNIST dataset. Google (Books/OCR), Tech. Rep. [Online]. Available: http://yaroslavvb.blogspot.it/2011/09/notmnist-dataset.html
[5] (2018) Asymmetric joint learning for heterogeneous face recognition. In Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, pp. 6682–6689.
[6] (2020) A federated learning framework for privacy-preserving and parallel training. arXiv preprint arXiv:2001.09782.
[7] (2017) Accurate and reproducible invasive breast cancer detection in whole-slide images: a deep learning approach for quantifying tumor extent. Scientific Reports 7, pp. 46450.
[8] (2013) Python Paillier library. GitHub. https://github.com/data61/python-paillier
[9] (2019) ArcFace: additive angular margin loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4690–4699.
[10] (2007) A survey of homomorphic encryption for nonspecialists. EURASIP Journal on Information Security 2007 (1), pp. 013801.
[11] (2015) Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333.
[12] (2017) Differentially private federated learning: a client level perspective. arXiv preprint arXiv:1712.07557.
[13] (2016) CryptoNets: applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pp. 201–210.
[14] (2020) Privacy-enhanced multi-party deep learning. Neural Networks 121, pp. 484–496.
[15] (2009) The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media.
[16] (2017) Deep models under the GAN: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, pp. 603–618.
[17] General Data Protection Regulation (GDPR). Accessed: Jun. 29, 2020.
[18] (2016) Weighted naive Bayes classifier: a predictive model for breast cancer detection. International Journal of Computer Applications 133 (9), pp. 32–37.
[19] (2014) The CIFAR-10 dataset. Online: http://www.cs.toronto.edu/~kriz/cifar.html
[20] (2010) MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist
[21] (2016) Speech emotion recognition using convolutional and recurrent neural networks. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 1–4.
[22] (2019) Non-interactive privacy-preserving neural network prediction. Information Sciences 481, pp. 507–519.
[23] (2017) SecureML: a system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 19–38.
[24] (1999) Public-key cryptosystems based on composite degree residuosity classes. In International Conference on the Theory and Applications of Cryptographic Techniques, pp. 223–238.
[25] (2010) Multiparty differential privacy via aggregation of locally trained classifiers. In Advances in Neural Information Processing Systems, Vancouver, British Columbia, Canada, pp. 1876–1884.
[26] (2018) Privacy-preserving deep learning via additively homomorphic encryption. IEEE Transactions on Information Forensics and Security 13 (5), pp. 1333–1345.
[27] (2019) Privacy-preserving deep learning via weight transmission. IEEE Transactions on Information Forensics and Security 14 (11), pp. 3003–3015.
[28] (2015) Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, Colorado, USA, pp. 1310–1321.
[29] (2017) Mastering the game of Go without human knowledge. Nature 550 (7676), pp. 354–359.
[30] (2020) Federated learning with differential privacy: algorithms and performance analysis. IEEE Transactions on Information Forensics and Security.
[31] (2019) DeepPAR and DeepDPA: privacy preserving and asynchronous deep learning for industrial IoT. IEEE Transactions on Industrial Informatics 16 (3), pp. 2081–2090.
[32] (2017) Private, yet practical, multiparty deep learning. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, USA, pp. 1442–1452.
[33] (2020) The secret revealer: generative model-inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 253–261.
[34] (2019) Privacy-preserving collaborative deep learning with unreliable participants. IEEE Transactions on Information Forensics and Security 15, pp. 1486–1500.