# Improving the Privacy and Accuracy of ADMM-Based Distributed Algorithms


###### Abstract

Alternating direction method of multiplier (ADMM) is a popular method used to design distributed versions of a machine learning algorithm, whereby local computations are performed on local data with the output exchanged among neighbors in an iterative fashion. During this iterative process the leakage of data privacy arises. A differentially private ADMM was proposed in prior work (Zhang & Zhu, 2017) where only the privacy loss of a single node during one iteration was bounded, a method that makes it difficult to balance the tradeoff between the utility attained through distributed computation and privacy guarantees when considering the total privacy loss of all nodes over the entire iterative process. We propose a perturbation method for ADMM where the perturbed term is correlated with the penalty parameters; this is shown to improve the utility and privacy simultaneously. The method is based on a modified ADMM where each node independently determines its own penalty parameter in every iteration and decouples it from the dual updating step size. The condition for convergence of the modified ADMM and the lower bound on the convergence rate are also derived.

## 1 Introduction

Distributed machine learning is crucial in settings where the data is held by multiple parties or where the quantity of data prohibits processing at a central location. It reduces computational complexity and improves both the robustness and the scalability of data processing. In a distributed setting, multiple entities/nodes collaboratively work toward a common optimization objective through an interactive process of local computation and message passing, which ideally should result in all nodes converging to a global optimum. Existing approaches to decentralizing an optimization problem primarily consist of subgradient-based algorithms (Nedic et al., 2008; Nedic & Ozdaglar, 2009; Lobel & Ozdaglar, 2011), ADMM-based algorithms (Wei & Ozdaglar, 2012; Ling & Ribeiro, 2014; Shi et al., 2014; Zhang & Kwok, 2014; Ling et al., 2016), and composites of subgradient and ADMM (Bianchi et al., 2014). It has been shown that ADMM-based algorithms can converge at the rate of $O(1/k)$ while subgradient-based algorithms typically converge at the rate of $O(1/\sqrt{k})$, where $k$ is the number of iterations (Wei & Ozdaglar, 2012). In this study, we focus solely on ADMM-based algorithms.

The information exchanged over the iterative process gives rise to privacy concerns if the local training data is proprietary to each node, especially when it contains sensitive information such as medical or financial records, web search history, and so on. It is therefore highly desirable to ensure such iterative processes are privacy-preserving.

A widely used notion of privacy is $\epsilon$-differential privacy; it is generally achieved by perturbing the algorithm such that the probability distribution of its output is relatively insensitive to any change to a single record in the input (Dwork, 2006). Several differentially private distributed algorithms have been proposed, including (Hale & Egerstedty, 2015; Huang et al., 2015; Han et al., 2017; Zhang & Zhu, 2017; Bellet et al., 2017). While a number of such studies have been done for (sub)gradient-based algorithms, the same is much harder for ADMM-based algorithms due to their computational complexity, stemming from the fact that each node must solve an optimization problem in each iteration. To the best of our knowledge, only (Zhang & Zhu, 2017) applies differential privacy to ADMM, where the noise is either added to the dual variable (dual variable perturbation) or the primal variable (primal variable perturbation) in the ADMM updates. However, (Zhang & Zhu, 2017) could only bound the privacy loss of a single iteration. Since an attacker can potentially use all intermediate results to perform inference, the privacy loss accumulates over the iterative process. As a result, balancing the tradeoff between the utility of the algorithm and its privacy over the entire computational process becomes difficult with the existing method.

In this study we propose a perturbation method that simultaneously improves the accuracy and privacy of ADMM. We start with a modified version of ADMM whereby each node independently decides its own penalty parameter in each iteration; this parameter may also differ from the dual updating step size. For this modified ADMM we establish conditions for convergence and derive a lower bound on the convergence rate. We then present a penalty perturbation method to provide differential privacy. Our numerical results show that under this method, by increasing the penalty parameter over iterations, we can achieve a stronger privacy guarantee as well as better algorithmic performance, i.e., more stable convergence and higher accuracy.

The remainder of the paper is organized as follows. We present the problem formulation and the definitions of differential privacy and ADMM in Section 2, and a modified ADMM algorithm along with its convergence analysis in Section 3. A private version of this ADMM algorithm is introduced in Section 4, with numerical results in Section 5. Discussions are given in Section 6, and Section 7 concludes the paper.

## 2 Preliminaries

### 2.1 Problem Formulation

Consider a connected network (one in which every node is reachable via a path from every other node) given by an undirected graph $G = (\mathcal{N}, \mathcal{E})$, which consists of a set of nodes $\mathcal{N}$ with $|\mathcal{N}| = N$ and a set of edges $\mathcal{E}$. Two nodes can exchange information if and only if they are connected by an edge. Let $\mathcal{V}_i$ denote node $i$'s set of neighbors, excluding itself. Node $i$ holds a dataset $D_i = \{(x_i^n, y_i^n)\}_{n=1}^{B_i}$, where $x_i^n \in \mathbb{R}^d$ is the feature vector representing the $n$-th sample belonging to node $i$, $y_i^n \in \{-1, 1\}$ the corresponding label, and $B_i$ the size of $D_i$.

Consider the regularized empirical risk minimization (ERM) problem for binary classification defined as follows:

$$\min_{f_c} \; C \sum_{i=1}^{N} \sum_{n=1}^{B_i} \mathscr{L}\big(y_i^n f_c^T x_i^n\big) + \rho\, R(f_c) \qquad (1)$$

where $C$ and $\rho$ are constant parameters of the algorithm, the loss function $\mathscr{L}(\cdot)$ measures the accuracy of the classifier, and the regularizer $R(\cdot)$ helps to prevent overfitting. The goal is to train a (centralized) classifier $f_c$ over the union of all local datasets in a distributed manner using ADMM, while providing a privacy guarantee for each data sample. (The proposed penalty perturbation method is not limited to classification problems; it can be applied to general ADMM-based distributed algorithms since the convergence and privacy analyses in Sections 3 & 4 remain valid.)
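As a concrete illustration of (1), the sketch below instantiates it with the logistic loss and an $L_2$ regularizer, the choices later used in Section 5; the function names and constant values are illustrative only.

```python
import numpy as np

def logistic_loss(z):
    # numerically stable log(1 + exp(-z))
    return np.logaddexp(0.0, -z)

def erm_objective(f, X, y, C, rho):
    """Regularized ERM objective as in (1): C * sum_n L(y_n * f^T x_n) + rho * R(f),
    here with R(f) = 0.5 * ||f||^2 as one common choice."""
    margins = y * (X @ f)
    return C * np.sum(logistic_loss(margins)) + rho * 0.5 * float(f @ f)
```

At $f = 0$ every margin is zero, so the objective equals $C \cdot N_{\text{samples}} \cdot \log 2$, a quick sanity check of the implementation.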

### 2.2 Conventional ADMM

To decentralize (1), let $f_i$ be the local classifier of each node $i$. To achieve consensus, i.e., $f_1 = f_2 = \cdots = f_N$, a set of auxiliary variables $\{w_{ij} \mid i \in \mathcal{N}, j \in \mathcal{V}_i\}$ are introduced for every pair of connected nodes. As a result, (1) is reformulated equivalently as:

$$\begin{aligned}
\min_{\{f_i\},\{w_{ij}\}} \;& C \sum_{i=1}^{N} \sum_{n=1}^{B_i} \mathscr{L}\big(y_i^n f_i^T x_i^n\big) + \frac{\rho}{N}\sum_{i=1}^{N} R(f_i) \\
\text{s.t. } \;& f_i = w_{ij}, \quad w_{ij} = f_j, \quad \forall i \in \mathcal{N},\; j \in \mathcal{V}_i
\end{aligned} \qquad (2)$$

where $\{w_{ij}\}$ are the auxiliary consensus variables. The objective in (2) can be solved using ADMM. Let $O(D_i, f_i)$ be the shorthand for $C \sum_{n=1}^{B_i} \mathscr{L}(y_i^n f_i^T x_i^n) + \frac{\rho}{N} R(f_i)$; let $\lambda_{ij}^a$, $\lambda_{ij}^b$ be the dual variables corresponding to the equality constraints $f_i = w_{ij}$ and $w_{ij} = f_j$ respectively, with $\eta$ the penalty parameter. Then the augmented Lagrangian is as follows:

$$\begin{aligned}
L_\eta\big(\{f_i\},\{w_{ij}\},\{\lambda_{ij}^a,\lambda_{ij}^b\}\big) = \sum_{i=1}^{N} O(D_i, f_i)
+ \sum_{i=1}^{N}\sum_{j\in\mathcal{V}_i}\Big[(\lambda_{ij}^a)^T (f_i - w_{ij}) + (\lambda_{ij}^b)^T (w_{ij} - f_j)\Big] \\
+ \frac{\eta}{2}\sum_{i=1}^{N}\sum_{j\in\mathcal{V}_i}\Big(\|f_i - w_{ij}\|_2^2 + \|w_{ij} - f_j\|_2^2\Big)
\end{aligned} \qquad (3)$$

In the $t$-th iteration, the ADMM updates consist of the following:

$$f_i(t+1) = \arg\min_{f_i}\; L_\eta\big(\{f_i\},\{w_{ij}(t)\},\{\lambda_{ij}^a(t),\lambda_{ij}^b(t)\}\big) \qquad (4)$$

$$w_{ij}(t+1) = \arg\min_{w_{ij}}\; L_\eta\big(\{f_i(t+1)\},\{w_{ij}\},\{\lambda_{ij}^a(t),\lambda_{ij}^b(t)\}\big) \qquad (5)$$

$$\lambda_{ij}^a(t+1) = \lambda_{ij}^a(t) + \eta\,\big(f_i(t+1) - w_{ij}(t+1)\big) \qquad (6)$$

$$\lambda_{ij}^b(t+1) = \lambda_{ij}^b(t) + \eta\,\big(w_{ij}(t+1) - f_j(t+1)\big) \qquad (7)$$

Using Lemma 3 in (Forero et al., 2010), if the dual variables $\lambda_{ij}^a$ and $\lambda_{ij}^b$ are initialized to zero for all node pairs $(i,j)$, then $\lambda_{ij}^a(t) = \lambda_{ij}^b(t)$ and $w_{ij}(t) = \frac{1}{2}\big(f_i(t) + f_j(t)\big)$ hold for all iterations $t \geq 1$. Letting $\lambda_i(t) := \sum_{j\in\mathcal{V}_i} \lambda_{ij}^a(t)$, the updates (4)-(7) then simplify to:

$$f_i(t+1) = \arg\min_{f_i}\; O(D_i, f_i) + 2\lambda_i(t)^T f_i + \eta \sum_{j\in\mathcal{V}_i} \Big\| f_i - \frac{f_i(t)+f_j(t)}{2} \Big\|_2^2 \qquad (8)$$

$$\lambda_i(t+1) = \lambda_i(t) + \frac{\eta}{2} \sum_{j\in\mathcal{V}_i} \big( f_i(t+1) - f_j(t+1) \big) \qquad (9)$$
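The simplified per-node updates (8)-(9) referenced in the text can be sanity-checked on a toy problem with quadratic local objectives $O_i(f) = \frac{1}{2}\|f - b_i\|^2$, for which the primal step has a closed form; the graph and parameter values below are illustrative only.

```python
import numpy as np

def simplified_admm(adj, b, eta, T):
    """Non-private simplified ADMM (8)-(9) with quadratic local objectives
    O_i(f) = 0.5 * ||f - b_i||^2, so the primal step is closed form.
    adj: list of neighbor lists; b: (N, d) local targets; returns (N, d)."""
    N, d = b.shape
    f = np.zeros((N, d))
    lam = np.zeros((N, d))
    for _ in range(T):
        f_new = np.empty_like(f)
        for i, nbrs in enumerate(adj):
            # argmin_f 0.5||f - b_i||^2 + 2 lam_i^T f + eta * sum_j ||f - (f_i(t)+f_j(t))/2||^2
            s = sum(f[i] + f[j] for j in nbrs)        # sum_j (f_i(t) + f_j(t))
            f_new[i] = (b[i] - 2.0 * lam[i] + eta * s) / (1.0 + 2.0 * eta * len(nbrs))
        for i, nbrs in enumerate(adj):                 # dual update (9)
            lam[i] += (eta / 2.0) * sum(f_new[i] - f_new[j] for j in nbrs)
        f = f_new
    return f

# 4-node ring; the consensus optimum of sum_i 0.5||f - b_i||^2 is the average of the b_i
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
b = np.array([[0.0], [1.0], [2.0], [3.0]])
f_final = simplified_admm(adj, b, eta=1.0, T=800)
```

All nodes should converge to the global optimum, the mean of the local targets.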

### 2.3 Differential Privacy

Differential privacy (Dwork, 2006) quantitatively measures the privacy risk of each individual sample in a dataset. Mathematically, a randomized algorithm $M$ taking a dataset $D$ as input satisfies $\epsilon$-differential privacy if, for any two datasets $D$, $\hat{D}$ differing in at most one data point, and for any set of possible outputs $S$, $\Pr(M(D) \in S) \leq e^{\epsilon} \Pr(M(\hat{D}) \in S)$ holds. We call two datasets differing in at most one data point neighboring datasets. The above definition suggests that for a sufficiently small $\epsilon$, an adversary observes almost the same output regardless of the presence (or value change) of any one individual in the dataset; this is what provides privacy protection for that individual.
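The definition can be illustrated numerically with the classic Laplace mechanism (not the mechanism of this paper): for a count query of sensitivity 1, Laplace noise of scale $1/\epsilon$ keeps the output-density ratio on neighboring datasets within $e^{\epsilon}$. The counts below are made up for illustration.

```python
import numpy as np

def laplace_pdf(x, mu, b):
    return np.exp(-np.abs(x - mu) / b) / (2.0 * b)

eps = 0.5
true_D, true_Dhat = 10.0, 11.0    # query answers on neighboring datasets (differ by one record)
b = 1.0 / eps                      # Laplace scale for a sensitivity-1 query
xs = np.linspace(-20.0, 40.0, 1001)
# ratio of output densities under D vs. D-hat; epsilon-DP requires it stays <= e^eps
ratio = laplace_pdf(xs, true_D, b) / laplace_pdf(xs, true_Dhat, b)
max_ratio = float(ratio.max())
```

The maximum ratio equals $e^{\epsilon}$ exactly, attained wherever both residuals have the same sign.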

### 2.4 Private ADMM proposed in (Zhang & Zhu, 2017)

Two randomizations were proposed in (Zhang & Zhu, 2017): (i) dual variable perturbation, where each node adds a random noise to its dual variable $\lambda_i(t)$ before updating its primal variable using (8) in each iteration; and (ii) primal variable perturbation, where after updating its primal variable $f_i(t+1)$, each node adds a random noise to it before broadcasting to its neighbors. Both were evaluated for a single iteration under a fixed privacy constraint. As we will see in the numerical experiments, the privacy loss accumulates significantly when inspected over multiple iterations.

In contrast, in this study we explore the use of the penalty parameter to provide privacy. In particular, we allow it to be private information of each node, i.e., each node $i$ decides its own $\eta_i(t)$ in every iteration, and it is not exchanged among the nodes. Below we begin by modifying the ADMM to accommodate private penalty terms.

## 3 Modified ADMM (M-ADMM)

### 3.1 Making $\eta_i(t)$ a node's private information

Conventional ADMM (Boyd et al., 2011) requires that the penalty parameter $\eta$ be fixed and equal to the dual updating step size for all nodes in all iterations. Varying the penalty parameter to accelerate convergence in ADMM has been proposed in the literature. For instance, (He et al., 2002; Magnússon et al., 2014; Aybat & Iyengar, 2015; Xu et al., 2016) vary this penalty parameter in every iteration but keep it the same for different equality constraints in (2). In (Song et al., 2016; Zhang & Wang, 2017) this parameter varies in each iteration and is allowed to differ across equality constraints. However, all of these modifications are based on the original ADMM (Eqn. (4)-(7)) and not on the simplified version (Eqn. (8)-(9)); the significance of this difference is discussed below in the context of the privacy requirement. Moreover, we decouple the penalty parameter from the dual updating step size, denoted $\theta$ below. For simplicity, $\theta$ is fixed for all nodes in our analysis, but it can also be private information, as we show in the numerical experiments.

First consider replacing $\eta$ with a per-node, per-iteration penalty $\eta_i(t+1)$ in Eqn. (4)-(5) of the original ADMM (as is done in (Song et al., 2016; Zhang & Wang, 2017)) and replacing $\eta$ with the dual step size $\theta$ in Eqn. (6)-(7), so that each node applies its own penalty in the primal updates while all nodes share the dual step size.

This, however, violates our requirement that $\eta_i(t+1)$ be node $i$'s private information, since it would be needed by node $i$'s neighbors to perform the above computation. To resolve this, we instead start from the simplified ADMM, modifying Eqn. (8)-(9):

$$f_i(t+1) = \arg\min_{f_i}\; O(D_i, f_i) + 2\lambda_i(t)^T f_i + \eta_i(t+1) \sum_{j\in\mathcal{V}_i} \Big\| f_i - \frac{f_i(t)+f_j(t)}{2} \Big\|_2^2 \qquad (10)$$

$$\lambda_i(t+1) = \lambda_i(t) + \frac{\theta}{2} \sum_{j\in\mathcal{V}_i} \big( f_i(t+1) - f_j(t+1) \big) \qquad (11)$$

where $\eta_i(t+1)$ is now node $i$'s private information. Indeed, $\eta_i(t+1)$ is no longer purely a penalty parameter related to any equality constraint in the original sense; we will nevertheless refer to it as the private penalty parameter for simplicity. The above constitutes the M-ADMM algorithm.
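A minimal sketch of the M-ADMM updates (10)-(11) on the same kind of toy quadratic problem used earlier, with per-node penalties decoupled from the dual step size $\theta$; the graph and parameter values are illustrative only (here the $\eta_i$ are held constant over iterations for simplicity).

```python
import numpy as np

def m_admm(adj, b, etas, theta, T):
    """M-ADMM (10)-(11) with quadratic local objectives O_i(f) = 0.5*||f - b_i||^2.
    etas[i] is node i's private penalty; theta is the common dual step size."""
    N, d = b.shape
    f = np.zeros((N, d))
    lam = np.zeros((N, d))
    for _ in range(T):
        f_new = np.empty_like(f)
        for i, nbrs in enumerate(adj):
            # primal update (10): closed form for the quadratic local objective
            s = sum(f[i] + f[j] for j in nbrs)
            f_new[i] = (b[i] - 2.0 * lam[i] + etas[i] * s) / (1.0 + 2.0 * etas[i] * len(nbrs))
        for i, nbrs in enumerate(adj):
            # dual update (11) uses theta, not the node's private penalty
            lam[i] += (theta / 2.0) * sum(f_new[i] - f_new[j] for j in nbrs)
        f = f_new
    return f

adj = [[1, 3], [0, 2], [1, 3], [0, 2]]          # 4-node ring
b = np.array([[0.0], [1.0], [2.0], [3.0]])
f_final = m_admm(adj, b, etas=[1.0, 1.1, 1.2, 1.3], theta=1.0, T=800)
```

Despite heterogeneous penalties, the fixed point is still the consensus optimum (the mean of the $b_i$), since at a fixed point the dual update forces consensus and the dual variables sum to zero.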

### 3.2 Convergence Analysis

We next show that the M-ADMM (Eqn. (10)-(11)) converges to the optimal solution under a set of common technical assumptions. Our proof is based on the method given in (Ling et al., 2016).

Assumption 1: The function $O(D_i, f_i)$ is convex and continuously differentiable in $f_i$, for all $i$.

Assumption 2: The solution set to the original ERM problem (1) is nonempty and there exists at least one bounded element.

The KKT optimality condition of the primal update (10) is:

$$\nabla O\big(D_i, f_i(t+1)\big) + 2\lambda_i(t) + 2\eta_i(t+1)\sum_{j\in\mathcal{V}_i}\Big( f_i(t+1) - \frac{f_i(t)+f_j(t)}{2} \Big) = 0 \qquad (12)$$

We next rewrite (11)-(12) in matrix form. Define the adjacency matrix $A \in \mathbb{R}^{N\times N}$ of the network by $A_{ij} = 1$ if $(i,j) \in \mathcal{E}$ and $A_{ij} = 0$ otherwise.

Stack the variables $f_i(t)$, $\lambda_i(t)$, and $\nabla O(D_i, f_i(t))$ for $i \in \mathcal{N}$ into matrices, i.e., $f(t) = [f_1(t), \ldots, f_N(t)]^T \in \mathbb{R}^{N\times d}$, $\Lambda(t) = [\lambda_1(t), \ldots, \lambda_N(t)]^T \in \mathbb{R}^{N\times d}$, and $\nabla O(f(t)) = [\nabla O(D_1, f_1(t)), \ldots, \nabla O(D_N, f_N(t))]^T$.

Let $V_i = |\mathcal{V}_i|$ be the number of neighbors of node $i$, and define the degree matrix $D = \mathrm{diag}(V_1, \ldots, V_N)$. Define for the $t$-th iteration the penalty-weighted matrix $H(t) = \mathrm{diag}\big(\eta_1(t), \ldots, \eta_N(t)\big)$. Then the matrix form of (11)-(12) is:

$$\nabla O\big(f(t+1)\big) + 2\Lambda(t) + H(t+1)\big( 2D f(t+1) - L_+ f(t) \big) = 0 \qquad (13)$$

$$\Lambda(t+1) = \Lambda(t) + \frac{\theta}{2}\, L_- f(t+1) \qquad (14)$$

Note that $L_- = D - A$ is the Laplacian matrix and $L_+ = D + A$ is the signless Laplacian matrix of the network, with the following properties when the network is connected: (i) $L_-$ is positive semi-definite; (ii) $\mathrm{null}(L_-) = \mathrm{span}\{\mathbf{1}\}$, i.e., every member of the null space of $L_-$ is a scalar multiple of $\mathbf{1}$, with $\mathbf{1}$ being the vector of all $1$'s (Kelner, 2007).
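The two stated properties of $L_-$ can be checked numerically on a small connected graph (a 4-cycle, chosen for illustration):

```python
import numpy as np

# adjacency matrix of a 4-cycle (connected graph)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))   # degree matrix
L_minus = D - A               # Laplacian
L_plus = D + A                # signless Laplacian

eigs = np.linalg.eigvalsh(L_minus)
assert eigs.min() > -1e-10                      # (i) positive semi-definite
assert np.sum(np.abs(eigs) < 1e-10) == 1        # (ii) one-dimensional null space
assert np.allclose(L_minus @ np.ones(4), 0.0)   # (ii) the all-ones vector spans it
```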

Let $M^{1/2}$ denote the square root of a symmetric positive semi-definite (PSD) matrix $M$ that is itself symmetric PSD, i.e., $M^{1/2} M^{1/2} = M$. Define the matrix $Q = (L_-/2)^{1/2}$, so that $Q^2 = L_-/2$. Since $\Lambda(t+1) - \Lambda(t) = \frac{\theta}{2} L_- f(t+1)$ is in the column space of $L_-$, this together with (14) and the zero initialization of the dual variables implies that $\Lambda(t)$ lies in the column space of $L_-$, and hence of $Q$. This guarantees the existence of $r(t)$ with $\Lambda(t) = Q\, r(t)$, and allows us to rewrite (13)-(14) as:

$$\nabla O\big(f(t+1)\big) + 2 Q\, r(t) + H(t+1)\big( 2D f(t+1) - L_+ f(t) \big) = 0 \qquad (15)$$

$$r(t+1) = r(t) + \theta\, Q f(t+1) \qquad (16)$$
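The PSD square root used to define $Q$ can be computed via an eigendecomposition; a sketch (the triangle graph is an illustrative choice):

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric PSD square root: M^{1/2} with M^{1/2} M^{1/2} = M."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)      # clip tiny negative round-off eigenvalues
    return (V * np.sqrt(w)) @ V.T

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)   # triangle graph
L_minus = np.diag(A.sum(axis=1)) - A
Q = psd_sqrt(L_minus / 2.0)              # Q^2 = L_-/2
```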

###### Lemma 3.1

### 3.3 Convergence Rate Analysis

To further establish the convergence rate of modified ADMM, an additional assumption is used:

Assumption 3: For all $i$, $O(D_i, f_i)$ is strongly convex in $f_i$ and has Lipschitz continuous gradients, i.e., for any $f_a$ and $f_b$, we have:

$$\begin{aligned}
\big\langle \nabla O(D_i, f_a) - \nabla O(D_i, f_b),\; f_a - f_b \big\rangle &\geq m_i \|f_a - f_b\|_2^2 \\
\big\| \nabla O(D_i, f_a) - \nabla O(D_i, f_b) \big\|_2 &\leq M_i \|f_a - f_b\|_2
\end{aligned} \qquad (20)$$

where $m_i > 0$ is the strong convexity constant and $M_i$ is the Lipschitz constant.
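For the logistic-loss objective used later in Section 5, Assumption 3 can be checked numerically: the $L_2$ regularizer supplies the strong-convexity constant and the data matrix bounds the gradient's Lipschitz constant. A sketch (data and constants are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # rows with norm <= 1
y = rng.choice([-1.0, 1.0], size=50)
C, rho = 1e-2, 1e-1

def grad_O(f):
    """Gradient of O(f) = C * sum_n log(1 + exp(-y_n f^T x_n)) + (rho/2) ||f||^2."""
    m = y * (X @ f)
    return C * (X.T @ (-y / (1.0 + np.exp(m)))) + rho * f

m_const = rho                                           # strong convexity from the regularizer
M_const = C * 0.25 * np.linalg.norm(X, 2) ** 2 + rho    # Hessian norm bound (L'' <= 1/4)

fa, fb = rng.normal(size=3), rng.normal(size=3)
diff = fa - fb
lower = float((grad_O(fa) - grad_O(fb)) @ diff)         # should be >= m_const * ||diff||^2
upper = float(np.linalg.norm(grad_O(fa) - grad_O(fb)))  # should be <= M_const * ||diff||
```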

###### Theorem 3.2

Define $\hat{m} = \min_i m_i$ and $\hat{M} = \max_i M_i$, with the $m_i$ and $M_i$ as given in Assumption 3. Denote by $\langle X, Y \rangle$ the Frobenius inner product of matrices $X$ and $Y$; denote by $\sigma_{\min}(\cdot)$ and $\sigma_{\max}(\cdot)$ the smallest nonzero and the largest singular values of a matrix, respectively.

Although Theorem 3.2 only gives a lower bound on the convergence rate of M-ADMM, it reflects the impact of the penalty parameters on convergence. Larger penalties $\eta_i(t)$ yield a larger penalty-weighted matrix $H(t)$, and by (23) both terms of the bound shrink as $H(t)$ grows. Therefore, the convergence rate decreases as the penalties increase.

## 4 Private M-ADMM

In this section we present a privacy-preserving version of M-ADMM. To begin, a random noise vector $\epsilon_i(t+1) \in \mathbb{R}^d$ with probability density proportional to $\exp\big(-\alpha_i(t+1)\|\epsilon\|_2\big)$ is added to the penalty term in the objective function of (10):

$$\eta_i(t+1) \sum_{j\in\mathcal{V}_i} \Big\| f_i + \epsilon_i(t+1) - \frac{f_i(t)+f_j(t)}{2} \Big\|_2^2 \qquad (24)$$

To generate this noisy vector, choose its norm $\|\epsilon_i(t+1)\|_2$ from the gamma distribution with shape $d$ and scale $1/\alpha_i(t+1)$, and its direction uniformly at random, where $d$ is the dimension of the feature space. Then node $i$'s local result is obtained by finding the optimal solution to the private objective function:

$$f_i(t+1) = \arg\min_{f_i}\; O(D_i, f_i) + 2\lambda_i(t)^T f_i + \eta_i(t+1) \sum_{j\in\mathcal{V}_i} \Big\| f_i + \epsilon_i(t+1) - \frac{f_i(t)+f_j(t)}{2} \Big\|_2^2 \qquad (25)$$
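The noise described above can be sampled by drawing its norm from a Gamma distribution and its direction uniformly from the unit sphere; a sketch (the parameter values are illustrative):

```python
import numpy as np

def sample_noise(d, alpha, rng):
    """Noise with density proportional to exp(-alpha * ||eps||_2):
    ||eps|| ~ Gamma(shape=d, scale=1/alpha), direction uniform on the sphere."""
    norm = rng.gamma(shape=d, scale=1.0 / alpha)
    v = rng.normal(size=d)                # normalized Gaussian is uniform on the sphere
    return norm * v / np.linalg.norm(v)

rng = np.random.default_rng(0)
samples = np.array([sample_noise(3, 2.0, rng) for _ in range(20000)])
mean_norm = float(np.linalg.norm(samples, axis=1).mean())   # E||eps|| = d / alpha = 1.5
```

The empirical mean norm should be close to $d/\alpha$, and the empirical mean vector close to zero by symmetry.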

Expanding the squared norm shows that (25) is equivalent to the non-private update (10) with the noise $\eta_i(t+1) V_i\, \epsilon_i(t+1)$ added to the dual variable $\lambda_i(t)$. Further, if $\eta_i(t+1)$ is fixed at a common value $\eta$ for all nodes and iterations, the above reduces to the dual variable perturbation in (Zhang & Zhu, 2017). (Only a single iteration is considered in (Zhang & Zhu, 2017), under a per-iteration privacy constraint. Since we consider the entire iterative process, we do not impose a per-iteration privacy constraint but calculate the total privacy loss.)

The complete procedure is shown in Algorithm 1, where the condition imposed on the generated parameters helps bound the worst-case privacy loss but is not necessary for guaranteeing convergence.

In a distributed and iterative setting, the “output” of the algorithm is not merely the end result, but includes all intermediate results generated and exchanged during the iterative process. For this reason, we formally state the differential privacy definition in this setting below.

###### Definition 4.1

Consider a connected network $G = (\mathcal{N}, \mathcal{E})$ with a set of nodes $\mathcal{N}$. Let $R(t)$ denote the information exchanged by all nodes in the $t$-th iteration. A distributed algorithm is said to satisfy $\beta$-differential privacy during $T$ iterations if, for any two datasets $D$ and $\hat{D}$ differing in at most one data point, and for any set $S$ of possible sequences of outputs during $T$ iterations, the following holds:

$$\Pr\big( \{R(t)\}_{t=1}^{T} \in S \mid D \big) \leq e^{\beta}\, \Pr\big( \{R(t)\}_{t=1}^{T} \in S \mid \hat{D} \big)$$

We now state our main result on the privacy property of the penalty perturbation algorithm using the above definition. Additional assumptions on the loss function $\mathscr{L}$ and the regularizer $R$ are used.

Assumption 4: The loss function $\mathscr{L}$ is strictly convex and twice differentiable, with $|\mathscr{L}'| \leq 1$ and $|\mathscr{L}''| \leq c_1$, where $c_1$ is a constant.

Assumption 5: The regularizer $R(\cdot)$ is $1$-strongly convex and twice continuously differentiable.

###### Theorem 4.1

Normalize the feature vectors in the training set such that $\|x_i^n\|_2 \leq 1$ for all $i$ and $n$. Then the private M-ADMM algorithm (PP) satisfies $\beta$-differential privacy with

(26) |

## 5 Numerical Experiments

We use the same dataset as (Zhang & Zhu, 2017), i.e., the Adult dataset from the UCI Machine Learning Repository (Lichman, 2013). It consists of personal information of 48,842 individuals, including age, sex, race, education, occupation, income, etc. The goal is to predict whether an individual's annual income is above $50,000.

To preprocess the data, we (1) remove all individuals with missing values; (2) convert each categorical attribute with $K$ categories to a binary vector of length $K$; (3) normalize columns (features) such that the maximum value of each column is 1; (4) normalize rows (individuals) such that each row's $L_2$ norm is at most 1; and (5) convert labels to $\{-1, 1\}$. After this preprocessing, the final dataset contains 45,223 individuals, each represented as a 105-dimensional vector of norm at most 1.
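Steps (2)-(5) above can be sketched as follows on synthetic arrays (the Adult-specific column handling is omitted, and all inputs here are made up):

```python
import numpy as np

def preprocess(X_num, X_cat, labels):
    """Sketch of steps (2)-(5): one-hot encode categoricals, scale each column
    to max 1, cap each row's L2 norm at 1, and map labels to {-1, +1}."""
    # (2) one-hot encode each categorical column (K categories -> length-K vector)
    onehot_blocks = []
    for col in X_cat.T:
        cats = np.unique(col)
        onehot_blocks.append((col[:, None] == cats[None, :]).astype(float))
    X = np.hstack([X_num] + onehot_blocks)
    # (3) column-wise max normalization
    col_max = np.abs(X).max(axis=0)
    X = X / np.where(col_max > 0, col_max, 1.0)
    # (4) cap each row's L2 norm at 1
    row_norm = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.maximum(row_norm, 1.0)
    # (5) labels to {-1, +1}
    y = np.where(labels > 0, 1.0, -1.0)
    return X, y

X_num = np.array([[0.5], [2.0], [1.0]])
X_cat = np.array([["a"], ["b"], ["a"]])
labels = np.array([0, 1, 1])
X, y = preprocess(X_num, X_cat, labels)
```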

We use the logistic loss $\mathscr{L}(z) = \log(1 + e^{-z})$, for which $|\mathscr{L}'| \leq 1$ and $\mathscr{L}'' \leq c_1 = 1/4$. The regularizer is $R(f) = \frac{1}{2}\|f\|_2^2$. We measure the accuracy of the algorithm by the average loss over the training set, and its privacy by the upper bound $\beta$ on the total privacy loss: the smaller the average loss and $\beta$, the higher the accuracy and the stronger the privacy guarantee.
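The stated constants can be verified directly: the logistic loss satisfies $|\mathscr{L}'(z)| \leq 1$ and $\mathscr{L}''(z) \leq 1/4$ for all $z$:

```python
import numpy as np

def logloss(z):
    return np.logaddexp(0.0, -z)        # log(1 + e^{-z}), numerically stable

def logloss_d1(z):
    return -1.0 / (1.0 + np.exp(z))     # L'(z), always in (-1, 0)

def logloss_d2(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)                # L''(z), maximized at z = 0 with value 1/4

z = np.linspace(-30.0, 30.0, 10001)
```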

### 5.1 Convergence of M-ADMM

We consider a five-node network and assign each node $i$ its own private, time-varying penalty parameter $\eta_i(t)$, increasing geometrically in $t$ at a rate that differs across nodes.

Figure 1(a) shows the convergence of M-ADMM under these parameters with a fixed dual updating step size $\theta$ across all nodes (blue curve); this is consistent with Lemma 3.1. As mentioned earlier, the step size can also vary over iterations (black) and differ across nodes (red). In Figure 1(b) we let every node use the same penalty $\eta(t)$ and compare the results as $\eta(t)$ is increased over iterations. We see that increasing the penalty slows down convergence, and a faster increase in $\eta(t)$ slows it down even more, which is consistent with Theorem 3.2.

### 5.2 Private M-ADMM

We next inspect the accuracy and privacy of the penalty perturbation (PP) based private M-ADMM (Algorithm 1) and compare it with the dual variable perturbation (DVP) method proposed in (Zhang & Zhu, 2017). In this set of experiments, for simplicity of presentation, we fix the dual step size $\theta$ and use the same penalty sequence $\eta_i(t)$ and noise parameter $\alpha_i(t)$ for all nodes. We observe similar results when these quantities vary from node to node.

For each parameter setting, we perform 10 independent runs of the algorithm and record both the mean and the range of their accuracy. Specifically, $acc(t,k)$ denotes the average loss over the training dataset in the $t$-th iteration of the $k$-th run ($1 \leq k \leq 10$). The mean of the average loss is then given by $acc_{mean}(t) = \frac{1}{10}\sum_{k=1}^{10} acc(t,k)$, and the range by $acc_{range}(t) = \max_k acc(t,k) - \min_k acc(t,k)$. The larger the range, the less stable the algorithm, i.e., under the same parameter setting, the difference in performance (convergence curves) between any two runs is larger. Each parameter setting also has a corresponding upper bound on the total privacy loss, denoted $\beta$. Figures 2(a) and 2(b) show $acc_{mean}(t)$ and $acc_{range}(t)$ as vertical bars centered at $acc_{mean}(t)$. The corresponding privacy upper bounds are given in Figures 2(c) and 2(d). The pair 2(a)-2(c) (resp. 2(b)-2(d)) corresponds to the same parameter setting.
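The per-iteration mean and range across the 10 runs can be computed as follows (array layout assumed: run index first):

```python
import numpy as np

def summarize_runs(acc):
    """acc[k, t]: average training loss at iteration t of run k.
    Returns the per-iteration mean and range (max - min) across runs."""
    return acc.mean(axis=0), acc.max(axis=0) - acc.min(axis=0)
```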

Figure 2 compares PP (blue & red, with $\eta(t)$ increasing geometrically) with DVP (black & magenta). We see that in both cases improved accuracy comes at the expense of higher privacy loss (from magenta to black under DVP, from red to blue under PP). However, we also see that with suitable choices of $\eta(t)$ and $\alpha(t)$, PP can significantly outperform DVP in both accuracy and privacy (e.g., red outperforms magenta in both, and blue outperforms black in both).

We also performed experiments with the same dataset on larger networks with tens and hundreds of nodes, with samples spread both evenly and unevenly across nodes. In both cases convergence is attained, and our algorithm continues to outperform (Zhang & Zhu, 2017) in a large network (see Figures 3 & 4). Since the privacy loss of the network is dominated by the node with the largest privacy loss, and this loss increases as the number of samples at a node decreases (Theorem 4.1), the privacy loss in a network with an uneven sample distribution is higher; note that this is a common issue with this type of analysis.

## 6 Discussion

Our numerical results show that increasing the penalty over iterations can improve the algorithm’s accuracy and privacy simultaneously. Below we provide some insight on why this is the case and discuss possible generalizations of our method.

### 6.1 Higher accuracy

When the algorithm is perturbed by random noise, which is necessary to achieve privacy, increasing the penalty parameters over iterations makes the algorithm more noise-resistant. In particular, in the minimization (25), a larger $\eta_i(t+1)$ results in smaller updates, i.e., a smaller distance between $f_i(t+1)$ and $f_i(t)$. In the non-private case, since $f_i(t)$ always moves toward the optimum, smaller updates slow down the process. In the private case, by contrast, since a random noise is added to each update, $f_i(t)$ does not always move toward the optimum in every step. When the overall perturbation has a larger variance, it is more likely that $f_i(t)$ moves further away from the optimum in some iterations. Because a larger $\eta_i(t+1)$ leads to smaller updates, it prevents $f_i(t)$ from moving too far from the optimum, thus stabilizing the algorithm (smaller $acc_{range}(t)$).

### 6.2 Stronger privacy

First of all, more added noise means a stronger privacy guarantee. Increasing the penalties $\eta_i(t)$ and the noise in such a way that the overall perturbation added to the dual variable is increasing leads to less privacy loss, as shown in Figure 2. The noise resistance provided by an increasing $\eta_i(t)$ is precisely what allows larger noise to be added under PP without jeopardizing convergence, as observed in Section 6.1.

More interestingly, keeping $\eta_i(t)$ private further strengthens privacy protection. Consider the following threat model: an attacker knows $D_i \setminus \{(x_i^1, y_i^1)\}$ and $D_j$ for all $j \neq i$, i.e., all data points except the first data point of node $i$, as well as all intermediate results of node $i$ and its neighbors. If the attacker also knows the dual updating step size and the penalty parameters of node $i$, it can infer the unknown data point with high confidence by combining the KKT optimality conditions from all iterations (see the supplementary material for details). However, if the penalty parameters are private to each node, it is impossible for the attacker to perform this inference. Even if the attacker knows that an individual participated, it remains hard to infer that individual's features.

### 6.3 Generalization & comparison

The main contribution of this paper is the finding that increasing $\eta_i(t)$ improves the algorithm's ability to resist noise: even though we increase the noise in each iteration to improve privacy, the accuracy does not degrade significantly due to this increasing robustness, which improves the privacy-utility tradeoff. This property holds regardless of the noise distribution. While the present privacy analysis uses a similar framework to (Chaudhuri et al., 2011; Zhang & Zhu, 2017) (objective perturbation with added Gamma noise), methods from other (centralized) differentially private ERM algorithms can also be applied to each iteration of ADMM. For example, if we allow some probability $\delta$ of violating $\epsilon$-differential privacy and adopt the weaker variant of $(\epsilon, \delta)$-differential privacy, we can adopt methods from works such as (Kifer et al., 2012; Jain & Thakurta, 2014; Bassily et al., 2014), adding Gaussian noise to achieve tighter bounds on the privacy loss. However, as noted above, robustness still improves as $\eta_i(t)$ increases; thus the same conclusion can be reached that both privacy and accuracy can be improved.

This idea can also be generalized to other differentially private iterative algorithms. A key observation about our algorithm is that the overall perturbation is a function of the parameter $\eta_i(t)$ that controls the updating step size. In general, if the algorithm is perturbed in each iteration by a quantity that depends on both the added noise and some parameter controlling the step size, such that the resulting step size and perturbation move in opposite directions (i.e., decreasing the step size increases the perturbation), then it is possible to simultaneously improve both accuracy and privacy by varying that parameter to decrease the step size over time.

Interestingly, in a differentially private (sub)gradient-based distributed algorithm (Huang et al., 2015), the step size and the overall perturbation move in the same direction (i.e., decreasing step size decreases perturbation). The reason for this difference is that under this subgradient-based algorithm, the sensitivity of the algorithm decreases with decreasing step size, which in turn leads to privacy constraint being satisfied with smaller perturbation. In contrast, for ADMM the sensitivity of the algorithm is independent of the step size, and the perturbation actually needs to increase to improve privacy guarantee; the decreasing step size acts to compensate for this increase in noise to maintain accuracy, as discussed in Section 6.1.

This issue of step size never arises in (Zhang & Zhu, 2017) because the analysis there covers only a single iteration; however, as we have seen, doing so leads to significant total privacy loss over many iterations.

## 7 Conclusions

This paper presents a penalty-perturbation method for introducing privacy preservation into iterative algorithms. We showed how to modify an ADMM-based distributed algorithm to improve privacy without compromising accuracy. The key idea is to add a perturbation correlated with the step size so that the two change in opposite directions. Applying this idea to other iterative algorithms is a direction for future work.

## Acknowledgements

This work is supported by the NSF under grants CNS-1422211, CNS-1646019, CNS-1739517.

## References

- Aybat & Iyengar (2015) Aybat, N. S. and Iyengar, G. An alternating direction method with increasing penalty for stable principal component pursuit. Computational Optimization and Applications, 61(3):635–668, 2015.
- Bassily et al. (2014) Bassily, R., Smith, A., and Thakurta, A. Differentially private empirical risk minimization: Efficient algorithms and tight error bounds. arXiv preprint arXiv:1405.7085, 2014.
- Bellet et al. (2017) Bellet, A., Guerraoui, R., Taziki, M., and Tommasi, M. Fast and Differentially Private Algorithms for Decentralized Collaborative Machine Learning. PhD thesis, INRIA Lille, 2017.
- Bianchi et al. (2014) Bianchi, P., Hachem, W., and Iutzeler, F. A stochastic primal-dual algorithm for distributed asynchronous composite optimization. In Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on, pp. 732–736. IEEE, 2014.
- Boyd et al. (2011) Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
- Chaudhuri et al. (2011) Chaudhuri, K., Monteleoni, C., and Sarwate, A. D. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12(Mar):1069–1109, 2011.
- Dwork (2006) Dwork, C. Differential privacy. In Proceedings of the 33rd International Conference on Automata, Languages and Programming - Volume Part II, ICALP’06, pp. 1–12, Berlin, Heidelberg, 2006. Springer-Verlag.
- Forero et al. (2010) Forero, P. A., Cano, A., and Giannakis, G. B. Consensus-based distributed support vector machines. Journal of Machine Learning Research, 11(May):1663–1707, 2010.
- Hale & Egerstedty (2015) Hale, M. and Egerstedty, M. Differentially private cloud-based multi-agent optimization with constraints. In American Control Conference (ACC), 2015, pp. 1235–1240. IEEE, 2015.
- Han et al. (2017) Han, S., Topcu, U., and Pappas, G. J. Differentially private distributed constrained optimization. IEEE Transactions on Automatic Control, 62(1):50–64, 2017.
- He et al. (2002) He, B., Liao, L.-Z., Han, D., and Yang, H. A new inexact alternating directions method for monotone variational inequalities. Mathematical Programming, 92(1):103–118, 2002.
- Huang et al. (2015) Huang, Z., Mitra, S., and Vaidya, N. Differentially private distributed optimization. In Proceedings of the 2015 International Conference on Distributed Computing and Networking, pp. 4. ACM, 2015.
- Jain & Thakurta (2014) Jain, P. and Thakurta, A. G. (near) dimension independent risk bounds for differentially private learning. In International Conference on Machine Learning, pp. 476–484, 2014.
- Kelner (2007) Kelner, J. An algorithmist's toolkit, 2007. URL http://bit.ly/2C4yRCX.
- Kifer et al. (2012) Kifer, D., Smith, A., and Thakurta, A. Private convex empirical risk minimization and high-dimensional regression. In Conference on Learning Theory, pp. 25–1, 2012.
- Li et al. (2014) Li, M., Andersen, D. G., and Park, J. W. Scaling distributed machine learning with the parameter server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014.
- Lichman (2013) Lichman, M. UCI machine learning repository, 2013. URL http://archive.ics.uci.edu/ml.
- Ling & Ribeiro (2014) Ling, Q. and Ribeiro, A. Decentralized linearized alternating direction method of multipliers. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pp. 5447–5451. IEEE, 2014.
- Ling et al. (2016) Ling, Q., Liu, Y., Shi, W., and Tian, Z. Weighted admm for fast decentralized network optimization. IEEE Transactions on Signal Processing, 64(22):5930–5942, 2016.
- Lobel & Ozdaglar (2011) Lobel, I. and Ozdaglar, A. Distributed subgradient methods for convex optimization over random networks. IEEE Transactions on Automatic Control, 56(6):1291–1306, 2011.
- Magnússon et al. (2014) Magnússon, S., Weeraddana, P. C., Rabbat, M. G., and Fischione, C. On the convergence of an alternating direction penalty method for nonconvex problems. In Signals, Systems and Computers, 2014 48th Asilomar Conference on, pp. 793–797. IEEE, 2014.
- Nedic & Ozdaglar (2009) Nedic, A. and Ozdaglar, A. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48–61, 2009.
- Nedic et al. (2008) Nedic, A., Olshevsky, A., Ozdaglar, A., and Tsitsiklis, J. N. Distributed subgradient methods and quantization effects. In Decision and Control, 2008. CDC 2008. 47th IEEE Conference on, pp. 4177–4184. IEEE, 2008.
- Shi et al. (2014) Shi, W., Ling, Q., Yuan, K., Wu, G., and Yin, W. On the linear convergence of the admm in decentralized consensus optimization. IEEE Trans. Signal Processing, 62(7):1750–1761, 2014.
- Song et al. (2016) Song, C., Yoon, S., and Pavlovic, V. Fast admm algorithm for distributed optimization with adaptive penalty. In AAAI, pp. 753–759, 2016.
- Wei & Ozdaglar (2012) Wei, E. and Ozdaglar, A. Distributed alternating direction method of multipliers. In Decision and Control (CDC), 2012 IEEE 51st Annual Conference on, pp. 5445–5450. IEEE, 2012.
- Xu et al. (2016) Xu, Z., Figueiredo, M. A., and Goldstein, T. Adaptive admm with spectral penalty parameter selection. arXiv preprint arXiv:1605.07246, 2016.
- Zhang & Wang (2017) Zhang, C. and Wang, Y. Privacy-preserving decentralized optimization based on admm. arXiv preprint arXiv:1707.04338, 2017.
- Zhang & Kwok (2014) Zhang, R. and Kwok, J. Asynchronous distributed admm for consensus optimization. In International Conference on Machine Learning, pp. 1701–1709, 2014.
- Zhang & Zhu (2017) Zhang, T. and Zhu, Q. Dynamic differential privacy for admm-based distributed classification learning. IEEE Transactions on Information Forensics and Security, 12(1):172–187, 2017.

## Appendix A Proof of Simplifying ADMM (Forero et al., 2010)

Applying the KKT optimality condition to the $w_{ij}$-update (5) gives:

$$-\lambda_{ij}^a(t) + \lambda_{ij}^b(t) + \eta\big( w_{ij}(t+1) - f_i(t+1) \big) + \eta\big( w_{ij}(t+1) - f_j(t+1) \big) = 0$$

which implies:

$$w_{ij}(t+1) = \frac{f_i(t+1) + f_j(t+1)}{2} + \frac{\lambda_{ij}^a(t) - \lambda_{ij}^b(t)}{2\eta}$$