Differentially Private Distributed Constrained Optimization
Many resource allocation problems can be formulated as an optimization problem whose constraints contain sensitive information about participating users. This paper concerns solving this kind of optimization problem in a distributed manner while protecting the privacy of user information. Without privacy considerations, existing distributed algorithms normally consist in a central entity computing and broadcasting certain public coordination signals to participating users. However, the coordination signals often depend on user information, so that an adversary who has access to the coordination signals can potentially decode information on individual users and put user privacy at risk. We present a distributed optimization algorithm that preserves differential privacy, which is a strong notion that guarantees user privacy regardless of any auxiliary information an adversary may have. The algorithm achieves privacy by perturbing the public signals with additive noise, whose magnitude is determined by the sensitivity of the projection operation onto user-specified constraints. By viewing the differentially private algorithm as an implementation of stochastic gradient descent, we are able to derive a bound for the suboptimality of the algorithm. We illustrate the implementation of our algorithm via a case study of electric vehicle charging. Specifically, we derive the sensitivity and present numerical simulations for the algorithm. Through numerical simulations, we are able to investigate various aspects of the algorithm when being used in practice, including the choice of step size, number of iterations, and the trade-off between privacy level and suboptimality.
Electric vehicles (EVs), including pure electric and hybrid plug-in vehicles, are believed to be an important component of future power systems. Studies predict that the market share of EVs in the United States can reach approximately 25% by year 2020. By that time, EVs will become a significant load on the power grid [27, 3], which can lead to undesirable effects such as voltage deviations if the charging of the vehicles is uncoordinated.
The key to reducing the impact of EVs on the power grid is to coordinate their charging schedules, which is often cast as a constrained optimization problem with the objective of minimizing the peak load, power loss, or load variance [26, 5]. Due to the large number of vehicles, computing an optimal schedule for all vehicles can be very time-consuming if the computation is carried out on a centralized server that collects demand information from users. Instead, it is more desirable to distribute the computation to individual users. Among others, Ma et al. proposed a distributed charging strategy based on the notion of valley-filling charging profiles, which is guaranteed to be optimal when all vehicles have identical (i.e., homogeneous) demand. Gan et al. proposed a more general algorithm that is optimal for nonhomogeneous demand and allows asynchronous communication.
In order to solve the constrained optimization problem of scheduling in a distributed manner, the server is required to publish certain public information that is computed based on the tentative demand collected from participating users. Charging demand often contains private information of the users. As a simple example, zero demand from a charging station attached to a single home unit is a good indication that the home owner is away from home. Note that the public coordination signal is received by everyone including potential adversaries whose goal is to decode private user information from the public signal, so that it is desirable to develop solutions for protecting user privacy.
It has long been recognized that ad hoc solutions such as anonymization of user data are inadequate to guarantee privacy due to the presence of public side information. A famous case is the reidentification of certain users from an anonymized dataset published by Netflix, an American provider of on-demand Internet streaming media. The dataset was provided for hosting an open competition called the Netflix Prize to find the best algorithm to predict user ratings on films. It has been reported that certain Netflix subscribers can be identified from the anonymized Netflix Prize dataset through auxiliary information from the Internet Movie Database (IMDb). As such, providing rigorous solutions to preserving privacy has become an active area of research. In the field of systems and control, recent work on privacy includes, among others, filtering of streaming data, smart metering, traffic monitoring, and privacy in stochastic control.
Recently, the notion of differential privacy proposed by Dwork and her collaborators has received attention due to its mathematically rigorous formulation. The original setting assumes that the sensitive user information is held by a trustworthy party (often called the curator in related literature), and the curator needs to answer external queries (about the sensitive user information) that potentially come from an adversary who is interested in learning information belonging to some user. For example, in EV charging, the curator is the central server that aggregates user information, and the queries correspond to public coordination signals. Informally, preserving differential privacy requires the curator to ensure that the results of the queries remain approximately unchanged if data belonging to any single user are modified. In other words, the adversary should learn little about any single user's information from the results of queries. Recent surveys on differential privacy are available in the literature; there is also a recent textbook on this topic written by Dwork and Roth.
Motivated by the privacy concerns in EV charging and recent advances in differential privacy, in this paper, we investigate the problem of preserving differential privacy in distributed constrained optimization. We present a differentially private distributed algorithm for solving a class of constrained optimization problems, whose privacy guarantee is proved using the adaptive composition theorem. We show that the private optimization algorithm can be viewed as an implementation of stochastic gradient descent. Based on previous results on stochastic gradient descent, we are able to derive a bound for the suboptimality of our algorithm and reveal the trade-off between privacy and performance of the algorithm.
We illustrate the applicability of this general framework of differentially private distributed constrained optimization in the context of EV charging. To this end, we begin by computing the sensitivity of the public signal with respect to changes in private information. Specifically, this requires analyzing the sensitivity of the projection operation onto the user-specified constraints. Although such sensitivity can be difficult to compute for a general problem, using tools in optimization theory, we are able to derive an explicit expression of the sensitivity for the EV charging example. Through numerical simulations, we show that our algorithm is able to provide strong privacy guarantees with little loss in performance when the number of participating users (i.e., vehicles) is large.
There is a large body of research work on incorporating differential privacy into resource allocation problems. A part of the work deals with indivisible resources (or equivalently, games with discrete actions), including the work by, among others, Kearns et al., Rogers and Roth, and Hsu et al. Our paper focuses on the case of divisible resources where private information is contained in the constraints of the allocation problem.
In the work on differentially private resource allocation, it is a common theme that the coordination signals are randomly perturbed to avoid revealing private information of the users, such as in the work by Huang et al. and Hsu et al. Huang et al. study the problem of differentially private convex optimization in the absence of constraints. In their formulation, the private user information is encoded in the individual cost functions and can be interpreted as user preferences. Apart from incorporating constraints, a major difference of our setting compared to Huang et al. is the way that privacy is incorporated into the optimization problem. Huang et al. assume that the public coordination signals do not change when the individual cost function changes. However, this assumption fails to hold, e.g., in the case of EV charging, where the coordination signal is computed by aggregating the response from users and hence is sensitive to changes in user information. Instead, we treat the public coordination signals as the quantity that needs to be perturbed (i.e., as a query, in the nomenclature of differential privacy) in order to prevent privacy breach caused by broadcasting the signals.
The recent work by Hsu et al. on privately solving linear programs is closely related to our work, since their setting also assumes that the private information is contained in the (affine) constraints. Our work can be viewed as a generalization of their setting by extending the form of objective functions and constraints. In particular, the objective function can be any convex function that depends on the aggregate allocation and has Lipschitz continuous gradients. The constraints can be any convex and separable constraints; for illustration, we show how to implement the algorithm for a particular set of affine constraints motivated by EV charging.
The paper is organized as follows. Section II introduces the necessary background on (non-private) distributed optimization and, in particular, projected gradient descent. Section III reviews the results in differential privacy and gives a formal problem statement of differentially private distributed constrained optimization. Section IV gives an overview of the main results of the paper. Section V describes a differentially private distributed algorithm that solves a general class of constrained optimization problems. We also study the trade-off between privacy and performance by analyzing the suboptimality of the differentially private algorithm.
In Section VI, we illustrate the implementation of our algorithm via a case study of EV charging. In particular, we compute the sensitivity of the projection operation onto user-specified constraints, which is required for implementing our private algorithm. Section VII presents numerical simulations on various aspects of the algorithm when being used in practice, including choice of step size, number of iterations, and the trade-off between privacy level and performance.
II Background: Distributed constrained optimization
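Denote the $\ell_p$-norm of any $x \in \mathbb{R}^n$ by $\|x\|_p$. The subscript $p$ is dropped in the case of the $\ell_2$-norm. For any nonempty convex set $\mathcal{C} \subset \mathbb{R}^n$ and $x \in \mathbb{R}^n$, denote by $\Pi_{\mathcal{C}}(x)$ the projection operator that projects $x$ onto $\mathcal{C}$ in the $\ell_2$-norm. Namely, $\Pi_{\mathcal{C}}(x)$ is the solution of the following constrained least-squares problem:
$$\min_{\hat{x}} \; \|\hat{x} - x\|^2 \quad \text{s.t.} \quad \hat{x} \in \mathcal{C}. \tag{1}$$
It can be shown that problem (1) is always feasible and has a unique solution so that $\Pi_{\mathcal{C}}$ is well-defined. For any function $f$ (not necessarily convex), denote by $\partial f(x)$ the set of subgradients of $f$ at $x$:
$$\partial f(x) := \{ g : f(y) \ge f(x) + g^T (y - x) \ \text{for all } y \}.$$
When $f$ is convex and differentiable at $x$, the set $\partial f(x)$ becomes a singleton set whose only element is the gradient $\nabla f(x)$. For any function $f$, denote its range by $\operatorname{range}(f)$. For any differentiable function $f$ that depends on multiple variables including $x$, denote by $\partial_x f$ the partial derivative of $f$ with respect to $x$. For any $\lambda > 0$, denote by $\mathrm{Lap}(\lambda)$ the zero-mean Laplace probability distribution such that the probability density function of a random variable $X$ obeying the distribution $\mathrm{Lap}(\lambda)$ is $p_X(x) = \frac{1}{2\lambda} \exp(-|x|/\lambda)$. The vector consisting of all ones is written as $\mathbf{1}$. The symbol $\preceq$ is used to represent element-wise inequality: for any $x, y \in \mathbb{R}^n$, we have $x \preceq y$ if and only if $x_i \le y_i$ for all $i \in [n]$. For any positive integer $n$, we denote by $[n]$ the set $\{1, 2, \dots, n\}$.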
II-B Distributed constrained optimization
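Before discussing privacy issues, we first introduce the necessary background on distributed constrained optimization. We consider a constrained optimization problem over $n$ variables $r_1, r_2, \dots, r_n \in \mathbb{R}^T$ in the following form:
$$\min_{\{r_i\}_{i=1}^n} \; U\Big(\sum_{i=1}^n r_i\Big) \quad \text{s.t.} \quad r_i \in \mathcal{C}_i, \quad i \in [n]. \tag{2}$$
Throughout the paper, we assume that the objective function $U \colon \mathbb{R}^T \to \mathbb{R}$ in problem (2) is differentiable and convex, and its gradient $\nabla U$ is $L$-Lipschitz in the $\ell_2$-norm, i.e., there exists $L > 0$ such that
$$\|\nabla U(x) - \nabla U(y)\| \le L \|x - y\| \quad \text{for all } x, y \in \mathbb{R}^T.$$
The set $\mathcal{C}_i$ is assumed to be convex for all $i \in [n]$. For resource allocation problems, the variable $r_i$ and the constraint set $\mathcal{C}_i$ are used to capture the allocation and constraints on the allocation for user/agent $i$.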
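Algorithm 1.
Input: $U$, $\{\mathcal{C}_i\}_{i=1}^n$, $K$, and step sizes $\{\alpha_k\}_{k=1}^K$.
Initialize $\{r_i^{(1)}\}_{i=1}^n$ arbitrarily. For $k = 1, 2, \dots, K$, repeat:
1. Compute the gradient $p^{(k)} := \nabla U\big(\sum_{i=1}^n r_i^{(k)}\big)$.
2. For $i \in [n]$, update $r_i^{(k+1)}$ according to
$$r_i^{(k+1)} := \Pi_{\mathcal{C}_i}\big(r_i^{(k)} - \alpha_k p^{(k)}\big). \tag{3}$$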
The optimization problem (2) can be solved iteratively using projected gradient descent, which requires computation of the gradient of $U$ and its projection onto the feasible set at each iteration. The computational complexity of projected gradient descent is dominated by the projection operation and grows with $n$. For practical applications, the number $n$ can be quite large, so it is desirable to distribute the projection operation to individual users. A distributed version of the projected gradient descent method is shown in Algorithm 1. The algorithm guarantees that the output converges to the optimal solution as $K \to \infty$ with a proper choice of step sizes $\{\alpha_k\}$ (the cited reference gives details on how to choose $\alpha_k$).
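The server/user split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the quadratic objective, the box-shaped constraint sets, and the names `project_box` and `distributed_pgd` are all hypothetical choices made for the example.

```python
from typing import Callable, List

Vector = List[float]


def project_box(x: Vector, lo: float, hi: float) -> Vector:
    """Project x onto the box [lo, hi]^T (a hypothetical stand-in for a set C_i)."""
    return [min(max(v, lo), hi) for v in x]


def distributed_pgd(
    grad_u: Callable[[Vector], Vector],
    projections: List[Callable[[Vector], Vector]],
    r_init: List[Vector],
    step_sizes: List[float],
) -> List[Vector]:
    """Run one server/user round per step size and return the final allocations."""
    r = [list(ri) for ri in r_init]
    horizon = len(r[0])
    for alpha in step_sizes:
        # Server: aggregate the tentative allocations and broadcast the gradient.
        total = [sum(ri[t] for ri in r) for t in range(horizon)]
        p = grad_u(total)
        # Users: each takes a gradient step and projects onto its own set.
        r = [
            proj([ri[t] - alpha * p[t] for t in range(horizon)])
            for proj, ri in zip(projections, r)
        ]
    return r


# Toy instance (assumed): U(s) = 0.5 * ||s - target||^2, so grad U(s) = s - target.
target = [1.0, 2.0]
grad = lambda s: [s[t] - target[t] for t in range(len(s))]
projs = [lambda x: project_box(x, 0.0, 1.0)] * 2
result = distributed_pgd(grad, projs, [[0.0, 0.0], [0.0, 0.0]], [0.25] * 200)
aggregate = [result[0][t] + result[1][t] for t in range(2)]
```

Only the aggregate and the broadcast gradient cross the server boundary in this sketch; each projection uses information that stays with the user, which is the property the privacy analysis later builds on.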
III Problem formulation
III-A Privacy in distributed constrained optimization
In many applications, the specifications of $\mathcal{C}_i$ may contain sensitive information that user $i$ wishes to keep undisclosed from the public. In the framework of differential privacy, it is assumed that an adversary can potentially collaborate with some users in the database in order to learn about other users' information. Under this assumption, the distributed projected gradient descent algorithm (Algorithm 1) can lead to possible loss of privacy of participating users for the reasons described below. It can be seen from Algorithm 1 that $\mathcal{C}_i$ affects $r_i^{(k)}$ through equation (3) and consequently also the public signal $p^{(k)}$ (the gradient of $U$ at iteration $k$). Since $p^{(k)}$ is broadcast publicly to every charging station, with enough side information (such as collaborating with some participating users), an adversary who is interested in learning private information about some user $i$ may be able to infer information about $\mathcal{C}_i$ from the public signals $\{p^{(k)}\}_{k=1}^K$. We will later illustrate the privacy issues in the context of EV charging.
III-B Differential privacy
Our goal is to modify the original distributed projected gradient descent algorithm (Algorithm 1) to preserve differential privacy. Before giving a formal statement of our problem, we first present some preliminaries on differential privacy. Differential privacy considers a set $D$ (called a database) that contains private user information to be protected. For convenience, we denote by $\mathcal{D}$ the universe of all possible databases of interest. The information that we would like to obtain from a database $D$ is given by $q(D)$ for some mapping $q$ (called a query) that acts on $D$. In differential privacy, preserving privacy is equivalent to hiding changes in the database. Formally, changes in a database can be defined by a symmetric binary relation between two databases called the adjacency relation, which is denoted by $\mathrm{Adj}(\cdot, \cdot)$; two databases $D$ and $D'$ that satisfy $\mathrm{Adj}(D, D')$ are called adjacent databases.
Definition 1 (Adjacent databases).
Two databases $D = \{d_i\}_{i=1}^n$ and $D' = \{d_i'\}_{i=1}^n$ are said to be adjacent if there exists $i \in [n]$ such that $d_j = d_j'$ for all $j \neq i$.
A mechanism that acts on a database is said to be differentially private if it is able to ensure that two adjacent databases are nearly indistinguishable from the output of the mechanism.
Definition 2 (Differential privacy).
Given $\epsilon \ge 0$, a mechanism $M$ preserves $\epsilon$-differential privacy if for all $\mathcal{R} \subseteq \operatorname{range}(M)$ and all adjacent databases $D$ and $D'$ in $\mathcal{D}$, it holds that
$$\mathbb{P}(M(D) \in \mathcal{R}) \le e^{\epsilon} \, \mathbb{P}(M(D') \in \mathcal{R}).$$
The constant $\epsilon$ indicates the level of privacy: a smaller $\epsilon$ implies a higher level of privacy. The notion of differential privacy promises that an adversary cannot tell from the output of $M$ with high probability whether data corresponding to a single user in the database have changed. It can be seen that any non-constant differentially private mechanism is necessarily randomized, i.e., for a given database, the output of such a mechanism obeys a certain probability distribution. Finally, although it is not explicitly mentioned in Definition 2, a mechanism needs to be an approximation of the query of interest in order to be useful. For this purpose, a mechanism is normally defined in conjunction with some query of interest; a common notation is to include the query of interest $q$ in the subscript of the mechanism as $M_q$.
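As a concrete reading of the definition, the following sketch checks the privacy inequality numerically for one simple randomized mechanism. It assumes a scalar query whose value changes by at most `sensitivity` between adjacent databases, perturbed with $\mathrm{Lap}(\text{sensitivity}/\epsilon)$ noise; this scalar construction and the function names are illustrative assumptions, and densities on a grid stand in for probabilities of output sets.

```python
import math


def laplace_pdf(x: float, scale: float) -> float:
    """Density of the zero-mean Laplace distribution Lap(scale) at x."""
    return math.exp(-abs(x) / scale) / (2.0 * scale)


def worst_ratio(q: float, q_adj: float, eps: float, sensitivity: float) -> float:
    """Largest density ratio, over a grid of outputs, between the mechanism run on
    a database with query value q and on an adjacent one with query value q_adj."""
    scale = sensitivity / eps
    grid = [i / 10.0 for i in range(-100, 101)]
    return max(laplace_pdf(x - q, scale) / laplace_pdf(x - q_adj, scale) for x in grid)


eps = 0.5
ratio = worst_ratio(q=0.0, q_adj=1.0, eps=eps, sensitivity=1.0)
```

For this assumed mechanism the ratio never exceeds $e^{\epsilon}$, which is the bound in Definition 2 written in terms of densities.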
III-C Problem formulation: Differentially private distributed constrained optimization
Recall that our goal of preserving privacy in distributed optimization is to protect the user information in $\mathcal{C}_i$, even if an adversary can collect all public signals $\{p^{(k)}\}_{k=1}^K$. To mathematically formulate our goal under the framework of differential privacy, we define the database $D$ as the set $\{\mathcal{C}_i\}_{i=1}^n$ and the query as the $K$-tuple consisting of all the gradients $(p^{(1)}, p^{(2)}, \dots, p^{(K)})$. We assume that the sets $\mathcal{C}_1, \dots, \mathcal{C}_n$ belong to a family of sets indexed by a parameter. Namely, there exists a parameterized set such that, for all $i \in [n]$, the set $\mathcal{C}_i$ is obtained from some value of the parameter. We also assume that there exists a metric on the parameter space. In this way, we can define the distance between any $\mathcal{C}_i$ and $\mathcal{C}_i'$ as the distance, under this metric, between their parameters.
For any given threshold on this distance, we define and use throughout the paper the following adjacency relation between any two databases $D$ and $D'$ in the context of distributed constrained optimization.
Definition 3 (Adjacency relation for constrained optimization).
For any databases $D = \{\mathcal{C}_i\}_{i=1}^n$ and $D' = \{\mathcal{C}_i'\}_{i=1}^n$, it holds that $D$ and $D'$ are adjacent if and only if there exists $i \in [n]$ such that the distance between $\mathcal{C}_i$ and $\mathcal{C}_i'$ is at most the given threshold, and $\mathcal{C}_j = \mathcal{C}_j'$ for all $j \neq i$.
The threshold is chosen based on the privacy requirement, i.e., the kind of user activities that should be kept private. Using the adjacency relation described in Definition 3, we state in the following the problem of designing a differentially private distributed algorithm for constrained optimization.
Problem 4 (Differentially private distributed constrained optimization).
III-D Example application: EV charging
In EV charging, the goal is to charge $n$ vehicles over a horizon of $T$ time steps with minimal influence on the power grid. For simplicity, we assume that each vehicle belongs to one single user. For any $i \in [n]$, the vector $r_i \in \mathbb{R}^T$ represents the charging rates of vehicle $i$ over time. In the following, we will denote by $r_i(t)$ the $t$-th component of $r_i$. Each vehicle needs to be charged a given amount of electricity $E_i$ by the end of the scheduling horizon; in addition, for any $t \in [T]$, the charging rate $r_i(t)$ cannot exceed the maximum rate $\bar{r}_i(t)$ for some given constant vector $\bar{r}_i \in \mathbb{R}^T$. Under these constraints on $r_i$, the set $\mathcal{C}_i$ is described as follows:
$$0 \preceq r_i \preceq \bar{r}_i, \qquad \mathbf{1}^T r_i = E_i. \tag{5}$$
The tuple $(\bar{r}_i, E_i)$ is called the charging specification of user $i$. Throughout the paper, we assume that $\bar{r}_i$ and $E_i$ satisfy
$$\mathbf{1}^T \bar{r}_i \ge E_i \quad \text{for all } i \in [n], \tag{6}$$
so that the constraints (5) are always feasible.
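Each user must project onto this set at every iteration, so it may help to see one way of doing it. The sketch below assumes the set has the form $\{r : 0 \preceq r \preceq \bar{r},\ \mathbf{1}^T r = E\}$ suggested by the description above, and uses bisection on the multiplier of the energy constraint; the function name and the bisection approach are illustrative choices, not taken from the paper.

```python
from typing import List


def project_charging_set(x: List[float], r_max: List[float], energy: float,
                         iters: int = 100) -> List[float]:
    """Euclidean projection of x onto {r : 0 <= r <= r_max, sum(r) = energy}."""
    if not 0.0 <= energy <= sum(r_max):
        raise ValueError("infeasible specification: need 0 <= energy <= sum(r_max)")

    def clipped(nu: float) -> List[float]:
        # Optimality conditions give r(t) = clip(x(t) - nu, 0, r_max(t)).
        return [min(max(xt - nu, 0.0), rt) for xt, rt in zip(x, r_max)]

    # sum(clipped(nu)) is non-increasing in nu; bracket the root, then bisect.
    lo = min(xt - rt for xt, rt in zip(x, r_max))  # clipped(lo) equals r_max
    hi = max(x)                                    # clipped(hi) equals 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(clipped(mid)) > energy:
            lo = mid
        else:
            hi = mid
    return clipped(0.5 * (lo + hi))


r = project_charging_set([3.0, 1.0, -1.0], [2.0, 2.0, 2.0], 2.0)
```

The feasibility check at the top mirrors assumption (6): without it the bisection bracket would not contain a root.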
The objective function $U$ in problem (2) quantifies the influence of a charging schedule $\{r_i\}_{i=1}^n$ on the power grid. We choose $U$ as follows for the purpose of minimizing load variance:
$$U\Big(\sum_{i=1}^n r_i\Big) = \frac{1}{2} \Big\| d + \sum_{i=1}^n r_i / m \Big\|^2. \tag{7}$$
In (7), $m$ is the number of households, which is assumed proportional to the number of EVs, i.e., there exists $\gamma$ such that $n/m = \gamma$; then, the quantity $\sum_{i=1}^n r_i / m$ becomes the aggregate EV load per household. The vector $d \in \mathbb{R}^T$ is the base load profile incurred by loads in the power grid other than EVs, so that $U$ quantifies the variation of the total load including the base load and EVs. It can be verified that $U$ is convex and differentiable, and $\nabla U$ is Lipschitz continuous.
The set $\mathcal{C}_i$ (defined by $\bar{r}_i$ and $E_i$) can be associated with personal activities of the owner of vehicle $i$ in the following way. For example, $\bar{r}_i(t) = 0$ may indicate that the owner is temporarily away from the charging station (which may be co-located with the owner's residence) so that the vehicle is not ready to be charged. Similarly, $E_i = 0$ may indicate that the owner is not actively using the vehicle so that the vehicle does not need to be charged.
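We now illustrate why publishing the exact gradient $p^{(k)}$ can potentially lead to a loss of privacy. For the load-variance objective, the gradient can be computed as $p^{(k)} = \frac{1}{m}\big(d + \sum_{i=1}^n r_i^{(k)} / m\big)$. If an adversary collaborates with all but one user, so that the adversary is able to obtain $r_j^{(k)}$ for all $j \neq i$ in the database, then the adversary can infer $r_i^{(k)}$ exactly from $p^{(k)}$, even though user $i$ did not reveal his $r_i^{(k)}$ to the adversary. After obtaining $r_i^{(k)}$, the adversary can obtain information on $\mathcal{C}_i$ by, for example, computing $E_i = \mathbf{1}^T r_i^{(k)}$.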
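The inference step is simple arithmetic, as the following sketch shows. It assumes the objective has the form $U(s) = \frac{1}{2}\|d + s/m\|^2$ with $s = \sum_i r_i$, so that the broadcast gradient is $(d + s/m)/m$; the function names and the numbers are hypothetical.

```python
from typing import List

Vector = List[float]


def gradient(d: Vector, schedules: List[Vector], m: int) -> Vector:
    """Gradient of the assumed U(s) = 0.5 * ||d + s/m||^2 at s = sum of schedules."""
    horizon = len(d)
    total = [sum(r[t] for r in schedules) for t in range(horizon)]
    return [(d[t] + total[t] / m) / m for t in range(horizon)]


def recover_schedule(p: Vector, d: Vector, known: List[Vector], m: int) -> Vector:
    """Invert the gradient for the one schedule the adversary does not know."""
    horizon = len(d)
    known_total = [sum(r[t] for r in known) for t in range(horizon)]
    # m * p = d + s / m, so s = m * (m * p - d); subtract the known schedules.
    return [m * (m * p[t] - d[t]) - known_total[t] for t in range(horizon)]


schedules = [[1.0, 0.5], [0.25, 2.0], [0.0, 1.5]]  # user 0 is the target
base_load = [0.3, 0.7]
p = gradient(base_load, schedules, m=4)
leaked = recover_schedule(p, base_load, schedules[1:], m=4)
leaked_energy = sum(leaked)  # reveals the target's energy demand
```

Nothing in the sketch depends on the number of colluding users beyond the fact that exactly one schedule is unknown, which is why additive noise on the gradient is needed rather than aggregation alone.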
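The adjacency relation in the case of EV charging is defined as follows. Notice that, in the case of EV charging, the parameter that parameterizes the set $\mathcal{C}_i$ is given by $(\bar{r}_i, E_i)$, the charging specification of user $i$ as defined in (5).
Definition 5 (Adjacency relation for EV charging).
For any databases $D = \{\mathcal{C}_i(\bar{r}_i, E_i)\}_{i=1}^n$ and $D' = \{\mathcal{C}_i'(\bar{r}_i', E_i')\}_{i=1}^n$, we have $\mathrm{Adj}(D, D')$ if and only if there exists $i \in [n]$ such that
$$\|\bar{r}_i - \bar{r}_i'\|_1 \le \delta r, \qquad |E_i - E_i'| \le \delta E, \tag{8}$$
and $\bar{r}_j = \bar{r}_j'$, $E_j = E_j'$ for all $j \neq i$.
In terms of choosing $\delta E$ and $\delta r$, one useful choice for $\delta E$ is the maximum amount of energy an EV may need; this choice of $\delta E$ can be used to hide the event corresponding to whether a user needs to charge his vehicle.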
IV Overview of main results
IV-A Results for general constrained optimization problems
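Algorithm 2.
Input: $U$, $L$, $\{\mathcal{C}_i\}_{i=1}^n$, $K$, $\{\alpha_k\}_{k=1}^K$, $\eta \ge 1$, $\Delta$, and $\epsilon$.
Initialize $\{r_i^{(1)}\}_{i=1}^n$ arbitrarily. Let $\tilde{r}_i^{(1)} = r_i^{(1)}$ for all $i \in [n]$ and $\theta_k = (\eta + 1)/(\eta + k)$ for $k \in [K]$.
For $k = 1, 2, \dots, K$, repeat:
1. If $k = 1$, then set $w_k = 0$; else draw a random vector $w_k \in \mathbb{R}^T$ from the distribution (proportional to)
$$\exp\Big(-\frac{2 \epsilon \|w_k\|}{K(K-1) L \Delta}\Big).$$
2. Compute the noisy gradient $\hat{p}^{(k)} := \nabla U\big(\sum_{i=1}^n r_i^{(k)}\big) + w_k$.
3. For $i \in [n]$, compute:
$$r_i^{(k+1)} := \Pi_{\mathcal{C}_i}\big(r_i^{(k)} - \alpha_k \hat{p}^{(k)}\big), \qquad \tilde{r}_i^{(k+1)} := (1 - \theta_k)\, \tilde{r}_i^{(k)} + \theta_k\, r_i^{(k+1)}.$$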
In the first half of the paper, we present the main algorithmic result of this paper, a differentially private distributed algorithm (Algorithm 2) for solving the constrained optimization problem (2). The constant $\Delta$ that appears in the input of Algorithm 2 is defined as
$$\Delta := \max_{i \in [n]} \max \big\{ \|\Pi_{\mathcal{C}_i}(r) - \Pi_{\mathcal{C}_i'}(r)\| : r \in \mathbb{R}^T, \ \mathcal{C}_i \text{ and } \mathcal{C}_i' \text{ adjacent in the sense of Definition 3} \big\}. \tag{9}$$
In other words, $\Delta$ can be viewed as a bound on the global $\ell_2$-sensitivity of the projection operator $\Pi_{\mathcal{C}_i}$ to changes in $\mathcal{C}_i$ for all $i \in [n]$. Later, we will illustrate how to compute $\Delta$ using the case of EV charging.
Compared to the (non-private) distributed algorithm (Algorithm 1), the key difference in Algorithm 2 is the introduction of random perturbations in the gradients (step 2) that convert the exact gradient $p^{(k)}$ into a noisy gradient $\hat{p}^{(k)}$. The noisy gradients $(\hat{p}^{(1)}, \dots, \hat{p}^{(K)})$ can be viewed as a randomized mechanism that approximates the original gradients $(p^{(1)}, \dots, p^{(K)})$. In Section V, we will prove that the noisy gradients (as a mechanism) preserve $\epsilon$-differential privacy and hence solve Problem 4.
Algorithm 2 can be viewed as an instance of stochastic gradient descent that terminates after $K$ iterations. We will henceforth refer to Algorithm 2 as differentially private distributed projected gradient descent. The step size $\alpha_k$ is chosen as $\alpha_k = c/\sqrt{k}$ for some $c > 0$. The purpose of the additional variables $\{\tilde{r}_i^{(k)}\}$ is to implement the polynomial-decay averaging method in order to improve the convergence rate, which is a common practice in stochastic gradient descent; introducing $\{\tilde{r}_i^{(k)}\}$ does not affect privacy. The parameter $\eta$ is used for controlling the averaging weight $\theta_k$. Details on choosing $\eta$ can be found in Shamir and Zhang.
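To make the averaging step concrete, here is a small sketch of polynomial-decay averaging in the form given by Shamir and Zhang, where the $k$-th iterate enters the running average with weight $(\eta + 1)/(k + \eta)$. Scalar iterates are used for brevity, and the exact index convention inside Algorithm 2 is an assumption here.

```python
from typing import List


def polynomial_decay_average(iterates: List[float], eta: float) -> float:
    """Running average in which iterate k gets weight (eta + 1) / (k + eta)."""
    avg = iterates[0]  # for k = 1 the weight equals 1, so the average is x_1
    for k, x in enumerate(iterates[1:], start=2):
        theta = (eta + 1.0) / (k + eta)
        avg = (1.0 - theta) * avg + theta * x
    return avg


uniform = polynomial_decay_average([1.0, 2.0, 3.0, 4.0], eta=0.0)
recent = polynomial_decay_average([1.0, 2.0, 3.0, 4.0], eta=3.0)
```

With $\eta = 0$ the recursion reduces to the plain running mean, while a larger $\eta$ shifts weight toward later iterates.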
Like most iterative optimization algorithms, stochastic gradient descent only converges in a probabilistic sense as the number of iterations $K \to \infty$. In practice, the number of iterations is always finite, so it is desirable to analyze the suboptimality for a finite $K$. In Section V, we provide an analysis of the expected suboptimality of Algorithm 2.
IV-B Results for the case of EV charging
Having presented and analyzed the algorithm for a general distributed constrained optimization problem, in the second half of the paper, we illustrate how Algorithm 2 can be applied to the case of EV charging. In Section VI, we demonstrate how to compute $\Delta$ using the case of EV charging as an example. We show that $\Delta$ can be bounded in terms of the two constants ($\delta r$ and $\delta E$) that appear in the adjacency relation (8), as described below.
The suboptimality analysis given in Theorem 7 can be further refined in the case of EV charging. The special form of $U$ given by (7) allows obtaining an upper bound on suboptimality as given in Corollary 9 below.
This upper bound shows the trade-off between privacy and performance. As $\epsilon$ decreases, more privacy is preserved but at the expense of increased suboptimality. On the other hand, this increase in suboptimality can be mitigated by introducing more participating users (i.e., by increasing $n$), which coincides with the common intuition that it is easier to achieve privacy as the number of users increases.
V Differentially private distributed projected gradient descent
In this section, we give the proof that the modified distributed projected gradient descent algorithm (Algorithm 2) preserves $\epsilon$-differential privacy. In the proof, we will extensively use results from differential privacy such as the Laplace mechanism and the adaptive sequential composition theorem.
V-A Review: Results from differential privacy
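The introduction of additive noise in step 2 of Algorithm 2 is based on a variant of the widely used Laplace mechanism in differential privacy. The Laplace mechanism operates by introducing additive noise according to the $\ell_p$-sensitivity ($p \ge 1$) of a numerical query $q \colon \mathcal{D} \to \mathbb{R}^m$ (for some dimension $m$), which is defined as follows.
Definition 10 ($\ell_p$-sensitivity).
For any query $q \colon \mathcal{D} \to \mathbb{R}^m$, the $\ell_p$-sensitivity of $q$ under the adjacency relation $\mathrm{Adj}$ is defined as
$$\Delta_q := \max \{ \|q(D) - q(D')\|_p : D, D' \in \mathcal{D} \ \text{such that} \ \mathrm{Adj}(D, D') \}.$$
Note that the $\ell_p$-sensitivity of $q$ does not depend on a specific database $D$. In this paper, we will use the Laplace mechanism for bounded $\ell_2$-sensitivity.
Proposition 11 (Laplace mechanism).
Consider a query $q \colon \mathcal{D} \to \mathbb{R}^m$ whose $\ell_2$-sensitivity is $\Delta_q$. Define the mechanism $M_q$ as $M_q(D) := q(D) + w$, where $w$ is an $m$-dimensional random vector whose probability density function is given by $p_w(w) \propto \exp(-\epsilon \|w\| / \Delta_q)$. Then, the mechanism $M_q$ preserves $\epsilon$-differential privacy.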
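Noise of this kind can be drawn with standard-library tools. The sketch below assumes the density is proportional to $\exp(-\epsilon \|w\| / \Delta)$ in the $\ell_2$-norm; under that assumption the direction of $w$ is uniform on the unit sphere and its norm follows a Gamma distribution with shape equal to the dimension and scale $\Delta/\epsilon$. The function names are hypothetical.

```python
import math
import random
from typing import List


def sample_l2_laplace(dim: int, sensitivity: float, eps: float,
                      rng: random.Random) -> List[float]:
    """Draw w in R^dim with density proportional to exp(-eps * ||w|| / sensitivity)."""
    scale = sensitivity / eps
    # Direction: a normalized Gaussian vector is uniform on the unit sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    # Radius: the radial density is r^(dim-1) * exp(-r / scale), i.e. Gamma(dim, scale).
    radius = rng.gammavariate(dim, scale)
    return [radius * v / norm for v in g]


def private_release(query_value: List[float], sensitivity: float, eps: float,
                    rng: random.Random) -> List[float]:
    """Return the query value perturbed with the noise above."""
    noise = sample_l2_laplace(len(query_value), sensitivity, eps, rng)
    return [q + w for q, w in zip(query_value, noise)]


rng = random.Random(0)
norms = [math.sqrt(sum(v * v for v in sample_l2_laplace(3, 1.0, 0.5, rng)))
         for _ in range(20000)]
mean_norm = sum(norms) / len(norms)  # theory: dim * sensitivity / eps = 6
noisy = private_release([1.0, 2.0, 3.0], 1.0, 0.5, rng)
```

The expected noise norm grows linearly with both the dimension and $\Delta/\epsilon$, which gives a quick feel for how much a stricter privacy level costs in gradient accuracy.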
As a basic building block in differential privacy, the Laplace mechanism allows construction of the differentially private distributed projected gradient descent algorithm described in Algorithm 2 through adaptive sequential composition.
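Proposition 12 (Adaptive sequential composition).
Consider a sequence of mechanisms $\{M_k\}_{k=1}^K$, in which the output of $M_k$ may depend on $M_1, M_2, \dots, M_{k-1}$ as described below:
$$M_1(D), \quad M_2(D, M_1(D)), \quad \dots, \quad M_K(D, M_1(D), \dots, M_{K-1}(D)).$$
Suppose $M_k(\cdot, a_1, \dots, a_{k-1})$ preserves $\epsilon_k$-differential privacy for any $a_1, \dots, a_{k-1}$. Then, the $K$-tuple mechanism $M := (M_1, M_2, \dots, M_K)$ preserves $\epsilon$-differential privacy for $\epsilon = \sum_{k=1}^K \epsilon_k$.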
V-B Proof that Algorithm 2 preserves $\epsilon$-differential privacy
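Using the adaptive sequential composition theorem, we can show that Algorithm 2 preserves $\epsilon$-differential privacy. We can view the $K$-tuple mechanism $(\hat{p}^{(1)}, \dots, \hat{p}^{(K)})$ as a sequence of mechanisms $\{\hat{p}^{(k)}\}_{k=1}^K$. The key is to compute the $\ell_2$-sensitivity of $p^{(k)}$, denoted by $\Delta^{(k)}$, when the outputs of $\hat{p}^{(1)}, \dots, \hat{p}^{(k-1)}$ are given, so that we can obtain a differentially private mechanism $\hat{p}^{(k)}$ by applying the Laplace mechanism on $p^{(k)}$ according to $\Delta^{(k)}$.
Lemma 13. In Algorithm 2, when the outputs of $\hat{p}^{(1)}, \dots, \hat{p}^{(k-1)}$ are given, the $\ell_2$-sensitivity of $p^{(k)}$ satisfies $\Delta^{(k)} \le (k-1) L \Delta$.
Proof. See Appendix A. ∎
Use the expression of $\Delta^{(k)}$ from Lemma 13 to obtain the privacy level of the $k$-th mechanism,
$$\epsilon_k = \frac{2(k-1)}{K(K-1)} \, \epsilon.$$
Using the adaptive sequential composition theorem, we know that the privacy of $(\hat{p}^{(1)}, \dots, \hat{p}^{(K)})$ is given by $\sum_{k=1}^K \epsilon_k = \epsilon$, which completes the proof. ∎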
V-C Suboptimality analysis: Privacy-performance trade-off
As a consequence of preserving privacy, we only have access to noisy gradients $\{\hat{p}^{(k)}\}_{k=1}^K$ rather than the exact gradients $\{p^{(k)}\}_{k=1}^K$. Recall that the additive noise in step 2 of Algorithm 2 has zero mean. In other words, the noisy gradient $\hat{p}^{(k)}$ is an unbiased estimate of $p^{(k)}$, which allows us to view Algorithm 2 as an instantiation of stochastic gradient descent. As we mentioned in Section IV, it is in fact a variant of stochastic gradient descent that uses polynomial-decay averaging for better convergence. The stochastic gradient descent method (with polynomial-decay averaging), which is described in Algorithm 3, can be used for solving the following optimization problem:
where and for certain dimensions . Proposition 14 (due to Shamir and Zhang ) gives an upper bound of the expected suboptimality after finitely many steps for the stochastic gradient descent algorithm (Algorithm 3).
Proposition 14 (Shamir and Zhang ).
Suppose is a convex set and is a convex function. Assume that there exist and such that and for given by step 1 of Algorithm 3. If the step sizes are chosen as for some , then for any , it holds that
Recall that the definition of is given by . It can be verified that the gradient of the objective function with respect to is given by , which is formed by repeating for times, so that we have . Using the expression of , we have
where in the last step we have used the fact that
Substitute the expression of into (13) to obtain the result. ∎
The result can be obtained by optimizing the right-hand side of (10) over . ∎
However, since it is generally impossible to obtain a tight bound for and , optimizing according to Corollary 16 usually does not give the best in practice; numerical simulation is often needed in order to find the best for a given problem. We will demonstrate how to choose optimally later using numerical simulations in Section VII.
VI Sensitivity computation: The case of EV charging
So far, we have shown that Algorithm 2 (specifically, the mechanism consisting of the gradients that are broadcast to every participating user) preserves $\epsilon$-differential privacy. The magnitude of the noise introduced to the gradients depends on $\Delta$, which is the sensitivity of the projection operator as defined in (9). In order to implement Algorithm 2, we need to compute $\Delta$ explicitly. Next, we illustrate how to compute $\Delta$ using the case of EV charging as an example. We will give an expression for $\Delta$ that depends on the two constants ($\delta r$ and $\delta E$) appearing in the adjacency relation (8). Since these constants are part of the privacy requirement, one can choose $\Delta$ accordingly once the privacy requirement has been determined.
The input of Algorithm 2 includes the constant $\Delta$ as described by (9), which bounds the global $\ell_2$-sensitivity of the projection operator with respect to changes in the constraint set. In this section, we will derive an explicit expression of $\Delta$ for the case of EV charging. Using tools in sensitivity analysis of optimization problems, we are able to establish the relationship between $\Delta$ and the two constants that appear in the adjacency relation (8) used in EV charging.
Recall that for any , the output of the projection operation is the optimal solution to the constrained least-squares problem
Define the -sensitivity for a fixed as
it can be verified that . In the following, we will establish the relationship between and ; we will also show that does not depend on the choice of , so that for any . For notational convenience, we consider the following least-squares problem:
where , , and are given constants. If we let
then problem (16) is mapped back to problem (15). We also have based on the assumption as described in (6). Denote the optimal solution of problem (16) by . Since our purpose is to derive an expression for when is fixed, we also treat as fixed and have dropped the dependence of on . Our goal is to bound the global solution sensitivity with respect to changes in and , i.e.,
for any and . We will proceed by first bounding the local solution sensitivities and with respect to and . Then, we will obtain a bound on the global solution sensitivity (17) through integration of and .
VI-B Local solution sensitivity of nonlinear optimization problems
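We begin by reviewing existing results on computing local solution sensitivity of nonlinear optimization problems. Consider a generic nonlinear optimization problem parametrized by $\theta \in \mathbb{R}$ described as follows:
$$\min_{x \in \mathbb{R}^n} \; f(x; \theta) \quad \text{s.t.} \quad g_i(x; \theta) \le 0, \ i \in [p], \qquad h_j(x; \theta) = 0, \ j \in [q], \tag{18}$$
whose Lagrangian can be expressed as
$$\mathcal{L}(x, \lambda, \nu; \theta) = f(x; \theta) + \sum_{i=1}^p \lambda_i g_i(x; \theta) + \sum_{j=1}^q \nu_j h_j(x; \theta),$$
where $\lambda$ and $\nu$ are the Lagrange multipliers associated with constraints $g$ and $h$, respectively. If there exists a set $\Theta \subseteq \mathbb{R}$ such that the optimal solution is unique for all $\theta \in \Theta$, then the optimal solution of problem (18) can be defined as a function $x^* \colon \Theta \to \mathbb{R}^n$. This condition on the uniqueness of the optimal solution holds for problem (16), since the objective function therein is strictly convex.
Denote by $\lambda^*$ and $\nu^*$ the optimal Lagrange multipliers. Under certain conditions described in Proposition 17 below, the partial derivatives $\partial x^* / \partial \theta$, $\partial \lambda^* / \partial \theta$, and $\partial \nu^* / \partial \theta$ exist; these partial derivatives will also be referred to as the local solution sensitivity of problem (18).
Proposition 17 (Fiacco).
Let $(x^*, \lambda^*, \nu^*)$ be the primal-dual optimal solution of problem (18). Suppose the following conditions hold.
1. $x^*$ is a locally unique optimal primal solution.
2. The functions $f$, $\{g_i\}$, and $\{h_j\}$ are twice continuously differentiable in $x$ and differentiable in $\theta$.
3. The gradients $\{\nabla g_i(x^*) : g_i(x^*; \theta) = 0\}$ of the active constraints and the gradients $\{\nabla h_j(x^*)\}$ are linearly independent.
4. Strict complementary slackness holds: $\lambda_i^* > 0$ when $g_i(x^*; \theta) = 0$ for all $i \in [p]$.
Then the local sensitivity $(\partial x^* / \partial \theta, \partial \lambda^* / \partial \theta, \partial \nu^* / \partial \theta)$ of problem (18) exists and is continuous in a neighborhood of $\theta$. Moreover, it is uniquely determined by the following: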