# Distributed Primal-dual Interior-point Methods for Solving Loosely Coupled Problems Using Message Passing^{†}^{†}thanks: This work has been supported by the Swedish Department of Education within the ELLIIT project.

###### Abstract

In this paper, we propose a distributed algorithm for solving loosely coupled problems with chordal sparsity which relies on primal-dual interior-point methods. We achieve this by distributing the computations at each iteration, using message-passing. In comparison to already existing distributed algorithms for solving such problems, this algorithm requires far less number of iterations to converge to a solution with high accuracy. Furthermore, it is possible to compute an upper-bound for the number of required iterations which, unlike already existing methods, only depends on the coupling structure in the problem. We illustrate the performance of our proposed method using a set of numerical examples.

istributed optimization; primal-dual interior-point method; message-passing; high precision solution.
{classcode}

10.1080/1055.6788.YYYY.xxxxxx \issn1029-4937 \issnp1055-6788 \jvol00 \jnum00 \jyear2014 \jmonthOctober

## 1 Introduction

Centralized algorithms for solving optimization problems rely on the existence of a central computational unit powerful enough to solve the problem in a timely manner, and they render useless in case we lack such a unit. Also such algorithms become unviable when it is impossible to form the problem in a centralized manner, for instance due to structural constraints including privacy requirements. In cases like these, distributed optimization algorithms are the only resort for solving optimization problems, e.g., see [5, 13, 7, 26, 27]. In this paper we are interested in devising efficient distributed algorithms for solving convex optimization problems in the form

(1a) | ||||

(1b) | ||||

(1c) |

where , , with and for all . Here denotes the component-wise inequality. This problem can be seen as a combination of coupled subproblems, each of which is defined by an objective function and by constraints that are expressed by and matrices and . Furthermore, we assume that these subproblems are only dependent on a few elements of , and that they are loosely coupled. The structure in such problems is a form of partial separability, which implies that the Hessian of the problem is sparse, see e.g., [32] and references therein. Existing distributed algorithms for solving (1), commonly solve the problem using a computational network with computational agents, each of which is associated with its own local subproblem. The graph describing this computational network has the node set with an edge between any two nodes in case they need to communicate with one another. The existence of an edge also indicates existence of coupling among the subproblems associated to neighboring agents. This graph is referred to as the computational graph of the algorithm and matches the coupling structure in the problem, which enables us to solve the problem distributedly while providing complete privacy among the agents.

Among different algorithms for solving problems like (1) distributedly, the ones based on first order methods are among the simplest ones. These algorithms are devised by applying gradient/subgradient or proximal point methods to the problem or an equivalent reformulations of it, see e.g., [26, 27, 5, 13, 7, 10]. In this class, algorithms that are based on gradient or subgradient methods, commonly require simple local computations. However, they are extremely sensitive to the scaling of the problem, see e.g., [26, 27]. Algorithms based on proximal point methods alleviate the scaling sensitivity issue, see e.g., [5, 13, 7, 10], but this comes at a price of more demanding local computations and/or more sophisticated communication protocols among agents, see e.g., [31, 28, 14, 15].

Despite the effectiveness of this class of algorithms, they generally still require many iterations to converge to an accurate solution. In order to improve the convergence properties of the aforementioned algorithms, there has recently been a surge of interest in devising distributed algorithms using second order methods, see e.g., [9, 34, 25, 20, 1]. In [25], the authors propose a distributed optimization algorithm based on a Lagrangian dual decomposition technique which enables them to use second order information of the dual function to update the dual variables within a dual interior-point framework. To this end, at each iteration, every agent solves a constrained optimization problem for updating local primal variables and then communicates with all the other agents to attain the necessary information for updating the dual variables. This level of communication is necessary due to the coupling in the considered optimization problem. The authors in [34] present a distributed Newton method for solving a network utility maximization problem. The proposed method relies on the special structure in the problem, which is that the objective function is given as a summation of several decoupled terms, each of which depends on a single variable. This enables them to utilize a certain matrix splitting method for computing Newton directions distributedly. In [20, 1] the authors put forth distributed primal and primal-dual interior-point methods that rely on proximal splitting methods, particularly ADMM, for solving for primal and primal-dual directions, distributedly. This then allows them to propose distributed implementations of their respective interior-point methods. One of the major advantages of the proposed algorithms in [34, 20, 1] lies in the fact that the required local computations are very simple. These approaches are based on inexact computations of the search directions, and they rely on first order or proximal methods to compute these directions. Generally the number of required iterations to compute the directions depends on the desired accuracy, and in case they require high accuracy for the computed directions, this number can grow very large. This means that commonly the computed directions using these algorithms are not accurate, and particularly the agents only have approximate consensus over the computed directions. This inaccuracy of the computed directions can also sometimes adversely affect the number of total primal or primal-dual iterations for solving the problem.

In this paper we propose a distributed primal-dual interior-point method and we evade the aforementioned issues by investigating another distributed approach to solve for primal-dual directions. To this end we borrow ideas from so-called message-passing algorithms for exact inference over probabilistic graphical models, [21, 29]. In this class of inference methods, message-passing algorithms are closely related to non-serial dynamic programming, see e.g., [3, 24, 33, 21]. Non-serial dynamic programming techniques, unlike serial dynamic programming, [4], that are used for solving problems with chain-like or serial coupling structure, are used to solve problems with general coupling structure. Specifically, a class of non-serial dynamic programming techniques utilize a tree representation of the coupling in the problem and use similar ideas as in serial techniques to solve the problem efficiently, see e.g., [3, 33, 24, 30]. We here also use a similar approach for computing the primal-dual directions. As we will see later, this enables us to devise distributed algorithms, that unlike the previous ones compute the exact directions within a finite number of iterations. In fact, this number can be computed a priori, and it only depends on the coupling structure in the problem. Unfortunately these advantages come at a cost. Particularly, these algorithms can only be efficiently applied to problems that are sufficiently sparse. Furthermore, for these algorithms the computational graphs can differ from the coupling structure of the problem, and hence they can only provide partial privacy among the agents. The approach presented in this paper is also closely related to multi-frontal factorization techniques for sparse matrices, e.g., see [22, 12, 2]. In fact we will show that the message-passing framework can be construed as a distributed multi-frontal factorization method using fixed pivoting for certain sparse symmetric indefinite matrices. To the best knowledge of the authors the closest approach to the one put forth in this paper is the work presented in [18, 19]. The authors for these papers, propose an efficient primal-dual interior-point method for solving problems with a so-called nested block structure. Specifically, by exploiting this structure, they present an efficient way for computing primal-dual directions by taking advantage of parallel computations when computing factorization of the coefficient matrix in the augmented system at each iteration. In this paper, we consider a more general coupling structure and focus on devising a distributed algorithm for computing the search directions, and we provide assurances that this can be done even when each agent has a limited access to information regarding the problem, due to privacy constraints.

### Outline

Next we first define some of the common notations used in this paper, and in Section 2 we put forth a general description of coupled optimization problems and describe mathematical and graphical ways to express the coupling in the problem. In Section 3 we review some concepts related to chordal graphs. These are then used in Section 4 to describe distributed optimization algorithms based on message-passing for solving coupled optimization problems. We briefly describe the primal-dual interior-point method in Section 5. In Section 6, we first provide a formal mathematical description for loosely coupled problems and then we show how primal-dual methods can be applied in a distributed fashion for solving loosely coupled problems. Furthermore, in this section we discuss how the message-passing framework is related to multi-frontal factorization techniques. We test the performance of the algorithm using a numerical example in Section 7, and finish the paper with some concluding remarks in Section 8.

### Notation

We denote by the set of real scalars and by the set of real matrices. With we denote a column vector of all ones. The set of symmetric matrices are represented by . The transpose of a matrix is denoted by and the column and null space of this matrix is denoted by and , respectively. We denote the set of positive integers with . Given a set , the matrix is the - matrix that is obtained by deleting the rows indexed by from an identity matrix of order , where denotes the number of elements in set . This means that is a - dimensional vector with the components of that correspond to the elements in , and we denote this vector with . With we denote the th element of vector at the th iteration. Also given vectors for , the column vector is all of the given vectors stacked.

## 2 Coupled Optimization Problems

Consider the following convex optimization problem

(2) |

where for all . We assume that each function is only dependent on a small subset of elements of . Particularly, let us denote the ordered set of these indices by . We also denote the ordered set of indices of functions that depend on with . With this description of coupling within the problem, we can now rewrite the problem in (2), as

(3) |

where is a – matrix that is obtained from an identity matrix of order by deleting the rows indexed by . The functions are lower dimensional descriptions of s such that for all and . For instance consider the following optimization problem

(4) |

and let us assume that , , , , , and . With this dependency description we then have , , , , , , and . This problem can then be written in the same format as in (3) as

(5) |

The formulation of coupled problems as in (3) enables us to get a more clear picture of the coupling in the problem. Next we describe how the coupling structure in (2) can be expressed graphically using undirected graphs.

### 2.1 Coupling and Sparsity Graphs

A graph is specified by its vertex and edge sets and , respectively. The coupling structure in (2) can be described using an undirected graph with node or vertex set and the edge set with if and only if . We refer to this graph, , as the coupling graph of the problem. Notice that all sets induce complete subgraphs on the coupling graph of the problem. Another graph that sheds more light on the coupling structure of the problem is the so-called sparsity graph, , of the problem. This graph is also undirected, though with node or vertex set and the edge set with if and only if . Similarly, all sets induce complete subgraphs on the sparsity graph of the problem. Let us now reconsider the example in (5). The sparsity and coupling graphs for this problem are illustrated in Figure 1, on the left and right respectively. It can then be verified that all s and s induce complete graphs over coupling and sparsity graphs, respectively.

As we will see later graph representations of the coupling structure in problems play an important role in designing distributed algorithms for solving coupled problems and gaining insight regarding their distributed implementations. Specifically, chordal graphs and their characteristics play a major role in the design of our proposed algorithm. This is the topic of the next section.

## 3 Chordal Graphs

A graph with vertex set and edge set is chordal if every of its cycles of length at least four has a chord, where a chord is an edge between two non-consecutive vertices in a cycle, [17, Ch. 4]. A clique of is a maximal subset of that induces a complete subgraph on . Consequently, no clique of is entirely contained in any other clique, [6]. Let us denote the set of cliques of as . There exists a tree defined on such that for every with , is contained in all the cliques in the path connecting the two cliques in the tree. This property is called the clique intersection property, and trees with this property are referred to as clique trees. For instance the graph on the left in Figure 1 is chordal and has five cliques, namely , , , and . A clique tree over these cliques is given in Figure 2. This tree then satisfies the clique intersection property, e.g., notice that and the only clique in the path between and , that is , also includes .

Chordal graphs and their corresponding clique trees play a central role in the design of the upcoming algorithms. For chordal graphs there are efficient methods for computing cliques and clique trees. However, the graphs that we will encounter, particularly the sparsity graphs, do not have to be chordal. As a result, next and for the sake of completeness we first review simple heuristic methods to compute a chordal embedding of such graphs, where a chordal embedding of a graph is a chordal graph with the same vertex set and an edge set such that . We will also explain how to compute its cliques and the corresponding clique tree.

### 3.1 Chordal Embedding and Its Cliques

Greedy search methods are commonly used for computing chordal embeddings of graphs, where one such method is presented in Algorithm 1, [11], [21]. The graph with the returned edge set will then be a chordal graph.

This algorithm also computes the set of cliques of the computed chordal embedding which are returned in the set . Notice that in steps 4, 5 and 6 is defined based on the most recent description of the sets and . The criterion used in Step 3 of the algorithm for selecting a vertex is the so-called min-degree criterion. There exist other versions of this algorithm that utilize other criteria, e.g., min-weight, min-fill and weighted-min-fill. Having computed a chordal embedding of the graph and its clique set, we will next review how to compute a clique tree over the computed clique set.

### 3.2 Clique Trees

Assume that a set of cliques for a chordal graph is given as . In order to compute a clique tree over the clique set we need to first define a weighted undirected graph, , over with edge set where if and only if , where the assigned weight to this edge is equal to . A clique tree over can be computed by finding any maximum spanning tree of the aforementioned weighted graph. This means finding a tree in the graph that contains all its nodes and edges with maximal accumulated weight. An algorithm to find such a tree is presented in Algorithm 2, [11], [21]. The tree described by the vertex set and edge set is then a clique tree.

We will now discuss distributed optimization using message-passing.

## 4 Optimization Over Clique Trees

In this section, we describe a distributed optimization algorithm based on message-passing. Particularly, we focus on the building blocks of this algorithm, namely we will provide a detailed description of its computational graph, messages exchanged among agents, the communication protocol they should follow and how they compute their corresponding optimal solutions. The convergence and computational properties of such methods, within exact inference over probabilistic graphical models, are extensively discussed in [21, Ch. 10, Ch. 13]. For the sake of completeness and future reference, we here also review some of these results and provide proofs for these results using the unified notation in this paper, in the appendix.

### 4.1 Distributed Optimization Using Message-passing

Consider the optimization problem in (2). Let denote the chordal sparsity graph for this problem and let and be its set of cliques and a corresponding clique tree, respectively. It is possible to devise a distributed algorithm for solving this problem that utilizes the clique tree as its computational graph. This means that the nodes act as computational agents and collaborate with their neighbors that are defined by the edge set of the tree. For example, the sparsity graph for the problem in (5) has five cliques and a clique tree over these cliques is illustrated in Figure 2. This means the problem can be solved distributedly using a network of five computational agents, each of which needs to collaborate with its neighbors as defined by the edges of the tree, e.g., Agent 2 needs to collaborate with agents .

In order to specify the messages exchanged among these agents, we first assign different terms of the objective function in (2) to each agent. A valid assignment in this framework is that can only be assigned to agent if . We denote the ordered set of indices of terms of the objective function assigned to agent by . For instance, for the problem in (5), assigning and to Agent 2 would be a valid assignment since and hence . Notice that the assignments are not unique and for instance there can exist agents and with so that and making assigning to agents or both valid. Also for every term of the objective function there will always exist an agent that it can be assigned to, which is proven in the following proposition. {proposition} For each term of the objective function, there always exists a for which . {proof} Recall that each set induces a complete subgraph on the sparsity graph, , of the problem. Then by definition of cliques, is either a subset of a clique or is a clique of the sparsity graph.

Before we continue with the rest of the algorithm description, we first need to define some new notations that are going to be extensively used in the following. Consider Figure 3 which illustrates a clique tree for a given sparsity graph . Each node in the tree is associated to a clique of and let denote the set of indices of cliques that are on the node -side of edge . Similarly, denotes the same but for the ones on the -side of . Also we denote the set of indices of variables in the cliques specified by by , i.e., . Similarly the set of indices of variables in cliques specified by is denoted by . The set of all indices of objective function terms that are assigned to nodes specified by is represented by , i.e., , and the ones specified by with . In order to make the newly defined notations more clear, let us reconsider the example in (5) and its corresponding clique tree in Figure 2, and let us focus on the edge. For this example then , , , , and . With the notation defined, we will now express the messages that are exchanged among neighboring agents. Particularly, let and be two neighboring agents, then the message sent from agent to agent , , is given by

(6) |

where is the so-called separator set of agents and . As a result, for agent to be able to send the correct message to agent it needs to wait until it has received all the messages from its neighboring agents other than . Hence, the information required for computing a message also sets the communication protocol for this algorithm. Specifically, it sets the ordering of agents in the message-passing procedure in the algorithm, where messages can only be initiated from the leaves of the clique tree and upwards to the root of the tree, which is referred to as an upward pass through the tree. For instance, for the problem in (5) and as can be seen in Figure 2, . Then the message to be sent from Agent 2 to Agent 1 can be written as

(7) |

which can only be computed if Agent 2 has received the messages from agents 4 and 5.

The message, , that every agent receives from a neighboring agent in fact summarizes all the necessary information that agent needs from all the agents on the -side of the edge . Particularly this message provides the optimal value of

as a function of the variables that agents and share, i.e., . This is shown in the following theorem. {theorem} Consider the message sent from agent to agent as defined in (6). This message can also be equivalently rewritten as

(8) |

See [21, Thm. 10.3] or Appendix 9. With this description of messages and at the end of an upward-pass through the clique tree, the agent at the root of the tree, indexed , will have received messages from all its neighbors. Consequently, it will have all the necessary information to compute its optimal solution by solving the following optimization problem

(9) |

The next theorem proves the optimality of such a solution. {theorem} The equation in (9) can be rewritten as

(10) |

which means that denotes the optimal solution for elements of specified by . {proof} See [21, Corr. 10.2, Prop. 13.1] or Appendix 10 Let us now assume that the agent at the root having computed its optimal solution , sends messages and the computed optimal solution to its children, i.e., to all agents . Here denotes the optimal solution computed by agent . Then all these agents, similar to the agent at the root, will then have received messages from all their neighbors and can compute their corresponding optimal solution as

(11) |

Notice that since is optimal, the additional regularization term in (11) will not affect the optimality of the solution. All it does is to assure that the computed optimal solution by the agent is consistent with that of the root. This observation also allows us to rewrite (11) as

(12) |

This means that the root does not need to compute nor send the message to its neighbors and it suffices to only communicate its computed optimal solution. The same procedure is executed downward through the tree until we reach the leaves, where each agent , having received the computed optimal solution by its parent, i.e., , computes its optimal solution by

(13) |

where denotes the index for the parent of agent . As a result by the end of one upward-downward pass through the clique tree, all agents have computed their corresponding optimal solutions, and hence, at this point, the algorithm can be terminated. Furthermore, with this way of computing the optimal solution, it is always assured that the solutions computed by parents and the children are consistent with one another. Since this is the case for all the nodes in the clique tree, it follows that we have consensus over the network. A summary of this distributed approach is given in Algorithm 3.

###### Remark 1

So far we have provided a distributed algorithm to compute a consistent optimal solution for convex optimization problems in the form (2). However, this algorithm relies on the fact that we are able to eliminate variables and compute the optimal objective value as a function of the remaining ones in closed form. This capability is essential, particularly for computing the exchanged messages among agents and in turn limits the scope of problems that can be solved using this algorithm. We will later show how the described algorithm can be incorporated within a primal-dual interior-point method to solve general convex optimization problems, distributedly.

###### Remark 2

The message-passing scheme presented in this section is in fact a recursive algorithm and it terminates within a finite number of steps or after an upward-downward pass. Let us define, , the height of a tree as the maximum number of edges in a path from the root to a leaf. This number then tells us how many steps it will take to perform the upward-downward pass through the tree. As a result, the shorter the tree the fewer the number of steps we need to take to complete a pass through the tree and compute the solution. Due to this fact, and since given a tree we can choose any node to be the root, having computed the clique tree we can improve the convergence properties of our algorithm by choosing a node as the root that gives us the minimum height.

### 4.2 Modifying the Generation of the Computational Graph

As was discussed above, the clique tree of the sparsity graph of a coupled problem, defines the computational graph for the distributed algorithm that solves it. Given the sparsity graph for the problem, one of the ways for computing a chordal embedding and a clique tree for this graph is through the use of algorithms 1 and 2. Particularly, using these algorithms allows one to automate the procedure for producing a clique tree for any given sparsity graph, with possibly different outcomes depending on the choice of algorithms. However, it is important to note that sometimes manually adding edges to the sparsity graph or its chordal embedding can enable us to shape the clique tree to our benefit and produce more suitable distributed solutions. In this case, though, extra care must be taken. For instance, it is important to assure that the modified sparsity graph is still a reasonable representation of the coupling in the problem and that the generated tree satisfies the clique intersection property, and is in fact a clique tree, as this property has been essential in the proof of the theorems presented in this section. We illustrate this using an example. Consider the following coupled optimization problem

(14a) | ||||

(14b) | ||||

(14c) | ||||

(14d) | ||||

(14e) | ||||

(14f) | ||||

(14g) | ||||

(14h) |

This problem can be equivalently rewritten as

where for , are the indicator functions for the constraints in (14b)– (14h), respectively, defined as

This problem is in the same format as (3). Let us assume that we intend to produce a distributed algorithm for solving this problem using message-passing that would take full advantage of parallel computations. Without using any intuition regarding the problem and/or incorporating any particular preference regarding the resulting distributed algorithm, we can produce the chordal sparsity graph for this problem as depicted in the top graph of Figure 4. A clique tree for this sparsity graph can be computed using algorithms 1 and 2, which is illustrated in the bottom plot of Figure 4. A distributed algorithm based on this computational graph does not take full advantage of parallel computations.

In order to produce a distributed algorithm that better facilitates the use of parallel computations, it is possible to modify the sparsity graph of the problem as shown in Figure 5, the top graph, where we have added additional edges, marked with dashed lines, to the graph while preserving its chordal property. Notice that by doing so, we have virtually grouped variables –, that couple the terms in the objective function and constraints, together. The corresponding clique tree for this graph is illustrated in Figure 5, the bottom graph. Notice that due to the special structure in the clique tree, within the message-passing algorithm the computation of the messages generated from agents 1–4 can be done independently, and hence in parallel. So using this clique tree as the computational graph of the algorithm enables us to fully take advantage of parallel computations. Next we briefly describe a primal-dual interior-point method for solving convex optimization problems, and then we investigate the possibility of devising distributed algorithms based on these methods for solving loosely coupled problems.

## 5 Primal-dual Interior-point Method

Consider the following convex optimization problem

(15) |

where , and with and . Under the assumption that we have constraint qualification, e.g., that there exist a strictly feasible point, then , and constitute a primal-dual optimal solution for (15) if and only if they satisfy the KKT optimality conditions for this problem, given as

(16a) | ||||

(16b) | ||||

(16c) | ||||

(16d) | ||||

(16e) |

A primal-dual interior-point method computes such a solution by iteratively solving linearized perturbed versions of (16) where (16d) is modified as

with , [35, 8]. Particularly, for this framework, at each iteration given primal and dual iterates , and so that and for all , the next update direction is computed by solving the linearization of

(17a) | ||||

(17b) | ||||

(17c) |

at the current iterates, given as

(18a) | ||||

(18b) | ||||

(18c) |

where

(19a) | ||||

(19b) | ||||

(19c) |

Define , ,

and . By eliminating as

(20) |

we can rewrite (18) as

(21) |

which has a lower dimension than (18), and unlike the system of equations in (18), is symmetric. This system of equations is sometimes referred to as the augmented system. It is also possible to further eliminate in (21) and then solve the so-called normal equations for computing . However, this commonly destroys the inherent structure in the problem, and hence we abstain from performing any further elimination of variables. The system of equations in (21) also expresses the optimality conditions for the following quadratic program

(22) |

and hence, we can compute and also by solving (22). Having computed and , can then be computed using (20), which then allows us to update the iterates along the computed directions. A layout for a primal-dual interior-point is given in Algorithm 4.

###### Remark 3

There are different approaches for computing proper step sizes in the 5th step of the algorithm. One of such approaches ensures that for and , by first setting

and conducting a backtracking line search as below

with and initialized as . Moreover, in order to ensure steady decrease of the primal and dual residuals, the back tracking is continued as

where . The resulting ensures that the primal and dual iterates remain feasible at each iteration and that the primal and dual residuals will converge to zero, [35, 8].

###### Remark 4

The primal-dual interior-point method presented in Algorithm 4, is an infeasible long step variant of such methods, [35]. There are other alternative implementations of primal-dual methods that particularly differ in their choice of search directions, namely short-step, predictor-corrector and Mehrotra’s predictor-corrector. The main difference between the distinct primal-dual directions, commonly arise due to different approaches for perturbing the KKT conditions, specially through the choice of , [35]. This means that for the linear system of equations in (17), only the right hand side of the equations will be different and hence the structure of the coefficient matrix in (21) remains the same for all the aforementioned variants. Consequently, all the upcoming discussions will be valid for other such variants.

Next we provide a formal description of loosely coupled problems and will show how we can devise a distributed primal-dual interior-point method for solving these problems using message-passing.

## 6 A Distributed Primal-dual Interior-point Method

In this section we put forth a distributed primal-dual interior-point method for solving loosely coupled problems. Particularly, we first provide a formal description for loosely coupled problems and then give details on how to compute the primal-dual directions and proper step sizes, and how to decide on terminating the algorithm distributedly.

### 6.1 Loosely Coupled Optimization Problems

Consider the convex optimization problem in (1). We can provide mathematical and graphical descriptions of the coupling structure in this problem, as in Section 2. The only difference is that the coupling structure will in this case concern the triplets and instead of single functions . Similar to (3) we can reformulate (1) as

(23a) | ||||

(23b) | ||||

(23c) |

where in this formulation, the functions and are defined in the same manner as the functions , with , and the matrices are defined by removing unnecessary columns from where and for all . Furthermore, we assume that the loose coupling in the problem is such that the sparsity graph of the problem is such that for all cliques in the clique tree, we have and that is small in comparison to the cliques sizes.

From now on let us assume that the chordal sparsity graph of the problem in (23) has cliques and that