# An Iterative Interior Point Network Utility Maximization Algorithm

This work was supported by the Indo-French Centre for Promotion of Advanced Research under project number 5100-IT1.

## Abstract

Distributed and iterative network utility maximization algorithms, such as the primal-dual algorithms or the network-user decomposition algorithms, often involve trajectories where the iterates may be infeasible. In this paper, we propose a distributed and iterative algorithm that ensures feasibility of the iterates at all times and convergence to the global maximum. A benchmark algorithm due to Kelly et al. [J. of the Oper. Res. Soc., 49(3), 1998] involves fast user updates coupled with slow network updates in the form of additive-increase multiplicative-decrease of suggested user flows. The proposed algorithm may be viewed as one with fast user updates and fast network updates that keeps the iterates feasible at all times. Simulations suggest that our proposed algorithm converges faster than the aforementioned benchmark algorithm.

## 1 Introduction and the main result

The setting of this paper is similar to that of Kelly et al. [1]. Consider a network with $m$ directed link resources. Let $c(l)$ be the capacity of the link $l$. There are $n$ users and each has a single fixed path. Each user sends data along its associated path, with the first vertex of the path being the source of the user’s data and the last vertex being its terminus. Let be the matrix with if the path uses link and otherwise. Let $[n]$ denote the set of users and let $[m]$ denote the set of links. Let $w_e$, $e \in [n]$, be the utility functions of the users. User $e$ derives a utility $w_e(x(e))$ when sending a flow of rate $x(e)$. The functions $w_e$ are assumed to be strictly concave and increasing. Let , , and . Let . Throughout, we make the standing assumption that $A$ has an interior feasible point, i.e., there exists a point for which all inequalities are strict. The system optimal operating point solves the problem:

$$\text{System}(w,A,c):\quad \max_{x \in A}\; W(x) := \sum_{e=1}^{n} w_e(x(e)). \tag{1}$$

The catches are that the network operator does not know the utility functions of the users, and the users know neither the rate choices of the other users nor the flow constraints on the network.

Kelly [2] proposed the decomposition of the above problem into two subproblems, one to be solved by each user, and the other to be solved by the network. Let $\lambda_e$ be the cost per unit rate to user $e$ set by the network, and let $p_e$ be the price user $e$ is willing to pay. The maximization problem solved by user $e$ is

$$\text{User}(w_e;\lambda_e):\quad \max_{p_e:\, p_e \ge 0}\; w_e\!\left(\frac{p_e}{\lambda_e}\right) - p_e. \tag{2}$$

If $p = (p_e, e \in [n])$ is known to the network, the network’s optimization problem is

$$\text{Network}(A,c;p):\quad \max_{x \in A}\; \sum_{e=1}^{n} p_e \log(x(e)). \tag{3}$$

The solution to Network is well known to satisfy the so-called proportional fairness criterion: if $\mu_l$, $l \in [m]$, are the optimal dual price variables associated with the dual to Network, then

$$x(e) = \frac{p_e}{\sum_{l:\, l \in e} \mu_l}, \quad e \in [n], \tag{4}$$

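is the optimal solution to the network problem. Kelly [2] showed that there exist costs per unit rate $\lambda_e$, prices $p_e$, and flows $x^\star(e)$, satisfying $p_e = \lambda_e \cdot x^\star(e)$ for $e \in [n]$, such that $p_e$ solves User for $e \in [n]$ and $x^\star$ solves Network; furthermore, $x^\star$ is the unique solution to System. The costs per unit rate satisfy $\lambda_e = \sum_{l:\, l \in e} \mu_l$ for some dual price variables.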
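As a small illustration of the proportional fairness relation (4), consider the special case in which every user crosses one shared link. This is only a sketch under that assumption: the function name `network_single_link`, the prices, and the capacity are hypothetical, and all prices are assumed strictly positive. In this case the single dual price is the total price divided by the capacity.

```python
def network_single_link(p, c):
    """Solve Network(A, c; p) when all users share one link of capacity c.

    Assumes every p[e] > 0. Returns (x, mu) with x[e] = p[e] / mu, as in (4).
    """
    mu = sum(p) / c               # dual price at which the link is exactly full
    x = [pe / mu for pe in p]     # proportional fairness relation (4)
    return x, mu


p = [1.0, 2.0, 3.0]   # hypothetical prices paid by three users
c = 12.0              # hypothetical link capacity
x, mu = network_single_link(p, c)
print(x, mu)
```

The returned flows fill the link and are proportional to the prices, which is what (4) predicts when each path contains only the one link.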

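In order to ensure operation at $x^\star$, taking the information asymmetry constraints into account, Kelly et al. [1] proposed the following fast user adaptation dynamics:

$$p_e(t) = x(e,t) \cdot w'_e(x(e,t)), \quad e \in [n], \tag{5}$$

$$\frac{d}{dt} x(e,t) = \kappa \cdot \Big( p_e(t) - x(e,t) \cdot \sum_{l:\, l \in e} \mu_l(t) \Big), \quad e \in [n], \tag{6}$$

$$\mu_l(t) = \psi_l\Big( \sum_{e:\, e \ni l} x(e,t) \Big), \quad l \in [m], \tag{7}$$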

where $\psi_l(y)$ is a penalty or cost per unit flow when the total flow in the link $l$ is $y$. It signifies the level of congestion in that link. Thus $\mu_l(t)$ in (7) is the cost per unit flow through link $l$, and may be interpreted as a dual variable of the network problem. The optimal dual variables for Network are such that the net cost of user flow matches the price paid by that user; see (4). The network adapts the flow using an additive-increase multiplicative-decrease scheme as in (6); the network therefore attempts to equalize, albeit slowly, the instantaneous net cost of user flow, $x(e,t) \cdot \sum_{l:\, l \in e} \mu_l(t)$, to the instantaneous price paid by that user, $p_e(t)$. On the other hand, if we differentiate (2) with respect to $p_e$ and use the relation $x(e) = p_e / \lambda_e$, we get that $p_e(t)$ in (5) maximizes User. So the users adapt instantaneously (in comparison to the network’s slower speed of adaptation) to the congestion signal. Kelly et al. [1] provided a Lyapunov function for the dynamical system defined by (5)-(7). The stable equilibrium point of the dynamical system maximizes a relaxation of the system problem, as determined by the choice of $\psi_l$ in (7).

Despite the popularity of this approach, there are two issues we would like to highlight.

• $x(t)$ may not remain feasible at all times $t$.

• $x(t)$ converges to the optimal solution of a relaxation of the system problem.

The first of these issues was highlighted in Johansson et al. [3]. The dynamics (5)-(7) cannot then be used in systems where feasibility has to be ensured at all times. The second of these issues is often circumvented via iterative algorithms where the Lagrange multipliers or penalty functions are also adapted (see for example, Arrow and Hurwicz [4], Low and Lapsley [5], Chiang et al. [6], Palomar and Chiang [7]). Such approaches either assume knowledge of the utility functions at the network end or may encounter infeasible iterates, or both.
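Both issues can be seen in a toy forward-Euler simulation of (5)-(7). The sketch below is illustrative only and rests on assumptions that are not from the paper: a single shared link, utilities $w_e(x) = a_e \log x$ (so that $p_e = a_e$ in (5)), the penalty $\psi(y) = (y/c)^\beta$, and the particular step size and parameter values. The function name `simulate_primal_dynamics` is hypothetical.

```python
def simulate_primal_dynamics(a, c, x0, kappa=1.0, dt=0.01, steps=5000, beta=4):
    """Forward-Euler simulation of (5)-(7) on one shared link of capacity c.

    Assumptions (not from the paper): w_e(x) = a[e] * log(x), hence
    p_e = x * w_e'(x) = a[e], and penalty psi(y) = (y / c) ** beta.
    Returns the total link load after every Euler step.
    """
    x = list(x0)
    loads = []
    for _ in range(steps):
        mu = (sum(x) / c) ** beta                   # congestion price, (7)
        p = a                                       # user response, (5)
        x = [xe + dt * kappa * (pe - xe * mu)       # network update, (6)
             for xe, pe in zip(x, p)]
        loads.append(sum(x))
    return loads


loads = simulate_primal_dynamics(a=[1.0, 2.0], c=1.0, x0=[0.1, 0.1])
print(loads[-1], loads[-1] > 1.0)
```

For this choice of penalty the equilibrium load solves $y^{\beta+1} = c^{\beta} \sum_e a_e$, which exceeds the capacity in the example, so the trajectory ends at an infeasible point that optimizes a relaxed problem rather than System.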

The literature on network utility maximization is vast and we will not be able to do justice to all of it. However, there are two works, Hochbaum [8] and La and Anantharam [9], that are very relevant to our contribution and which we bring to the reader’s attention. A greedy algorithm proposed by Hochbaum [8] can be adapted to solve the system problem with iterates remaining feasible at all times and without full knowledge of the utility functions at the network side. Though the algorithm circumvents both the problems highlighted above, it works only when the set of feasible flows forms the “independent set of a polymatroid”. This is the case when the network has, for example, a single source and multiple sinks or when the network has multiple sources but a single sink.

Mo and Walrand [10] proposed a window-based rate control mechanism that converges to the solution to Network for a fixed $p$. The window update rule of [10] uses only delay information provided to the user (propagation and round-trip delays). La and Anantharam [9] proposed two algorithms that solve the system problem using the decomposition of Kelly et al. [1]. The first algorithm incorporates the solution to the user problem into the window update rule of [10]. The second algorithm of La and Anantharam [9] explicitly finds the solution to the user problem and the network problem in each iteration. Although their simulations showed the convergence of the algorithm for general networks, a mathematical proof was given only for the case of a network with a single link. Their algorithm additionally imposes more stringent conditions on the utility functions than those assumed in this paper.

In this paper, we propose a discrete-time algorithm (see Algorithm 1 below) that remains feasible at all times and converges to the global maximum of System. The corresponding continuous time dynamics also shares the same properties. Our algorithm can be considered as a generalization of the algorithm of La and Anantharam [9] to 1) more general networks, and 2) a larger class of objective functions.

For a set of flows $x$, abusing notation, write $p_e(x(e)) = x(e) \cdot w'_e(x(e))$ as per (5), and set . Write

$$T(x) := \arg\max_{y \in A} \sum_{e=1}^{n} p_e(x(e)) \log(y(e)) \tag{8}$$

for the solution to Network. If $p_e(x(e)) = 0$ for some $e$, then the objective function in (8) is not strictly concave over $A$. The optimization problem (8) may then have multiple solutions, and so $T(x)$ is to be viewed as a set-valued mapping whose values are convex and compact subsets of $A$. Define .

###### Algorithm 1
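1. Initialize $x^{(0)} \in A$ such that $x^{(0)}(e) > 0$ for all $e \in [n]$. Initialize $k = 0$.

2. User update:

$$p^{(k)}_e = x^{(k)}(e) \cdot w'_e(x^{(k)}(e)), \quad e \in [n]. \tag{9}$$

3. Network update:

Find a point $v \in T(x^{(k)})$ and set:

$$x^{(k+1)} = x^{(k)} + a_{k+1}\,(v - x^{(k)}). \tag{10}$$

4. Set $k \leftarrow k + 1$ and go to step 2.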

Our main result is the following theorem.

###### Theorem 1

Assume that $A$ has an interior feasible point. The iterates of Algorithm 1 converge to the optimal solution to the system problem, i.e., $x^{(k)} \to x^\star$ as $k \to \infty$.

Observe that, in Algorithm 1, the users exhibit the same fast adaptation as in the dynamics (5). But in the network update, iterate $x^{(k+1)}$ is a convex combination of $x^{(k)}$ and $v$ which, by induction, remains in the feasible set $A$ for all $k$. This resolves the feasibility issue in the dynamics of (5)-(7). In the proof we will argue that the iterates track the differential inclusion

$$\frac{d}{dt} x(t) \in T(x(t)) - x(t); \tag{11}$$

we will in fact see that the solution to this differential inclusion remains feasible at all times.

Theorem 1 also states that the iterates converge to the global optimum of the system problem. This resolves the issue that the dynamics (5)-(7) converge to the solution to a relaxed problem different from the original system problem.
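The following sketch runs Algorithm 1 in a setting where both the network update and the system optimum are available in closed form. It is a toy illustration under assumptions that are not from the paper: a single shared link, utilities $w_e(x) = a_e \sqrt{x}$, and step sizes $a_k = 1/\sqrt{k+1}$ (which tend to zero and have a divergent sum). The name `algorithm1_single_link` is hypothetical. Under these assumptions the maximizer of (8) is $v(e) = c \cdot p_e / \sum_{e'} p_{e'}$ and the system optimum is $x^\star(e) = c \cdot a_e^2 / \sum_{e'} a_{e'}^2$.

```python
import math


def algorithm1_single_link(a, c, x0, num_iters=2000):
    """Sketch of Algorithm 1 when all users share one link of capacity c.

    Assumptions (not from the paper): w_e(x) = a[e] * sqrt(x) and
    step sizes a_k = 1 / sqrt(k + 1). Returns the final iterate and
    the list of all iterates.
    """
    x = list(x0)
    history = [list(x)]
    for k in range(num_iters):
        # User update (9): p_e = x(e) * w_e'(x(e)) = a[e] * sqrt(x(e)) / 2.
        p = [ae * math.sqrt(xe) / 2.0 for ae, xe in zip(a, x)]
        # Network update: single-link maximizer of (8).
        total = sum(p)
        v = [pe * c / total for pe in p]
        # Convex combination (10) with step a_{k+1} = 1 / sqrt(k + 2).
        step = 1.0 / math.sqrt(k + 2)
        x = [xe + step * (ve - xe) for xe, ve in zip(x, v)]
        history.append(list(x))
    return x, history


a, c = [1.0, 2.0, 3.0], 1.0
x, history = algorithm1_single_link(a, c, x0=[0.1, 0.1, 0.1])
x_star = [c * ae ** 2 / sum(b ** 2 for b in a) for ae in a]
print(x, x_star)
```

Every iterate stays strictly positive and within the link capacity because each update is a convex combination of two feasible points, which is the feasibility argument made above.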

The main technical issues to surmount are 1) the dynamics in (11) have multiple fixed points and it is nontrivial to show convergence to the global optimum; 2) $T(x)$ is not necessarily a continuous function of $x$.

Algorithm 1 assumes the existence of a central entity that computes the solution to Network, i.e. . Section 3 describes algorithms to compute efficiently for some class of networks. The assumption regarding the existence of a central entity is not crucial. An alternative is to use Mo and Walrand’s algorithm [10] that finds for a fixed ; that algorithm uses only the information available at the user end. The can then be adapted (user updates) at a slower time scale. This allows for the use of Algorithm 1 in large scale networks in a distributed setting.

The rest of the paper is organized as follows. In Section 2, we prove Theorem 1. In Section 3, we address the complexity of identifying the proportionally fair solution point for the network problem. We provide an example of a network where flows aggregate into a ‘main branch’, reminiscent of traffic from the suburbs flowing into an arterial highway leading to the downtown of a large city, for which the complexity to solve the network problem is . We also argue that this complexity is manageable ( plus computations for feasibility checks) in situations where the feasible set is a polymatroid, for example, when all flows either originate or terminate at a single vertex. We also demonstrate via simulations that the dynamics in (11) converge to the equilibrium at a faster rate than the dynamics of (5)-(7) for identical speed parameters . In Section 4, we end the paper with some concluding remarks.

## 2 Convergence

The update equation in step 3 of Algorithm 1 is a standard stochastic approximation scheme but without the stochasticity. A common method to analyze the asymptotic behavior of such schemes is the dynamical systems approach based on the theory of ordinary differential equations (ODE). But $T$ being a set-valued map necessitates the use of differential inclusions.

The outline of the proof is as follows. We will first characterize the fixed points of the mapping $T$. We will then argue that the system optimal point is one of the finitely many fixed points of the mapping $T$. We will next show that the solution to the differential inclusion in (11) models the asymptotic behavior of the iterates $x^{(k)}$. Following this, we will show that every solution to the differential inclusion converges to one of the fixed points of $T$ via Lyapunov theory. Finally, we will prove that the fixed point to which the solution to the differential inclusion converges as $t \to \infty$ is the system optimal point.

### 2.1 Characterization of the fixed points of T(x)

###### Definition 1

A point $\bar{x} \in A$ is a fixed point of the set-valued map $T$ if $\bar{x} \in T(\bar{x})$.

Let $S \subseteq [n]$. Let $A|_S$ be the subset of $A$ whose points have support contained within $S$. Define a subproblem of the system problem as

$$\text{Subsystem}(w,A,c,S):\quad \max_{y \in A|_S}\; \sum_{e \in S} w_e(y(e)). \tag{12}$$

###### Lemma 1

Let $\bar{x}$ be a fixed point of the mapping $T$. Let $S = \{e \in [n] : \bar{x}(e) > 0\}$. Then $\bar{x}$ is the unique optimal solution to the Subsystem$(w,A,c,S)$.

Proof.

We have for all . If for some , then any element has which contradicts the fact that is a fixed point. Hence for all , and we may write

$$\bar{x} \in T(\bar{x}) = \arg\max_{y \in A} \sum_{e \in S} p_e(\bar{x}(e)) \log(y(e)). \tag{13}$$

Since $\bar{x} \in A|_S$, we also have that $\bar{x}$ maximizes (13) over $A|_S$, i.e.,

$$\bar{x} = \arg\max_{y \in A|_S} \sum_{e \in S} p_e(\bar{x}(e)) \log(y(e)). \tag{14}$$

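Let $\mu_l$, $l \in [m]$, and $\eta_e$, $e \in S$, be the optimal dual variables for the network subproblem (14). The corresponding Karush-Kuhn-Tucker (KKT) conditions are

$$\frac{p_e(\bar{x}(e))}{\bar{x}(e)} = \sum_{l:\, l \in e} \mu_l - \eta_e, \quad e \in S, \tag{15}$$

$$\mu_l \cdot \Big( \sum_{e:\, e \ni l} \bar{x}(e) - c(l) \Big) = 0, \quad l \in [m], \tag{16}$$

$$\eta_e \cdot \bar{x}(e) = 0, \quad e \in [n], \tag{17}$$

$$\eta_e \ge 0, \; e \in S, \quad \mu_l \ge 0, \; l \in [m], \quad \text{and} \quad \bar{x} \in A|_S. \tag{18}$$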

Since , it is easy to see that equations (15-18) are the KKT conditions of Subsystem as well. Since is an interior feasible point of , KKT conditions are sufficient for optimality in (12) and is the optimal solution to Subsystem. Uniqueness follows from the strict concavity of .

Is every solution to Subsystem, , a fixed point of the mapping ? The possibility that for an and the first step of the proof of Lemma 1 says this is not always true. However, we can assert the following.

###### Lemma 2

The global maximum of the system problem, $x^\star$, is a fixed point of the mapping $T$.

Proof.

solves the system problem. Let . can be a proper subset of . Let and be the optimal dual variables of the system problem. We then have

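$$w'_e(x^\star(e)) = \sum_{l:\, l \in e} \mu_l - \eta_e, \quad e \in S, \tag{19}$$

$$w'_e(0) = \sum_{l:\, l \in e} \mu_l - \eta_e, \quad e \in S^c, \tag{20}$$

$$\mu_l \cdot \Big( \sum_{e:\, e \ni l} x^\star(e) - c(l) \Big) = 0, \quad l \in [m], \tag{21}$$

$$\eta_e \cdot x^\star(e) = 0, \quad e \in [n], \tag{22}$$

$$\eta_e \ge 0, \; e \in [n], \quad \mu_l \ge 0, \; l \in [m], \quad \text{and} \quad x^\star \in A. \tag{23}$$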

Observe that is finite for an ; otherwise a small increase in and a corresponding decrease in for a suitable (which has finite ) will result in a feasible flow that has a larger objective function value. Hence for . Since for all , it follows from (19)-(23) that and satisfy the KKT conditions of the problem (8). Hence .

### 2.2 Need for the theory of differential inclusions

We now describe the issues that make it necessary to use differential inclusions to study the asymptotic behavior of . is the set of points that solve (8). If for some at a point , then the objective function in (8) is not strictly concave. Hence there can be multiple points that solve (8). A continuous selection from allows the use of differential equations to analyze the stochastic approximation scheme in (10). A natural question that arises is whether there is such a continuous selection from . We give an example in the Appendix showing that such a selection is not possible.

### 2.3 Differential Inclusions: Preliminaries

In this section, we define a differential inclusion and state relevant results from [11] that are used to show the convergence of Algorithm 1. Let be a set valued map. Consider the following differential inclusion:

$$\frac{dx}{dt} \in F(x). \tag{24}$$

A solution to the differential inclusion in (24) with initial condition is an absolutely continuous function that satisfies (24) for almost every . The following conditions are sufficient for the existence of a solution to the differential inclusion (24):

1. is nonempty, convex and compact for each .

2. has a closed graph.

3. For some , for all , satisfies the following condition

$$\sup_{z \in F(x)} \|z\| \le K(1 + \|x\|). \tag{25}$$

The stochastic approximation scheme in is given as

$$y^{(k+1)} \in y^{(k)} + a_{k+1}\big( F(y^{(k)}) + U^{(k+1)} \big), \tag{26}$$

where satisfy the usual conditions:

$$\lim_{k \to \infty} a_k = 0, \qquad \sum_{k=1}^{\infty} a_k = \infty, \tag{27}$$

and are deterministic or random perturbations.

Let . Let be a continuous piece-wise linear function formed by the interpolation of as in

$$r_y(t) = y^{(k)} + \frac{y^{(k+1)} - y^{(k)}}{t(k+1) - t(k)} \cdot (t - t(k)), \quad \forall\, t \in [t(k), t(k+1)). \tag{28}$$

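The interpolation in (28) can be sketched for scalar iterates as follows. The time grid used here, $t(0) = 0$ and $t(k) = a_1 + \dots + a_k$, is the usual convention in stochastic approximation and is an assumption of this sketch; the function name `interpolate_iterates` is hypothetical.

```python
import bisect


def interpolate_iterates(ys, steps):
    """Return (r, times), where r(t) is the piecewise-linear interpolation (28).

    Assumption: t(0) = 0 and t(k) = steps[0] + ... + steps[k-1], where
    steps[k-1] plays the role of a_k. Requires len(ys) == len(steps) + 1.
    """
    times = [0.0]
    for a in steps:
        times.append(times[-1] + a)

    def r(t):
        # Find k with t(k) <= t < t(k+1); clamp to the first or last segment.
        k = min(max(bisect.bisect_right(times, t) - 1, 0), len(ys) - 2)
        slope = (ys[k + 1] - ys[k]) / (times[k + 1] - times[k])
        return ys[k] + slope * (t - times[k])

    return r, times


r, times = interpolate_iterates(ys=[0.0, 1.0, 3.0], steps=[1.0, 0.5])
print(times, r(0.5), r(1.25))
```

Vector-valued iterates can be handled by applying the same interpolation to each coordinate.
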
###### Definition 2

(A perturbed solution to (24)). Let be locally integrable function such that

$$\lim_{t \to \infty} \sup_{0 \le v \le T} \left\| \int_{t}^{t+v} U(s)\, ds \right\| = 0.$$

Let be a function such that as . Define

$$F^{\epsilon}(x) = \{ y \in \mathbb{R}^m : \exists\, z : \|z - x\| < \epsilon, \; d(y, F(z)) < \epsilon \}, \tag{29}$$

where . An absolutely continuous function is a perturbed solution to (24) if there exists and as above such that

$$\frac{dy}{dt} - U(t) \in F^{\delta(t)}(y(t)) \tag{30}$$

for almost every .

The following lemma, taken from [11], gives conditions on and for to be a perturbed solution to (24).

###### Lemma 3

[11, Prop. 1.3] Suppose is bounded, i.e., , and for all ,

$$\lim_{s \to \infty} \sup \left\{ \left\| \sum_{k=s}^{i-1} a_{k+1} U^{(k+1)} \right\| : i = s+1, s+2, \ldots, m(t(s) + T) \right\} = 0, \tag{31}$$

where . Then is a perturbed solution of the differential inclusion (24).

###### Definition 3

A compact set is an internally chain transitive set if for any and every , there exists , solutions to (24) and , that satisfy the following.

1. for all and for all ,

2. for all ,

3. and .

We shall call the sequence as an chain in from to .

The following lemma, again taken from [11], characterizes the limit set of a perturbed solution.

###### Lemma 4

[11, Thm. 3.6] Let be a perturbed solution to (24). Then the limit set of

$$L(r) := \bigcap_{t \ge 0} \overline{\{ r(s) : s \ge t \}}$$

is internally chain transitive.

### 2.4 Convergence analysis

We proceed to prove the convergence of to the optimal solution to the system problem. Observe that maps points in to itself. For , define

$$F(x) := T(P_A(x)) - x, \tag{32}$$

where $P_A(x)$ is the projection of $x$ onto the set $A$.

###### Lemma 5

For each , is nonempty, convex and compact. Furthermore, has the closed graph property and satisfies (25).

Proof.

The objective function of the network problem is continuous and the constraint set is compact. The maximum exists due to the Weierstrass theorem. Also, the set of maximizers is closed and convex. Thus is nonempty, convex and compact, and hence so is .

We next prove the closed graph property of $F$. A set-valued map has the closed graph property if it is upper hemicontinuous. The objective function in (8) is jointly continuous in $x$ and $y$. Also, the constraint set of the network problem does not vary with $x$. By Berge’s maximum theorem [12, p. 116], $T$ is upper hemicontinuous. Since $P_A$, the projection onto the convex set $A$, is continuous, the composition $T \circ P_A$ is upper hemicontinuous. Consequently, $F$ is upper hemicontinuous and hence has the closed graph property.

Finally,

$$\sup_{z \in F(x)} \|z\| = \sup_{z \in T(P_A(x))} \|z - x\| \le \sup_{z \in A} \|z - x\| \le \sup_{z \in A} \|z\| + \|x\| \le K + \|x\| \le K(1 + \|x\|),$$

where .

###### Lemma 6

Let be obtained by the linear interpolation of as given in (28). Then is a perturbed solution to the differential inclusion (24) with defined as in (32).

Proof.

We first show that . Observe that . Assume . Since and is a convex combination of and , we have . It follows that

$$F(x^{(k)}) = T(P_A(x^{(k)})) - x^{(k)} = T(x^{(k)}) - x^{(k)}.$$

We now see that the update equation in (10) is the same as the stochastic approximation scheme in (26) with for all . Observe that is bounded because for all and is compact; since , the condition in (31) is trivially satisfied. Hence, by Lemma 3, is a perturbed solution. We restrict our attention to solutions of (24) with initial condition . Since , lies in for all . Define

$$\Phi_t(x_0) := \{ x(t) : x \text{ solves (24)}, \; x(0) = x_0 \}.$$

###### Definition 4

Let be a subset of . Let be a continuous function such that

Then is called a Lyapunov function for .

Define as

$$V(x) := \sum_{e=1}^{n} w_e(x^\star(e)) - \sum_{e=1}^{n} w_e(x(e)). \tag{33}$$

###### Lemma 7

Let be the set of fixed points of . The function in (33) is a Lyapunov function for .

Proof.

Let $x \in A$ and $v \in T(x)$. We have, from the definition of $T(x)$ in (8), that

$$\sum_{e=1}^{n} p_e(x(e)) \log(v(e)) \ge \sum_{e=1}^{n} p_e(x(e)) \log(x(e)) \tag{34}$$

because $v$ maximizes the network problem.

If $p_e(x(e)) > 0$ for all $e$, then the network problem has a unique solution. Therefore, equality holds in (34) if and only if $v = x$, i.e., $x$ is a fixed point of the mapping $T$. Thus we have

$$0 \le \sum_{e=1}^{n} p_e(x(e)) \log \frac{v(e)}{x(e)} \stackrel{(a)}{\le} \sum_{e=1}^{n} p_e(x(e)) \left( \frac{v(e)}{x(e)} - 1 \right) = \sum_{e=1}^{n} w'_e(x(e)) \, (v(e) - x(e)) = \nabla W(x) \cdot (v - x), \tag{35}$$

where (a) uses the inequality $\log u \le u - 1$.

More generally, let for and for ; in particular, for . Define to be .

The value of the objective function in (8) evaluated at and are equal. Hence ,

$$\sum_{e \in S} p_e(x(e)) \log(\tilde{v}(e)) \ge \sum_{e \in S} p_e(x(e)) \log(x(e)), \tag{36}$$

and must be the unique solution to the problem defined in (14). Therefore, (36) holds with equality if and only if . Following the steps leading to (35), we have

$$\sum_{e \in S} w'_e(x(e)) \, (\tilde{v}(e) - x(e)) \ge 0 \tag{37}$$

which is a strict inequality if . Since for , this along with (37) yields

$$\nabla W(x) \cdot (v - x) \ge 0; \tag{38}$$

since for , equality holds in (38) if and only if . Hence

$$\frac{dV(x(t))}{dt} = -\nabla W(x(t)) \cdot (v - x(t)) \le 0, \quad \forall\, v \in T(x(t)). \tag{39}$$

The inequality in (39) holds with an equality if and only if . Therefore is a Lyapunov function for .
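The sign condition in (38), which drives the Lyapunov argument, can be checked numerically in a simple setting. The sketch below is illustrative only and assumes, beyond what the paper states, a single shared link and utilities $w_e(x) = a_e \sqrt{x}$; the function name `grad_dot_direction` and the sampled points are hypothetical.

```python
import math
import random


def grad_dot_direction(a, c, x):
    """Return grad W(x) . (v - x) for v = T(x) on one shared link of capacity c.

    Assumption (not from the paper): w_e(x) = a[e] * sqrt(x), with x(e) > 0.
    """
    grad = [ae / (2.0 * math.sqrt(xe)) for ae, xe in zip(a, x)]   # w_e'(x(e))
    p = [xe * ge for xe, ge in zip(x, grad)]                      # prices, as in (5)
    total = sum(p)
    v = [pe * c / total for pe in p]                              # single-link solution of (8)
    return sum(ge * (ve - xe) for ge, ve, xe in zip(grad, v, x))


random.seed(0)
a, c = [1.0, 2.0, 3.0], 1.0
values = []
for _ in range(1000):
    raw = [random.uniform(0.01, 1.0) for _ in a]
    scale = random.uniform(0.1, 1.0) * c / sum(raw)   # keeps the point inside A
    values.append(grad_dot_direction(a, c, [scale * r for r in raw]))
print(min(values))
```

In this setting the quantity is nonnegative at every sampled feasible point and vanishes at the fixed point $x(e) = c \cdot a_e^2 / \sum_{e'} a_{e'}^2$, in line with (38) and (39).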

###### Lemma 8

Let be the set of fixed points of . Every internally chain transitive set for in (32) is a singleton that is a subset of .

Proof.

By Lemma 1, there are at most finitely many fixed points of the mapping . Hence the cardinality of the set is finite and has empty interior. Also, by Lemma 7, is a Lyapunov function for . Proposition 3.27 of [11] states that if is a Lyapunov function for and if has an empty interior, then every internally chain transitive set is a subset of .

Choose small enough so that open balls of radius centered at each of the finite points are disjoint. Fix . Since any chain involves remaining in for all time and jumps of size at most to another point in , by the disjointedness of the -balls covering , there can be no chain in joining two of its distinct points. It follows that the internally chain transitive subsets of are singletons.

###### Lemma 9

The iterates $x^{(k)}$ converge to a fixed point of the mapping $T$.

Proof.

In Lemma 6, we showed that is a perturbed solution to (24). By Lemma 4, the limit set of is internally chain transitive. By Lemma 8, is a singleton and . Let . Since is compact and is the only limit point of the sequence , every subsequence of has a further subsequence that converges to . Hence converges to . In the rest of this section, we show that the iterates converge to , the optimal solution to the system problem.

Let the dual variables of the optimization problem Network be . Kelly et al. [1] simplified the dual to this problem to be:

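$$\text{Dual}(p,A,c):\quad \min_{\mu_l \ge 0,\; l \in [m]} \left( \sum_{e=1}^{n} p_e \cdot \log \frac{1}{\sum_{l:\, l \in e} \mu_l} + \sum_{l=1}^{m} \mu_l \, c(l) \right). \tag{40}$$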

We now argue that the search for the optimal may be restricted to a compact set.

###### Lemma 10

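The optimization problem in (40) with $p_e = p_e(x(e))$ is equivalent to the following optimization problem. For any $x \in A$,

$$\max_{0 \le \mu_l \le 2P/c(l)} \; \sum_{e=1}^{n} p_e(x(e)) \cdot \log\Big( \sum_{l:\, l \in e} \mu_l \Big) - \sum_{l=1}^{m} \mu_l \, c(l), \tag{41}$$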

where

$$P := \max_{x \in A} \sum_{e=1}^{n} x(e) \cdot w'_e(x(e)) < \infty.$$

Proof.

Define to be the objective function in (41). For any , by reducing , we increase the objective function’s value. To see this, it suffices to show that for any . But this is easily checked as follows:

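$$\frac{\partial R(\mu)}{\partial \mu_l} = \sum_{e:\, e \ni l} p_e(x(e)) \frac{1}{\sum_{l':\, l' \in e} \mu_{l'}} - c(l) \le \sum_{e:\, e \ni l} p_e(x(e)) \frac{1}{\mu_l} - c(l) \le \frac{1}{\mu_l} \sum_{e=1}^{n} p_e(x(e)) - c(l) \le \frac{1}{\mu_l} \left[ \max_{x \in A} \sum_{e=1}^{n} p_e(x(e)) \right] - c(l) = \frac{P}{\mu_l} - c(l) < 0, \tag{42}$$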

where the last inequality follows if .

###### Lemma 11

Let $x^{(k)}$ converge to $\bar{x}$, a fixed point of the mapping $T$. Then $\bar{x} = x^\star$, the optimal solution to the system problem.

Proof.

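Let $v^{(k+1)}$ solve problem (8) with $x = x^{(k)}$, and so satisfy the KKT conditions

$$\frac{p_e(x^{(k)}(e))}{v^{(k+1)}(e)} - \sum_{l:\, l \in e} \mu^{(k)}_l + \eta^{(k)}_e = 0, \quad e \in [n], \tag{43}$$

$$\mu^{(k)}_l \cdot \Big( \sum_{e:\, e \ni l} v^{(k+1)}(e) - c(l) \Big) = 0, \quad l \in [m], \tag{44}$$

$$\eta^{(k)}_e \cdot v^{(k+1)}(e) = 0, \quad e \in [n], \tag{45}$$

$$\eta^{(k)}_e \ge 0, \; e \in [n], \quad \mu^{(k)}_l \ge 0, \; l \in [m]. \tag{46}$$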

Let us first claim that for all . This holds for , the initial point, in Algorithm 1. If, for some , , then for all and so, and consequently, being a convex combination of and also satisfies . The claim follows by induction. Since , we have , and so, by (45), . Thus (43) simplifies to

$$\frac{p_e(x^{(k)}(e))}{v^{(k+1)}(e)} = \sum_{l:\, l \in e} \mu^{(k)}_l. \tag{47}$$

Since we also have , and in (47), we have . Hence

$$v^{(k+1)}(e) = \frac{p_e(x^{(k)}(e))}{\sum_{l:\, l \in e} \mu^{(k)}_l}$$