
# A New Backpressure Algorithm for Joint Rate Control and Routing with Vanishing Utility Optimality Gaps and Finite Queue Lengths

Hao Yu and Michael J. Neely
Department of Electrical Engineering
University of Southern California
###### Abstract

The backpressure algorithm has been widely used as a distributed solution to the problem of joint rate control and routing in multi-hop data networks. By controlling a parameter $V$ in the algorithm, the backpressure algorithm can achieve an arbitrarily small utility optimality gap. However, this in turn leads to large queue lengths at each node and hence large network delay. This phenomenon is known as the fundamental utility-delay tradeoff. The best known utility-delay tradeoff for general networks is $[O(1/V), O(V)]$ and is attained by a backpressure algorithm based on a drift-plus-penalty technique. This may suggest that to achieve an arbitrarily small utility optimality gap, the existing backpressure algorithms necessarily yield an arbitrarily large queue length. However, this paper proposes a new backpressure algorithm that has a vanishing utility optimality gap, so utility converges to exact optimality as the algorithm keeps running, while queue lengths are bounded throughout by a finite constant. The technique uses backpressure and drift concepts with a new method for convex programming.

## I Introduction

In multi-hop data networks, the problem of joint rate control and routing is to accept data into the network to maximize certain utilities and to make routing decisions at each node such that all accepted data are delivered to their intended destinations without overflowing any queue at intermediate nodes. The original backpressure algorithm, proposed in the seminal work [1] by Tassiulas and Ephremides, addresses this problem under the assumption that incoming data are given and lie inside the network stability region; it develops a routing strategy that delivers all incoming data without overflowing any queue. In the context of [1], there is essentially no utility maximization consideration in the network. The backpressure algorithm was further extended by a drift-plus-penalty technique to handle data networks with both utility maximization and queue stability considerations [2, 3, 4]. Alternative extensions for both utility maximization and queue stabilization are developed in [5, 6, 7, 8]. These extended backpressure algorithms have different dynamics and/or may yield different utility-delay tradeoff results. However, all of them rely on “backpressure” quantities, which are the differential backlogs between neighboring nodes.

It has been observed in [9, 5, 7, 10] that the drift-plus-penalty and other alternative algorithms can be interpreted as first order Lagrangian dual type methods for constrained optimization. In addition, these backpressure algorithms follow certain fundamental utility-delay tradeoffs. For instance, the primal-dual type backpressure algorithm in [5] achieves an $O(1/\sqrt{V})$ utility optimality gap with an $O(V)$ queue length, where $V$ is an algorithm parameter. By tuning $V$, a small utility optimality gap is available only at the cost of a large queue length. The drift-plus-penalty backpressure algorithm [4], which has the best utility-delay tradeoff among all existing first order Lagrangian dual type methods for general networks, can only achieve an $O(1/V)$ utility optimality gap with an $O(V)$ queue length. Under certain restrictive assumptions on the network, a better $[O(1/V), O(\log(V))]$ tradeoff is achieved via an exponential Lyapunov function in [11], and an $[O(1/V), O(\log^2(V))]$ tradeoff is achieved via a LIFO-backpressure algorithm in [12]. The existing utility-delay tradeoff results seem to suggest that a large queueing delay is unavoidable if a small utility optimality gap is demanded.

Recently, there have been many attempts at obtaining new variations of backpressure algorithms by applying Newton’s method to the Lagrangian dual function. In the recent work [10], the authors develop a Newton’s method for joint rate control and routing. However, the algorithm in [10] is still subject to a utility-delay tradeoff, in which a smaller utility optimality gap comes at the cost of larger queue lengths; moreover, it requires a centralized projection step (although Newton directions can be approximated in a distributed manner). Work [13] considers a network flow control problem where the path of each flow is given (and hence there is no routing part in the problem), and proposes a decentralized Newton based algorithm for rate control. Work [14] considers network routing without an end-to-end utility and only shows the stability of the proposed Newton based backpressure algorithm. All of the above Newton’s method based algorithms rely on distributed approximations of the inverse Hessian, whose computation still requires some coordination for local information updates and propagation and does not scale well with the network size. In contrast, first order Lagrangian dual type methods do not need global network topology information. Rather, each node only needs the queue length information of its neighbors.

This paper proposes a new first order Lagrangian dual type backpressure algorithm that is as simple as the existing algorithms in [4, 5, 7] but has a better utility-delay tradeoff. The new backpressure algorithm achieves a vanishing utility optimality gap that decays like $O(1/t)$, where $t$ is the number of iterations. It also guarantees that the queue length at each node is always bounded by a fixed constant of the same order as the optimal Lagrange multiplier of the network optimization problem. This improves on the utility-delay tradeoffs of prior work. In particular, it improves the $[O(1/\sqrt{V}), O(V)]$ utility-delay tradeoff in [5] and the $[O(1/V), O(V)]$ utility-delay tradeoff of the drift-plus-penalty algorithm in [4], both of which require unbounded queue lengths to achieve a vanishing utility optimality gap. The new backpressure algorithm differs from existing first order backpressure algorithms in the following aspects:

1. The “backpressure” quantities in this paper are with respect to newly introduced weights. These are different from queues used in other backpressure algorithms, but can still be locally tracked and updated.

2. The rate control and routing decision rule involves a quadratic term that is similar to a term used in proximal algorithms [15].

Note that the benefit of introducing a quadratic term in network optimization has been observed in [16]. Work [16] considers a network utility maximization problem with given routing paths, which is a special case of the problem treated in this paper. The algorithm of [16] considers a fixed set of predetermined paths for each session and does not scale well when treating all (typically exponentially many) possible paths of a general network. The algorithm proposed in [16] is not of backpressure type and hence is fundamentally different from ours. For example, the algorithm in [16] needs to update the primal variables (source session rates for each path) at least twice per iteration, while our algorithm only updates the primal variables (source session rates and link session rates) once per iteration. The prior work [16] shows that the utility optimality gap is asymptotically zero without analyzing the decay rate, while this paper shows that the utility optimality gap decays like $O(1/t)$.

## II System Model and Problem Formulation

Consider a slotted data network with normalized time slots $t\in\{0,1,2,\ldots\}$. This network is represented by a graph $\mathcal{G}=(\mathcal{N},\mathcal{L})$, where $\mathcal{N}$ is the set of nodes and $\mathcal{L}\subseteq\mathcal{N}\times\mathcal{N}$ is the set of directed links. Let $N=|\mathcal{N}|$ and $L=|\mathcal{L}|$. This network is shared by end-to-end sessions denoted by a set $\mathcal{F}$. For each end-to-end session $f\in\mathcal{F}$, the source node $\mathrm{Src}(f)$ and destination node $\mathrm{Dst}(f)$ are given but the routes are not specified. Each session $f$ has a continuous and concave utility function $U_f(x_f)$ that represents the “satisfaction” received by accepting $x_f$ amount of data for session $f$ into the network at each slot. Unlike [5, 10], where $U_f(\cdot)$ is assumed to be differentiable and strongly concave, this paper considers general concave utility functions $U_f(\cdot)$, including those that are neither differentiable nor strongly concave. Formally, each utility function $U_f$ is defined over an interval $\mathrm{dom}(U_f)$, called the domain of the function. It is assumed throughout that either $\mathrm{dom}(U_f)=[0,\infty)$ or $\mathrm{dom}(U_f)=(0,\infty)$, the latter being important for proportionally fair utilities [17] such as $U_f(x_f)=\log(x_f)$ that have singularities at $x_f=0$.

Denote the capacity of link $l\in\mathcal{L}$ as $C_l$ and assume it is a fixed and positive constant. (Footnote 1: As stated in [10], this is a suitable model for wireline networks and wireless networks with fixed transmission power and orthogonal channels.) Define $\mu_l^{(f)}$ as the amount of session $f$’s data routed at link $l$ that is to be determined by our algorithm. Note that in general, the network may be configured such that some session $f$ is forbidden to use link $l$. For each link $l\in\mathcal{L}$, define $\mathcal{S}_l\subseteq\mathcal{F}$ as the set of sessions that are allowed to use link $l$. The case of unrestricted routing is treated by defining $\mathcal{S}_l=\mathcal{F}$ for all links $l\in\mathcal{L}$.

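Note that if $l=(a,b)\in\mathcal{L}$ with $a,b\in\mathcal{N}$, then $C_l$ and $\mu_l^{(f)}$ can also be written as $C_{ab}$ and $\mu_{ab}^{(f)}$, respectively. For each node $n\in\mathcal{N}$, denote the sets of its incoming links and outgoing links as $\mathcal{I}(n)$ and $\mathcal{O}(n)$, respectively. Note that $x_f,\ \forall f\in\mathcal{F}$, and $\mu_l^{(f)},\ \forall l\in\mathcal{L},\ \forall f\in\mathcal{F}$, are the decision variables of a joint rate control and routing algorithm. If global network topology information is available, the optimal joint rate control and routing can be formulated as the following multi-commodity network flow problem: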

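$$
\begin{aligned}
\max_{x_f,\,\mu_l^{(f)}}\quad & \sum_{f\in\mathcal{F}}U_f(x_f) & & (1)\\
\text{s.t.}\quad & x_f\mathbf{1}_{\{n=\mathrm{Src}(f)\}}+\sum_{l\in\mathcal{I}(n)}\mu_l^{(f)}\le\sum_{l\in\mathcal{O}(n)}\mu_l^{(f)},\quad\forall f\in\mathcal{F},\ \forall n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\} & & (2)\\
& \sum_{f\in\mathcal{F}}\mu_l^{(f)}\le C_l,\quad\forall l\in\mathcal{L} & & (3)\\
& \mu_l^{(f)}\ge 0,\quad\forall l\in\mathcal{L},\ \forall f\in\mathcal{S}_l & & (4)\\
& \mu_l^{(f)}=0,\quad\forall l\in\mathcal{L},\ \forall f\in\mathcal{F}\setminus\mathcal{S}_l & & (5)\\
& x_f\in\mathrm{dom}(U_f),\quad\forall f\in\mathcal{F} & & (6)
\end{aligned}
$$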

where $\mathbf{1}_{\{\cdot\}}$ is an indicator function; (2) represents the node flow conservation constraints, relaxed by replacing the equality with an inequality, meaning that the total rate of flow into node $n$ is less than or equal to the total rate of flow out of the node (since, in principle, we can always send fake data over outgoing links when the inequality is loose); and (3) represents link capacity constraints. Note that for each flow $f$, there is no constraint (2) at its destination node, since all incoming data are consumed by this node.
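To make constraints (2)-(5) concrete, the sketch below checks whether given rates are feasible. It is an illustration under our own assumptions: the dictionary-based representation, the helper name `is_feasible`, and the three-node toy network are hypothetical and not part of the paper's formulation.

```python
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # directed link l = (a, b)


def is_feasible(nodes, links, cap: Dict[Link, float],
                sessions: Dict[str, Tuple[str, str]],
                allowed: Dict[Link, Set[str]],
                x: Dict[str, float],
                mu: Dict[Tuple[Link, str], float],
                tol: float = 1e-9) -> bool:
    """Check constraints (2)-(5) of problem (1)-(6).

    sessions maps f -> (Src(f), Dst(f)); allowed maps l -> S_l;
    mu maps (l, f) -> mu_l^(f), with missing entries treated as 0.
    """
    def rate(l, f):
        return mu.get((l, f), 0.0)

    for f, (src, dst) in sessions.items():
        for n in nodes:
            if n == dst:
                continue  # no flow conservation constraint at Dst(f)
            inflow = (x[f] if n == src else 0.0) + sum(rate(l, f) for l in links if l[1] == n)
            outflow = sum(rate(l, f) for l in links if l[0] == n)
            if inflow > outflow + tol:  # constraint (2)
                return False
    for l in links:
        if sum(rate(l, f) for f in sessions) > cap[l] + tol:  # constraint (3)
            return False
        for f in sessions:
            if rate(l, f) < -tol:  # constraint (4)
                return False
            if f not in allowed[l] and abs(rate(l, f)) > tol:  # constraint (5)
                return False
    return True


# Hypothetical toy network: session "f" from a to c over a->b->c and a->c.
nodes = ["a", "b", "c"]
links = [("a", "b"), ("b", "c"), ("a", "c")]
cap = {l: 1.0 for l in links}
sessions = {"f": ("a", "c")}
allowed = {l: {"f"} for l in links}
mu = {(("a", "b"), "f"): 0.5, (("b", "c"), "f"): 0.5, (("a", "c"), "f"): 1.0}
print(is_feasible(nodes, links, cap, sessions, allowed, {"f": 1.5}, mu))  # True
print(is_feasible(nodes, links, cap, sessions, allowed, {"f": 2.5}, mu))  # False
```

With $x_f=2.5$, the inflow at the source exceeds the total outgoing rate of 1.5, so constraint (2) fails.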

The above formulation includes network utility maximization with fixed paths as a special case. In the case when each session only has one single given path, e.g., the network utility maximization problem considered in [18], we could modify the sets $\mathcal{S}_l$ used in constraints (4) and (5) to reflect this fact. For example, if link $l$ is only used for sessions $f_1$ and $f_2$, then $\mathcal{S}_l=\{f_1,f_2\}$. Similarly, the case [16] where each flow is restricted to using links from a set of predefined paths can be treated by modifying the sets $\mathcal{S}_l$ accordingly. See Appendix A for more discussion.

The solution to problem (1)-(6) corresponds to the optimal joint rate control and routing. However, solving this convex program on a single computer requires knowledge of the global network topology and yields a centralized solution, which is not practical for large data networks. As observed in [9, 5, 7, 10], various versions of backpressure algorithms can be interpreted as distributed solutions to problem (1)-(6) derived from first order Lagrangian dual type methods.

###### Assumption 1

(Feasibility) Problem (1)-(6) has at least one optimal solution vector $[x_f^*;\mu_l^{(f)*}]_{f\in\mathcal{F},l\in\mathcal{L}}$.

###### Assumption 2

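(Existence of Lagrange multipliers) Assume the convex program (1)-(6) has Lagrange multipliers attaining strong duality. Specifically, define the convex set $\mathcal{Y}=\{[x_f;\mu_l^{(f)}]_{f\in\mathcal{F},l\in\mathcal{L}}:\text{(3)-(6) hold}\}$. Assume there exists a Lagrange multiplier vector $\boldsymbol{\lambda}^*=[\lambda_n^{(f)*}]_{f\in\mathcal{F},n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}\ge\mathbf{0}$ such that

$$q(\boldsymbol{\lambda}^*)=\sup\Big\{\sum_{f\in\mathcal{F}}U_f(x_f):\text{(2)-(6)}\Big\}$$

where
$$q(\boldsymbol{\lambda})=\sup_{[x_f;\mu_l^{(f)}]\in\mathcal{Y}}\Big\{\sum_{f\in\mathcal{F}}U_f(x_f)-\sum_{f\in\mathcal{F},\,n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}\lambda_n^{(f)}\Big(x_f\mathbf{1}_{\{n=\mathrm{Src}(f)\}}+\sum_{l\in\mathcal{I}(n)}\mu_l^{(f)}-\sum_{l\in\mathcal{O}(n)}\mu_l^{(f)}\Big)\Big\}$$
is the Lagrangian dual function of problem (1)-(6) obtained by treating (3)-(6) as a convex set constraint.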

Assumptions 1 and 2 hold in most cases of interest. For example, Slater’s condition guarantees Assumption 2. Since the constraints (2)-(6) are linear, Proposition 6.4.2 in [19] ensures that Lagrange multipliers exist whenever constraints (2)-(6) are feasible and the utility functions are either defined over open sets (such as $U_f(x_f)=\log(x_f)$ with $\mathrm{dom}(U_f)=(0,\infty)$) or can be concavely extended to open sets, meaning that there is an $\epsilon>0$ and a concave function $\tilde{U}_f:(-\epsilon,\infty)\to\mathbb{R}$ such that $\tilde{U}_f(x_f)=U_f(x_f)$ whenever $x_f\ge 0$. (Footnote 2: If $\mathrm{dom}(U_f)=[0,\infty)$, such a concave extension is possible if the right-derivative of $U_f$ at $x_f=0$ is finite, as is the case, e.g., for $U_f(x_f)=\log(1+x_f)$. Such an extension is impossible for the example $U_f(x_f)=\sqrt{x_f}$ because its slope is infinite at $x_f=0$. Nevertheless, Lagrange multipliers often exist even for these utility functions, such as when Slater’s condition holds [19].)

###### Fact 1

(Replacing inequality with equality) If Assumption 1 holds, problem (1)-(6) has an optimal solution vector $[x_f^*;\mu_l^{(f)*}]_{f\in\mathcal{F},l\in\mathcal{L}}$ such that all constraints (2) hold with equality.

###### Proof:

Note that each $\mu_l^{(f)}$ can appear on the left side of at most one constraint (2) and on the right side of at most one constraint (2). Let $[x_f^*;\mu_l^{(f)*}]$ be an optimal solution vector such that at least one inequality constraint (2) is loose. We can reduce the value of a variable $\mu_l^{(f)*}$ on the right side of a loose constraint (2) until either that constraint holds with equality or $\mu_l^{(f)*}$ reduces to $0$. The objective function value does not change, and no constraints are violated, since decreasing $\mu_l^{(f)*}$ only decreases the left side of the other constraint (2) in which it appears and the left side of (3). We can repeat the process until all inequality constraints (2) are tight. \qed

## III The New Backpressure Algorithm

### III-A Discussion of Various Queueing Models

At each node, an independent queue backlog is maintained for each session. At each slot $t$, let $x_f[t]$ be the source session rates and let $\mu_l^{(f)}[t]$ be the link session rates. Some prior work enforces the constraint (2) via virtual queues $Y_n^{(f)}[t]$ of the following form:

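$$Y_n^{(f)}[t+1]=\max\Big\{Y_n^{(f)}[t]+x_f[t]\mathbf{1}_{\{n=\mathrm{Src}(f)\}}+\sum_{l\in\mathcal{I}(n)}\mu_l^{(f)}[t]-\sum_{l\in\mathcal{O}(n)}\mu_l^{(f)}[t],\,0\Big\}. \qquad (7)$$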

While this virtual queue equation is a meaningful approximation, it differs from reality in that newly injected data are allowed to be transmitted immediately, or equivalently, a single packet is allowed to enter and leave many nodes within the same slot. Further, there is no clear connection between the virtual queues in (7) and the actual queues in the network. Indeed, it is easy to construct examples showing that there can be an arbitrarily large difference between the value $Y_n^{(f)}[t]$ in (7) and the physical queue size in actual networks (see Appendix B).

An actual queueing network has queues with the following dynamics:

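$$Z_n^{(f)}[t+1]\le\max\Big\{Z_n^{(f)}[t]-\sum_{l\in\mathcal{O}(n)}\mu_l^{(f)}[t],\,0\Big\}+x_f[t]\mathbf{1}_{\{n=\mathrm{Src}(f)\}}+\sum_{l\in\mathcal{I}(n)}\mu_l^{(f)}[t]. \qquad (8)$$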

This is faithful to actual queue dynamics and does not allow data to be transmitted over multiple hops in one slot. Note that (8) is an inequality because the new arrivals from other nodes may be strictly less than $\sum_{l\in\mathcal{I}(n)}\mu_l^{(f)}[t]$ when those other nodes do not have enough backlog to send. The model (8) allows for any decisions to be made to fill the transmission values $\mu_l^{(f)}[t]$ in the case that $Z_n^{(f)}[t]\le\sum_{l\in\mathcal{O}(n)}\mu_l^{(f)}[t]$, provided that (8) holds.

This paper develops an algorithm that converges to the optimal utility defined by problem (1)-(6), and that produces worst-case bounded queues on the actual queueing network, that is, with actual queues that evolve as given in (8). To begin, it is convenient to introduce the following virtual queue equation

$$Q_n^{(f)}[t+1]=Q_n^{(f)}[t]-\sum_{l\in\mathcal{O}(n)}\mu_l^{(f)}[t]+x_f[t]\mathbf{1}_{\{n=\mathrm{Src}(f)\}}+\sum_{l\in\mathcal{I}(n)}\mu_l^{(f)}[t], \qquad (9)$$

where $Q_n^{(f)}[t]$ represents a virtual queue value associated with session $f$ at node $n$. At first glance, this model (9) appears to be only an approximation, perhaps even a worse approximation than (7), because it allows the $Q_n^{(f)}[t]$ values to be negative. Indeed, we use $Q_n^{(f)}[t]$ only as virtual queues to inform the algorithm and do not treat them as actual queues. However, this paper shows that using these virtual queues to choose the decisions ensures not only that the desired constraints (2) are satisfied, but also that the resulting decisions create bounded queues in the actual network, where the actual queues evolve according to (8). In short, our algorithm can be faithfully implemented on actual queueing networks and converges to exact optimality on those networks.

The next lemma shows that if an algorithm can guarantee virtual queues defined in (9) are bounded, then actual physical queues satisfying (8) are also bounded.

###### Lemma 1

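Consider a network flow problem described by problem (1)-(6). For all $l\in\mathcal{L}$, $f\in\mathcal{F}$ and $t\in\{0,1,2,\ldots\}$, let $x_f[t],\mu_l^{(f)}[t]$ be decisions yielded by a dynamic algorithm that satisfy constraints (3)-(4). Suppose $Y_n^{(f)}[t]$, $Z_n^{(f)}[t]$, $Q_n^{(f)}[t]$ evolve by (7)-(9) with initial conditions $Y_n^{(f)}[0]=Z_n^{(f)}[0]=Q_n^{(f)}[0]=0$. If there exists a constant $B>0$ such that $|Q_n^{(f)}[t]|\le B$ for all $f\in\mathcal{F}$, $n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}$ and $t$, then

1. $Z_n^{(f)}[t]\le 2B+\sum_{l\in\mathcal{O}(n)}C_l$ for all $t$.

2. $Y_n^{(f)}[t]\le 2B$ for all $t$.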

###### Proof:
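1. Fix $f\in\mathcal{F}$ and $n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}$. Define an auxiliary virtual queue $\hat{Q}_n^{(f)}[t]$ that is initialized by $\hat{Q}_n^{(f)}[0]=B+\sum_{l\in\mathcal{O}(n)}C_l$ and evolves according to (9). It follows that $\hat{Q}_n^{(f)}[t]=Q_n^{(f)}[t]+B+\sum_{l\in\mathcal{O}(n)}C_l$ for all $t$. Since $|Q_n^{(f)}[t]|\le B$ by assumption, we have $\hat{Q}_n^{(f)}[t]\ge\sum_{l\in\mathcal{O}(n)}C_l\ge\sum_{l\in\mathcal{O}(n)}\mu_l^{(f)}[t]$ for all $t$, where the last inequality follows from (3)-(4). This implies that $\hat{Q}_n^{(f)}[t]$ also satisfies:

$$\hat{Q}_n^{(f)}[t+1]=\max\Big\{\hat{Q}_n^{(f)}[t]-\sum_{l\in\mathcal{O}(n)}\mu_l^{(f)}[t],\,0\Big\}+x_f[t]\mathbf{1}_{\{n=\mathrm{Src}(f)\}}+\sum_{l\in\mathcal{I}(n)}\mu_l^{(f)}[t],\quad\forall t \qquad (10)$$

which is identical to (8) except that the inequality is replaced by an equality. Since $Z_n^{(f)}[0]=0\le\hat{Q}_n^{(f)}[0]$, $\hat{Q}_n^{(f)}[t]$ satisfies (10), and the right side of (10) is nondecreasing in the queue value, induction gives $Z_n^{(f)}[t]\le\hat{Q}_n^{(f)}[t]$ for all $t$.

Since $\hat{Q}_n^{(f)}[t]=Q_n^{(f)}[t]+B+\sum_{l\in\mathcal{O}(n)}C_l$ and $Q_n^{(f)}[t]\le B$, we have $\hat{Q}_n^{(f)}[t]\le 2B+\sum_{l\in\mathcal{O}(n)}C_l$. It follows that $Z_n^{(f)}[t]\le 2B+\sum_{l\in\mathcal{O}(n)}C_l$.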

2. The proof of part (2) is similar and is in Appendix C.

\qed
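The comparison argument above can be sanity-checked numerically. The sketch below is our own illustration, not an experiment from the paper: it follows a single (session, node) pair that is the session's source, has one outgoing link of capacity $C=1$, and receives randomly drawn decisions that respect (3). It evolves (7), (8) (taken with equality, using actual upstream arrivals no larger than the planned incoming rate) and (9) side by side, takes $B$ as the largest $|Q_n^{(f)}[t]|$ seen over the horizon, and checks $Z\le 2B+C$ and $Y\le 2B$, the bounds obtained by shifting $Q$ by a constant so that it dominates the other queues. The helpers `step` and `simulate` are hypothetical.

```python
import random


def step(Y, Z, Q, x, mu_in, mu_out, r):
    """Advance (7), (8) and (9) by one slot for a single (session f, node n) pair.

    x:      admitted rate x_f[t] (0 if n is not Src(f))
    mu_in:  planned incoming rate, the sum of mu_l^(f)[t] over l in I(n)
    mu_out: planned outgoing rate, the sum of mu_l^(f)[t] over l in O(n)
    r:      data that actually arrives from upstream, 0 <= r <= mu_in
    """
    net = x + mu_in - mu_out
    Y_next = max(Y + net, 0.0)             # virtual queue (7)
    Z_next = max(Z - mu_out, 0.0) + x + r  # physical queue, (8) taken with equality
    Q_next = Q + net                       # virtual queue (9); may become negative
    return Y_next, Z_next, Q_next


def simulate(T=1000, cap_out=1.0, seed=0):
    """Random decisions that respect capacity (3); returns the (Y, Z, Q) trajectory."""
    rng = random.Random(seed)
    Y = Z = Q = 0.0
    hist = [(Y, Z, Q)]
    for _ in range(T):
        x = rng.uniform(0.0, 0.5)
        mu_in = rng.uniform(0.0, 0.5)
        mu_out = rng.uniform(0.0, cap_out)
        r = rng.uniform(0.0, mu_in)        # upstream may send less than planned
        Y, Z, Q = step(Y, Z, Q, x, mu_in, mu_out, r)
        hist.append((Y, Z, Q))
    return hist


# One slot that only drains: Q goes negative while Y and Z stay at zero.
print(step(0.0, 0.0, 0.0, x=0.0, mu_in=0.0, mu_out=1.0, r=0.0))  # (0.0, 0.0, -1.0)

hist = simulate()
B = max(abs(q) for _, _, q in hist)        # empirical bound on |Q[t]| over the horizon
print(all(z <= 2 * B + 1.0 + 1e-9 for _, z, _ in hist))  # checks Z <= 2B + C
print(all(y <= 2 * B + 1e-9 for y, _, _ in hist))        # checks Y <= 2B
```

The first call shows why (9) is only a bookkeeping device: after a slot that only drains, $Q$ is negative while both $Y$ and $Z$ are zero.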

### III-B The New Backpressure Algorithm

In this subsection, we propose a new backpressure algorithm that yields source session rates $x_f[t]$ and link session rates $\mu_l^{(f)}[t]$ at each slot $t$ such that the physical queues for each session at each node are bounded by a constant and the time average utility satisfies

$$\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{f\in\mathcal{F}}U_f(x_f[\tau])\ge\sum_{f\in\mathcal{F}}U_f(x_f^*)-O(1/t),\quad\forall t\ge 1,$$

where $x_f^*$ are from the optimal solution to (1)-(6). Note that Jensen’s inequality, applied to each concave $U_f$, further implies that

$$\sum_{f\in\mathcal{F}}U_f\Big(\frac{1}{t}\sum_{\tau=0}^{t-1}x_f[\tau]\Big)\ge\sum_{f\in\mathcal{F}}U_f(x_f^*)-O(1/t),\quad\forall t\ge 1.$$

The new backpressure algorithm is described in Algorithm 1. Similar to existing backpressure algorithms, the updates in Algorithm 1 at each node are fully distributed and depend only on weights at the node itself and at its neighbor nodes. Unlike existing backpressure algorithms, the weights used to update the decision variables $x_f[t]$ and $\mu_l^{(f)}[t]$ are not the virtual queues $Q_n^{(f)}[t]$ themselves; rather, they are augmented values $W_n^{(f)}[t]$ equal to the sum of the virtual queues and the amount of net injected data in the previous slot $t-1$. In addition, the updates involve an additional quadratic term, which is similar to a term used in proximal algorithms [15].

### III-C Almost Closed-Form Updates in Algorithm 1

This subsection shows that the decisions $x_f[t]$ and $\mu_l^{(f)}[t]$ in Algorithm 1 have either closed-form solutions or “almost” closed-form solutions at each iteration $t$.

###### Lemma 2

Let $\hat{x}_f$ denote the solution to (12)-(13).

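1. Suppose $\mathrm{dom}(U_f)=[0,\infty)$ and $U_f$ is differentiable. Let $h(x_f)=U_f'(x_f)-W_n^{(f)}[t]-2\alpha_n(x_f-x_f[t-1])$, where $n=\mathrm{Src}(f)$. If $h(0)\le 0$, then $\hat{x}_f=0$; otherwise $\hat{x}_f$ is the root of the equation $h(x_f)=0$ and can be found by a bisection search.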

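2. Suppose $\mathrm{dom}(U_f)=(0,\infty)$ and $U_f(x_f)=w_f\log(x_f)$ for some weight $w_f>0$. Then:

$$\hat{x}_f=\frac{2\alpha_n x_f[t-1]-W_n^{(f)}[t]}{4\alpha_n}+\frac{\sqrt{\big(W_n^{(f)}[t]-2\alpha_n x_f[t-1]\big)^2+8\alpha_n w_f}}{4\alpha_n}$$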
###### Proof:

Omitted for brevity. \qed
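Both cases of Lemma 2 are short to code. The sketch assumes, consistently with the closed form above, that (12)-(13) maximizes $U_f(x_f)-W_n^{(f)}[t]x_f-\alpha_n(x_f-x_f[t-1])^2$ over $\mathrm{dom}(U_f)$ with $n=\mathrm{Src}(f)$; the function names, the bracket-doubling step and the fixed iteration count are our own choices.

```python
import math


def x_update_log(W, x_prev, alpha, w):
    """Lemma 2, part 2: maximize w*log(x) - W*x - alpha*(x - x_prev)**2 over x > 0.

    Setting the derivative to zero gives 2*alpha*x**2 + (W - 2*alpha*x_prev)*x - w = 0;
    this returns its unique positive root.
    """
    b = W - 2.0 * alpha * x_prev
    return (-b + math.sqrt(b * b + 8.0 * alpha * w)) / (4.0 * alpha)


def x_update_bisect(dU, W, x_prev, alpha, hi=1.0, iters=200):
    """Lemma 2, part 1: dom(U) = [0, inf) and U differentiable with derivative dU.

    h(x) = dU(x) - W - 2*alpha*(x - x_prev) is strictly decreasing, so the maximizer
    is 0 when h(0) <= 0 and the root of h otherwise.
    """
    def h(x):
        return dU(x) - W - 2.0 * alpha * (x - x_prev)

    if h(0.0) <= 0.0:
        return 0.0
    while h(hi) > 0.0:  # enlarge the bracket until h changes sign
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):  # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# U(x) = log(1 + x), W = 0, x_prev = 0, alpha = 0.5: the root of 1/(1+x) = x, i.e. (sqrt(5)-1)/2.
print(x_update_bisect(lambda x: 1.0 / (1.0 + x), W=0.0, x_prev=0.0, alpha=0.5))
print(x_update_log(W=1.0, x_prev=0.5, alpha=1.0, w=2.0))  # 1.0
```

A fixed iteration count is used instead of an absolute tolerance so the loop also terminates when the root is large and floating-point spacing exceeds the tolerance.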

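The problem (14)-(17) can be represented as follows by eliminating $\mu_l^{(f)}$ for $f\notin\mathcal{S}_l$, completing the square, and replacing maximization with minimization. (Here $K=|\mathcal{S}_l|$, each $z_k$ corresponds to $\mu_l^{(f)}$ for one session $f\in\mathcal{S}_l$, and $b=C_l$.)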

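$$
\begin{aligned}
\min\quad & \frac{1}{2}\sum_{k=1}^{K}(z_k-a_k)^2 & & (18)\\
\text{s.t.}\quad & \sum_{k=1}^{K}z_k\le b & & (19)\\
& z_k\ge 0,\quad\forall k\in\{1,2,\ldots,K\} & & (20)
\end{aligned}
$$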
###### Lemma 3

The solution to problem (18)-(20) is given by $z_k^*=\max\{a_k-\theta^*,0\}$, $\forall k\in\{1,2,\ldots,K\}$, where $\theta^*\ge 0$ can be found either by a bisection search (see Appendix D) or by Algorithm 2 with complexity $O(K\log K)$.

###### Proof:

A similar problem, in which (19) is replaced with an equality constraint, is considered in [20]. The optimal solution to this quadratic program is characterized by its KKT conditions, and a corresponding algorithm can be developed to obtain its KKT point. A complete proof is presented in Appendix D. \qed

Note that step (3) in Algorithm 2 has complexity $O(K)$, and hence the overall complexity of Algorithm 2 is dominated by the sorting step (2), with complexity $O(K\log K)$.
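Below is a sketch of two ways to compute $\theta^*$ for problem (18)-(20). The sort-based routine is the standard procedure for projecting onto a scaled simplex, applied once the constraint (19) is known to bind; Algorithm 2 is not reproduced in this section, so this routine may differ from it in details. The bisection variant uses the fact that $\sum_k\max\{a_k-\theta,0\}$ is continuous and nonincreasing in $\theta$. We assume $b>0$ (link capacities are positive); the function names are hypothetical.

```python
def solve_qp_sort(a, b):
    """Minimize 0.5*sum((z_k - a_k)**2) s.t. sum(z) <= b, z >= 0, assuming b > 0.

    The optimum is z_k = max(a_k - theta, 0) for some theta >= 0 (KKT conditions).
    """
    z0 = [max(ak, 0.0) for ak in a]
    if sum(z0) <= b:
        return z0, 0.0  # capacity not binding: theta = 0
    s = sorted(a, reverse=True)  # O(K log K) step
    csum, theta = 0.0, 0.0
    for j, sj in enumerate(s, start=1):  # O(K) scan
        csum += sj
        t = (csum - b) / j
        if sj > t:  # s_j stays positive after the shift: keep the largest such j
            theta = t
    return [max(ak - theta, 0.0) for ak in a], theta


def solve_qp_bisect(a, b, iters=200):
    """Same problem, with theta found by bisection on sum(max(a_k - theta, 0)) = b."""
    def total(theta):
        return sum(max(ak - theta, 0.0) for ak in a)

    if total(0.0) <= b:
        return [max(ak, 0.0) for ak in a], 0.0
    lo, hi = 0.0, max(a)  # total(lo) > b >= total(hi) = 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > b:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return [max(ak - theta, 0.0) for ak in a], theta


print(solve_qp_sort([2.0, 1.0, -1.0], 1.0))  # ([1.0, 0.0, 0.0], 1.0)
print(solve_qp_sort([0.5, 0.2], 1.0))        # ([0.5, 0.2], 0.0)
```

In the first call the capacity binds and only the largest entry survives the shift by $\theta^*=1$; in the second, $\sum_k\max\{a_k,0\}\le b$, so $\theta^*=0$ and the solution is just the nonnegative part of $a$.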

## IV Performance Analysis of Algorithm 1

### IV-A Basic Facts from Convex Analysis

###### Definition 1 (Lipschitz Continuity)

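Let $\mathcal{Z}\subseteq\mathbb{R}^n$ be a convex set. A function $h:\mathcal{Z}\to\mathbb{R}^m$ is said to be Lipschitz continuous on $\mathcal{Z}$ with modulus $\beta$ if there exists $\beta>0$ such that $\|h(\mathbf{z}_1)-h(\mathbf{z}_2)\|\le\beta\|\mathbf{z}_1-\mathbf{z}_2\|$ for all $\mathbf{z}_1,\mathbf{z}_2\in\mathcal{Z}$.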

###### Definition 2 (Strongly Concave Functions)

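Let $\mathcal{Z}\subseteq\mathbb{R}^n$ be a convex set. A function $h:\mathcal{Z}\to\mathbb{R}$ is said to be strongly concave on $\mathcal{Z}$ with modulus $\alpha$ if there exists a constant $\alpha>0$ such that $h(\mathbf{z})+\frac{\alpha}{2}\|\mathbf{z}\|^2$ is concave on $\mathcal{Z}$.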

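By the definition of strongly concave functions, it is easy to show that if $h(\mathbf{z})$ is concave and $\alpha>0$, then $h(\mathbf{z})-\alpha\|\mathbf{z}-\mathbf{z}_0\|^2$ is strongly concave with modulus $2\alpha$ for any constant $\mathbf{z}_0$.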
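For completeness, the verification expands the square and checks Definition 2 with modulus $2\alpha$:

$$h(\mathbf{z})-\alpha\|\mathbf{z}-\mathbf{z}_0\|^2+\frac{2\alpha}{2}\|\mathbf{z}\|^2=h(\mathbf{z})+2\alpha\,\mathbf{z}_0^{\mathsf{T}}\mathbf{z}-\alpha\|\mathbf{z}_0\|^2,$$

which is a concave function plus an affine function of $\mathbf{z}$, and hence concave.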

###### Lemma 4

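Let $\mathcal{Z}\subseteq\mathbb{R}^n$ be a convex set. Let function $h$ be strongly concave on $\mathcal{Z}$ with modulus $\alpha$, and let $\mathbf{z}^{\mathrm{opt}}$ be a global maximum of $h$ on $\mathcal{Z}$. Then, $h(\mathbf{z}^{\mathrm{opt}})\ge h(\mathbf{z})+\frac{\alpha}{2}\|\mathbf{z}^{\mathrm{opt}}-\mathbf{z}\|^2$ for all $\mathbf{z}\in\mathcal{Z}$.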
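A short argument for the differentiable case (our sketch; for nondifferentiable $h$ one would use a suitable supergradient instead of the gradient) combines the strong concavity inequality with first-order optimality of $\mathbf{z}^{\mathrm{opt}}$ over the convex set $\mathcal{Z}$:

$$h(\mathbf{z})\le h(\mathbf{z}^{\mathrm{opt}})+\nabla h(\mathbf{z}^{\mathrm{opt}})^{\mathsf{T}}(\mathbf{z}-\mathbf{z}^{\mathrm{opt}})-\frac{\alpha}{2}\|\mathbf{z}-\mathbf{z}^{\mathrm{opt}}\|^2\le h(\mathbf{z}^{\mathrm{opt}})-\frac{\alpha}{2}\|\mathbf{z}-\mathbf{z}^{\mathrm{opt}}\|^2,$$

where the last step uses $\nabla h(\mathbf{z}^{\mathrm{opt}})^{\mathsf{T}}(\mathbf{z}-\mathbf{z}^{\mathrm{opt}})\le 0$ for all $\mathbf{z}\in\mathcal{Z}$.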

### IV-B Preliminaries

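Define the column vector $\mathbf{y}=[x_f;\mu_l^{(f)}]_{f\in\mathcal{F},l\in\mathcal{L}}$. For each $f\in\mathcal{F}$ and $n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}$, define the column vector

$$\mathbf{y}_n^{(f)}=\begin{cases}\big[x_f;\,[\mu_l^{(f)}]_{l\in\mathcal{I}(n)};\,[\mu_l^{(f)}]_{l\in\mathcal{O}(n)}\big] & \text{if } n=\mathrm{Src}(f)\\ \big[[\mu_l^{(f)}]_{l\in\mathcal{I}(n)};\,[\mu_l^{(f)}]_{l\in\mathcal{O}(n)}\big] & \text{otherwise}\end{cases} \qquad (21)$$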

which is composed of the control actions appearing in each constraint (2), and introduce a function $g_n^{(f)}(\cdot)$ of $\mathbf{y}_n^{(f)}$ as

$$g_n^{(f)}(\mathbf{y}_n^{(f)})=x_f\mathbf{1}_{\{n=\mathrm{Src}(f)\}}+\sum_{l\in\mathcal{I}(n)}\mu_l^{(f)}-\sum_{l\in\mathcal{O}(n)}\mu_l^{(f)} \qquad (22)$$

Thus, constraint (2) can be rewritten as

$$g_n^{(f)}(\mathbf{y}_n^{(f)})\le 0,\quad\forall f\in\mathcal{F},\ \forall n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}.$$

Note that each vector $\mathbf{y}_n^{(f)}$ is a subvector of $\mathbf{y}$. It has length $d_n+1$, where $d_n$ is the degree of node $n$ (the total number of outgoing and incoming links), if node $n$ is the source of session $f$, and length $d_n$ if node $n$ is not the source of session $f$.

###### Fact 2

Each function $g_n^{(f)}(\cdot)$ defined in (22) is Lipschitz continuous with respect to the vector $\mathbf{y}_n^{(f)}$ with modulus

$$\beta_n\le\sqrt{d_n+1},$$

where $d_n$ is the degree of node $n$.

###### Proof:

This fact follows by noting that each $g_n^{(f)}(\cdot)$ is a linear function of the vector $\mathbf{y}_n^{(f)}$ with at most $d_n+1$ non-zero coefficients, each equal to $\pm1$. \qed
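Spelled out, the bound is the Cauchy-Schwarz inequality applied to the coefficient vector $\mathbf{c}_n^{(f)}$ of the linear function $g_n^{(f)}$ (notation introduced here only for this step), whose entries are $\pm1$ and whose length is at most $d_n+1$:

$$\big|g_n^{(f)}(\mathbf{y})-g_n^{(f)}(\mathbf{y}')\big|=\big|(\mathbf{c}_n^{(f)})^{\mathsf{T}}(\mathbf{y}-\mathbf{y}')\big|\le\|\mathbf{c}_n^{(f)}\|\,\|\mathbf{y}-\mathbf{y}'\|\le\sqrt{d_n+1}\,\|\mathbf{y}-\mathbf{y}'\|.$$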

Note that virtual queue update equation (9) can be rewritten as:

$$Q_n^{(f)}[t+1]=Q_n^{(f)}[t]+g_n^{(f)}(\mathbf{y}_n^{(f)}[t]), \qquad (23)$$

and weight update equation (11) can be rewritten as:

$$W_n^{(f)}[t]=Q_n^{(f)}[t]+g_n^{(f)}(\mathbf{y}_n^{(f)}[t-1]). \qquad (24)$$

Define

$$L(t)=\frac{1}{2}\sum_{f\in\mathcal{F}}\sum_{n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}\big(Q_n^{(f)}[t]\big)^2 \qquad (25)$$

and call it a Lyapunov function. In the remainder of this paper, double summations are often written compactly as a single summation, e.g.,

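$$\sum_{f\in\mathcal{F}}\sum_{n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}(\cdot)\overset{\Delta}{=}\sum_{f\in\mathcal{F},\,n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}(\cdot).$$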

Define the Lyapunov drift as

$$\Delta[t]=L(t+1)-L(t).$$

The following lemma follows directly from equation (23).

###### Lemma 5

At each iteration in Algorithm 1, the Lyapunov drift is given by

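$$\Delta[t]=\sum_{f\in\mathcal{F},\,n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}\Big(Q_n^{(f)}[t]\,g_n^{(f)}(\mathbf{y}_n^{(f)}[t])+\frac{1}{2}\big(g_n^{(f)}(\mathbf{y}_n^{(f)}[t])\big)^2\Big). \qquad (26)$$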
###### Proof:

Fix $f\in\mathcal{F}$ and $n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}$. We have

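$$\begin{aligned}\frac{1}{2}\big(Q_n^{(f)}[t+1]\big)^2-\frac{1}{2}\big(Q_n^{(f)}[t]\big)^2&\overset{(a)}{=}\frac{1}{2}\big(Q_n^{(f)}[t]+g_n^{(f)}(\mathbf{y}_n^{(f)}[t])\big)^2-\frac{1}{2}\big(Q_n^{(f)}[t]\big)^2\\&=Q_n^{(f)}[t]\,g_n^{(f)}(\mathbf{y}_n^{(f)}[t])+\frac{1}{2}\big(g_n^{(f)}(\mathbf{y}_n^{(f)}[t])\big)^2 \qquad (27)\end{aligned}$$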

where (a) follows from (23).

By the definitions of $\Delta[t]$ and $L(t)$, we have

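$$\begin{aligned}\Delta[t]&=\frac{1}{2}\sum_{f\in\mathcal{F},\,n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}\Big(\big(Q_n^{(f)}[t+1]\big)^2-\big(Q_n^{(f)}[t]\big)^2\Big)\\&\overset{(a)}{=}\sum_{f\in\mathcal{F},\,n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}\Big(Q_n^{(f)}[t]\,g_n^{(f)}(\mathbf{y}_n^{(f)}[t])+\frac{1}{2}\big(g_n^{(f)}(\mathbf{y}_n^{(f)}[t])\big)^2\Big)\end{aligned}$$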

where (a) follows from (27). \qed
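Since (26) is an exact identity, it is easy to confirm numerically. The sketch below, with hypothetical helper names, compares $L(t+1)-L(t)$ computed directly from (23) and (25) against the right-hand side of (26) for random queue values and random constraint values.

```python
import random


def drift_direct(Q, g):
    """L(t+1) - L(t) with L = 0.5 * sum(Q^2) as in (25) and Q[t+1] = Q[t] + g as in (23)."""
    Q_next = [q + gi for q, gi in zip(Q, g)]
    return 0.5 * sum(q * q for q in Q_next) - 0.5 * sum(q * q for q in Q)


def drift_formula(Q, g):
    """Right-hand side of (26): sum of Q * g + 0.5 * g^2 over all (f, n) pairs."""
    return sum(q * gi + 0.5 * gi * gi for q, gi in zip(Q, g))


rng = random.Random(0)
Q = [rng.uniform(-5.0, 5.0) for _ in range(10)]  # one entry per (f, n) pair
g = [rng.uniform(-1.0, 1.0) for _ in range(10)]  # values g_n^(f)(y_n^(f)[t])
print(abs(drift_direct(Q, g) - drift_formula(Q, g)) < 1e-9)  # True
```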

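Define $f(\mathbf{y})=\sum_{f\in\mathcal{F}}U_f(x_f)$. At each time $t\in\{0,1,2,\ldots\}$, consider choosing a decision vector $\mathbf{y}[t]$, which includes the elements of each subvector $\mathbf{y}_n^{(f)}[t]$, to solve the following problem: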

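$$\begin{aligned}\max_{\mathbf{y}}\quad & f(\mathbf{y})-\sum_{f\in\mathcal{F},\,n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}\Big(W_n^{(f)}[t]\,g_n^{(f)}(\mathbf{y}_n^{(f)})+\alpha_n\big\|\mathbf{y}_n^{(f)}-\mathbf{y}_n^{(f)}[t-1]\big\|^2\Big)-\sum_{f\in\mathcal{F},\,n=\mathrm{Dst}(f)}\alpha_n\sum_{l\in\mathcal{I}(n)}\big(\mu_l^{(f)}-\mu_l^{(f)}[t-1]\big)^2 & & (28)\\ \text{s.t.}\quad & \text{(3)-(6)} & & (29)\end{aligned}$$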

The expression (28) is a modified drift-plus-penalty expression. Unlike the standard drift-plus-penalty expressions from [4], the above expression uses weights $W_n^{(f)}[t]$, which augment each $Q_n^{(f)}[t]$ by $g_n^{(f)}(\mathbf{y}_n^{(f)}[t-1])$, rather than the virtual queues $Q_n^{(f)}[t]$. It also includes a “prox”-like term that penalizes deviation from the previous vector. This results in the novel backpressure-type algorithm of Algorithm 1. Indeed, the decisions in Algorithm 1 were derived as the solution to the above problem (28)-(29). This is formalized in the next lemma.

###### Lemma 6

At each iteration $t\in\{0,1,2,\ldots\}$, the action $\mathbf{y}[t]$ jointly chosen in Algorithm 1 is the solution to problem (28)-(29).

###### Proof:

The proof involves collecting terms associated with the $x_f$ and $\mu_l^{(f)}$ decisions. See Appendix E for details. \qed
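To make Lemma 6 concrete, here is a minimal sketch that solves (28)-(29) directly on a hypothetical three-node instance: one session from node a to node c with $U(x)=\log(x)$, links a→b, b→c and a→c of capacity 1 (so the optimal rate is $x^*=2$), and $\alpha_n=1.5$ at every node, chosen so that $\alpha_n\ge\frac{1}{2}(d_n+1)$. Collecting terms in (28) (our own bookkeeping, which may be presented differently in Algorithm 1 and Appendix E) gives a source update that maximizes $\log(x)-W_a^{(f)}[t]x-\alpha_a(x-x[t-1])^2$, solved by the closed form of Lemma 2, and, for each link $l=(n,m)$, a maximization of $(W_n^{(f)}[t]-W_m^{(f)}[t])\mu_l^{(f)}-(\alpha_n+\alpha_m)(\mu_l^{(f)}-\mu_l^{(f)}[t-1])^2$ over $0\le\mu_l^{(f)}\le C_l$, with the weight at the destination taken as $0$. The $(\alpha_n+\alpha_m)$ coefficient assumes no link leaves the session's destination, which holds here, and with one session per link problem (18)-(20) reduces to clipping. Queues and weights then follow (23)-(24) with the assumed initializations $\mathbf{Q}[0]=\mathbf{0}$ and $\mathbf{y}[-1]=\mathbf{0}$.

```python
import math

# Hypothetical instance: one session from "a" to "c" with U(x) = log(x).
LINKS = [("a", "b"), ("b", "c"), ("a", "c")]
CAP = {l: 1.0 for l in LINKS}
SRC, DST = "a", "c"
ALPHA = {"a": 1.5, "b": 1.5, "c": 1.5}  # every node has degree 2, so (d_n + 1) / 2 = 1.5
NODES = ["a", "b"]                       # nodes n != Dst(f) that keep Q_n and W_n


def g(n, x, mu):
    """g_n^(f)(y_n^(f)) from (22) for the single session."""
    inflow = (x if n == SRC else 0.0) + sum(mu[l] for l in LINKS if l[1] == n)
    return inflow - sum(mu[l] for l in LINKS if l[0] == n)


def run(T):
    Q = {n: 0.0 for n in NODES}                 # Q[0] = 0
    x, mu = 0.0, {l: 0.0 for l in LINKS}        # y[-1] = 0
    W = {n: Q[n] + g(n, x, mu) for n in NODES}  # weights (24) at t = 0
    total_x, q_max = 0.0, 0.0
    for _ in range(T):
        # Source update: closed form of Lemma 2 (part 2) with w_f = 1.
        a = ALPHA[SRC]
        b = W[SRC] - 2.0 * a * x
        x_new = (-b + math.sqrt(b * b + 8.0 * a)) / (4.0 * a)
        # Link updates: unconstrained maximizer of the concave quadratic, clipped to [0, C_l].
        mu_new = {}
        for (n, m) in LINKS:
            weight = W.get(n, 0.0) - W.get(m, 0.0)  # backpressure; W at Dst(f) is 0
            beta = ALPHA[n] + ALPHA[m]
            target = mu[(n, m)] + weight / (2.0 * beta)
            mu_new[(n, m)] = min(max(target, 0.0), CAP[(n, m)])
        x, mu = x_new, mu_new
        for n in NODES:
            Q[n] += g(n, x, mu)                     # virtual queue update (23)
        W = {n: Q[n] + g(n, x, mu) for n in NODES}  # weights (24) for the next slot
        total_x += x
        q_max = max(q_max, max(abs(q) for q in Q.values()))
    return total_x / T, q_max


avg_x, q_max = run(5000)
print(avg_x, q_max)  # average rate close to x* = 2, with |Q| staying bounded
```

Note how the link update uses the difference of the augmented weights $W$ rather than of the queues $Q$, which is the backpressure quantity described in Section III-B.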

Furthermore, the next lemma shows that the action $\mathbf{y}[t]$ jointly chosen in Algorithm 1 yields a lower bound on the drift-plus-penalty expression at each iteration $t$.

###### Lemma 7

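Let $\mathbf{y}^*$ be an optimal solution to problem (1)-(6) given in Fact 1, i.e., $g_n^{(f)}(\mathbf{y}_n^{(f)*})=0$ for all $f\in\mathcal{F}$, $n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}$. If $\alpha_n\ge\frac{1}{2}(d_n+1)$ for all $n\in\mathcal{N}$, where $d_n$ is the degree of node $n$, then the action $\mathbf{y}[t]$ jointly chosen in Algorithm 1 at each iteration $t\in\{0,1,2,\ldots\}$ satisfies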

$$-\Delta[t]+f(\mathbf{y}[t])\ge f(\mathbf{y}^*)+\Phi[t]-\Phi[t-1]$$

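where
$$\Phi[t]=\sum_{f\in\mathcal{F},\,n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}\Big(\alpha_n\big\|\mathbf{y}_n^{(f)*}-\mathbf{y}_n^{(f)}[t]\big\|^2-\frac{1}{2}\big(g_n^{(f)}(\mathbf{y}_n^{(f)}[t])\big)^2\Big)+\sum_{f\in\mathcal{F},\,n=\mathrm{Dst}(f)}\alpha_n\sum_{l\in\mathcal{I}(n)}\big(\mu_l^{(f)*}-\mu_l^{(f)}[t]\big)^2.$$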

###### Proof:

See Appendix F. \qed

It remains to show that this modified backpressure algorithm leads to fundamentally improved performance.

### IV-C Utility Optimality Gap Analysis

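Define the column vector $\mathbf{Q}[t]=[Q_n^{(f)}[t]]_{f\in\mathcal{F},n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}$ as the stacked vector of all virtual queues defined in (9). Note that (25) can be rewritten as $L(t)=\frac{1}{2}\|\mathbf{Q}[t]\|^2$. Define the vectorized constraints (2) as $\mathbf{g}(\mathbf{y})=[g_n^{(f)}(\mathbf{y}_n^{(f)})]_{f\in\mathcal{F},n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}}\le\mathbf{0}$.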

###### Lemma 8

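Let $\mathbf{y}^*$ be an optimal solution to problem (1)-(6) given in Fact 1, i.e., $g_n^{(f)}(\mathbf{y}_n^{(f)*})=0$ for all $f\in\mathcal{F}$, $n\in\mathcal{N}\setminus\{\mathrm{Dst}(f)\}$. If $\alpha_n\ge\frac{1}{2}(d_n+1)$ for all $n\in\mathcal{N}$ in Algorithm 1, where $d_n$ is the degree of node $n$, then for all $t\in\{1,2,\ldots\}$,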

$$\sum_{\tau=0}^{t-1}f(\mathbf{y}[\tau])\ge t f(\mathbf{y}^*)-\zeta+\frac{1}{2}\|\mathbf{Q}[t]\|^2,$$

where $\zeta$ is a constant.

###### Proof:

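By Lemma 7, we have $-\Delta[\tau]+f(\mathbf{y}[\tau])\ge f(\mathbf{y}^*)+\Phi[\tau]-\Phi[\tau-1]$ for all $\tau\in\{0,1,2,\ldots\}$. Summing over $\tau\in\{0,1,\ldots,t-1\}$ yields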

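$$\begin{aligned}\sum_{\tau=0}^{t-1}f(\mathbf{y}[\tau])-\sum_{\tau=0}^{t-1}\Delta[\tau]&\ge t f(\mathbf{y}^*)+\sum_{\tau=0}^{t-1}\big(\Phi[\tau]-\Phi[\tau-1]\big)\\&=t f(\mathbf{y}^*)+\big(\Phi[t-1]-\Phi[-1]\big)\\&\overset{(a)}{\ge}t f(\mathbf{y}^*)-\Phi[-1]\end{aligned}$$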

where (a) follows from the fact that $\Phi[\tau]\ge 0$ for all $\tau$.

Recalling that $\Delta[\tau]=L(\tau+1)-L(\tau)$ and $L(t)=\frac{1}{2}\|\mathbf{Q}[t]\|^2$, simplifying the summations and rearranging terms yields

 t−1∑τ=0f(y[τ])≥ tf(y∗)−