# Efficient Algorithms for the Data Exchange Problem

Nebojsa Milosavljevic, Sameer Pawar, Salim El Rouayheb, Michael Gastpar and Kannan Ramchandran

The material in this paper appears in part in [1, 2, 3]. N. Milosavljevic, S. Pawar, M. Gastpar and K. Ramchandran are with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94720 USA (e-mail: {nebojsa, spawar, gastpar, kannanr}@eecs.berkeley.edu). S. El Rouayheb is with the Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616 USA (e-mail: salim@iit.edu). M. Gastpar is also with the School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland (e-mail: michael.gastpar@epfl.ch). This research was funded by NSF grants (CCF-0964018, CCF-0830788), a DTRA grant (HDTRA1-09-1-0032), and in part by AFOSR grants (FA9550-09-1-0120, FA9550-10-1-0567).

###### Abstract

In this paper we study the data exchange problem where a set of users is interested in gaining access to a common file, but where each has only partial knowledge about it as side-information. Assuming that the file is broken into packets, the side-information considered is in the form of linear combinations of the file packets. Given that the collective information of all the users is sufficient to allow recovery of the entire file, the goal is for each user to gain access to the file while minimizing some communication cost. We assume that users can communicate over a noiseless broadcast channel, and that the communication cost is a sum of each user’s cost function over the number of bits it transmits. For instance, the communication cost could simply be the total number of bits that needs to be transmitted. In the most general case studied in this paper, each user can have any arbitrary convex cost function. We provide deterministic, polynomial-time algorithms (in the number of users and packets) which find an optimal communication scheme that minimizes the communication cost. To further lower the complexity, we also propose a simple randomized algorithm inspired by our deterministic algorithm which is based on a random linear network coding scheme.

## I Introduction

In recent years cellular systems have witnessed significant improvements in terms of data rates, and are nearly approaching the theoretical limits in terms of the physical layer spectral efficiency. At the same time, the rapid growth in the popularity of data-enabled mobile devices, such as smart phones and tablets, and the resulting explosion in demand for more throughput are challenging our abilities to deliver data, even with the current highly efficient cellular systems. One of the major bottlenecks in scaling the throughput with the increasing number of mobile devices is the “last mile” wireless link between the base station and the mobile devices – a resource that is shared among many users served within the cell. This motivates the study of paradigms where cell phone devices can cooperate among themselves to get the desired data in a peer-to-peer fashion without solely relying on the base station.

An example of such a setting is shown in Figure 1, where a base station wants to deliver the same file to multiple geographically-close users over an unreliable wireless downlink. In the example of Figure 1, we assume that the file consists of six equally sized packets $w_1$, $w_2$, $w_3$, $w_4$, $w_5$ and $w_6$ belonging to some finite field $\mathbb{F}_q$. Suppose that after a few initial transmission attempts by the base station, the three users individually receive only parts of the file (see Figure 1), but collectively have the entire file. Now, if all users are in close vicinity and can communicate with each other, then it is much more desirable and efficient, in terms of resource usage, to reconcile the file among users by letting all of them “talk” to each other without involving the base station. The cooperation among the users has the following advantages:

• Local communication among users has a smaller footprint in terms of interference, thus allowing one to use the shared resources (code, time or frequency) freely without penalizing the base station’s resources, i.e., higher resource reuse factor.

• Transmissions within the close group of users are much more reliable than those from the base station to any terminal, due to the geographical proximity of the terminals.

• This cooperation allows file recovery even when the connection to the base station is either unavailable after the initial phase of transmission, or it is too weak to meet the delay requirement.

Let us consider the example in Figure 1, and let user 1, user 2 and user 3 transmit $R_1$, $R_2$ and $R_3$ symbols in $\mathbb{F}_q$, respectively. It can be shown that the minimum total number of symbols in $\mathbb{F}_q$ needed to recover the file is $5$. One possible communication scheme that achieves it has user 1 and user 2 transmit one symbol each, while user 3 transmits three symbols. Note that the communication load is unevenly distributed among the users, i.e., user 3 transmits $3$ out of $5$ symbols in $\mathbb{F}_q$. The next question we ask is: out of all communication schemes that deliver the entire file to the users in the minimum number of transmissions, which one distributes the communication load among the users as fairly as possible? For instance, for the same minimum number of transmissions, we can have the following scheme: user 1 transmits one symbol, while user 2 and user 3 transmit two symbols each. Intuitively, this scheme is fairer than the previous one since it spreads the transmissions more uniformly among the users, and it can be shown that such a scheme minimizes a convex fairness cost. (To be precise, the fairness cost that we consider belongs to the broader class of separable convex costs that is studied in this work.)

In the example from Figure 1, we considered only a simple form of side-information, where different users observe subsets of uncoded “raw” packets of the original file. Content distribution networks [4, 5, 6] are increasingly using codes, such as linear network codes or Fountain codes [7], to improve the system efficiency. In such scenarios, the side-information representing the partial knowledge gained by the users would be coded and in the form of linear combinations of the original file packets, rather than the raw packets themselves. We refer to this model of side-information as a linear packet model.

### Contributions

In this paper, we study the data exchange problem under the linear packet model and the separable convex communication cost. Such cost captures all the communication objectives discussed earlier: 1. Minimization of the (weighted) sum of bits users need to exchange, 2. Fairness. Our contributions can be summarized as follows:

1. We propose a deterministic polynomial time algorithm for finding an optimal communication scheme w.r.t. the communication cost. An important step of this algorithm is to iteratively determine how much should each user transmit in an optimal scheme. We provide two methods to solve this problem. The first one is based on minimizing a submodular function, in which case the total complexity of the algorithm is , where is the total number of users, and is the number of packets in the file. The second technique is based on subgradient methods, in which case the total complexity of the algorithm can be bounded by given that we use constant step size in the subgradient algorithm.

2. We devise a randomized algorithm inspired by the deterministic scheme that reduces complexity to . The randomized algorithm is based on a random linear network coding scheme, and it achieves the optimal number of transmissions with high probability. To be more precise, the probability of not achieving the optimum is inversely proportional to the underlying field size . Our randomized algorithm can be regarded as a generalization of the algorithm proposed in [8], where the authors considered linear communication cost.

3. For the data exchange problem with additional capacity constraints on each user, we provide both deterministic and randomized algorithms of the same complexity as in 1. and 2.

The challenging part of the deterministic algorithm is that the underlying optimization problem has an exponential number of constraints coming from the cut-set bound region. By using combinatorial optimization techniques such as Dilworth truncation and Edmonds’ algorithm, we devise an efficient, polynomial time solution.

### Literature Overview

The problem of reconciling a file among multiple wireless users having parts of it while minimizing the cost in terms of the total number of bits exchanged is known in the literature as the data exchange problem and was introduced by El Rouayheb et al. in [9]. A closely related problem was also studied by Csiszár and Narayan in [10] where all users want to agree on a secret key in the presence of an eavesdropper who observes the entire communication. A randomized algorithm for the data exchange problem was proposed in [11], while Tajbakhsh et al. [12] formulated this problem as a linear program (LP). The solution proposed in [12] is approximate.

The linear cost data exchange problem was studied by Ozgul et al. [8], where the authors proposed a randomized algorithm. A deterministic polynomial time algorithm was proposed by Courtade and Wesel in [13] concurrently to the authors’ work [2]. Minimum linear communication cost problem was also studied in the network coding literature. Lun et al. [14] proposed a polynomial time algorithm for the single source multicast problem over a directed acyclic graph.

In [15, 16], the authors considered a different version of the data exchange problem where users can only broadcast messages to their immediate neighbors. In [15] it was shown that the problem is NP-hard, while an approximate solution is provided in [16]. In [17], Lucani et al. considered the problem of data exchange when the channel between different users can have erasures.

In [10], the authors posed a related security problem referred to as the “multi-terminal key agreement” problem. They showed that obtaining the file among the users in minimum number of bits exchanged over the public channel is sufficient to maximize the size of the secret key shared between the users. This result establishes a connection between the Multi-party key agreement and the data exchange problem.

The rest of the paper is organized as follows. In Section II, we describe the model and formulate the optimization problem. In Section III, we provide a polynomial time algorithm that determines how many symbols in $\mathbb{F}_q$ each user should transmit. We start Section III by analyzing a linear cost function, and then we extend our solution to any separable convex cost. In Section IV, we propose a polynomial time code construction. In Section V, we describe an algorithm based on a random linear network coding approach that achieves the optimal communication cost. In Section VI, we present a polynomial time solution to the problem where each user additionally has a capacity constraint, i.e., each user is not allowed to transmit more than a given number of symbols in $\mathbb{F}_q$. We conclude our work in Section VII.

## II System Model and Preliminaries

In this paper, we consider a setup with $m$ users that are interested in gaining access to a file. The file is broken into $N$ linearly independent packets $w_1, \ldots, w_N$, each belonging to a field $\mathbb{F}_q$, where $q$ is a power of some prime number. Each user $i \in \mathcal{M} \triangleq \{1, 2, \ldots, m\}$ observes some collection of linear combinations of the file packets, as shown below.

$$x_i = A_i w, \quad i \in \mathcal{M}, \tag{1}$$

where $A_i \in \mathbb{F}_q^{\ell_i \times N}$ is a given matrix, and $w = [w_1\ w_2\ \cdots\ w_N]^T$ is a vector of the file packets. In what follows, we refer to (1) as the linear packet model.

Let us denote by $v_i$ the transmission of user $i \in \mathcal{M}$. In [10] it was shown that in order for each user to recover the file, interaction among them is not needed. Hence, without loss of generality, we can assume that $v_i$ is a function of user $i$’s initial observation. We define

$$R_i \triangleq |v_i|_q \tag{2}$$

to be the size of user $i$’s transmission, measured in number of symbols in $\mathbb{F}_q$. To decode the file, user $i$ collects the transmissions of all the users and creates a decoding function

$$\psi_i : \mathbb{F}_q^{\ell_i} \times \mathbb{F}_q^{R_1} \times \cdots \times \mathbb{F}_q^{R_m} \to \mathbb{F}_q^{N}, \tag{3}$$

that reconstructs the file, i.e.,

$$\psi_i(x_i, v_1, \ldots, v_m) = w. \tag{4}$$

###### Definition 1.

A rate vector $R = (R_1, R_2, \ldots, R_m)$ is an achievable data exchange (DE) rate vector if there exists a communication scheme with transmitted messages $v = (v_1, v_2, \ldots, v_m)$ that satisfies (4) for all $i \in \mathcal{M}$.

###### Remark 1.

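Using cut-set bounds, it follows that all the achievable DE-rate vectors necessarily belong to the following region

$$\mathcal{R} \triangleq \left\{ R \in \mathbb{R}^m : R(S) \ge N - \operatorname{rank}(A_{\mathcal{M} \setminus S}),\ \forall S \subset \mathcal{M} \right\}, \tag{5}$$

where

$$R(S) \triangleq \sum_{i \in S} R_i, \quad \text{and} \quad A_{\mathcal{M} \setminus S} \triangleq \bigcup_{i \in \mathcal{M} \setminus S} A_i.$$
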
###### Theorem 1.

For a sufficiently large field size $q$, any integer DE-rate vector that belongs to the cut-set region $\mathcal{R}$ can be achieved via linear network coding, i.e., it is sufficient for each user to transmit properly chosen linear combinations of the data packets it observes.

Proof of Theorem 1 is provided in Appendix A. In Section IV we show that any field size larger than the number of users is sufficient to guarantee the existence of such a solution. In general, finding the minimum field size can be a hard problem.
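As a rough illustration of the cut-set region (5), the following Python sketch checks whether a candidate rate vector satisfies every cut-set constraint. It assumes the raw-packet side information of the three-user example from Figure 1 (see (16)), so that the rank of a group of users reduces to a count of distinct packets. The names `OBS`, `rank_raw` and `in_cut_set_region` are hypothetical helpers introduced here, and the exhaustive enumeration over subsets is only practical for a small number of users.

```python
from itertools import combinations

# Raw-packet side information (assumed from (16)): user -> indices of known packets.
OBS = {1: {1, 2}, 2: {2, 4, 5, 6}, 3: {3, 4, 5, 6}}
N = 6  # number of packets in the file


def rank_raw(users):
    """Rank of the stacked observations; for raw packets, a distinct-packet count."""
    return len(set().union(*(OBS[u] for u in users))) if users else 0


def in_cut_set_region(rates):
    """Check R(S) >= N - rank(A_{M minus S}) for every nonempty proper subset S."""
    users = sorted(OBS)
    for size in range(1, len(users)):
        for subset in combinations(users, size):
            rest = [u for u in users if u not in subset]
            if sum(rates[u] for u in subset) < N - rank_raw(rest):
                return False
    return True


print(in_cut_set_region({1: 1, 2: 1, 3: 3}))  # candidate with sum-rate 5
print(in_cut_set_region({1: 2, 2: 1, 3: 1}))  # users 2 and 3 send too little
```

The second candidate fails because users 2 and 3 together must supply the four packets that user 1 is missing.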

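In order for each user to recover the entire file, it is necessary to receive a sufficient number of linear combinations of the other users’ observations. Hence, $v_i$, $i \in \mathcal{M}$, defined above is a vector of $R_i$ symbols in $\mathbb{F}_q$. Therefore, $v_i$ can be written as follows

$$v_i = U_i x_i = U_i A_i w, \tag{6}$$

where $U_i$ is an $R_i \times \ell_i$ transmission matrix with elements belonging to $\mathbb{F}_q$. In order for each user to recover the file, the transmission matrices $U_i$, $i \in \mathcal{M}$, should satisfy

$$\operatorname{rank}\begin{bmatrix} A_i \\ U \end{bmatrix} = N, \quad \forall i \in \mathcal{M}, \tag{7}$$

where $U \triangleq \bigcup_{i \in \mathcal{M}} U_i A_i$. Hence, the decoding function of user $i$ involves inverting the matrix given in (7) in order to obtain $w$.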
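The decodability condition (7) can be checked numerically. The sketch below is an illustration under stated assumptions: it works over a prime field GF($p$) with $p = 5$ (extension fields would need different arithmetic), uses the raw-packet observations of (16), and the particular transmissions (user 1 sends $w_1$, user 2 sends $w_2 + w_4$, user 3 sends $w_3$, $w_5$, $w_6$) are a hypothetical scheme chosen for this example rather than one taken from the paper. The helper names `rank_mod_p`, `unit_rows` and `all_users_decode` are likewise introduced here.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over the prime field GF(p), via Gaussian elimination."""
    mat = [[x % p for x in row] for row in rows]
    rank = 0
    n_cols = len(mat[0]) if mat else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col]), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        inv = pow(mat[rank][col], p - 2, p)  # Fermat inverse, valid because p is prime
        mat[rank] = [(x * inv) % p for x in mat[rank]]
        for r in range(len(mat)):
            if r != rank and mat[r][col]:
                factor = mat[r][col]
                mat[r] = [(a - factor * b) % p for a, b in zip(mat[r], mat[rank])]
        rank += 1
    return rank


def unit_rows(indices, n):
    """Observation matrix A_i for raw packets: one unit row per known packet."""
    return [[1 if c == i else 0 for c in range(1, n + 1)] for i in indices]


N, P = 6, 5
A = {1: unit_rows([1, 2], N), 2: unit_rows([2, 4, 5, 6], N), 3: unit_rows([3, 4, 5, 6], N)}

# Stacked transmissions U, expressed in terms of w (hypothetical scheme, rates (1, 1, 3)).
U = [
    [1, 0, 0, 0, 0, 0],  # user 1: w1
    [0, 1, 0, 1, 0, 0],  # user 2: w2 + w4
    [0, 0, 1, 0, 0, 0],  # user 3: w3
    [0, 0, 0, 0, 1, 0],  # user 3: w5
    [0, 0, 0, 0, 0, 1],  # user 3: w6
]


def all_users_decode(obs, transmissions, n, p):
    """Condition (7): the stack [A_i; U] must have rank N for every user i."""
    return all(rank_mod_p(a_i + transmissions, p) == n for a_i in obs.values())


print(all_users_decode(A, U, N, P))
```

Dropping any one row of `U` breaks the condition for at least one user, which is consistent with five being the minimum number of transmissions in this example.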

In this work, we design a polynomial complexity scheme that achieves the file exchange among all the users while simultaneously minimizing a convex separable cost function $\sum_{i=1}^{m} \varphi_i(R_i)$, where $\varphi_i$, $i \in \mathcal{M}$, is a non-decreasing convex function. This monotonicity assumption is consistent with the nature of the problem at hand; sending more bits is always at least as expensive as sending fewer. From (5) and the above mentioned cost function, the problem considered in this work can be formulated as the following optimization problem:

$$\min_{R \in \mathbb{Z}^m} \sum_{i=1}^{m} \varphi_i(R_i), \quad \text{s.t.}\ \ R(S) \ge N - \operatorname{rank}(A_{\mathcal{M} \setminus S}),\ \forall S \subset \mathcal{M}. \tag{8}$$

Optimization problem (8) is a convex integer problem whose number of constraints is exponential in $m$. It was shown in [18] that only $m$ of these constraints are active, but the challenge is to determine which ones. Solving the optimization problem (8) answers the question of how many symbols in $\mathbb{F}_q$ each user has to transmit in an optimal scheme. In this paper we provide a polynomial time algorithm that solves problem (8). Once we obtain an optimal rate allocation, the actual transmissions of each user can be found in polynomial time by using the algebraic network coding framework [19], [20]. This is explained in Section IV.

## III Deterministic Algorithm

Our goal is to solve problem (8) efficiently. To do so, we will split it into two subproblems:

1. Given a total budget constraint $\beta$, i.e., $R(\mathcal{M}) = \beta$, determine whether $\beta$ is feasible or not. If $\beta$ is feasible, find the feasible rate split among the users that will achieve the total budget $\beta$ and minimize the cost $\sum_{i=1}^{m} \varphi_i(R_i)$.

2. Find $\beta$ that minimizes the objective function.

The bottleneck here is how to solve Problem 1 efficiently. The optimal value of $\beta$ can then be found using binary search (see Algorithm 3) since the objective function is convex w.r.t. $\beta$. First, let us identify these two problems by rewriting problem (8) as follows

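$$\min_{\beta \in \mathbb{Z}_+} h(\beta), \tag{9}$$

where

$$h(\beta) \triangleq \min_{R \in \mathbb{Z}^m} \sum_{i=1}^{m} \varphi_i(R_i), \quad \text{s.t.}\ \ R(\mathcal{M}) = \beta,\ \ R(S) \ge N - \operatorname{rank}(A_{\mathcal{M} \setminus S}),\ \forall S \subset \mathcal{M}. \tag{10}$$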

Note that the optimizations (9) and (10) are associated with Problem 2 and Problem 1 defined above, respectively. Next we will explain our approach to solving these two problems.

### III-A Optimization with a given sum-rate budget β

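Now, let us focus on the set of constraints of optimization problem (10). By substituting $S$ with $\mathcal{M} \setminus S$, we obtain

$$\begin{aligned} R(\mathcal{M}) &= \beta, \\ R(\mathcal{M} \setminus S) &= R(\mathcal{M}) - R(S) = \beta - R(S) \ge N - \operatorname{rank}(A_S), \quad \forall S \subset \mathcal{M},\ S \neq \emptyset. \end{aligned} \tag{11}$$

Therefore, optimization problem (10) can be equivalently represented as follows

$$h(\beta) = \min_{R \in \mathbb{Z}^m} \sum_{i=1}^{m} \varphi_i(R_i), \quad \text{s.t.}\ \ R(\mathcal{M}) = \beta,\ \ R(S) \le \beta - N + \operatorname{rank}(A_S),\ \forall S \subset \mathcal{M},\ S \neq \emptyset. \tag{12}$$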

Before we go any further, let us introduce some concepts from combinatorial optimization theory.

###### Definition 2 (Polyhedron).

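Let $f_\beta$ be a set function defined over the set $\mathcal{M} = \{1, 2, \ldots, m\}$, i.e., $f_\beta : 2^{\mathcal{M}} \to \mathbb{Z}$, where $2^{\mathcal{M}}$ is the power set of $\mathcal{M}$. Then the polyhedron $P(f_\beta)$ and the base polyhedron $B(f_\beta)$ of $f_\beta$ are defined as follows.

$$P(f_\beta) \triangleq \{ R \in \mathbb{Z}^m \mid R(S) \le f_\beta(S),\ \forall S \subseteq \mathcal{M} \}, \tag{13}$$

$$B(f_\beta) \triangleq \{ R \in P(f_\beta) \mid R(\mathcal{M}) = f_\beta(\mathcal{M}) \}. \tag{14}$$

Note that the set of constraints of problem (12), for any fixed $\beta$, constitutes the base polyhedron $B(f_\beta)$ of the set function

$$f_\beta(S) = \begin{cases} \beta - N + \operatorname{rank}(A_S) & \text{if } S \subset \mathcal{M},\ S \neq \emptyset, \\ \beta & \text{if } S = \mathcal{M}, \\ 0 & \text{if } S = \emptyset. \end{cases} \tag{15}$$
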
###### Example 1.

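Let us consider the source model from Figure 1, where the three users observe the following parts of the file $w = [w_1\ w_2\ w_3\ w_4\ w_5\ w_6]^T$:

$$x_1 = [w_1\ w_2]^T, \quad x_2 = [w_2\ w_4\ w_5\ w_6]^T, \quad x_3 = [w_3\ w_4\ w_5\ w_6]^T. \tag{16}$$

For $\beta = 4$, the base polyhedron $B(f_4)$ is defined by the following set of inequalities, where by (14) the last one must hold with equality:

$$\begin{aligned} & R_1 \le f_4(\{1\}) = 0, \quad R_2 \le f_4(\{2\}) = 2, \quad R_3 \le f_4(\{3\}) = 2, \\ & R_1 + R_2 \le f_4(\{1,2\}) = 3, \quad R_1 + R_3 \le f_4(\{1,3\}) = 4, \quad R_2 + R_3 \le f_4(\{2,3\}) = 3, \\ & R_1 + R_2 + R_3 \le f_4(\{1,2,3\}) = 4. \end{aligned} \tag{17}$$

It can be verified that no rate vector satisfies (17) with $R_1 + R_2 + R_3 = 4$, since $R_1 \le 0$ and $R_2 + R_3 \le 3$. Therefore, $B(f_4) = \emptyset$. On the other hand, for $\beta = 5$, the polyhedron $P(f_5)$ is defined as follows

$$\begin{aligned} & R_1 \le f_5(\{1\}) = 1, \quad R_2 \le f_5(\{2\}) = 3, \quad R_3 \le f_5(\{3\}) = 3, \\ & R_1 + R_2 \le f_5(\{1,2\}) = 4, \quad R_1 + R_3 \le f_5(\{1,3\}) = 5, \quad R_2 + R_3 \le f_5(\{2,3\}) = 4, \\ & R_1 + R_2 + R_3 \le f_5(\{1,2,3\}) = 5. \end{aligned} \tag{18}$$

It can be easily verified that the rate vector $R_1 = 1$, $R_2 = 1$, $R_3 = 3$ satisfies (18) with $R_1 + R_2 + R_3 = 5$, and hence belongs to the base polyhedron $B(f_5)$. Therefore, $B(f_5) \neq \emptyset$.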
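The claims of this example can be checked by exhaustive enumeration. The following Python sketch is only meant as a sanity check for tiny instances: it assumes the raw-packet observations of (16), evaluates the set function (15) by counting distinct packets, and lists every integer point of the base polyhedron. The names `OBS`, `f_beta` and `base_polyhedron` are hypothetical helpers introduced here.

```python
from itertools import combinations, product

OBS = {1: {1, 2}, 2: {2, 4, 5, 6}, 3: {3, 4, 5, 6}}  # user -> known packet indices
N = 6
USERS = sorted(OBS)


def f_beta(subset, beta):
    """Set function (15); rank is a distinct-packet count in the raw-packet model."""
    subset = set(subset)
    if not subset:
        return 0
    if subset == set(USERS):
        return beta
    return beta - N + len(set().union(*(OBS[u] for u in subset)))


def base_polyhedron(beta):
    """All integer R with R(M) = beta and R(S) <= f_beta(S) for every nonempty S.

    Searching 0..beta per user is enough here: each R_i is at least
    N - rank(A_{M minus i}) >= 0, and the rates sum to beta.
    """
    points = []
    for rates in product(range(beta + 1), repeat=len(USERS)):
        if sum(rates) != beta:
            continue
        r = dict(zip(USERS, rates))
        if all(sum(r[u] for u in s) <= f_beta(s, beta)
               for k in range(1, len(USERS) + 1)
               for s in combinations(USERS, k)):
            points.append(rates)
    return points


print(base_polyhedron(4))  # []
print(base_polyhedron(5))  # [(1, 1, 3), (1, 2, 2), (1, 3, 1)]
```

For $\beta = 5$ the enumeration returns three integer points, which shows that the sum-rate alone does not pin down the rate split; the cost function decides among them.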

Summarizing the discussion so far, the optimization problem (12) is equivalent to

$$\min_{R \in \mathbb{Z}^m} \sum_{i=1}^{m} \varphi_i(R_i), \quad \text{s.t.}\ \ R \in B(f_\beta), \tag{19}$$

where $f_\beta$ is defined in (15). For now, let us assume that parameter $\beta$ is chosen such that the optimization problem (19) is feasible, i.e., $B(f_\beta) \neq \emptyset$. We will explain later how the condition $B(f_\beta) \neq \emptyset$ can be efficiently verified.

The main idea behind solving the optimization problem in (19) efficiently is to utilize the combinatorial properties of the set function $f_\beta$.

###### Definition 3.

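We say that a set function $f : 2^{\mathcal{M}} \to \mathbb{Z}$ is intersecting submodular if

$$f(S) + f(T) \ge f(S \cup T) + f(S \cap T), \quad \forall S, T \subseteq \mathcal{M}\ \ \text{s.t.}\ \ S \cap T \neq \emptyset. \tag{20}$$

When the inequality conditions in (20) are satisfied for all sets $S, T \subseteq \mathcal{M}$, the function is fully submodular.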

###### Lemma 1.

The function $f_\beta$ is intersecting submodular for any $\beta$. When $\beta \ge N$, $f_\beta$ is fully submodular.

Proof of Lemma 1 is provided in Appendix C.
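To make the distinction between intersecting and full submodularity concrete, the sketch below tests inequality (20) exhaustively for the three-user source model of (16). It is a brute-force check for a toy instance, not a proof; the helper names `f_beta`, `all_subsets` and `is_submodular` are introduced here for illustration. Because the users jointly know the whole file, the expression $\beta - N + \operatorname{rank}(A_S)$ already equals $\beta$ at $S = \mathcal{M}$, so a single formula covers every nonempty set.

```python
from itertools import combinations

OBS = {1: {1, 2}, 2: {2, 4, 5, 6}, 3: {3, 4, 5, 6}}  # user -> known packet indices
N = 6


def f_beta(subset, beta):
    """Set function (15) for raw packets; the nonempty case also covers S = M."""
    subset = frozenset(subset)
    if not subset:
        return 0
    return beta - N + len(set().union(*(OBS[u] for u in subset)))


def all_subsets(users):
    return [frozenset(c) for k in range(len(users) + 1) for c in combinations(users, k)]


def is_submodular(beta, intersecting_only):
    """Test f(S) + f(T) >= f(S | T) + f(S & T), optionally only when S & T is nonempty."""
    sets = all_subsets(sorted(OBS))
    for s in sets:
        for t in sets:
            if intersecting_only and not (s & t):
                continue
            if f_beta(s, beta) + f_beta(t, beta) < f_beta(s | t, beta) + f_beta(s & t, beta):
                return False
    return True


print(is_submodular(4, intersecting_only=True))   # True
print(is_submodular(4, intersecting_only=False))  # False: fails on disjoint sets
print(is_submodular(6, intersecting_only=False))  # True
```

For $\beta = 4$ the violation comes from disjoint sets such as $\{1\}$ and $\{2\}$, where $f_4(\{1\}) + f_4(\{2\}) = 2 < 3 = f_4(\{1,2\})$.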

###### Theorem 2 (Dilworth Truncation [21]).

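For every intersecting submodular function $f_\beta$ there exists a fully submodular function $g_\beta$ such that both functions have the same polyhedron, i.e., $P(g_\beta) = P(f_\beta)$, and $g_\beta$ can be expressed as

$$g_\beta(S) = \min_{\mathcal{P}} \left\{ \sum_{V \in \mathcal{P}} f_\beta(V) : \mathcal{P} \text{ is a partition of } S \right\}. \tag{21}$$

The function $g_\beta$ is called the Dilworth truncation of $f_\beta$.

The base polyhedron of any fully submodular function is always non-empty, i.e., there exists a rate vector $R$ such that $R \in B(g_\beta)$. Since $P(g_\beta) = P(f_\beta)$, it follows that $B(g_\beta) = B(f_\beta)$ whenever $g_\beta(\mathcal{M}) = f_\beta(\mathcal{M})$, i.e., when $g_\beta(\mathcal{M}) = \beta$, which implies feasibility of the optimization problem (19).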

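Continuing with Example 1, the Dilworth truncation of the set function $f_4$ is given by

$$\begin{aligned} & g_4(\{1\}) = 0, \quad g_4(\{2\}) = 2, \quad g_4(\{3\}) = 2, \\ & g_4(\{1,2\}) = 2, \quad g_4(\{1,3\}) = 2, \quad g_4(\{2,3\}) = 3, \\ & g_4(\{1,2,3\}) = 3. \end{aligned} \tag{22}$$

Note that $g_4(\{1,2,3\}) = 3 < 4$, and hence, $\beta = 4$ is not a feasible sum-rate for the problem (19). On the other hand, for $\beta = 5$, the Dilworth truncation of the set function $f_5$ is given by

$$\begin{aligned} & g_5(\{1\}) = 1, \quad g_5(\{2\}) = 3, \quad g_5(\{3\}) = 3, \\ & g_5(\{1,2\}) = 4, \quad g_5(\{1,3\}) = 4, \quad g_5(\{2,3\}) = 4, \\ & g_5(\{1,2,3\}) = 5. \end{aligned} \tag{23}$$

Now, $g_5(\{1,2,3\}) = 5$, which indicates that $\beta = 5$ is a feasible sum-rate for the problem (19). Hence, the optimization problem (19) can be written as

$$\min_{R \in \mathbb{Z}^m} \sum_{i=1}^{m} \varphi_i(R_i), \quad \text{s.t.}\ \ R \in B(g_\beta), \tag{24}$$

provided that $g_\beta(\mathcal{M}) = \beta$.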

###### Remark 2.

Parameter $\beta$ is feasible w.r.t. the problem (19) if $g_\beta(\mathcal{M}) = \beta$. Otherwise, $g_\beta(\mathcal{M}) < \beta$. This is a direct consequence of the Dilworth truncation (21).
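For small instances, the Dilworth truncation (21) can be evaluated directly by enumerating partitions. The sketch below does exactly that for the source model of (16); it is exponential in the number of users and is intended only to reproduce the values in (22) and (23). The helpers `f_beta`, `partitions` and `dilworth` are hypothetical names introduced for this illustration.

```python
OBS = {1: {1, 2}, 2: {2, 4, 5, 6}, 3: {3, 4, 5, 6}}  # user -> known packet indices
N = 6


def f_beta(subset, beta):
    """Set function (15) for raw packets (rank = number of distinct packets)."""
    subset = set(subset)
    if not subset:
        return 0
    return beta - N + len(set().union(*(OBS[u] for u in subset)))


def partitions(items):
    """Yield every partition of a list into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Place `first` into each existing block in turn ...
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        # ... or into a new block of its own.
        yield [[first]] + part


def dilworth(subset, beta):
    """Brute-force evaluation of (21): minimum of sum f_beta(V) over partitions of S."""
    subset = sorted(subset)
    if not subset:
        return 0
    return min(sum(f_beta(block, beta) for block in part) for part in partitions(subset))


for beta in (4, 5):
    g_total = dilworth([1, 2, 3], beta)
    print(beta, g_total, "feasible" if g_total == beta else "infeasible")
```

For $\beta = 4$ the minimizing partition is $\{\{1\}, \{2,3\}\}$ with value $0 + 3 = 3$, which matches $g_4(\{1,2,3\}) = 3$ in (22).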

Depending upon the cost function $\varphi_i$, in the sequel, we provide several algorithms that can efficiently solve problem (9). First, we analyze a special case when the cost function is linear,

$$\varphi_i(R_i) = \alpha_i R_i, \quad \alpha_i > 0, \quad \forall i \in \mathcal{M}. \tag{25}$$

The condition $\alpha_i > 0$, $\forall i \in \mathcal{M}$, ensures that $\varphi_i$ is a non-decreasing function.

### III-B Linear Cost - Edmonds’ Algorithm

When the cost function is linear, the optimization problem (24) has the following form

$$\min_{R \in \mathbb{Z}^m} \sum_{i=1}^{m} \alpha_i R_i, \quad \text{s.t.}\ \ R \in B(g_\beta). \tag{26}$$

Due to the submodularity of function $g_\beta$, the optimization problem (26) can be solved analytically using Edmonds’ greedy algorithm [22] (see Algorithm 1).

The greediness of this algorithm is reflected in the fact that each update of the rate vector is sum-rate optimal:

$$\begin{aligned} R^*_{j(1)} &= g_\beta(\{j(1)\}), \\ R^*_{j(1)} + R^*_{j(2)} &= g_\beta(\{j(1), j(2)\}), \\ &\ \ \vdots \\ \sum_{i=1}^{m} R^*_{j(i)} &= g_\beta(\{j(1), \ldots, j(m)\}). \end{aligned} \tag{27}$$

In other words, at each iteration, the individual user’s rate update reaches the boundary of polyhedron $P(g_\beta)$. Optimality of this approach is a direct consequence of the submodularity of function $g_\beta$ [22].

###### Remark 3.

The optimal rate vector $R^*$ belongs to the base polyhedron $B(g_\beta)$. In other words,

$$\sum_{i=1}^{m} R^*_i = g_\beta(\mathcal{M}). \tag{28}$$

###### Remark 4.

The complexity of Edmonds’ algorithm is , where is the complexity of computing function for any given set .

###### Example 2.

Let us consider the same source model as in Example 1 with $\beta = 5$, and let the cost function be linear with weights satisfying $\alpha_1 \le \alpha_3 \le \alpha_2$. The intersecting submodular function $f_5$ and its Dilworth truncation $g_5$ are given in (18) and (23), respectively. The rate vector is updated in increasing order w.r.t. the weight vector. In this case, the order is $1 \to 3 \to 2$ (see Figure 2).
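The greedy updates in (27) can be sketched in a few lines once $g_\beta$ is available as a table. The code below hard-codes the values of $g_5$ from (23); the weights $\alpha = (1, 3, 2)$ are hypothetical values chosen here only so that the update order is user 1, then user 3, then user 2, and the function name `edmonds_greedy` is likewise an assumption of this illustration.

```python
# Dilworth truncation g_5 of the example, copied from (23).
G5 = {
    frozenset({1}): 1, frozenset({2}): 3, frozenset({3}): 3,
    frozenset({1, 2}): 4, frozenset({1, 3}): 4, frozenset({2, 3}): 4,
    frozenset({1, 2, 3}): 5,
}


def edmonds_greedy(g, weights):
    """Greedy rule of (27): visit users by increasing weight, each taking
    the marginal value of g on the growing prefix of visited users."""
    order = sorted(weights, key=weights.get)
    rates, prefix, used = {}, frozenset(), 0
    for user in order:
        prefix = prefix | {user}
        rates[user] = g[prefix] - used
        used = g[prefix]
    return rates


alpha = {1: 1, 2: 3, 3: 2}  # hypothetical weights, giving the order 1, 3, 2
rates = edmonds_greedy(G5, alpha)
print(rates)                                 # {1: 1, 3: 3, 2: 1}
print(sum(alpha[u] * rates[u] for u in rates))  # weighted cost of this allocation
```

The cheapest user is served first and therefore receives the largest admissible rate, which is why the most expensive user (user 2 under these assumed weights) ends up with the smallest share.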

The main problem in executing Edmonds’ algorithm efficiently is that the function $g_\beta$ is not available analytically. To compute this function for any given set $S$ we need to solve the minimization problem (21). Such a minimization has to be performed over all partitions of the set $S$, which negates the efficiency of the proposed method.

To overcome this problem, note that we have access to the function $f_\beta$ (see (15)), and by Theorem 2, we know that $P(g_\beta) = P(f_\beta)$. As pointed out before, each rate update reaches the boundary of polyhedron $P(g_\beta)$ (see (27)). Since we do not explicitly have function $g_\beta$, this polyhedron boundary can be calculated by applying the Dilworth truncation formula (21). For the three-user problem in Example 2 this procedure would go as follows:

$$\begin{aligned} R^*_1 &= f_5(\{1\}) = 1, \\ R^*_3 &= \min\{ f_5(\{1,3\}) - R^*_1,\ f_5(\{3\}) \} = 3, \\ R^*_2 &= \min\{ f_5(\{1,2,3\}) - R^*_1 - R^*_3,\ f_5(\{1,2\}) - R^*_1,\ f_5(\{2,3\}) - R^*_3,\ f_5(\{2\}) \} = 1. \end{aligned}$$

Generalization of this procedure to an arbitrary number of users is shown in Algorithm 2. We refer the interested reader to references [21] and [23], where this algorithm is explained in more detail for arbitrary intersecting submodular functions.
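The three-user computation above can be written as a short routine that works with $f_\beta$ only. The sketch below replaces the polynomial-time submodular minimization of each step with an exhaustive search over subsets of the users already processed, so it only illustrates the update rule on a toy instance and is not the efficient algorithm itself. It assumes the raw-packet model of (16); `f_beta` and `greedy_with_f` are hypothetical names.

```python
from itertools import combinations

OBS = {1: {1, 2}, 2: {2, 4, 5, 6}, 3: {3, 4, 5, 6}}  # user -> known packet indices
N = 6


def f_beta(subset, beta):
    """Set function (15) for raw packets (rank = number of distinct packets)."""
    subset = set(subset)
    if not subset:
        return 0
    return beta - N + len(set().union(*(OBS[u] for u in subset)))


def greedy_with_f(beta, order):
    """For each user in `order`, take the largest rate that keeps every constraint
    R(S) <= f_beta(S) satisfied, where S ranges over sets made of the current user
    plus any subset of the users processed so far (brute-force minimum)."""
    rates, done = {}, []
    for user in order:
        rates[user] = min(
            f_beta(set(s) | {user}, beta) - sum(rates[u] for u in s)
            for k in range(len(done) + 1)
            for s in combinations(done, k)
        )
        done.append(user)
    return rates


print(greedy_with_f(5, [1, 3, 2]))                 # {1: 1, 3: 3, 2: 1}
print(sum(greedy_with_f(4, [1, 3, 2]).values()))   # 3, which is less than beta = 4
```

For $\beta = 4$ the rates sum to $3 = g_4(\{1,2,3\})$ rather than to $\beta$, which is how infeasibility of a sum-rate shows up in the output.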

In each iteration , the minimization problem (29) is over all subsets of . Using the fact that all the subsets considered in (29) contain a common element it is easy to see that is fully submodular over the domain set . Now the polynomial time solution of Algorithm 2 follows from the fact that minimization of a fully submodular function can be done in polynomial time [24].

###### Remark 5.

The complexity of Algorithm 2 is , where is the complexity of minimizing submodular function. The best known algorithm to our knowledge is proposed by Orlin in [24], and has complexity , where is complexity of computing the submodular function. For the submodular function defined in (29), equals to the complexity of computing rank, and it is a function of the file size . When users observe linear combinations of the file packets, the rank over can be computed by Gaussian elimination in time. For the “raw” packet model, rank computation reduces to counting distinct packets, and therefore its complexity is .

###### Remark 6.

From Remark 2 and the fact that Edmonds’ algorithm provides a rate vector with sum-rate , it immediately follows that if Algorithm 2 outputs a rate vector such that , then , and such is not a feasible sum-rate w.r.t. the problem (19). Hence, for any given , the feasibility of such sum-rate can be verified in time.

### III-C Finding the optimal value of β

So far we have shown how to compute the function $h(\beta)$ defined in (12) for any feasible $\beta$ when the cost function is linear. To complete our solution, i.e., to solve the problem defined in (9), it remains to show how to minimize the function $h(\beta)$ efficiently.

###### Theorem 3.

Function $h(\beta)$, defined in (10), is convex when $\beta$ is a feasible sum-rate w.r.t. the optimization problem (10).

Proof of Theorem 3 is provided in Appendix B.

In order to minimize function $h(\beta)$, first, we identify the set of sum-rates that are feasible w.r.t. the problem (9). More precisely, we need to find the minimum sum-rate, since every $\beta$ that is larger than or equal to such value is feasible as well. Hence, we proceed by analyzing the sum-rate objective, i.e., when $\alpha_i = 1$, $\forall i \in \mathcal{M}$.

For any fixed parameter $\beta$, Algorithm 2 provides an optimal rate allocation w.r.t. the linear cost. It is only left to find $\beta$ that minimizes $h(\beta)$ in (9). Let us first consider the sum-rate cost, i.e., $\varphi_i(R_i) = R_i$. From the equivalence of Algorithms 1 and 2, and from Remark 3, it follows that for any given parameter $\beta$, the output rate vector $R^*$ of Algorithm 2 satisfies

$$\sum_{i=1}^{m} R^*_i = g_\beta(\mathcal{M}). \tag{30}$$

Thus, for an arbitrarily chosen parameter $\beta$ we can verify whether it is feasible w.r.t. the problem (12) by applying Remark 2, i.e., if $\sum_{i=1}^{m} R^*_i = \beta$, then such a sum-rate can be achieved. Therefore, we can apply a simple binary search algorithm to find the minimum sum-rate. Note that the minimum sum-rate is always less than or equal to the file size $N$. Hence, we can confine our search accordingly (see Algorithm 3).

###### Remark 7.

The complexity of Algorithm 3 is .
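The binary search for the minimum sum-rate can be sketched as follows. This is an illustration under the same assumptions as before: raw-packet observations from (16), a brute-force greedy step in place of the polynomial-time submodular minimization, and hypothetical helper names (`f_beta`, `greedy_sum`, `is_feasible`, `min_sum_rate`). It relies on the facts stated above that every sum-rate above the minimum is feasible and that the minimum does not exceed $N$.

```python
from itertools import combinations

OBS = {1: {1, 2}, 2: {2, 4, 5, 6}, 3: {3, 4, 5, 6}}  # user -> known packet indices
N = 6


def f_beta(subset, beta):
    """Set function (15) for raw packets (rank = number of distinct packets)."""
    subset = set(subset)
    if not subset:
        return 0
    return beta - N + len(set().union(*(OBS[u] for u in subset)))


def greedy_sum(beta):
    """Sum of the greedy rates for budget beta (equals g_beta(M), cf. (30))."""
    rates, done = {}, []
    for user in sorted(OBS):
        rates[user] = min(
            f_beta(set(s) | {user}, beta) - sum(rates[u] for u in s)
            for k in range(len(done) + 1)
            for s in combinations(done, k)
        )
        done.append(user)
    return sum(rates.values())


def is_feasible(beta):
    """A budget is feasible exactly when the greedy rates use all of it."""
    return greedy_sum(beta) == beta


def min_sum_rate():
    """Binary search on [0, N]; feasibility is monotone in beta."""
    lo, hi = 0, N
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_sum_rate())  # 5 for the three-user example
```

Only about $\log_2 N$ feasibility checks are needed, so the cost of the search is dominated by the cost of a single check.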

For the general linear cost function $\varphi_i(R_i) = \alpha_i R_i$, by Theorem 3, $h(\beta)$ is convex for $\beta$ greater than or equal to the minimum sum-rate (obtained from Algorithm 3). In Section III-E, Lemma 5, we show that the search space for $\beta$ that minimizes function $h(\beta)$ can be limited to the file size $N$. Hence, in order to solve the minimization problem (9) we can apply a simple binary search algorithm that finds the minimum of $h(\beta)$ by looking for a slope change in function $h(\beta)$.

###### Remark 8.

Since for any fixed , can be found by using Algorithm 2, and can be found by applying Algorithm 3, the complexity of Algorithm 4 is .

### III-D Using Subgradient Methods to Solve Step 4 of Algorithm 2

In this section we propose an alternative solution to the minimization problem (29) in Algorithm 2 that does not involve minimization of a submodular function. The underlying linear optimization problem has the following form

$$\min_{R \in \mathbb{Z}^m} \sum_{i=1}^{m} \alpha_i R_i, \quad \text{s.t.}\ \ R \in B(f_\beta), \tag{31}$$

given that $\beta$ is a feasible sum-rate. Without loss of generality, let us assume that $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_m$. In this case, the minimization in Step 3 of Algorithm 2 can be written as

$$R^*_i = \min_{S} \left\{ f_\beta(S) - R^*(S) : i \in S,\ S \subseteq \{1, 2, \ldots, i\} \right\}, \quad i = 1, 2, \ldots, m. \tag{32}$$

Minimization (32) can be interpreted as a maximal update along the coordinate $R_i$ such that $R$ still belongs to polyhedron $P(f_\beta)$. This problem can be separately formulated as the following maximization problem

$$R^*_i = \max_{R \in \mathbb{R}^i} R_i, \quad \text{s.t.}\ \ R_k \ge R^*_k,\ k = 1, 2, \ldots, i-1, \quad R(S \cup \{i\}) \le f_\beta(S \cup \{i\}),\ \forall S \subseteq \{1, 2, \ldots, i-1\}. \tag{33}$$

Note that in an optimal solution, the condition $R_k \ge R^*_k$, $k = 1, 2, \ldots, i-1$, holds with equality because any increase of $R_k$ can only lead to a smaller value of $R_i$. Moreover, since the above maximization is over an integer submodular polyhedron, the optimal solution is also an integer. Therefore, optimization problems (33) and (32) are equivalent.

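Let us denote by $\mathcal{R}(i)$ the rate region that corresponds to the optimization problem (33).

$$\mathcal{R}(i) = \left\{ R \in \mathbb{R}^i \mid R(S \cup \{i\}) \le f_\beta(S \cup \{i\}),\ \forall S \subseteq \{1, 2, \ldots, i-1\} \right\}. \tag{34}$$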

To solve optimization problem (33), we apply the dual subgradient method. First, the Lagrangian function of the problem (33) is

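$$L(R, \lambda) = R_i + \sum_{k=1}^{i-1} \lambda_k (R_k - R^*_k), \quad R \in \mathcal{R}(i), \tag{35}$$

where $\lambda_k \ge 0$, $k = 1, 2, \ldots, i-1$. Then, the dual function equals

$$\delta(\lambda) = \max_{R \in \mathcal{R}(i)} L(R, \lambda) = \max_{R \in \mathcal{R}(i)} \left\{ R_i + \sum_{k=1}^{i-1} \lambda_k R_k \right\} - \sum_{k=1}^{i-1} \lambda_k R^*_k. \tag{36}$$

Since (36) is a pointwise maximum over functions that are affine in $\lambda$, it immediately follows that $\delta(\lambda)$ is a convex function. By the weak duality theorem [25],

$$\delta(\lambda) \ge R^*_i, \quad \forall \lambda_k \ge 0,\ k = 1, 2, \ldots, i-1. \tag{37}$$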

Hence,

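$$\min_{\lambda} \left\{ \delta(\lambda) \mid \lambda_k \ge 0,\ k = 1, 2, \ldots, i-1 \right\} \ge R^*_i. \tag{38}$$

Since optimization problem (33) is linear, there is no duality gap, i.e.,

$$R^*_i = \min_{\lambda} \left\{ \delta(\lambda) \mid \lambda_k \ge 0,\ k = 1, 2, \ldots, i-1 \right\}. \tag{39}$$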

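To solve optimization problem (39), we apply the dual subgradient method [26] as follows. Starting with a feasible iterate $\lambda_k[0]$, $k = 1, 2, \ldots, i-1$, w.r.t. the optimization problem (39), and the step size $\theta_j$, every subsequent iterate $\lambda_k[j+1]$, for all $k = 1, 2, \ldots, i-1$, can be recursively computed as follows

$$\lambda_k[j+1] = \left\{ \lambda_k[j] - \theta_j \left( \tilde{R}_k[j] - R^*_k \right) \right\}_+, \tag{40}$$

where $\tilde{R}[j]$ is an optimal solution to the problem

$$\max_{R \in \mathcal{R}(i)} R_i + \sum_{k=1}^{i-1} \lambda_k[j] R_k. \tag{41}$$

Note that $\tilde{R}_k[j] - R^*_k$, $k = 1, 2, \ldots, i-1$, is a subgradient of the dual function $\delta$ at $\lambda[j]$.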

###### Lemma 2.

An optimal solution to the problem (41) can be obtained as follows. Let be an ordering of such that . Then,

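$$\tilde{R}_i[j] = \begin{cases} f_\beta(\{i\}), & \text{if } \lambda_{t(1)} \le 1, \\ 0, & \text{otherwise,} \end{cases} \qquad \tilde{R}_{t(k)}[j] = f_\beta(S_{t(k)} \cup \{i\}) - \sum_{u=1}^{k-1} \tilde{R}_{t(u)}[j] - \tilde{R}_i[j], \tag{42}$$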

for , where .

Proof of this Lemma is provided in Appendix D

###### Remark 9.

The complexity of the algorithm proposed by Lemma 2 is .

The reason why we apply subgradient methods instead of gradient descent is that the function $\delta$, even though convex, is not differentiable. From Lemma 2, it follows that for a given $\lambda[j]$, there may be more than one maximizer of the problem (41). Because there may be more than one direction along which we can update the vector $\lambda[j]$ according to (40), the subgradient method is not technically a descent method; the function value may increase in consecutive steps. For that reason, at each step we keep track of the best iterate found up to that point

$$\tilde{\lambda}[j] = \operatorname{argmin} \left\{ \delta(\lambda[0]), \delta(\lambda[1]), \ldots, \delta(\lambda[j]) \right\}. \tag{43}$$

Before we go any further, note that the primal optimization problem (33) is over real vectors. However, the minimization (31) is an integer optimization problem. As pointed out above, the optimal solution of the problem (33) is equal to the solution of the problem (31). Therefore, we can choose the number of iterations of the dual subgradient method such that we get “close enough” to an integer solution. In other words,

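$$\left| \delta(\tilde{\lambda}[l]) - R^*_i \right| \le \varepsilon, \tag{44}$$

where $\varepsilon < \frac{1}{2}$, so that rounding recovers the integer $R^*_i$. Then,

$$R^*_i = \operatorname{round}\left( \delta(\tilde{\lambda}[l]) \right). \tag{45}$$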

### Convergence Analysis

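In this section we explore the relationship between the number of iterations $l$ of the dual subgradient method and the step size $\theta_j$, such that it is guaranteed that (45) provides the optimal solution.

###### Lemma 3.

Let $\lambda^*$ be an optimal vector that minimizes the dual function $\delta(\lambda)$. Then,

$$\delta(\tilde{\lambda}[l-1]) - \delta(\lambda^*) \le \frac{\left( \sum_{k=1}^{i-1} \lambda_k[0] \right)^2 + \left( \sum_{k=1}^{i-1} \lambda^*_k \right)^2 + 2 N^2 \sum_{j=0}^{l-1} \theta_j^2}{2 \sum_{j=0}^{l-1} \theta_j}. \tag{46}$$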

Proof of Lemma 3 can be derived from the notes on subgradient methods presented in [26]. For the sake of completeness, we provide its entire proof in Appendix E.

Since by Lemma 3, $\lambda^*$ can be an arbitrary minimizer of the dual function $\delta$, let us choose one that can be bounded as suggested by the following lemma.

###### Lemma 4.

There exists an optimal solution $\lambda^*$ to the problem (39) that satisfies

$$\sum_{k=1}^{i-1} \lambda^*_k \le m. \tag{47}$$

Proof of this Lemma is provided in Appendix F. The initial feasible $\lambda[0]$ can be chosen as follows

$$\lambda_k[0] = 0, \quad \forall k \in \{1, 2, \ldots, i-1\}. \tag{48}$$

Combining (46), (47) and (48), we obtain