# Non-clairvoyant Precedence Constrained Scheduling

This research was supported in part by NSF awards CCF-1536002, CCF-1540541, and CCF-1617790, and the Indo-US Joint Center for Algorithms Under Uncertainty. Sahil Singla was supported in part by the Schmidt Foundation.

Naveen Garg (naveen@cse.iitd.ac.in) Computer Science and Engineering Department, Indian Institute of Technology, Delhi.    Anupam Gupta (anupamg@cmu.edu) Computer Science Department, Carnegie Mellon University.    Amit Kumar (amitk@cse.iitd.ac.in) Computer Science and Engineering Department, Indian Institute of Technology, Delhi.    Sahil Singla (singla@cs.princeton.edu) Department of Computer Science at Princeton University and School of Mathematics at Institute for Advanced Study, Princeton.
July 3, 2019
###### Abstract

We consider the online problem of scheduling jobs on identical machines, where jobs have precedence constraints. We are interested in the demanding setting where the job sizes are not known up-front, but are revealed only upon completion (the non-clairvoyant setting). Such precedence-constrained scheduling problems routinely arise in map-reduce and large-scale optimization. In this paper, we make progress on this problem. For the objective of total weighted completion time, we give a constant-competitive algorithm. For total weighted flow-time, we give an $O(1/\varepsilon)$-competitive algorithm under $(1+\varepsilon)$-speed augmentation and a natural “no-surprises” assumption on release dates of jobs (which we show is necessary in this context).

Our algorithm proceeds by assigning virtual rates to all the waiting jobs, including the ones which are dependent on other uncompleted jobs, and then uses these virtual rates to decide on the actual rates of minimal jobs (i.e., jobs which do not have dependencies and hence are eligible to run). Interestingly, the virtual rates are obtained by allocating time in a fair manner, using an Eisenberg-Gale-type convex program (which we can also solve optimally using a primal-dual scheme). The optimality condition of this convex program allows us to show dual-fitting proofs more easily, without having to guess and hand-craft the duals. We feel that this idea of using fair virtual rates should have broader applicability in scheduling problems.


## 1 Introduction

We consider the problem of online scheduling of jobs under precedence constraints. We seek to minimize the average weighted flow time of the jobs on multiple parallel machines, in the online non-clairvoyant setting. Formally, there are $m$ identical machines, each capable of one unit of processing per unit of time. A set of jobs arrives online. Each job $j$ has a processing requirement $p_j$ and a weight $w_j$, and is released at some time $r_j$. If the job finishes at time $C_j$, its flow or response time is defined to be $C_j - r_j$. The goal is to give a preemptive schedule that minimizes the total (or, equivalently, the average) weighted flow-time $\sum_j w_j (C_j - r_j)$. The main constraints of our model are the following: (i) the scheduling is done online, so the scheduler does not know of the jobs before they are released; (ii) the scheduler is non-clairvoyant: when a job arrives, the scheduler knows its weight $w_j$ but not its processing time $p_j$ (it is only when the job finishes its processing that the scheduler knows the job is done, and hence knows $p_j$); and (iii) there are precedence constraints between jobs given by a partial order $\prec$: $j \prec j'$ means job $j'$ cannot be started until $j$ is finished. Naturally, the partial order should respect release dates: if $j \prec j'$ then $r_j \le r_{j'}$. (We will require a stronger assumption for some of our results.)
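To make the objective concrete, here is a minimal sketch that evaluates the total weighted flow time of a finished schedule. The `Job` record and its field names are illustrative assumptions, not notation from the paper; completion times are supplied directly, since a non-clairvoyant scheduler only learns them when jobs finish.

```python
from dataclasses import dataclass


@dataclass
class Job:
    weight: float      # known when the job is released
    release: float     # release date
    completion: float  # completion time in some schedule (known only afterwards)


def total_weighted_flow_time(jobs):
    """Sum over jobs of weight * (completion - release)."""
    return sum(j.weight * (j.completion - j.release) for j in jobs)


# Two jobs: flow times are 3 and 4, so the objective is 2*3 + 1*4.
example_jobs = [Job(weight=2.0, release=0.0, completion=3.0),
                Job(weight=1.0, release=1.0, completion=5.0)]
print(total_weighted_flow_time(example_jobs))
```

Replacing `completion - release` by `completion` gives the weighted completion time objective that the paper treats first.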

This model for constrained parallelism is a natural one, both in theory and in practice. In theory, this precedence-constrained (and non-clairvoyant!) scheduling model (with other objective functions) goes back to Graham’s work on list scheduling [Gra66]. In practice, most languages and libraries produce parallel code that can be modeled using precedence DAGs [RS08, ALLM16, GKR16]. Often these jobs (i.e., units of processing) are distributed among some workstations or servers, either in server farms or on the cloud, i.e., they use identical parallel machines.

### 1.1 Our Results and Techniques

Weighted Completion Time. We develop our techniques on the problem of minimizing the average weighted completion time . Our convex-programming approach gives us:

###### Theorem 1.1.

There is a $10$-competitive deterministic online algorithm for minimizing the average weighted completion time on parallel machines with both release dates and precedences, in the online non-clairvoyant setting.

For this result, at each time $t$, the algorithm has to know only the partial order restricted to $\{j : r_j \le t\}$, i.e., the jobs released by time $t$. The algorithmic idea is simple in hindsight: the algorithm looks at the minimal unfinished jobs (i.e., they do not depend on any other unfinished jobs): call them $I_t$. If $J_t$ is the set of (already released and) unfinished jobs at time $t$, then $I_t \subseteq J_t$. To figure out how to divide our processing among the jobs in $I_t$, we write a convex program that fairly divides the time among all jobs in the larger set $J_t$, such that (a) these jobs can “donate” their allocated time to some preceding jobs in $I_t$, and (b) the jobs in $I_t$ do not get more than $1$ unit of processing per time-step.

For this fair allocation, we maximize the (weighted) Nash Welfare , where is the virtual rate of processing given to job , regardless of whether it can currently be run (i.e., is in ). This tries to fairly distribute the virtual rates among the jobs [Nas50], and can be solved using an Eisenberg-Gale-type convex program. (We can solve this convex program in our setting using a simple primal-dual algorithm, see §6.) The proof of Theorem 1.1 is via writing a linear-programming relaxation for the weighted completion time problem, and fitting a dual to it. Conveniently, the dual variables for the completion time LP naturally fall out of the dual (KKT) multipliers for the convex program!

Weighted Flow Time. We then turn to the weighted flow-time minimization problem. We first observe that the problem has no competitive algorithm if there are jobs $j$ that depend on jobs released before $r_j$. Indeed, if OPT ever has an empty queue while the algorithm is processing jobs, the adversary could give a stream of tiny new jobs, and we would be sunk. Hence we make an additional no-surprises assumption about our instance: when a job $j$ is released, all the jobs having a precedence relationship to $j$ are also released at the same time. In other words, the partial order is a collection of disjoint connected DAGs, where all jobs in each connected component have the same release date. A special case of this model has been studied in [RS08, ALLM16] where each DAG is viewed as a “hyper-job” and there are no precedence constraints between different hyper-jobs. In this model, we show:

###### Theorem 1.2.

There is an $O(1/\varepsilon)$-competitive deterministic non-clairvoyant online algorithm for the problem of minimizing the average weighted flow time on parallel machines with release dates and precedences, under the no-surprises and $(1+\varepsilon)$-speedup assumptions.

Interestingly, the algorithm for weighted flow-time is almost the same as for weighted completion time. In fact, exactly the same algorithm works for both the completion time and flow time cases, if we allow a speedup of $(2+\varepsilon)$ for the latter. To get the $(1+\varepsilon)$-speedup algorithm, we give preference to the recently-arrived jobs, since they have a smaller current time-in-system and each unit of waiting proportionally hurts them more. This is along the lines of strategies like LAPS and WLAPS [EP12].
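For readers unfamiliar with LAPS, the following sketch shows the classical unweighted rule without precedences: the machine is shared equally among the most recently released fraction of the jobs. It illustrates only the "prefer recent arrivals" idea. It is not the algorithm of this paper, which applies a LAPS-like rule to virtual rates. The function name, the `eps` parameter, and the tie-breaking by job id are assumptions made for illustration.

```python
import math


def laps_shares(release_times, eps, speed=1.0):
    """Classical LAPS: split `speed` equally among the ceil(eps * n) latest arrivals.

    release_times: dict mapping job id -> release time (alive jobs only).
    Returns a dict mapping job id -> processing rate.
    """
    n = len(release_times)
    if n == 0:
        return {}
    k = max(1, math.ceil(eps * n))
    # Latest release first; ties broken by job id so the result is deterministic.
    by_recency = sorted(release_times, key=lambda j: (-release_times[j], j))
    favoured = set(by_recency[:k])
    return {j: (speed / k if j in favoured else 0.0) for j in release_times}


alive = {"a": 0, "b": 1, "c": 2, "d": 3}
print(laps_shares(alive, eps=0.5))
```

With four alive jobs and `eps=0.5`, only the two latest arrivals (`c` and `d`) receive processing, each at half the machine speed.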

### 1.2 The Intuition

Consider the case of unit weight jobs on a single machine. Without precedence constraints, the round-robin algorithm, which runs all jobs at the same rate, is $O(1/\varepsilon)$-competitive for the flow-time objective with a $(2+\varepsilon)$-speed augmentation. Now consider precedences, and let the partial order be a collection of disjoint chains: only the first remaining job from each chain can be run at each time. We generalize round-robin to this setting by running all minimal jobs simultaneously, but at rates proportional to the lengths of the corresponding chains. We can show this algorithm is also $O(1/\varepsilon)$-competitive with a $(2+\varepsilon)$-speed augmentation. While this is easy for chains and trees, let us now consider the case when the partial order is the union of general DAGs, where each DAG may have several minimal jobs. Even though the sum of the rates over all the minimal jobs in any particular DAG should be proportional to the number of jobs in this DAG, running all minimal jobs at equal rates does not work. (Indeed, if many jobs depend on one of these minimal jobs, and many fewer depend on the other minimal jobs in this DAG, we want to prioritize the former.)
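The chain version of round-robin described above can be sketched in a few lines, assuming unit weights and a single machine. The function name and the representation of chains by their remaining lengths are illustrative choices. The rule only says how to split the machine when the partial order is a disjoint union of chains and does not address general DAGs.

```python
def chain_round_robin_rates(remaining_lengths, speed=1.0):
    """Rate for the head job of each chain, proportional to the chain's remaining length.

    remaining_lengths: dict mapping chain id -> number of unfinished jobs in that chain.
    """
    total = sum(remaining_lengths.values())
    if total == 0:
        return {c: 0.0 for c in remaining_lengths}
    return {c: speed * n / total for c, n in remaining_lengths.items()}


# A chain with 3 unfinished jobs gets three times the rate of a chain with 1.
print(chain_round_robin_rates({"A": 3, "B": 1}))
```

Equivalently, every unfinished job receives an equal share and passes it to the head of its own chain, which is the "donation" idea that the convex program generalizes.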

Instead, we use a convex program to find rates. Our approach assigns a “virtual rate” to each job in the DAG (regardless of whether it is minimal or not). This virtual rate allows us to ensure that even though this job may not run, it can help some minimal jobs to run at higher rates. This is done by an assignment problem where these virtual rates get translated into actual rates for the minimal jobs. The virtual rates are then calculated using Nash fairness, which gives us max-min properties that are crucial for our analysis.

Analysis Challenges: In typical applications of the dual-fitting technique, the dual variables for each job encode the increase in total flow-time caused by arrival of this job. Using this notion turns out to create problems. Indeed, consider a minimal job of low weight which is running at a high rate (because a large number of jobs depend on it). The increase in overall flow-time because of its arrival is very large. However the dual LP constraints require these dual variables to be bounded by the weights of their jobs, which now becomes difficult to ensure. To avoid this, we define the dual variables directly in terms of the virtual rates of the jobs, given by the convex program.

Having multiple machines instead of a single machine creates new problems. The actual rate assigned to any minimal job cannot exceed $1$, and hence we have to throttle certain actual rates. Again the versatility of the convex program helps us, since we can add this as a constraint. Arguing about the optimal solution to such a convex program requires dealing with the suitable KKT conditions, from which we can infer many useful properties. We also show in §6 that the optimal solution corresponds to a natural “water-filling” based algorithm.

Finally, we obtain matching results for the case of $(1+\varepsilon)$-speed augmentation. Im et al. [IKM18] gave a general-purpose technique to translate a round-robin based algorithm to a LAPS-like algorithm. In our setting, it turns out that the LAPS-like policy needs to be run on the virtual rates of jobs. Analyzing this algorithm does not follow in a black-box manner (as prescribed by [IKM18]), and we need to adapt our dual-fitting analysis suitably.

### 1.3 Related Work and Organization

Completion Time.  Minimizing on parallel machines with precedence constraints has -approximations in the offline setting: Li [Li17] improves on [HSSW97, MQS98] to give a -approximation. For related machines, the precedence constraints make the problem much harder: there is a -approximation [Li17] improving on a prior result [CS99], and a hardness of under certain complexity assumptions [BN15]. In the online setting, any offline algorithm for (a dual problem to) gives an clairvoyant online algorithm, losing factors [HSSW97]. Two caveats: it is unclear (a) how to make this algorithm non-clairvoyant, and (b) how to solve the (dual of the) weighted completion time problem with precedences in poly-time.

Flow Time without Precedence.  To minimize , strong lower bounds are known for the competitive ratio of any online algorithm even on a single machine [MPT94]. Hence we use speed augmentation [KP00]. For the general setting of non-clairvoyant weighted flow-time on unrelated machines, Im et al. [IKMP14] showed that weighted round-robin with a suitable migration policy yields a -competitive algorithm using -speed augmentation. They gave a general purpose technique, based on the LAPS scheduling policy, to convert any such round-robin based algorithm to a -competitive algorithm while losing an extra factor in the competitive ratio. Their analysis also uses a dual-fitting technique [AGK12, GKP12]. However, they do not consider precedence constraints.

Flow Time with Precedence.  Much less is known for flow-time problems with precedence constraints. For the offline setting on identical machines, [KL18] give -approximations with -speedup, even for general delay functions. In the current paper, we achieve a -approximation with -speedup for flow-time. Interestingly, [KL18] show that beating a -approximation for any constant requires a speedup of at least the optimal approximation factor of makespan minimization in the same machine environment. However, this lower bound requires different jobs with a precedence relationship to have different release dates, which is something our model disallows. (Appendix §5 gives another lower bound showing why we disallow such precedences in the online setting.)

In the online setting, [RS08] introduced the DAG model where each job is a directed acyclic graph (of tasks) released at some time, and a job/DAG completes when all the tasks in it are finished, and we want to minimize the total unweighted flow-time. They gave a -speed -competitive algorithm, where is the largest antichain within any job/DAG. [ALLM16] show -competitiveness with -speedup, again in the non-clairvoyant setting. The case where jobs are entire DAGs, and not individual nodes within DAGs, is captured in our weighted model by putting zero weights for all original jobs, and adding a unit-weight zero-sized job for each DAG which now depends on all jobs in the DAG. Assigning arbitrary weights to individual nodes within DAGs makes our problem quite non-trivial—we need to take into account the structure of the DAG to assign rates to jobs. Another model to capture parallelism and precedences uses speedup functions [ECBD97, Edm99, EP12]: relating our model to this setting remains an open question.

Our work is closely related to Im et al. [IKM18] who use a Nash fairness approach for completion-time and flow-time problems with multiple resources. While our approaches are similar, to the best of our understanding their approach does not immediately extend to the setting with precedences. Hence we have to introduce new ideas of using virtual rates (and being fair with respect to them), and throttling the induced actual rates at $1$. The analyses of [IKM18] and our work are both based on dual-fitting; however, we need some new ideas for the setting with precedences.

Organization. The weighted completion time case is solved in §2. A $(2+\varepsilon)$-speedup result for weighted flow-time is in §3; this is improved to a $(1+\varepsilon)$-speedup in §4. The proof that we need the “no-surprises” assumption on release dates is in §5. Finally, we show how to solve the convex program in §6. Some deferred proofs can be found in §7.

## 2 Minimizing Weighted Completion Time

In this section, we describe and analyze the scheduling algorithm for the problem of minimizing weighted completion time on parallel machines. Recall that the precedence constraints are given by a DAG $G$, and each job $j$ has a release date $r_j$, processing size $p_j$ and weight $w_j$.

### 2.1 The Scheduling Algorithm

We first assume that each of the machines runs at rate $2$ (i.e., it can perform 2 units of processing in a unit time). We will show later how to remove this assumption (at a constant loss of competitive ratio). We begin with some notation. We say that a job $j$ is waiting at time $t$ (with respect to a schedule) if $r_j \le t$, but $j$ has not been processed to completion by time $t$. We use $J_t$ to denote the set of waiting jobs at time $t$. Note that at time $t$, the algorithm gets to see the subgraph $G_t$ of $G$ which is induced by the jobs in $J_t$. We say that a job $j$ is unfinished at time $t$ if it is either waiting at time $t$, or its release date is at least $t$ (and hence the algorithm does not even know about this job). Let $U_t$ denote the set of unfinished jobs at time $t$. Clearly, $J_t \subseteq U_t$. At time $t$, the algorithm can only process those jobs in $J_t$ which do not have a predecessor in $G_t$; denote these minimal jobs by $I_t$: they are independent of all other current jobs. For every time $t$, the scheduling algorithm needs to assign a rate to each job $j \in I_t$. We now describe how it decides on these rates.

Consider a time $t$. The algorithm considers a bipartite graph $H_t = (I_t, J_t, E_t)$ with vertex set consisting of the minimal jobs $I_t$ on the left and the waiting jobs $J_t$ on the right. Since $I_t \subseteq J_t$, a job in $I_t$ appears as a vertex on both sides of this bipartite graph. When there is no confusion, we slightly overload terminology by referring to a job as a vertex in $H_t$. The set of edges $E_t$ is as follows: let $j \in I_t$ and $j' \in J_t$ be vertices on the left and the right side respectively. Then $(j, j')$ is an edge in $H_t$ if and only if there is a directed path from $j$ to $j'$ in the DAG $G_t$.
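The construction of this bipartite graph can be sketched as follows. The representation is an assumption made for illustration: `waiting` is the set of waiting jobs and `preds` maps each job to the set of its direct predecessors. Since a job cannot start before its predecessors finish, checking direct predecessors is enough to identify the minimal jobs. Every minimal job is joined to its own copy on the right side via the trivial path.

```python
def minimal_jobs(waiting, preds):
    """Waiting jobs none of whose direct predecessors is still waiting."""
    return {j for j in waiting if not (preds.get(j, set()) & waiting)}


def bipartite_edges(waiting, preds):
    """Return (minimal jobs, edges), where (i, j) is an edge iff i is minimal and
    there is a directed path from i to j among the waiting jobs (i == j allowed)."""
    minimal = minimal_jobs(waiting, preds)
    # Successor lists restricted to the waiting jobs.
    succs = {j: set() for j in waiting}
    for j in waiting:
        for p in preds.get(j, set()):
            if p in waiting:
                succs[p].add(j)
    edges = set()
    for i in minimal:
        stack, seen = [i], {i}
        while stack:  # depth-first search from the minimal job i
            u = stack.pop()
            edges.add((i, u))
            for v in succs[u] - seen:
                seen.add(v)
                stack.append(v)
    return minimal, edges


# c depends on a and b; d depends on b.
example_preds = {"c": {"a", "b"}, "d": {"b"}}
print(bipartite_edges({"a", "b", "c", "d"}, example_preds))
```

In the example, `a` and `b` are minimal, and `c` is adjacent to both of them, so its allocated time may be donated to either.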

The following convex program now computes the rate for each vertex in $I_t$. It has variables $z^t_e$ for each edge $e \in E_t$. For each job $j$ on the left side, i.e., for $j \in I_t$, define $L^t_j$ as the sum of the $z^t_e$ values of edges $e$ incident to $j$. Similarly, define $R^t_j$ for a job $j \in J_t$, i.e., on the right side. The objective function is the Nash bargaining objective function on the $R^t_j$ values, which ensures that each waiting job gets some attention. In §6 we give a combinatorial algorithm to efficiently solve this convex program.

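$$
\begin{aligned}
\max \quad & \sum_{j \in J_t} w_j \ln R^t_j && && \text{(CP)} \\
L^t_j &= \sum_{j' \in J_t : (j, j') \in E_t} z^t_{jj'} && \forall j \in I_t && (1) \\
R^t_j &= \sum_{j' \in I_t : (j', j) \in E_t} z^t_{j'j} && \forall j \in J_t && (2) \\
L^t_j &\le 1 && \forall j \in I_t && (3) \\
\sum_{j \in I_t} L^t_j &\le m && && (4) \\
z^t_e &\ge 0 && \forall e \in E_t && (5)
\end{aligned}
$$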

Let $(\bar z^t, \bar L^t, \bar R^t)$ be an optimal solution to the above convex program. We define the rate of a job $j \in I_t$ as being $\bar L^t_j$.
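As a sanity check on (CP), consider the special case $m = 1$. Constraint (3) is then implied by (4), and since the right-side values sum to the same total as the left-side values, the program reduces to maximizing $\sum_j w_j \ln R_j$ subject to $\sum_j R_j \le 1$, whose optimum is the weighted proportional share $R_j = w_j / \sum_{j'} w_{j'}$. The sketch below computes this special case only. It is not the general water-filling algorithm of §6, the function name is hypothetical, and the choice of which minimal ancestor receives a job's share is an arbitrary tie-break (any routing along edges is optimal when $m = 1$). Weights are assumed to be positive.

```python
def single_machine_rates(weights, edges):
    """m = 1 special case of (CP).

    weights: dict job -> positive weight, for all waiting jobs.
    edges:   set of (minimal job, waiting job) pairs of the bipartite graph.
    Returns (virtual, actual): virtual rate of every waiting job and the
    resulting actual rate of every minimal job.
    """
    total = sum(weights.values())
    virtual = {j: w / total for j, w in weights.items()}
    actual, routed = {}, set()
    # Donate each job's virtual rate to its first minimal neighbour in sorted order.
    for i, j in sorted(edges):
        if j not in routed:
            routed.add(j)
            actual[i] = actual.get(i, 0.0) + virtual[j]
    return virtual, actual


example_weights = {"a": 1, "b": 1, "c": 2, "d": 4}
example_edges = {("a", "a"), ("a", "c"), ("b", "b"), ("b", "c"), ("b", "d")}
print(single_machine_rates(example_weights, example_edges))
```

Here `c` donates its share to `a` and `d` donates to `b`, so the heavily depended-upon job `b` runs faster than `a` even though both have the same weight. For $m \ge 2$ the cap (3) can bind and this closed form no longer applies.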

Although we have defined this as a continuous time process, it is easy to check that the rates only change if a new job arrives, or if a job completes processing. Also observe that we have effectively combined the machines into one in this convex program. But assuming that all events happen at integer times, we can translate the rate assignment to an actual schedule as follows. For a time slot , the total rate is at most (using (4)), so we create time slots , one for each machine , and iteratively assign each job an interval of length within these time slots. It is possible that a job may get assigned intervals in two different time slots, but the fact that means it will not be assigned the same time in two different time slots. Further, we will never exceed the slots because of (4). Thus, we can process these jobs in the time slots on the parallel machines such that each job gets processed for amount of time and no job is processed concurrently on multiple machines. This completes the description of the algorithm; in this, we assume that we run the machines at twice the speed. Call this algorithm .
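The translation from rates to a schedule within one unit time slot is essentially a wrap-around packing. The following sketch assumes the rates satisfy the two properties used above (each rate is at most 1, and the rates sum to at most the number of machines). The function name and the output format are illustrative. Times are offsets within the slot.

```python
def wrap_around(rates, m, tol=1e-12):
    """Pack per-job rates into m unit-length machine slots, filling machines in order.

    rates: dict job -> rate in [0, 1], with total at most m.
    Returns dict machine index -> list of (job, start, end) intervals.
    """
    if any(r < 0 or r > 1 + tol for r in rates.values()):
        raise ValueError("each rate must lie in [0, 1]")
    if sum(rates.values()) > m + tol:
        raise ValueError("total rate must be at most m")
    schedule = {i: [] for i in range(m)}
    machine, clock = 0, 0.0
    for job, remaining in rates.items():
        while remaining > tol and machine < m:
            piece = min(remaining, 1.0 - clock)  # fill what is left on this machine
            schedule[machine].append((job, clock, clock + piece))
            remaining -= piece
            clock += piece
            if clock >= 1.0 - tol:  # machine full: wrap to the next one
                machine, clock = machine + 1, 0.0
    return schedule


print(wrap_around({"a": 0.75, "b": 0.75, "c": 0.5}, m=2))
```

In the example, job `b` is split across two machines: it runs on the first during the last quarter of the slot and on the second during the first half. Because its rate is at most 1, the two pieces never overlap in time.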

The final algorithm , which is only allowed to run the machines at speed , is obtained by running in the background, and setting to be a slowed-down version of . Formally, if processes a job on machine at time , then processes this at time . This completes the description of the algorithm.

### 2.2 A Time-Indexed LP formulation

We use the dual-fitting approach to analyze the above algorithm. We write a time-indexed linear programming relaxation (LP) for the weighted completion time problem, and use the solutions to the convex program (CP) to obtain feasible primal and dual solutions for (LP) which differ by only a constant factor.

We divide time into integral time slots (assuming all quantities are integers). Therefore, the variable $t$ will refer to integer times only. For every job $j$ and time $t$, we have a variable $x_{j,t}$ which denotes the volume of $j$ processed during $[t, t+1]$. Note that this is defined only for $t \ge r_j$. The LP relaxation is as follows:

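$$
\begin{aligned}
\min \quad & \sum_{j,t} w_j \cdot t \cdot \frac{x_{j,t}}{p_j} && && \text{(LP)} \\
\sum_{t \ge r_j} \frac{x_{j,t}}{p_j} &\ge 1 && \forall j && (6) \\
\sum_{j} x_{j,t} &\le m && \forall t && (7) \\
\sum_{s \le t} \frac{x_{j,s}}{p_j} &\ge \sum_{s \le t} \frac{x_{j',s}}{p_{j'}} && \forall t,\ j \prec j' && (8)
\end{aligned}
$$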

The following claim, whose proof is deferred to the appendix, shows that it is a valid relaxation.

###### Claim 2.1.

Let opt denote the weighted completion time of an optimal off-line policy (which knows the processing time of all the jobs). Then the optimal value of the LP relaxation is at most opt.

Observe that the LP just imagines the $m$ machines to be a single machine with speed $m$. Therefore, (LP) has a large integrality gap for two reasons: (i) a job can be processed concurrently on multiple machines, and (ii) if we have a long chain of jobs of equal size in the DAG $G$, the LP allows us to process all these jobs at the same rate in parallel on multiple machines. We augment the LP lower bound with another quantity and show that the sum of these two lower bounds suffices.

A chain in is a sequence of jobs such that . Define the processing time of , , as the sum of the processing time of jobs in . For a job , define as the maximum over all chains ending in of . It is easy to see that is a lower bound (up to a factor 2) on the objective of an optimal schedule.
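The chain-based quantity defined above is a longest-path computation in the precedence DAG. A minimal sketch follows, with hypothetical names: `proc` maps jobs to processing times and `preds` maps each job to its direct predecessors. Direct predecessors suffice because processing times are non-negative, so a longest chain never benefits from skipping an intermediate job.

```python
from functools import lru_cache


def chain_lengths(proc, preds):
    """For each job j, the largest total processing time of a chain ending at j."""

    @lru_cache(maxsize=None)
    def longest(j):
        # Best chain ending at a predecessor (0 if j has none), extended by j itself.
        return proc[j] + max((longest(p) for p in preds.get(j, ())), default=0)

    return {j: longest(j) for j in proc}


example_proc = {"a": 2, "b": 3, "c": 1, "d": 4}
example_preds = {"c": ["a", "b"], "d": ["c"]}
print(chain_lengths(example_proc, example_preds))
```

No job can complete before the value computed for it, even with unlimited machines, which is why this quantity complements the LP bound. The recursion is fine for small examples; very long chains would call for an iterative topological-order pass instead.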

We now write down the dual of the LP relaxation above. We have dual variables $\alpha_j$ for every job $j$, $\beta_t$ for every time $t$, and $\gamma_{t, j \to j'}$ for every time $t$ and every pair $j \prec j'$:

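$$
\begin{aligned}
\max \quad & \sum_{j} \alpha_j - m \sum_{t} \beta_t && && \text{(DLP)} \\
\alpha_j - w_j \cdot t + \sum_{s \ge t} \Big( \sum_{j \prec j'} \gamma_{s, j \to j'} - \sum_{j' \prec j} \gamma_{s, j' \to j} \Big) &\le p_j \cdot \beta_t && \forall j,\ t \ge r_j && (9) \\
\alpha_j, \beta_t &\ge 0
\end{aligned}
$$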

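We write the dual constraint (9) in a more readable manner. For a job $j$ and time $s$, let $\gamma^{\mathrm{out}}_{s,j}$ denote $\sum_{j' : j \prec j'} \gamma_{s, j \to j'}$, and define $\gamma^{\mathrm{in}}_{s,j}$ similarly. We now write the dual constraint (9) as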

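$$
\alpha_j - w_j \cdot t + \sum_{s \ge t} \big( \gamma^{\mathrm{out}}_{s,j} - \gamma^{\mathrm{in}}_{s,j} \big) \le p_j \cdot \beta_t \qquad \forall j,\ t \ge r_j \qquad (10)
$$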

### 2.3 Properties of the Convex Program

We now prove certain properties of an optimal solution to the convex program (CP). The first property, whose proof is deferred to the appendix, is easy to see:

###### Claim 2.2.

If , then for all .

We now write down the KKT conditions for the convex program. (In fact, we can use (1) and (2) to replace $L^t_j$ and $R^t_j$ in the objective and the other constraints.) Then letting $\theta^t_j$, $\eta_t$ and $\nu^t_e$ be the Lagrange multipliers corresponding to constraints (3), (4) and (5), we get

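$$
\begin{aligned}
\frac{w_j}{\bar R^t_j} &= \theta^t_{j'} + \eta_t - \nu^t_e && \forall e = (j', j),\ j' \in I_t,\ j \in J_t && (11) \\
\theta^t_j \big( \bar L^t_j - 1 \big) &= 0 && \forall j \in I_t && (12) \\
\eta_t \Big( \sum_{j \in I_t} \bar L^t_j - m \Big) &= 0 && && (13) \\
\nu^t_e \cdot \bar z^t_e &= 0 && \forall e \in E_t && (14)
\end{aligned}
$$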
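For readers who want to see where the stationarity condition (11) comes from, here is a short sketch of the standard derivation. The Lagrangian notation $\mathcal{L}$ and the sign conventions are our own choices for illustration; all multipliers are taken to be non-negative.

```latex
% Lagrangian of (CP) after substituting (1) and (2), so that z is the only variable:
\mathcal{L}(z; \theta, \eta, \nu)
  = \sum_{j \in J_t} w_j \ln R^t_j
  - \sum_{j' \in I_t} \theta^t_{j'} \bigl( L^t_{j'} - 1 \bigr)
  - \eta_t \Bigl( \sum_{j' \in I_t} L^t_{j'} - m \Bigr)
  + \sum_{e \in E_t} \nu^t_e \, z^t_e .

% For an edge e = (j', j), the variable z^t_e appears with coefficient 1
% in R^t_j and in L^t_{j'}, and nowhere else. Setting the derivative to zero:
\frac{\partial \mathcal{L}}{\partial z^t_e}
  = \frac{w_j}{R^t_j} - \theta^t_{j'} - \eta_t + \nu^t_e = 0
  \quad\Longleftrightarrow\quad
  \frac{w_j}{R^t_j} = \theta^t_{j'} + \eta_t - \nu^t_e .
```

Conditions (12) to (14) are the complementary slackness conditions for constraints (3), (4) and (5) respectively. In particular, an edge carrying positive flow into a left vertex with rate strictly below 1 has $\nu^t_e = 0$ and $\theta^t_{j'} = 0$, which is the situation the subsequent claims exploit.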

We derive a few consequences of these conditions, the proofs are deferred to the appendix.

###### Claim 2.3.

Consider a job on the right side of . Then .

###### Claim 2.4.

Consider a job on the right side of . Suppose has a neighbor such that and . Then .

A crucial notion is that of an active job:

###### Definition 2.5 (Active Jobs).

A job $j \in J_t$ is active at time $t$ if it has at least one neighbor in $I_t$ (in the graph $H_t$) running at rate strictly less than 1.
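The definition translates directly into code. In this sketch, `edges` is the edge set of the bipartite graph as (minimal job, waiting job) pairs and `actual_rate` gives the rate of each minimal job. The names and the numerical tolerance `tol` are assumptions made for illustration.

```python
def active_jobs(edges, actual_rate, tol=1e-9):
    """Waiting jobs with at least one left-side neighbour running at rate < 1."""
    return {j for i, j in edges if actual_rate[i] < 1.0 - tol}


example_edges = {("a", "a"), ("a", "c"), ("b", "b"), ("b", "c"), ("b", "d")}
print(active_jobs(example_edges, {"a": 1.0, "b": 0.5}))
```

In the example, `a` is saturated, so `a` itself is inactive, while `c` is active because its other neighbour `b` still has spare capacity.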

Let $J^{\mathrm{act}}_t$ denote the set of active jobs at time $t$. We can strengthen the above claim as follows.

###### Corollary 2.6.

Consider an active job $j$ at time $t$. Then $w_j / \bar R^t_j = \eta_t$.

.

### 2.4 Analysis via Dual Fitting

We analyze the algorithm first. We define feasible dual variables for (DLP) such that the value of the dual objective function (along with the values that capture the maximum processing time over all chains ending in ) forms a lower bound on the weighted completion time of our algorithm. Intuitively, would be the weighted completion time of , and would be times the total weight of unfinished jobs at time . Thus, would be at times the total weighted completion time. This idea works as long as all the machines are busy at any point of time, the reason being that the primal LP essentially views the machines as a single speed- machine. Therefore, we can generate enough dual lower bound if the rate of processing in each time slot is . If all machines are not busy, we need to appeal to the lower bound given by the values.

We use the notation used in the description of the algorithm. In the graph , we had assigned rates to all the nodes in . Recall that a vertex on the right side of is said to be active at time if it has a neighbor for which . Otherwise, we say that is inactive at time . We say that an edge , where is active at time if the vertex is active. Let denote the set of active edges in . Let be an edge in . By definition, there is a path from to in – we fix such a path . As before, let denote the completion time of job . The dual variables are defined as follows:

• For each job and time , we define quantities . The dual variable would be equal to . Fix a job . If we set to 0. Now, suppose . Consider the job as a vertex in (i.e., right side) in the bipartite graph . We set to if is active at time , otherwise it is inactive.

• For each time $t$, we set $\beta_t$ to $w(U_t)/(2m)$. (Recall that $U_t$ is the set of unfinished jobs at time $t$.)

• We now need to define , where . If or does not belong to , we set this variable to 0. So assume that (and so the edge lies in ). We define

$$
\gamma_{t, j' \to j} := \eta_t \cdot \sum_{e \,:\, e \in A_t,\ (j' \to j) \in P_e} \bar z^t_e .
$$

In other words, we consider all the active edges in the graph for which the corresponding path contains . We add up the fractional assignment for all such edges.

This completes the description of the dual variables.

We first show that the objective function for (DLP) is close to the weighted completion time incurred by the algorithm. The proof is deferred to the appendix.

###### Claim 2.8.

The total weighted completion time of the jobs in is at most .

We now argue about feasibility of the dual constraint (9). Consider a job and time . Since for all time , . Therefore, it suffices to show:

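$$
\sum_{s \ge t} \alpha_{j,s} + \sum_{s \ge t} \big( \gamma^{\mathrm{out}}_{s,j} - \gamma^{\mathrm{in}}_{s,j} \big) \le p_j \cdot \beta_t \qquad (15)
$$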

Let be the first time when the job appears in the set . This would also be the first time when the algorithm starts processing because a job that enters does not leave before completion.

###### Claim 2.9.

For any time lying in the range , .

###### Proof.

Fix such a time . Note that . Thus appears as a vertex on the right side in the bipartite graph , but does not appear on the left side. Let be an active edge in such that the corresponding path contains as an internal vertex. Then gets counted in both and . There cannot be such a path which starts with , because then will need to be on the left side of the bipartite graph. There could be paths which end with – these will correspond to active edges incident with in the graph (this happens only if itself is active). Let denote the edges incident with . We have shown that

$$
\gamma^{\mathrm{out}}_{s,j} - \gamma^{\mathrm{in}}_{s,j} = - \eta_s \cdot \sum_{e \in \Gamma(j) \cap A_s} \bar z^s_e . \qquad (16)
$$

If is not active, the RHS is 0, and so is . So we are done. Therefore, assume that is active. Now, contains all the edges incident with , and so, the RHS is the same as . But then, Corollary 2.6 implies that . Since , we are done again. ∎

Coming back to inequality (15), we can assume that . To see this, suppose . Then by Claim 2.9 the LHS of this constraint is the same as

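$$
\sum_{s \ge t^\star_j} \alpha_{j,s} + \sum_{s \ge t^\star_j} \big( \gamma^{\mathrm{out}}_{s,j} - \gamma^{\mathrm{in}}_{s,j} \big) .
$$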

Since (the set of unfinished jobs can only diminish as time goes on), (15) for time follows from the corresponding statement for time . Therefore, we assume that . We can also assume that , otherwise the LHS of this constraint is 0.

###### Claim 2.10.

Let be such that is inactive at time . Then

###### Proof.

We know that . As in the proof of Claim 2.9, we only need to worry about those active edges in for which either ends at or begins with . Since any edge incident with as a vertex on the right side is inactive, we get (let denote the edges incident with , where we consider on the left side)

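$$
\alpha_{j,s} + \gamma^{\mathrm{out}}_{s,j} - \gamma^{\mathrm{in}}_{s,j} = \eta_s \cdot \sum_{e \in \Gamma(j) \cap A_s} \bar z^s_e \le \eta_s \cdot \bar L^s_j ,
$$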

because and . ∎

###### Claim 2.11.

Let be such that is active at time . Then

###### Proof.

The argument is very similar to the one in the previous claim. Since is active, . As before we only need to worry about the active edges for which either ends or begins with . Any edge which is incident with on the right side (note that there will be only one such edge: the one joining to its copy on the left side of ) is active. The following inequality now follows as in the proof of Claim 2.10:

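$$
\alpha_{j,s} + \gamma^{\mathrm{out}}_{s,j} - \gamma^{\mathrm{in}}_{s,j} \le w_j + \eta_s \cdot \bar L^s_j - \eta_s \cdot \bar R^s_j .
$$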

The result now follows from Corollary 2.6. ∎

We are now ready to show that (15) holds. The above two claims show that the LHS of (15) is at most Note that for any such time , the rate assigned to is , and so, we perform amount of processing on during this time slot. It follows that . Now Claim 2.7 shows that , and so we get

$$
\sum_{s=t}^{C_j} \eta_s \cdot \bar L^s_j \le p_j \cdot \frac{w(U_t)}{2m} = p_j \cdot \beta_t .
$$

This shows that (15) is satisfied. We can now prove our algorithm is constant competitive.

###### Theorem 2.12.

The algorithm is 10-competitive.

###### Proof.

We first argue about . We have shown that the dual variables are feasible to (DLP), and so, Claim 2.8 shows that the total completion time of is at most , where opt denotes the optimal off-line objective value. Clearly, and . This implies that is 5-competitive. While going from to the completion time of each job doubles. ∎

## 3 Minimizing Weighted Flow Time

We now consider the setting of minimizing the total weighted flow time, again in the non-clairvoyant setting. The setting is almost the same as in the completion-time case: the major change is that all jobs which depend on each other (i.e., belong to the same DAG in the “collection of DAGs” view) have the same release date. In §5 we show that if related jobs can be released over time then no competitive online algorithms are possible.

As before, let denote the jobs which are waiting at time , i.e., which have been released but not yet finished, and let be the union of all the DAGs induced by the jobs in . Again, let denote the minimal set of jobs in , i.e., which do not have a predecessor in and hence can be scheduled.

###### Theorem 3.1.

There exists an -approximation algorithm for non-clairvoyant DAG scheduling to minimize the weighted flow time on parallel machines, when there is a speedup of .

The rest of this section gives the proof of Theorem 3.1. The algorithm remains unchanged from §2 (we do not need the algorithm  now): we write the convex program (CP) as before, which assigns rates to each job . The analysis again proceeds by writing a linear programming relaxation, and showing a feasible dual solution. The LP is almost the same as (LP), just the objective is now (with changes in red):

$$
\sum_{j,t} w_j \cdot {\color{red}(t - r_j)} \cdot \frac{x_{j,t}}{p_j} .
$$

Hence, the dual is also almost the same as (DLP): the new dual constraint requires that for every job and time :

$$
\alpha_j + \sum_{s \ge t} \big( \gamma^{\mathrm{out}}_{s,j} - \gamma^{\mathrm{in}}_{s,j} \big) \le \beta_t \cdot p_j + w_j {\color{red}(t - r_j)} . \qquad (17)
$$

### 3.1 Defining the Dual Variables

In order to set the dual variables, define a total order on the jobs as follows: First arrange the DAGs in order of release dates, breaking ties arbitrarily. Let this order be . All jobs in appear before those in in the order . Now for each DAG , arrange its jobs in the order they complete processing by our algorithm. Note that this order is consistent with the partial order given by the DAG. This also ensures that at any time , the set of waiting jobs in any DAG form a suffix in this total order (restricted to ).
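The total order just described is a two-level sort, sketched below. The dictionaries `dag_of`, `dag_release` and `completion` are hypothetical inputs, and ties between DAGs with equal release dates are broken by DAG identifier, which is one valid instance of "breaking ties arbitrarily". Note that the order is used only in the analysis, since it depends on completion times that the algorithm does not know in advance.

```python
def total_order(dag_of, dag_release, completion):
    """Sort jobs by (release date of their DAG, DAG id, completion time).

    dag_of:      dict job -> id of the DAG containing it
    dag_release: dict DAG id -> common release date of its jobs
    completion:  dict job -> completion time in the algorithm's schedule
    """
    return sorted(
        completion,
        key=lambda j: (dag_release[dag_of[j]], dag_of[j], completion[j]),
    )


example_dag_of = {"a": "D1", "b": "D1", "c": "D2"}
example_release = {"D1": 0, "D2": 0}
example_completion = {"a": 5, "b": 3, "c": 1}
print(total_order(example_dag_of, example_release, example_completion))
```

In the example, both DAGs are released at time 0, so all of `D1` precedes `D2`; within `D1`, job `b` precedes `a` because it completes first.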

For a time and , let denote the indicator variable which is 1 exactly if is active at time . The dual variables are defined as follows:

• For a job , we set , where the quantity is defined as:

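$$
\alpha_{j,t} := \frac{1}{m} \Big[ w_j \cdot I[j \in J^{\mathrm{act}}_t] \cdot \Big( \sum_{j' \in J_t : j' \preceq j} \bar R^t_{j'} \Big) + \bar R^t_j \cdot \Big( \sum_{j' \in J^{\mathrm{act}}_t : j' \prec j} w_{j'} \Big) \Big] .
$$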
• The variable . Recall that the machines are allowed -speedup.

• The definition of the variables changes as follows. Let be an edge in the DAG . Earlier we had considered paths containing only for the active edges . But now we include all edges. Moreover, we replace the multiplier by , where In other words, we define

$$
\gamma_{t, j' \to j} := \eta^t_j \cdot \sum_{e \,:\, e \in H_t,\ (j' \to j) \in P_e} \bar z^t_e .
$$

In the following sections, we show that these dual settings are enough to “pay for” the flow time of our solution (i.e., have large objective function value), and also give a feasible lower bound (i.e., are feasible for the dual linear program).

### 3.2 The Dual Objective Function

We first show that is close to the total weighted flow-time of the jobs. The quantity is defined as before. Notice that is still a lower bound on the flow-time of job in the optimal schedule because all jobs of a DAG are simultaneously released. The following claim, whose proof is deferred to the appendix, shows that the dual objective value is close to the weighted flow-time of the algorithm.

###### Claim 3.2.

The total weighted flow-time is at most

### 3.3 Checking Dual Feasibility

Now we need to check the feasibility of the dual constraint (17). In fact, we will show the following weaker version of that constraint:

$$\alpha_j + {\color{red}2}\sum_{s \ge t} \left(\gamma^{\text{out}}_{s,j} - \gamma^{\text{in}}_{s,j}\right) \le \beta_t \cdot p_j + {\color{red}2}\, w_j (t - r_j). \tag{18}$$

This suffices to within another factor of : indeed, scaling down the and variables by another factor of then gives dual feasibility, and loses only another factor of in the objective function. We begin by bounding in two different ways.

###### Lemma 3.3.

For any time , we have .

###### Proof.

Consider the second term in the definition of . This term contains . By Corollary 2.6, for any we have . Therefore,

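$$\sum_{j' \in J^{\text{act}}_s :\, j' \prec j} w_{j'} \;\le\; \eta_s \cdot \sum_{j' \in J^{\text{act}}_s :\, j' \prec j} \bar{R}^s_{j'} \;\le\; \eta_s \cdot \sum_{j' \in J_s} \bar{R}^s_{j'}.$$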

Now we can bound by dropping the indicator on the first term to get

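$$\frac{1}{m} \cdot \left[ \Bigg( w_j \cdot \sum_{j' \in J_s :\, j' \preceq j} \bar{R}^s_{j'} \Bigg) + \bar{R}^s_j \cdot \Bigg( \eta_s \cdot \sum_{j' \in J^{\text{act}}_s :\, j' \prec j} \bar{R}^s_{j'} \Bigg) \right]$$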

the last inequality using Claim 2.3. Simplifying, . ∎

Here is a slightly different upper bound on .

###### Lemma 3.4.

For any time , we have .

###### Proof.

The second term in the definition of is at most , directly using the definition of . For the first term, assume is active at time , otherwise this term is 0. Now Corollary 2.6 shows that , so the first term can be bounded as follows:

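$$\frac{w_j}{m} \cdot \sum_{j' \in J_s :\, j' \preceq j} \bar{R}^s_{j'} \;=\; \frac{\bar{R}^s_j \cdot \eta_s}{m} \cdot \sum_{j' \in J_s :\, j' \preceq j} \bar{R}^s_{j'} \;\overset{\text{(Claim ???)}}{\le}\; \frac{\bar{R}^s_j}{m} \cdot \sum_{j' \in J_s :\, j' \preceq j} w_{j'} \;=\; \bar{R}^s_j \cdot \eta^s_j,$$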

which completes the proof. ∎

To prove (18), we write , and use Lemma 3.3 to cancel the first summation with the term . Hence, it remains to prove

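$$\sum_{s \ge t} \alpha_{j,s} + 2 \sum_{s \ge t} \left(\gamma^{\text{out}}_{s,j} - \gamma^{\text{in}}_{s,j}\right) \le \beta_t \cdot p_j. \tag{19}$$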

Let be the time at which the algorithm starts processing . We first argue why we can ignore times on the LHS of (19).

###### Claim 3.5.

Let be a time satisfying . Then

###### Proof.

While computing , we only need to consider paths for edges in which have as end-point. Since does not appear on the left side of , this quantity is equal to . The result now follows from Lemma 3.4. ∎

So using Claim 3.5 in (19), it suffices to show

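$$\sum_{s \ge \max\{t,\, t^\star_j\}} \alpha_{j,s} + 2 \sum_{s \ge \max\{t,\, t^\star_j\}} \left(\gamma^{\text{out}}_{s,j} - \gamma^{\text{in}}_{s,j}\right) \le \beta_t \cdot p_j. \tag{20}$$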

Note that we still have on the right-hand side, even though the summation on the left is over times . The proof of the following claim is deferred to the appendix.

###### Claim 3.6.

Let be a time satisfying . Then

Hence, the left-hand side of (20) is at most . However, since job is assigned a rate of and the machines run at speed , we get that this expression is at most , which is the right-hand side of (20). This proves the feasibility of the dual constraint (18).

###### Proof of Theorem 3.1.

In the preceding §3.3 we proved that the variables , and satisfy the dual constraint for the flow-time relaxation. Since is a feasible dual, it gives a lower bound on the cost of the optimal solution. Moreover, is another lower bound on the cost of the optimal schedule. Now using the bound on the weighted flow-time of our schedule given by Claim 3.2, this shows that we have an -approximation with -speedup. ∎

In §4 we show how to use a slightly different scheduling policy that prioritizes the last arriving jobs to reduce the speedup to .

## 4 An O(1/ε²)-competitive Algorithm with (1+ε)-speed

Theorem 3.1 requires speedup. In this section, we improve the speed scaling requirement to . We prove the following:

###### Theorem 4.1.

There exists an -approximation algorithm for non-clairvoyant DAG scheduling to minimize weighted flow time on parallel machines when there is a speedup of .

For ease of exposition, we assume a -speedup in the proof of Theorem 4.1.

### 4.1 The Algorithm

The algorithm remains unchanged: we shall assign rates to each job . These rates are derived from a suitable convex program. This convex program is again the same as (CP), except that the objective function now changes to

$$\sum_{j \in J_t} {\color{red}\hat{w}_{j,t}} \ln \bar{R}^t_j,$$

where we replace the weight of job by a new time-dependent quantity defined as follows.

###### Definition 4.2 (Weight $\hat{w}_{j,t}$).

Consider a time , and let