# Worst-case Throughput Analysis for Parametric Rate and Parametric Actor Execution Time Scenario-Aware Dataflow Graphs

## Abstract

Scenario-aware dataflow (SADF) is a prominent tool for modeling and analysis of dynamic embedded dataflow applications. In SADF the application is represented as a finite collection of synchronous dataflow (SDF) graphs, each of which represents one possible application behaviour or scenario. A finite state machine (FSM) specifies the possible orders of scenario occurrences. The SADF model renders the tightest possible performance guarantees, but is limited by its finiteness. This means that, from a practical point of view, it can only handle dynamic dataflow applications that are characterized by a reasonably sized set of possible behaviours or scenarios. In this paper we remove this limitation for a class of SADF graphs by parametrizing the SADF model in terms of graph port rates and actor execution times. First, we formally define the semantics of the model relevant for throughput analysis based on (max,+) linear system theory and (max,+) automata. Second, by generalizing some of the existing results, we give algorithms for the worst-case throughput analysis of parametric rate and parametric actor execution time acyclic SADF graphs with a fully connected, possibly infinite state transition system. Third, we demonstrate our approach on a few realistic applications from the digital signal processing (DSP) domain mapped onto an embedded multi-processor architecture.

## 1 Introduction

Synchronous dataflow (SDF) [20] was introduced as a restriction of Kahn process networks (KPN) [19] to allow compile-time scheduling. The term synchronous means static or regular. Synchronous dataflow graphs (SDFGs) are directed graphs where nodes are called actors and edges are called channels. The numbers of data samples produced or consumed are known at compile time. We refer to these data samples as tokens and to the token production and consumption numbers as rates. Although SDF is well suited to modeling regular streaming applications, its static nature severely limits its ability to capture the dynamic behaviour of modern streaming applications. Therefore, a considerable number of SDF extensions have been proposed over the years. Cyclo-static dataflow (CSDF) [7] allows token production and consumption to vary between actor firings as long as the variation follows a periodic pattern, while models such as parametrized synchronous dataflow (PSDF) [6], variable-rate dataflow (VRDF) [24], variable-rate phased dataflow (VPDF) [24] and schedulable parametric dataflow (SPDF) [11] introduce parametric rates. Scenario-aware dataflow (SADF) [23] encodes the dynamism of an application by identifying a finite number of different behaviours called modes or scenarios. Each mode is represented by a single synchronous dataflow graph. The scenarios can occur in known or unknown sequences, and a finite state machine (FSM) encodes their occurrence patterns. SADF is equipped with a technique that yields the tightest possible performance guarantees [13]. The power of this technique lies in its ability to consider transitions over all possible scenario sequences allowed by the FSM. Considering only the worst-case scenario, i.e. the scenario with the lowest throughput, without considering scenario transitions could be too optimistic.
On the other hand, merging all application SDFGs into one SDFG in which each actor takes its worst-case execution time over all SDFGs of the SADF would be too pessimistic, because subsequent iterations belonging to different scenarios may overlap in time, i.e. execute in a pipelined fashion. However, SADF is limited by its finiteness: it can only handle a reasonably sized set of application scenarios.

To illustrate this, let us define an abstract parallel application consisting of a nested for loop with parametric affine loop bounds:

```csharp
using System.Threading.Tasks;

// ProcessData is the application's own class (not shown); subtask A
// returns the data-dependent loop bounds g and h.
ProcessData.A(out int g, out int h);

for (int i = 0; i <= g; i++)
{
    for (int j = 0; j <= h; j++)
    {
        // Perform two tasks in parallel; Invoke returns when both have completed.
        Parallel.Invoke(
            () => ProcessData.B(i, j),  // first parallel action
            () => ProcessData.C(i, j)   // second parallel action
        );

        ProcessData.D(i, j);
    }
}
```


The example application consists of four subtasks, ProcessData.A, ProcessData.B, ProcessData.C and ProcessData.D, with known worst-case execution times. Task parallelism is specified using the Parallel.Invoke construct, to which an Action delegate is passed for each item of work. The application is mapped onto a multi-processor platform with a purely static task assignment. To add complexity, we assume that the application executes in a pipelined fashion, i.e. multiple instances of the application can be active at the same time. This assumption introduces resource dependencies between subsequent activations of the application: a subtask of one activation might have to wait for a subtask of a previous activation to complete and release the corresponding processing element. As specified by the example code, g and h can take different values during each application execution, i.e. they are data-dependent and result from the input data processing performed by subtask ProcessData.A. Let us assume that g and h are known to take values from bounded integer intervals. In that case, from a pure timing perspective, this application exhibits as many behaviours as there are integer points in the rational 2-polytope given by the constraints on g and h. To use SADF to derive the tightest worst-case performance bounds, even for such a simple application executing in a pipelined fashion on a multi-processor platform, we would have to generate one SDFG per integer point of this polytope [9]. The situation gets even worse for platforms that support dynamic voltage and frequency scaling (DVFS), a commonly used technique that adapts both the voltage and the frequency of the system to changing workloads [21]. In this case, the execution times of the application subtasks also vary with the current DVFS setting of the processing element they are mapped to.

In our work, we remove these limitations, which hamper the use of SADF in important application domains. For this purpose, we add parametrization to the basic SADF modeling approach, both in terms of parametric rates and parametric actor execution times given over a parameter space. This is a non-trivial extension, because the current core of the SADF framework relies strongly on rates and actor execution times being constant. We address SADF parametrization in the scope of existing parametric dataflow models. PSDF [6] and SPDF [11] are two semantically very similar models that provide a high level of generalization; we prefer SPDF for its syntactical convenience. By incorporating SPDF semantics into the definition of our parametric rate and parametric actor execution time SADF (PSADF), we show that the SPDF model can at run-time be treated as a special case of SADF. We then derive a technique for worst-case throughput analysis of PSADF and demonstrate our approach on a few realistic applications from the digital signal processing (DSP) domain.

## 2 Related Work

Throughput analysis of SDFGs has been studied by many authors; [16] gives a good overview of the existing methods. Due to the static nature of SDF, these methods cannot be applied to any form of parametric dataflow. [15] presents three methods for computing the throughput of an SDFG whose actor execution times can be parameters. However, the technique does not consider parametric rates and can only handle the static case, i.e. the graph cannot change parameter values during its execution. [13] introduces the (max,+) semantics of the SADF model relevant for worst-case performance analysis but is, as previously mentioned, practically limited to a reasonably sized set of scenarios. The work most closely related to ours is [10]. It combines the approaches presented in [13] and [15] and yields a technique that finds throughput expressions for an SADFG whose actors can have parameters as their execution times. However, the (max,+) semantics introduced in [10] covers only parametric actor execution times, not parametric rates. A straightforward extension of [10] to parametric rates is not possible, because it is not clear how to symbolically execute the graph in the presence of parametric rates. In the scope of rate-parametric dataflow models [6][11], little attention has been given to the aspect of time. Two examples of parametric models that explicitly deal with time are VRDF [24] and VPDF [24]. These address the problem of buffer capacity computation under a throughput constraint, but both impose the structural constraint that each production of tokens must be matched by exactly one consumption of tokens, which drastically limits the scope of applications they can consider.

So, current approaches to throughput analysis for dataflow MoCs either cannot consider parametric rates [16][15][13][10], or impose structural constraints strict enough to severely limit the expressivity of the model [24]. In our work we remove these limitations by embedding the SPDF model [11], which provides a high level of generalization, into the SADF model [23][13].

## 3 Preliminaries

### 3.1 Synchronous Dataflow Graphs

An SDFG is a directed graph whose nodes represent actors, which in turn represent functions or tasks, while its edges represent their dependencies. We also refer to edges as channels. An execution of an actor is called a firing and takes a fixed time duration. In SDF, the number of tokens consumed and produced by an actor is constant for each firing; we refer to these numbers as rates. Actors communicate using tokens sent over channels from one actor to another. Fig. a shows an example of an SDFG with 5 actors and 9 channels. Some channels contain initial tokens, depicted as solid dots; the example graph contains 5 initial tokens. The firing duration of each actor is denoted in the actor node, below the actor name. Each port is assigned a rate; when the value is omitted, the rate equals 1. As rates in SDF are constant for each firing, it is possible to construct a finite schedule (if it exists) that can be periodically repeated [20]. Such a schedule ensures liveness and boundedness [20]. We call such a minimal sequence of firings an iteration of the SDFG: a sequence of firings that has no net effect on the token distribution in the graph. The numbers of firings of each actor within an iteration constitute the repetition vector of the SDFG. We only consider dataflow graphs that are bounded and live. Throughput is expressed as the number of iterations per time-unit, i.e. the number of actor firings executed in one period, normalized by the repetition vector, divided by the duration of the period [16]. This is natural, because an iteration represents a coherent set of calculations, e.g. the decoding of a video frame. For more details we refer to [20][16].
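The repetition vector follows from the balance equations: on every channel, the producer's firing count times its production rate must equal the consumer's firing count times its consumption rate. The Python sketch below solves them for a connected graph given as a list of channels. The three-actor chain X, Y, Z and its rates are hypothetical, because the rates of the example graph are not reproduced here. The sketch only checks rate consistency; liveness additionally depends on the initial tokens, which it ignores.

```python
from fractions import Fraction
from math import gcd, lcm


def repetition_vector(actors, channels):
    """Smallest positive integer solution of the SDF balance equations.

    channels: list of (producer, consumer, production_rate, consumption_rate).
    Raises ValueError if the rates are inconsistent (no bounded periodic schedule).
    """
    rate = {actors[0]: Fraction(1)}  # relative firing rate, fixed for one actor
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in channels:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
            elif src in rate and dst in rate and rate[src] * prod != rate[dst] * cons:
                raise ValueError(f"inconsistent rates on channel {src}->{dst}")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest integer vector.
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {a: int(rate[a] * scale) for a in actors}
    g = gcd(*counts.values())
    return {a: n // g for a, n in counts.items()}


# Hypothetical chain: X produces 2 tokens that Y consumes 3 at a time; Y feeds Z at 1:2.
print(repetition_vector(["X", "Y", "Z"], [("X", "Y", 2, 3), ("Y", "Z", 1, 2)]))
# {'X': 3, 'Y': 2, 'Z': 1}
```

Fractions keep the propagation exact, so the final scaling by the least common denominator always yields the minimal integer vector. The multi-argument forms of `math.lcm` and `math.gcd` require Python 3.9 or later.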

### 3.2 (max,+) Algebra for SDFGs

Let $a \oplus b = \max(a, b)$ and $a \otimes b = a + b$ for $a, b \in \mathbb{R}_{\max} = \mathbb{R} \cup \{-\infty\}$. By max-algebra we understand the analogue of linear algebra developed for the pair of operations $(\oplus, \otimes)$, extended to matrices and vectors [5]. Let $\vec{\gamma}_0$ denote the vector of production times of the tokens that exist in the different channels in between iterations, i.e. it has an entry for each initial token in the graph. Then $\vec{\gamma}_k$ denotes the vector of production times of the initial tokens after $k$ iterations of the graph. These vectors can be found using (max,+) algebra [5]: the evolution of the graph is given by $\vec{\gamma}_{k+1} = G \vec{\gamma}_k$, where $G$ is the (max,+) characteristic matrix of the graph. Entry $G_{ij}$ specifies the minimal elapsed time from the production time of the $j$-th token in the previous iteration to the production time of the $i$-th token in the current iteration. When the $i$-th token does not depend on the $j$-th token, $G_{ij} = -\infty$. The algorithm for obtaining $G$ is specified in [14]. The (max,+) characteristic matrix of the example SDFG in Fig. a takes the form:

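$$G = \begin{bmatrix} 29 & -\infty & -\infty & 29 & -\infty \\ 33 & 4 & -\infty & 33 & -\infty \\ 63 & -\infty & 30 & 63 & -\infty \\ -\infty & -\infty & -\infty & -\infty & 0 \\ 64 & 5 & 31 & 64 & -\infty \end{bmatrix}.$$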

For example, with $\vec{\gamma}_0 = \vec{0}$, $\vec{\gamma}_1$ can be calculated as follows:

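$$\vec{\gamma}_1 = \begin{bmatrix} 29 & -\infty & -\infty & 29 & -\infty \\ 33 & 4 & -\infty & 33 & -\infty \\ 63 & -\infty & 30 & 63 & -\infty \\ -\infty & -\infty & -\infty & -\infty & 0 \\ 64 & 5 & 31 & 64 & -\infty \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} \max(29+0,\ 29+0) \\ \max(33+0,\ 4+0,\ 33+0) \\ \max(63+0,\ 30+0,\ 63+0) \\ \max(0+0) \\ \max(64+0,\ 5+0,\ 31+0,\ 64+0) \end{bmatrix} = \begin{bmatrix} 29 \\ 33 \\ 63 \\ 0 \\ 64 \end{bmatrix}.$$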
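The calculation above is a plain (max,+) matrix-vector product. The Python sketch below reproduces it, using `float("-inf")` for $-\infty$. Iterating the same product gives the production times of the initial tokens after further iterations, following $\vec{\gamma}_{k+1} = G\vec{\gamma}_k$.

```python
NEG_INF = float("-inf")  # the (max,+) zero element

# (max,+) characteristic matrix of the example SDFG (rows/columns = initial tokens)
G = [
    [29, NEG_INF, NEG_INF, 29, NEG_INF],
    [33, 4, NEG_INF, 33, NEG_INF],
    [63, NEG_INF, 30, 63, NEG_INF],
    [NEG_INF, NEG_INF, NEG_INF, NEG_INF, 0],
    [64, 5, 31, 64, NEG_INF],
]


def maxplus_matvec(M, x):
    """(max,+) product: y_i = max_j (M[i][j] + x[j])."""
    return [max(m + xj for m, xj in zip(row, x)) for row in M]


gamma = [0, 0, 0, 0, 0]  # gamma_0: all initial tokens available at time 0
for k in range(1, 3):
    gamma = maxplus_matvec(G, gamma)
    print(f"gamma_{k} =", gamma)
# gamma_1 = [29, 33, 63, 0, 64]
# gamma_2 = [58, 62, 93, 64, 94]
```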

Paper [13] explains how to obtain the throughput of an SDFG from the matrix $G$. Briefly, $G$ defines a corresponding (max,+) automaton graph (MPAG) [12]. The MPAG has as many nodes as there are initial tokens in the graph. An edge with weight $G_{ij}$ is created from node $j$ to node $i$ if $G_{ij} \neq -\infty$. The maximum cycle mean (MCM) $\lambda$ of the MPAG identifies the critical cycle of the SDFG. The critical cycle limits the throughput of the SDFG, which equals $1/\lambda$. The MPAG of the example SDFG is displayed in Fig. b. The cycle with weights 64 and 0 (denoted with bold arrows) determines the throughput, which equals $1/32$ iterations per time-unit.
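The MCM can be computed with Karp's algorithm, a standard technique that is not necessarily the one used in [13]. In the Python sketch below, entry $G_{ij}$ (row $i$, column $j$) gives an edge from node $j$ to node $i$; reversing all edges would not change any cycle mean. Walks may start at any node, which is equivalent to Karp's super-source construction, so the MPAG need not be strongly connected. For the example matrix it returns $\lambda = 32$, attained by the two-node cycle between the fourth and fifth initial tokens (weights 64 and 0).

```python
from fractions import Fraction

NEG_INF = float("-inf")

G = [
    [29, NEG_INF, NEG_INF, 29, NEG_INF],
    [33, 4, NEG_INF, 33, NEG_INF],
    [63, NEG_INF, 30, 63, NEG_INF],
    [NEG_INF, NEG_INF, NEG_INF, NEG_INF, 0],
    [64, 5, 31, 64, NEG_INF],
]


def maxplus_matvec(M, x):
    """(max,+) product: y_i = max_j (M[i][j] + x[j])."""
    return [max(m + xj for m, xj in zip(row, x)) for row in M]


def max_cycle_mean(M):
    """Karp's algorithm on the MPAG of M (edge j -> i with weight M[i][j]).

    walks[k][i] is the maximum weight of a walk with exactly k edges that ends in
    node i and may start anywhere (walks[0] is 0 for every node).
    Returns None if the graph has no cycle.
    """
    n = len(M)
    walks = [[0] * n]
    for _ in range(n):
        walks.append(maxplus_matvec(M, walks[-1]))
    best = None
    for i in range(n):
        if walks[n][i] == NEG_INF:
            continue  # no walk with n edges ends in node i; Karp's formula skips it
        candidate = min(
            Fraction(walks[n][i] - walks[k][i], n - k)
            for k in range(n)
            if walks[k][i] != NEG_INF
        )
        if best is None or candidate > best:
            best = candidate
    return best


lam = max_cycle_mean(G)
print(lam, 1 / lam)  # 32 1/32  (throughput in iterations per time-unit)
```

Fractions keep the cycle means exact, which avoids floating-point ties when comparing candidate cycles.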

### 3.3 Scenario-Aware Dataflow Graphs (SADFG)

SADF models the dynamism of an application in terms of modes or scenarios. Every scenario is modeled by an SDFG, while the occurrence patterns of scenarios are given by an FSM. We give the following definition of an SADFG.

###### Definition 1.

A Scenario-aware dataflow graph (SADFG) is a tuple
, where:

• is a set of ordered pairs of scenarios and their corresponding SDFGs;

• is the scenario finite state machine consisting of a finite set of states, an initial state , a transition relation , a scenario labelling and a set of final states , where .

Fig. a shows an example SADFG with two scenarios, a and b. In this example, both scenarios use the same scenario graph, but the actor execution times differ between them. The scenario FSM is fully connected and thus allows an arbitrary scenario order.

Every finite path of arbitrary length over the FSM corresponds to a sequence of scenarios. When the FSM performs a transition, the SDFG associated with the destination state is executed for exactly one iteration. Let $G(s) \in \mathbb{R}_{\max}^{N \times N}$ denote the (max,+) characteristic matrix of scenario $s$, where $N$ is the number of initial tokens in the SADFG. The completion time of a $k$-long sequence of scenarios $s_1 s_2 \ldots s_k$ can then be defined as a sequence of (max,+) matrix multiplications $\vec{\gamma}_k = G(s_k) \cdots G(s_2) G(s_1) \vec{\gamma}_0$, where $\vec{\gamma}_0$ specifies the initial enabling times of the graph's initial tokens and usually $\vec{\gamma}_0 = \vec{0}$. The worst-case growth of $\vec{\gamma}_k$ with increasing $k$ determines the worst-case throughput for any sequence of scenarios [12][13]. Reference [13] explains how to build the MPAG of an SADFG. Again, the inverse of the MCM ($\lambda$) of the obtained MPAG is the worst-case throughput of that particular SADFG. A special case that arises in practice, and that is of the utmost importance for our SADF parametrization, is when scenarios can occur in arbitrary order, so that the SADF FSM is fully connected and has a single state for each scenario. In that case, the worst-case throughput of an SADFG equals the inverse of the maximum cycle mean of the MPAG that corresponds to the (max,+) matrix $\bigoplus_{s} G(s)$ [13], where the operator $\bigoplus$ denotes taking the element-wise maximum of the individual scenario matrices. The corresponding scenario matrices for the example SADFG in Fig. a are:

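$$G(a) = \begin{bmatrix} 29 & -\infty & -\infty & 29 & -\infty \\ 33 & 4 & -\infty & 33 & -\infty \\ 63 & -\infty & 30 & 63 & -\infty \\ -\infty & -\infty & -\infty & -\infty & 0 \\ 64 & 5 & 31 & 64 & -\infty \end{bmatrix}, \qquad G(b) = \begin{bmatrix} 28 & -\infty & -\infty & 28 & -\infty \\ 34 & 6 & -\infty & 34 & -\infty \\ 72 & -\infty & 24 & 72 & -\infty \\ -\infty & -\infty & -\infty & -\infty & 0 \\ 82 & 16 & 34 & 82 & -\infty \end{bmatrix}.$$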

The critical cycle of the MPAG obtained from the maximized matrix $G(a) \oplus G(b)$ is denoted by bold arrows in Fig. b. Throughput in this case equals iterations per time-unit. This example also demonstrates that the worst-case throughput value cannot simply be obtained by only considering the 'worst-case' scenario, or by analysing the graph where each actor takes its worst-case execution time over all scenarios.
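Why the element-wise maximum suffices for a fully connected FSM can be checked numerically: because $\otimes$ distributes over $\oplus$ and the (max,+) product is monotone, $(G(a) \oplus G(b))^k \vec{\gamma}_0$ equals the element-wise maximum of $\vec{\gamma}_k$ over all $2^k$ scenario sequences of length $k$. The Python sketch below verifies this by brute force for the two example matrices. It only illustrates the argument and is not an efficient analysis method.

```python
from itertools import product

NEG_INF = float("-inf")

G_a = [
    [29, NEG_INF, NEG_INF, 29, NEG_INF],
    [33, 4, NEG_INF, 33, NEG_INF],
    [63, NEG_INF, 30, 63, NEG_INF],
    [NEG_INF, NEG_INF, NEG_INF, NEG_INF, 0],
    [64, 5, 31, 64, NEG_INF],
]
G_b = [
    [28, NEG_INF, NEG_INF, 28, NEG_INF],
    [34, 6, NEG_INF, 34, NEG_INF],
    [72, NEG_INF, 24, 72, NEG_INF],
    [NEG_INF, NEG_INF, NEG_INF, NEG_INF, 0],
    [82, 16, 34, 82, NEG_INF],
]


def maxplus_matvec(M, x):
    """(max,+) product: y_i = max_j (M[i][j] + x[j])."""
    return [max(m + xj for m, xj in zip(row, x)) for row in M]


def elementwise_max(A, B):
    """(max,+) matrix sum A ⊕ B."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def worst_case_over_sequences(matrices, gamma0, k):
    """Element-wise maximum of gamma_k over every scenario sequence of length k."""
    worst = [NEG_INF] * len(gamma0)
    for sequence in product(matrices, repeat=k):
        gamma = gamma0
        for M in sequence:  # one iteration per scenario, in sequence order
            gamma = maxplus_matvec(M, gamma)
        worst = [max(w, g) for w, g in zip(worst, gamma)]
    return worst


G_max = elementwise_max(G_a, G_b)
gamma0 = [0] * 5
k = 4
gamma = gamma0
for _ in range(k):
    gamma = maxplus_matvec(G_max, gamma)
print(gamma == worst_case_over_sequences([G_a, G_b], gamma0, k))  # True
```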

## 4 Parametric Rate and Actor Execution Time SADF Analysis

We start this section by formally defining the PSADF model and showing the (max,+) equivalence between SADF and PSADF. We use this result in defining the PSADF worst-case throughput calculation problem as a constrained optimization problem over the PSADF graph (PSADFG) parameter space, where the objective functions are elements of the symbolic PSADFG (max,+) characteristic matrix. We conclude by giving the theoretical foundation and the algorithm for symbolic PSADFG (max,+) characteristic matrix extraction.

### 4.1 Motivation and Model Definition

SADF becomes impractical or even infeasible for applications with a vast set of possible behaviours. We overcome this limitation by parametrization. Parametrizing a dataflow model in terms of rates is not an easy task, as it raises questions about properties such as liveness, boundedness and schedulability. Naively declaring any rate of interest parametric could cause the graph to deadlock, or render it unbounded or unschedulable. Therefore we start from SPDF [11], for which the liveness and boundedness properties are decidable. SPDF extends SDF by allowing rates to be parametric while preserving static schedulability. Rates are products of static natural numbers and/or parameters that can change dynamically. Each parameter is changed by a single actor, called its modifier, each time it fires, using a dedicated annotation. We re-define SPDF [11] by adding to it the notion of time of SDF/SADF.

###### Definition 2.

A schedulable parametric dataflow graph (SPDFG) is a tuple , where:

• is a directed connected graph with set of actors and set of edges (channels);

• is a set of rate parameters (symbolic variables) used to define SPDF rates by the grammar , where , ;

• is a set of actor execution time parameters (symbolic variables) used to define SPDF actor execution times by the grammar , where , ;

• returns for each edge (channel) its number of initial tokens;

• returns for each port (represented by an actor and one of its edges) its rate;

• returns for each actor its execution time;

• and returns for each rate parameter its modifier and its change period.

We consider only live SPDFGs as defined in [11]. We allow parameters (rates and actor execution times) to change in between iterations. Introducing parametric actor execution times into SPDF does not influence the liveness property. We define actor execution times as linear combinations of parameters. This allows us to encode dependencies; e.g. if two actors are mapped onto the same processor, the ratio of their execution times will always be constant within an iteration.

Fig. a shows an example of an SPDF graph where actors have parametric (p, q and s) or constant rates and parametric execution times (a, b, c, d and e). Parametric rates and are modified by the actor every time it fires, while the parametric rate is modified by the actor every time it fires.

Now we can define our parametric SADF model, by subjecting SPDF to the operational semantics of SADF.

###### Definition 3.

A parametric rate and parametric actor execution time SADFG (PSADFG) is a tuple , where:

• is a live SPDFG;

• is a bounded and closed set of all allowed parameter values (rates and actor execution times) for or shortly the parameter space;

• is the scenario state transition system consisting of a possibly infinite set of states, an initial state , a transition relation and a scenario labelling .

In contrast to SADF, which explicitly defines scenarios as a finite collection of SDF graphs, in PSADF scenarios are implicitly defined over the bounded and closed vector parameter space $\Omega$, whose elements are parameter vectors $\vec{p}$. Let $G(\vec{p}) \in \mathbb{R}_{\max}^{N \times N}$ be the PSADF (max,+) characteristic matrix for the parameter space point $\vec{p}$, where $N$ is the number of initial tokens in the PSADFG. The operational semantics of the model is as follows: every finite path of arbitrary length over the scenario transition system corresponds, through the scenario labelling, to a sequence of parameter space points $\vec{p}_1 \vec{p}_2 \ldots \vec{p}_k$. The evaluation of the PSADFG's SPDFG at a parameter space point $\vec{p}$ is simply an SDFG, and the characteristic (max,+) matrix of this SDFG equals $G(\vec{p})$ evaluated at that concrete $\vec{p}$. When the scenario state transition system performs a transition, the SDFG obtained by evaluating the PSADFG at the corresponding point is executed for exactly one iteration. The analogy to SADF is now obvious: PSADF is a compact representation of SADF. From the performance analysis perspective, by using the provision of an infinite (max,+) automaton [12], we can define the completion time of a $k$-long sequence of parameter point activations as a sequence of (max,+) matrix multiplications $\vec{\gamma}_k = G(\vec{p}_k) \cdots G(\vec{p}_1) \vec{\gamma}_0$, as is done in [13] for SADF. The worst-case growth of $\vec{\gamma}_k$ with increasing $k$ determines the worst-case throughput for any sequence of parameter points allowed by the scenario transition system.

As already mentioned, PSADF is a compact representation of SADF. We use it to model the behaviour of applications characterized by a vast number of scenarios, for which it is impossible to determine the scenario occurrence pattern, even if one exists. Therefore, in PSADF we consider the case of a fully connected scenario state transition system, i.e. every state can be followed by every state, in which every state corresponds to exactly one parameter space point, i.e. there is a bijective mapping between the states and $\Omega$. This way we can always give a conservative bound on the worst-case throughput, because the language recognized by an arbitrary PSADF is always included in the language recognized by the PSADF with a fully connected transition system and such a bijection.

###### Proposition 1.

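The worst-case throughput of a PSADFG whose scenario state transition system is fully connected and whose states map bijectively onto the parameter space $\Omega$ equals the inverse of the maximum cycle mean of the MPAG defined by the matrix $\bigoplus_{\vec{p} \in \Omega} G(\vec{p})$, whose entries are $\max_{\vec{p} \in \Omega} g_{ij}(\vec{p})$.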

###### Proof.

Given the operational semantics of PSADF previously described and the fact that is bounded and closed, it follows straightforwardly from [13][12]. ∎

### 4.2 Worst-Case Throughput Analysis

#### Problem Definition.

Given $G(\vec{p})$ as a matrix of continuous functions over the closed and bounded parameter space $\Omega$ that possesses an appropriate mathematical formulation, e.g. as equalities and inequalities over a finite-dimensional vector space, Proposition 1 turns our worst-case throughput calculation problem into a set of at most $N^2$ constrained optimization problems, with the entries $g_{ij}(\vec{p})$ as objective functions and $\Omega$ as the constraint set:

$$\underset{\vec{p}}{\text{maximize}}\ \ g_{ij}(\vec{p}) \quad \text{subject to} \quad \vec{p} \in \Omega.$$

A continuous function over a nonempty bounded and closed set attains a maximum (extreme value theorem). Of course, the term continuous also includes functions over discrete domains, which are continuous in the Heine sense. After maximizing all element functions of $G(\vec{p})$, the worst-case throughput equals the inverse of the MCM of the MPAG given by the maximized PSADFG (max,+) characteristic matrix. Our main challenge is thus to derive a technique for the analytical formulation of the symbolic PSADFG (max,+) characteristic matrix $G(\vec{p})$, a matrix of functions that, in the (max,+) sense, encodes the time distances between initial tokens in adjacent iterations of a PSADFG. We will show that it is a matrix of polynomial functions of $\vec{p}$. Polynomial functions are continuous, so the problem can be solved as a polynomial programming problem over $\Omega$. A variety of techniques exist for solving such problems, depending on the 'shape' of $\Omega$. Note that these optimization problems are solved independently, since we are interested in the worst-case growth of $\vec{\gamma}_k$ for growing $k$ (i.e. over a growing number of iterations).

#### (max,+) Algebra for PSADF.

In PSADF we only allow parameters to change between graph iterations, i.e. in the context of SPDF, a parametric rate keeps its value throughout an iteration. The same holds for parametric actor execution times. Currently, our extraction technique requires the considered PSADFG to be 'acyclic within an iteration': if we take a PSADFG and convert it to a directed acyclic graph (PSADFG-DAG) by removing the edges with initial tokens, we require that only the PSADFG-DAG sink actors produce tokens on the removed edges, and only the PSADFG-DAG source actors consume from those edges. Self-edges are excluded from this restriction; that is, within an iteration we only allow cyclic dependencies tied to a single actor. However, we can still consider PSADFGs that are serial compositions of subgraphs that are 'acyclic within an iteration', provided each subgraph performs only one iteration during an iteration of the composite PSADFG. Our extraction process depends on the PSADFG quasi-static schedule, which can be obtained using the procedure from [11]. Basically, the PSADFG-DAG is sorted topologically, which results in a string of actors; for the PSADFG in Fig. a this string equals $ABCDE$. We then replace every actor $X$ with $X^{n_X}$, where $n_X$ is the PSADFG repetition vector entry for actor $X$. For the PSADFG in Fig. a, the final quasi-static schedule takes the form $A B^{p} C^{pq} D^{s} E$.
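The construction of the quasi-static schedule can be sketched in Python as follows, assuming Kahn's topological sort with alphabetical tie-breaking (any topological order would be valid). The DAG edges and symbolic repetition counts of the Fig. a example are our own reading of equations (2) to (6) and of the firing indices used in (8) to (12); the edges that carry initial tokens (the self-loops of A, C and D and the edge from E to A) are already removed. The function name is hypothetical.

```python
import heapq


def quasi_static_schedule(dag_edges, repetitions):
    """Topologically sort the PSADFG-DAG and attach symbolic repetition counts.

    dag_edges:   (producer, consumer) pairs left after removing edges with initial tokens
    repetitions: actor -> symbolic repetition vector entry, as a string
    """
    indegree = {actor: 0 for actor in repetitions}
    successors = {actor: [] for actor in repetitions}
    for src, dst in dag_edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = [actor for actor, deg in indegree.items() if deg == 0]
    heapq.heapify(ready)  # alphabetical tie-breaking keeps the result deterministic
    order = []
    while ready:
        actor = heapq.heappop(ready)
        order.append(actor)
        for nxt in successors[actor]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(order) != len(repetitions):
        raise ValueError("the graph is not acyclic after removing edges with initial tokens")
    return [a if repetitions[a] == "1" else f"{a}^{repetitions[a]}" for a in order]


edges = [("A", "B"), ("B", "C"), ("C", "E"), ("A", "D"), ("D", "E")]
reps = {"A": "1", "B": "p", "C": "pq", "D": "s", "E": "1"}
print(" ".join(quasi_static_schedule(edges, reps)))  # A B^p C^pq D^s E
```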

We continue by giving an appropriate (max,+) model of the PSADF actor, as displayed in Fig. b. First, let us briefly explain the (max,+) semantics of a dataflow actor firing. If $I$ is the set of tokens an actor needs to perform its firing and, for every $i \in I$, $t_i$ is the time at which token $i$ becomes available, then the firing starts at $\bigoplus_{i \in I} t_i = \max_{i \in I} t_i$. If $e$ is the execution time of the actor, then the tokens produced by the firing become available at $\big(\bigoplus_{i \in I} t_i\big) \otimes e$. Now, let $\gamma(A_i, k)$ be the completion time of the $k$-th firing of actor $A_i$. This annotation is present in Fig. a for each of the actors. For an actor to fire, all its input dependencies must be satisfied. We can now derive the expression for $\gamma(A_i, k)$:

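$$\gamma(A_i,k) = \Bigg(\bigoplus_{A_h \mid (A_h,A_i) \in E} \gamma\bigg(A_h, \bigg\lceil \frac{r\big(A_i,(A_h,A_i)\big)\,k - i(A_h,A_i)}{r\big(A_h,(A_h,A_i)\big)} \bigg\rceil\bigg)\Bigg) \otimes e(A_i). \tag{1}$$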

The completion time of the $k$-th firing of actor $A_i$ is the maximum of the completion times of the appropriately indexed firings of the actors that feed its input edges, increased by its own execution time $e(A_i)$. The ceiled quotient in (1) indexes the appropriate firing of each actor that feeds an input edge, and the term $i(A_h,A_i)$ in its numerator accounts for initial tokens. Initial tokens have the semantics of an initial delay and form the initial conditions used to solve (max,+) difference equations, analogous to the initial conditions of classical linear difference (recurrence) equations. We comply with the liveness criteria from [11], which, among others, require all SPDFG cycles to be live, i.e. every cycle contains an edge with enough initial tokens to fire the actors the number of times needed to complete an iteration, either a global or a local one. Liveness and the 'acyclic within an iteration' restriction make (1) solvable, so we can always obtain a solution of (1) in terms of the initial conditions. The analytical solution of a system of such (max,+) linear difference equations, evaluated at the iteration boundary for every actor of the graph, gives us exactly the symbolic PSADFG characteristic (max,+) matrix. We follow the order of actors in the quasi-static schedule, which guarantees that data/resource dependencies are respected. A schedule element $X^{n}$, i.e. actor $X$ with repetition count $n$, tells us that we have to solve (1) for actor $X$ at $k = n$. The obtained solution is propagated to the next step of the algorithm, and we continue until we reach the end of the quasi-static schedule. At this point we have solutions for all actors, expressing the dependence of their completion times at the iteration boundary on the initial conditions. From these solutions we can then easily construct the symbolic PSADFG characteristic (max,+) matrix.
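The firing index inside (1) can be computed with exact integer arithmetic. The Python sketch below, whose function and argument names are hypothetical, evaluates $\lceil (r_{\text{cons}} k - i)/r_{\text{prod}} \rceil$ and reproduces the indices appearing in equations (2), (3) and (6) of the example. A non-positive result refers to an initial condition, such as $\gamma(E,-1)$ in (7).

```python
def producer_firing_index(k, cons_rate, prod_rate, init_tokens):
    """Index of the producer firing needed by the k-th consumer firing, as in Eq. (1).

    cons_rate:   rate of the consumer port, r(A_i, (A_h, A_i))
    prod_rate:   rate of the producer port, r(A_h, (A_h, A_i)), assumed positive
    init_tokens: initial tokens on the channel, i(A_h, A_i)
    Computes ceil((cons_rate * k - init_tokens) / prod_rate) using integers only.
    """
    return -((init_tokens - cons_rate * k) // prod_rate)


# Self-loop of A with one initial token, Eq. (2): index k - 1
print(producer_firing_index(3, 1, 1, 1))  # 2
# Edge E -> A with two initial tokens, Eq. (2): k = 1 refers to gamma(E, -1) = t4
print(producer_firing_index(1, 1, 1, 2))  # -1
# Edge A -> B with production rate p = 3, Eq. (3): B firings 1..3 use A firing 1
print([producer_firing_index(k, 1, 3, 0) for k in range(1, 7)])  # [1, 1, 1, 2, 2, 2]
# Edge C -> E with consumption rate p*q = 6, Eq. (6): index p*q*k
print(producer_firing_index(1, 6, 1, 0))  # 6
```

Using floor division on the negated numerator avoids the rounding problems that `math.ceil` on a float quotient could introduce for large symbolic-rate instantiations.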

Let us consider the PSADFG example in Fig. a. We write down the (max,+) equations for each actor (we omit the sign $\otimes$, i.e. $a \otimes b$ will be denoted as $ab$):

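$$\gamma(A,k) = \big(\gamma(A,k-1) \oplus \gamma(E,k-2)\big)a = a\gamma(A,k-1) \oplus a\gamma(E,k-2), \tag{2}$$
$$\gamma(B,k) = b\gamma\big(A,\lceil k/p \rceil\big), \tag{3}$$
$$\gamma(C,k) = \big(\gamma(B,\lceil k/q \rceil) \oplus \gamma(C,k-1)\big)c = c\gamma\big(B,\lceil k/q \rceil\big) \oplus c\gamma(C,k-1), \tag{4}$$
$$\gamma(D,k) = \big(\gamma(A,\lceil k/s \rceil) \oplus \gamma(D,k-1)\big)d = d\gamma\big(A,\lceil k/s \rceil\big) \oplus d\gamma(D,k-1), \tag{5}$$
$$\gamma(E,k) = \big(\gamma(C,pqk) \oplus \gamma(D,sk)\big)e = e\gamma(C,pqk) \oplus e\gamma(D,sk). \tag{6}$$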

The initial conditions are:

$$\gamma(A,0) = t_1, \quad \gamma(D,0) = t_2, \quad \gamma(C,0) = t_3, \quad \gamma(E,-1) = t_4, \quad \gamma(E,0) = t_5. \tag{7}$$

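We can now evaluate and solve them at an iteration boundary given by the sequential schedule $A B^{p} C^{pq} D^{s} E$. Firing actor $A$ using (2) with $k = 1$, we obtain:

$$\gamma(A,1) = a\gamma(A,0) \oplus a\gamma(E,-1) = at_1 \oplus at_4. \tag{8}$$

Firing $B$ using (3) with $k = p$ and using (8), we obtain:

$$\gamma(B,p) = abt_1 \oplus abt_4. \tag{9}$$

Firing $C$ using (4) with $k = pq$ and (9), we obtain (by backward substitution):

$$\gamma(C,pq) = abct_1 \oplus abct_4 \oplus c\gamma(C,pq-1) = abc^{pq}t_1 \oplus c^{pq}t_3 \oplus abc^{pq}t_4. \tag{10}$$

Firing $D$ using (5) with $k = s$ similarly evaluates to:

$$\gamma(D,s) = ad^{s}t_1 \oplus d^{s}t_2 \oplus ad^{s}t_4. \tag{11}$$

Firing $E$ using (6) with $k = 1$ and (10), (11), we obtain:

$$\gamma(E,1) = aet_1\big(bc^{pq} \oplus d^{s}\big) \oplus d^{s}et_2 \oplus c^{pq}et_3 \oplus aet_4\big(bc^{pq} \oplus d^{s}\big). \tag{12}$$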

In (12), the initial conditions $t_1$ and $t_4$ are (max,+) multiplied by the symbolic (max,+) sum $bc^{pq} \oplus d^{s}$. We refer to this situation as a conflict. Since (max,+) exponentiation is conventional multiplication, $c^{pq}$ stands for $pqc$, so the production time of the tokens generated by actor $E$ depends on the relationship between $b + pqc$ and $sd$. Before proceeding, we therefore have to consider two cases: $b + pqc \geq sd$ and $b + pqc \leq sd$. We must check the intersection of the newly added constraints with the already existing ones to establish feasibility. If one of the subregions contains no feasible points, we drop further evaluation within that subregion. In this example, let us assume that both subregions contain feasible points. We then easily construct the symbolic matrices from the solutions, which are all expressed in terms of their dependence on the initial conditions at an iteration boundary. We write down once more the solutions of the equations at the iteration boundary for the actors that reproduce the initial tokens, namely $A$, $C$, $D$ and $E$, changing the notation from $\gamma(X,k)$ to $t'_i$ according to the index of the initial condition (token) and the producing actor. For $b + pqc \geq sd$ we obtain:

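$$t'_1 = at_1 \oplus at_4, \tag{13}$$
$$t'_2 = ad^{s}t_1 \oplus d^{s}t_2 \oplus ad^{s}t_4, \tag{14}$$
$$t'_3 = abc^{pq}t_1 \oplus c^{pq}t_3 \oplus abc^{pq}t_4, \tag{15}$$
$$t'_4 = t_5, \tag{16}$$
$$t'_5 = abc^{pq}et_1 \oplus d^{s}et_2 \oplus c^{pq}et_3 \oplus abc^{pq}et_4. \tag{17}$$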
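The closed forms can be double-checked numerically. The Python sketch below evaluates the recurrences (2) to (6) directly for concrete parameter values and initial conditions and compares the tokens reproduced at the iteration boundary with (13) to (16) and with (12), which, unlike (17), also covers the region $b + pqc \leq sd$. The parameter grid and initial conditions are hypothetical and small enough for brute force; execution times are assumed non-negative, as the backward substitution in (10) and (11) requires.

```python
from functools import lru_cache
from itertools import product


def simulate_boundary(p, q, s, a, b, c, d, e, t):
    """Evaluate recurrences (2)-(6) and return [t'_1, ..., t'_5] for one iteration.

    t = (t1, ..., t5) are the initial conditions of Eq. (7).
    """
    t1, t2, t3, t4, t5 = t
    initial = {("A", 0): t1, ("D", 0): t2, ("C", 0): t3, ("E", -1): t4, ("E", 0): t5}

    def ceil_div(x, y):
        return -(-x // y)

    @lru_cache(maxsize=None)
    def gamma(actor, k):
        if (actor, k) in initial:
            return initial[(actor, k)]
        if actor == "A":
            return a + max(gamma("A", k - 1), gamma("E", k - 2))            # (2)
        if actor == "B":
            return b + gamma("A", ceil_div(k, p))                           # (3)
        if actor == "C":
            return c + max(gamma("B", ceil_div(k, q)), gamma("C", k - 1))   # (4)
        if actor == "D":
            return d + max(gamma("A", ceil_div(k, s)), gamma("D", k - 1))   # (5)
        if actor == "E":
            return e + max(gamma("C", p * q * k), gamma("D", s * k))        # (6)
        raise ValueError(actor)

    # Tokens reproduced at the iteration boundary by A, D, C and E.
    return [gamma("A", 1), gamma("D", s), gamma("C", p * q), gamma("E", 0), gamma("E", 1)]


def closed_form_boundary(p, q, s, a, b, c, d, e, t):
    """Closed forms (13)-(16) and, for t'_5, Eq. (12), which holds in both regions."""
    t1, t2, t3, t4, t5 = t
    pqc, sd = p * q * c, s * d
    conflict = max(b + pqc, sd)  # the (max,+) sum b c^{pq} ⊕ d^s of (12)
    return [
        max(a + t1, a + t4),                                        # (13)
        max(a + sd + t1, sd + t2, a + sd + t4),                     # (14)
        max(a + b + pqc + t1, pqc + t3, a + b + pqc + t4),          # (15)
        t5,                                                         # (16)
        max(a + e + conflict + t1, sd + e + t2, pqc + e + t3, a + e + conflict + t4),  # (12)
    ]


t = (0, 3, 1, 2, 5)  # hypothetical initial conditions t1..t5
for params in product((1, 2), (1, 3), (1, 4), (2,), (1, 9), (3,), (1, 8), (2,)):
    assert simulate_boundary(*params, t) == closed_form_boundary(*params, t)
print("recurrences (2)-(6) agree with the closed forms on the grid")
```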

From (13)-(17) we then easily obtain the rows of the symbolic (max,+) matrix:

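$$G(b+pqc \geq sd) = \begin{bmatrix} a & -\infty & -\infty & a & -\infty \\ a+sd & sd & -\infty & a+sd & -\infty \\ a+b+pqc & -\infty & pqc & a+b+pqc & -\infty \\ -\infty & -\infty & -\infty & -\infty & 0 \\ a+b+pqc+e & sd+e & pqc+e & a+b+pqc+e & -\infty \end{bmatrix}.$$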
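As a consistency check, the symbolic matrix can be written as a Python function of the parameters. At the hypothetical parameter points $p = q = s = 1$ with $(a, b, c, d, e) = (29, 4, 30, 4, 1)$ and $(28, 20, 24, 6, 10)$, both inside the region $b + pqc \geq sd$, it reproduces exactly the matrix $G$ of Section 3.2 (identical to $G(a)$) and $G(b)$ of Section 3.3. This is consistent with those examples sharing the graph structure of Fig. a, although the parameter points are our own choice and are not stated in the text.

```python
NEG_INF = float("-inf")


def G_ge(p, q, s, a, b, c, d, e):
    """Symbolic (max,+) matrix G(b + pqc >= sd), rows built from (13)-(17)."""
    if b + p * q * c < s * d:
        raise ValueError("point lies outside the region b + pqc >= sd")
    sd, pqc = s * d, p * q * c
    return [
        [a, NEG_INF, NEG_INF, a, NEG_INF],
        [a + sd, sd, NEG_INF, a + sd, NEG_INF],
        [a + b + pqc, NEG_INF, pqc, a + b + pqc, NEG_INF],
        [NEG_INF, NEG_INF, NEG_INF, NEG_INF, 0],
        [a + b + pqc + e, sd + e, pqc + e, a + b + pqc + e, NEG_INF],
    ]


G_a = [  # G from Section 3.2, identical to G(a) of Section 3.3
    [29, NEG_INF, NEG_INF, 29, NEG_INF],
    [33, 4, NEG_INF, 33, NEG_INF],
    [63, NEG_INF, 30, 63, NEG_INF],
    [NEG_INF, NEG_INF, NEG_INF, NEG_INF, 0],
    [64, 5, 31, 64, NEG_INF],
]
G_b = [  # G(b) of Section 3.3
    [28, NEG_INF, NEG_INF, 28, NEG_INF],
    [34, 6, NEG_INF, 34, NEG_INF],
    [72, NEG_INF, 24, 72, NEG_INF],
    [NEG_INF, NEG_INF, NEG_INF, NEG_INF, 0],
    [82, 16, 34, 82, NEG_INF],
]

print(G_ge(1, 1, 1, 29, 4, 30, 4, 1) == G_a)   # True
print(G_ge(1, 1, 1, 28, 20, 24, 6, 10) == G_b)  # True
```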

The same procedure is used for the case $b + pqc \leq sd$. The evolution of the PSADF graph over the parameter space $\Omega$ is then governed either by $\vec{\gamma}_{k+1} = G(b+pqc \geq sd)\,\vec{\gamma}_k$ or by $\vec{\gamma}_{k+1} = G(b+pqc \leq sd)\,\vec{\gamma}_k$, depending on the region of $\Omega$ in which the iteration is scheduled. If $b + pqc = sd$, either of the two can be chosen. In the definition of both regions we use the $\geq$ and $\leq$ operators so that the regions remain closed. The functions that constitute the symbolic (max,+) matrices are polynomial functions of $\vec{p}$.

To obtain the worst-case throughput, we have to solve mixed-integer polynomial programming problems for $G(b+pqc \geq sd)$ and $G(b+pqc \leq sd)$ over the subregions of $\Omega$ in which $b + pqc \geq sd$ and $b + pqc \leq sd$ hold, respectively. A collection of techniques that solve such problems for a variety of definitions of $\Omega$, e.g. convex, non-convex or restricted to a few discrete values, can be found in [22]. The element-wise maximum of the two maximized matrices defines the MPAG of the example PSADFG, and the inverse of the MCM of this MPAG equals the worst-case throughput.
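The Python sketch below illustrates this step on a small hypothetical discrete parameter space; the actual $\Omega$ of the example is not reproduced here. It splits $\Omega$ into the two closed regions, maximizes every entry of the corresponding symbolic matrix by enumeration (a stand-in for the polynomial programming techniques of [22]), combines the results with an element-wise maximum and computes the MCM with Karp's algorithm. The matrix for the region $b + pqc \leq sd$ is our own derivation from (12), and the nominal execution times and DVFS factor are assumptions.

```python
from fractions import Fraction
from itertools import product

NEG_INF = float("-inf")


def _rows(p, q, s, a, b, c, d, e, conflict):
    """Rows (13)-(17); `conflict` is the resolved value of the (max,+) sum in (12)."""
    sd, pqc = s * d, p * q * c
    return [
        [a, NEG_INF, NEG_INF, a, NEG_INF],
        [a + sd, sd, NEG_INF, a + sd, NEG_INF],
        [a + b + pqc, NEG_INF, pqc, a + b + pqc, NEG_INF],
        [NEG_INF, NEG_INF, NEG_INF, NEG_INF, 0],
        [a + conflict + e, sd + e, pqc + e, a + conflict + e, NEG_INF],
    ]


def G_ge(p, q, s, a, b, c, d, e):
    """G(b + pqc >= sd): the conflict term resolves to b + pqc."""
    return _rows(p, q, s, a, b, c, d, e, conflict=b + p * q * c)


def G_le(p, q, s, a, b, c, d, e):
    """G(b + pqc <= sd): the conflict term resolves to sd."""
    return _rows(p, q, s, a, b, c, d, e, conflict=s * d)


def entrywise_max(matrices):
    """(max,+) sum of a non-empty list of equally sized matrices."""
    return [[max(column) for column in zip(*rows)] for rows in zip(*matrices)]


def maxplus_matvec(M, x):
    return [max(m + xj for m, xj in zip(row, x)) for row in M]


def max_cycle_mean(M):
    """Karp's algorithm (edge j -> i with weight M[i][j]); None if acyclic."""
    n = len(M)
    walks = [[0] * n]
    for _ in range(n):
        walks.append(maxplus_matvec(M, walks[-1]))
    means = [
        min(Fraction(walks[n][i] - walks[k][i], n - k) for k in range(n) if walks[k][i] != NEG_INF)
        for i in range(n)
        if walks[n][i] != NEG_INF
    ]
    return max(means) if means else None


# Hypothetical parameter space: rates p, q, s and one DVFS factor f that scales
# all (hypothetical) nominal execution times.
NOMINAL = {"a": 29, "b": 4, "c": 30, "d": 4, "e": 1}
omega = [
    {"p": p, "q": q, "s": s, **{name: f * v for name, v in NOMINAL.items()}}
    for p, q, s, f in product((1, 2), (1, 2), (1, 8, 16), (1, 2))
]

# Split Omega into the two closed regions of the conflict b + pqc versus sd.
ge = [pt for pt in omega if pt["b"] + pt["p"] * pt["q"] * pt["c"] >= pt["s"] * pt["d"]]
le = [pt for pt in omega if pt["b"] + pt["p"] * pt["q"] * pt["c"] <= pt["s"] * pt["d"]]

# Maximize every entry over its region, then combine both regions.
maximized = []
if ge:
    maximized.append(entrywise_max([G_ge(**pt) for pt in ge]))
if le:
    maximized.append(entrywise_max([G_le(**pt) for pt in le]))
G_worst = entrywise_max(maximized)

lam = max_cycle_mean(G_worst)
print("MCM:", lam, "worst-case throughput:", 1 / lam)
```

Because every entry is maximized independently, the combined matrix can mix maxima attained at different parameter points, which is exactly what makes the resulting bound conservative.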

At this point we present our recursive algorithm for symbolic PSADF (max,+) characteristic matrix extraction (Algorithm 1).

The inputs to the algorithm are the pre-computed sequential quasi-static schedule, the set of PSADF (max,+) difference equations, the initial parameter space $\Omega$ and the initial solution set. The solution set is a set of ordered pairs, each consisting of a symbolic (max,+) matrix and the region in which that matrix governs the evolution of the PSADF; each such region is generated by adding conflict-resolving constraints to $\Omega$ during the execution of the algorithm. The algorithm traverses the sequential schedule, taking one actor with its repetition count at a time (Line 3). Function Solve (Line 5) solves Equation (1) for the considered actor. If there are no conflicts in the solution, the algorithm updates the equation set with the current solution so that it can be used in later steps (Line 18). If there are conflicts, i.e. there are symbolic (max,+) sums multiplying the initial conditions, we have to split the parameter space (Line 8). For example, if a (max,+) sum of two symbolic terms multiplies an initial condition, we have to consider three cases: the first term is greater than, equal to, or less than the second. Function FeasibilityCheck (Line 9) checks whether the intersection of the current constraint set and the new constraints is empty. If the intersection is non-empty, the new constraints are added to the current set for this branch of exploration (Line 11), the conflicts are resolved (Line 13) and SymbolicExtract is called recursively (Line 14). If the intersection is empty, this branch is dropped. Continuing in this fashion, we eventually reach a non-branching node (Line 22).
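A heavily simplified Python analogue of this branching is sketched below under explicit assumptions: the conflicts are given up front as pairs of functions of a parameter point instead of being discovered by Solve, the parameter space is a finite list of points, and FeasibilityCheck is replaced by enumeration. The names `explore` and `CASES` and the parameter grid are hypothetical and only mirror the roles of the steps described above.

```python
import operator
from itertools import product

# Three mutually exclusive cases for a (max,+) sum x ⊕ y multiplying an initial condition.
CASES = (operator.gt, operator.eq, operator.lt)


def explore(conflicts, constraints, omega, leaves):
    """Recursively split the parameter space on each conflict; collect feasible leaf regions."""
    region = [pt for pt in omega if all(holds(pt) for holds in constraints)]
    if not region:          # feasibility check failed: drop this branch
        return
    if not conflicts:       # non-branching node: record the region
        leaves.append(region)
        return
    (x, y), rest = conflicts[0], conflicts[1:]
    for case in CASES:
        new = lambda pt, case=case, x=x, y=y: case(x(pt), y(pt))
        explore(rest, constraints + [new], region, leaves)


# Hypothetical grid; the conflict is b c^{pq} ⊕ d^s from Eq. (12), i.e. b + pqc versus sd.
omega = [
    {"p": p, "q": q, "s": s, "b": 4, "c": 30, "d": d}
    for p, q, s, d in product((1, 2), (1, 2), (1, 2, 4), (4, 16))
]
conflict = (lambda pt: pt["b"] + pt["p"] * pt["q"] * pt["c"], lambda pt: pt["s"] * pt["d"])
leaves = []
explore([conflict], [], omega, leaves)
print([len(region) for region in leaves])  # [21, 2, 1]
```

The default arguments in the lambda bind the current case and terms at definition time, so each branch keeps its own constraint rather than the last value of the loop variable.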

We demonstrate our approach on the example PSADF graph in Fig. a. The example models a dynamic streaming application consisting of loops with interdependent parametric affine loop bounds. We define the ranges for the parametric loop bounds (PSADF rates) as: and . We also define linear dependencies between them: and . Our application runs on a multi-processor platform where each loop body (actor) is mapped onto a different processor. Let the PSADF actor execution times be their nominal execution times multiplied by the parameter , which accounts for six different possible platform dynamic voltage and frequency scaling (DVFS) settings. We obtain: . These constraints define for our example. To obtain the worst-case throughput, we must maximize the matrices and over , as given by the previously listed constraints. This yields two mixed-integer polynomial programming problems over and , which can be solved using the techniques from [22]. The throughput is given by the inverse of the MCM of the MPAG defined by the matrix and equals iterations per time unit.
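A toy version of this worst-case computation can be sketched by maximizing every entry of a symbolic matrix over the parameter space and then taking the inverse MCM. With a fully connected state transition system any scenario sequence is allowed. By distributivity of (max,+), the maximum over all products of k scenario matrices equals the k-th power of their element-wise maximum, which motivates the entry-wise maximization.

The following are assumptions, not the example from the figure:

- The 2x2 symbolic matrix and the parameter ranges are hypothetical.
- Exhaustive enumeration over a tiny integer domain stands in for the mixed-integer polynomial programming of [22].
- The MCM uses the trace formula lambda = max over k <= n of max_i (A^k)[i][i] / k.

```python
from fractions import Fraction
from itertools import product
import math

NEG_INF = -math.inf  # (max,+) epsilon


def mp_mul(A, B):
    """(max,+) product: (A x B)[i][j] = max_k (A[i][k] + B[k][j])."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mcm(A):
    """MCM as max over k <= n of max_i (A^k)[i][i] / k (integer entries)."""
    n, P, best = len(A), A, NEG_INF
    for k in range(1, n + 1):
        for i in range(n):
            if P[i][i] != NEG_INF:
                best = max(best, Fraction(P[i][i], k))
        P = mp_mul(P, A)
    return best


def worst_case_matrix(symbolic, points):
    """Element-wise maximum of a symbolic matrix over all parameter points."""
    n = len(symbolic)
    return [[max(symbolic[i][j](x) for x in points) for j in range(n)]
            for i in range(n)]


# Hypothetical symbolic matrix over rates p, q with 1 <= q <= p <= 3.
points = [{"p": p, "q": q}
          for p, q in product(range(1, 4), repeat=2) if q <= p]
symbolic = [[lambda x: 2 * x["p"], lambda x: x["p"] * x["q"]],
            [lambda x: 3 * x["q"], lambda x: NEG_INF]]
A_wc = worst_case_matrix(symbolic, points)  # [[6, 9], [9, -inf]]
lam = mcm(A_wc)
print(A_wc, lam, 1 / lam)  # [[6, 9], [9, -inf]] 9 1/9
```

Note that different entries may reach their maximum at different parameter points. This is acceptable only because every scenario order is assumed possible.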

## 5 Experimental results

We demonstrate our throughput analysis technique on five representative DSP applications with parametric interdependent affine loop bounds, listed in Table 1. The first column shows the number of PSADFG actors, the second the number of initial tokens, the third the number of parametric rates, the fourth the number of parametric actor execution times, and the last the number of scenarios, i.e. the number of points in the PSADFG parameter space . All applications except the bounded block parallel lattice reduction algorithm for MIMO-OFDM [4] are mapped onto a two-processor scalar architecture; the latter is mapped onto a vector/SIMD architecture. To obtain the nominal actor execution times for our benchmark set, we used the AVR32 [2] simulator at a reference frequency of 32 MHz. For the bounded block parallel lattice reduction algorithm [4] we used random numbers as nominal actor execution times, since the source code of the algorithm is not publicly available. We assume that the frequency of each platform processor can be set anywhere in the range from 32 MHz to 64 MHz, in steps of 1 MHz. For a 2-processor platform this gives 32 possible combinations. In contrast to the conventional SADF approach from [13], which would have to generate SDFGs, our approach solves in each of these cases at most () polynomial programming problems, without the need to enumerate , which is itself a difficult task. In practice this number is usually smaller than (), because not all initial tokens depend on all other initial tokens in the graph, which renders the matrices quite sparse. Moreover, some entries of the symbolic PSADF (max,+) characteristic matrix are repeated, so the corresponding problem only has to be solved once. 
The symbolic PSADF (max,+) characteristic matrices of the benchmark applications were extracted manually using Algorithm 1, while the corresponding optimization problems were solved using CVX, a package for specifying and solving convex programs [18][17].
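The counting argument above, with many scenarios but few optimization problems, can be made concrete with a short sketch. The following are assumptions:

- The parameter domains and the 3x3 symbolic matrix are hypothetical.
- Symbolic entries are canonical expression strings, so only syntactically identical entries count as repeated.
- `None` marks an epsilon (-inf) entry, meaning no dependency between the two initial tokens.

The function names are illustrative only.

```python
from itertools import product


def count_scenarios(domains, constraint):
    """Number of points in the parameter space that satisfy `constraint`."""
    names = list(domains)
    return sum(1 for values in product(*(domains[name] for name in names))
               if constraint(dict(zip(names, values))))


def count_problems(symbolic):
    """Optimization problems to solve: distinct entries, skipping epsilon.

    Entries are canonical expression strings (hypothetical representation);
    None marks an epsilon (-inf) entry, i.e. no token dependency.
    """
    return len({e for row in symbolic for e in row if e is not None})


domains = {"p": range(1, 101), "q": range(1, 101)}
symbolic = [["2*p", "p*q", None],
            [None, "2*p", "q"],
            ["p*q", None, None]]
print(count_scenarios(domains, lambda x: x["q"] <= x["p"]))  # 5050
print(count_problems(symbolic), "of at most", len(symbolic) ** 2)  # 3 of at most 9
```

Enumerating scenarios grows with the size of the parameter space. The number of optimization problems is bounded by the number of matrix entries and is reduced further by sparsity and repetition.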

## 6 Conclusion

In this paper we have presented an extension of SADF, which we refer to as PSADF, that allows modelling applications with vast or infinite sets of behaviours. We have proven the semantic equivalence of the two models and used that result to formulate the worst-case throughput calculation problem for PSADF graphs with a fully connected state transition system within a generic optimization framework. The objective functions are functionals that represent the elements of the symbolic PSADF (max,+) characteristic matrices. Furthermore, we have derived an algorithm based on (max,+) linear system theory that generates these matrices by combining a (max,+) difference equation solver with a recursive parameter space exploration, for a subclass of PSADF graphs that are ‘acyclic within an iteration’. As future work, we want to fully automate our technique and investigate the problem of parametric throughput analysis of PSADF graphs.

### References

2. Atmel AVR. Available at http://www.atmel.com/images/doc32000.pdf.
3. ICST Signal Processing Library Ver. 1.2. Available at http://www.icst.net/research/projects/dsp-library/.
4. U. Ahmad, Min Li, S. Pollin, R. Fasthuber, L. Van der Perre & F. Catthoor (2010): Bounded Block Parallel Lattice Reduction algorithm for MIMO-OFDM and its application in LTE MIMO receiver. In: Signal Processing Systems (SIPS), 2010 IEEE Workshop on, pp. 168–173, doi:http://dx.doi.org/10.1109/SIPS.2010.5624784.
5. François Baccelli, Guy Cohen, Geert Jan Olsder & Jean-Pierre Quadrat (1992): Synchronization and linearity: an algebra for discrete event systems. John Wiley & Sons, Inc.
6. B. Bhattacharya & S.S. Bhattacharyya (2000): Parameterized dataflow modeling of DSP systems. In: Acoustics, Speech, and Signal Processing, 2000. ICASSP ’00. Proceedings. 2000 IEEE International Conference on, 6, pp. 3362–3365 vol.6, doi:http://dx.doi.org/10.1109/ICASSP.2000.860121.
7. G. Bilsen, M. Engels, R. Lauwereins & J. Peperstraete (1996): Cycle-static dataflow. Signal Processing, IEEE Transactions on 44(2), pp. 397–408, doi:http://dx.doi.org/10.1109/78.485935.
8. Rulph Chassaing (1999): Digital Signal Processing: Laboratory Experiments Using C and the TMS320C31 DSK, 1st edition. John Wiley & Sons, Inc., New York, NY, USA.
9. Philippe Clauss & Vincent Loechner (1998): Parametric Analysis of Polyhedral Iteration Spaces. Journal of VLSI signal processing systems for signal, image and video technology 19(2), pp. 179–194, doi:http://dx.doi.org/10.1023/A:1008069920230.
10. M. Damavandpeyma, S. Stuijk, M. Geilen, T. Basten & H. Corporaal (2012): Parametric throughput analysis of scenario-aware dataflow graphs. In: Computer Design (ICCD), 2012 IEEE 30th International Conference on, pp. 219–226, doi:http://dx.doi.org/10.1109/ICCD.2012.6378644.
11. P. Fradet, A. Girault & P. Poplavko (2012): SPDF: A schedulable parametric data-flow MoC. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, pp. 769–774, doi:http://dx.doi.org/10.1109/DATE.2012.6176572.
12. S. Gaubert (1995): Performance evaluation of (max,+) automata. Automatic Control, IEEE Transactions on 40(12), pp. 2014–2025, doi:http://dx.doi.org/10.1109/9.478227.
13. M. Geilen & S. Stuijk (2010): Worst-case performance analysis of Synchronous Dataflow scenarios. In: Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2010 IEEE/ACM/IFIP International Conference on, pp. 125–134.
14. Marc Geilen (2011): Synchronous Dataflow Scenarios. ACM Trans. Embed. Comput. Syst. 10(2), pp. 16:1–16:31, doi:http://dx.doi.org/10.1145/1880050.1880052.
15. A.-H. Ghamarian, M. C W Geilen, T. Basten & S. Stuijk (2008): Parametric Throughput Analysis of Synchronous Data Flow Graphs. In: Design, Automation and Test in Europe, 2008. DATE ’08, pp. 116–121, doi:http://dx.doi.org/10.1109/DATE.2008.4484672.
16. A.-H. Ghamarian, M. C W Geilen, S. Stuijk, T. Basten, A. J M Moonen, M.J.G. Bekooij, B.D. Theelen & M.R. Mousavi (2006): Throughput Analysis of Synchronous Data Flow Graphs. In: Application of Concurrency to System Design, 2006. ACSD 2006. Sixth International Conference on, pp. 25–36, doi:http://dx.doi.org/10.1109/ACSD.2006.33.
17. Michael Grant & Stephen Boyd (2008): Graph Implementations for Nonsmooth Convex Programs. In: Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences 371, Springer London, pp. 95–110, doi:http://dx.doi.org/10.1007/978-1-84800-155-8_7.
18. Michael Grant & Stephen Boyd (2013): CVX: Matlab Software for Disciplined Convex Programming, version 2.0 beta. Available at http://cvxr.com/cvx.
19. Gilles Kahn (1974): The Semantics of a Simple Language for Parallel Programming. In: IFIP Congress, pp. 471–475.
20. E.A. Lee & D.G. Messerschmitt (1987): Synchronous data flow. Proceedings of the IEEE 75(9), pp. 1235–1245, doi:http://dx.doi.org/10.1109/PROC.1987.13876.
21. P. Macken, M. Degrauwe, M. Van Paemel & H. Oguey (1990): A voltage reduction technique for digital systems. In: Solid-State Circuits Conference, 1990. Digest of Technical Papers. 37th ISSCC., 1990 IEEE International, pp. 238–239, doi:http://dx.doi.org/10.1109/ISSCC.1990.110213.
22. Hanif D. Sherali & W.P. Adams (1998): A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems.
23. B.D. Theelen, M. C W Geilen, T. Basten, J. P M Voeten, S. V. Gheorghita & S. Stuijk (2006): A scenario-aware data flow model for combined long-run average and worst-case performance analysis. In: Formal Methods and Models for Co-Design, 2006. MEMOCODE ’06. Proceedings. Fourth ACM and IEEE International Conference on, pp. 185–194, doi:http://dx.doi.org/10.1109/MEMCOD.2006.1695924.
24. Maarten Hendrik Wiggers (2009): Aperiodic Multiprocessor Scheduling for Real-Time Stream Processing Applications. Ph.D. dissertation, doi:http://dx.doi.org/10.3990/1.9789036528504.