Best-effort networks: modeling and performance analysis via large networks asymptotics
In this paper we introduce a class of Markov models, termed best-effort networks, designed to capture performance indices such as mean transfer times in data networks with best-effort service. We introduce the so-called min bandwidth sharing policy as a conservative approximation to the classical max-min policy. We establish necessary and sufficient ergodicity conditions for best-effort networks under the min policy. We then resort to the mean field technique of statistical physics to analyze network performance deriving fixed point equations for the stationary distribution of large symmetrical best-effort networks. A specific instance of such networks is the star-shaped network which constitutes a plausible model of a network with an overprovisioned backbone. Numerical and analytical study of the equations allows us to state a number of qualitative conclusions on the impact of traffic parameters (link loads) and topology parameters (route lengths) on mean document transfer time.
Keywords: Best-effort service, max-min fairness, min policy, mean field, star-shaped network.
Consider a network handling data flows from several users, and assume no quality of service commitments (such as minimum bandwidth allocations) have been made by the network to the users. Such a situation has been prevalent in the Internet until now, and is likely to remain so for another few years.
The preferred service model in that situation, known as best effort service, consists in allocating a fair proportion of bandwidth to contending users; see, e.g., Bertsekas and Gallager . There are actually several possible notions of fairness available for this bandwidth allocation problem (see, e.g., Mo and Walrand  for a parametric family of fairness criteria covering all other notions proposed so far), although the classical notion proposed in  is the so-called max-min fairness.
Recent work has led to a relatively good understanding of how bandwidth is shared between network users when a given congestion control algorithm is used; see, e.g., Massoulié and Roberts  and references therein. The question of what type of fairness is achieved in the current Internet, where Jacobson’s congestion avoidance algorithm—as implemented in TCP—is responsible for congestion control, has been studied in depth by Hurley et al. . These studies all assume the number of flows remains fixed.
In comparison, there is little work accounting for the random nature of traffic and its impact on user perceived quality of service. Consider for instance the transfer of digital documents (Web pages, files, emails,…) using a transport protocol like TCP. This constitutes the bulk of Internet traffic today. The performance criterion relevant to such transfers is the overall document transfer time. This time is clearly highly dependent on the number of ongoing transfers on the links shared by the considered connection. This number varies as a random process as new connections are established and existing ones terminate in a way which depends on how bandwidth is allocated, as well as on the underlying traffic parameters.
In the case of a single bottleneck resource shared perfectly fairly, simple traffic assumptions of Poisson arrivals and independent and identically distributed document sizes lead to a processor sharing queueing model . This fluid flow model provides useful results on expected response times as a function of the load of an access link or a Web server, for instance. It also shows how a form of congestion collapse can occur when demand (arrival rate × mean document size) exceeds capacity. The processor sharing queue is then no longer ergodic, leading to unbounded response times. To understand the impact of multiple bottlenecks and to investigate the effect of different sharing strategies, one would like to have similar analytical results for multiple resource systems.
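The single-bottleneck behaviour described above can be illustrated with the standard mean response time formula for a processor sharing queue, namely (mean size / capacity) / (1 - load). The sketch below is only an illustration; the function name and the numerical values are assumptions chosen for the example, not taken from the paper.

```python
import math


def ps_mean_transfer_time(arrival_rate, mean_size, capacity):
    """Mean transfer time on one link modelled as a processor sharing queue.

    Uses the classical result E[T] = (mean_size / capacity) / (1 - load),
    valid when load = arrival_rate * mean_size / capacity is below 1.
    """
    load = arrival_rate * mean_size / capacity
    if load >= 1.0:
        # Demand exceeds capacity: the queue is not ergodic and
        # response times are unbounded (congestion collapse).
        return math.inf
    return (mean_size / capacity) / (1.0 - load)


if __name__ == "__main__":
    # Hypothetical link of capacity 10 carrying documents of mean size 1.
    for rate in (2.0, 5.0, 8.0, 9.5, 10.0):
        print(rate, ps_mean_transfer_time(rate, mean_size=1.0, capacity=10.0))
```

The transfer time grows without bound as the load approaches 1, which is the collapse phenomenon mentioned in the text.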
To the best of our knowledge, the only analytical results available so far are in Massoulié and Roberts , where the so-called linear network topology is investigated. Simulation results for the linear network can be found both in  and in de Veciana et al. . The main motivation for the present paper is to study the performance of best-effort networks with alternative topologies, the ultimate objective being the derivation of heuristics enabling the performance evaluation of bandwidth sharing in a general network.
In the present paper we report the results of our preliminary investigations. These include an analysis of the stability conditions under which the expected response time remains finite in a general network. We also apply mean field techniques to evaluate the performance of large symmetrical networks. Numerical results derived from the model illustrate how response times depend on the number of bottleneck links and their utilization. These results are of some practical interest and aid our understanding of the behavior of best effort networks. A further significant contribution is the insight provided into the inherent difficulty of deriving performance estimates when more than one bottleneck limits throughput.
Section 2 introduces a general class of Markov models for best-effort networks which is intended to capture the impact of network topology, traffic parameters and bandwidth sharing (fairness) criteria on document transfer times. A brief account of the results obtained in  is given, and the so-called “min” bandwidth allocation is introduced as a conservative approximation to max-min fairness. Section 3 then establishes the necessary and sufficient ergodicity criteria for best-effort networks under the “min” policy. Section 4 introduces the so-called “star topology”. Its relevance as a model of real networks is discussed and a mean field heuristic is proposed. This heuristic is expected to be accurate in the asymptotic regime where the number of star branches is large. The derived fixed point equations are investigated numerically in Section 5. Simulation is used to verify the accuracy of the heuristics. Extensions to the star-shaped network are also considered in Sections 4 and 5, which notably allow an evaluation of the impact of the number of bottlenecks on the mean transfer time.
2 Best-effort networks
Consider the following network model: a set of links is given, where each link has an associated capacity or bandwidth . A set of routes is given, each route being identified with a subset of links. Fig. 1 illustrates the so-called linear network: it consists of links with equal capacity, route which crosses each link, and routes which cross a single link.
To each route are associated two parameters: is the arrival rate of new transfer requests along route , and is the average document size. We make the following standard simplifying assumptions: requests for document transfers along route arrive at the instants of a Poisson process with intensity , while the corresponding document sizes are mutually independent, independent of the arrival times, and drawn from an exponential distribution of mean .
These traffic assumptions make the process specifying the number of transfers in progress on different routes Markovian (see below) and thus greatly simplify analysis. The Poisson arrivals assumption is not unreasonable in a large network. In view of the insensitivity of performance results for an isolated link to the exact document size distribution, we do not expect divergence of the real distribution from the exponential size assumption to invalidate the derived conclusions. However, the main reason for assuming an exponential distribution is clearly one of analytical tractability.
The network state is summarized by the variables , where denotes the number of transfers in progress along route . It remains to specify at what speed documents are transmitted in any given state in order to turn into a Markov process with well defined dynamics. Indeed, given the rate at which documents along route are transferred when the network state is , is a Markov process with non-zero transition rates given by
A natural assumption would be to consider that each document receives its fair share of bandwidth. For instance, if as in  fairness is understood as max-min fairness, each transfer along route receives a bandwidth share , where
and for every route , there is at least one link such that
These two conditions uniquely determine the bandwidth shares . Having specified the Markov process , one can then attempt to study its steady state properties, identifying the conditions on the load parameters
under which it is ergodic and, when it is, determining the stationary distribution. Mean transfer times along each route can then be computed using Little’s law: .
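For a given network state, max-min fair shares can be computed with the standard progressive filling (water-filling) procedure. This algorithm is not described in the present paper. The sketch below is one possible implementation, and the data layout (dictionaries keyed by hypothetical link and route names) is an assumption made for the example.

```python
import math


def max_min_shares(capacity, routes, x):
    """Per-transfer max-min fair bandwidth for every route with x[r] > 0.

    capacity: dict link -> capacity
    routes:   dict route -> iterable of links crossed by the route
    x:        dict route -> number of transfers in progress
    Assumes every route crosses at least one link.
    """
    active = {r for r, n in x.items() if n > 0}
    remaining = dict(capacity)
    share = {}
    while active:
        # Equal share each link could still give to its unfrozen transfers.
        offer = {}
        for link, cap in remaining.items():
            n = sum(x[r] for r in active if link in routes[r])
            if n > 0:
                offer[link] = cap / n
        level = min(offer.values())
        # Links attaining the minimum are the current bottlenecks.
        saturated = {l for l, v in offer.items() if v <= level * (1 + 1e-12)}
        frozen = {r for r in active if any(l in saturated for l in routes[r])}
        for r in frozen:
            share[r] = level
            for link in routes[r]:
                remaining[link] -= x[r] * level
        active -= frozen
    return share


if __name__ == "__main__":
    # Linear network with two unit-capacity links: route r0 crosses both,
    # routes r1 and r2 cross one link each.
    cap = {"l1": 1.0, "l2": 1.0}
    rts = {"r0": ("l1", "l2"), "r1": ("l1",), "r2": ("l2",)}
    print(max_min_shares(cap, rts, {"r0": 1, "r1": 1, "r2": 3}))
```

In this example the second link is the first bottleneck, so routes r0 and r2 are frozen at 0.25 and the single transfer on r1 then picks up the remaining 0.75 of the first link.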
It turns out that explicit formulas for steady state distributions are typically beyond reach. A notable exception is the linear network, with bandwidth shares being allocated to realize proportional rather than max-min fairness; see . In order to obtain formulas in other cases, one therefore has to resort to asymptotics on various parameters. For instance, for the linear network with max-min fair rate sharing, the regime where the arrival rate along route goes to zero (essentially, a form of light traffic analysis) is considered in ; this leads to approximate formulas for . It can be shown, in particular, that increases as the logarithm of the number of links when increases. This is in contrast to the case of proportionally fair sharing where it increases linearly in .
The main purpose of this paper is to investigate an alternative asymptotic regime where it is the network topology which evolves. The precise description of this limiting regime will be given in Section 4.
In the following sections, we consider bandwidth allocations according to the following “min” policy: given the network state , each transfer along route receives a bandwidth share given by
where we have introduced the notation to represent the total number of transfers making use of link .
It is easy to check that this policy satisfies the capacity constraints (1). Moreover, it is sub-optimal with respect to the max-min fairness policy, as shown in the next theorem.
Theorem 1. Under the same initial conditions, the state vector of the system under the max-min allocation policy is stochastically smaller than the one corresponding to the min allocation.
Proof Assume that, for some , . Then, with the notation of (2),
Thus, using a coupling argument, one can define the processes and in such a way that for all .
The previous theorem motivates the study of the min policy, as it implies for instance that mean transfer times under the min policy provide upper bounds on the corresponding transfer times under the max-min policy.
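The min policy is simple to evaluate numerically. The sketch below assumes the following reading of its definition: each transfer on a route receives the smallest, over the links of the route, of the link capacity divided by the total number of transfers using that link. The function and variable names are hypothetical.

```python
def min_policy_shares(capacity, routes, x):
    """Per-transfer bandwidth under the (assumed) min policy.

    capacity: dict link -> capacity
    routes:   dict route -> iterable of links crossed by the route
    x:        dict route -> number of transfers in progress
    """
    # Total number of transfers making use of each link.
    link_count = {link: 0 for link in capacity}
    for r, n in x.items():
        for link in routes[r]:
            link_count[link] += n
    return {
        r: min(capacity[l] / link_count[l] for l in routes[r])
        for r, n in x.items()
        if n > 0
    }


def link_usage(capacity, routes, x, share):
    """Bandwidth actually consumed on each link for the given shares."""
    used = {link: 0.0 for link in capacity}
    for r, s in share.items():
        for link in routes[r]:
            used[link] += x[r] * s
    return used


if __name__ == "__main__":
    cap = {"l1": 1.0, "l2": 1.0}
    rts = {"r0": ("l1", "l2"), "r1": ("l1",), "r2": ("l2",)}
    state = {"r0": 1, "r1": 1, "r2": 3}
    shares = min_policy_shares(cap, rts, state)
    print(shares)
    print(link_usage(cap, rts, state, shares))
```

In this example the capacity constraints hold, but the first link is left partly idle (usage 0.75): the transfer on r1 receives 0.5, where max-min fairness would give it 0.75. This illustrates why the min policy is conservative.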
3 Ergodicity conditions
In the following we demonstrate that min and max-min bandwidth sharing policies have a stationary regime under the usual conditions, i.e. when the load on each link is less than 1:
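Theorem 2. Under the min and max-min allocation policies, the network is
(i) ergodic if the load on every link is less than 1;
(ii) transient if the load on at least one link exceeds 1.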
This result has already been proven for the max-min policy in ; we note that, by Theorem 1, ergodicity under the min policy implies ergodicity under the max-min policy, thus the treatment of the min policy given below provides an alternative proof to that of . However we feel that, since the proof below is simpler and uses only elementary Lyapunov functions results, it should be easier to adapt to a more complicated situation. Transience under condition (ii) is in fact valid for any allocation policy which meets the capacity constraints (1).
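The stability condition is easy to check for a given topology and traffic matrix. The sketch below assumes that the load of a link is the sum, over the routes crossing it, of arrival rate times mean document size, divided by the link capacity. The names and the sample values are hypothetical.

```python
def link_loads(capacity, routes, arrival_rate, mean_size):
    """Load of each link: offered traffic of the crossing routes over capacity."""
    load = {link: 0.0 for link in capacity}
    for r, links in routes.items():
        for link in links:
            load[link] += arrival_rate[r] * mean_size[r] / capacity[link]
    return load


def classify(loads):
    """Apply the ergodicity criterion to a dict of link loads."""
    worst = max(loads.values())
    if worst < 1.0:
        return "ergodic"
    if worst > 1.0:
        return "transient"
    return "boundary case"  # a maximal load of exactly 1 is not covered


if __name__ == "__main__":
    cap = {"l1": 1.0, "l2": 1.0}
    rts = {"r0": ("l1", "l2"), "r1": ("l1",), "r2": ("l2",)}
    size = {"r0": 1.0, "r1": 1.0, "r2": 1.0}
    for rates in ({"r0": 0.3, "r1": 0.4, "r2": 0.5},
                  {"r0": 0.3, "r1": 0.4, "r2": 0.8}):
        loads = link_loads(cap, rts, rates, size)
        print(loads, classify(loads))
```

Only the most loaded link matters: raising the arrival rate on r2 pushes the second link above 1 and makes the whole network unstable, even though the first link is unaffected.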
Proof Consider the discrete time chain describing the sequence of states visited by the continuous time jump process . Transitions from a given state satisfy
Ergodicity of the continuous time process will follow from that of and from the fact that the mean sojourn times in each state are bounded from above uniformly in (or equivalently, that the jump rates out of each state are bounded away from zero), a property which is easily verified.
Sufficient condition. Assume , and define the Lyapunov function
where will be chosen later. The structure of the function, which may seem unnatural, has been chosen for the sake of computation; it is in fact of the general form , for appropriate constants and .
In order to express the transition rates in terms of and , remark that
Then, using the notation
we have, ,
Let , where is such that
for some real number satisfying . The following inequality sums up what we have so far:
Let be a real number such that . The following quantities will be evaluated separately
Since is a sum of negative terms, the following bound holds for any such that ,
Bounding is straightforward:
Now, if and are chosen to satisfy the inequality
we have, ,
Since is a compact set, Foster’s theorem applies (see e.g. ) and the Markov chain is ergodic.
Necessary condition. Assume now that there exists such that . Defining
we immediately have,
Since the jumps are bounded, the chain is transient.
4 Mean field analysis of large networks
It does not appear possible to obtain closed form expressions for the stationary distribution of the best-effort network state under the min policy. We therefore turn to the study of these stationary distributions under a limiting regime on network size and topology. A similar approach has previously been successfully applied to loss networks (see [8, 9], and references therein), and to queueing networks in [10, 11]. It is inspired by the so-called mean field models of statistical physics.
Mean field analysis in the present context is best illustrated by the star-shaped network of Fig. 2. This network has branches, each consisting of one inbound and one outbound link (thus implicitly is an even number), and all links have unit capacity. Each route connects the endpoints of two branches via the center node. It has an associated arrival rate and mean message length . The factor is introduced to make the total load on each link, , independent of the number of links . As discussed below, when goes to infinity the number of ongoing transfers on any link becomes independent of the number of ongoing transfers on any finite collection of the other links (this was termed the “chaos propagation” property in ). This allows us to derive fixed point equations for the probability distribution of the number of transfers in progress on any link.
4.1 Symmetrical star-shaped networks
Although amenability to a mean field analysis is a significant motivation for considering the star-shaped topology, it should be noted that it is also relevant to the study of real networks. Any overprovisioned links in a real network are largely transparent to the throughput of elastic best effort flows. Only bottleneck resources, typically located in the access network and within Web servers, need to be included in the network model. The star-shaped network may thus be considered to represent any network with a well provisioned backbone where throughput is limited by bottlenecks at the source and destination edges. For example, inbound links might represent Web server CPU, while outbound links correspond to the last hop of an ISP’s interconnection network. This discussion not only motivates the consideration of such a topology, but also suggests that letting go to infinity might indeed be realistic if represents the number of Web servers over the Internet. Of course, there would be no reason in practice to assume symmetry. This assumption is introduced solely for reasons of tractability.
Although our focus is on the star-shaped topology, the mean field approach can be applied to other symmetrical topologies. It thus allows one to consider routes with more than two hops. The corresponding extended model is described in detail in Section 4.2 below, where the corresponding fixed point equations are derived. Section 4.3 then presents analytical results for the star-shaped network. Results of numerical investigation of the fixed point equations are reported in Section 5.
4.2 Fixed point equations for large symmetrical networks
We use the following notation in the sequel.
: total number of links;
: length of a route through the network;
: number of routes going through a given link;
: number of active connections on link , in stationary state;
: number of active connections on route , going through links ;
: arrival rate on a link;
: mean message length;
: load of a link.
We have implicitly assumed here that the number of routes going through a link, , is the same for all links. We shall in fact assume further that the network topology is the same, as seen from any route. We do not attempt to give a formal definition of this symmetry assumption here. The reader is referred to  for a thorough discussion on the minimal symmetry assumptions required. Symmetry implies notably that each route has the same number of hops and the same traffic parameters. The star-shaped network discussed above constitutes an example of such a symmetrical network when (with ).
It is more difficult to come up with meaningful examples of symmetrical networks supporting routes with : in particular, the network should not be fully connected, since routes longer than one hop would then be pointless. One reasonable model is a hypercube (Fig. 3) of large dimension, in which each edge contains two one-way links.
The hypercube is a classical structure with many symmetries. It is characterized by its dimension . Its vertices are represented by -tuples of s and s (e.g., ) and its edges connect two vertices differing in only one coordinate.
The total number of links in such a network is . The number of routes going through any link is
where the only routes considered are the shortest paths between two vertices which differ in exactly coordinates. Note that the results below do not depend on the precise topology of the network.
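The hypercube counts can be checked by brute-force enumeration for small dimensions. The sketch below assumes that every shortest path between two vertices at the given distance counts as a distinct route, so that a route is a source vertex together with an ordered choice of coordinates to flip. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product


def flip(vertex, i):
    """Neighbour of `vertex` obtained by flipping coordinate i."""
    return vertex[:i] + (1 - vertex[i],) + vertex[i + 1:]


def hypercube_links(d):
    """All one-way links (v, w) of the d-dimensional hypercube."""
    return [(v, flip(v, i)) for v in product((0, 1), repeat=d) for i in range(d)]


def routes_per_link(d, length):
    """Number of shortest-path routes of the given length crossing each link."""
    count = Counter()
    for source in product((0, 1), repeat=d):
        # Flipping `length` distinct coordinates in a given order yields one
        # shortest path between two vertices differing in exactly `length`
        # coordinates.
        for coords in permutations(range(d), length):
            v = source
            for i in coords:
                w = flip(v, i)
                count[(v, w)] += 1
                v = w
    return count


if __name__ == "__main__":
    d, length = 4, 2
    links = hypercube_links(d)
    per_link = routes_per_link(d, length)
    print(len(links), set(per_link.values()))
```

For dimension 3 this gives 24 one-way links, each crossed by the same number of two-hop routes, as the symmetry assumption requires.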
We now derive the fixed point equations. It should be stressed that this derivation is heuristic. We clearly mention which steps need further justification in the course of the derivation. We do believe that the equations are very good approximations, however, especially in view of the numerical and simulation results presented in the following section.
Assume now that and that the system is in stationary state . For any , the proportion of links in state is
By symmetry, it holds that
The chaos propagation assumption (which we have not proven to hold; it seems, however, that the techniques developed in  could be applied to prove it, provided the parameter goes to infinity with ) implies that obeys a law of large numbers:
It appears that the dynamics of the system are driven by , traditionally referred to as the mean field. The following notation will also be useful:
In order to derive the equation satisfied by the limit stationary distribution , we must first describe the possible transitions for . The two cases of interest are
arrival on a link with connections:
departure from a link with connections:
The transition corresponding to a new connection arrival has rate . The main problem is to compute the departure rate from a link , given that it has ongoing transfers. This can be written as
Since the total number of routes is much larger than the size of the network, we assume that the probability of having more than one connection on a route is negligible (this fact is easy to prove in finite time, but requires more work for the stationary regime), and that the links on route are independent, conditioned on (this is the point where the heuristic is not completely exact; it is however likely to be true when tends either to or ). The first property allows us to rewrite (4) as
Let be a given link and let be one route using link , i.e., . The distribution of , conditioned on there being one connection on , is then
By symmetry, it is possible to sum both sides of the above fraction over all the routes going through link ,
Departure rate (5) then becomes, in view of the assumed independence property between the given ,
Taking the limit , the invariant measure equations follow. We have:
for , where
The two sets of equations (8) and (9) together constitute the fixed point equations we require. As noted in the introduction of this section, these equations do not depend on the topology of the network. The expression for can be simplified.
Let be random variables with distribution
and let . Then (8) reads
Straightforward calculations yield
The simplified form for is thus, from the basic properties of the minimum of independent random variables:
Note that, in the case (i.e., for the star-shaped network), the original equation (8) is perhaps simpler than the equivalent expression (11). It yields the following form for the fixed point equations:
Remark 1. When considering the star-shaped network as a model for Web transfers over the Internet, as suggested in Section 4.1, inbound links could be seen as the CPU of Web servers and outbound links as the last hop between the ISP’s backbone and the end customers. It thus makes sense to relax the symmetry assumption we had made between inbound and outbound links, as the two types of bottlenecks are of a different nature. We might thus consider a star-shaped network with inbound links, outbound links, inbound (resp. outbound) links having capacity (resp. ); see Fig. 4. Assume the mean message length is the same for each two-hop route, and the link capacities are fixed. The arrival rate on each route has the form , and the load is less than . The capacity of a “backbone” link is chosen to ensure is fixed when the size of the system grows. Then, as and increase, with small, the inbound links have many active connections and a large capacity, while the outbound links remain in “normal” utilization. The same approach as above can then be applied, to yield the set of fixed point equations
and (resp. ) represents the proportion of inbound (resp. outbound) links with ongoing transfers.
4.3 Analytical results for
While equation (9) looks superficially like a “birth and death process” equation, it is in fact non-linear due to the fact that and both depend on the .
From (10), one clearly sees that is increasing in , and tends to when . Therefore, is increasing as long as , and decreasing after that. This means that the form a modal distribution, whose maximal value is attained at
We now present analytical results on the solution of the fixed point equations for . The proof of these results can be found in . It relies heavily on functional analysis.
where and are non-negative constants.
Thus, under the min policy, any link in the star-shaped network has a mean queue length which is one order of magnitude larger than for a single server queue with the same load (). Its tail distribution is still geometrical with factor .
It is possible to give an expression for the constant : if and are solutions of the following system of differential equations,
then can be written as follows:
Since this system is numerically highly unstable, it has proven difficult (with the “Livermore stiff ODE” solver from MAPLE) to derive a better estimate for .
5 Numerical analysis and simulations
While the analytical results of Section 4.3 give some good estimates, they are only valid in the heavy traffic regime . In addition, similar results for are not available. We thus resort to numerical resolution of the equations to gain a better understanding of the performance of transfers across large symmetrical networks. The very form of the equations suggests the use of a fixed-point method for this numerical resolution: starting from a priori values , the algorithm computes the corresponding from (8), and then new values from (9). Provided special care is taken to avoid instabilities, the iteration of this process converges rapidly (less than steps). Sample results are shown in Fig. 5, corresponding to a large symmetrical network with routes of length .
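The fixed-point iteration described above can be sketched as follows. The displayed equations are not reproduced here, so the code relies on an assumed reading of (8) and (9): with unit capacities, a link holding k transfers completes work at rate u_k = k E[1 / max(k, Y_1, ..., Y_{L-1})], where the Y_i are independent with the size-biased law P(Y = l) proportional to l times the stationary probability of l, and the stationary probabilities satisfy the birth and death balance rho * alpha_{k-1} = u_k * alpha_k. The truncation level K, the tolerance and all names are hypothetical choices, and the iteration is undamped, so higher loads may require extra care.

```python
import numpy as np


def departure_rates(alpha, route_length):
    """u[k], k = 0..K, under the assumed form of the departure-rate equation."""
    K = len(alpha) - 1
    k = np.arange(K + 1)
    inv = np.zeros(K + 1)
    inv[1:] = 1.0 / k[1:]                        # 1/m for m >= 1, unused at 0
    weights = k * alpha                          # size-biased weights
    q = weights / weights.sum()                  # law of Y on another link
    F = np.cumsum(q) ** (route_length - 1)       # cdf of max(Y_1..Y_{L-1})
    dF = np.diff(F, prepend=0.0)                 # P(max = m)
    terms = dF * inv
    suffix = np.cumsum(terms[::-1])[::-1]        # sum over m >= k
    tail = np.append(suffix[1:], 0.0)            # sum over m > k
    return k * (F * inv + tail)                  # k * E[1 / max(k, Y's)]


def solve_fixed_point(load, route_length, K=400, tol=1e-12, max_iter=10000):
    """Iterate alpha -> u -> alpha, starting from the geometric distribution."""
    k = np.arange(K + 1)
    alpha = (1.0 - load) * load ** k
    alpha /= alpha.sum()
    for _ in range(max_iter):
        u = departure_rates(alpha, route_length)
        # Birth and death balance, computed in the log domain for stability.
        log_alpha = np.concatenate(([0.0], np.cumsum(np.log(load) - np.log(u[1:]))))
        new = np.exp(log_alpha - log_alpha.max())
        new /= new.sum()
        if np.abs(new - alpha).max() < tol:
            return new
        alpha = new
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    for L in (1, 2, 3):
        alpha = solve_fixed_point(0.5, L)
        print(L, float(np.arange(len(alpha)) @ alpha))
```

For routes of length 1 the departure rate is identically 1 and the iteration returns the geometric distribution at once, which provides a simple sanity check of the sketch.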
As clearly seen in the figure, the distributions are very different from what would be obtained for routes of length . In this case, the system consists of a collection of independent M/M/1 queues and the associated distribution is geometric. For , the distributions are markedly modal and the positions of the peak values are roughly proportional to , a fact that has only been proven in Section 4.3 for the case . Moreover, since the shape of the distribution is rather narrow, this position roughly coincides with the mean number of active connections (as can be seen from the raw data).
The impact of route length is illustrated in Fig. 6. It seems that the mean number of active connections (which is again approximately the peak value of the distribution) is roughly proportional to . Note that a logarithmic growth rate is very slow suggesting that, beyond 2 or 3, the number of bottlenecks does not have a significant impact on mean transfer times. They depend much more on the load .
The results presented so far only concern the solution of the fixed point equations. As mentioned earlier, there are gaps in the derivation of these equations. To assess their quality and to investigate the accuracy of the asymptotic approximation for finite size networks, we ran a number of simulations of the star-shaped network. Fig. 7 displays the corresponding results when the load on each link is set to for a varying number of links. The agreement between the simulation results and the fixed point equation results is excellent for links and improves as increases.
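A simulation of this kind can be sketched with a standard event-driven (Gillespie) scheme for the Markov process under the min policy. The code below is not the simulator used for Fig. 7. It assumes that a route with x transfers, each served at the min-policy rate, completes transfers at rate x times share divided by mean size, and the small star network built in the example (node count, load, naming of links) is a hypothetical construction.

```python
import random


def simulate_min_policy(capacity, routes, arrival_rate, mean_size, n_events, seed=0):
    """Time-averaged number of transfers on each link under the min policy."""
    rng = random.Random(seed)
    names = list(routes)
    x = {r: 0 for r in names}
    link_count = {link: 0 for link in capacity}
    area = {link: 0.0 for link in capacity}
    clock = 0.0
    for _ in range(n_events):
        # Even indices: arrivals; odd indices: departures, route by route.
        rates = []
        for r in names:
            rates.append(arrival_rate[r])
            if x[r] > 0:
                share = min(capacity[l] / link_count[l] for l in routes[r])
                rates.append(x[r] * share / mean_size[r])
            else:
                rates.append(0.0)
        total = sum(rates)
        dt = rng.expovariate(total)              # sojourn time in current state
        for link in capacity:
            area[link] += link_count[link] * dt
        clock += dt
        idx = rng.choices(range(len(rates)), weights=rates)[0]
        r = names[idx // 2]
        step = 1 if idx % 2 == 0 else -1
        x[r] += step
        for link in routes[r]:
            link_count[link] += step
    return {link: area[link] / clock for link in capacity}


def star_network(n_nodes, load):
    """Hypothetical star: route (i, j) uses the inbound link of i and the
    outbound link of j, with unit capacities and unit mean document size."""
    capacity = {}
    for i in range(n_nodes):
        capacity[("in", i)] = 1.0
        capacity[("out", i)] = 1.0
    routes = {(i, j): (("in", i), ("out", j))
              for i in range(n_nodes) for j in range(n_nodes) if i != j}
    rate = {r: load / (n_nodes - 1) for r in routes}   # keeps link load fixed
    size = {r: 1.0 for r in routes}
    return capacity, routes, rate, size


if __name__ == "__main__":
    cap, rts, rate, size = star_network(n_nodes=5, load=0.5)
    means = simulate_min_policy(cap, rts, rate, size, n_events=20000, seed=1)
    print(sum(means.values()) / len(means))
```

On a single link the process reduces to a processor sharing queue, so the time-averaged occupancy should approach load / (1 - load), which gives a quick consistency check before simulating larger topologies.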
In this paper, we have considered a class of Markov processes called best-effort networks which constitutes a natural probabilistic model for evaluating the performance of document transfers over data networks such as the Internet. Unlike almost all previous work, this model accounts for the random nature of traffic: document transfers begin at the epochs of a certain arrival process and the size of each document is drawn from a given probability distribution. In the interests of tractability we assumed Poisson arrivals and exponentially distributed sizes. We introduced the “min” bandwidth sharing policy as a conservative approximation to the more classical max-min policy. Necessary and sufficient ergodicity conditions for best effort networks under the min and max-min policies have been established.
In order to pursue the analysis of the stationary distributions of the number of transfers in progress, we have resorted to large network asymptotics applying the mean field approach of statistical physics. This enabled us to derive fixed point equations for the probability distribution of the number of ongoing transfers on a given network link. The validity of these equations has been established by comparing their solution with the results of simulations.
Analytical and numerical results show how the mean transfer time depends on the number of bottleneck links and their load. The steady state distribution in networks where routes have several bottlenecks () has a marked modal behavior. This is significantly different to the geometric distribution which holds when routes have a single bottleneck (). Performance is also much more sensitive to link load for multiple bottleneck routes: as , mean transfer time increases like in the case , whereas the dependence is in when . Finally, the impact of the number of hops per route appears small (given that ) compared to that of parameter . This suggests that the star-shaped network is perhaps a sufficiently complex model, and that the study of symmetrical networks with is less relevant.
The work presented here can be pursued in several directions. On the theoretical side, the analytical results presented in Section 4.3 constitute a first step to understanding the solution of the fixed point equations which could be taken further. Another challenging theoretical question is to improve the fixed point equations in a rigorous way. On a more practical side, the fixed point equations might be simplified so as to find simple approximate formulas for mean transfer times as a function of key parameters (such as , in the case of the asymmetrical star-shaped network described in Remark 1). Such approximate formulas could then lead to engineering rules for capacity planning.
We view the present study as a preliminary investigation into the performance of best effort networks with multiple bottleneck links. A significant result of this investigation is the discovery that the extension of the processor sharing model valid for a single bottleneck proves to be very hard. There appears to be no simple parallel to the familiar fixed point techniques used in loss networks. The problem is, however, of considerable practical importance for providers seeking to engineer their network to ensure adequate throughput for document transfers. We hope therefore that this paper will incite further work and the development of alternative heuristic approaches.
-  D. Bertsekas and R. Gallager, Data Networks, Prentice-Hall International, 2nd edition, 1992.
-  J. H. Mo and J. Walrand, “Fair end-to-end window-based congestion control,” in SPIE ’98 International Symposium on Voice, Video and Data Communications, 1998.
-  L. Massoulié and J. Roberts, “Bandwidth sharing: objectives and algorithms,” IEEE Infocom, 1999.
-  P. Hurley, J.-Y. Le Boudec, and P. Thiran, “A note on the fairness of additive increase and multiplicative decrease,” ITC 16, 1999.
-  J. Roberts and L. Massoulié, “Bandwidth sharing and admission control for elastic traffic,” ITC specialists seminar, 1998.
-  G. De Veciana, T.-J. Lee, and T. Konstantopoulos, “Stability and performance analysis of networks supporting services with rate control – could the internet be unstable?,” IEEE Infocom, 1999.
-  G. Fayolle, V. A. Malyshev, and M. V. Menshikov, Topics in the Constructive Theory of Countable Markov Chains, Cambridge University Press, 1995.
-  F. Kelly, “Loss networks,” Ann. Appl. Probab., vol. 1, pp. 319–378, 1991.
-  C. Graham and S. Méléard, “Chaos hypothesis for a system interacting through shared resources,” Probability Theory and Related Fields, 1994.
-  N. D. Vvedenskaya, R. L. Dobrushin, and F. I. Karpelevich, “A queueing system with a choice of the shorter of two queues—an asymptotic approach,” Problems Inform. Transmission, vol. 32, pp. 15–27, 1996.
-  F. Delcoigne and G. Fayolle, “Thermodynamical limit and propagation of chaos in polling systems,” Markov Processes and Related Fields, vol. 5, no. 1, pp. 89–124, 1999.
-  G. Fayolle and J.-M. Lasgouttes, “A nonlinear integral operator encountered in the bandwidth sharing of a star-shaped network,” in Mathematics and Computer Science: Algorithms, Trees, Combinatorics and Probabilities, Trends in Mathematics, pp. 231–242. Birkhäuser, 2000.
-  H. Blasius, “Grenzschichten in Flüssigkeiten mit kleiner Reibung,” Z. Math Phys., vol. 56, pp. 1–37, 1908, English translation in NACA TM 1256.
-  H. Schlichting, Boundary layer theory, McGraw-Hill Book Co., Inc., New York, 1960, Translated by J. Kestin. 4th ed. McGraw-Hill Series in Mechanical Engineering.