Dynamic Cloud Network Control under Reconfiguration Delay and Cost

Chang-Heng Wang1, Jaime Llorca2, Antonia M. Tulino2, and Tara Javidi1
1University of California, San Diego, CA. Email: {chw009, tjavidi}@ucsd.edu
2Nokia Bell Labs, NJ. Email: {jaime.llorca, a.tulino}@nokia-bell-labs.com
Abstract

Network virtualization and programmability allow operators to deploy a wide range of services over a common physical infrastructure and elastically allocate cloud and network resources according to changing requirements. While the elastic reconfiguration of virtual resources enables dynamically scaling capacity in order to support service demands with minimal operational cost, reconfiguration operations make resources unavailable during a given time period and may incur additional cost. In this paper, we address the dynamic cloud network control problem under non-negligible reconfiguration delay and cost. We show that while the capacity region remains unchanged regardless of the reconfiguration delay/cost values, a reconfiguration-agnostic policy may fail to guarantee throughput-optimality and minimum cost under nonzero reconfiguration delay/cost. We then present an adaptive dynamic cloud network control policy that allows network nodes to make local flow scheduling and resource allocation decisions while controlling the frequency of reconfiguration in order to support any input rate in the capacity region and achieve arbitrarily close to minimum cost for any finite reconfiguration delay/cost values.

I Introduction

The emergence of network function virtualization (NFV) and software defined networking (SDN) enables network services to be deployed in the form of interconnected software functions instantiated over commercial off-the-shelf servers at multiple cloud locations and interconnected via a programmable network fabric. This allows cloud network operators to host a large variety of services over a common general purpose infrastructure and dynamically allocate resources according to changing demands, reducing both capital and operational expenses.

The unprecedented flexibility of the cloud networking paradigm provides exciting opportunities for future service scenarios and stimulates research in key technical areas such as optimal function placement, service flow routing, and joint cloud/network resource allocation. One line of research addresses the virtual network function placement problem from a static global optimization point of view, in which the goal is to find the placement of virtual functions and the routing of network flows that meet service demands with minimum cost [1, 2, 3]. However, the required prior knowledge of global system information and service demands restricts the use of such centralized policies to relatively small-scale scenarios with relatively static demands. In contrast, recent works have leveraged ideas from dynamic network control to design distributed control policies for computing networks, in which nodes make local decisions on processing and transmission flow scheduling [4], as well as associated compute and network resource allocation [5, 6], with global system guarantees. The work in [4] proposes a backpressure-based algorithm for maximizing the rate of queries for a computation operation on remote data, while [5, 6] present cloud network control policies for service function chains that guarantee throughput-optimality and minimum average cloud network cost. While the dynamic cloud network control (DCNC) algorithm presented in [6] shows promise in serving varying workloads with minimum cost by dynamically adjusting resource allocation and scheduling decisions, it overlooks the fact that the reconfiguration of virtual compute and network resources takes a non-negligible amount of time and may incur additional cost. As an example, starting up a virtual machine (VM) can take 5 to 10 minutes [7]. A control policy that is unaware of the reconfiguration delay and cost associated with cloud and network resources may perform excessive reconfigurations, leading to increased congestion and overall operational cost.

The reconfiguration delay associated with flow scheduling has been studied in the context of the switch model [8, 9, 10], multi-hop networks [11, 12], and signal control in transportation systems [12]. These works propose throughput-optimal scheduling policies under any finite reconfiguration delay. However, resource allocation, and thus cost minimization, is not considered in their settings. Regarding reconfiguration cost, [13] addressed the cost of flow reconfigurations in SDN by designing a control policy that minimizes the total flow allocation cost subject to a given reconfiguration cost budget. In [14], the reconfiguration cost associated with switching base stations on and off in a dynamic wireless network setting was considered. The proposed approach requires arrival and channel statistics for activation decisions, and leverages an explore-exploit policy when this information is not available.

In this paper, we address the problem of optimal control of multi-hop multi-commodity cloud networks in practical settings characterized by non-negligible reconfiguration delay and cost. The contributions of this work can be summarized as follows:

  • We show that the capacity region and the minimum time average cost remain the same even in the presence of reconfiguration delay and cost, provided that the reconfiguration delay and cost values are finite.

  • We show that a reconfiguration-agnostic policy that is throughput optimal and achieves arbitrarily close to the minimum time average cost in the regime without reconfiguration delay/cost does not necessarily retain these properties when nonzero reconfiguration delay/cost is present.

  • We propose a distributed flow scheduling and resource allocation policy that guarantees cloud network throughput and cost optimality for any finite values of reconfiguration delay/cost. The proposed Adaptive Dynamic Cloud Network Control (ADCNC) policy adapts the frequency of reconfiguration using queue length information, and does not require prior knowledge of arrival statistics or the exact values of the reconfiguration overheads.

  • The problem considered in this work combines cost-minimizing flow scheduling with multi-hop scheduling under reconfiguration delay, and known solutions for each do not trivially apply to this generalization. The proposed ADCNC policy extends the applicability of adaptive policies to the cost-minimizing flow scheduling regime, which requires an appropriate modification of the reconfiguration criterion.

The rest of the paper is organized as follows. We introduce the system model and formulate the cloud network control problem in Section II. With the problem formulated, we compare our setting to the existing literature in Section III. The impact of reconfiguration delay/cost is illustrated in Section IV to motivate the problem considered in this work. In Section V, we introduce the ADCNC policy and characterize its performance guarantees. Simulation results are presented in Section VI, and extensions are discussed in Section VII. We then conclude with some discussion and future directions in Section VIII.

Notation: Throughout the paper, we use $\mathbb{1}\{\cdot\}$ to denote the indicator function, and $|\mathcal{A}|$ to denote the cardinality of a set $\mathcal{A}$. We also use $[N]$ as a shorthand for $\{1, 2, \ldots, N\}$.

II System Model

A cloud network is modeled as a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, with vertices $i \in \mathcal{V}$ and edges $(i,j) \in \mathcal{E}$ representing cloud nodes and network links, respectively. Cloud and network resources are characterized by their processing and transmission capacities and costs, as follows:

  • $\mathcal{K}_i$: the set of possible processing resource units at node $i$

  • $\mathcal{K}_{ij}$: the set of possible transmission resource units at link $(i,j)$

  • $C_{i,k}$: the processing capacity resulting from the allocation of $k$ processing resource units at node $i$

  • $C_{ij,k}$: the transmission capacity resulting from the allocation of $k$ transmission resource units at link $(i,j)$

  • $w_{i,k}$: the cost of maintaining $k$ processing resource units at node $i$

  • $w_{ij,k}$: the cost of maintaining $k$ transmission resource units at link $(i,j)$

  • $e_i$: the cost per processing flow unit at node $i$

  • $e_{ij}$: the cost per transmission flow unit at link $(i,j)$

Throughout the rest of the discussion, we make the following assumption on the capacities and costs of cloud and network resources:

Assumption 1.

For any node $i \in \mathcal{V}$ and any link $(i,j) \in \mathcal{E}$, we assume that both the capacity and the cost are strictly increasing in the amount of resources assigned. In other words, given any node $i$, for any $k, k' \in \mathcal{K}_i$ such that $k < k'$, we have $C_{i,k} < C_{i,k'}$ and $w_{i,k} < w_{i,k'}$; similarly, given any link $(i,j)$, for any $k, k' \in \mathcal{K}_{ij}$ such that $k < k'$, we have $C_{ij,k} < C_{ij,k'}$ and $w_{ij,k} < w_{ij,k'}$.

II-A Service Model

The cloud network offers a set of services $\Phi$. Each service $\phi \in \Phi$ is described by a chain of service functions. We let $\mathcal{F}_\phi$ denote the ordered set of functions of service $\phi$; hence, the tuple $(\phi, m)$, with $m \in \{1, \ldots, |\mathcal{F}_\phi|\}$, represents the $m$-th function of service $\phi$.

In order to describe the flow of packets through a service chain, we adopt a multi-commodity-chain flow model as in [2, 5, 6], in which a commodity represents the flow of packets at a given stage of a service chain. In particular, a commodity-$(s,d,\phi,m)$ flow is specified by source node $s$, destination node $d$, and function $(\phi,m)$, indicating the flow of packets with origin at $s$ and destination at $d$ that have been processed by the first $m$ functions of service $\phi$. For ease of exposition, we let $c^+$ and $c^-$ denote the commodities that succeed and precede commodity $c$ in its service chain, respectively.

Each service function has potentially distinct processing requirements, which may also vary between cloud locations. We let $r_i^{(\phi,m)}$ denote the processing-transmission flow ratio of function $(\phi,m)$ at node $i$. That is, when one transmission flow unit of commodity $c$ goes through function $(\phi,m)$ at node $i$, it occupies $r_i^{(\phi,m)}$ processing flow units. In addition, our service model also captures the possibility of flow scaling. We denote by $\xi^{(\phi,m)}$ the scaling factor of function $(\phi,m)$, indicating that function $(\phi,m)$ generates an average of $\xi^{(\phi,m)}$ output packets of commodity $(s,d,\phi,m)$ per input packet of commodity $(s,d,\phi,m-1)$.
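To make the multi-commodity-chain bookkeeping concrete, the following is a minimal Python sketch of the commodity abstraction described above. The class and field names (`Commodity`, `ServiceFunction`, `xi`, `r`) are our own illustrative choices, not identifiers from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Commodity:
    """A commodity (s, d, phi, m): packets of service phi with source s and
    destination d that have been processed by the first m functions."""
    source: str
    dest: str
    service: int
    stage: int  # m = 0 means not yet processed by any function

@dataclass(frozen=True)
class ServiceFunction:
    xi: float  # scaling factor: output packets per input packet
    r: float   # processing flow units per transmission flow unit

def successor(c: Commodity, chain_len: int) -> Optional[Commodity]:
    """Commodity c+ produced by applying the next function, if any."""
    if c.stage >= chain_len:
        return None  # final commodity of the chain
    return Commodity(c.source, c.dest, c.service, c.stage + 1)

def predecessor(c: Commodity) -> Optional[Commodity]:
    """Commodity c- that the previous function consumes to produce c."""
    if c.stage == 0:
        return None  # exogenous input commodity
    return Commodity(c.source, c.dest, c.service, c.stage - 1)
```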

II-B Reconfiguration Delay and Cost

We consider cloud network control policies that adjust the configuration of cloud and network resources, as well as the schedule of commodity flows, according to changing demands. We assume that such reconfigurations may incur the following two types of overhead:

  • Reconfiguration delay (time): This is the time duration required for the reconfiguration process to complete. We assume that during the reconfiguration process, the associated function (transmission or processing of commodity flows) is not available. We denote by $\delta_i$ the reconfiguration delay for node $i$, and by $\delta_{ij}$ the reconfiguration delay for link $(i,j)$.

  • Reconfiguration cost: This is the cost/penalty associated with each reconfiguration operation. Let $\beta_i$ denote the reconfiguration cost for node $i$, and $\beta_{ij}$ denote the reconfiguration cost for link $(i,j)$.

In the rest of the paper, we use $\mathcal{R}$ to denote the reconfiguration delay and cost structure of a cloud network, where $\mathcal{R} = \{\delta_i, \beta_i : i \in \mathcal{V}\} \cup \{\delta_{ij}, \beta_{ij} : (i,j) \in \mathcal{E}\}$.

We consider a time slotted system with slots normalized to integral units $t \in \{0, 1, 2, \ldots\}$. Suppose that node $i$ reconfigures the processing resource allocation or the commodity being processed at time $t$. Then, flow processing at node $i$ becomes unavailable during the time period $[t, t+\delta_i)$, and a reconfiguration cost $\beta_i$ is incurred at time $t$. Similarly, suppose that link $(i,j)$ reconfigures the transmission resource allocation or the commodity being transmitted at time $t$. Then, flow transmission is unavailable during $[t, t+\delta_{ij})$, and a reconfiguration cost $\beta_{ij}$ is incurred at time $t$.

Note that we consider a worst-case reconfiguration delay model in that we assume complete unavailability of packet processing or transmission functionality at a node or link undergoing reconfiguration. Importantly, a throughput-optimal policy for this worst-case reconfiguration delay model will guarantee throughput-optimality for any other less restrictive model. Extensions to this model are discussed in Section VII.

For ease of discussion in the following, we also define $\eta_i(t)$ and $\eta_{ij}(t)$ to denote the reconfiguration status:

  • $\eta_i(t)$: the time remaining in the reconfiguration process at node $i$

  • $\eta_{ij}(t)$: the time remaining in the reconfiguration process at link $(i,j)$

By definition, these processes evolve as follows: At any time $t$, if node $i$ (or link $(i,j)$) reconfigures, then set $\eta_i(t) = \delta_i$ (or $\eta_{ij}(t) = \delta_{ij}$, respectively); otherwise, set $\eta_i(t) = \max\{\eta_i(t-1) - 1, 0\}$ (or $\eta_{ij}(t) = \max\{\eta_{ij}(t-1) - 1, 0\}$, respectively).
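As an illustration, here is a minimal sketch of this status update for a single resource, under the worst-case model above in which a reconfiguration started in slot $t$ makes the resource unavailable for exactly $\delta$ slots; the names are ours, not the paper's.

```python
def update_reconfig_status(eta: int, reconfigure: bool, delta: int) -> int:
    """One-slot update of the remaining reconfiguration time.

    eta:         slots of reconfiguration remaining before this slot
    reconfigure: whether a new reconfiguration is triggered this slot
    delta:       reconfiguration delay of this node/link
    """
    if reconfigure:
        return delta            # restart the countdown
    return max(eta - 1, 0)      # otherwise count down to zero

# The resource can serve flow in a slot only when its status is zero:
# available = (eta == 0)
```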

II-C Queueing Model

Let $Q_i^c(t)$ denote the queue backlog of commodity-$c$ packets at node $i$ at the beginning of time slot $t$. We denote by $a_i^c(t)$ the exogenous arrivals of commodity-$c$ packets at node $i$ during time slot $t$. Throughout this paper, we make the following assumptions on the exogenous arrival processes.

Assumption 2.

Each exogenous arrival process $\{a_i^c(t)\}$ is independent and identically distributed (i.i.d.) over time, with $\mathbb{E}[a_i^c(t)] = \lambda_i^c$. Furthermore, each exogenous arrival process has bounded support. In other words, there exists $A_{\max} < \infty$ such that $a_i^c(t) \le A_{\max}$, $\forall i, c, t$.

At each time slot $t$, each node $i$ makes the following transmission and processing scheduling and resource allocation decisions:

  • $\mu_{i,\mathrm{pr}}^c(t)$: the flow rate of commodity $c$ being processed at node $i$ at time $t$

  • $\mu_{ij}^c(t)$: the flow rate of commodity $c$ on link $(i,j)$ at time $t$

  • $k_i(t)$: the number of processing resource units allocated to node $i$ at time $t$

  • $k_{ij}(t)$: the number of transmission resource units allocated to link $(i,j)$ at time $t$

With the aforementioned setup, we may write the queue dynamics for each commodity $c$ at each node $i$:

$$Q_i^c(t+1) \le \Big[ Q_i^c(t) - \sum_{j \in \mathcal{N}_i^+} \mu_{ij}^c(t)\, \mathbb{1}\{\eta_{ij}(t) = 0\} - \mu_{i,\mathrm{pr}}^c(t)\, \mathbb{1}\{\eta_i(t) = 0\} \Big]^+ + \sum_{j \in \mathcal{N}_i^-} \mu_{ji}^c(t)\, \mathbb{1}\{\eta_{ji}(t) = 0\} + \xi^{c}\, \mu_{i,\mathrm{pr}}^{c^-}(t)\, \mathbb{1}\{\eta_i(t) = 0\} + a_i^c(t), \qquad (1)$$

where $\mathcal{N}_i^+$ and $\mathcal{N}_i^-$ denote the set of outgoing and incoming neighbors of node $i$, respectively, and $\xi^c$ denotes the scaling factor of the function that produces commodity $c$.

Observe from (1) that the service rate of the queue of commodity $c$ at node $i$ is composed of the transmission rate of commodity $c$ over all outgoing links and the local processing rate of commodity $c$. On the other hand, the arrival rate is composed of the transmission rate of commodity $c$ over all incoming links and the local processing rate of the preceding commodity $c^-$ in the service chain. It is important to note that neither the service rate nor the arrival rate receives any contribution from transmission and processing resources that are undergoing reconfiguration (i.e., $\eta_{ij}(t) > 0$ or $\eta_i(t) > 0$), reflecting the inability to transmit or process packets during the reconfiguration process.
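The following sketch implements one slot of the dynamics in (1) for all queues, assuming a dictionary-based bookkeeping that is our own illustrative data layout (with `pred` and `xi` giving the predecessor commodity and scaling factor):

```python
def queue_update(Q, a, mu_tx, mu_pr, eta_link, eta_node, out_nbrs, in_nbrs,
                 pred, xi):
    """One-slot queue update implementing the dynamics in (1).

    Q[(i, c)]:        backlog of commodity c at node i
    a[(i, c)]:        exogenous arrivals in this slot
    mu_tx[(i, j, c)]: transmission rate of commodity c on link (i, j)
    mu_pr[(i, c)]:    processing rate of commodity c at node i
    eta_link[(i, j)], eta_node[i]: remaining reconfiguration times
    pred(c):          preceding commodity c- in the service chain (or None)
    xi(c):            scaling factor of the function that produces c
    """
    Qn = {}
    for (i, c) in Q:
        # Service: outgoing transmissions plus local processing, both
        # inactive while the corresponding resource is reconfiguring.
        serve = sum(mu_tx.get((i, j, c), 0.0)
                    for j in out_nbrs[i] if eta_link[(i, j)] == 0)
        if eta_node[i] == 0:
            serve += mu_pr.get((i, c), 0.0)
        # Arrivals: incoming transmissions, processing of c-, exogenous input.
        arrive = sum(mu_tx.get((j, i, c), 0.0)
                     for j in in_nbrs[i] if eta_link[(j, i)] == 0)
        cm = pred(c)
        if cm is not None and eta_node[i] == 0:
            arrive += xi(c) * mu_pr.get((i, cm), 0.0)
        Qn[(i, c)] = max(Q[(i, c)] - serve, 0.0) + arrive + a.get((i, c), 0.0)
    return Qn
```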

II-D Problem Formulation

Given a set of service demands with average input rate matrix $\boldsymbol{\lambda} = \{\lambda_i^c\}$, the goal is to support the demand while minimizing the average cloud network cost.

In order to formalize the problem, we first introduce the following notion of rate stability, which dictates the ability of a cloud network control policy to support the demand:

Definition 1.

A cloud network is rate stable if

$$\lim_{t \to \infty} \frac{Q_i^c(t)}{t} = 0 \quad \text{w.p. 1}, \qquad \forall i, c. \qquad (2)$$

With the notion of rate stability, we may then define the capacity of a cloud network and the throughput optimality of a cloud network control policy as follows:

Definition 2.

For a given cloud network, the capacity region of the cloud network is defined as the closure of the set of all input rate matrices $\boldsymbol{\lambda}$ for which the cloud network can be made rate stable under some cloud network control policy.

Definition 3.

A cloud network control policy for a cloud network is throughput optimal if the cloud network operated under the control policy is rate stable for any input rate matrix in the capacity region.

Besides the ability to support the demand, the total operational cost of a cloud network is of concern in many practical settings. The total cloud network cost consists of the total processing and transmission costs, plus the reconfiguration costs incurred. We assume that while a processing/transmission resource is undergoing reconfiguration, its allocation cost is not incurred, since the resource is not operative until the reconfiguration process completes. Hence, we can write the total cloud network cost at time $t$ as

$$h(t) = \sum_{i \in \mathcal{V}} \Big[ w_{i,k_i(t)} + e_i \textstyle\sum_c r_i^c\, \mu_{i,\mathrm{pr}}^c(t) \Big] \mathbb{1}\{\eta_i(t) = 0\} + \sum_{(i,j) \in \mathcal{E}} \Big[ w_{ij,k_{ij}(t)} + e_{ij} \textstyle\sum_c \mu_{ij}^c(t) \Big] \mathbb{1}\{\eta_{ij}(t) = 0\} + \sum_{i \in \mathcal{V}} \beta_i\, \mathbb{1}\{\text{node } i \text{ reconfigures at } t\} + \sum_{(i,j) \in \mathcal{E}} \beta_{ij}\, \mathbb{1}\{\text{link } (i,j) \text{ reconfigures at } t\}. \qquad (3)$$

We then formulate the dynamic cloud network control problem under reconfiguration delay/cost as follows. Given an input rate matrix $\boldsymbol{\lambda}$ in the capacity region:

$$\min \;\; \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\left[ h(t) \right] \qquad (4a)$$
s.t. the cloud network is rate stable with input rate $\boldsymbol{\lambda}$
and under queue length dynamics (1), (4b)
$$\textstyle\sum_c \mu_{ij}^c(t) \le C_{ij,k_{ij}(t)}, \quad \forall (i,j) \in \mathcal{E},\, t, \qquad (4c)$$
$$\textstyle\sum_c r_i^c\, \mu_{i,\mathrm{pr}}^c(t) \le C_{i,k_i(t)}, \quad \forall i \in \mathcal{V},\, t, \qquad (4d)$$
$$k_i(t) \in \mathcal{K}_i, \;\; k_{ij}(t) \in \mathcal{K}_{ij}, \quad \forall i,\, (i,j),\, t, \qquad (4e)$$
$$\mu_{i,\mathrm{pr}}^c(t) \ge 0, \;\; \mu_{ij}^c(t) \ge 0, \quad \forall i,\, (i,j),\, c,\, t. \qquad (4f)$$

III Related Work

Given the cloud network control with reconfiguration overhead defined in the previous section, we now discuss the relation between the current work and the related literature.

In the NFV literature, many works consider the cloud network planning problem in the static (or quasi-static) regime. In this regime, the traffic demands are assumed to be fixed, and the proposed approaches focus on optimizing network function placement and flow allocation. A thorough comparison of static NFV placement and routing approaches can be found in [15]. In the static regime, the underlying assumption of slow (or no) variation in traffic demands allows the proposed approaches to ignore the reconfiguration overhead, as reconfiguration typically occurs on a longer time scale. However, the very same assumption limits the applicability of these approaches in many practical settings where traffic demands constantly change over time.

On the other hand, the Dynamic Cloud Network Control (DCNC) policy [6] addresses the dynamic setting of the cloud network problem. It is shown in [6] that the DCNC policy is throughput optimal and can achieve minimum average cloud network cost under zero reconfiguration overhead. However, in practical settings where reconfiguration overhead exists, the DCNC policy may suffer serious performance degradation, since it is unaware of the reconfiguration overhead. When the reconfiguration delay is not negligible, DCNC, having no control over the frequency of reconfiguration, may spend an excessive fraction of time reconfiguring and lose throughput optimality.

While the DCNC policy ignores the practical issue of reconfiguration overhead, there are works in the literature that deal with scheduling/control problems under reconfiguration delay [10, 11, 12]. For example, the Adaptive MaxWeight policy [10] is proposed for the input-queued switch model with reconfiguration delay, while the Adaptive Backpressure policy [11] is a distributed policy proposed for multi-hop networks with reconfiguration delay. These adaptive policies utilize queue length information to determine the appropriate timing for reconfiguration, and thus implicitly adapt the frequency of reconfiguration through queue conditions.

Note that none of the above works addressing reconfiguration delay takes cost minimization into account. Therefore, we may view these approaches as solutions to a special case of the cloud network control problem with reconfiguration overhead. One of the main contributions of this work is to extend the applicability of adaptive policies in order to incorporate cloud network cost minimization.

IV Impact of Reconfiguration Delay/Cost

In this section, we discuss the impact of reconfiguration delay and cost on the performance of a cloud network control policy that is unaware of such reconfiguration delay/cost.

In order to formalize the notion of performance, we start with the characterization of the cloud network capacity region and the minimum average cloud network cost required for network stability. The cloud network capacity region $\Lambda(\mathcal{G}, \mathcal{R})$ is defined as the closure of the set of all input rate matrices that can be stabilized by some cloud network control policy, given the cloud network structure $\mathcal{G}$ and the reconfiguration delay/cost structure $\mathcal{R}$. For each rate matrix $\boldsymbol{\lambda} \in \Lambda(\mathcal{G}, \mathcal{R})$, we denote by $h^*(\boldsymbol{\lambda}, \mathcal{R})$ the minimum average cost required for network stability.

The following theorem establishes that the capacity region and the minimum average cost for each arrival rate in the capacity region remain the same for any finite reconfiguration delay and cost. The proof of Theorem 1 is given in Appendix D.

Theorem 1.

Given any finite reconfiguration delay/cost structure $\mathcal{R}$, the capacity region remains the same. In particular, $\Lambda(\mathcal{G}, \mathcal{R}) = \Lambda(\mathcal{G})$, where $\Lambda(\mathcal{G})$ is the capacity region of the cloud network without reconfiguration delay/cost, as characterized in [5, Theorem 1]. Furthermore, given any exogenous arrival rate matrix $\boldsymbol{\lambda} \in \Lambda(\mathcal{G})$, we have $h^*(\boldsymbol{\lambda}, \mathcal{R}) = h^*(\boldsymbol{\lambda})$.

While it was shown in [6] that, in the absence of reconfiguration delay and cost, the DCNC policy is throughput optimal and achieves a cost-delay tradeoff, this result does not hold when reconfiguration delay or cost exists. In fact, as will be shown in Section VI (Figs. 2 and 6), the DCNC policy loses throughput optimality and the ability to achieve minimum average cost in the presence of reconfiguration delay or cost.

In the next section, we propose Adaptive DCNC (ADCNC) policy, which is an online distributed policy for cloud network control under reconfiguration delay and cost. We then establish theoretical performance guarantees of ADCNC policy, specifically throughput optimality and the cost-delay tradeoff. In other words, ADCNC policy recovers the performance guarantees that DCNC policy loses when reconfiguration overhead exists.

V Dynamic Cloud Network Control under Reconfiguration Delay/Cost

V-A Adaptive DCNC Policy

At each time slot $t$, each cloud network node $i$ makes local processing and transmission decisions on its corresponding outgoing interfaces.

We select a function $f(\cdot)$ that is strictly increasing and sublinear (i.e., $\lim_{x \to \infty} f(x)/x = 0$; for instance, $f(x) = x^{\alpha}$ with $\alpha \in (0,1)$), and a parameter $V \ge 0$. Given function $f$ and parameter $V$, node $i$ makes the following decisions at time $t$ (a consolidated code sketch of the per-link decision is given at the end of this subsection):

  • Transmission decisions: For each neighbor $j \in \mathcal{N}_i^+$

    1. Compute the transmission max-utility-weight as

      $$W_{ij}^*(t) = \max_{c,\, k \in \mathcal{K}_{ij}} \Big[ C_{ij,k} \big( Q_i^c(t) - Q_j^c(t) \big) - V \big( w_{ij,k} + e_{ij} C_{ij,k} \big) \Big]^+ \qquad (5)$$

      with $c_{ij}^*(t)$, $k_{ij}^*(t)$ being its maximizers.

    2. Let $(\hat{c}_{ij}(t-1), \hat{k}_{ij}(t-1))$ denote the schedule at time $t-1$. Compute the transmission weight as

      $$\hat{W}_{ij}(t) = \Big[ C_{ij,\hat{k}_{ij}(t-1)} \big( Q_i^{\hat{c}_{ij}(t-1)}(t) - Q_j^{\hat{c}_{ij}(t-1)}(t) \big) - V \big( w_{ij,\hat{k}_{ij}(t-1)} + e_{ij}\, C_{ij,\hat{k}_{ij}(t-1)} \big) \Big]^+ \qquad (6)$$

      and the transmission weight differential at time $t$ as

      $$\Delta W_{ij}(t) = W_{ij}^*(t) - \hat{W}_{ij}(t). \qquad (7)$$
    3. Define the transmission weight differential threshold at time $t$ as

      $$g_{ij}(t) = f\big( Q_{ij}^{\max}(t) \big), \quad \text{where } Q_{ij}^{\max}(t) = \max_c \big[ Q_i^c(t) - Q_j^c(t) \big]^+, \qquad (8)$$

      and determine the transmission resource-commodity schedule at time $t$ as $(\hat{c}_{ij}(t), \hat{k}_{ij}(t)) = (c_{ij}^*(t), k_{ij}^*(t))$ if $\Delta W_{ij}(t) > g_{ij}(t)$ (triggering a reconfiguration), and $(\hat{c}_{ij}(t), \hat{k}_{ij}(t)) = (\hat{c}_{ij}(t-1), \hat{k}_{ij}(t-1))$ otherwise.

    4. Allocate $k_{ij}(t) = \hat{k}_{ij}(t)$ transmission resource units and set transmission flow rates as $\mu_{ij}^c(t) = C_{ij,\hat{k}_{ij}(t)}\, \mathbb{1}\{ c = \hat{c}_{ij}(t) \}\, \mathbb{1}\{ \eta_{ij}(t) = 0 \}$.

  • Processing decisions:

    1. Compute the processing max-utility weight as

      $$W_i^*(t) = \max_{c,\, k \in \mathcal{K}_i} \Big[ \frac{C_{i,k}}{r_i^c} \big( Q_i^c(t) - \xi^{c^+} Q_i^{c^+}(t) \big) - V \big( w_{i,k} + e_i C_{i,k} \big) \Big]^+ \qquad (9)$$

      with $c_i^*(t)$, $k_i^*(t)$ being its maximizers.

    2. Let $(\hat{c}_i(t-1), \hat{k}_i(t-1))$ denote the schedule at time $t-1$. Compute the processing weight as

      $$\hat{W}_i(t) = \Big[ \frac{C_{i,\hat{k}_i(t-1)}}{r_i^{\hat{c}_i(t-1)}} \big( Q_i^{\hat{c}_i(t-1)}(t) - \xi^{\hat{c}_i(t-1)^+} Q_i^{\hat{c}_i(t-1)^+}(t) \big) - V \big( w_{i,\hat{k}_i(t-1)} + e_i\, C_{i,\hat{k}_i(t-1)} \big) \Big]^+ \qquad (10)$$

      and the processing weight differential at time $t$ as

      $$\Delta W_i(t) = W_i^*(t) - \hat{W}_i(t). \qquad (11)$$
    3. Define the processing weight differential threshold at time $t$ as

      $$g_i(t) = f\big( Q_i^{\max}(t) \big), \quad \text{where } Q_i^{\max}(t) = \max_c \big[ Q_i^c(t) - \xi^{c^+} Q_i^{c^+}(t) \big]^+, \qquad (12)$$

      and determine the processing resource-commodity schedule at time $t$ as $(\hat{c}_i(t), \hat{k}_i(t)) = (c_i^*(t), k_i^*(t))$ if $\Delta W_i(t) > g_i(t)$ (triggering a reconfiguration), and $(\hat{c}_i(t), \hat{k}_i(t)) = (\hat{c}_i(t-1), \hat{k}_i(t-1))$ otherwise.

    4. Allocate $k_i(t) = \hat{k}_i(t)$ processing resource units and set processing flow rates as $\mu_{i,\mathrm{pr}}^c(t) = \frac{C_{i,\hat{k}_i(t)}}{r_i^c}\, \mathbb{1}\{ c = \hat{c}_i(t) \}\, \mathbb{1}\{ \eta_i(t) = 0 \}$.
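To summarize the per-link transmission decision (steps 1-4 above), here is a minimal Python sketch, assuming the reconstructed weight expressions in (5)-(8); the data layout and names are ours, not the paper's. The processing decision is analogous, with the processing-transmission flow ratio $r_i^c$ and the scaling factor $\xi^{c^+}$ entering the utility.

```python
def adcnc_link_decision(Q_i, Q_j, resources, e_ij, V, f, schedule):
    """One-slot ADCNC transmission decision for link (i, j).

    Q_i, Q_j:  dicts mapping commodity -> backlog at nodes i and j
    resources: list of (k, C, w) tuples: resource units, capacity, cost
    e_ij:      cost per transmission flow unit on the link
    V:         cost-delay tradeoff parameter
    f:         strictly increasing sublinear threshold function
    schedule:  current (commodity, (k, C, w)) schedule, or None
    Returns (new_schedule, reconfigured_flag).
    """
    if not Q_i or not resources:
        return schedule, False

    def utility(c, C, w):
        return C * (Q_i[c] - Q_j.get(c, 0.0)) - V * (w + e_ij * C)

    # Step 1: max-utility weight over all commodity-resource pairs, cf. (5).
    u_star, c_star, r_star = max(
        ((utility(c, C, w), c, (k, C, w))
         for c in Q_i for (k, C, w) in resources),
        key=lambda x: x[0])
    W_star = max(u_star, 0.0)

    # Step 2: weight of the current schedule, cf. (6), and differential (7).
    W_hat = 0.0
    if schedule is not None:
        c0, (_k0, C0, w0) = schedule
        W_hat = max(utility(c0, C0, w0), 0.0)
    dW = W_star - W_hat

    # Step 3: reconfigure only if the differential exceeds the sublinear
    # threshold of the maximal queue length differential, cf. (8).
    Q_max = max(max(Q_i[c] - Q_j.get(c, 0.0) for c in Q_i), 0.0)
    if dW > f(Q_max):
        return (c_star, r_star), True   # incurs reconfiguration delay/cost
    return schedule, False

# Step 4 (outside this function): while the link is not under
# reconfiguration, allocate the scheduled k units and serve the scheduled
# commodity at the corresponding capacity C (zero allocation if W_star = 0).
```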

V-B Performance Analysis

In this subsection, we extend the drift-plus-penalty analysis of [16] to show that Adaptive DCNC is throughput-optimal and achieves average cost-delay tradeoff with probability 1 (w.p. 1) under any finite reconfiguration delay/cost.

The stability of Adaptive DCNC relies on the fact that it allows each node and link to adjust the frequency of reconfiguration according to its maximal queue length differential. In particular, Adaptive DCNC decreases the frequency of reconfiguration if the maximal queue length differential increases. This behavior may be characterized by the following lemma.

Lemma 1.

Suppose Assumptions 1 and 2 hold, and the cloud network is operated under Adaptive DCNC with parameter $V$ and sublinear function $f$. Given any fixed integer $T > 0$, if the maximal queue length differential at link $(i,j)$ at time $t$, $Q_{ij}^{\max}(t)$, is greater than a constant $\theta_{ij}(T)$ as defined below in (13), then link $(i,j)$ reconfigures at most once during $[t, t+T]$.

Similarly, if the maximal queue length differential at node $i$ at time $t$, $Q_i^{\max}(t)$, is greater than a constant $\theta_i(T)$ as defined below in (14), then node $i$ reconfigures at most once during $[t, t+T]$.

(13)
(14)

where $\mu_{\max}$, $d_{\max}$, $\xi_{\max}$, $r_{\max}$, and $A_{\max}$ are as defined in Lemma 2 below.

The proof of Lemma 1 is given in Appendix A.

With Lemma 1 limiting the frequency of reconfiguration, and with the weight differentials $\Delta W_{ij}(t)$, $\Delta W_i(t)$ bounded by local thresholds that grow sublinearly with the local maximal queue length differential, we then extend the drift-plus-penalty analysis of [16] to prove the following performance guarantee.

Theorem 2.

Suppose the arrival rate matrix $\boldsymbol{\lambda}$ is strictly interior to the capacity region $\Lambda(\mathcal{G})$, and suppose all reconfiguration delays and costs in $\mathcal{R}$ are finite. Then Adaptive DCNC stabilizes the cloud network, while achieving arbitrarily close to minimum average cost w.p. 1, i.e.,

$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{i,c} Q_i^c(t) \le \frac{B + V \big( h^*(\boldsymbol{\lambda} + \epsilon \mathbf{1}) - h^*(\boldsymbol{\lambda}) \big)}{\epsilon} \quad \text{w.p. 1}, \qquad (15)$$
$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} h(t) \le h^*(\boldsymbol{\lambda}) + \frac{B}{V} \quad \text{w.p. 1}, \qquad (16)$$

where $B$ is a constant that depends on the system parameters; $\epsilon$ is a positive constant satisfying $\boldsymbol{\lambda} + \epsilon \mathbf{1} \in \Lambda(\mathcal{G})$, and $\mathbf{1}$ is a matrix of all ones.

The proof of Theorem 2 is given in Appendix B.
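For intuition, the proof follows the standard Lyapunov drift-plus-penalty template of [16]; a sketch of the key inequality in our notation is below, where the constant $B$ absorbs the bounded one-slot queue changes and the reconfiguration overhead terms, and the comparison policy is a stationary randomized policy achieving $h^*(\boldsymbol{\lambda} + \epsilon\mathbf{1})$. This is an illustrative template, not the full argument.

```latex
% Drift-plus-penalty template (sketch). L(t) is a quadratic Lyapunov
% function of the queue backlogs; B absorbs bounded one-slot changes
% and reconfiguration overheads.
\begin{align*}
  L(t) &\triangleq \tfrac{1}{2}\sum_{i,c}\big(Q_i^c(t)\big)^2, \qquad
  \Delta(t) \triangleq \mathbb{E}\!\left[L(t+1)-L(t)\,\middle|\,\mathbf{Q}(t)\right],\\
  \Delta(t) + V\,\mathbb{E}\!\left[h(t)\,\middle|\,\mathbf{Q}(t)\right]
  &\le B + V\,h^*(\boldsymbol{\lambda}+\epsilon\mathbf{1})
       - \epsilon\sum_{i,c}Q_i^c(t).
\end{align*}
% Summing over t = 0, ..., T-1, dividing by T, and taking limits yields
% the queue bound (15); dropping the drift term instead yields (16).
```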

VI Simulations

Fig. 1: Abilene US network topology.

In this section, we present simulation results for the proposed Adaptive DCNC policy and compare it with benchmark policies. The same sublinear function $f$ is used for Adaptive DCNC in all simulations.

We consider a cloud network with network topology based on the Abilene US Network, as shown in Fig. 1. We assume that all nodes are clouds that can host all service functions. We further assume homogeneous resources: all nodes share the same sets of processing resource units, capacities, and costs, and all links share the same sets of transmission resource units, capacities, and costs.

We consider 2 services, each composed of 2 functions. Each service is requested by one source-destination pair. For Service 1, the source is in Seattle and the destination in New York; for Service 2, the source is in Sunnyvale and the destination in Atlanta. The arrival processes for both flows are i.i.d. Poisson with arrival rates denoted by $\lambda_1$ and $\lambda_2$, respectively. Throughout the simulations, we set both arrival rates to the same value, denoted by $\lambda$, i.e., $\lambda_1 = \lambda_2 = \lambda$.
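As a reproducibility aid, the following sketch sets up the service demands and i.i.d. Poisson arrivals used in this section. The edge list and all numeric values are illustrative placeholders in the spirit of the Abilene topology, not the paper's exact parameter values:

```python
import math
import random

# Illustrative placeholder edge list (NOT the exact Abilene edge list or
# the parameter values used in the paper); links are bidirectional.
edges = [("Seattle", "Sunnyvale"), ("Seattle", "Denver"),
         ("Sunnyvale", "Denver"), ("Denver", "KansasCity"),
         ("KansasCity", "Indianapolis"), ("Indianapolis", "Chicago"),
         ("Indianapolis", "Atlanta"), ("Chicago", "NewYork"),
         ("Atlanta", "Washington"), ("NewYork", "Washington")]
links = edges + [(v, u) for (u, v) in edges]

# Two services of two functions each, one source-destination pair per service.
services = {1: {"source": "Seattle",   "dest": "NewYork", "chain_len": 2},
            2: {"source": "Sunnyvale", "dest": "Atlanta", "chain_len": 2}}

def poisson_arrivals(lam: float, rng: random.Random) -> int:
    """One slot of i.i.d. Poisson(lam) arrivals (Knuth's method)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

rng = random.Random(0)
lam = 0.5  # placeholder common arrival rate: lambda_1 = lambda_2 = lambda
arrivals = {s: poisson_arrivals(lam, rng) for s in services}
```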

For ease of discussion, we separate cases where only reconfiguration delays exist and where only reconfiguration costs exist in the following subsections.

VI-A Reconfiguration Delay

We first consider the case of cloud networks with reconfiguration delay only; in other words, the reconfiguration costs are set to zero. In this subsection, we set the reconfiguration delay of all processing and transmission resources to the same value, denoted by $\delta$.

Fig. 2 compares the mean (time average) total queue length for DCNC and ADCNC under various flow arrival rates $\lambda$. The parameter $V$ is fixed to the same value for both policies. Given the topology and the processing and transmission capacity settings, there is a value of $\lambda$ at which the rate pair $(\lambda_1, \lambda_2)$ reaches the boundary of the capacity region; we consider arrival rates in the interval below this value. It is clear from the figure that when the reconfiguration delay is nonzero, DCNC loses throughput-optimality, and the maximum arrival rate it can stabilize shrinks as the reconfiguration delay increases. Note that while ADCNC shows smaller mean queue length than DCNC even when $\delta = 0$, ADCNC incurs slightly higher resource cost in this setting. Cost-delay tradeoff comparisons for input rates that both algorithms can stabilize are presented next.

Fig. 2: Mean total queue length for DCNC and ADCNC under various flow arrival rates, for a fixed parameter $V$.
Fig. 3: Mean cost versus mean queue length for DCNC and ADCNC under various reconfiguration delays, for a fixed arrival rate $\lambda$.

In Fig. 3, we plot the mean (time average) network cost versus the mean total queue length for DCNC and ADCNC under various reconfiguration delays, with the arrival rate $\lambda$ fixed. Note that for each curve, the control parameter $V$ tunes the tradeoff between network cost and total queue length. The closer a curve is to the lower-left corner, the better the performance (cost-delay tradeoff). Note that without reconfiguration delay ($\delta = 0$), DCNC and ADCNC have similar performance. As $\delta$ increases, the performance of both policies starts to degrade. Nevertheless, ADCNC always guarantees throughput-optimality, and is able to push the mean network cost arbitrarily close to minimum at the expense of increased mean queue length. In contrast, DCNC suffers significantly larger performance degradation as $\delta$ increases, and does not guarantee throughput-optimality. In fact, for the largest reconfiguration delay considered, DCNC cannot even stabilize the chosen arrival rate, hence the absence of the associated cost-delay curve.

Fig. 4: Mean fraction of time under reconfiguration for various parameter V.
Fig. 5: Mean total queue length for ADCNC and BMP under various flow arrival rates. The same sublinear function $f$ is used for both policies, and the parameter is set to $V = 0$ for ADCNC.

In Fig. 4, we further look into the reconfiguration behavior of both DCNC and ADCNC under various values of the control parameter $V$, with the reconfiguration delay $\delta$ and the arrival rate $\lambda$ fixed. The vertical axis represents the fraction of time that a given transmission/processing resource is under reconfiguration, averaged over all resources, i.e., the time overhead caused by the reconfiguration delay. We first notice that ADCNC spends much less time under reconfiguration, which is one of the key reasons ADCNC preserves throughput-optimality under finite reconfiguration delay. We then notice that while increasing the parameter $V$ helps reduce the reconfiguration overhead for both policies, DCNC spends a significantly higher fraction of time under reconfiguration even for large $V$.

To close the discussion of the reconfiguration delay case, we compare ADCNC with the Biased MaxPressure (BMP) policy proposed in [12]. Recall from Section III that BMP does not consider cost minimization, i.e., it addresses a special case of the cloud network model in this paper. In this special case, we can set the parameter $V = 0$ for ADCNC in order to ignore cost minimization and make a fair comparison with BMP. In Fig. 5, we show the delay performance of ADCNC and BMP under varying arrival rates and different reconfiguration delays. Since both policies are guaranteed to be throughput optimal, the delay remains finite for arrival rates up to the boundary of the capacity region. We can also see that while both policies have comparable delay performance, ADCNC performs slightly better, especially for small reconfiguration delay.

Fig. 6: Mean cost versus mean queue length for DCNC and ADCNC under various reconfiguration costs.

VI-B Reconfiguration Cost

In this subsection, we set the reconfiguration delay to zero, and set the reconfiguration cost of all processing and transmission resources to the same value, denoted by $\beta$. Since there is no reconfiguration delay, both DCNC and ADCNC are throughput-optimal and can support the same arrival rates. We hence focus our attention on comparing their cost-delay tradeoff performance.

Fig. 6 shows the cost-delay tradeoff achieved by DCNC and ADCNC under various reconfiguration costs $\beta$. We first notice that as the reconfiguration cost increases, DCNC can no longer achieve arbitrarily close to the minimum average cost, even when the parameter $V$ is tuned to endure large mean total queue length. On the other hand, Adaptive DCNC is able to achieve arbitrarily close to the minimum cost under any finite reconfiguration cost $\beta$.

VII Extensions

VII-A Adaptive DCNC Policy for Generalized Setting

In this subsection, we briefly discuss some interesting extensions to the current model that could be captured with slight modification of the analysis.

(1) Different reconfiguration delay/cost for resource allocation and commodity allocation: In this paper, we have assumed that the same reconfiguration delay and cost are incurred upon any change in either the allocation of resources or the commodity being processed/transmitted. In practice, different delays and costs can be associated with different reconfiguration operations. It is rather straightforward to show that ADCNC would preserve throughput and cost optimality for any finite values of such heterogeneous delays and costs by treating any change as incurring the maximum of such delays/costs. However, improved policies (in terms of cost-delay tradeoff) could be designed in such settings. In the next subsection, we provide an example heuristic variant of ADCNC policy to improve the cost-delay performance under this setting.

(2) Partial reconfiguration: In this paper, we consider a worst-case reconfiguration delay model in the sense that we assume complete unavailability of packet processing or transmission functionality at a node or link undergoing reconfiguration. In practice, there may be cases in which adding or removing resources without changing the allocated commodity only reduces the available processing or transmission rate to the minimum between the available rates before and after reconfiguration. Importantly, a throughput-optimal policy for this worst-case reconfiguration delay model will guarantee throughput optimality for any other less restrictive model. Improved policies (in terms of cost-delay tradeoff) for this setting are of interest for future work.

VII-B A Heuristic Variant of Adaptive DCNC Policy

In the previous subsection, we introduced a more general setting of the cloud network model in which different reconfiguration overheads are associated with different reconfiguration operations, i.e., resource reconfiguration and commodity reconfiguration. While the throughput optimality guarantee of Adaptive DCNC extends to this case as mentioned earlier, it is possible to improve the performance, i.e., the cost-delay tradeoff, by exploiting the unequal reconfiguration overheads. We now introduce a heuristic variant of the Adaptive DCNC policy as an example of how to improve the cost-delay performance.

Recall that the Adaptive DCNC policy reconfigures both the resource allocation and the scheduled commodity at the same time. This approach is reasonable when the reconfiguration overhead is the same for both operations, since when one reconfiguration operation is performed, the other can be performed at the same time without incurring additional overhead. However, when the reconfiguration overheads differ across operations, intuitively one may benefit from performing the reconfiguration operation with smaller overhead more frequently. For this reason, we modify the reconfiguration criterion in Adaptive DCNC to a two-stage criterion. The first stage is the same as the reconfiguration criterion in Adaptive DCNC, while the additional stage is used to decide whether to perform the reconfiguration operation with smaller overhead. With the additional stage of the reconfiguration criterion, we can expect the reconfiguration operation with smaller overhead to be performed more frequently.

Fig. 7: Mean cost versus mean queue length for ADCNC and ADCNC-2stage under various commodity reconfiguration delays. The resource reconfiguration delay is fixed.

To be more specific, consider an example where the resource reconfiguration overhead is larger than the commodity reconfiguration overhead. For the processing decision at each time instance $t$, each cloud network node $i$ first follows steps 1) and 2) as described in Section V-A to compute $W_i^*(t)$ and $\Delta W_i(t)$. Then at step 3), node $i$ first checks whether the criterion $\Delta W_i(t) > g_i(t)$ is met. If so, it reconfigures both the resource and commodity allocation; otherwise, it further checks the following. Node $i$ computes the weight differential restricted to changing only the scheduled commodity (keeping the resource allocation fixed), and compares it with a second threshold. If this restricted differential is above the threshold, node $i$ reconfigures the commodity (while the resource allocation remains the same); otherwise no reconfiguration is performed. We refer to this policy as the ADCNC-2stage policy, as it is a variant of ADCNC in which the reconfiguration criterion becomes a two-stage decision.
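A minimal sketch of the resulting two-stage decision logic is given below; `g1` corresponds to the ADCNC threshold $g_i(t)$, while `g2` is the second-stage threshold, a design parameter we introduce for illustration:

```python
def two_stage_reconfig(dW_full, dW_commodity_only, g1, g2):
    """Two-stage reconfiguration criterion of ADCNC-2stage.

    dW_full:           weight differential allowing both resource and
                       commodity reconfiguration (as in ADCNC)
    dW_commodity_only: weight differential when only the scheduled
                       commodity may change (resource allocation fixed)
    g1, g2:            thresholds; choosing g2 <= g1 lets the cheaper
                       commodity-only reconfiguration fire more often
    Returns one of "both", "commodity", "none".
    """
    if dW_full > g1:            # stage 1: same criterion as ADCNC
        return "both"
    if dW_commodity_only > g2:  # stage 2: cheaper operation only
        return "commodity"
    return "none"
```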

In Fig. 7, we show simulation results for ADCNC and ADCNC-2stage under different commodity reconfiguration delays, with the resource reconfiguration delay fixed. Again, for simplicity, we set all reconfiguration costs to zero. We first note that the performance of the ADCNC policy (solid lines) remains similar across these settings. This aligns with the interpretation that ADCNC treats the reconfiguration overhead as the maximum of the two different overheads. On the other hand, we can see that the ADCNC-2stage policy (dashed lines) exploits the smaller commodity reconfiguration delay and improves the cost-delay performance as the commodity reconfiguration delay becomes smaller.

VIII Conclusion

This paper addressed the dynamic control of network service chains in cloud networks with non-negligible resource reconfiguration delay and cost. We showed that while the capacity region and the minimum achievable time average cost remain unchanged regardless of the value of the reconfiguration delay or cost, the throughput and cost optimality of existing policies (designed for the regime without reconfiguration delay/cost) is compromised when reconfiguration delay/cost exists. We then proposed Adaptive DCNC, a distributed flow scheduling and resource allocation policy that controls the frequency of reconfiguration based only on local queue length information. We showed that ADCNC is throughput optimal and achieves a cost-delay tradeoff, and validated the result via numerical simulations.

References

  • [1] R. Cohen, L. Lewin-Eytan, J. S. Naor, and D. Raz, “Near optimal placement of virtual network functions,” in 2015 IEEE Conference on Computer Communications (INFOCOM), pp. 1346–1354, April 2015.
  • [2] M. Barcelo, J. Llorca, A. M. Tulino, and N. Raman, “The cloud service distribution problem in distributed cloud networks,” in 2015 IEEE International Conference on Communications (ICC), pp. 344–350, June 2015.
  • [3] H. Feng, J. Llorca, A. M. Tulino, A. F. Molisch, and D. Raz, “Approximation algorithms for the NFV service distribution problem,” in 2017 IEEE Conference on Computer Communications (INFOCOM), April 2017.
  • [4] A. Destounis, G. S. Paschos, and I. Koutsopoulos, “Streaming big data meets backpressure in distributed network computation,” in IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, pp. 1–9, April 2016.
  • [5] H. Feng, J. Llorca, A. M. Tulino, and A. F. Molisch, “Dynamic network service optimization in distributed cloud networks,” in 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 300–305, April 2016.
  • [6] H. Feng, J. Llorca, A. M. Tulino, and A. F. Molisch, “Optimal dynamic cloud network control,” in 2016 IEEE International Conference on Communications (ICC), pp. 1–7, May 2016.
  • [7] M. Mao and M. Humphrey, “A performance study on the vm startup time in the cloud,” in 2012 IEEE Fifth International Conference on Cloud Computing, pp. 423–430, June 2012.
  • [8] G. Celik, S. C. Borst, P. A. Whiting, and E. Modiano, “Dynamic scheduling with reconfiguration delays,” Queueing Syst. Theory Appl., vol. 83, pp. 87–129, June 2016.
  • [9] C. W. Chan, M. Armony, and N. Bambos, “Maximum weight matching with hysteresis in overloaded queues with setups,” Queueing Syst. Theory Appl., vol. 82, pp. 315–351, Apr. 2016.
  • [10] C. H. Wang and T. Javidi, “Adaptive policies for scheduling with reconfiguration delay: An end-to-end solution for all-optical data centers,” IEEE/ACM Transactions on Networking, vol. 25, pp. 1555–1568, June 2017.
  • [11] L. Tassiulas, “Adaptive back-pressure congestion control based on local information,” IEEE Transactions on Automatic Control, vol. 40, pp. 236–250, Feb 1995.
  • [12] P.-C. Hsieh, X. Liu, J. Jiao, I.-H. Hou, Y. Zhang, and P. R. Kumar, “Throughput-optimal scheduling for multi-hop networked transportation systems with switch-over delay,” in Proceedings of the 18th ACM International Symposium on Mobile Ad Hoc Networking and Computing, Mobihoc ’17, (New York, NY, USA), pp. 16:1–16:10, ACM, 2017.
  • [13] S. Paris, A. Destounis, L. Maggi, G. S. Paschos, and J. Leguay, “Controlling flow reconfigurations in sdn,” in IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, pp. 1–9, April 2016.
  • [14] S. Krishnasamy, A. P. T., A. Arapostathis, S. Shakkottai, and R. Sundaresan, “Augmenting max-weight with explicit learning for wireless scheduling with switching costs,” in IEEE INFOCOM 2017 - The 36th Annual IEEE International Conference on Computer Communications, pp. 1–9, May 2017.
  • [15] Y. Xie, Z. Liu, S. Wang, and Y. Wang, “Service Function Chaining Resource Allocation: A Survey,” ArXiv e-prints, July 2016.
  • [16] M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan and Claypool Publishers, 2010.
  • [17] M. J. Neely, “Stability and probability 1 convergence for queueing networks via lyapunov optimization,” Journal of Applied Mathematics, 2012.

In the following, given a time $t$, we denote by $(\hat{c}_{ij}(t), \hat{k}_{ij}(t))$ the transmission resource-commodity pair scheduled at time $t$ on link $(i,j)$, and by $(\hat{c}_i(t), \hat{k}_i(t))$ the processing resource-commodity pair scheduled at time $t$ at node $i$. In addition, we denote by $(c_{ij}^*(t), k_{ij}^*(t))$ the transmission resource-commodity pair that maximizes the weight $W_{ij}^*(t)$ at time $t$ on link $(i,j)$, and by $(c_i^*(t), k_i^*(t))$ the processing resource-commodity pair that maximizes the weight $W_i^*(t)$ at time $t$ at node $i$.

Appendix A: Proof of Lemma 1

Proof.

We prove the result by contradiction. Suppose that, under the assumptions of Lemma 1, there are two or more reconfigurations within the time period $[t, t+T]$. Then we may select two consecutive reconfiguration instances $t_1 < t_2$ within this period, with $t_2 - t_1 \le T$.

Before we proceed with the proof, we state the following lemmas, which will be useful in what follows. The proofs of these lemmas are given in Appendix C.

Lemma 2.

Given Assumption 2, for any commodities $c_1, c_2$, any nodes $i, j$, any weight $\xi \in \{1\} \cup \{\xi^{(\phi,m)}\}$, and any $t$, the (weighted) queue length differential between $Q_i^{c_1}$ and $Q_j^{c_2}$ can change only by a finite amount over one time slot, given as

$$\big| \big( Q_i^{c_1}(t+1) - \xi\, Q_j^{c_2}(t+1) \big) - \big( Q_i^{c_1}(t) - \xi\, Q_j^{c_2}(t) \big) \big| \;\le\; B \,\triangleq\, \big( 1 + \max\{1, \xi_{\max}\} \big) \big[ (d_{\max} + 1 + \xi_{\max})\, \mu_{\max} + A_{\max} \big], \qquad (17)$$

where $\mu_{\max}$ is the maximum transmission or processing rate, and $d_{\max}$ is the maximum number of incoming or outgoing links over all nodes. We also take $\xi_{\max} \triangleq \max_{(\phi,m)} \xi^{(\phi,m)}$ and $r_{\max} \triangleq \max_{i,(\phi,m)} r_i^{(\phi,m)}$.

Similarly, the change in the maximal queue length differential for transmission on link $(i,j)$ over one time slot is bounded as

$$\big| Q_{ij}^{\max}(t+1) - Q_{ij}^{\max}(t) \big| \le B, \qquad (18)$$

and the change in the maximal queue length differential for processing at node $i$ over one time slot is bounded as

$$\big| Q_i^{\max}(t+1) - Q_i^{\max}(t) \big| \le B. \qquad (19)$$
Lemma 3.

Given Assumption 1 and any fixed $V \ge 0$, define $W(x) \triangleq \max_k \big[ C_k\, x - V ( w_k + e\, C_k ) \big]^+$, where $\{(C_k, w_k)\}$ are the capacities and costs of a given node or link and $e$ its per-unit flow cost. Then,

  (a) $W(x)$ is Lipschitz continuous with Lipschitz constant $C_{\max} \triangleq \max_k C_k$.

  (b) If $x > V \min_k ( w_k / C_k + e )$, then $W(x) > 0$ and the maximizing $k$ satisfies $k \ge 1$; otherwise, $W(x) = 0$ and the maximizer is $k = 0$.

In the following, we show that under the assumption of Lemma 1 (i.e., $Q_{ij}^{\max}(t) > \theta_{ij}(T)$), the weight differential at time $t_2$ cannot exceed the threshold, i.e., $\Delta W_{ij}(t_2) \le g_{ij}(t_2)$. This hence contradicts the assumption that Adaptive DCNC reconfigures at time $t_2$.

To do this, starting from (7) we rewrite the transmission weight differential as:

$$\Delta W_{ij}(t_2) = W_{ij}^*(t_2) - \hat{W}_{ij}(t_2) = \Delta_1 + \Delta_2,$$

with

$$\Delta_1 \triangleq W_{ij}^*(t_2) - W_{ij}^*(t_1)$$

and

$$\Delta_2 \triangleq W_{ij}^*(t_1) - \hat{W}_{ij}(t_2), \qquad (20)$$

where in (20) we have used the fact that, given the assumption that Adaptive DCNC reconfigures at time slots $t_1$ and $t_2$, during $(t_1, t_2]$ the resource allocation and the transmitted commodity remain $k_{ij}^*(t_1)$ and $c_{ij}^*(t_1)$, respectively; hence $\hat{W}_{ij}(t_2)$ evaluates the schedule selected at $t_1$ at the queue state of time $t_2$.

Now, using Lemma 3 (a) together with (18), we have

$$|\Delta_1| = \big| W_{ij}^*(t_2) - W_{ij}^*(t_1) \big| \le C_{\max}\, \big| Q_{ij}^{\max}(t_2) - Q_{ij}^{\max}(t_1) \big| \le C_{\max}\, B\, (t_2 - t_1). \qquad (21)$$

On the other hand we have that: