Design and Analysis of Dynamic Auto Scaling Algorithm (DASA) for virtual EPC (vEPC) in 5G Networks
Abstract
Network Function Virtualization (NFV) enables mobile operators to virtualize their network entities as Virtualized Network Functions (VNFs), offering fine-grained, on-demand network capabilities. VNFs can be dynamically scaled in/out to meet the performance requirements of future 5G networks. However, designing an auto-scaling algorithm with low operation cost and low latency while considering the capacity of legacy network equipment is a challenge. In this paper, we propose a VNF Dynamic Auto Scaling Algorithm (DASA) that considers the trade-off between performance and operation cost. We also develop an analytical model to quantify the trade-off and validate the analysis through extensive simulations. The system is modeled as a queueing system in which legacy network equipment is treated as a reserved block of servers, while VNF instances are powered on and off according to the number of job requests. The results show that the proposed DASA can significantly reduce operation cost given a latency upper bound. Moreover, the models provide a quick way to evaluate the cost-performance trade-off without wide deployment, which saves both cost and time.
1 Introduction
Cellular networks have evolved to the 4th generation (4G). Long Term Evolution-Advanced (LTE-A) has become a commonly used communication technology worldwide and is continuously expanding and evolving toward the 5th generation (5G). A recent report states that 95 operators have commercially launched LTE-A networks in 48 countries and that total smartphone traffic is expected to rise 11-fold from 2015 to 2021 [1]. Accordingly, operators are improving their network infrastructure to increase capacity and meet the demand for fast-growing data traffic in 5G networks.
One of the most important technologies for 5G networks is to utilize Network Function Virtualization (NFV) to virtualize the network components in the core network, which is called the Evolved Packet Core (EPC). The virtualized EPC is commonly referred to as virtual EPC (vEPC) [2]. The emergence of NFV enables operators to manage their network equipment in a fine-grained and efficient way [3]. Indeed, legacy network infrastructure suffers from the fact that data traffic usually peaks during the day while utilization is relatively low the rest of the time (e.g., at midnight). To guarantee the Quality of user Experience (QoE), operators usually deploy network equipment with spare capacity to handle peak traffic. Accordingly, the network equipment is under-utilized during non-busy periods. NFV enables operators to virtualize hardware resources. It also moves special-purpose network equipment toward software solutions, i.e., Virtualized Network Function (VNF) instances. A VNF instance can run on several Virtual Machines (VMs), which can be scaled out/in to adjust the VNF's computing and networking capabilities to save both energy and resources. Although this idea is only now being applied to cellular networks, it has long been used in the cloud computing community. A classic case is Animoto, an image-processing service provider, which experienced a surge in demand from 50 VM instances to 4,000 VM instances (Amazon EC2 instances) in three days in April 2008. After the peak, the demand fell sharply back to an average level [4]. Animoto paid for the 4,000 instances only during the peak period. In future 5G networks, it is expected that there will be heterogeneous types of traffic, including traffic from Human-to-Human (H2H) and Machine-to-Machine (M2M) communications. With such diverse traffic types, it is very likely that cases similar to Animoto's will also occur in future 5G networks.
Given that auto-scaling VNF instances can decrease operation cost while meeting the demand for VNF service, it is critical to design good strategies that allocate VNF instances adaptively to fulfill service requirements. However, this is not a trivial task. Specifically, the operation cost is reduced by decreasing the number of powered-on VNF instances. On the other hand, resource under-provisioning may cause Service Level Agreement (SLA) violations. Therefore, the goal of a desirable strategy is to reduce operation cost while maintaining acceptable levels of performance. Thus, a cost-performance trade-off is formed: the VNF performance is improved by scaling out VNF instances, while the operation cost is reduced by scaling in VNF instances.
Given that legacy equipment is usually expensive in cellular networks, network operators usually keep network equipment powered on all the time and try to use it as long as they can to maximize Return On Investment (ROI). Thus, most operators nowadays operate old-generation and new-generation systems simultaneously. For example, many operators offer 3G and 4G services at the same time. In future 5G systems, virtualized resources would be added to boost the performance of legacy 4G systems, leading to the evolution from EPC to vEPC (details will be discussed in Section 2). It is expected that 5G and legacy systems will coexist.
In this paper, we study the cost-performance trade-off while considering both the VM setup time and the legacy equipment capacity. The works in [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17] either ignore VM setup time or consider only the virtualized resources while overlooking legacy (fixed) resources. However, this is not practical for future 5G cellular networks, for the following reasons:

Although a scale-out request can be sent right away, a VNF instance cannot be available immediately. The lag time could be as long as 10 minutes or more to start an instance in Microsoft Azure, and it can vary from time to time [18]. If the lag time is not taken into consideration, the instance may come up too late to serve the VNF's load.

The capacity of legacy network equipment is another issue. For example, suppose a network operator with legacy network equipment wants to increase network capacity by using the NFV technique. The desired solution should consider the capacities of both the legacy network equipment and the VNFs. If the capacity of the legacy network equipment equals that of one VNF instance, scaling out from one VNF instance to two VNF instances increases the total capacity substantially. However, if the capacity of the legacy network equipment equals that of 100 VNF instances, adding one more VNF instance grows the total capacity by only about 1%. Current cloud auto-scaling schemes usually ignore this problem, which is called the non-constant issue [19]. In other words, the capacity of legacy network equipment has a significant impact on the desired auto-scaling solution for future 5G systems.
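To make the non-constant issue concrete, the relative gain from one additional instance is simply the reciprocal of the current capacity; a minimal sketch (plain arithmetic, not data from the paper):

```python
# Arithmetic behind the "non-constant" issue: the relative gain from one
# extra VNF instance shrinks as the system's current capacity (counted in
# VNF-instance equivalents) grows.

def relative_gain(m: int) -> float:
    """Relative capacity increase from adding one VNF instance to a
    system whose current capacity equals m VNF instances."""
    return 1.0 / m

for m in (1, 10, 100):
    print(f"current capacity = {m:>3} instances -> +{relative_gain(m):.1%}")
```

With a legacy block worth 100 instances, one extra instance adds only 1%, which is why the legacy capacity must enter the scaling decision.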
In this paper, we propose the Dynamic Auto Scaling Algorithm (DASA) to solve these problems. In the proposed DASA, the legacy 4G network equipment is powered on all the time as a block, while virtualized resources (VNF instances) are added to or removed from the system dynamically. To the best of our knowledge, this setting has not been discussed in previous literature. The VNF instances are scaled in and out depending on the number of jobs in the system. A critical issue is how to specify a suitable number of VNF instances for the cost-performance trade-off. We propose detailed analytical models to answer this question. The cost-performance trade-off is quantified by an operation cost metric and a performance metric, for which closed-form solutions are derived and validated against extensive discrete-event simulations. Moreover, we develop a recursive algorithm that reduces the computational complexity of solving the model from cubic to linear in the number of states of the underlying Markov chain. Without this algorithm, it is difficult to solve the problem in a short time. Our models are widely applicable in various scenarios and therefore have important theoretical significance. Furthermore, this work offers network operators guidelines to design an optimal VNF auto-scaling strategy based on their management policies in a systematic way.
The rest of this paper is organized as follows. In Section 2, we briefly introduce some background on mobile networks and the NFV architecture. Section 3 reviews the related work. Challenges and contributions are addressed in Section 4. In Section 5, we present the proposed DASA for VNF auto-scaling applications, followed by numerical results illustrated in Section 6. Section 7 concludes this paper.
2 Background
A cellular network is typically composed of a Radio Access Network (RAN) and a Core Network (CN), as shown in Fig. 1 [20]. Starting from Release 8 in 3GPP, the RAN and CN are referred to as Evolved-UTRAN (E-UTRAN) and EPC, respectively. The main target of the NFV we consider in this paper is to virtualize the functions in the EPC. Here, we use an example to explain the EPC and the vEPC when NFV is deployed. Fig. 1 shows a simplified example of an NFV-enabled LTE architecture, which consists of the RAN, the EPC, and an external Packet Data Network (PDN). In particular, the EPC is composed of the legacy EPC and the vEPC. In the following, we briefly introduce them.
2.1 Legacy EPC
In E-UTRAN, a User Equipment (UE) connects to the EPC through an eNB, which essentially is a base station. Here, we only show the basic functions in the EPC, including the Serving Gateway (SGW), PDN Gateway (PGW), Mobility Management Entity (MME), and Policy and Charging Rules Function (PCRF). Please refer to [21] for details.
The PGW is a gateway providing connectivity between the EPC and an external PDN. The SGW is responsible for user data functions, enabling routing and packet forwarding to the PGW. The MME handles UE mobility and other control functions. The PCRF is a policy and charging control element for policy enforcement, flow-based charging, and service data flow detection.
2.2 vEPC
Generally, the vEPC can be divided into two main components: NFV Management and Orchestration, and the Element Manager (EM), which are parts of the 3GPP Management Reference Model [22].
2.2.1 NFV Management and Orchestration
The NFV Management and Orchestration consists of the orchestrator, the VNF manager, and the Virtualized Infrastructure Manager (VIM). It controls the lifecycles of VNFs and decides whether a VNF should be scaled out/in/up/down. Additionally, it manages both hardware and software resources to support VNFs. In other words, it can be considered a bridge between network resources and VNFs. The actions of VNF scaling are described as follows.

VNF scale-in/out: As shown in Fig. 2, a VNF can have many VNF instances, each of which may contain many VMs. VNF scale-out refers to increasing the number of VNF instances. In contrast, VNF scale-in removes existing VNF instances, in the sense that their virtualized hardware resources are freed and no longer needed.

VNF scale-up/down: VNF scale-up allocates more VMs to an existing VNF instance, whereas VNF scale-down releases some VMs from an existing VNF instance.
2.2.2 Element Manager (EM)
3GPP introduces NFV management functions and solutions for mobile core networks based on the ETSI NFV specification [22]. In the vEPC, each Network Element (NE) in the legacy EPC, such as the SGW, PGW, MME, and PCRF, is virtualized as a VNF. As shown in Fig. 1, a Network Manager (NM) provides end-user functions for network management for each NE. An Element Manager (EM) is responsible for the management of a set of NMs.
2.3 VNF Instance Scaling Procedures
The VNF manager allocates resources by using two scaling procedures: the VNF instance expansion (scale-out and scale-up) procedure to add resources to a VNF, and the VNF instance contraction (scale-in and scale-down) procedure to release resources from a VNF.
2.3.1 VNF instance expansion procedure
Fig. 3 illustrates the VNF instance expansion procedure. Here we briefly describe the flows. Please refer to [22] for details.

Step 1: The NM/EM (via the NFV Orchestrator, NFVO) sends a capability expansion request to the VNF Manager; see 1(a), 1(b), and 1(c).

Step 2: The VNF Manager sends a lifecycle change notification to the EM and the NFVO, indicating the start of the scaling operation.

Steps 3–14: The VNF Manager sends a request to the NFVO for the VNF expansion. The NFVO then checks whether free resources are available and sends an ACK/NACK to the VNF Manager for the VNF expansion.

Step 15: The EM configures the VNF with application-specific parameters.

Step 16: The EM notifies the NM of the newly updated and configured capacity.
2.3.2 VNF instance contraction procedure
The idea of the contraction procedure is similar to that of the expansion procedure. Please refer to [22] for details.
3 Related Work
In the cloud computing community, auto-scaling strategies have been studied intensively [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 23, 24, 25, 26, 27]. To deal with the delay in VM setup, researchers have proposed various approaches to predict the VM load in order to boot VMs before existing VMs become overloaded. The approaches include the Exponential weighted Moving Average (EMA) [5, 6], AutoRegressive Moving Average (ARMA) [7, 8], AutoRegressive Integrated Moving Average (ARIMA) [9, 10, 11, 12], machine learning [13, 14, 15], Markov models [16, 17], and queueing models [23, 24, 25, 26, 27].
The basic idea of EMA, ARMA, and ARIMA is the moving average, where the most recent input data within a moving window are used to predict the next input data. Specifically, in [5], the authors proposed an EMA-based scheme to predict the CPU load. The scheme was implemented in a Domain Name System (DNS) server, and evaluation results showed that the capacities of the servers are well utilized. The authors of [6] introduced a novel prediction-based dynamic resource allocation algorithm to scale video transcoding service in the cloud. They used a two-step prediction to predict the load, resulting in a reduced number of required VMs.
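The moving-average idea behind these predictors can be sketched in a few lines; the following EMA predictor is a generic illustration, with a hypothetical smoothing factor and load trace rather than the exact scheme of [5]:

```python
# A minimal sketch of EMA-based load prediction as used by proactive
# auto-scalers: the next estimate blends the newest observation with the
# previous estimate. The smoothing factor and trace are assumptions.

def ema_predict(loads, alpha=0.3):
    """Return one-step-ahead EMA predictions for a load time series."""
    estimate = loads[0]
    predictions = []
    for load in loads:
        predictions.append(estimate)  # prediction made before seeing `load`
        estimate = alpha * load + (1 - alpha) * estimate
    return predictions

cpu_load = [20, 22, 35, 60, 58, 40, 30]  # hypothetical CPU-load samples (%)
print(ema_predict(cpu_load))
```

An auto-scaler would compare each prediction against a utilization threshold and start booting VMs early enough to absorb the setup delay.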
ARMA adds an autoregressive (AR) component to the moving average. A resource allocation algorithm based on the ARMA model was reported in [7], where empirical results showed significant benefits to both cloud users and cloud service providers. In [8], the authors addressed a load forecasting model based on ARMA, which achieved a low prediction error rate and saved hardware resources compared with a random content-based distribution policy.
Unlike ARMA, ARIMA additionally differences the input data. The authors of [9] proposed a predictive and elastic cloud bandwidth auto-scaling system considering multiple data centers; this is the first work on linear scaling across multiple cloud service providers. The work of [10] took the VM migration overhead into account when designing its auto-scaling scheme, and extensive experiments were conducted to demonstrate the performance. In [11], the new problem of dynamic workload fluctuations of each VM and resource conflict handling was addressed. The authors further proposed an ARIMA-based server state predictor to adaptively allocate resources to VMs. Experiments showed that the state predictor achieved excellent prediction results. Another ARIMA-based workload prediction scheme was proposed in [12], where real traces of requests to web servers from the Wikimedia Foundation were used to evaluate its prediction accuracy. The results showed that the model achieved high accuracy.
Machine learning approaches have also been used for the design of cloud auto-scaling algorithms [13, 14, 15]. The authors of [13] proposed an auto-scaling algorithm based on neural networks and linear regression. The author of [14] implemented a Bayesian-network-based cloud auto-scaling algorithm. In [15], the authors evaluated three machine learning approaches: linear regression, neural networks, and Support Vector Machines (SVM). Their results showed that the SVM-based scheme outperforms the other two.
Markov models have also been widely used in cloud auto-scaling algorithms [16, 17]. The authors of [16] developed CloudScale, an automatic elastic resource scaling system for multiple cloud service providers, which reduces total energy consumption and workload energy consumption with little impact on application performance. In [17], the authors proposed a novel multiple-time-series approach based on the Hidden Markov Model (HMM). The technique characterizes the temporal correlations in the discovered VM clusters well in order to predict variations of workload patterns.
However, the mechanisms in [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17] either ignore VM setup time or consider only the virtualized resources while overlooking legacy (fixed) resources. As discussed above, this is not practical for future 5G cellular networks. Perhaps the closest models to ours are those studied in [23, 24, 25, 26, 27], in which both the capacities of fixed legacy network equipment and dynamically auto-scaled cloud servers are considered. The authors of [23, 24] considered setup time without defections [23] and with defections [24]. Our recent work [26] relaxes the assumption in [23, 24] such that after a setup time, all of the cloud servers in the block are active concurrently; we further consider a more realistic model in which each server has an independent setup time. However, in [23, 24, 26], all of the cloud servers were treated as a whole block, which is not practical because each cloud server should be allowed to scale out/in individually and dynamically. In [25, 27], this was relaxed to sub-blocks rather than treating all cloud servers as a whole block. However, either the setup time is ignored [25] or the legacy network capacity is not considered [27].
4 Challenges and Contributions
In this section, we summarize the challenges. We also discuss how we tackle them and highlight our contributions.

The first challenge lies in the trade-off between the operation cost and system performance, which is referred to as the cost-performance trade-off. Keeping redundant powered-on VNF instances increases system performance and QoE: when a job arrives at the system, a redundant powered-on VNF instance can serve the job immediately, which reduces job waiting time. On the other hand, redundant powered-on VNF instances lead to extra operation cost. In this paper, we develop an analytical model to quantify this trade-off. Given the analytical model, operators can quickly obtain the operation cost and system performance without real deployment, saving cost and time.

The second challenge is how to account for the capacity of legacy equipment and how to choose a suitable number of VNF instances to balance the cost-performance trade-off. As discussed above, when the capacity of the legacy equipment is counted as 10 VNF instances, a newly powered-up VNF instance increases the system capability by about 10%, whereas only about 1% is added if the legacy equipment is counted as 100 VNF instances. In addition, the power-up process is not instantaneous: during the setup process, a VNF instance consumes power but cannot serve jobs. Based on our proposed analytical model, one can easily obtain the impact of the capacity of legacy equipment and choose the number of VNF instances that minimizes the cost function.

The third challenge is to propose a lightweight analytical model to quantify the trade-off. In general, the computational cost of solving a Markov chain with K states by a naive algorithm is O(K³). Thus, when K is large, it is difficult to solve in a short time. In our analytical model, we propose a novel recursive algorithm that reduces the computational complexity from O(K³) to O(K), i.e., linear in the number of states of the Markov chain. The reduction is significant.

Another challenge is how to adjust the auto-scaling algorithm for different weighting factors for operation cost and system performance. Because different mobile operators may have different management policies and operational interests, the weighting factors should be determined by each mobile operator. Adjusting the algorithm according to different weighting factors is critical and non-trivial. Our proposed auto-scaling algorithm takes the weighting factors into consideration.
5 Proposed Dynamic AutoScaling Algorithm (DASA)
In this section, we first introduce the system model and then discuss the proposed DASA. The parameters used in the model are listed in Table 1.
5.1 System Model
We consider a 5G EPC comprised of both legacy network entities (e.g., MME, PCRF) and VNFs. A VNF, consisting of VNF instances, offers fine-grained on-demand network capabilities to its corresponding legacy network entity. As shown in Fig. 4, we assume that the capacity of the legacy network entity equals that of m VNF instances. The total capacity of the system is c VNF instances (c = m + n), which can be adjusted adaptively through n, the number of dynamic VNF instances. The system can accommodate at most K jobs, where K ≥ c. User requests arrive with rate λ. A VNF instance accepts one job at a time with service rate μ. There is a limited First-Come-First-Served (FCFS) queue for those requests that have to wait to be processed. The legacy network equipment is always on, while VNF instances are added (or removed) according to the number of waiting jobs in the queue. It is worth mentioning that a VNF instance needs some setup time before it is available to process waiting requests. During the setup time, the VNF instance consumes power (the power may include server operation cost or VM cost charged by a cloud service provider) but cannot serve jobs.
Table 1. Parameters used in the model.
c: the total capacity of the system
K: the maximum number of jobs that can be accommodated in the system
n: the number of VNF instances
E[T]: average response time per job
E[W_q]: average response time in the queue per job
E[C]: average VNF cost
w_1: weighting factor for E[W_q]
w_2: weighting factor for E[C]
m: the capacity of a legacy network entity
u_j: the up threshold to control the VNF instances
d_j: the down threshold to control the VNF instances
λ: job arrival rate
μ: service rate for each VNF instance
α: setup rate for each VNF instance
5.2 Cost Function
Our goal is to design the best auto-scaling strategy to minimize operation cost while providing acceptable levels of performance. We use two thresholds, up and down, denoted u_j and d_j, to control the j-th VNF instance, where d_j < u_j.

u_j, power up the j-th VNF instance: If the j-th VNF instance is turned off and the number of requests in the system increases from u_j − 1 to u_j, the VNF instance is powered up, after a setup time, to support the system. During the setup time, a VNF instance cannot serve user requests but consumes power (or money for renting cloud services). Here, we specify u_j = m + j.

d_j, power down the j-th VNF instance: If the j-th VNF instance is operative and the number of requests in the system drops from d_j + 1 to d_j, the VNF instance is powered down instantaneously. Here, we define d_j = u_j − 1 = m + j − 1.
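The two threshold rules above can be sketched as a scale-out/scale-in decision. The concrete thresholds in the code follow our reading of the model (an idle dynamic instance is released immediately) and are assumptions, not quoted values:

```python
# Sketch of the threshold rule: the j-th dynamic VNF instance is powered
# up when the number of jobs in the system rises to an up threshold, and
# powered down when it drops to a down threshold one below it. The
# thresholds m + j (up) and m + j - 1 (down) are modelling assumptions.

def scaling_decision(jobs: int, powered_on: int, m: int, n: int) -> int:
    """Return the target number of powered-on dynamic VNF instances."""
    target = powered_on
    while target < n and jobs >= m + target + 1:  # jobs reached next up threshold
        target += 1                               # scale out (instance enters setup)
    while target > 0 and jobs <= m + target - 1:  # jobs fell to down threshold
        target -= 1                               # scale in (instance powered off)
    return target

# m = 10 legacy-equivalent servers, up to n = 5 dynamic instances
print(scaling_decision(jobs=13, powered_on=0, m=10, n=5))  # -> 3
print(scaling_decision(jobs=9, powered_on=4, m=10, n=5))   # -> 0
```

Note the asymmetry described in the text: scaling in takes effect instantaneously, while a scaled-out instance still needs a setup time before it can serve jobs.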
While powering the VNF instances up and down, the question is how many VNF instances we need such that the cost is minimized while the required level of performance is also met. Here, the cost function is evaluated by two metrics: the average response time in the queue per request, E[W_q], and the average cost of VNF instances, E[C]. The mathematical formulation of the cost function (a variant of the cost function with more parameters can be found in the Appendix) can be written as:
min_n F(n) = w_1 E[W_q] + w_2 E[C],  (1)
subject to E[W_q] ≤ β,
where β is the upper bound of E[W_q], which can be determined by mobile operators according to their business policies. The coefficients w_1 and w_2 denote the weighting factors for E[W_q] and E[C], respectively. Increasing w_1 (or w_2) places more emphasis on E[W_q] (or E[C]). Here, we do not specify w_1 or w_2 because such values should be determined by mobile operators with their management policies taken into consideration. An algorithm for finding the optimal solution once w_1 and w_2 are specified is introduced in Section 5.4.
5.3 Derivation of Cost Function
We model the system as a queueing system with c servers divided into two blocks: a fixed block and a dynamic block. The m servers in the fixed block are always on (corresponding to the capacity of the legacy equipment). The dynamic block consists of the n VNF instances, each of which is in either the BUSY, OFF, or SETUP state. The queueing model has a capacity of K, i.e., the maximum number of jobs that can be accommodated in the system is K. Job arrivals follow a Poisson process with rate λ. A VNF instance, which is referred to as a server in the queueing system, accepts one job at a time, and its service time follows an exponential distribution with rate μ. There is a limited FCFS queue for those jobs that have to wait to be processed.
In the dynamic block, a server is turned off immediately if it has no job to serve. Upon arrival of a job, an OFF server is turned on if the job has to be placed in the buffer. However, a server needs some setup time to become active in order to serve waiting jobs. We assume that the setup time follows an exponential distribution with mean 1/α. Let j denote the number of customers in the system and i the number of active servers in the dynamic block; the number of servers in the SETUP state is then min(n − i, max(0, j − m − i)). We assume that waiting jobs are served in FCFS order. We call this model an M/M/c/K queue with setup time.
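Under the assumptions just stated, the model can be simulated directly as a continuous-time Markov chain. The following Gillespie-style sketch (our reconstruction of the transition rates, not the paper's ns-2 code) estimates the mean number of jobs:

```python
import random

# Event-by-event simulation of the queueing model as we read it: m
# always-on servers, n dynamic VNF instances with exponential setup
# (rate alpha), system capacity K jobs, and an idle dynamic instance
# switching off at once. State (i, j) = (active dynamic instances, jobs
# in system). The rates below are modelling assumptions.

def simulate_mean_jobs(lam, mu, alpha, m, n, K, horizon=10_000.0, seed=1):
    """Estimate E[N], the time-averaged number of jobs in the system."""
    rng = random.Random(seed)
    t, i, j = 0.0, 0, 0
    area = 0.0                                    # time-integral of j
    while t < horizon:
        in_setup = min(n - i, max(0, j - m - i))  # instances booting up
        arrival = lam if j < K else 0.0           # arrivals blocked when full
        service = min(j, m + i) * mu              # busy servers
        setup_done = in_setup * alpha
        total = arrival + service + setup_done
        dt = rng.expovariate(total)
        area += j * dt
        t += dt
        r = rng.random() * total                  # pick the next event
        if r < arrival:
            j += 1
        elif r < arrival + service:
            j -= 1
            if i > 0 and j < m + i:               # a dynamic instance idles
                i -= 1                            # ... and is switched off
        elif in_setup > 0:
            i += 1                                # one setup completes
    return area / t

print(simulate_mean_jobs(lam=8.0, mu=1.0, alpha=0.5, m=10, n=5, K=30))
```

Such a simulator is useful as an independent check of the recursive analysis below, in the same spirit as the paper's simulation-based validation.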
Here, we present a recursive algorithm to calculate the joint stationary distribution. Let i and j denote the number of active servers in the dynamic block and the number of customers in the whole system, respectively. It is easy to see that (i, j) forms a Markov chain on the state space:
S = {(0, j) : 0 ≤ j ≤ K} ∪ {(i, j) : 1 ≤ i ≤ n, m + i ≤ j ≤ K}.  (2)
Fig. 5 shows the transitions among the states for a small example configuration of m, n, and K. Let
π_{i,j} = lim_{t→∞} P(I(t) = i, J(t) = j), (i, j) ∈ S,  (3)
denote the joint stationary distribution of the chain, where I(t) and J(t) are the number of active dynamic servers and the number of customers at time t. Here, we derive a recursion to calculate the joint stationary distribution π_{i,j}, (i, j) ∈ S.
First, we consider a recursion for the level-0 probabilities π_{0,j} (0 ≤ j ≤ K). The balance equations for the states with i = 0 are given as follows:
(4)  
(5)  
(6) 
leading to:
(7) 
The sequence is given as follows:
(8) 
and
(9)  
where
Furthermore, it should be noted that the first level-1 probability is calculated using the local balance equation for the flows in and out of the corresponding set of states, as follows:
(10) 
Remark.
We have expressed the level-0 probabilities π_{0,j} (0 ≤ j ≤ K) and the first level-1 probability in terms of π_{0,0}.
We now consider the general case π_{i,j} for 1 ≤ i ≤ n − 1. Lemma 1 below shows that, for a fixed i, π_{i,j} can be expressed in terms of the first probability of level i. As a result, every π_{i,j} (1 ≤ i ≤ n − 1) is expressed in terms of π_{0,0}.
Lemma 1.
We have:
(11) 
where
(12)  
(13) 
and
(14) 
Proof.
The balance equation for the boundary state of level i is given as follows:
(15) 
This shows that Lemma 1 is true for the base case. Assuming that:
(16) 
Substituting this into the next balance equation:
(17)  
we obtain:
(18) 
∎
Remark.
In Corollary 1 below, we show that the recursion coefficients are positive. Thus, the recursive algorithm is stable because it manipulates only positive numbers. Furthermore, we can also prove that the coefficient sequence is bounded from above. Although we cannot obtain an explicit upper bound, numerical experiments show that the values do not become large; one reason may be that the leading coefficient in (19) is less than 1. These bounds are the rationale for the numerical stability of our recursive algorithm, because we deal with numbers that are not too large, so overflow is avoided.
Corollary 1.
We have the following bound.
(19)  
(20) 
for .
Proof.
It should be noted that the first probability of the next level is calculated using the following local balance equation for the flows in and out of the corresponding set of states:
(22) 
as follows:
(23) 
Remark.
We have expressed the level-i probabilities and the first level-(i + 1) probability in terms of π_{0,0}.
Finally, we consider the case i = n. The balance equation for the states (n, j) leads to Lemma 2.
Lemma 2.
We have:
(24) 
where
(25)  
(26)  
and
(27) 
Proof.
The global balance equation in the boundary state of level n is given by:
(28) 
leading to:
(29) 
Assuming that the statement holds for some j, it follows from this formula and the global balance equation in the next state:
(30)  
that the statement also holds for j + 1, which completes the induction. ∎
Corollary 2.
We have the following bound.
(31)  
Proof.
We have expressed all of the probabilities π_{i,j} ((i, j) ∈ S) in terms of π_{0,0}, which is uniquely determined by the normalizing condition:
Σ_{(i,j)∈S} π_{i,j} = 1.  (33)
Remark.
In summary, we can calculate all of the probabilities π_{i,j} ((i, j) ∈ S) in the following order. First, we set π_{0,0} = 1. We then calculate all of the level-0 probabilities using Equation (7). Next, the first level-1 probability is calculated using (10). After that, we apply Lemma 1 and (23) repeatedly for i = 1, 2, …, n − 1. At this point, π_{i,j} for 0 ≤ i ≤ n − 1 are obtained. Furthermore, we use Lemma 2 in order to obtain the level-n probabilities π_{n,j}. Finally, we divide all π_{i,j} by their sum in order to get the stationary distribution.
Let E[N] denote the mean number of jobs in the system. We have:
E[N] = Σ_{(i,j)∈S} j π_{i,j}.  (34)
It follows from Little's law, applied with the effective arrival rate λ(1 − P_b), where P_b is the blocking probability given in (37), that:
E[T] = E[N] / (λ(1 − P_b)).  (35)
Therefore, we obtain:
E[W_q] = E[T] − 1/μ.  (36)
Let P_b denote the blocking probability. We have:
P_b = Σ_i π_{i,K}.  (37)
The mean number of VNF instances is given by:
E[C] = Σ_{(i,j)∈S} ( i + min(n − i, max(0, j − m − i)) ) π_{i,j},  (38)
where the first term is the number of VNF instances that are already active, while the second term is the number of VNF instances in setup mode.
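Once the joint stationary distribution is available, these metrics follow by simple summation; a sketch, in which the setup-count term mirrors the discussion above and the toy distribution is purely illustrative:

```python
# Computing the performance metrics of this section from a joint
# stationary distribution pi[(i, j)]: mean jobs E[N], blocking
# probability P_b, Little's law with the effective arrival rate for
# E[T], E[W_q] = E[T] - 1/mu, and the mean number of active-plus-setup
# VNF instances E[C]. The setup-count term is a modelling assumption.

def queue_metrics(pi, lam, mu, m, n, K):
    EN = sum(j * p for (i, j), p in pi.items())
    Pb = sum(p for (i, j), p in pi.items() if j == K)
    ET = EN / (lam * (1.0 - Pb))      # Little's law, effective arrivals
    EWq = ET - 1.0 / mu
    EC = sum((i + min(n - i, max(0, j - m - i))) * p
             for (i, j), p in pi.items())
    return EN, Pb, ET, EWq, EC

# Toy two-state distribution for a system with m=2, n=1, K=4.
pi = {(0, 1): 0.7, (1, 4): 0.3}
print(queue_metrics(pi, lam=1.0, mu=1.0, m=2, n=1, K=4))
```

In practice `pi` would come from the recursive algorithm of this section (or from simulation), after normalization.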
Summary of the derivation: In this section, we have developed a mathematical model to derive the metrics E[W_q] and E[C] in the cost function (1), as shown in Equations (36) and (38), respectively. Given these closed forms, one can easily find the optimal n to balance the cost function when the remaining parameters (λ, μ, α, m, K) and the weighting factors (w_1, w_2) are given, because E[W_q] and E[C] are functions of those parameters. We have:
n* = argmin_n ( w_1 E[W_q] + w_2 E[C] ),  (39)
subject to E[W_q] ≤ β,
Treating the cost F as a function of n, we can find a local maximum/minimum at a point n_0 where dF/dn = 0 is satisfied; F has a local minimum at n_0 if d²F/dn² > 0 there. The optimal n is then obtained. In the next section, we will use the weighting factors as an example to show how to use the derived metrics to decide the optimal trade-off and the number of VNF instances. Please note that the decision variable can also be set to other parameters, to which the algorithm introduced in the next section can easily be applied.
Moreover, other metrics such as P_b, E[T], and E[N] are also given in Equations (37), (35), and (34), which can be used for variants of the cost function (see Appendix). The derivations in this section and the Appendix are generic models that can easily be extended to any number of metrics. Changing the number of metrics does not alter our analysis, although it may affect the optimal policies set by operators.
It is also worth mentioning that we have solved a system with one unknown variable per state of the Markov chain. The computational complexity of a conventional method is cubic in the number of states, whereas the computational complexity of our recursive algorithm is only linear in the number of states. Furthermore, our algorithm is numerically stable since it manipulates only positive numbers.
5.4 Algorithm for Deciding n
Given the analytical model above, one can quickly obtain the operation cost and system performance and design optimal strategies without real deployment, saving cost and time. Without our model, it is difficult to obtain such results in a short time. For instance, even with the simplified simulation settings (a few arrival rates) in Section 6, it is still very time-consuming to get simulation results, e.g., tens of hours per simulation.
We propose Algorithm 1 for operators to specify n based on the weighting factors. For ease of understanding, we use two weighting factors and two metrics, i.e., the cost function in (1). Algorithm 1 takes the system parameters and the weighting factors w_1 and w_2 as input, and outputs the optimal value n*. The maximum values of E[W_q] and E[C] in the system serve as the constraints set by the operators. Starting from the smallest candidate value, the algorithm increases the candidate in every loop iteration and evaluates the corresponding cost; the loop does not stop until it finds the value yielding the lowest cost.
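The decision procedure can be sketched as a sweep over the number of dynamic instances, keeping the feasible value with the lowest weighted cost. The metric functions here are illustrative stand-ins, not the paper's analytical model:

```python
# A sketch of the search in Algorithm 1: sweep the number of dynamic VNF
# instances n, evaluate F(n) = w1*E[Wq](n) + w2*E[C](n), and keep the
# feasible n (E[Wq] <= beta) with the lowest cost. `eval_metrics` is a
# hypothetical stand-in for the analytical model of Section 5.3.

def best_n(n_max, w1, w2, beta, eval_metrics):
    best, best_cost = None, float("inf")
    for n in range(n_max + 1):
        EWq, EC = eval_metrics(n)
        if EWq > beta:                 # violates the latency upper bound
            continue
        cost = w1 * EWq + w2 * EC
        if cost < best_cost:
            best, best_cost = n, cost
    return best, best_cost

# Stand-in metrics: waiting time falls with n, VNF cost rises with n.
toy = lambda n: (1.0 / (n + 1), 0.2 * n)
print(best_n(n_max=10, w1=1.0, w2=1.0, beta=0.5, eval_metrics=toy))
```

Plugging in different ratios of w_1 to w_2 shifts the chosen n, which is exactly the operator-controlled knob discussed above.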
Fig. 6 illustrates a graphical plot of E[C] versus E[W_q] to demonstrate how to obtain n* for different settings. The three dotted black lines correspond to different weighting-factor ratios. Each point on the blue curve is a pair of E[W_q] and E[C] values associated with a different n under one setting of the remaining parameters; the brown curve is plotted likewise under another setting. The intersections of the dotted lines and the curves then give the optimal values of n for the chosen parameters. Please note that we use Fig. 6 simply to explain the idea; operators can use Algorithm 1 to obtain the optimal value of n.
6 Numerical Results
In this section, we show the numerical results. The analytical results in Section 5 are validated by extensive simulations using ns-2, version 2.35 [28]. In the simulations, we use real measurement results for parameter configuration: the arrival rate λ is drawn from Facebook data center traffic [29], the service rate μ from the base service rate of an Amazon EC2 VM [30], and the setup rate α from the average VM startup time [31]. If not specified otherwise, the default parameter values listed in Table 1 are used for performance comparison. Several million job requests were generated during the simulations. Please note that these parameters can be replaced by other values; we simply use them to validate our mathematical model and demonstrate the numerical results.
Figs. 7 and 8 illustrate both the simulation and analytical results in terms of the average VNF cost and the average response time per job in the queue, respectively. In the figures, the lines denote analytical results and the points represent simulation results. Each simulation result is the mean over 300,000 seconds, reported at the 95% confidence level. In the following sections, we show the impacts of the arrival rate, the number of VNF instances, the VNF setup rate, the system capacity, and the legacy equipment capacity on these two performance metrics.
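A 95% confidence interval of the kind reported above is commonly computed from per-batch means via the normal approximation; a minimal sketch (the batch values below are hypothetical, not the paper's data) is:

```python
import statistics

def mean_with_ci(samples, z=1.96):
    """Mean and half-width of an approximate 95% confidence interval
    (normal approximation) for a list of per-batch simulation means."""
    m = statistics.mean(samples)
    s = statistics.stdev(samples)
    half = z * s / len(samples) ** 0.5
    return m, half

# Hypothetical per-batch response-time means from independent runs.
batch_means = [1.10, 1.22, 1.18, 1.15, 1.25, 1.12, 1.20, 1.17]
m, half = mean_with_ci(batch_means)  # report as m +/- half
```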
6.1 Impacts of the arrival rate
Figs. 7(a)–7(d) show the impacts of the arrival rate on the average VNF cost. Generally, the cost is 0 at the beginning, then grows sharply, and later rises smoothly and reaches a bound as the arrival rate increases. The reasons are as follows. When the arrival rate is below the legacy equipment capacity, the incoming jobs are handled by the legacy equipment and no VNFs are turned on. As the arrival rate approaches this capacity, VNFs are turned on, so the server cost increases with the arrival rate. Once all of the available VNFs are turned on, the cost stops growing and is bounded by the total cost of the VNFs.
Figs. 8(a)–8(d) illustrate the impacts of the arrival rate on the average response time. Interestingly, the trend of the curves can generally be divided into three phases: an ascent phase, a descent phase, and a saturation phase. (In Figs. 8(b) and 8(d), only two phases are visible due to the plotted range of arrival rates; given a larger range, all three phases appear.) In the first phase, the response time grows sharply due to the setup time of VNFs. Specifically, when the arrival rate is well below the legacy equipment capacity, the response time is almost 0 because all jobs are handled by the legacy equipment. As the arrival rate approaches and then exceeds this capacity, VNFs start to be turned on. In this phase, however, the response time still rises because the VNFs have only just begun their setup and have not yet reached full capacity. In the second phase, the response time starts to descend because the VNFs begin serving jobs. In the third phase, the response time ascends again and then saturates at a bound. The ascent occurs when the arrival rate exceeds the total service capacity, so the system can no longer keep up with the incoming jobs. Finally, the curves saturate because the system is fully loaded and the response time is limited by the finite system capacity.
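The interplay of arrivals, setup delays, and service that produces these phases can be mimicked by a deliberately simplified simulation. The sketch below is our own greedy FCFS approximation with hypothetical parameters, not the exact Markov chain analyzed in the paper: legacy servers are always on, while a VNF that was idle at a job's arrival is assumed to be powered off and pays an exponential setup delay first.

```python
import random

def simulate_wait(lam, mu, alpha, n_legacy, k_vnf, n_jobs, seed=7):
    """Rough FCFS approximation: each job goes to the earliest-available
    server. Legacy servers are always on; an idle VNF pays an Exp(alpha)
    setup delay before serving. Returns the average waiting time per job."""
    rng = random.Random(seed)
    free_at = [0.0] * (n_legacy + k_vnf)          # next-free time per server
    is_vnf = [False] * n_legacy + [True] * k_vnf
    t, total_wait = 0.0, 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)                 # Poisson arrivals
        i = min(range(len(free_at)), key=free_at.__getitem__)
        start = max(t, free_at[i])
        if is_vnf[i] and free_at[i] <= t:         # idle VNF: pay setup time
            start += rng.expovariate(alpha)
        total_wait += start - t
        free_at[i] = start + rng.expovariate(mu)  # exponential service time
    return total_wait / n_jobs

# Hypothetical parameters: stable system (total capacity 6 > arrival rate 3).
avg_wait = simulate_wait(lam=3.0, mu=1.0, alpha=1.0, n_legacy=2, k_vnf=4,
                         n_jobs=5000)
```

Sweeping `lam` in such a sketch reproduces the qualitative shape discussed above; the quantitative curves in Figs. 7 and 8 come from the paper's model and ns-2 simulations.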
6.2 Impacts of the number of VNF instances
Fig. 7(a) shows the impacts of the number of VNF instances on the average VNF cost. All four curves increase and then reach a bound as the arrival rate increases. A larger number of VNF instances leads to a bigger gap between the initial point and the bound: as the arrival rate grows, more VNFs are available to handle the growing job requests, so the cost increases accordingly. If an operator wants to bound the VNF budget, the operator can choose a suitable number of instances based on (38).
Fig. 8(a) illustrates the impacts of the number of VNF instances on the average response time. The impacts appear in the length of the second phase discussed in Sec. 6.1: the second phase is prolonged as the number of instances increases. A larger number of instances gives the system more capacity to handle the rising job requests; that is, it delays the point at which the system capacity is reached. If an operator wants to bound the response time of job requests, the operator can choose a suitable number of instances based on (36).
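If the predicted response time is non-increasing in the number of instances (more VNFs never hurt), the smallest number meeting a latency bound can be found by binary search. The sketch below is a hypothetical illustration of this selection step; the `latency` function is a toy stand-in for the paper's expression in (36).

```python
def min_vnf_for_latency(latency_of, k_max, w_max):
    """Smallest K in [0, k_max] with latency_of(K) <= w_max, assuming
    latency_of is non-increasing in K; returns None if infeasible."""
    if latency_of(k_max) > w_max:
        return None                      # even full scale-out misses the bound
    lo, hi = 0, k_max
    while lo < hi:
        mid = (lo + hi) // 2
        if latency_of(mid) <= w_max:
            hi = mid                     # feasible: try fewer instances
        else:
            lo = mid + 1                 # infeasible: need more instances
    return lo

latency = lambda k: 50.0 / (k + 1)       # toy stand-in for Eq. (36)
k_needed = min_vnf_for_latency(latency, k_max=40, w_max=2.0)
```

Here 50/(K + 1) <= 2 first holds at K = 24, so the search returns 24; with the paper's actual latency expression the operator would obtain the corresponding minimal provisioning level.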
6.3 Impacts of the VNF setup rate
Recall that the setup rate is the rate at which VNFs are turned on; to change it, one can adjust the resources (e.g., CPU, memory) allocated to the VNFs. Fig. 7(b) shows the impacts of the setup rate on the average VNF cost. The impacts appear in the slope of the curves: a larger setup rate yields a smaller slope, but has no effect at the beginning and the end of the curves. The reason is that a larger setup rate means a smaller VNF setup time, which helps VNFs be turned on and handle jobs faster, so the system is more efficient than with a larger setup time.
Fig. 8(b) illustrates the impacts of the setup rate on the average response time. Again, the impacts appear in the slope of the curves: a larger setup rate leads to smaller slopes. The setup rate also determines the maximum response time, because a smaller setup time enables VNFs to start handling jobs sooner.
6.4 Impacts of the system capacity
Fig. 7(c) and Fig. 8(c) depict the impacts of the system capacity on the average VNF cost and the average response time, respectively. As Fig. 7(c) shows, the system capacity has limited impact on the cost; as discussed in Sec. 6.1, the cost is mainly decided by the arrival rate. In contrast, Fig. 8(c) shows that the impact on the response time is significant: different system capacities create large gaps between the curves, which again exhibit three phases. A larger system capacity leads to a larger response time because it allows more jobs to wait in the queue rather than dropping them.
6.5 Impacts of the legacy equipment capacity
Fig. 7(d) and Fig. 8(d) illustrate the impacts of the legacy equipment capacity on the average VNF cost and the average response time, respectively. We observe that the curves start at 0, remain at 0 over a range of arrival rates, and then begin to grow as the arrival rate increases. The legacy equipment capacity determines the length of this flat region: the legacy equipment handles all jobs within its capacity, and both metrics start to increase only when the arrival rate exceeds it.
6.6 Summary of Sections 6.1–6.5
Overall, Figs. 7 and 8 not only demonstrate the correctness of our analytical model, but also show the impacts of the arrival rate, the number of VNF instances, the VNF setup rate, the system capacity, and the legacy equipment capacity on the two performance metrics. Moreover, although the service time is assumed to be exponentially distributed, the proposed analytical model is also compatible with deterministic, normal, uniform, Erlang, and Gamma service-time distributions. Due to the page limit, further simulation results for these distributions are given in [32]. Accordingly, the analytical model is widely applicable in various scenarios. With our model, operators can quantify the performance easily, which gives it important theoretical and practical significance.
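Robustness checks of the kind cited above compare service-time distributions with a common mean; a minimal sampler sketch (our own illustration, with an assumed subset of the distributions mentioned) is:

```python
import random

def service_sampler(dist, mean, rng):
    """Draw one service time with the given mean from one of several
    candidate distributions, for distributional robustness checks."""
    if dist == "exponential":
        return rng.expovariate(1.0 / mean)
    if dist == "deterministic":
        return mean
    if dist == "uniform":
        return rng.uniform(0.0, 2.0 * mean)   # Uniform(0, 2*mean), mean = mean
    if dist == "erlang2":
        # Erlang with shape 2: sum of two Exp(2/mean), same overall mean
        return rng.expovariate(2.0 / mean) + rng.expovariate(2.0 / mean)
    raise ValueError(dist)

rng = random.Random(0)
samples = [service_sampler("erlang2", 2.0, rng) for _ in range(20000)]
```

Feeding such samplers into the same queueing simulation, while holding the mean fixed, isolates the effect of the distribution's shape on the performance metrics.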
6.7 Relation between the cost function and the response time
In Figs. 7 and 8, we have demonstrated the correctness of our model. Now, we show how to obtain the optimal number of VNF instances in various situations using the proposed model.
Figs. 9(a) and (b) plot two metrics on the y-axes. The left y-axis, in blue, is the cost function specified in (39), which corresponds to the blue curve; the right y-axis, in red, is the average response time in the queue, which corresponds to the red dotted curve. The x-axis is the number of VNF instances.
Let us first look at the red dotted line in Fig. 9(a). The response time decreases as the number of VNF instances grows: more instances let arriving jobs experience lower waiting times and better QoS, and decrease SLA violations. Specifically, we observe that the response time first declines sharply and then, at a certain point, starts to decrease slowly. At the same point, however, the cost function shown by the blue curve starts to grow sharply, leading to higher cost. At the plotted arrival rate, the optimal number of instances is 28, and the corresponding response time is 1.17 s (see the green dotted straight line). If a mobile operator defines a latency greater than 1.17 s as an SLA violation, that is, the latency threshold in (39) is set to be greater than 1.17 s, we can simply set the number of instances to 28 to balance the cost and fulfill the latency requirement. Otherwise, we can use the green dotted line to find the number of instances corresponding to a latency below the operator's requirement. In this case, although the cost is not the minimum, the chosen value is the optimal one that satisfies the latency requirement. Similarly, one can find another example of setting the optimal number of instances in Fig. 9(b).
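The selection rule just described — among all instance counts meeting the SLA, take the cheapest — can be sketched as follows. The cost and latency arrays below are hypothetical stand-ins, not the curves of Fig. 9.

```python
def pick_k(costs, latencies, latency_sla):
    """Among instance counts K whose predicted latency meets the SLA,
    return the one with the lowest cost (lists are indexed by K)."""
    feasible = [k for k in range(len(costs)) if latencies[k] <= latency_sla]
    if not feasible:
        return None                      # no K can meet the SLA
    return min(feasible, key=lambda k: costs[k])

# Toy stand-ins: cost rises linearly with K, latency decays with K.
costs = [k * 1.5 for k in range(41)]
latencies = [40.0 / (k + 1) for k in range(41)]
k_sla = pick_k(costs, latencies, latency_sla=1.5)
```

With these stand-ins, K = 26 is the first feasible count and, since cost is increasing in K, also the cheapest; in practice the arrays would be evaluated from (39) and the response-time expression.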
Here, we have demonstrated that any latency requirement can be met by choosing the optimal number of VNF instances using (39). Please note that other parameters, such as the setup rate, the system capacity, and the legacy equipment capacity, can easily be applied to (39) using the same method. Due to the space limit, we do not elaborate on them in this paper.
7 Conclusions
In this paper, we propose DASA to address the tradeoff between performance and operation cost. We develop analytical and simulation models to study the average job response time and the operation cost, and the results quantify these performance metrics. Our model fills a research gap by taking both the VNF setup time and the capacity of legacy equipment into consideration in vEPC. Besides, we propose a novel recursive algorithm that reduces the computational complexity significantly. Our study provides guidelines for mobile operators to bound their operation cost and configure the job response time in a systematic way. Based on our performance study, operators can design optimization strategies and quickly obtain the operation cost and system performance without real deployment, saving both cost and time.
For future work, one extension is to generalize the VNF setup time as well as the interarrival and service times; at present, there is no literature supporting the assumption that they are exponential random variables. These results could be generalized by Markovian Arrival Processes [33] or approximated using orthogonal polynomial approaches [34]. We also plan to relax the assumption on the VNF scaling-in/out capability, that is, from one VNF instance at a time to an arbitrary number of instances at a time. Moreover, another extension is to consider impatient customers, i.e., waiting jobs with a time limit after which they are discarded from the system.
Acknowledgments
We especially thank ZhengWei Yu and YiHao Lin for their help in our simulations.
References
 [1] Ericsson, “Mobility report on the pulse of the network society,” Ericsson, Tech. Rep., Nov. 2015.
 [2] ETSI GS NFV-INF 001 v1.1.1, Network Functions Virtualisation (NFV); Infrastructure Overview, ETSI Std., Jan. 2015.
 [3] H. Hawilo, A. Shami, M. Mirahmadi, and R. Asal, “NFV: state of the art, challenges, and implementation in next generation mobile networks (vEPC),” IEEE Network, vol. 28, no. 6, pp. 18–26, Nov./Dec. 2014.
 [4] Animoto's Facebook scale-up. [Online]. Available: http://blog.rightscale.com/2008/04/23/animotofacebookscaleup/.
 [5] Z. Xiao, W. Song, and Q. Chen, “Dynamic resource allocation using virtual machines for cloud computing environment,” IEEE Trans. Parallel and Distributed Systems, vol. 24, no. 6, pp. 1107–1117, 2013.
 [6] F. Jokhio, A. Ashraf, S. Lafond, I. Porres, and J. Lilius, “Predictionbased dynamic resource allocation for video transcoding in cloud computing,” in Proc. 21st Euromicro Int’l Conf. Parallel, Distributed and NetworkBased Processing (PDP), 2013, pp. 254–261.
 [7] N. Roy, A. Dubey, and A. Gokhale, “Efficient autoscaling in the cloud using predictive models for workload forecasting,” in Proc. IEEE Int’l Conf. Cloud Computing (CLOUD), 2011, pp. 500–507.
 [8] J. M. Tirado, D. Higuero, F. Isaila, and J. Carretero, “Predictive data grouping and placement for cloudbased elastic server infrastructures,” in Proc. 11th IEEE/ACM Int’l Symp. Cluster, Cloud and Grid Computing (CCGrid), 2011, pp. 285–294.
 [9] D. Niu, H. Xu, B. Li, and S. Zhao, “Qualityassured cloud bandwidth autoscaling for videoondemand applications,” in Proc. IEEE INFOCOM, 2012, pp. 460–468.
 [10] Q. Huang, S. Su, S. Xu, J. Li, P. Xu, and K. Shuang, “Migrationbased elastic consolidation scheduling in cloud data center,” in Proc. IEEE 33rd Int’l Conf. Distributed Computing Systems Workshops (ICDCSW), 2013, pp. 93–97.
 [11] Q. Huang, K. Shuang, P. Xu, J. Li, X. Liu, and S. Su, “Predictionbased dynamic resource scheduling for virtualized cloud systems,” Journal of Networks, vol. 9, no. 2, pp. 375–383, 2014.
 [12] R. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, “Workload prediction using ARIMA model and its impact on cloud applications’ QoS,” IEEE Trans. Cloud Computing, vol. 3, no. 4, pp. 449 – 458, Aug. 2014.
 [13] S. Islam, J. Keung, K. Lee, and A. Liu, “Empirical prediction models for adaptive resource provisioning in the cloud,” Future Generation Computer Systems, vol. 28, no. 1, pp. 155–162, 2012.
 [14] A. Bashar, “Autonomic scaling of cloud computing resources using BNbased prediction models,” in Proc. IEEE 2nd Int’l Conf. Cloud Networking (CloudNet), 2013, pp. 200–204.
 [15] A. A. Bankole and S. A. Ajila, “Cloud client prediction models for cloud resource provisioning in a multitier web application environment,” in Proc. IEEE 7th Int’l Symp. Service Oriented System Engineering (SOSE), 2013, pp. 156–161.
 [16] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes, “Cloudscale: elastic resource scaling for multitenant cloud systems,” in Proc. 2nd ACM Symposium on Cloud Computing, 2011, p. 5.
 [17] A. Khan, X. Yan, S. Tao, and N. Anerousis, “Workload characterization and prediction in the cloud: A multiple time series approach,” in Proc. IEEE NOMS, 2012, pp. 1287–1294.
 [18] Z. Hill, J. Li, M. Mao, A. RuizAlvarez, and M. Humphrey, “Early observations on the performance of Windows Azure,” in Proc. 19th ACM HPDC, Jun. 2010, pp. 367–376.
 [19] M. Mao, J. Li, and M. Humphrey, “Cloud autoscaling with deadline and budget constraints,” in Proc. IEEE/ACM Int’l Conf. on Grid Computing (Grid), 2010, pp. 41–48.
 [20] J.C. Chen and T. Zhang, IPbased nextgeneration wireless networks: systems, architectures, and protocols. John Wiley & Sons, 2004.
 [21] 3GPP TS 23.002 V13.6.0, Network architecture (Release 13), 3GPP Std., Jun. 2016.
 [22] 3GPP TR 32.842 V13.1.0, “Telecommunication management; Study on network management of virtualized networks (Release 13),” Tech. Rep., Dec. 2015.
 [23] I. Mitrani, “Managing performance and power consumption in a server farm,” Annals of Operations Research, vol. 202, no. 1, pp. 121–134, 2013.
 [24] I. Mitrani, “Service center tradeoffs between customer impatience and power consumption,” Elsevier Performance Evaluation, vol. 68, no. 11, pp. 1222 – 1231, Nov. 2011.
 [25] I. Mitrani, “Trading power consumption against performance by reserving blocks of servers,” Springer Computer Performance Engineering, vol. LNCS 7587, pp. 1–15, 2013.
 [26] J. Hu and T. PhungDuc, “Power consumption analysis for data centers with independent setup times and threshold controls,” in Proc. Int’l Conf. Numerical Analysis And Applied Mathematics (ICNAAM’14).
 [27] T. PhungDuc, “Multiserver queues with finite capacity and setup time,” in Proc. 22nd. Int’l Conf. Analytical and Stochastic Modelling Techniques and Applications, ASMTA’15, May 2015, pp. 173–187.
 [28] The network simulator ns-2. [Online]. Available: http://www.isi.edu/nsnam/ns/.
 [29] A. Roy, H. Zeng, J. Bagga, G. Porter, and A. C. Snoeren, “Inside the social network’s (datacenter) network,” in Proc. ACM SIGCOMM, 2015.
 [30] M. Gilani, C. Inibhunu, and Q. H. Mahmoud, “Application and network performance of Amazon elastic compute cloud instances,” in Proc. IEEE 4th Int’l Conf. Cloud Networking (CloudNet), 2015, pp. 315–318.
 [31] M. Mao and M. Humphrey, “A performance study on the VM startup time in the cloud,” in Proc. IEEE 5th Int’l Conf. Cloud Computing (CLOUD), 2012, pp. 423–430.
 [32] Y. Ren, T. PhungDuc, and J.C. Chen, “Design and analysis of Dynamic Auto Scaling Algorithm (DASA) for 5G mobile networks,” National Chiao Tung University, Tech. Rep., 2017. [Online]. Available: https://arxiv.org/abs/1604.05803
 [33] J. Pender and Y. M. Ko, “Approximations for the queue length distributions of timevarying manyserver queues,” Cornell, Tech. Rep., 2016.
 [34] J. Pender, “Gram charlier expansion for time varying multiserver queues with abandonment,” SIAM Journal on Applied Mathematics, vol. 74, no. 4, pp. 1238–1265, 2014.
Yi Ren (S’08–M’13) has been an Assistant Researcher at National Chiao Tung University (NCTU), Taiwan since 2012. He received his Ph.D. in Information Communication and Technology from the University of Agder (UiA), Norway in April 2012. His current research interests include security and performance analysis in wireless sensor networks, ad hoc and mesh networks, LTE, smart grid, and e-health security. He received the Best Paper Award at IEEE MDM 2012. 
Tuan Phung-Duc is an Assistant Professor at the Faculty of Engineering, Information and Systems, University of Tsukuba. He received a Ph.D. in Informatics from Kyoto University in 2011. He is on the Editorial Board of the KSII Transactions on Internet and Information Systems and two other international journals. He served as a Guest Editor of the special issue of Annals of Operations Research on Retrial Queues and Related Models and is currently serving as a Guest Editor of the Special Issue of the same journal on Queueing Theory and Network Applications. He was the Chairman of the 10th International Workshop on Retrial Queues (WRQ’2014) and the TPC co-chair of the 23rd International Conference on Analytical and Stochastic Modelling Techniques and Applications (ASMTA’16). Dr. Phung-Duc received the Research Encourage Award from The Operations Research Society of Japan in 2013. His research interests include Applied Probability, Stochastic Models, and their Applications in Performance Analysis of Telecommunication and Service Systems. 
Jyh-Cheng Chen (S’96–M’99–SM’04–F’12) received the Ph.D. degree from the State University of New York at Buffalo, USA, in 1998. He was a Research Scientist with Bellcore/Telcordia Technologies, Morristown, NJ, USA, from 1998 to 2001, and a Senior Scientist with Telcordia Technologies, Piscataway, NJ, USA, from 2008 to 2010. He was with the Department of Computer Science, National Tsing Hua University (NTHU), Hsinchu, Taiwan, as an Assistant Professor, an Associate Professor, and a Professor from 2001 to 2008. He was also the Director of the Institute of Network Engineering with National Chiao Tung University (NCTU), Hsinchu, from 2011 to 2014. He has been a Faculty Member with NCTU since 2010. He is currently a Distinguished Professor with the Department of Computer Science, NCTU. He is also serving as the Convener, Computer Science Program, Ministry of Science and Technology, Taiwan. Dr. Chen has received numerous awards, including the Excellent Teaching Award from NCTU, the Outstanding I. T. Elite Award, Taiwan, the Mentor of Merit Award from NCTU, the K. T. Li Breakthrough Award from the Institute of Information and Computing Machinery, the Outstanding Professor of Electrical Engineering Award from the Chinese Institute of Electrical Engineering, the Outstanding Research Award from the Ministry of Science and Technology, the Outstanding Teaching Award from NTHU, the Best Paper Award for Young Scholars from the IEEE Communications Society Taipei and Tainan Chapters and the IEEE Information Theory Society Taipei Chapter, and the Telcordia CEO Award. He is a Fellow of the IEEE and a Distinguished Member of the ACM. He was a member of the Fellows Evaluation Committee, IEEE Computer Society, in 2012 and 2016. 