Design and Analysis of Deadline and Budget Constrained Autoscaling (DBCA) Algorithm for 5G Mobile Networks
Abstract
In the cloud computing paradigm, virtual resource autoscaling approaches have been intensively studied in recent years. These approaches dynamically scale virtual resources in and out to adjust system performance while saving operation cost. However, designing an autoscaling algorithm that achieves the desired performance under a limited budget, while accounting for the existing capacity of legacy network equipment, is not a trivial task. In this paper, we propose a Deadline and Budget Constrained Autoscaling (DBCA) algorithm for addressing the budget-performance tradeoff. We develop an analytical model to quantify the tradeoff and cross-validate the model by extensive simulations. The results show that DBCA can significantly improve system performance given the budget upper bound. In addition, the model provides a quick way to evaluate the budget-performance tradeoff and system design without wide deployment, saving on cost and time.
1 Introduction
The emergence of Network Functions Virtualization (NFV) is changing how mobile operators increase the capacities of their network infrastructures. NFV offers fine-grained, on-demand adjustment of network capabilities. Virtualized Network Functions (VNFs) can be scaled out/in (turned on/off) to adjust computing and network capabilities, saving energy and resources. A classic case is Animoto, an image-processing service provider, which experienced a demand surge from 50 VM instances to 4,000 instances in three days in April 2008. After the peak, the demand fell sharply to an average level. Animoto paid for 4,000 instances only during the peak time [1].
Designing good autoscaling strategies that respect budget constraints while meeting performance requirements is challenging. In particular, operation cost is decreased by reducing the number of powered-on VNF instances. On the other hand, resource under-provisioning may cause Service Level Agreement (SLA) violations, leading to low Quality of Experience (QoE) for users. Therefore, the goal of a desirable autoscaling strategy is to meet the budget constraint while maintaining an acceptable level of performance. A budget-performance tradeoff thus arises: system performance is improved by adding more VNF instances, while operation cost is reduced by removing them.
Designing autoscaling strategies for 5G mobile networks is different from designing them for traditional cloud computing scenarios. Specifically, previous cloud autoscaling schemes (e.g., [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]) consider only virtualized resources. This is not suitable for typical cellular networks. Given the widely deployed legacy network equipment, the desired solution should consider the capacities of both legacy network equipment and VNFs. For example, in a VNF-only case, scaling out from 1 VNF instance to 2 VNF instances increases capacity by 100%. In contrast, capacity grows by less than 1% if legacy network equipment (say, with the capability of 100 VNF instances) is counted. Current cloud autoscaling schemes usually ignore this non-constant capacity gain.
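The arithmetic behind this example can be checked with a short sketch. The function name and its arguments are illustrative only; capacities are expressed in units of one VNF instance, as in the example above.

```python
def relative_capacity_gain(legacy_units: int, vnf_before: int, vnf_after: int) -> float:
    """Fractional growth of total capacity when the VNF count changes.

    All capacities are measured in units of one VNF instance.
    """
    before = legacy_units + vnf_before
    after = legacy_units + vnf_after
    return (after - before) / before


# VNF-only case: 1 -> 2 instances doubles the capacity.
print(f"VNF only:    {relative_capacity_gain(0, 1, 2):.2%}")
# Same scale-out next to legacy equipment worth 100 instances.
print(f"with legacy: {relative_capacity_gain(100, 1, 2):.2%}")
```

The same scale-out step therefore yields a very different relative gain depending on how much fixed capacity is already present.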
In this paper, we investigate the budget-performance tradeoff in terms of deadline constraint, VM setup time, and legacy equipment capacity. We improve our recent work [14] by further considering a deadline constraint for incoming requests, i.e., a request is dropped if a pre-specified timer expires. This is a more practical assumption than that in [14], in which no deadline constraint is considered. To the best of our knowledge, this is the first work from this perspective. We then propose a Deadline and Budget Constrained Autoscaling (DBCA) algorithm for addressing the tradeoff. DBCA considers the available legacy equipment to be powered on all the time, while virtualized resources are divided into VNF instances. DBCA then scales out/in (turns on/off) VNF instances depending on job arrivals. Here, a central issue is how to choose a suitable number of VNF instances for balancing the tradeoff. We derive a detailed analytical model to answer this question. The analytical model quantifies the budget-performance tradeoff and is cross-validated against extensive ns-2 simulations. Furthermore, we propose a recursive approach to reduce the complexity of the computational procedure from to where the system capacity. Our model provides mobile operators with guidelines to design optimal VNF autoscaling strategies according to their management policies in a systematic way. It is applicable to a wide range of scenarios and is therefore of theoretical significance.
The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 briefly introduces some background material on mobile networks and NFV architecture. Section 4 presents the proposed optimal algorithm for VNF autoscaling applications. Section 5 addresses the analytical models, followed by numerical results illustrated in Section 6. Section 7 offers conclusions.
2 Related Work
In recent years, autoscaling mechanisms have been intensively studied [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. A straightforward and commonly used approach is to make autoscaling decisions based on resource utilization indicators (e.g., CPU and memory usage). Examples are the default autoscaling approaches offered by Amazon EC2 and Microsoft Azure: a scale-out request is sent right away if the current CPU usage exceeds a predefined threshold. However, specifying the threshold value is not easy when VM setup time is considered. Indeed, the setup lag time to start an instance in Microsoft Azure could be as long as 10 min or more, and the lag time varies from time to time [20]. Thus, an instance may become available too late to serve the VNF, so one needs to leave more headroom when setting the threshold. To handle the setup time, prediction/learning models are utilized to estimate the workload arrivals for autoscaling decision making, such as Exponentially Weighted Moving Average (EMA) [2, 3], Auto-Regressive Moving Average (ARMA) [4, 5], Auto-Regressive Integrated Moving Average (ARIMA) [6, 7], machine learning [8, 9], Markov models [10, 11], recursive renewal reward [12], and the matrix analytic method [13]. However, the mechanisms in [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] consider only virtualized (cloud) resources and overlook legacy (fixed) resources, so they are not suitable for typical cellular networks.
Perhaps the closest models to ours were studied in [14, 15, 16, 17, 18, 19], in which the capacities of both fixed legacy network equipment and dynamically autoscaled cloud servers are considered. The author of [15, 16] considers setup time without defections [15] and with defections [16]. Our recent work [18] relaxes the assumption in [15, 16] that, after a setup time, all the cloud servers in the block become active concurrently, and considers a more realistic model in which each server has an independent setup time. However, in [15, 16, 18], all the cloud servers were treated as a single block, which is not practical because each cloud server should be allowed to scale out/in dynamically. Treating all cloud servers as a single block was relaxed to sub-blocks in [17, 19]. However, either setup time is ignored [17], or fixed legacy network capacity is not considered [19]. Our recent work [14] fills this research gap, but it does not consider the job deadline constraint.
3 Background
The mobile Core Network (CN) is one of the most important parts of a mobile network. The main target of NFV is to virtualize the functions in the CN. The most recent CN is the Evolved Packet Core (EPC) introduced in Long Term Evolution (LTE). Here, we use an example to explain the EPC and the virtualized EPC (vEPC) when NFV is deployed. Fig. 1 shows a simplified example of an NFV-enabled LTE architecture consisting of the Radio Access Network (RAN), the EPC, and the external Packet Data Network (PDN). In particular, the EPC is composed of the legacy EPC and the vEPC. In the following, we briefly introduce each of them.
3.1 Legacy EPC and vEPC
EPC is the CN of the LTE system. Here, we only show basic network functions, such as Serving Gateway (SGW), PDN Gateway (PGW), Mobility Management Entity (MME), and Policy and Charging Rules Function (PCRF) in the EPC.
To virtualize the above network functions, 3GPP introduces NFV management functions and solutions for vEPC based on the ETSI NFV specification [21], as shown in Fig. 1. The network functions (e.g., MME, PCRF) are denoted as Network Elements (NEs), which are virtualized as VNF instances. The Network Manager (NM) provides end-user functions for network management of NEs. The Element Manager (EM) is responsible for the management of a set of NEs. NFV management and orchestration controls the VNF instance scaling procedures, which are detailed as follows.

VNF scale-in/out: VNF scale-out adds additional VMs to support a VNF instance, adding more virtualized hardware resources (i.e., compute, network, and storage capability) to the VNF instance. In contrast, VNF scale-in removes existing VMs from a VNF instance.

VNF scale-up/down: VNF scale-up allocates more hardware resources to a VM supporting a VNF instance (e.g., replacing a single-core CPU with a dual-core CPU). In contrast, VNF scale-down releases hardware resources from a VNF instance.
4 Proposed Deadline and Budget Constrained Autoscaling Algorithm
4.1 System Model and DBCA: Deadline and Budget Constrained Autoscaling
In general, we consider that a 5G EPC consists of legacy network entities (e.g., MME, PCRF) and VNFs [22, 23]. For a network entity, its capacity is supported by both legacy network equipment and VNF instances. Fig. 2 illustrates a simplified example of a network entity queueing model considering the capacities of both VNF instances and legacy network equipment. Specifically, the capacity of its legacy network equipment is assumed to be VNF instance capacities while denotes the number of VNF instances for supporting the network entity. That is, the total capacity of the network entity is .
From the network entity’s point of view, we assume that user requests arrive according to a Poisson process with rate . A VNF instance is assumed to accept one job at a time, and the service time is assumed to follow the exponential distribution with mean . When a user request arrives, the job first enters a finite First-Come-First-Served (FCFS) queue to wait for processing. Each job has a deadline constraint, which is a random variable following the exponential distribution with mean . In other words, the job quits the queue if its waiting time exceeds its deadline. Without loss of generality, the legacy network equipment is always on, while VNF instances are powered on (or off) according to the number of waiting jobs in the queue. Moreover, a VNF instance needs a setup time before it is available to serve user requests; the setup time is assumed to be an exponentially distributed random variable with mean value .
DBCA utilizes two thresholds, ’Up’ and ’Down’, or and to control the VNF instances . Further, let and (), i.e., . In other words, DBCA sends orders to NFV management and orchestration to turn on/off VNF instances to adjust network capacities.

, power up the th VNF instances: If the th VNF instance is turned off and the number of requests in the system increases from to , then the VNF instance is powered up after a setup time to support the system. During the setup time, a VNF instance cannot serve user requests, but consumes power (or money for renting cloud services). Here, we specify . It is equivalent to that when the number of requests increases from to , the th VNF instance is powered up.

, power down the th VNF instances: If the th VNF instance is operative, and the number of requests in the system drops from to , then the VNF instance is powered down instantaneously. In this paper, we choose . It is equivalent to that when the number of requests drops from to , we turn off the th VNF instance.
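The two threshold rules above can be sketched as a small decision function. This is only an illustration under an explicit assumption: the threshold values are not legible in this text, so the sketch assumes that one additional VNF instance is requested for every request beyond the legacy capacity, up to the number of available instances, which matches the block-size-one description used in Section 5. The names `legacy_capacity` and `max_vnf` are hypothetical.

```python
def target_vnf_instances(jobs_in_system: int, legacy_capacity: int, max_vnf: int) -> int:
    """VNF instances that should be on or in setup (assumed threshold rule)."""
    overflow = jobs_in_system - legacy_capacity
    return max(0, min(overflow, max_vnf))


def scaling_order(jobs_before: int, jobs_after: int,
                  legacy_capacity: int, max_vnf: int) -> tuple[str, int]:
    """Order sent to NFV management and orchestration after a change in load."""
    before = target_vnf_instances(jobs_before, legacy_capacity, max_vnf)
    after = target_vnf_instances(jobs_after, legacy_capacity, max_vnf)
    if after > before:
        # Powered-up instances become usable only after their setup time.
        return ("power_up", after - before)
    if after < before:
        # Power-down is instantaneous in the model.
        return ("power_down", before - after)
    return ("hold", 0)


# One arrival that pushes the load just beyond the legacy capacity.
print(scaling_order(100, 101, legacy_capacity=100, max_vnf=50))
# One departure that brings the load back within the legacy capacity.
print(scaling_order(101, 100, legacy_capacity=100, max_vnf=50))
```

The function only decides which orders to issue; the setup delay and the cost incurred during setup are properties of the system being controlled, not of the decision rule.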
4.2 Performance Metrics
The system performance is evaluated by four metrics: the average response time in the queue , the average number of running VNF instances , the user request blocking probability , and the user request dropping probability . We define them as follows.

The average response time in queue is defined as a job request’s waiting time in the queue. In other words, it indicates how long a job request waits before it is served.

The average number of running VNF instances addresses the operation cost of virtual equipment.

Dropping probability is the probability that a request’s waiting time in queue exceeds its deadline constraint.

Blocking probability is the probability that a request is rejected because the system is full.
The closedform solutions of , , , and are given as (12), (13), (10), and (14) in Section 5. Thus, the system performance has the form
(1) 
where coefficients , , , and denote the weight factors for , , , and , respectively. Increasing (or , , ) emphasizes more on (or , , ). Here, we do not specify either or (, ) due to the fact that such a value should be determined by a mobile operator and must take management policies into consideration.
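As an illustration of how an operator might combine the four metrics, the sketch below assumes that (1) is a plain weighted sum and that a lower score is better; the exact form of (1) is not legible in this text, so both points are assumptions. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    wait_in_queue: float    # average response time in queue
    vnf_running: float      # average number of running VNF instances
    blocking_prob: float    # request blocking probability
    dropping_prob: float    # request dropping probability


def system_performance(m: Metrics, weights: tuple[float, float, float, float]) -> float:
    """Weighted sum of the four metrics (assumed form; lower is better)."""
    w1, w2, w3, w4 = weights
    return (w1 * m.wait_in_queue + w2 * m.vnf_running
            + w3 * m.blocking_prob + w4 * m.dropping_prob)


# Two hypothetical configurations compared under cost-heavy weights.
few_vnfs = Metrics(wait_in_queue=2.0, vnf_running=10.0, blocking_prob=0.10, dropping_prob=0.20)
many_vnfs = Metrics(wait_in_queue=0.5, vnf_running=40.0, blocking_prob=0.01, dropping_prob=0.02)
weights = (1.0, 0.5, 10.0, 10.0)
print(system_performance(few_vnfs, weights), system_performance(many_vnfs, weights))
```

Because the metrics have different units and ranges, the weights also act as scaling factors, which is one reason their values are left to the operator's management policy.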
5 Analytical Model
Notation  Explanation 

The total capacities of a network entity  
The number of maximum jobs can be accommodated in the system  
The number of VNF instances  
System performance  
Average response time  
Average response time in queue  
Average VM cost  
Blocking probability  
Dropping probability  
Weight factor for  
Weight factor for  
Weight factor for  
Weight factor for  
The capacities of legacy network equipment  
The Up threshold to control the VNF instances  
The Down threshold to control the VNF instances  
The th reserve subblock ().  
Job arrival rate  
Service rate for each server  
Setup rate for each virtual server  
Abandonment rate of each job 
In this section, we propose the analytical model for DBCA. The goal of the analytical model is to crossvalidate the accuracy of the simulation experiments and to analyze both the operation cost and the system performance for DBCA. Given the analytical model, one can quickly obtain the operation cost and system performance for DBCA, without real deployment, saving on cost and time.
We model the system as a queueing model with servers and a capacity of , i.e., the maximum of jobs can be accommodated in the system. Job arrivals follow the Poisson distribution with rate . A VNF instance (server) accepts one job at a time, and its service time follows the exponential distribution with mean . There is a limited FCFS queue for those jobs that have to wait for processing.
In this system, a server is turned off immediately if it has no job to do. Upon arrival of a job, an OFF server, if any, is turned on and the job is placed in the buffer. However, a server needs some setup time to become active so as to serve waiting jobs. We assume that the setup time follows the exponential distribution with mean . Let denote the number of customers in the system and denote the number of active servers. The number of reserves (servers) in the setup process is . Here, , where for all (block size is one). Therefore, in this model a server in the reserve blocks is in one of three states: BUSY, OFF, or SETUP. We assume that waiting jobs are served in an FCFS manner. We call this model an Setup queue.
Here, we present a recursive scheme to calculate the joint stationary distribution. Let and denote the number of active servers and the number of customers in the system, respectively. It is easy to see that forms a Markov chain on the state space:
Fig. 3 shows the transition among states for the case where , and . Let () denote the joint stationary distribution of . Here, we derive a recursion for calculating the joint stationary distribution (). The balance equations for states with read as follows.
leading to
The sequence, is given as follows.
and
where and
Furthermore, it should be noted that is calculated using the local balance equation in and out the set as follows.
Remark.
We have expressed () and in terms of .
Next, we consider the case .
Lemma 1.
We have
where
(2)  
(3) 
for
Proof.
Theorem 1.
We have the following bound.
for .
Proof.
We use mathematical induction. It is easy to see that the theorem is true for . Assuming that the theorem is true for , i.e.,
where
It should be noted that can be calculated using the local balance between the flows in and out the set of states as follows.
Remark.
We have expressed () and in terms of .
We consider the general case where . Similar to the case , we can prove the following result by mathematical induction.
Lemma 2.
We have
where
(6)  
(7) 
and
Proof.
The balance equation for state is given as follows.
leading to the fact that Lemma 2 is true for . Assuming that
It then follows from
that
∎
Theorem 2.
We have the following bound.
for .
Proof.
It should be noted that is calculated using the following local balance equation in and out the set of states:
as follows.
Remark.
We have expressed () and in terms of .
Finally, we consider the case . Balance equation for state yields,
Lemma 3.
We have
where
(8)  
(9)  
and
Proof.
The global balance equation at state is given by
leading to
Assuming that , it follows from the global balance equation at state ,
that for . ∎
Theorem 3.
We have the following bound.
Proof.
We have expressed all the probability () in terms of which is uniquely determined by the normalizing condition.
Let denote the mean number of jobs in the system. We have
Let denote the blocking probability. We have
(10) 
It follows from Little’s law that
(11) 
We obtain
(12) 
The mean number of VNF instances is given by
(13) 
where the first term is the number of VNF instances that are already active while the second term is the mean number of VNF instances in setup mode.
Let denote the mean number of waiting jobs in the system. We have
Let denote the reneging probability that a waiting job abandons from the system. We have
(14) 
where the numerator and the denominator are the abandonment rate and the arrival rate of accepted jobs, respectively.
Again, based on the above derived performance metrics , , , and , mobile operators can easily design network optimization strategies according to (1).
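For small systems, the metrics can also be obtained by solving the balance equations directly, which is useful as a cross-check. The sketch below is not the recursive scheme of Theorems 1, 2, and 3: it builds the Markov chain from our reading of the model description and solves it by plain Gaussian elimination. The parameter names (`lam`, `mu`, `alpha`, `theta`, `n0`, `k`, `K`) are labels chosen here for the arrival, service, setup, and abandonment rates, the legacy capacity, the number of VNF instances, and the system capacity. It further assumes that a VNF instance is switched off whenever a departure leaves a server idle.

```python
def stationary_distribution(lam, mu, alpha, theta, n0, k, K):
    """Stationary probabilities of states (i, j): i active VNFs, j jobs in system."""
    states = [(0, j) for j in range(K + 1)]
    states += [(i, j) for i in range(1, k + 1) for j in range(n0 + i, K + 1)]
    index = {s: n for n, s in enumerate(states)}
    size = len(states)
    # Transposed generator matrix with one extra column for the right-hand side.
    a = [[0.0] * (size + 1) for _ in range(size)]

    def add(src, dst, rate):
        if rate > 0:
            a[index[dst]][index[src]] += rate
            a[index[src]][index[src]] -= rate

    for (i, j) in states:
        cap = n0 + i                       # servers currently able to serve
        waiting = max(j - cap, 0)
        if j < K:
            add((i, j), (i, j + 1), lam)   # arrival
        if j > 0:                          # service completion
            dst = (i - 1, j - 1) if (i > 0 and j == cap) else (i, j - 1)
            add((i, j), dst, min(j, cap) * mu)
        add((i, j), (i, j - 1), waiting * theta)               # abandonment
        add((i, j), (i + 1, j), min(waiting, k - i) * alpha)   # setup completes

    a[-1] = [1.0] * size + [1.0]           # replace one equation by normalisation
    for c in range(size):                  # Gauss-Jordan with partial pivoting
        p = max(range(c, size), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(size):
            if r != c and a[r][c] != 0.0:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return {s: a[index[s]][size] / a[index[s]][index[s]] for s in states}


def metrics(lam, mu, alpha, theta, n0, k, K):
    """Blocking and dropping probabilities, mean VNF count, mean wait in queue."""
    pi = stationary_distribution(lam, mu, alpha, theta, n0, k, K)
    p_block = sum(p for (i, j), p in pi.items() if j == K)
    mean_waiting = sum(max(j - n0 - i, 0) * p for (i, j), p in pi.items())
    # Active instances plus instances in setup, as in the two terms of (13).
    mean_vnf = sum((i + min(max(j - n0 - i, 0), k - i)) * p for (i, j), p in pi.items())
    accepted = lam * (1.0 - p_block)
    return {"Wq": mean_waiting / accepted, "S": mean_vnf,
            "Pb": p_block, "Pd": theta * mean_waiting / accepted}


print(metrics(lam=3.0, mu=1.0, alpha=0.5, theta=0.2, n0=2, k=2, K=6))
```

The direct solve scales cubically with the number of states, so it is practical only for small parameter values; this is the kind of cost the recursive approach is meant to avoid.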
Remark.
Theorems 1, 2, and 3 allow us to calculate the joint stationary distribution by a numerically stable algorithm because we deal with only positive numbers.
6 Simulation and Numerical Results
This section provides both simulation and numerical results for the analytical model addressed in Section 5. The analytical model is cross-validated by extensive simulations using ns-2, version 2.35 [24], with real measurement results for parameter configuration: by Facebook data center traffic [25], by the base service rate of an Amazon EC2 VM [26], and by the average VM startup time [27]. (Footnote 1: Due to simulation time limitations, and are scaled down accordingly with the same ratio .) If not further specified, the following parameters are set as the default values for performance comparison: , , , , (see Table 1 for details). The results are based on exponential distributions for the job request inter-arrival time and the VNF instance service time with mean and . The simulation time is 300,000 seconds, and millions of job requests were generated during the simulations.
Figs. 4, 5, 6, 7, 8 not only demonstrate the correctness of our analytical model, but also illustrate the impacts of , , , , , on the performance metrics: average VM cost , average response time in queue , blocking probability , and dropping probability , respectively. In the figures, the lines denote analytical results, and the points represent simulation results. Each simulation result in the figures is the mean value of the results in 300,000 seconds with 95% confidence level.
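One common way to report a simulated mean with a 95% confidence interval is sketched below. It assumes independent replications (or batch means) and a normal approximation with the usual 1.96 quantile; the text does not state which method was used for the figures, so this is only an illustration, and the sample values are made up.

```python
import math
import statistics


def mean_with_ci95(samples: list[float]) -> tuple[float, float]:
    """Return (mean, half-width) of a 95% confidence interval.

    Uses the normal approximation; for few samples a Student-t quantile
    would give a wider, more conservative interval.
    """
    mean = statistics.fmean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


# Hypothetical blocking-probability estimates from five replications.
estimates = [0.101, 0.097, 0.104, 0.099, 0.102]
mean, half = mean_with_ci95(estimates)
print(f"{mean:.4f} +/- {half:.4f}")
```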
6.1 Impacts of Arrival Rate
We first look into the impacts of job request arrival rate . Mobile operators cannot adjust but are able to monitor it and configure network parameters , , , , and for network optimization accordingly.
Figs. 4(a), 5(a), 6(a), 7(a), 8(a) depict the impacts of on . In general, one can see that starts at 0 and then rises sharply when passes . The reason is that the incoming job requests are served by the legacy equipment when , so no VMs are powered on. DBCA then starts to turn on VMs to handle job requests as increases. Later, reaches a bound even if continues growing. This is because all the VMs are turned on, so is bounded by the cost of VMs.
Figs. 4(b), 5(b), 6(b), 7(b), 8(b) show the impacts of on . Interestingly, the trend of the curves can generally be divided into four phases: zero phase, ascent phase, descent phase, and saturation phase. In the zero phase, is zero because incoming jobs are served immediately by available capacities. In the ascent phase, rises sharply due to the setup time of VMs. Specifically, when approaches and then exceeds , VMs start to be powered on to serve jobs. Nevertheless, still grows sharply because jobs have to wait while the VMs are being turned on. Later, in the third (descent) phase, starts to decrease thanks to the newly running VMs. In the fourth (saturation) phase, starts to grow again and then saturates at a bound. The reason for this growth is that the system is not able to serve the incoming jobs when . Finally, the curves saturate because the system is too full to handle the jobs and the value of is limited by the total system capacity .
In Figs. 4(c), 5(c), 6(c), 7(c), and 8(c), we study the impacts of on . The trends of the curves are relatively simple compared with the above two metrics. Generally, the curves grow as increases. In particular, starts at 0 and begins to increase when . The reason is that the system starts to reject jobs when the queue is full.
Figs. 4(d), 5(d), 6(d), 7(d), 8(d) illustrate the impacts of on . One can see that the trends of the curves are similar to those of . Note that job requests start to quit the queue if the waiting time exceeds their deadline constraints, so is highly related to . If is large, then jobs are dropped with high probability. This also explains why the trends are similar. Please refer to the above discussion of for .
6.2 Impacts of the Number of VNF Instances
The subfigures of Fig. 4 depict the impacts of on the performance metrics , , , and , respectively. We can see that increasing from 10 to 60 increases while decreasing , , and accordingly. A larger means that more VMs can be used to handle the growing job requests, so , , and are improved. If an operator wants to adjust the budget constraint , the operator can specify a suitable based on (13).
6.3 Impacts of Abandon Rate
In Figs. 5(a), 5(b), 5(c), and 5(d), we study the impacts of abandon rate on , , , and , respectively. Recall that a job request is assumed to have a deadline constraint with mean , meaning that the job stops waiting in the queue if the waiting time exceeds its deadline. We observe that increasing decreases , , and while enlarging . Specifically, as shown in Fig. 5(a), has no impact on when or . The reason is that only depends on the number of running VMs. In contrast, when , a larger leads to a smaller because more jobs are dropped from the system. In addition, the impact of on is illustrated in Fig. 5(b). A larger yields a smaller . The reason is that when more jobs quit the queue, the remaining jobs wait less time. Fig. 5(c) shows that increasing leads to a smaller . The reason is straightforward: more jobs quitting the queue means that the system has more available capacity to handle the incoming jobs. In Fig. 5(d), we observe that a larger means a larger , which is consistent with the definition of .
6.4 Impacts of VM Setup Rate
Figs. 6(a), 6(b), 6(c), 6(d) illustrate the impacts of on , , , and , respectively. Recall that VMs are assumed to have a setup time with mean value . To reduce the setup time, NFV management and orchestration can perform the scale-up procedure to add resources (e.g., CPU, memory) and make VMs more powerful. We observe that a shorter setup time decreases , , , and . The reason is that with a short setup time VMs become available sooner to handle the jobs, resulting in less operation cost (see Fig. 6(a)), lower waiting time for jobs (see Fig. 6(b)), smaller blocking probability (see Fig. 6(c)), and reduced dropping probability (see Fig. 6(d)).
6.5 Impacts of Capacities of Legacy Equipment
Figs. 7(a), 7(b), 7(c), 7(d) show the impacts of on , , , and , respectively. We observe that the curves start at 0, stay at 0 for a period, and then start to grow as increases. determines the length of the period before the curves start to ascend. The reason is that the legacy equipment can handle incoming jobs within its capacity. When exceeds the capacity of the legacy equipment, the performance metrics , , , and start to grow.
6.6 Impacts of System Capacity
In Figs. 8(a), 8(b), 8(c), and 8(d), we investigate the impacts of on , , , and , respectively. As shown in Fig. 8(a), has limited impact on . As discussed in Section 6.2, is mainly decided by . Figs. 8(b), 8(c), and 8(d) show that has significant impacts on , as well as . Different values of produce large gaps between the curves. Moreover, a large leads to a larger as well as but makes smaller. The reason is that it allows more jobs to wait in the queue rather than being rejected.
7 Conclusions
In this paper, we have proposed DBCA for addressing the tradeoff between the operation budget constraint and system performance, which is evaluated by three performance metrics: the average job response time , blocking probability , and dropping probability . Our work addresses the research gap by considering both VM setup time and the capacity of legacy equipment in NFV-enabled EPC scenarios. Compared with our previous work [14], the model quantifies a more practical case. Our results show that the analytical model provides a quick way to help mobile operators plan and design network optimization strategies without wide deployment, saving on cost and time. Moreover, based on our analytical model, mobile operators can easily estimate the operation budget given the desired system performance, and vice versa.
As future work, one extension is to generalize the distributions of the VM setup time, the inter-arrival time, and the service time. At present, there is no literature to support the assumption that they are exponential random variables. These results could be generalized by using orthogonal polynomial approaches [28]. Also, we plan to relax the assumption on VM scaling in/out capability, i.e., from one VNF instance at a time to an arbitrary number of instances at a time. We plan to complete this work in follow-up papers.
References
 [1] Animoto’s facebook scaleup. [Online]. Available: http://blog.rightscale.com/2008/04/23/animotofacebookscaleup/.
 [2] Z. Xiao, W. Song, and Q. Chen, “Dynamic resource allocation using virtual machines for cloud computing environment,” IEEE Trans. Parallel and Distributed Systems, vol. 24, no. 6, pp. 1107–1117, 2013.
 [3] F. Jokhio, A. Ashraf, S. Lafond, I. Porres, and J. Lilius, “Prediction-based dynamic resource allocation for video transcoding in cloud computing,” in Proc. Euromicro Int’l Conf. Parallel, Distributed and Network-Based Processing (PDP), Feb. 2013.
 [4] N. Roy, A. Dubey, and A. Gokhale, “Efficient autoscaling in the cloud using predictive models for workload forecasting,” in Proc. IEEE Int’l Conf. Cloud Computing (CLOUD), Jul. 2011.
 [5] J. M. Tirado, D. Higuero, F. Isaila, and J. Carretero, “Predictive data grouping and placement for cloud-based elastic server infrastructures,” in Proc. IEEE/ACM Int’l Symp. Cluster, Cloud and Grid Computing (CCGrid), May 2011.
 [6] D. Niu, H. Xu, B. Li, and S. Zhao, “Quality-assured cloud bandwidth auto-scaling for video-on-demand applications,” in Proc. IEEE INFOCOM, Mar. 2012.
 [7] R. Calheiros, E. Masoumi, R. Ranjan, and R. Buyya, “Workload prediction using ARIMA model and its impact on cloud applications’ QoS,” IEEE Trans. Cloud Computing, vol. 3, no. 4, pp. 449 – 458, Aug. 2014.
 [8] S. Islam, J. Keung, K. Lee, and A. Liu, “Empirical prediction models for adaptive resource provisioning in the cloud,” Future Generation Computer Systems, vol. 28, no. 1, pp. 155–162, 2012.
 [9] A. A. Bankole and S. A. Ajila, “Cloud client prediction models for cloud resource provisioning in a multi-tier web application environment,” in Proc. IEEE Int’l Symp. Service Oriented System Engineering (SOSE), Mar. 2013.
 [10] Z. Shen, S. Subbiah, X. Gu, and J. Wilkes, “Cloudscale: elastic resource scaling for multi-tenant cloud systems,” in Proc. ACM Symp. on Cloud Computing (SoCC), Oct. 2011.
 [11] A. Khan, X. Yan, S. Tao, and N. Anerousis, “Workload characterization and prediction in the cloud: A multiple time series approach,” in Proc. IEEE NOMS, Apr. 2012.
 [12] A. Gandhi, S. Doroudi, M. Harchol-Balter, and A. Scheller-Wolf, “Exact analysis of the M/M/k/setup class of Markov chains via recursive renewal reward,” ACM SIGMETRICS Performance Evaluation Review, vol. 41, no. 1, pp. 153–166, 2013.
 [13] T. Phung-Duc, “Exact solutions for M/M/c/setup queues,” Springer Telecommunication Systems, pp. 1–16, May 2016.
 [14] Y. Ren, T. Phung-Duc, J.-C. Chen, and Z.-W. Yu, “Dynamic Auto Scaling Algorithm (DASA) for 5G mobile networks,” in Proc. IEEE GLOBECOM ’16.
 [15] I. Mitrani, “Managing performance and power consumption in a server farm,” Annals of Operations Research, vol. 202, no. 1, pp. 121–134, 2013.
 [16] ——, “Service center tradeoffs between customer impatience and power consumption,” Elsevier Performance Evaluation, vol. 68, no. 11, pp. 1222 – 1231, Nov. 2011.
 [17] ——, “Trading power consumption against performance by reserving blocks of servers,” Springer Computer Performance Engineering, LNCS, vol. 7587, pp. 1–15, 2013.
 [18] J. Hu and T. Phung-Duc, “Power consumption analysis for data centers with independent setup times and threshold controls,” in Proc. Int’l Conf. Numerical Analysis and Applied Mathematics (ICNAAM), Sep. 2014.
 [19] T. Phung-Duc, “Multiserver queues with finite capacity and setup time,” in Proc. Int’l Conf. Analytical and Stochastic Modelling Techniques and Applications (ASMTA), May 2015.
 [20] Z. Hill, J. Li, M. Mao, A. Ruiz-Alvarez, and M. Humphrey, “Early observations on the performance of Windows Azure,” in Proc. ACM HPDC, Jun. 2010.
 [21] 3GPP TR 32.842 V13.1.0, “Telecommunication management; Study on network management of virtualized networks (Release 13),” Tech. Rep., Dec. 2015.
 [22] H. Hawilo, A. Shami, M. Mirahmadi, and R. Asal, “NFV: state of the art, challenges, and implementation in next generation mobile networks (vEPC),” IEEE Network, vol. 28, no. 6, pp. 18–26, Nov. 2014.
 [23] I. Chih-Lin, C. Rowell, S. Han, Z. Xu, G. Li, and Z. Pan, “Toward green and soft: a 5G perspective,” IEEE Communications Magazine, vol. 52, no. 2, pp. 66–73, Feb. 2014.
 [24] “The network simulator ns-2.” [Online]. Available: http://www.isi.edu/nsnam/ns/.
 [25] A. Roy, H. Zeng, J. Bagga, G. Porter, and A. C. Snoeren, “Inside the social network’s (datacenter) network,” in Proc. ACM SIGCOMM, 2015.
 [26] M. Gilani, C. Inibhunu, and Q. H. Mahmoud, “Application and network performance of Amazon elastic compute cloud instances,” in Proc. IEEE Int’l Conf. Cloud Networking (CloudNet), 2015.
 [27] M. Mao and M. Humphrey, “A performance study on the VM startup time in the cloud,” in Proc. IEEE Int’l Conf. Cloud Computing (CLOUD), 2012.
 [28] J. Pender, “Gram charlier expansion for time varying multiserver queues with abandonment,” SIAM Journal on Applied Mathematics, vol. 74, no. 4, pp. 1238–1265, 2014.