Traversing Virtual Network Functions from the Edge to the Core: An End-to-End Performance Analysis
Future mobile networks supporting the Internet of Things are expected to provide both high throughput and low latency to user-specific services. One way to meet these requirements is to adopt Network Function Virtualization (NFV) and Multi-access Edge Computing (MEC). In this paper, we analyze an end-to-end communication system that consists of MEC servers and a server at the core network hosting different types of virtual network functions (VNFs). We develop a queueing model for the performance analysis of the system, covering both processing and transmission flows. The system is decomposed into subsystems that are analyzed independently in order to approximate the behaviour of the original system. We provide closed-form expressions for performance metrics such as the system drop rate and the average number of tasks in the system. Simulation results show that our approximation performs well. By evaluating the system under different scenarios, we provide insights for decision making on traffic flow control and its impact on critical performance metrics.
I Introduction

This work has been supported by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 643002.
In future communication systems, mission-critical mobile applications, e.g., augmented reality, connected vehicles, and eHealth, will provide services that require ultra-low latency [1], [2]. To satisfy these low-latency requirements, Multi-access Edge Computing (MEC) has been proposed as a key solution. The idea of MEC is to locate computation resources closer to the users, e.g., at the base stations. Besides low latency requirements, these services may have strict function chaining requirements; that is, each service has to be processed by a set of network functions (e.g., firewalls, transcoders, load balancers) in a specific order. Furthermore, the 5G requirements for flexibility and elasticity of the network inspire the idea of Network Function Virtualization (NFV) [3], [4]. The idea of NFV is to decouple the network functions from dedicated hardware equipment: instead, general purpose servers can host one or more types of network functions. In this way, the network can flexibly deploy the proper network functions according to the demands of various traffic types. NFV together with MEC are considered key technologies for 5G wireless systems. However, the computation capabilities and the available resources of MEC servers are still limited compared to the high-end servers in the cloud. Therefore, it is of interest to further investigate the cooperation between the edge and the core, as well as the cooperation among MEC servers.
The VNF placement and resource allocation problem has recently attracted a lot of attention, as shown in [5, 6]. In these works, the authors formulate the VNF placement problem as a Mixed Integer Linear Program (MILP) under the assumption of known traffic demand. For a dynamic and unknown traffic environment, the authors in [7, 8, 9, 10] develop dynamic flow-control algorithms by applying Lyapunov optimization theory. Flows traverse VNF chains, and each node decides the resource allocation of each VNF and routes the flow to the next node. The authors in [11, 12, 13] investigate the problem of offloading tasks from mobile devices to MEC servers, addressing the trade-off between power consumption and task processing delay. There are few works that analyze such networks and derive key performance metrics such as delay. The authors in [14] analyze the end-to-end delay for embedded VNF chains. They consider two types of services that traverse different VNF chains and provide a delay analysis for each chain.
In this paper, we model and analyze an exemplary end-to-end communication system that consists of two MEC servers at the edge network and one server at the core network, hosting different types of VNFs. We provide expressions for the key performance metrics of interest by applying tools from queueing theory. In order to simplify the analysis, we split the system into subsystems and analyze each of them independently. We consider finite buffers for the MEC servers; therefore, if a buffer is full when a new task arrives, the task is dropped and removed from the system. We provide closed-form expressions for key performance metrics such as the average number of tasks in the system and the system drop rate for each subsystem. Simulation results validate our analysis and show that our approximation performs well. Furthermore, by evaluating the system under different scenarios, we provide insights into how the routing decision affects the key performance metrics of interest. This work can be considered an initial, but important, step toward developing optimal routing policies for critical performance metrics such as end-to-end delay.
II System Model
We consider an exemplary end-to-end communication system consisting of a mobile device, two MEC servers (Server 1 and Server 2), and Server 3 located in the core network, as depicted in Fig. 1. A task traverses a service chain of two consecutive Virtual Network Functions: VNF 1 and VNF 2. In the chosen system, a MEC server, called Server 1, is co-located with the base station and hosts one copy of VNF 1 as the primary MEC server. A secondary MEC server, called Server 2, is located nearby and also hosts a copy of VNF 1. It may be located at a peer edge host with spare capacity or at a central office location within the metropolitan area network. In addition, Server 3 in the core network hosts VNF 2 and has more advanced computation capabilities than Servers 1 and 2.
We assume a time-slotted system. At each time slot, the device transmits a task in the form of a packet to a base station over a wireless channel. Because of fading in the wireless channel, transmissions may face errors; a task is successfully transmitted to the base station with a probability p that captures fading, attenuation, noise, etc. The device attempts a new task transmission only after the previous task has been successfully uploaded to the base station. The flow of tasks received at the base station is distributed between the queue for local processing and the queue for transmission to the secondary MEC server, so there are two possible routes through the service chain. A flow controller at the base station randomly decides the routing of each task. With probability α the task is processed by Server 1 first, and then forwarded to Server 3. With probability 1 − α the task is forwarded to Server 2, to be processed by VNF 1, and then forwarded to Server 3 to be processed by VNF 2.
Each task that arrives at a server first waits in a queue to be processed by a VNF. After processing, it is stored in a transmission queue, waiting to be forwarded to and processed by the next VNF. Let Q_i denote the i-th queue, i ∈ {1, …, 6}: Q1 is the processing queue of Server 1, Q2 the transmission queue from Server 1 to Server 3, Q3 the transmission queue from the base station to Server 2, Q4 the processing queue of Server 2, Q5 the transmission queue from Server 2 to Server 3, and Q6 the processing queue of Server 3. Note that the queues follow an early departure-late arrival model: at the beginning of the slot the departure takes place, and a new arrival can enter the queue at the end of the slot. The arrival rates of Q1 and Q3 are αp and (1 − α)p, respectively. We denote by μ_i the service rate of queue Q_i and assume that the service times are geometrically distributed; therefore, Q1 and Q3 can be seen as Geo/Geo/1 queues [15]. Furthermore, given that Q1, Q3, and Q4 are non-empty, the arrival rates of Q2, Q4, and Q5 are equal to the service rates of Q1, Q3, and Q4 (i.e., μ1, μ3, and μ4), respectively.
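As a sketch of these slot dynamics, a single Geo/Geo/1 queue under the early departure-late arrival model can be simulated as follows (a minimal illustration; the arrival and service probabilities passed in are placeholders, not values from the paper):

```python
import random

def simulate_geo_geo_1(lam, mu, slots, seed=0):
    """Simulate a discrete-time Geo/Geo/1 queue under the early
    departure-late arrival model and return the time-averaged
    queue length."""
    rng = random.Random(seed)
    q = 0
    total = 0
    for _ in range(slots):
        # Beginning of slot: a departure occurs w.p. mu if non-empty.
        if q > 0 and rng.random() < mu:
            q -= 1
        # End of slot: a new task arrives w.p. lam.
        if rng.random() < lam:
            q += 1
        total += q
    return total / slots
```

For lam < mu the time-averaged length settles near the stationary mean of the corresponding discrete-time chain.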
Furthermore, the queues at Servers 1 and 2 are assumed to have finite buffers. Let K_i denote the buffer size of queue Q_i. If a queue is full, no task departs, and a new task arrives in the same slot, the new task is dropped and removed from the system. However, the processing queue of Server 3 (where VNF 2 is located) is assumed to have an infinite buffer because of the large amount of hardware resources available in a server located in the core network.
In this work, we analyze a simple end-to-end communication system consisting of a small number of servers and one mobile device. However, the model can be generalized to an arbitrary number of servers following the methodology described in the next section.
III Performance Analysis
In this section, we perform the modeling and performance analysis that allow us to derive the key performance metrics, namely the system drop rate and the average number of tasks in the system, as defined in Section IV. We model the considered queueing system as a Discrete Time Markov Chain (DTMC). Modeling the whole system as one Markov chain would lead to a complicated model that is difficult to analyze in terms of closed-form expressions that provide useful insights. Thus, in order to simplify the analysis, we decompose the system into four subsystems: 1) the tandem queues Q1 and Q2 at Server 1, 2) the tandem queues Q3 and Q4 toward and at Server 2, 3) the transmission queue Q5 of Server 2, and 4) the processing queue Q6 of Server 3. The performance metrics of the whole system are approximated by the closed-form expressions derived for the subsystems. The approximation performs well, as shown in Section V.
III-A Subsystems 1 and 2: Two queues in tandem
The two queues in tandem, Q1 and Q2, are considered as one subsystem. The Markov chain is described by the process {(Q1(t), Q2(t))}, where Q1(t) and Q2(t) denote the states (in terms of queue length) of Q1 and Q2 at the t-th time slot and are referred to as the level and the phase, respectively. In order to facilitate the presentation, and because of space limitations, we first analyze a simple example with small buffer sizes; the Markov chain of this example is shown in Fig. 2. However, the analysis presented below is general and independent of the specific buffer size. The Markov chain is a Quasi-Birth-and-Death (QBD) DTMC [15]. Since a QBD moves at most one level up or down per transition, the transition matrix has a block-tridiagonal form:
For the sake of simplicity, given the probability of an event, denoted by x, we denote the probability of its complementary event by x̄ = 1 − x. The block matrices of the transition matrix are shown below.
For this particular example, we construct the Markov chain shown in Fig. 2 and the corresponding transition matrix, and we observe that the matrix has the particular block structure implied by the properties of a QBD DTMC.
Utilizing the properties of a QBD DTMC, we can analyze such systems with arbitrary buffer sizes in the following steps. First, we define the following matrices:
Then, the block matrices of the transition matrix are calculated as 
where λ1 = αp is the arrival rate of Q1.
Our goal is to derive the steady state distribution of the Markov chain defined above. We can apply direct methods to find the steady state distribution [15, Chapter 4]. Note that there are several efficient algorithms for this purpose, e.g., the logarithmic reduction method; we refer the reader to [15] for a detailed description.
We denote the steady state distribution by a row vector π, which we find by solving the following linear system of equations:

π P = π,  π 1 = 1,

where 1 denotes the column vector of ones. Hereafter, we use π(i) to denote the steady state distribution vector of the i-th subsystem, i = 1, …, 4.
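Numerically, the steady state of any finite transition matrix P can be obtained by replacing one of the rank-deficient balance equations with the normalization constraint (a generic sketch, not specific to this system):

```python
import numpy as np

def steady_state(P):
    """Solve pi P = pi with pi summing to one, for a finite DTMC.

    The balance equations pi (P - I) = 0 are rank-deficient, so the
    last one is replaced by the normalization sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)
```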
Along similar lines, we can construct the transition matrix and obtain the steady state distribution π(2) for the second subsystem, consisting of Q3 and Q4.
III-B Subsystem 3: One queue
We consider the transmission queue Q5 as an independent subsystem, which can be seen as a Geo/Geo/1/K queue, where K is the buffer size of the queue. The Markov chain of this system is shown in Fig. 3.
The transition matrix of this Markov chain is described below
We denote the steady state distribution of Subsystem 3 by π(3). In order to derive the steady state distribution, we solve the following linear system of equations:
Using the balance equations, we have
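The finite birth-death structure of this chain can also be checked numerically. The following sketch builds the Geo/Geo/1/K transition matrix under the early departure-late arrival model (parameter values are illustrative):

```python
import numpy as np

def geo_geo_1_k_matrix(lam, mu, K):
    """Transition matrix of a Geo/Geo/1/K queue: a departure occurs
    w.p. mu at the start of the slot (if non-empty), an arrival
    w.p. lam at the end of the slot, and an arrival that finds the
    buffer full with no departure is dropped."""
    P = np.zeros((K + 1, K + 1))
    for q in range(K + 1):
        # Departure event: impossible when the queue is empty.
        dep_probs = {1: mu, 0: 1 - mu} if q > 0 else {0: 1.0}
        for d, pd in dep_probs.items():
            for a, pa in ((1, lam), (0, 1 - lam)):
                P[q, min(q - d + a, K)] += pd * pa
    return P
```

The steady state then follows from the balance equations, or numerically by solving π P = π with normalization.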
III-C Subsystem 4: Q6 with infinite buffer size
The arrival rate of Q6 depends on the service rates of the transmission queues Q2 and Q5. We model the subsystem as a Markov chain as shown in Fig. 4, where
The transition matrix that describes the Markov chain above is shown below
where the entries are defined by the per-slot arrival and departure probabilities. The transition matrix is a lower Hessenberg matrix. We denote the steady state distribution of Subsystem 4 by π(4). The general expression for the equilibrium equations of the states is given by the n-th term of the following equation:
For a DTMC with an infinite state space, we apply the z-transform approach to solve the state equations. The z-transforms of the state transition probabilities are
respectively. The -transform for the steady state distribution vector is
The solution for the steady state probabilities is given by
where the three sets of terms are the residues, poles, and direct terms of the partial fraction expansion, respectively.
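As a numerical cross-check of the z-transform solution, the infinite chain can be truncated at a sufficiently large level and solved directly. Here a2 and a5 stand for the per-slot probabilities that the two upstream transmission queues each deliver a task; these names and all values are assumptions for illustration:

```python
import numpy as np

def truncated_q6_steady_state(a2, a5, mu, N=200):
    """Steady state of an infinite-buffer queue with up to two
    Bernoulli arrivals (prob. a2, a5) and one Bernoulli departure
    (prob. mu) per slot, truncated at level N."""
    P = np.zeros((N + 1, N + 1))
    for q in range(N + 1):
        dep_probs = {1: mu, 0: 1 - mu} if q > 0 else {0: 1.0}
        for d, pd in dep_probs.items():
            for x2, p2 in ((1, a2), (0, 1 - a2)):
                for x5, p5 in ((1, a5), (0, 1 - a5)):
                    P[q, min(q - d + x2 + x5, N)] += pd * p2 * p5
    # Solve pi P = pi with sum(pi) = 1.
    n = N + 1
    A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)
```

When the queue is stable, the probability mass at the truncation level is negligible, so the truncated solution approximates the infinite chain well.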
III-C1 Stability Conditions
Since Q6 has an infinite buffer, we need to characterize the conditions under which the queue is stable. Stability is important since it implies finite queueing delay. A definition of queue stability is given below [17].
Denote by Q(t) the length of a queue at the beginning of time slot t. The queue is said to be stable if lim_{t→∞} Pr[Q(t) < x] = F(x) and lim_{x→∞} F(x) = 1.
A direct consequence of the previous definition is Loynes' theorem [18], which states that if the arrival and service processes of a queue are strictly jointly stationary and the average arrival rate is less than the average service rate, then the queue is stable.
In this scenario, Q6 sees at most two arrivals (one from each upstream transmission queue) and at most one departure per time slot. The average arrival rate is the following:
Therefore, Q6 is stable if and only if the average arrival rate is strictly smaller than its service rate, i.e., the inequality above holds.
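Assuming the two upstream transmission queues deliver tasks as Bernoulli streams with rates a2 and a5 (illustrative names), so that the average arrival rate is their sum, the condition can be checked as:

```python
def q6_is_stable(a2, a5, mu6):
    """Loynes-type condition for the infinite-buffer queue: the
    average arrival rate (sum of the two upstream delivery rates)
    must be strictly smaller than the service rate mu6."""
    return a2 + a5 < mu6
```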
IV Key Performance Metrics
In this section, we provide closed-form expressions for the performance metrics of interest, i.e., the system drop rate and the average number of tasks in the system, utilizing the results of the previous section. The probabilities of a task being dropped in a time slot at each of the finite queues are given below:
where d_i denotes the probability that a task is dropped at queue Q_i. The average length of each queue is given by
Therefore, the system drop rate and the average number of tasks in the system can be expressed as
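Given the steady state vector of each finite queue, the per-queue metrics, and hence the system totals, follow directly. This is a sketch: the drop event is taken as "buffer full, no departure, and an arrival in the same slot", matching the drop model of Section II, and the arguments are illustrative:

```python
def avg_length(pi):
    """Average queue length, where pi[n] = Pr[queue length = n]."""
    return sum(n * p for n, p in enumerate(pi))

def drop_prob(pi, lam, mu):
    """Per-slot drop probability of a finite queue: the buffer is
    full (pi[-1]), no task departs (1 - mu), and a new task
    arrives (lam) in the same slot."""
    return pi[-1] * (1 - mu) * lam

def system_metrics(queues):
    """Sum the per-queue metrics over a list of (pi, lam, mu)
    triples to obtain the system drop rate and the average number
    of tasks in the system."""
    total_drop = sum(drop_prob(pi, lam, mu) for pi, lam, mu in queues)
    total_tasks = sum(avg_length(pi) for pi, _, _ in queues)
    return total_drop, total_tasks
```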
In the next section, we numerically evaluate the performance of the system and validate the accuracy of the approximate model through simulations.
V Numerical Results
In this section, we evaluate our approximate model by comparing theoretical and simulation results, i.e., we compare the closed-form expressions for the key performance metrics with simulations. Furthermore, we provide results that show how different values of the service rates affect the optimal routing decision α. Recall that our key performance metrics are the system drop rate and the average number of tasks in the system. We provide results for two scenarios: 1) minimizing the drop rate when the buffer capacities are small, and 2) minimizing the average number of tasks in the system when the buffer capacities are large. The results for the two scenarios are shown in the following two subsections. The common parameters of the simulation set-up are fixed across scenarios, while the values of the two varied service rates differ per scenario. We developed a MATLAB-based behavioural simulator, and each case was run for a fixed number of time slots.
V-A Effect of the service rates on the drop rate in small-buffered systems
In this subsection, we observe the performance of the system in terms of the drop rate when the buffer sizes are small. In Fig. 5, we provide the optimal values of α (the probabilistic routing decision) for different values of the two varied service rates; the optimal α for each pair of values is obtained by brute force. We observe that for small values of both rates, the optimal α is small, i.e., the flow controller routes most of the traffic to the secondary MEC server (Server 2). Furthermore, the optimal α is determined by the smaller of the two rates: when one rate is much larger than the other, the optimal α equals the one obtained for the smaller rate alone. Hence, the results show that the queue with the smallest transmission or computation capacity becomes the bottleneck of the subsystem. This could be the case, for example, when the connection between the MEC server and the server in the core network is weak. Fig. 6 shows the system drop rate for the corresponding optimal values of α. We observe that the drop rate improves only when both rates increase, again because the smaller of the two rates acts as the bottleneck. This observation is important not only for obtaining the optimal routing decisions but also for system design purposes, e.g., resource allocation.
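The brute-force search over α can be reproduced with a slot-level Monte Carlo simulator. The six-queue topology below (Q1: processing at Server 1, Q2: transmission Server 1 → 3, Q3: transmission base station → Server 2, Q4: processing at Server 2, Q5: transmission Server 2 → 3, Q6: processing at Server 3) is our reading of the system model, and every parameter value is an illustrative assumption:

```python
import random

def simulate_system(alpha, p, mu, K, slots, seed=0):
    """Monte Carlo sketch of the end-to-end system.  mu[i] and K[i]
    are the service rate and buffer size of queue i (0-based;
    K[5] is None for the infinite queue at Server 3).  Returns the
    per-slot drop rate and the time-averaged number of tasks."""
    rng = random.Random(seed)
    q = [0] * 6
    drops = 0
    tasks_acc = 0
    # Internal forwarding: Q1->Q2, Q2->Q6, Q3->Q4, Q4->Q5, Q5->Q6.
    routes = [(0, 1), (1, 5), (2, 3), (3, 4), (4, 5)]
    for _ in range(slots):
        # Early departures, based on the queue contents at slot start.
        done = [q[i] > 0 and rng.random() < mu[i] for i in range(6)]
        arrivals = []
        if done[5]:
            q[5] -= 1           # a task leaves the system at Q6
        for src, dst in routes:
            if done[src]:
                q[src] -= 1
                arrivals.append(dst)
        # External arrival: wireless success w.p. p, routed w.p. alpha.
        if rng.random() < p:
            arrivals.append(0 if rng.random() < alpha else 2)
        # Late arrivals: enqueue, or drop if a finite buffer is full.
        for dst in arrivals:
            if K[dst] is not None and q[dst] >= K[dst]:
                drops += 1
            else:
                q[dst] += 1
        tasks_acc += sum(q)
    return drops / slots, tasks_acc / slots

def best_alpha(p, mu, K, slots=20000, grid=11):
    """Brute-force search for the routing probability minimizing
    the simulated drop rate."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return min(candidates, key=lambda a: simulate_system(a, p, mu, K, slots)[0])
```

A finer grid or longer runs trade computation time for accuracy of the selected α.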
V-B Effect of the service rates on the number of tasks in large-buffered systems
In this subsection, we provide results for the performance of the system in terms of the average number of tasks in the system, with the objective of minimizing this number when the buffer sizes are large. In Fig. 7, the optimal α for different values of the service rates is shown. We observe that for small values of the rates, the optimal α equals one; the flow controller then routes the whole traffic to the first server. This decision is optimal in terms of minimizing the average number of tasks in the system, but it significantly increases the system drop rate: the average number of tasks is minimized precisely because a large percentage of the tasks are dropped rather than served, as shown in Fig. 9. Furthermore, with one rate fixed, the optimal α increases as the other rate increases, since a higher service rate also increases the arrival rate of the downstream queue, and the resulting drops reduce the number of tasks in the system. For larger values of the rates, we observe that the optimal α increases smoothly as the rates increase. However, we again observe that the smaller of the two rates acts as the bottleneck of the subsystem and, subsequently, of the whole system. For example, when one rate is fixed at a small value, neither the routing decision nor the performance of the system improves as we increase the other rate.
In Fig. 10, we compare the theoretical and experimental results in terms of the optimal α and the corresponding system drop rates. For ease of presentation, we fix one of the two service rates, select the optimal α according to the minimum drop rate, and provide results for different values of the other rate. The results show that our approximate model works well. Similarly, in Fig. 11, we compare theoretical and experimental results for the case where the optimal α is selected according to the minimum number of tasks in the system. We also compute the corresponding theoretical values for all the cases shown above (Figs. 5–9); the average absolute error is small for both objectives, confirming the accuracy of the approximation.
VI Conclusions & Future Directions
In this work, we consider an exemplary network topology with two MEC servers, a high-end server at the core network, and VNF chains embedded in the servers. We model the network and provide a theoretical study of the system performance in terms of the system drop rate and the average number of tasks in the system, which can be useful for more general set-ups. We provide both experimental and theoretical results to evaluate our approximate model and show that it approximates the system with high accuracy. The numerical results offer useful insights for the design of such systems and for resource allocation at each server. Furthermore, we numerically investigate the routing policy that minimizes the system drop rate for different system set-ups. This work can be considered an initial, but significant, step toward analyzing and optimizing the end-to-end delay and throughput of such networks. The developed analysis can provide guidelines for delay-aware routing and resource allocation schemes in similar systems.
[1] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, “Mobile edge computing–A key technology towards 5G,” ETSI white paper, vol. 11, no. 11, pp. 1–16, 2015.
[2] T. Taleb, K. Samdanis, B. Mada, H. Flinck, S. Dutta, and D. Sabella, “On multi-access edge computing: A survey of the emerging 5G network edge cloud architecture and orchestration,” IEEE Communications Surveys & Tutorials, vol. 19, no. 3, pp. 1657–1681, 2017.
[3] R. Mijumbi, J. Serrat, J.-L. Gorricho, N. Bouten, F. De Turck, and R. Boutaba, “Network function virtualization: State-of-the-art and research challenges,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 236–262, 2016.
[4] M. S. Bonfim, K. L. Dias, and S. F. Fernandes, “Integrated NFV/SDN architectures: A systematic literature review,” arXiv preprint arXiv:1801.01516, 2018.
[5] R. Cohen, L. Lewin-Eytan, J. S. Naor, and D. Raz, “Near optimal placement of virtual network functions,” in IEEE INFOCOM, 2015, pp. 1346–1354.
[6] L. Wang, Z. Lu, X. Wen, R. Knopp, and R. Gupta, “Joint optimization of service function chaining and resource allocation in network function virtualization,” IEEE Access, vol. 4, pp. 8084–8094, 2016.
[7] H. Feng, J. Llorca, A. M. Tulino, and A. F. Molisch, “Optimal dynamic cloud network control,” IEEE/ACM Transactions on Networking, no. 99, pp. 1–14, 2018.
[8] M. Barcelo, J. Llorca, A. M. Tulino, and N. Raman, “The cloud service distribution problem in distributed cloud networks,” in IEEE ICC, 2015, pp. 344–350.
[9] H. Feng, J. Llorca, A. M. Tulino, and A. F. Molisch, “Dynamic network service optimization in distributed cloud networks,” in IEEE INFOCOM Workshops, 2016, pp. 300–305.
[10] H. Feng, J. Llorca, A. M. Tulino, and A. Molisch, “Optimal control of wireless computing networks,” arXiv preprint arXiv:1710.10356, 2017. [Online]. Available: http://arxiv.org/abs/1710.10356
[11] J. Liu, Y. Mao, J. Zhang, and K. B. Letaief, “Delay-optimal computation task scheduling for mobile-edge computing systems,” in IEEE ISIT, 2016, pp. 1451–1455.
[12] Y. Mao, J. Zhang, S. Song, and K. B. Letaief, “Stochastic joint radio and computational resource management for multi-user mobile-edge computing systems,” IEEE Transactions on Wireless Communications, vol. 16, no. 9, pp. 5994–6009, 2017.
[13] D. Han, W. Chen, and Y. Fang, “Power-optimal scheduling for delay constrained mobile computation offloading,” in IEEE ICC, May 2018, pp. 1–6.
[14] Q. Ye, W. Zhuang, X. Li, and J. Rao, “End-to-end delay modeling for embedded VNF chains in 5G core networks,” IEEE Internet of Things Journal, pp. 1–1, 2018.
[15] A. S. Alfa, Applied discrete-time queues. Springer, 2016.
[16] F. Gebali, Analysis of Computer and Communication Networks, 1st ed. Springer, 2010.
[17] W. Szpankowski, “Stability conditions for some distributed systems: Buffered random access systems,” Advances in Applied Probability, vol. 26, no. 2, pp. 498–515, 1994.
[18] R. M. Loynes, “The stability of a queue with non-independent inter-arrival and service times,” in Mathematical Proceedings of the Cambridge Philosophical Society, vol. 58, no. 3. Cambridge University Press, 1962, pp. 497–520.