MSDF: A Deep Reinforcement Learning Framework for Service Function Chain Migration
Under dynamic traffic, service function chain (SFC) migration is considered an effective way to improve resource utilization. However, the lack of future network information leads to non-optimal solutions, which motivates us to study reinforcement learning based SFC migration from a long-term perspective. In this paper, we formulate the SFC migration problem as a minimization problem whose objective is the total network operation cost, subject to constraints on users’ quality of service. We first design a deep Q-network based algorithm to solve the single SFC migration problem, which can adjust the migration strategy online without knowing future information. Further, a novel multi-agent cooperative framework, called MSDF, is proposed to address the challenge of migrating multiple SFCs jointly on the basis of single SFC migration. MSDF reduces the complexity and thus accelerates convergence, especially in large scale networks. Experimental results demonstrate that MSDF outperforms typical heuristic algorithms under various scenarios.
Traffic flows in packet networks typically require various services, which are provided by traversing different kinds of middle-boxes. Traditionally, the network operator manages these middle-boxes at high cost because they rely on dedicated hardware. Fortunately, network function virtualization (NFV) decouples network functions (NFs) from the underlying hardware so that various NFs can run as software on commodity hardware in place of the traditional middle-boxes, which brings great flexibility in providing specific services and thus significantly reduces cost and complexity. Usually, a series of virtualized network functions (VNFs) are concatenated by virtual links in sequence to form a service function chain (SFC). Specifically, the corresponding VNF instances (VNFIs), hosted by virtual machines (VMs), are first deployed on servers in the data center network (DCN). Then the SFC is mapped to these instances and to physical links to achieve the promised level of service. Nowadays, NFV is usually combined with software defined networking (SDN) to provide easy traffic steering. However, highly dynamic traffic makes it difficult to meet users’ quality of service (QoS) at low cost all the time, which might reduce resource utilization in the DCN.
Traditionally, there are several mechanisms to handle traffic dynamics in NFV-enabled networks: i) Horizontal scaling: adding or removing virtualized resources, e.g., VNF instances. Rahman et al. propose VNF placement algorithms that scale VNF instances in and out, reacting dynamically to traffic changes. ii) Vertical scaling: reconfiguring the size of the virtualized resources allocated to containers or VMs. Prior work formulates the dynamic resource scaling of VNF management as an integer linear programming (ILP) problem and proposes a greedy algorithm to solve it. iii) VNF migration: migrating the related VMs in an interrupted or live manner. Cho et al. migrate the VNFs by going through all migration strategies that satisfy the modeled constraints to achieve low network latency. Quang et al. formulate the VNF forwarding graph placement problem as an ILP problem and consider VNF migration in dynamic networks. In fact, VNF migration is the most effective solution, and it can be combined with the other two scaling mechanisms to further enhance the resilience of VNFs.
Nevertheless, VNF migration involves transferring the internal states of VMs, which causes QoS degradation and is costly, especially in the live manner. To avoid the QoS degradation caused by VM shutdown and to reduce the network cost, SFC migration can be considered. Specifically, SFC migration means manipulating the traffic forwarding by changing the mapping between the VNFs in an SFC and the VNFIs in the physical network. However, in a DCN with dynamic traffic, obtaining the optimal migration strategy is difficult. As the time-varying traffic pattern is unknown, directly adopting the instantaneous optimal migration strategy may lead to frequent migrations back and forth, which induces serious overhead. The optimal migration strategy should therefore be considered from a long-term perspective.
However, the long-term optimal migration strategy needs future network information. Fortunately, deep learning offers a tool to predict future network information accurately and easily, given sufficient history data. Tang et al. propose a real-time VNF migration algorithm that uses a deep belief network to predict future resource requirements. But training a practicable neural network requires large amounts of training data. Reinforcement learning (RL) is advantageous when training data is scarce. Guided by an appropriate reward, the RL agent can adjust its policy online and eventually achieve the long-term goal. Li et al. propose an RL based algorithm to solve the NP-hard VNF scheduling problem, while Ku et al. propose to migrate the whole SFC using RL to make better use of physical resources. However, both restrict the migration action space to a small scale and rely on tabular Q-learning, which is not practical in real networks with complicated states and a large number of migration strategies.
In this paper, we solve the multiple SFC migration problem, which considers the joint migration of multiple SFCs in a dynamic DCN while guaranteeing users’ QoS. The high complexity in large scale networks motivates us to solve the problem using deep reinforcement learning (DRL). With its ability to extract features, DRL can handle problems with high dimensional states and thus overcome the shortcomings of tabular Q-learning. Furthermore, a cooperative multi-agent framework is proposed to address the challenge of a huge action space. In detail, the contributions of this paper are as follows:
1) We formulate the SFC migration problem with time-varying traffic to minimize the total network cost over a long time span. We consider and analyze the constraints of users’ customized QoS, in terms of end-to-end delay and packet loss. Moreover, both energy consumption and migration overhead are included in the optimization objective.
2) To solve the problem, we first explore the case of single SFC migration. We design a deep Q-network (DQN) based subagent for single SFC migration, where the reward function of the subagent is designed as a piecewise function based on the formulated optimization objective. In this way, the subagent can adapt to the dynamic traffic loads.
3) On the basis of the subagents designed for single SFC migration, we propose a Monitor and Successive Decision Framework (MSDF) to handle the challenge of a huge action space when considering the joint migration of multiple SFCs. MSDF divides the huge action space into the much smaller action spaces of the subagents for single SFC migration, so the complexity is reduced and convergence is consequently accelerated. In MSDF, the joint migration strategy is formed by concatenating the actions of these subagents.
4) We compare the performance of MSDF with typical heuristic algorithms by experiments using real trace data. The experimental results show that MSDF outperforms existing algorithms under various scenarios.
The rest of this paper is organized as follows. Section II describes the system model and illustrates the problem formulation. Then Section III proposes the learning framework from single subagent to cooperative multi-agents in detail. Section IV reports the performance evaluation. Finally, the paper is concluded in Section V.
II System Model and Problem Formulation
II-A Network model
A time-varying DCN is considered in this paper, which is modeled by several main elements from the perspective of NFV:
Physical network. It is modeled as an undirected graph of nodes and physical links. A function node is characterized by its resource capacity. It can accommodate the VMs on which the VNFIs are hosted. The physical link between two nodes is characterized by the propagation delay on it.
VNF set. The set includes all demanded types of VNF in the DCN. Each type of VNF is characterized by a tuple whose elements denote the coefficients of processing delay, resource allocation delay, deployment delay and deployment cost, respectively.
SFC set. Each SFC is denoted by a set of ordered virtual nodes. It is described by an undirected graph in which each node is a logic VNF and each edge is a virtual link between two logic VNF nodes. Each SFC is characterized by its bandwidth and maximum end-to-end delay.
Flow set. Each flow, denoted by its pair of source node and destination node, is characterized by its bandwidth and maximum end-to-end delay.
II-B Migration model
Each SFC represents a tailored service for users that the network operator offers. It serves multiple flows that have the same request of service level but own different pairs of source node and destination node. With SFC migration, we change the mapping between logic VNFs and VMs on physical nodes, as well as the forwarding of relevant flows.
The detailed migration process is as follows: i) Preparation of the target node. The centralized SDN controller informs the migration target node to perform resource reallocation on the related VM, according to the bandwidth of the migrated SFC. If there is no VNFI of the requested type, one is deployed in advance. ii) Informing the original node. The controller informs the original node to submit all the information of the flows related to the migrated SFC, including session state information and data packets. iii) Information transfer. Once the migration target node is ready, the relevant information and data stored in the controller are transferred to it. iv) Updating the flow table. The controller updates the flow tables of the related switches to forward flows according to the new mapping of the SFC.
Note that if the source instance has no traffic to process after migration, the resource allocated to the related VM can be retrieved. If all the instances on the source node are idle, the node can run in a low power mode.
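The four migration steps above can be illustrated with a minimal sketch. The `Node` and `Controller` classes, their fields and the log labels are hypothetical names introduced only for illustration; they do not correspond to any real SDN controller API, and resource reallocation is reduced to a log entry.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    vnf_instances: set = field(default_factory=set)  # VNF types with a running VNFI
    sessions: dict = field(default_factory=dict)      # sfc_id -> session state and data

@dataclass
class Controller:
    flow_table: dict = field(default_factory=dict)    # (sfc_id, vnf_type) -> node name
    log: list = field(default_factory=list)           # order of executed steps

    def migrate(self, sfc_id, vnf_type, source, target):
        # i) Prepare the target node; deploy a VNFI in advance if none exists.
        if vnf_type not in target.vnf_instances:
            target.vnf_instances.add(vnf_type)
            self.log.append("deploy")
        self.log.append("prepare")
        # ii) The original node submits the state of the migrated SFC.
        state = source.sessions.pop(sfc_id, {})
        self.log.append("collect")
        # iii) Transfer the stored state to the target node once it is ready.
        target.sessions[sfc_id] = state
        self.log.append("transfer")
        # iv) Update the flow table so that flows follow the new mapping.
        self.flow_table[(sfc_id, vnf_type)] = target.name
        self.log.append("update")
```

In this toy model the VNFI stays on the source node after migration; reclaiming its resources or switching the node to a low power mode would be a separate step, as noted above.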
II-C Problem formulation
|Time dependent variables|
|Indicating the VNF of type is deployed on function node at time with value 1, 0 otherwise.|
|Indicating the VNF of SFC is mapped to node at time with value 1, 0 otherwise.|
|Indicating the virtual link between neighboring VNF and VNF of SFC () is mapped to physical links between node and node () at time with value 1, 0 otherwise.|
|Indicating the VM that hosts a VNFI of type on node has flows to process at time with value 1, 0 otherwise. From which we have indicating node is idle at time .|
|Time independent variables|
|Indicating the VNF of SFC is type with value 1, 0 otherwise.|
|Indicating service type of the flow is translated into SFC with value 1, 0 otherwise. From which we have and .|
We formulate the SFC migration problem which considers joint migration of multiple SFCs from a long-term perspective. The objective trades off between the operation cost and the revenue of guaranteeing users’ QoS.
First, we define the long-term optimization time span as an operational cycle, which is further divided into discrete time slots. A migration decision is made at the beginning of each time slot. All the variables used in the SFC migration problem are binary, and they are summarized in Table I.
To guarantee users’ QoS, we consider two important metrics, i.e., maximum end-to-end delay and packet loss. The end-to-end delay includes the propagation delay on links and the fixed processing delay on VMs. A minimum amount of processing resource is required for the processing speed to keep up with the data arrival rate; otherwise, packet loss occurs due to the lack of resource. These two metrics are represented by constraints, where the packet loss requirement is enforced through the resource constraint.
1) End-to-end delay constraint.
where and represent the source node and the destination node, respectively, while and represent nodes that are mapped by the first and the last VNF in , respectively.
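As a minimal sketch of how the delay constraint can be checked for one flow, the following assumes that the delay between any two nodes is available as a precomputed table and that consecutive VNFs mapped to the same node add no propagation delay. The function names and the `link_delay` layout are illustrative assumptions, not the paper's notation.

```python
def pair_delay(link_delay, a, b):
    """Propagation delay between two nodes; zero when both VNFs share a node."""
    if a == b:
        return 0.0
    return link_delay[frozenset((a, b))]

def end_to_end_delay(path_nodes, link_delay, processing_delays):
    """path_nodes = [source, node of first VNF, ..., node of last VNF, destination]."""
    propagation = sum(
        pair_delay(link_delay, a, b) for a, b in zip(path_nodes, path_nodes[1:])
    )
    return propagation + sum(processing_delays)

def delay_satisfied(path_nodes, link_delay, processing_delays, max_delay):
    """True when the flow meets its maximum end-to-end delay."""
    return end_to_end_delay(path_nodes, link_delay, processing_delays) <= max_delay
```

For example, a flow whose chain is mapped to nodes `n1, n1, n2` pays propagation delay on source to `n1`, `n1` to `n2`, and `n2` to destination, plus the three processing delays.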
2) Resource constraint. The sum of resource requested by all VNFIs on one node should not exceed the capacity of this node to avoid node congestion. We calculate the required computational capacity of VMs as . Then we have the resource constraint as
where represents the requested resource of VM that hosts on node at time . is the length of data packet. When the resource is insufficient, we allocate it following the max-min fair sharing algorithm and process packets according to the principle of first-come first-served. Then packet loss can be calculated as
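The max-min fair sharing step can be sketched as follows. The allocation routine is the standard progressive-filling form of max-min fairness; the loss estimate, which treats the unserved fraction of a VM's demand as its packet loss ratio, is a simplifying assumption made here for illustration and is not the paper's expression.

```python
def max_min_fair_share(capacity, demands):
    """Allocate node capacity among VMs by max-min fairness.

    demands: {vm_id: requested resource}. Returns {vm_id: allocated resource}.
    """
    allocation = {vm: 0.0 for vm in demands}
    remaining = capacity
    pending = sorted(demands, key=demands.get)  # smallest demand first
    while pending:
        share = remaining / len(pending)
        vm = pending[0]
        if demands[vm] <= share:
            # The smallest demand fits in an equal share: serve it fully.
            allocation[vm] = demands[vm]
            remaining -= demands[vm]
            pending.pop(0)
        else:
            # No pending demand fits: split what is left equally.
            for vm in pending:
                allocation[vm] = share
            break
    return allocation

def loss_ratio(demand, allocated):
    """Assumed loss model: fraction of the demand that could not be served."""
    if demand <= 0:
        return 0.0
    return max(0.0, 1.0 - allocated / demand)
```

With a capacity of 10 and demands of 2, 4 and 8, the two smaller demands are served in full and the largest receives the remaining 4, so only the largest VM experiences loss under this model.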
With the above two constraints to guarantee users’ QoS, we aim to minimize the total network cost over the operational cycle, including the energy consumption of the physical network and the migration overhead.
1) Energy consumption. The physical network consumes energy even when there is no traffic to process, which is called the basic resource consumption. We separate it into two parts: the basic energy for running nodes and the basic resource consumption of VMs. To save the basic resource consumption, traffic processed on inefficient nodes should be aggregated, i.e., the number of running VMs and nodes should be minimized. Thus we can express the energy consumption as:
where is an indication function with value when and otherwise. The binary variable when . and are the basic energy cost factors of node and VM that hosts the , respectively.
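A minimal sketch of the basic energy term, assuming a node counts as running when at least one of its VMs has flows to process and that the VM cost factor depends only on the VNF type. The dictionary layout and names are illustrative assumptions.

```python
def basic_energy_cost(vm_busy, node_cost, vm_cost):
    """Sum the basic energy of running nodes and of busy VMs.

    vm_busy:   {(node, vnf_type): True if the VM has flows to process}
    node_cost: {node: basic energy cost factor of the node}
    vm_cost:   {vnf_type: basic energy cost factor of a VM hosting that type}
    """
    running_nodes = {node for (node, _), busy in vm_busy.items() if busy}
    node_part = sum(node_cost[node] for node in running_nodes)
    vm_part = sum(vm_cost[vnf] for (_, vnf), busy in vm_busy.items() if busy)
    return node_part + vm_part
```

Aggregating traffic onto fewer nodes lowers this value because idle nodes and idle VMs drop out of both sums.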
2) Migration overhead. To maintain consistency, session state and data need to be stored and forwarded during the migration. Thus extra network resource is used during the migration process, which is part of the migration overhead. The amount of this data is proportional to the product of the migration preparation time and the bandwidth of the SFC. Minimizing the following expression helps avoid frequent migrations.
The other part is the reconfiguration cost, which includes the cost of deploying new VNFIs and the extra link bandwidth usage caused by link re-mapping. It can be expressed as follows.
The migration overhead is the sum of the above two parts, i.e., . Where and are normalization factors.
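The structure of the migration overhead can be sketched as below. The record fields are hypothetical, and applying the normalization factors as divisors is an assumption; the sketch only mirrors the two parts described above (buffered state proportional to preparation time times bandwidth, plus reconfiguration cost).

```python
def migration_overhead(migrations, norm_state, norm_reconf):
    """Normalized sum of state-transfer data and reconfiguration cost.

    Each migration is a dict with the (hypothetical) keys:
      prep_time, bandwidth, deploy_cost, extra_link_bandwidth
    """
    # Data stored and forwarded during migration: preparation time x SFC bandwidth.
    state_part = sum(m["prep_time"] * m["bandwidth"] for m in migrations)
    # Reconfiguration: new VNFI deployment plus extra bandwidth from link re-mapping.
    reconf_part = sum(
        m["deploy_cost"] + m["extra_link_bandwidth"] for m in migrations
    )
    return state_part / norm_state + reconf_part / norm_reconf
```

A time slot without any migration yields zero overhead, which is why minimizing this term discourages migrating back and forth.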
Finally, the SFC migration problem is formulated in (7) with the objective of minimizing the total network cost during . In (7a), are weights that balance the aforementioned two kinds of costs, satisfying . Constraint (7c) indicates that each VNF should be mapped to one function node. (7d) constrains the number of migrated VNFs for each SFC. (7e) indicates that each type of VNF should be deployed on at least one function node. (7f) ensures that each flow belongs to only one SFC. (7g) guarantees the consistence of flow .
It should be emphasized that the main decision variable of SFC migration is the VNF mapping variable, while the other variables can be inferred from the previous network status.
III Monitor and Successive Decision Framework
The SFC migration problem is NP-hard, as it is equivalent to the general case of the NP-hard bin packing problem when the physical nodes are regarded as bins and the VNFs as items. Moreover, the dynamic nature of the traffic makes it difficult to solve with traditional heuristic methods. In this section, we propose a DRL based cooperative framework to approximate the optimal global solution in the time-varying DCN.
As the number of joint migration strategies is huge for a large scale network with multiple SFCs, we decompose the multiple SFC migration problem into multiple homogeneous single SFC migration problems. For each SFC, we design a DRL based subagent to make migration decisions. To form an effective joint migration strategy on the basis of these subagents, we design a cooperative framework that facilitates the minimization of the total network cost when considering multiple SFC migration.
III-A Single SFC migration case
Since the current network status is only related to the current topology, the traffic flows and the migration decisions, we model the SFC migration problem as a Markov Decision Process (MDP). Adopting DQN [16, 17], we design the subagent for each SFC as follows, taking one SFC as an example.
State: The network state includes the resource requirements of the VNFs in the SFC, the mapping of the SFC, and the remaining resource. It can be expressed concisely as a vector of two kinds of ratios: for each VNF in the SFC, the ratio of its demanded resource to the resource capacity of the node it is mapped to; and for each available node in the DCN, the ratio of its remaining resource to its resource capacity.
Action : The action can be described as selecting one VNF from the VNF set of each SFC and deciding whether to migrate it and where to migrate it to, which forms a finite discrete action space.
Reward: Considering the operation cost and users’ QoS, the reward function is designed as the sum of (7a) and a punishment related to the constraints (7b). As each subagent contributes to the migration overhead to a different degree, we calculate the migration overhead of each subagent separately, while the global energy consumption is used to promote cooperation among subagents. Furthermore, in order to trade off among different optimization targets under different load scenarios, we design the reward as a piecewise function segmented by the degree of load. Thus the reward function of the subagent can be expressed as (8), which includes the migration overhead of that subagent.
where . is a unit ramp function. and are shorthand of real and maximum end-to-end delay. is a threshold of overload, which can be set according to the percentage of loads on nodes.
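To make the shape of such a reward concrete, the following sketch combines the shared energy term, the subagent's own migration overhead and a ramp-based delay penalty, and switches branch at the overload threshold. It is not equation (8): the branch contents, the weights and all parameter names are assumptions chosen only to show a piecewise reward segmented by load.

```python
def ramp(x):
    """Unit ramp function: zero for negative input, identity otherwise."""
    return max(0.0, x)

def subagent_reward(energy, overhead, real_delay, max_delay, load,
                    overload_threshold, w_energy=0.5, w_overhead=0.5,
                    penalty_weight=1.0):
    """Illustrative piecewise reward (higher is better)."""
    delay_penalty = ramp(real_delay - max_delay)  # positive only when the bound is violated
    if load < overload_threshold:
        # Light load (assumed branch): weigh energy saving against migration overhead.
        cost = w_energy * energy + w_overhead * overhead
        return -(cost + penalty_weight * delay_penalty)
    # Heavy load (assumed branch): focus on relieving overload and protecting QoS.
    overload_penalty = ramp(load - overload_threshold)
    return -(w_overhead * overhead
             + penalty_weight * (delay_penalty + overload_penalty))
```

Because `energy` is a global quantity shared by every subagent while `overhead` is local, a reward of this shape pushes the subagents toward a common target without charging one subagent for another's migrations.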
DQN uses a state-action value function (Q-function) to estimate the delayed return, which evaluates the quality of an action under a given state. The optimal Q-function is the maximum expected return obtainable from a state by executing an action and thereafter following a policy. To stabilize learning, we adopt experience replay, which removes correlations between transition samples, and a target network, which keeps the targets consistent while the parameters are updated. The training process is described in Algorithm 1.
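The two stabilization techniques can be sketched independently of any deep learning library. The buffer below stores transitions and samples them uniformly, and `td_targets` computes the standard DQN target, reward plus discounted maximum target-network value, for a sampled batch. The class and function names are illustrative, and `target_q` stands in for a frozen copy of the Q-network; this is not Algorithm 1 itself.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def td_targets(batch, target_q, gamma):
    """Compute r + gamma * max_a' Q_target(s', a') for each transition.

    target_q(state) returns the list of Q-values given by the target network.
    """
    targets = []
    for _, _, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

The online network would then be regressed toward these targets, with the target network's parameters refreshed from the online network only every fixed number of steps.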
III-B Multiple SFC migration case
Combining migration strategies of subagents without conflicts to approximate the optimal joint migration strategy requires a coordination mechanism. We design a cooperative framework, MSDF (Monitor and Successive Decision Framework), where multi-agents successively make decisions to facilitate minimization of total cost. The basic ideas behind this framework are elaborated as follows:
1) Cooperative multi-agent framework. Consider a network with a set of functional nodes and multiple SFCs. For each SFC, we use DQN to choose one of its VNFs and select one node from the other available nodes on which to place that VNF. In this case, the total number of migration strategies for the entire network is the product of the per-SFC action counts, which is a huge action space even for a small number of SFCs. Therefore, we decompose the huge action space into the smaller ones of the subagents designed as in Sec. III-A. The global energy consumption in the reward function (8) directs all subagents to converge toward a common target, while the migration overhead is independent for each subagent. Therefore, collaborative behaviors emerge among these subagents.
2) Successive decision structure. To avoid conflicts, these subagents make decisions successively. Each subagent makes its decision based on the new intermediate network state produced after the previous subagent has executed its action (the state transition is illustrated in Fig. 1). Combined with the appropriately designed reward function, the whole network converges when the last subagent converges. The decision order of the subagents depends on the overload probability of each SFC, which is computed from the probabilities that individual nodes become overloaded. That is to say, the SFC that is most likely to be overloaded is migrated first. Finally, the joint migration strategy is formed by concatenating the actions of the subagents.
3) Simulation and monitor. To reduce the overhead caused by frequent interactions between the subagents and the environment, the above successive decision process is performed in a simulated environment within the SDN control plane. The simulated network is a snapshot of the real network. The final joint strategy is applied to the real network in the same order as in the simulation to acquire the real rewards. At the same time, the SDN controller monitors the network to obtain the signal to start a new learning process. Thus, the subagents are life-long learning agents that can automatically keep up with new traffic patterns.
The process of multiple SFC migration using the DQN based MSDF is summarized in Algorithm 2.
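The successive decision structure can be sketched as a simple loop over subagents in descending order of overload probability, each acting on the intermediate state left by its predecessor. The callables `subagents[sfc]` and `apply_action` are placeholders for a trained policy and for the simulated environment; they and the other names are assumptions for illustration, not Algorithm 2.

```python
def successive_decisions(snapshot, subagents, overload_prob, apply_action):
    """Build a joint migration strategy on a snapshot of the network.

    subagents:     {sfc_id: policy(state) -> action}
    overload_prob: {sfc_id: probability that the SFC is overloaded}
    apply_action:  (state, sfc_id, action) -> next intermediate state
    """
    # The SFC most likely to be overloaded decides first.
    order = sorted(subagents, key=lambda sfc: overload_prob[sfc], reverse=True)
    state = snapshot
    joint_strategy = []
    for sfc in order:
        action = subagents[sfc](state)           # decide on the intermediate state
        state = apply_action(state, sfc, action)  # update the simulated network
        joint_strategy.append((sfc, action))
    return joint_strategy, state
```

The returned list preserves the decision order, so the same order can be replayed when the joint strategy is applied to the real network.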
IV Performance evaluation
The evaluation is based on Uunet, the real topology of a USA backbone IP network downloaded from topology-zoo, together with real network traffic data. Uunet has 49 nodes and 84 links. We select 10 functional nodes according to node degree and set 4 types of VNFs in the network. We assume that the resource actually consumed by the related VMs, the processing time, the configuration time of the related VMs and the deployment time of each VNF depend on the type of the VNF. We evaluate our framework from two aspects: convergence and network performance.
IV-A Convergence performance
To illustrate that the multi-agent framework can effectively reduce the action space and consequently speed up convergence, we first compare its convergence performance with that of a single agent (one-agent) for 3 SFCs of length 3. The size of the action space is around 27,000 for the one-agent but only 28 for each subagent in MSDF. Their convergence curves are shown in Fig. 2(a), from which the difference in convergence speed is clear. After 20,000 rounds of training, the one-agent still has not converged to the same level as MSDF, which needs only hundreds of iterations to converge.
Then we investigate the main factors affecting the convergence of MSDF. We train the subagents with different parameters, as shown in Fig. 2. With other factors fixed, changing the number of SFCs, i.e., the number of cooperative subagents, has only a slight impact on the convergence performance (Fig. 2(b)). MSDF thus scales to a larger network with more SFCs, as it still converges fast in the case of 20 SFCs. Fig. 2(c) shows that the convergence performance deteriorates when the SFCs need to traverse more VNFs, which leads to a larger action space. Increasing the number of flows included in each SFC (Fig. 2(d)) degrades convergence only gracefully across different load scenarios, which validates the ability to trade off among different targets.
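The reported sizes can be reproduced with simple counting, under assumptions that the text does not spell out. For a subagent, allowing each of the 3 VNFs to move to any of the 9 other functional nodes, plus one "do not migrate" action, gives 28. For the single agent, letting every SFC pick any of its 3 VNFs and any of the 10 functional nodes simultaneously gives 30 to the power of 3, i.e., 27,000. Both counting rules are assumptions that happen to match the reported figures.

```python
def subagent_action_count(num_vnfs, num_nodes):
    """Assumed rule: (VNF, other node) pairs plus one no-migration action."""
    return num_vnfs * (num_nodes - 1) + 1

def single_agent_action_count(num_sfcs, num_vnfs, num_nodes):
    """Assumed rule: every SFC chooses a (VNF, node) pair at the same time."""
    return (num_vnfs * num_nodes) ** num_sfcs

print(subagent_action_count(3, 10))         # 28
print(single_agent_action_count(3, 3, 10))  # 27000
```

The joint count grows exponentially with the number of SFCs, whereas the per-subagent count stays fixed, which is the reason the decomposition helps convergence.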
IV-B Network cost evaluation
We compare the converged MSDF with two heuristic algorithms. One is Greedy, which tries to ease the overload of functional nodes. Specifically, it migrates VMs from the most overloaded nodes, in descending order of resource requests, to nodes that have adequate resource, while satisfying the end-to-end delay constraint. The other is RM, which performs real-time migration with the goal of reducing the end-to-end delay of SFCs under the resource constraint of function nodes.
First, we set 10 SFCs of length 3 and vary the number of flows per SFC from 60 to 200 (so there are at most 2,000 flows in the network simultaneously). Under the light-load scenario, RM migrates because the resource is sufficient. The results show that MSDF can save basic energy cost (Fig. 3(d)) by aggregating traffic loads onto fewer VMs and nodes with fewer migrations, while the heuristic RM adopts instantaneous optimal solutions, leading to frequent migrations (Fig. 3(b)). As the load increases, RM stops migrating, which indicates that it cannot handle the heavy-load situation. Meanwhile, MSDF maintains a level of overload equivalent to that of Greedy (Fig. 3(c), where the ordinate is the sum of the scaled percentages) with fewer migrations. Moreover, Greedy migrates only under the heavy-load scenario, as shown in Fig. 3(b). In summary, the proposed framework adjusts automatically to adapt to the dynamic network, and thus achieves the smallest total cost from a long-term perspective.
Then we fix the number of flows at 50 and vary the number of SFCs from 5 to 19, to further demonstrate the scalability of MSDF. The results (Fig. 4) imply that MSDF can still find the balance between basic energy cost and users’ QoS in the dynamic network. In particular, when there are 19 SFCs in the network, the 19 subagents cooperate successfully to reduce the percentage of unsatisfied QoS constraints with a certain number of migrations (Fig. 4(b), 4(c)). Consequently, MSDF remains effective when it scales to a larger network.
V Conclusion
In this paper, we formulate the multiple SFC migration problem under dynamic traffic with the goal of minimizing the total network cost over a long time span. A novel learning framework, MSDF, is proposed to handle the high complexity of applying RL to joint SFC migration decisions. MSDF accelerates convergence and facilitates cooperation among subagents to reduce the total network cost. Simulation results validate the effectiveness of MSDF. Its convergence is more than 25 times faster than that of a single agent. Meanwhile, it outperforms two typical heuristic algorithms in terms of balancing the network operation cost and users’ QoS, as well as adapting to dynamic traffic loads.
[1] R. Mijumbi, J. Serrat, J.-L. Gorricho, N. Bouten, F. De Turck, and R. Boutaba, “Network function virtualization: State-of-the-art and research challenges,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 236–262, 2016.
[2] R. Gouareb, V. Friderikos, and A. Aghvami, “Placement and routing of VNFs for horizontal scaling,” in 2019 26th International Conference on Telecommunications (ICT). IEEE, 2019, pp. 154–159.
[3] J. Zhang, L. Li, and D. Wang, “Optimizing VNF live migration via para-virtualization driver and QuickAssist technology,” in 2017 IEEE International Conference on Communications (ICC). IEEE, 2017, pp. 1–6.
[4] H. Tang, D. Zhou, and D. Chen, “Dynamic network function instance scaling based on traffic forecasting and VNF placement in operator data centers,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 3, pp. 530–543, 2018.
[5] O. Houidi, O. Soualah, W. Louati, M. Mechtri, D. Zeghlache, and F. Kamoun, “An efficient algorithm for virtual network function scaling,” in GLOBECOM 2017-2017 IEEE Global Communications Conference. IEEE, 2017, pp. 1–7.
[6] D. Cho, J. Taheri, A. Y. Zomaya, and P. Bouvry, “Real-time virtual network function (VNF) migration toward low network latency in cloud environments,” in 2017 IEEE 10th International Conference on Cloud Computing (CLOUD). IEEE, 2017, pp. 798–801.
[7] P. T. A. Quang, A. Bradai, K. D. Singh, G. Picard, and R. Riggio, “Single and multi-domain adaptive allocation algorithms for VNF forwarding graph embedding,” IEEE Transactions on Network and Service Management, vol. 16, no. 1, pp. 98–112, 2018.
[8] V. Eramo, E. Miucci, M. Ammar, and F. G. Lavacca, “An approach for service function chain routing and virtual function network instance migration in network function virtualization architectures,” IEEE/ACM Transactions on Networking, vol. 25, no. 4, pp. 2008–2025, 2017.
[9] L. Tang, X. He, P. Zhao, G. Zhao, Y. Zhou, and Q. Chen, “Virtual network function migration based on dynamic resource requirements prediction,” IEEE Access, vol. 7, pp. 112348–112362, 2019.
[10] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[11] J. Li, W. Shi, N. Zhang, and X. S. Shen, “Reinforcement learning based VNF scheduling with end-to-end delay guarantee,” in 2019 IEEE/CIC International Conference on Communications in China (ICCC). IEEE, 2019, pp. 572–577.
[12] H.-J. Ku, J.-h. Jung, and G.-I. Kwon, “A study on reinforcement learning based SFC path selection in SDN/NFV,” International Journal of Applied Engineering Research, vol. 12, no. 12, pp. 3439–3443, 2017.
[13] V. Eramo, A. Tosti, and E. Miucci, “Server resource dimensioning and routing of service function chain in NFV network architectures,” Journal of Electrical and Computer Engineering, vol. 2016, 2016.
[14] P. Vizarreta, M. Condoluci, C. M. Machuca, T. Mahmoodi, and W. Kellerer, “QoS-driven function placement reducing expenditures in NFV deployments,” in 2017 IEEE International Conference on Communications (ICC). IEEE, 2017, pp. 1–7.
[15] J. Hartmanis, “Computers and intractability: A guide to the theory of NP-completeness (Michael R. Garey and David S. Johnson),” SIAM Review, vol. 24, no. 1, p. 90, 1982.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.
[17] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.
[18] https://datamarket.com/data/list/?q=cat:ecd%20provider:tsdl.
[19] J. Xia, Z. Cai, and M. Xu, “Optimized virtual network functions migration for NFV,” in 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS). IEEE, 2016, pp. 340–346.