Model-Driven Artificial Intelligence for Online Network Optimization
Abstract
Future 5G wireless networks will rely on agile and automated network management, where the usage of diverse resources must be jointly optimized with surgical accuracy. A number of key wireless network functionalities (e.g., traffic steering, energy savings) give rise to hard optimization problems. What is more, high spatiotemporal traffic variability, coupled with the need to satisfy strict per slice/service SLAs in modern networks, suggests that these problems must be constantly (re)solved to maintain close-to-optimal performance. To this end, in this paper we propose the framework of Online Network Optimization (ONO), which seeks to maintain both agile and efficient control over time, using an arsenal of data-driven, adaptive, and AI-based techniques. Since the mathematical tools and the studied regimes vary widely among these methodologies, a theoretical comparison is often out of reach. Therefore, the important question “what is the right ONO technique?” remains open to date. In this paper, we discuss the pros and cons of each technique and further attempt a direct quantitative comparison for a specific use case, using real data. Our results suggest that carefully combining the insights of problem modeling with state-of-the-art AI techniques provides significant advantages at reasonable complexity.
I Introduction
I-A Modern Network Management and the Need for Online Optimization
The high complexity of the existing management methodologies for wireless networks, together with the need for rapid network reconfiguration to maintain good performance, led in the past to the concept of Self-Organized Networks (SON), which proposed to automate network management [1]. Although SONs have been in the research spotlight for several years, their practical deployment is not yet fully realized. Moreover, a number of originally envisioned SON techniques have been outpaced by recent developments in 5G wireless and cloud-based networks, such as softwarization (e.g., Software Defined Networks, SDNs), Network Function Virtualization (NFV), and slicing. This evolution has promoted Management and Orchestration (MANO) as a concept for redesigning network management altogether. With growing attention on Artificial Intelligence (AI)-based methodologies to improve and automate MANO, the topic of SON has been revitalized [2, 3]. For example, major telecom operators recently announced their intention to employ AI methodologies for SON features, predicting that the time needed for routine network management tasks will be halved [4]. Nevertheless, MANO and such AI techniques do not cover the transport network or RAN.
A fundamental goal of automated network management is to optimize the network. A number of key mobile network functionalities give rise to hard optimization problems in this context, including but not limited to: interference management, load balancing and traffic steering, energy saving, and more recently Service Level Agreement (SLA) enforcement for 5G network slices and services. While a number of works address such problems, the increasing spatiotemporal variability of network conditions gives rise to complex online optimization variants that must further satisfy a number of challenging requirements: (i) adaptability to changing network conditions, (ii) agility, to ensure adaptation occurs in a timely manner and with low network overhead, and (iii) efficiency, to ensure the network operating point remains as close as possible to the theoretical optimum, i.e., an oracle-derived ideal configuration for the exact current network conditions and traffic demand.
I-B Variability of Network Traffic
Telecommunication networks exhibit an inherent variability at many different levels and time scales. For example, optical fibers fail unexpectedly, the quality of wireless channels fluctuates rapidly due to multipath fading, etc. While network management traditionally monitors and reacts to potential network failures, traffic demand is emerging as a key source of spatiotemporal variability in modern wireless networks. This is partly due to increased network densification: each (small) cell serves far fewer data users, so the law-of-large-numbers effects that “smooth” traffic demand in traditional macrocells no longer apply. A second key reason is the growing use of smartphones and tablets as a user’s main Internet-access device, which makes traffic demand follow the ebbs and flows of content popularity.
These and other factors lead to frequent and often large fluctuations in traffic demand, over both time and space, affecting the majority of the aforementioned mobile functions and related optimization problems. Due to the inherent traffic variability, maintaining good performance in the above features requires continuously (re)solving the same complex optimization problem under different inputs (i.e., network and traffic conditions). Furthermore, since traffic surges might occur, the real performance of the system does not always follow the one predicted by the optimization model. Hence, Online Network Optimization (ONO) must satisfy two, often conflicting goals: (i) ensure close-to-optimal performance for the majority of predicted traffic conditions; (ii) ensure good (or at least stable) performance when the actual traffic conditions diverge from the predicted ones.
I-C Data Traffic Prediction and Adaptation Methods
There are many ways to capture the variability of traffic demand: in this study, we consider a generic model in which the traffic demand captures the number of service units requested from each location at each time instance (depending on the problem setup, this could be bits/sec, packets/sec, flows/sec, etc.). In order to achieve the above strict goals of ONO in a modern mobile network, it is important to understand these fluctuations and try to predict them. A number of traditional and more modern methods can be applied to this end, as we will detail subsequently.
Hence, one could apply a two-step approach to ONO: first, regularly predict the traffic demand in the next period(s) of interest with state-of-the-art methods, and then plug these estimates into the respective optimization model, which will provide the optimal configuration for the predicted traffic demand. While such a method is well understood, its success hinges on correctly modeling an increasingly complex network and finding the appropriate prediction method that best “fits” each problem. Furthermore, robustness to prediction inaccuracies requires keeping track of higher order statistics of demand variability and reformulating existing problems to hedge against prediction errors [10].
More recently, Artificial Intelligence (AI) methods, e.g., based on deep learning, have found significant success in solving complex pattern recognition and control problems. As a result, AI-based methods have attracted the interest of the 5G community, offering a radically different way of approaching standard as well as new mobile network functionalities. Two key factors drive this trend: first, the abundance of network data (known as “Big Data”) that can be collected by SDN-enabled network components at various network layers; second, the fact that AI methods are usually “model-free”. AI-based methods for ONO essentially bundle the above two steps, prediction and optimization, into one.
Nevertheless, due to the infancy of these methods for mobile networking problems, a number of important questions remain unanswered: (i) Which method performs better, and for which types of problems? (ii) How should such methods be best applied to mobile networking problems? (iii) What are the computational complexity and the related convergence properties? Since the mathematical tools and the studied regimes vary widely among these methodologies, a theoretical comparison is often out of reach. Therefore, the important question “how should online network optimization be applied in 5G and beyond networks?” remains open to date.
I-D Comparing ONO Techniques and the Case for Model-Driven AI
To obtain some initial answers to these questions, in this paper, we revisit the topic of AI for SON. Our main contributions can be summarized as follows:

We propose a generic framework for ONO that includes widely different techniques (including AI) to optimize networks in an online manner, predicting and adapting to traffic variability.

We highlight the difficulty of comparing online optimization techniques, and provide intuition into the strengths of each technique.

We use a specific problem, namely traffic steering for dense, heterogeneous networks, to perform a detailed quantitative comparison of various techniques. Our results suggest that traditional two-step approaches to ONO suffer from a tradeoff between robustness and performance; on the other hand, we show that AI techniques rapidly adapt to changing network conditions, ensuring high accuracy.

Finally, we demonstrate that direct application of AI techniques to mobile networking problems might not suffice as the complexity quickly explodes, requiring the tuning of a prohibitively large number of parameters. Instead, we show that carefully choosing the parameters based on an appropriate problem model, and then applying AI, can lead to both significant complexity reduction and better performance. This makes a strong case for Model-Driven Artificial Intelligence for future wireless network management.
II Online Network Optimization (ONO)
II-A Architecture & Platform
Online network optimization applies continuous network adaptation so as to optimize performance, as measured by various key performance indicators. In this paper, we consider ONO in future wireless networks and focus on the high-level network architecture shown in Figure 1. In such networks, traffic will be generated from / destined to a huge number of different mobile devices, including smartphones, Internet of Things and other industrial end-usage devices. We assume that applications as well as VNFs will run on generic hardware hosted in various data centers collectively referred to as the cloud. These data centers can be categorized as edge clouds (often small server installations distributed over a geographical area), and core clouds (large server installations typically serving a much larger area). A crucial architectural assumption for ONO is the real-time availability of network monitoring data and the remote controllability of the access network. Both assumptions are guaranteed today either by sophisticated network devices, or more simply by SDNs. In the architecture shown in Figure 1, the ONO controller rests at a central cloud location, though distributed architectures may be envisaged.
II-B Performance Metrics
Network performance is a very broad term and first needs to be quantified in order to be optimized. In addition, in ONO we are not just interested in optimizing a single localized metric at a given instance, but are rather concerned with the control of network-wide parameters over various time scales to continuously adapt to changing user demand and operating conditions. For instance, the achievable data rate in the downlink channel of a radio cell varies over time and from user to user.
Statistical Performance Metrics: While one might try to optimize the instantaneous data rate of every user at every time instance, network-wide optimization and management can only occur at a longer time scale, and thus statistical metrics are usually chosen. These include:

Time average: the average over time of the metric of interest, often also averaged over many connections, links, cells, etc. Popular variants include averaging over a time window or using a forgetting factor for old samples.

Variability: captured by variance, standard deviation, or higher order statistics.

Compliance probability: the probability that the metric exceeds / falls short of a given threshold.

Regret: a measure of accumulated “error” of the algorithm, compared to an ideal or oracle one (often not implementable in practice).

Fairness: a utility that takes into account the relative value of the metric of interest across different entities (e.g., users), often generalized by “alpha-fairness”; the latter parameterizes the tradeoff between fairness and optimizing the average metric over all users.
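The statistical metrics listed above can be made concrete with a minimal Python sketch. The function names, the default forgetting factor, and the choice of "at or below threshold" as the compliance event are illustrative assumptions, not definitions taken from this paper; the alpha-fair utility follows the standard textbook form for positive values.

```python
import math

def ewma(samples, forgetting=0.9):
    """Time average with a forgetting factor: older samples decay geometrically."""
    avg = samples[0]
    for x in samples[1:]:
        avg = forgetting * avg + (1.0 - forgetting) * x
    return avg

def compliance_probability(samples, threshold):
    """Empirical probability that the metric stays at or below the threshold."""
    return sum(1 for x in samples if x <= threshold) / len(samples)

def regret(losses, oracle_losses):
    """Accumulated loss of an algorithm minus that of an ideal (oracle) one."""
    return sum(losses) - sum(oracle_losses)

def alpha_fair_utility(values, alpha):
    """Alpha-fair utility of positive per-entity values.

    alpha = 0 is the plain sum, alpha = 1 is proportional fairness
    (sum of logarithms); larger alpha puts more weight on the worst-off entity.
    """
    if alpha == 1:
        return sum(math.log(v) for v in values)
    return sum(v ** (1.0 - alpha) / (1.0 - alpha) for v in values)
```

Which of these is appropriate depends on the time scale and on the SLA being enforced; the sketch only shows how each quantity is computed from a list of samples.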
However, more often than not, ONO is concerned with optimizing a number of heterogeneous and even conflicting metrics such as network operating cost and user Quality of Service, connection throughput and delay, etc. There are three common approaches to tackle such cases:

Combining the different metrics into one (e.g., as an appropriately weighted sum).

Applying multi-objective optimization and providing Pareto optimal solutions.

Optimizing for one metric (or a weighted sum of a subset of the metrics) and using the remaining metrics as constraints, e.g., optimizing throughput subject to a maximum allowable delay constraint.
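The three approaches above can be contrasted on a toy set of candidate configurations. In the sketch below, each configuration is a tuple of metrics that are all to be minimized; the helper names and the example numbers are hypothetical and serve only to illustrate the difference between scalarization, Pareto filtering, and constrained selection.

```python
def weighted_sum(metrics, weights):
    """Combine several metrics into one scalar score."""
    return sum(w * m for w, m in zip(weights, metrics))

def pareto_front(points):
    """Keep the points that no other point dominates (all metrics minimized)."""
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

def best_under_constraint(points, objective_idx, constraint_idx, limit):
    """Minimize one metric subject to an upper bound on another."""
    feasible = [p for p in points if p[constraint_idx] <= limit]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p[objective_idx])

# Hypothetical (operating cost, delay) pairs for four configurations.
configs = [(1.0, 9.0), (2.0, 5.0), (3.0, 6.0), (4.0, 2.0)]
scalar_choice = min(configs, key=lambda p: weighted_sum(p, (1.0, 1.0)))
front = pareto_front(configs)
constrained_choice = best_under_constraint(configs, 0, 1, limit=6.0)
```

On this toy input the three approaches select different configurations, which illustrates why algorithms tuned for different formulations are hard to compare directly.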
Algorithmic Metrics: One of the difficulties in comparing ONO algorithms developed independently is that they are often concerned with optimizing a different metric. In addition, when comparing different ONO algorithms, there are numerous criteria to consider, such as algorithmic complexity/scalability, training time, required training data size, sensitivity to noisy training data, response time, convergence time, convergence reliability and control cost, and communication overhead. A detailed discussion of some of the above criteria, from the point of view of machine learning algorithms, is provided in [3].
II-C ONO Use Case: Traffic Steering
As explained in the previous section, the prime source of network variability is the fluctuation of traffic load, which we capture by the traffic demand measure. Note that, if we observe a specific location, the demand becomes a time series with seasonal structure. Figure 1 shows the weekly traffic fluctuation in two different places in Milan as measured by Telecom Italia in 2014 [5]. We notice the 24-hour periodic “diurnal pattern”, a common characteristic of all networks attributed to daily human activity.
On the other hand, when fixing a time instance, the demand provides a spatial depiction of traffic in the network, as shown in Figure 1. However, when considering space and time jointly, a number of new characteristics appear. When the city population is moving from the outskirts to the city center during the morning rush hour, strong negative traffic correlations couple the traffic evolution of different locations. On the other hand, night and day times are positively correlated (i.e., most locations have low traffic in the night and high traffic in the day), and hence these correlations, if understood, can improve prediction accuracy.
For reasons of exposition, we turn our focus to the following example application of the ONO framework. From the point of view of a mobile network operator, we would like to use ONO techniques to decide how to steer user traffic to different Base Stations (BSs). Due to the spatiotemporal fluctuations of user demand, the incurred load at each BS also fluctuates, locally hurting user performance. The network operator should optimize the steering of mobile traffic to the available BSs in order to keep the loads balanced and improve the overall user experience.
Our scenario is based on the Telecom Italia dataset [5], and has the following ingredients:

The dataset determines the traffic demand in 10k locations in Milan, at intervals of ten minutes, over a duration of five weeks.

The control variables indicate what fraction of the traffic of each location is steered to each BS.

The load contribution to a BS from a location is determined by the traffic steered from that location to the BS and by the connection rate from the BS to the location. The total load of a BS is found by summing up the contributions from all locations.

At each time instance, the goal of our ONO is to minimize the average cost across BSs, where the cost at each BS is a function of its load. This objective leads to proportionally fair loads.
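To make the ingredients of this scenario tangible, the following Python sketch computes BS loads and an average cost on a toy instance. Both formulas are assumptions made for illustration: the load contribution is taken to be the steered demand divided by the connection rate (in the spirit of load models such as [13]), and the per-BS cost is a logarithmic barrier of the load. Neither the symbols nor the exact expressions are taken from this paper, and all names are hypothetical.

```python
import math

def bs_loads(demand, steering, rate):
    """Total load per BS.

    demand[x]      : service units requested at location x
    steering[x][b] : fraction of the traffic of location x steered to BS b
    rate[x][b]     : connection rate between BS b and location x
    Assumed contribution of x to b: demand[x] * steering[x][b] / rate[x][b].
    """
    n_bs = len(steering[0])
    loads = [0.0] * n_bs
    for x, d in enumerate(demand):
        for b in range(n_bs):
            loads[b] += d * steering[x][b] / rate[x][b]
    return loads

def average_cost(loads, cost=lambda rho: -math.log(1.0 - rho)):
    """Average per-BS cost; infinite if any BS is overloaded (load >= 1)."""
    if any(rho >= 1.0 for rho in loads):
        return math.inf
    return sum(cost(rho) for rho in loads) / len(loads)

# Toy instance: two locations, two BSs.
demand = [4.0, 6.0]
steering = [[1.0, 0.0], [0.5, 0.5]]
rate = [[10.0, 10.0], [10.0, 20.0]]
loads = bs_loads(demand, steering, rate)
cost_value = average_cost(loads)
```

The `cost` argument is left pluggable so that other convex load penalties can be substituted without changing the load computation.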
III ONO Techniques
III-A Online Learning
When data monitoring is complex, or training over the entire dataset is computationally infeasible, it is preferable to use techniques that ignore past data and directly adapt the decisions to new data patterns. This draws from online learning, a machine learning method used to answer a sequence of questions given data that becomes available in sequential order [6]. At each round of the sequence, the online learning algorithm answers a given question with a prediction chosen from a set of hypotheses. When the correct answer is revealed, the online learning algorithm computes the loss, which measures the discrepancy between its prediction and the correct answer (regret). The learner’s goal is to minimize the cumulative loss by adapting its answers to the new information received during the previous round. Intuitively, if there is no correlation between two consecutive rounds, this method fails.
A special case of interest is when the hypothesis set and the loss function are both convex [7]. A prominent algorithm in this case is online gradient descent, where at each iteration of the sequence we take a step in the direction opposite to the gradient of the previous loss [8]. In this work, we study Online Mirror Descent (OMD), which generalizes the aforementioned online gradient descent method. At a high level, in OMD a point in the primal space is mapped to the dual space through a function (called mirror map) to perform the gradient update, changing in this way the Euclidean geometry. This idea is motivated by the fact that using the correct way to measure the variation of a function can make an enormous difference in terms of learning rate. More interestingly, by picking a proper mirror map, the regret of OMD can be shown to grow only as the square root of the logarithm of the problem dimensionality, and hence be nearly independent of the dimensionality of the problem [6]. For our traffic steering example, we choose the entropic mirror map, which fits well the geometry of the corresponding optimization problem.
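For intuition, OMD with the entropic mirror map on the probability simplex reduces to the well-known multiplicative (exponentiated gradient) update. The Python sketch below shows that generic update on a toy linear loss; it is not the paper's traffic steering implementation, and the function names, step size, and loss vector are illustrative assumptions.

```python
import math

def omd_entropic_step(weights, gradient, step_size):
    """One OMD step with the entropic mirror map on the probability simplex.

    Each weight is multiplied by exp(-step_size * gradient) and the result is
    renormalized. Gradients are shifted by their minimum for numerical
    stability; the shift cancels out in the normalization.
    """
    g_min = min(gradient)
    unnormalized = [w * math.exp(-step_size * (g - g_min))
                    for w, g in zip(weights, gradient)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def run_omd(loss_gradients, dimension, step_size):
    """Start from the uniform point and apply one step per observed gradient."""
    weights = [1.0 / dimension] * dimension
    for gradient in loss_gradients:
        weights = omd_entropic_step(weights, gradient, step_size)
    return weights

# Toy example: the same linear loss is revealed in every round.
final = run_omd([[1.0, 0.5, 0.1]] * 50, dimension=3, step_size=0.5)
```

In a steering setting one would typically keep one such simplex per location, since the steering fractions of a location sum to one; that mapping is an interpretation and is not spelled out in the text.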
III-B Data-Driven Optimization of Forecasts
Forecasting techniques can provide an accurate prediction of traffic, which can be used to pre-calculate sophisticated ONO configurations. This provides a powerful methodology for the network operator to enhance resource efficiency and provide guaranteed service. Since the complicated optimization problem is solved offline, the response time of this approach is very short.
The diurnal patterns of traffic demand can be accurately predicted by advanced time series analysis techniques, such as the seasonal AutoRegressive Integrated Moving Average (ARIMA) model, which removes the periodicity from the signal. More recently, with the proliferation of neural networks, various techniques with increased efficiency have been developed, such as Recurrent Neural Nets (RNN), Hierarchical Temporal Memory (HTM), and Long Short-Term Memory (LSTM). LSTMs are particularly good at multivariate time-series analysis, and are therefore very useful when traffic exhibits spatial correlations on top of temporal correlations.
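As a much simpler baseline than seasonal ARIMA or an LSTM, a seasonal sample mean already exploits the diurnal pattern: each future slot is predicted as the average of the past samples observed at the same phase of the period. The sketch below is illustrative only; the function names and the toy series are assumptions.

```python
def seasonal_mean_forecast(series, period, horizon):
    """Forecast the next `horizon` slots as per-phase sample means.

    Requires at least one full period of history, so that every phase
    has at least one past sample.
    """
    n = len(series)
    forecasts = []
    for h in range(horizon):
        phase = (n + h) % period
        same_phase = series[phase::period]
        forecasts.append(sum(same_phase) / len(same_phase))
    return forecasts

def residuals(actual, predicted):
    """Prediction errors; these should show no remaining periodic pattern."""
    return [a - p for a, p in zip(actual, predicted)]

# Toy series with period 3 and a slowly increasing level.
history = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
forecast = seasonal_mean_forecast(history, period=3, horizon=3)
```

With ten-minute samples, a daily period would correspond to 144 slots and a weekly one to 1008 slots.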
The quality of the prediction is crucial for the actual playout of the pre-calculated configuration to satisfy the user-requested Quality of Service. If predictions by an oracle were available, the solution of a deterministic optimization problem would give the best configuration at any time instance. In reality, prediction is never perfect, and thus we need to take into consideration events that are unpredictable or not captured accurately by models or neural networks. After a good forecasting tool is applied to our data, we may assume that the prediction error can be described by an i.i.d. Gaussian process, i.e., that the actual traffic equals the predicted traffic plus a Gaussian error term. This motivates us to optimize for the actual traffic, which in our model is now a stochastic process. Such stochastic optimizations can be efficiently solved with Robust Data-Driven techniques, such as those in [9]. A high quality prediction will have a small error variance; later, we will provide some experiments to evaluate how this prediction affects the performance of ONO techniques.
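Under the Gaussian error assumption, one simple way to hedge against prediction errors is to plan for a high quantile of the demand instead of its predicted mean. The sketch below illustrates this quantile margin; it is a generic construction and not the robust data-driven method of [9] or [10], and the function name and numbers are hypothetical.

```python
from statistics import NormalDist

def robust_demand(predicted, error_std, violation_prob):
    """Demand level exceeded with probability at most `violation_prob`.

    Assumes actual = predicted + e, with e Gaussian, zero mean and
    standard deviation `error_std`.
    """
    z = NormalDist().inv_cdf(1.0 - violation_prob)
    return predicted + z * error_std

# Planning for a 0.1% violation probability with two predictor qualities.
good_predictor = robust_demand(100.0, error_std=2.0, violation_prob=0.001)
poor_predictor = robust_demand(100.0, error_std=10.0, violation_prob=0.001)
```

The margin grows linearly with the error standard deviation, which is one way to see why a better predictor lowers the cost paid for the same violation guarantee.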
A method for solving a prediction-driven ONO for traffic steering is described in [10]. There, the resulting stochastic optimization problem is configurable to protect an SLA on the average queue delay experienced in the BS queues, as well as to guarantee service with a given probability (a design parameter), while it optimizes for the expected load and prediction error variance by selecting the association decisions. The problem is shown to be convex and efficiently solvable with the proposed algorithm for multiple convex functions capturing a wide variety of network scope objectives.
III-C Artificial Intelligence
The availability of computational power and the abundance of data have caused a surge of interest in neural networks for AI-based optimization and decision making. By tuning the weights of a neural network appropriately (a process called training), we obtain a model which approximates well the optimal decision function: given an input, the output is the correct decision (e.g., classification) that corresponds to that input. In the context of network traffic control, the input is related to the past traffic demands and the output to the optimal control decision. In our example, the input is related to the traffic history, and the output is a prediction of the optimal steering decision for the next time slot. The neural network we use is based on LSTM. Such networks are equipped with a state which acts as memory and use gates to regulate the importance of previous vs. current data samples [14]. This architecture allows them to capture temporal short- and long-range dependencies in the data, making them ideal for our scenario.
Overall, the potential of AI in traffic control has been underexplored. One approach is to use Reinforcement Learning: start with no knowledge of the system and progressively train a deep neural network by observing the outcomes of the policies applied [12]. Recent work [11] has applied this idea to traffic routing. Here, instead, we propose to leverage the cyclostationary behavior of traffic and the fact that optimal solutions to past traffic instances can be computed offline to train the neural network in a supervised manner: first, in order to reduce the dimensionality, we set the output to the optimal base station loads (40 in our example), and use the predicted optimal loads to recover the optimal traffic steering (10000 vectors) through analytical models such as in [13]; then, each of the past samples is “labeled” with the optimal base station loads and used as the training set. The objective is to find the parameters that minimize a loss function, namely the difference between the predictions and the actual optimal loads in the training set. This is done by gradient descent and backpropagation. Once the training is done, the neural network takes the traffic history as input, finds the optimal loads, and finally provides the optimal steering. We emphasize that the reduction of the state space contributes significantly to the practicality of the whole approach. A lesson learnt is that model-driven artificial intelligence can prove greatly superior to blindly applying a deep neural network when it comes to an ONO scenario.
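The supervised labeling step described above can be sketched as follows: each window of past traffic is paired with the optimal BS loads for the slot that follows it, where the labels come from an offline solver. The code below only builds such a training set; the solver is replaced by a hypothetical placeholder (`equal_split`), and all names and the toy data are assumptions rather than the paper's implementation.

```python
def build_training_set(traffic_history, window, optimal_loads_fn):
    """Pair each window of past demand vectors with the next slot's label.

    traffic_history  : list of per-slot demand vectors (one entry per location)
    window           : number of past slots used as network input
    optimal_loads_fn : offline solver returning the optimal per-BS loads for
                       a given demand vector (a placeholder in this sketch)
    """
    samples = []
    for t in range(window, len(traffic_history)):
        features = traffic_history[t - window:t]
        label = optimal_loads_fn(traffic_history[t])
        samples.append((features, label))
    return samples

def equal_split(demand, n_bs=2):
    """Hypothetical stand-in for the offline optimizer: split load equally."""
    return [sum(demand) / n_bs] * n_bs

history = [[1.0, 3.0], [2.0, 2.0], [4.0, 0.0], [3.0, 5.0], [6.0, 2.0]]
training_set = build_training_set(history, window=2, optimal_loads_fn=equal_split)
```

Note that the label dimension equals the number of BSs rather than the number of locations, which is the dimensionality reduction the paragraph above refers to.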
IV Performance Evaluation
IV-A Simulator setup
We have built a time-slotted simulator in Python, driven by real data sets, with the long-term vision of providing a tool that allows different ONO techniques to be compared easily. In its current version, our simulator implements different approaches for ONO applied to the problem of traffic steering. In this section we present performance evaluation results obtained by using our simulator on the Milan dataset and associated network topology. The simulator is composed of three main modules (Figure 2):

Prediction module. We perform prediction for (i) traffic and (ii) base station loads. On the one hand, (i) is based on the sample mean, seasonal ARIMA, or LSTM; the predictors were combined with the robust methodology described above, and the impact of their quality is shown in Figure 4. On the other hand, (ii) is based on an RNN and instead aims to predict the optimal load for the next time slot. Note that the online learning method does not require prediction, as it adapts its decisions based on the current incoming data.

Device-BS association. Each policy computes the device-BS association matrix per location. In a realistic scenario, the procedure of gathering and processing the traffic statistics along with the computation of a new user association matrix might require several minutes. While the data granularity is ten minutes, the device-BS matrix is updated every hour to account for the aforementioned potential overhead, i.e., once the matrix is computed, it remains unchanged for six consecutive time slots.

Metric computation. The metrics computed by the simulator are the average delay per BS and the rejected traffic at each time slot. We choose these metrics because they represent the tradeoff between cost reduction (in terms of average delay) and constraint violations (in terms of rejected traffic). We expect that an aggressive policy which tries to minimize the cost would incur a larger number of constraint violations than a more conservative strategy.
For comparison purposes, we also compute the optimal association map based on the perfect knowledge of the future traffic. We refer to the outcomes of this map as oracle.
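The interplay of the three modules can be summarized by a minimal time-slotted loop in which the association is recomputed once per hour (every six ten-minute slots) from past data only, and metrics are evaluated in every slot. This is a structural sketch, not the authors' simulator; `policy` and `evaluate` are hypothetical callables supplied by the user.

```python
def run_simulation(traffic, policy, evaluate, slots_per_update=6):
    """Time-slotted loop with a periodically refreshed association.

    traffic  : list of per-slot demand observations
    policy   : maps the demand seen so far to an association decision
    evaluate : maps (association, actual demand) to the metrics of one slot
    """
    results = []
    association = None
    for t, demand in enumerate(traffic):
        if t % slots_per_update == 0:
            # Only slots strictly before t are visible to the policy.
            association = policy(traffic[:t])
        results.append(evaluate(association, demand))
    return results

# Toy run: the "association" is just the number of slots seen when computed.
toy_traffic = list(range(13))
log = run_simulation(toy_traffic,
                     policy=lambda past: len(past),
                     evaluate=lambda assoc, demand: (assoc, demand))
```

An oracle baseline would differ only in that its policy is also allowed to see the demand of the upcoming slots.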
IV-B Takeaway messages from simulations
In Figure 3 we depict the average delay per BS over one week. When at least one BS is overloaded, the delay becomes infinite, and the incoming traffic associated with such BSs is rejected. According to our dataset, Monday and Tuesday have much more traffic than the rest of the week (we conjecture that this is due to the imminent Sant’Ambrogio holiday in Milan). At a high level, robust optimization introduces a “safety” gap which prevents BSs from being overloaded at the cost of a larger average delay. On the other hand, we note that adaptive learning leads to constraint violations when traffic is high. We elaborate more on this in the following paragraphs.
1) Robust optimization achieves a specific SLA. As already explained, stochastic optimization can be efficiently combined with robust data-driven methods. While in general this technique increases the cost (e.g., delay), making an ONO method robust reduces constraint violations, thus better conforming to the SLA. In the simulated scenario, we require BSs to be overloaded less than 0.1% of the time, and this requirement is indeed satisfied (see Table I).
2) Online learning adapts quickly to a good state, though it sometimes fails to capture a rapid increase of traffic. Unlike data-driven optimization, OMD does not use data to optimize its objective or to “protect” its decision from traffic fluctuations. Therefore, a steep traffic increase might lead to a large amount of rejected traffic. This is confirmed by our experiments (see Figure 3): on Monday and Tuesday OMD suffers from several load constraint violations, since traffic changes radically from nighttime to morning. In contrast, steep traffic increases can be handled more easily when the average cumulative traffic is lower: during the last three days, OMD achieves delays close to the oracle with few or no constraint violations (see also Table I).
3) Data prediction reduces the cost. In Figure 4 we compare different traffic prediction methods in terms of the actual playout of the pre-calculated ONO configurations obtained with the robust data-driven algorithm in [10]. The outputs are the average cost (in blue), the average cost during peak hours, i.e., between 11:00 and 13:00 (in light blue), and the mean squared error (MSE) compared to the oracle (in red). We observe that, by increasing the efficiency of the prediction scheme, we achieve significantly lower cost with approximately the same guarantee on violations. It is important to verify that the residuals contain no predictable (periodic or trend) component, which is crucial for protection against SLA failures.
4) Model-driven AI fuses the best of adaptive learning and data-driven optimization. The RNN used in the simulations is able to predict the optimal load per BS in an extremely accurate way. Although the mapping between such predicted optimal loads and the device-BS association matrix is not trivial and can be improved, simulations still provide impressive results: as we can see from Table I, AI achieves the lowest average delay across BSs with limited rejected traffic. This is due to the fact that AI is a data-driven method which adapts its decisions according to new data, as in the online learning approach (new data are not used to retrain the neural network). In other words, it naturally combines the best features of adaptive learning and data-driven optimization.
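Table I: Average delay, MSE, and rejected traffic per method and day.

Metric    | Method | Mon   | Tue   | Wed   | Thu  | Fri  | Avg
Avg delay | OMD    | 11.62 | 11.50 | 9.73  | 8.79 | 8.52 | 10.03
Avg delay | Robust | 10.95 | 11.90 | 10.01 | 9.50 | 9.41 | 10.35
Avg delay | AI     | 9.90  | 10.09 | 9.13  | 9.07 | 8.43 | 9.32
Avg delay | Oracle | 9.57  | 9.81  | 8.86  | 8.54 | 8.35 | 9.02
MSE       | OMD    | 8.63  | 21.66 | 9.29  | 0.18 | 0.04 | 7.96
MSE       | Robust | 2.30  | 17.23 | 1.61  | 0.95 | 1.21 | 4.66
MSE       | AI     | 0.49  | 0.95  | 0.25  | 2.63 | 0.02 | 0.87
Rejected  | OMD    | 1.10  | 0.24  | 0.00  | 0.00 | 0.00 | 0.27
Rejected  | Robust | 0.00  | 0.06  | 0.00  | 0.00 | 0.00 | 0.01
Rejected  | AI     | 0.00  | 0.26  | 0.00  | 0.00 | 0.00 | 0.05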
IV-C Which method to choose?
Although the simulation results largely lean in favor of model-driven AI, the answer is not trivial. In our experiments, AI provides the best tradeoff between robustness and performance. However, its outcomes might be unexpected if the underlying neural network is not properly calibrated: since AI could require a huge amount of resources (i.e., memory, CPU) depending on the dimensionality of the problem, it is important to combine this technique with modeling intuition and an appropriate definition of the problem state space. On the other hand, adaptive machine learning algorithms are very simple and quick, but sometimes fail to protect the system from rapid traffic surges. From a practical perspective, we believe that a fusion of these techniques is required. For instance, AI can be employed to provide a good robust prediction of the optimal action, and a number of OMD iterations can then be employed to smooth the final decision. In fact, in our setting this fusion is extremely effective.
V Conclusions
In this paper, we have stressed the need for automating online network optimization, and have proposed a framework, ONO, that can combine online, data-driven, and model-based AI techniques to achieve a number of optimization goals including adaptability, robustness, scalability, and others. We have performed a direct comparison of these techniques for the challenging problem of traffic steering and load balancing in dense, heterogeneous networks. Our study demonstrated how the inherent variability of network traffic can be successfully addressed by self-optimizing methodologies. The most prominent of them is based on artificial intelligence, but also makes use of deep modeling insights of the problem. Therefore, a key take-home message is that the knowledge of network models can enhance the efficiency of AI techniques.
References
 [1] C. Brunner and D. Flore, “Generation of Pathloss and Interference Maps as SON Enabler in Deployed UMTS Networks,” IEEE VTC, 2009.
 [2] S. Latif et al., “Artificial Intelligence as an Enabler for Cognitive Self-Organizing Future Networks,” Feb. 2017, arXiv:1702.02823.
 [3] P. V. Klaine et al., “A Survey of Machine Learning Techniques Applied to Self-Organizing Cellular Networks,” IEEE Communications Surveys & Tutorials, vol. 19, no. 4, 2017, pp. 2392–2431.
 [4] S. Tenorio, “AI enabled augmented engineering increases network optimisation speed by over 45%,” Vodafone press release, Sept. 2017.
 [5] The 1st edition of Big Data Challenge, https://dandelion.eu/datamine/openbigdata, Telecom Italia, 2014.
 [6] S. Shalev-Shwartz, “Online Learning and Online Convex Optimization,” Foundations and Trends in Machine Learning, vol. 4, no. 2, 2012, pp. 107–194.
 [7] E. Hazan, “Introduction to Online Convex Optimization,” Foundations and Trends in Optimization, vol. 2, no. 3–4, 2016, pp. 157–325.
 [8] M. Zinkevich, “Online Convex Programming and Generalized Infinitesimal Gradient Ascent,” ICML, 2003.
 [9] D. Bertsimas, V. Gupta, and N. Kallus, “Data-Driven Robust Optimization,” Mathematical Programming, vol. 167, no. 2, 2018, pp. 235–292.
 [10] N. Liakopoulos, G. Paschos, and T. Spyropoulos, “Robust User Association for Ultra Dense Networks,” IEEE INFOCOM, 2018.
 [11] Z. Xu et al., “Experience-Driven Networking: A Deep Reinforcement Learning Approach,” IEEE INFOCOM, 2018.
 [12] H. Mao et al., “Resource Management with Deep Reinforcement Learning,” HotNets, 2016.
 [13] H. Kim et al., “Optimal User Association and Cell Load Balancing in Wireless Networks,” IEEE INFOCOM, 2010.
 [14] F. A. Gers, J. Schmidhuber and F. Cummins, “Learning to Forget: Continual Prediction with LSTM,” Neural Computation, vol. 12, no. 10, 2000, pp. 2451–2471.