Application of Machine Learning in Wireless Networks: Key Techniques and Open Issues
As a key technique for enabling artificial intelligence, machine learning (ML) has been shown to be capable of solving complex problems without explicit programming. Motivated by its successful applications to many practical tasks like image recognition and recommendation systems, both industry and the research community have advocated the applications of ML in wireless communication. This paper comprehensively surveys the recent advances of the applications of ML in wireless communication, which are classified as: resource management in the MAC layer, networking and mobility management in the network layer, and localization in the application layer. The applications in resource management further include power control, spectrum management, backhaul management, cache management, beamformer design, and computation resource management, while ML-based networking focuses on the applications in base station (BS) clustering, BS switching control, user association, and routing. Each aspect is further categorized according to the adopted ML techniques. Additionally, given the extensiveness of the research area, challenges and unresolved issues are presented to facilitate future studies, where the topics of ML-based network slicing, infrastructure update to support ML-based paradigms, open data sets and platforms for researchers, theoretical guidance for ML implementation, and so on are discussed.
I Introduction
Since the rollout of the first generation wireless communication system, wireless technology has been continuously evolving from supporting basic coverage to satisfying more advanced needs. In particular, the fifth generation (5G) mobile communication system is expected to achieve a considerable increase in data rates, coverage, and the number of connected devices, while significantly reducing latency as well as energy consumption. Moreover, 5G is also expected to provide more accurate localization, especially in indoor environments.
These goals can potentially be met by enhancing several aspects of the system. For example, computing and caching resources can be deployed at the network edge to fulfill the demands for low latency and reduce energy consumption [3, 4], and the cloud computing-based baseband unit pool can provide high data rates with the use of large-scale collaborative signal processing among base stations (BSs) and can save much energy via statistical multiplexing. Furthermore, the co-existence of heterogeneous nodes, including macro BSs (MBSs), small base stations (SBSs), and user equipments (UEs) with device-to-device (D2D) capability, can boost the throughput and simultaneously guarantee seamless coverage. However, the involvement of computing resources, cache resources, and heterogeneous nodes cannot alone satisfy the stringent requirements of 5G; high-performance algorithmic design for resource management, networking, mobility management, and localization is essential as well. Faced with the characteristics of 5G, current resource management, networking, mobility management, and localization algorithms expose several limitations, which are explained below.
First, with the proliferation of smart phones, the expansion of the network scale, and the diversification of services in the 5G era, the amount of data related to applications, users, and networks will experience explosive growth, and such data can contribute to enhanced system performance if properly utilized. However, many of the existing algorithms are incapable of processing and/or utilizing the data, which means much valuable information or many patterns hidden in the data are wasted. Second, to adapt to the dynamic network environment, algorithms like radio resource management algorithms are often fast but heuristic in practice, which means the resulting system performance can be far from optimal, and, thus, these algorithms can hardly meet the stringent performance requirements of 5G. In academic studies, research has been done primarily based on numerical optimization to develop more effective algorithms that reach optimal or suboptimal solutions. However, many studies assume a static network environment. Furthermore, considering that 5G networks will be more complex, hence leading to more complex mathematical formulations, the developed algorithms can possess high complexity. Thus, these algorithms will be inapplicable in a real dynamic network due to their long decision-making time. Third, the development of algorithms like radio resource allocation algorithms in the literature is often based on a well-defined mathematical problem, whose formulation requires some ideal assumptions to make the problem more tractable. Nevertheless, the ideal assumptions can cause a significant performance loss when algorithms are applied in practical systems. Moreover, given the large number of nodes in future 5G networks, traditional centralized algorithms for network management can be infeasible due to the high computing burden and the high cost of collecting information. Therefore, it is preferred to enable network nodes to autonomously make decisions based on local observations.
As an important enabling technology for artificial intelligence, machine learning has been successfully applied in many areas, including computer vision, medical diagnosis, search engines, and speech recognition. Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. Machine learning techniques can be generally classified as supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the aim of the learning agent is to learn a general rule mapping inputs to outputs from example inputs and their desired outputs, which constitute the labeled data set. In unsupervised learning, no labeled data is available, and the agent tries to find some structure in its input. In reinforcement learning, the agent continuously interacts with an environment and tries to generate a good policy according to the immediate reward/cost fed back by the environment. In recent years, the development of fast and massively parallel graphics processing units and the significant growth of data have contributed to the progress in deep learning, which can achieve more powerful representation capabilities. Machine learning has the following advantages that help overcome the drawbacks of traditional resource management, networking, mobility management, and localization algorithms.
The first advantage is that machine learning has the ability to learn useful information from input data, which can help improve network performance. For example, convolutional neural networks and recurrent neural networks can extract spatial features and sequential features from the observed time-varying Received Signal Strength Indicator (RSSI), which can mitigate the ping-pong effects in mobility management, and more accurate indoor localization for three-dimensional space can be achieved by using an auto-encoder to extract robust fingerprint patterns from noisy RSSI measurements. Second, machine learning based resource management, networking, and mobility management algorithms can adapt well to the dynamic environment. For instance, by using the deep neural network, which is proven to be a universal function approximator, traditional high complexity algorithms can be closely approximated, and similar performance can be achieved but with much lower complexity, which makes it possible to quickly respond to environmental changes. In addition, the dynamic environment can also be handled by reinforcement learning, which can achieve fast network control based on learned policies with the goal of optimizing long-term system performance. Third, since model-free reinforcement learning only needs reward feedback during the learning process, it does not require a mathematical problem formulation in closed form. Furthermore, distributed reinforcement learning can help realize the self-optimization of distributed nodes. Finally, by involving transfer learning, machine learning has the ability to quickly solve a new problem that is similar to a problem that has already been solved. It is known that there exist some temporal and spatial correlations in wireless systems, such as those between the traffic loads of neighboring regions. Hence, it is possible to transfer the knowledge acquired in one task to another relevant task, which can speed up the learning process for the new task. However, in traditional algorithm design, such knowledge is often not utilized.
Driven by the recent development of the applications of machine learning in wireless networks, some efforts have been made to survey related research and provide useful guidelines. In , the basics of some machine learning algorithms along with applications in future wireless networks are introduced, such as Q-learning for the resource allocation and interference coordination in downlink femtocell networks and Bayesian learning for channel parameter estimation in a massive multiple-input multiple-output network. In , the applications of machine learning in wireless sensor networks (WSNs) are discussed, and the advantages and disadvantages of each algorithm are evaluated with guidance provided for WSN designers. In , different learning techniques that are suitable for the Internet of Things (IoT) are presented, taking into account the unique characteristics of IoT including resource constraints and strict quality-of-service requirements, and studies on learning related to IoT are also reviewed in . The applications of machine learning in cognitive radio (CR) environments are investigated in  and . Specifically, the author in  classifies those applications into decision-making tasks and classification tasks, while the author in  mainly concentrates on model-free strategic learning. Moreover, the authors in  and  focus on the potential of machine learning in enabling the self-organization of cellular networks from the perspectives of self-configuration, self-healing, and self-optimization. To achieve high energy efficiency in wireless networks, related promising approaches based on big data are investigated in . In , a comprehensive tutorial on the applications of neural networks (NNs) is provided, which presents the basic architectures and training procedures of different types of NNs, and several typical application scenarios are identified. In  and , the applications of deep learning in the physical layer are summarized. Specifically, the authors in  see the whole communication system as an auto-encoder, whose task is to learn compressed representations of user messages that are robust to channel impairments. A similar idea is presented in , where a broader area is surveyed including modulation recognition, channel decoding, and signal detection. Also focusing on deep learning, the surveys in [25, 26, 27] pay more attention to upper layers, and the surveyed applications include channel resource allocation, routing, scheduling, and so on.
Although significant progress has been achieved toward surveying the applications of machine learning in wireless networks, there still exist some limitations in current works. More concretely, the surveys in [13, 17, 19, 20, 21] seldom address deep learning, deep reinforcement learning, or transfer learning, and the content in [23, 24] focuses only on the physical layer. In addition, only a specific network scenario is covered in [14, 15, 16, 17, 18], while the surveys in [22, 25, 26, 27] consider only NNs or deep learning. Considering these shortcomings and the ongoing research activities, a more comprehensive survey framework that incorporates the recent achievements in the applications of diverse machine learning methods in different layers and network scenarios seems timely and significant. Specifically, this paper surveys state-of-the-art applications of various machine learning approaches from the MAC layer up to the application layer, covering resource management, networking, mobility management, and localization. The principles of different learning techniques are introduced, and useful guidelines are provided based on the lessons learned in surveyed works. In addition, to facilitate future applications of machine learning, challenges and open issues are identified. Overall, this survey aims to fill the gaps found in the previous papers [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27] as stated above, and the contributions of this paper are threefold:
Popular machine learning techniques utilized in wireless networks are comprehensively summarized including their basic principles and general applications, which are classified into supervised learning, unsupervised learning, reinforcement learning, (deep) NNs, and transfer learning. Note that (deep) NNs and transfer learning are separately highlighted because of their increasing importance to wireless communication systems.
A comprehensive survey of the literature applying machine learning to resource management, networking, mobility management, and localization is presented, covering all the layers except the physical layer, which has already been thoroughly investigated. Specifically, the applications in resource management are further divided into power control, spectrum management, beamformer design, backhaul management, cache management, and computation resource management, and the applications in networking are divided into user association, BS switching control, routing, and clustering. Moreover, surveyed papers in each application area are further organized by their adopted machine learning techniques, and the majority of the network scenarios that will emerge in the 5G era are included, such as vehicular networks, small cell networks, and cloud radio access networks.
The future challenges and unsolved issues related to the applications of machine learning in wireless networks are identified with regard to machine learning based network slicing, standard data sets for research, theoretical guidance for implementation, and so on.
The remainder of this paper is organized as follows: Section II introduces the popular machine learning techniques utilized in wireless networks, together with a summary of their applications. The applications of machine learning in resource management are summarized in Section III. In Section IV, the applications of machine learning in networking are surveyed. Sections V and VI summarize recent advances in machine learning based mobility management and localization, respectively. The current challenges and open issues are presented in Section VII, followed by the conclusion in Section VIII. For convenience, all abbreviations are listed in Table I.
|3GPP||3rd generation partnership project||OFDMA||orthogonal frequency division multiple access|
|5G||5th generation||OISVM||online independent support vector machine|
|AP||access point||OPEX||operating expense|
|BBU||baseband unit||OR||outage ratio|
|BS||base station||ORLA||online reinforcement learning approach|
|CBR||call blocking ratio||OSPF||open shortest path first|
|CDR||call dropping ratio||PBS||pico base station|
|CF||collaborative filtering||PCA||principal components analysis|
|CNN||convolutional neural network||PU||primary user|
|CR||cognitive radio||QoE||quality of experience|
|CRE||cell range expansion||QoS||quality of service|
|CRN||cognitive radio network||RAN||radio access network|
|C-RAN||cloud radio access network||RB||resource block|
|CSI||channel state information||ReLU||rectified linear unit|
|DRL||deep reinforcement learning||RL||reinforcement learning|
|DNN||dense neural network||RNN||recurrent neural network|
|EE||energy efficiency||RP||reference points|
|ELM||extreme learning machine||RRH||remote radio head|
|ESN||echo state network||RRM||radio resource management|
|FBS||femto base station||RSRP||reference signal received power|
|FLC||fuzzy logic controller||RSRQ||reference signal received quality|
|FS-KNN||feature scaling based k nearest neighbors||RSS||received signal strength|
|GD||gradient descent||RSSI||received signal strength indicator|
|GPS||global positioning system||RVM||relevance vector machine|
|GPU||graphics processing unit||SBS||small base station|
|Hys||hysteresis margin||SE||spectral efficiency|
|IA||interference alignment||SINR||signal-to-interference-plus-noise ratio|
|ICIC||inter-cell interference coordination||SNR||signal-to-noise ratio|
|IOPSS||inter operator proximal spectrum sharing||SOM||self-organizing map|
|IoT||internet of things||SON||self-organizing network|
|KNN||k nearest neighbors||SU||secondary user|
|KPCA||kernel principal components analysis||SVM||support vector machine|
|KPI||key performance indicators||TDMA||time division multiple access|
|LOS||line-of-sight||TDOA||time difference of arrival|
|LSTM||long short-term memory||TL||transfer learning|
|LTE||long term evolution||TOA||time of arrival|
|LTE-U||long term evolution-unlicensed||TTT||time-to-trigger|
|MAB||multi-armed bandit||TXP||transmit power|
|MAC||medium access control||UE||user equipment|
|MBS||macro base station||UAV||unmanned aerial vehicle|
|MEC||mobile edge computing||UDN||ultra-dense network|
|ML||machine learning||UWB||ultra-wide bandwidth|
|MUE||macrocell user equipment||WLAN||wireless local area network|
|NE||Nash equilibrium||WMMSE||weighted minimum mean square error|
|NLOS||non-line-of-sight||WSN||wireless sensor network|
II Machine Learning Preliminaries
In this section, various machine learning techniques employed in the studies surveyed in this paper are introduced, including supervised learning, unsupervised learning, reinforcement learning, (deep) NNs, and transfer learning.
II-A Supervised Learning
Supervised learning is a machine learning task that aims to learn a mapping function from the input to the output, given a labeled data set. Specifically, supervised learning can be further divided into regression and classification based on the continuity of the output. In surveyed works, the following supervised learning techniques are commonly adopted.
II-A1 Support Vector Machine
The basic support vector machine (SVM) model is a linear classifier, which aims to separate data points, each represented as a fixed-dimensional vector, using a hyperplane. The best hyperplane is the one that leads to the largest separation, or margin, between the two given classes, and is called the maximum-margin hyperplane. However, the data set is often not linearly separable in the original space. In this case, the original space can be mapped to a much higher dimensional space by involving kernel functions, such as polynomial or Gaussian kernels, which results in a non-linear classifier. More details about SVM can be found in .
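For reference, the maximum-margin idea described above is commonly written as the following optimization problem. This is the standard textbook hard-margin form, not a formulation taken from the surveyed works; the symbols $\mathbf{w}$, $b$, $\mathbf{x}_i$, $y_i \in \{-1, +1\}$, $N$, and $\sigma$ are notation assumed here for illustration.

```latex
\min_{\mathbf{w},\, b} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1, \qquad i = 1, \dots, N,
% the resulting margin between the two classes is 2 / \lVert \mathbf{w} \rVert

% Gaussian kernel that replaces the inner product in the non-linear case:
k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}}{2\sigma^{2}}\right)
```

Minimizing $\lVert \mathbf{w} \rVert$ under these constraints is equivalent to maximizing the margin, which is why the solution is the maximum-margin hyperplane.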
II-A2 K Nearest Neighbors
K Nearest Neighbors (KNN) is a non-parametric lazy learning algorithm for classification and regression, where no assumption on the data distribution is needed. The basic principle of KNN is to decide the class of a test point, given in the form of a feature vector, based on the majority voting of its K nearest neighbors. Considering that training data belonging to very frequent classes can dominate the prediction of test data, a weighted method can be adopted by assigning each neighbor a weight that is proportional to the inverse of its distance to the test point. In addition, one of the keys to applying KNN is the choice of the parameter K, and guidance on selecting its value can be found in .
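The distance-weighted voting described above can be sketched as follows. This is a minimal illustration under assumed toy data; the function name `weighted_knn_predict`, the small constant `eps`, and the sample points are hypothetical and not taken from any surveyed work.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train_points, train_labels, query, k=3, eps=1e-9):
    """Classify `query` by inverse-distance-weighted voting of its k nearest neighbors."""
    # Euclidean distance from the query to every training point.
    dists = [(math.dist(p, query), label)
             for p, label in zip(train_points, train_labels)]
    dists.sort(key=lambda pair: pair[0])
    votes = defaultdict(float)
    for dist, label in dists[:k]:
        # eps (an assumption) avoids division by zero for exact matches.
        votes[label] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)

# Toy data (assumed): class "A" is more frequent than class "B".
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
train_labels = ["A", "A", "A", "B", "B"]

# With k=5 an unweighted majority vote would pick "A" (3 votes against 2),
# whereas the inverse-distance weights favor the nearby "B" points.
print(weighted_knn_predict(train_points, train_labels, (4.5, 5.0), k=5))
```

The example is chosen so that the frequent class would win an unweighted vote, which is the situation the weighting is meant to correct.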
II-B Unsupervised Learning
Unsupervised learning is a machine learning task that aims to learn a function to describe a hidden structure from unlabeled data. In surveyed works, the following unsupervised learning techniques are utilized.
II-B1 K-Means Clustering Algorithm
In K-means clustering, the aim is to partition the data points into K clusters, where each data point belongs to the cluster with the nearest mean. The most common version of the K-means algorithm is based on iterative refinement. At the beginning, K means are randomly initialized. Then, in each iteration, each data point is assigned to exactly one cluster, namely the one whose mean has the least Euclidean distance to the data point, and the mean of each cluster is updated. The algorithm iterates until the members of each cluster no longer change. The basic principle is illustrated in Fig. 1.
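The iterative refinement described above can be sketched in a few lines. This is a minimal illustration, assuming the initial means are drawn at random from the data points themselves (one common choice, not stated in the text); the function name `kmeans` and its parameters are hypothetical.

```python
import math
import random

def kmeans(points, k, seed=0, max_iters=100):
    """Partition `points` (tuples of floats) into k clusters by iterative refinement."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumed initialization: k distinct data points
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_assignment = [min(range(k), key=lambda j: math.dist(p, means[j]))
                          for p in points]
        if new_assignment == assignment:
            break  # cluster memberships no longer change
        assignment = new_assignment
        # Update step: recompute each mean from its current members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old mean if a cluster becomes empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignment

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, assignment = kmeans(points, k=2)
print(sorted(means), assignment)
```

The stopping rule mirrors the text: the loop ends once an iteration leaves every point in its previous cluster.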
II-B2 Principal Component Analysis
Principal component analysis (PCA) is a dimension-reduction tool that can be used to reduce a large set of variables to a small set that still contains most of the information in the large set. It is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system. More details on PCA can be found in .
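As a brief illustration, the orthogonal transformation mentioned above is usually obtained from the eigendecomposition of the sample covariance matrix. The form below is the standard textbook one; the symbols $\mathbf{x}_i$, $\bar{\mathbf{x}}$, $\mathbf{C}$, $\mathbf{v}_k$, $\lambda_k$, $N$, and $d$ are notation assumed here.

```latex
\mathbf{C} = \frac{1}{N} \sum_{i=1}^{N}
  \left(\mathbf{x}_i - \bar{\mathbf{x}}\right)\left(\mathbf{x}_i - \bar{\mathbf{x}}\right)^{\top},
\qquad
\mathbf{C}\,\mathbf{v}_k = \lambda_k \mathbf{v}_k,
\quad \lambda_1 \ge \lambda_2 \ge \dots

% projection onto the d leading principal components
\mathbf{z}_i = \left[\mathbf{v}_1, \dots, \mathbf{v}_d\right]^{\top}
  \left(\mathbf{x}_i - \bar{\mathbf{x}}\right)
```

Keeping the $d$ eigenvectors with the largest eigenvalues retains the directions of greatest variance, which is the sense in which the reduced set still contains most of the information.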
II-C Reinforcement Learning
In reinforcement learning (RL), the agent aims to optimize a long term objective by interacting with the environment based on a trial and error process. Specifically, the following reinforcement learning algorithms are applied in surveyed studies.
II-C1 Q-Learning
One of the most commonly adopted reinforcement learning algorithms is Q-learning. Specifically, the RL agent interacts with the environment to learn the Q values, based on which the agent takes an action. The Q value is defined as the discounted accumulative reward starting from a state-action pair. Once the Q values are learned after a sufficient amount of time, the agent can make a quick decision under the current state by taking the action with the largest Q value. More details about Q-learning can be found in . In addition, to handle continuous state or action spaces, fuzzy Q-learning can be adopted .
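To make the learning loop concrete, here is a minimal tabular Q-learning sketch on a toy chain environment. Everything about the environment (five states in a row, two actions, a reward of 1 on reaching the last state) and all parameter values are assumptions chosen for illustration; only the update rule itself is the standard Q-learning rule.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain: action 0 moves left, action 1 moves right."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1  # terminal state; its Q values stay at zero
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])       # exploit, breaking ties at random
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning_chain()
greedy_policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(4)]
print(greedy_policy)  # greedy action per non-terminal state
```

Once the table has been learned, decision making reduces to a table lookup per state, which is the "quick decision" property noted above.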
II-C2 Multi-Armed Bandit Learning
In a multi-armed bandit (MAB) model with a single agent, the agent sequentially takes an action and then receives a random reward generated by a corresponding distribution, aiming at maximizing an aggregate reward. In this model, there exists a tradeoff between taking the currently best action (exploitation) and gathering information to achieve a larger reward in the future (exploration). In the MAB model with multiple agents, by contrast, the reward an agent receives after playing an action depends not only on this action but also on the other agents taking the same action. In this case, the model is expected to achieve some steady states or equilibrium . More details about MAB can be found in .
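The exploration-exploitation tradeoff in the single-agent case can be illustrated with an epsilon-greedy strategy, one simple MAB policy among many (the text does not prescribe a specific one). The Bernoulli reward model, the arm probabilities, and the function name below are assumptions for illustration.

```python
import random

def epsilon_greedy_bandit(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with epsilon-greedy arm selection."""
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms       # number of pulls per arm
    estimates = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # exploration
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploitation
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

estimates, counts, total_reward = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(estimates, counts, total_reward)
```

A larger `epsilon` gathers information faster but spends more pulls on arms already known to be poor.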
II-C3 Actor-Critic Learning
The actor-critic learning algorithm is composed of an actor, a critic, and an environment with which the actor interacts. In this algorithm, the actor first selects an action according to the current strategy and receives an immediate cost. Then, the critic updates the state value function based on a temporal difference error, and next, the actor updates the policy such that actions with a smaller cost are preferred in the strategy. When each action is revisited infinitely often for each state, the algorithm converges to the optimal state values . The process of actor-critic learning is shown in Fig. 2.
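One common textbook form of these updates, written for costs rather than rewards, is sketched below. It is given only to make the critic and actor steps concrete and is not necessarily the exact form used in the surveyed works; the symbols $c_t$, $\gamma$, $\alpha$, $\beta$, $V$, and the action preferences $p(s, a)$ are assumed notation.

```latex
% critic: temporal difference error and state value update
\delta_t = c_t + \gamma V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t

% actor: lower the preference of an action that cost more than expected
p(s_t, a_t) \leftarrow p(s_t, a_t) - \beta\, \delta_t,
\qquad
\pi(a \mid s) = \frac{\exp\left(p(s, a)\right)}{\sum_{a'} \exp\left(p(s, a')\right)}
```

A positive $\delta_t$ means the action turned out more costly than the critic predicted, so its selection probability decreases.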
II-C4 Joint Utility and Strategy Estimation Based Learning
In this algorithm, shown in Fig. 3, each agent holds an estimation of the expected utility, whose update is based on the immediate reward received, and the probability of selecting each action, called the strategy, is updated in the same iteration based on the updated utility estimation . The main benefit of this algorithm is that it can be fully distributed when the reward can be directly calculated locally, as is the case, for example, for the data rate between a transmitter and its paired receiver. Based on this algorithm, one can further estimate the regret of each action from the utility estimations and the received immediate reward, and then update the strategy using the regret estimations. In surveyed works, these two algorithms are often connected with some equilibrium concepts in game theory like Logit equilibrium and coarse correlated equilibrium.
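The two coupled updates can be sketched for a single agent as follows. This is an illustrative sketch only: the Boltzmann (softmax) mapping from utility estimates to the target strategy, the constant step sizes `alpha` and `lam`, the `temperature`, and the deterministic toy reward are all assumptions, not details confirmed by the text.

```python
import math
import random

def boltzmann(utilities, temperature):
    """Softmax over utility estimates (numerically stabilized)."""
    m = max(utilities)
    weights = [math.exp((u - m) / temperature) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def joint_utility_strategy_learning(reward_fn, n_actions, steps=2000, alpha=0.1,
                                    lam=0.05, temperature=0.1, seed=0):
    """Update a utility estimate and a mixed strategy in the same iteration."""
    rng = random.Random(seed)
    utility_est = [0.0] * n_actions
    strategy = [1.0 / n_actions] * n_actions
    for _ in range(steps):
        # Draw an action from the current mixed strategy.
        action = rng.choices(range(n_actions), weights=strategy)[0]
        reward = reward_fn(action)  # only the agent's own reward is observed
        # Utility estimation: move the played action's estimate toward the reward.
        utility_est[action] += alpha * (reward - utility_est[action])
        # Strategy update: move toward the softmax of the updated estimates.
        target = boltzmann(utility_est, temperature)
        strategy = [p + lam * (t - p) for p, t in zip(strategy, target)]
    return utility_est, strategy

# Toy reward (assumed): action 1 always yields a higher local reward.
utility_est, strategy = joint_utility_strategy_learning(lambda a: [0.2, 1.0][a], 2)
print(utility_est, strategy)
```

Because the update uses only the agent's own reward, each node could run this loop locally, which reflects the fully distributed property mentioned above.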
II-C5 Deep Reinforcement Learning
In , a deep NN, called DQN, is proposed to approximate the optimal Q values, which allows the agent to learn directly from high-dimensional sensory data, and the resulting reinforcement learning approach is known as deep reinforcement learning (DRL). Specifically, state transition samples experienced by interacting with the environment are stored and sampled to train the DQN, and a target DQN is adopted to generate target values, both of which help stabilize the training procedure of DRL. Recently, some enhancements to DRL have been proposed, and readers can refer to  for more details on DRL . The main components and working process of the basic DRL are shown in Fig. 4.
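The two stabilizing mechanisms, stored transition samples and a separate target network, can be sketched without any deep learning library. In this illustration the target network is stood in for by an arbitrary callable `target_q_fn` that returns a list of Q values for a state; that callable, the class name `ReplayBuffer`, and all parameter values are assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def td_targets(batch, target_q_fn, gamma=0.99):
    """Compute r + gamma * max_a' Q_target(s', a') for each sampled transition."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q_fn(next_state))
        targets.append(reward + bootstrap)
    return targets

buffer = ReplayBuffer(capacity=100)
buffer.push(state=0, action=1, reward=1.0, next_state=1, done=False)
buffer.push(state=1, action=0, reward=0.0, next_state=2, done=True)
batch = buffer.sample(2)
print(td_targets(batch, target_q_fn=lambda s: [0.5, 1.5], gamma=0.9))
```

In a full DQN, the online network would be regressed toward these targets, and the target network's parameters would be refreshed from the online network only periodically.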
II-D (Deep) Neural Network
II-D1 Dense Neural Network
As shown in Fig. 5, the basic component of a dense neural network (DNN) is a neuron, which applies weights to its inputs and an activation function to produce its output. Common activation functions include tanh, ReLU, and so on. The input is transformed through the network layer by layer, and there is no direct connection between two non-consecutive layers. To optimize network parameters, backpropagation together with various gradient descent methods, including Momentum, Adam, and so on, can be employed .
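The layer-by-layer transformation can be illustrated with a small forward pass. The sketch below covers inference only (no backpropagation), and the hand-picked weights, which make the network compute the absolute difference of its two inputs, are an assumption chosen so that the result is easy to check.

```python
def relu(x):
    return max(0.0, x)

def identity(x):
    return x

def dense_layer(inputs, weights, biases, activation):
    """One dense layer: weights[j][i] connects input i to neuron j."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Pass the input through consecutive layers; no layer is skipped."""
    out = inputs
    for weights, biases, activation in layers:
        out = dense_layer(out, weights, biases, activation)
    return out

# Hand-picked (assumed) weights: hidden layer gives relu(x1 - x2) and relu(x2 - x1),
# and the output neuron sums them, yielding |x1 - x2|.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.0], identity),
]
print(forward([3.0, 1.0], layers))
```

In practice the weights would be learned with backpropagation and a gradient descent variant rather than set by hand.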
II-D2 Recurrent Neural Network
As demonstrated in Fig. 6, there exist connections within the same hidden layer in the recurrent neural network (RNN) architecture. After unfolding the architecture along the time axis, it can be clearly seen that the output at a certain time step depends on both the current input and former inputs; hence, an RNN is capable of remembering. RNNs include echo state networks, long short-term memory networks, and so on.
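The dependence on former inputs can be seen in a deliberately simplified recurrent cell with a single scalar hidden unit. The scalar state, the tanh activation, and the weight values are assumptions made to keep the sketch short.

```python
import math

def rnn_forward(inputs, w_in, w_rec, bias=0.0):
    """Unfold a one-unit recurrent cell over time and return the hidden state per step."""
    hidden = 0.0
    outputs = []
    for x in inputs:
        # The new state depends on the current input and on the previous state.
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
        outputs.append(hidden)
    return outputs

# An impulse at the first time step still influences later outputs
# when the recurrent weight is non-zero, and vanishes when it is zero.
print(rnn_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.5))
print(rnn_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.0))
```

Setting `w_rec` to zero removes the recurrent connection, and the cell then behaves like a memoryless dense neuron.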
II-D3 Convolutional Neural Network
As shown in Fig. 7, two main components of convolutional neural networks (CNNs) are layers for convolutional operations and pooling operations like maximum pooling. By stacking convolutional layers and pooling layers alternately, the CNN can progressively learn rather complex models based on progressive levels of abstraction. Different from dense layers in DNNs that learn global patterns of the input, convolutional layers can learn local patterns. Meanwhile, CNNs can learn spatial hierarchies of patterns.
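A one-dimensional sketch of the two operations is given below. It uses the cross-correlation form of convolution that deep learning libraries typically implement, with no padding and a stride of one; the edge-detecting kernel and the input signal are assumptions for illustration.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution in the cross-correlation form (no kernel flip)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool1d(values, size):
    """Non-overlapping maximum pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

signal = [0, 0, 1, 1, 0, 0]
features = conv1d(signal, kernel=[-1, 1])  # responds to local rising/falling edges
pooled = max_pool1d(features, size=2)
print(features, pooled)
```

Each output of `conv1d` depends only on a small window of the input, which is the local-pattern property contrasted with dense layers above.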
II-D4 Auto-Encoder
An auto-encoder is an unsupervised neural network whose target output is the same as its input, and it has different variations, such as the denoising auto-encoder and the sparse auto-encoder, which are trained with back-propagation. With a limited number of neurons, an auto-encoder can learn a compressed but robust representation of the input in order to reconstruct the input at the output. In surveyed works, after the auto-encoder neural network is trained, the decoder is often removed, with the encoder kept as the feature extractor. The structure of an auto-encoder is demonstrated in Fig. 8.
II-D5 Extreme-Learning Machine
The extreme-learning machine (ELM) is a feed-forward NN with one or more hidden layers whose parameters are randomly generated from a distribution, while the weights between the hidden layer and the output layer are computed by minimizing the error between the computed output value and the true output value. More details on ELM can be found in .
II-E Transfer Learning
Transfer learning is a concept with the goal of utilizing the knowledge gained from an old task to solve a new but similar task. Such knowledge can be represented by weights in deep learning and Q values in reinforcement learning. For example, when deep learning is adopted for image recognition, one can use the weights that have been well trained for another image recognition task as the initial weights, which can help achieve a satisfactory performance with a small training set. For reinforcement learning, the Q values learned by an agent in a former environment can be involved in the Q value update in a new but similar environment to make a wiser decision at the initial stage of learning. Details on integrating transfer learning with reinforcement learning can be found in . However, when transfer learning is utilized, the negative impact of former knowledge on the performance should be carefully handled, since there still exist some differences between tasks or environments.
Finally, to intuitively show the applications of each machine learning method, Table II lists all the surveyed works.
|Resource Management||Networking||Mobility Management||Localization|
|Power||Spectrum||Backhaul||Cache||Beamformer||Computation Resource||User Association||BS Switching||Routing||Clustering|
|Support Vector Machine|| |||| |
|K Nearest Neighbors||||  |
|K-Means Clustering|| || |
|Principal Component Analysis|| |
|Actor-Critic Learning||   |
|Q-Learning||    || ||   |||| ||     ||   ||   |
|Multi-Armed Bandit Learning|||||
|Joint Utility and Strategy Estimation Based Learning||||  |||
|Deep Reinforcement Learning|| |||||||||| |
|Auto-encoder||||    |
|Dense Neural Network|||||||||
|Convolutional Neural Network|||||||
|Recurrent Neural Network|| || |
|Transfer Learning||||  ||   |
|Other Reinforcement Learning Techniques|||||
|Other Supervised/Unsupervised Techniques||  |||
III Machine Learning Based Resource Management
In wireless networks, resource management aims to achieve proper utilization of limited physical resources to meet various traffic demands and improve system performance. Academic resource management methods are often designed for static networks and highly dependent on formulated mathematical problems. However, the states of practical wireless networks are dynamic, which will lead to the frequent re-execution of algorithms that can possess high complexity, and meanwhile, the ideal assumptions facilitating the formulation of a tractable mathematical problem can result in large performance loss when algorithms are applied to real situations. In addition, traditional resource management can be enhanced by extracting useful information related to users and networks, and distributed resource management schemes are preferred when the number of nodes is large.
Faced with the above issues, machine learning techniques including model-free reinforcement learning and NNs can be employed. Specifically, reinforcement learning can learn a good resource management policy based on only the reward/cost fed back by the environment, and quick decisions can be made for a dynamic network once a policy is learned. In addition, owing to the superior approximation capabilities of deep NNs, some high complexity resource management algorithms can be approximated, and similar network performance can be achieved but with much lower complexity. Moreover, NNs can be utilized to learn the content popularity, which helps fully make use of limited cache resource, and distributed Q-learning can endow each node with autonomous decision capability for resource allocation. In the following, the applications of machine learning in power control, spectrum management, backhaul management, beamformer design, computation resource management, and cache management will be introduced.
III-A Machine Learning Based Power Control
In the spectrum sharing scenario, effective power control can reduce inter-user interference, and hence increase system throughput. In the following, reinforcement, supervised, and transfer learning-based power control are elaborated.
III-A1 Reinforcement Learning Based Approaches
In , the authors focus on inter-cell interference coordination (ICIC) based on Q-learning, with pico BSs (PBSs) and a macro BS (MBS) as the learning agents. In the case of time-domain ICIC, the action performed by each PBS is to select the bias value for cell range expansion and the transmit power on each resource block (RB), and the action of the MBS is only to choose the transmit power. The state of each agent is defined by a tuple of variables, each of which is related to the SINR condition of a UE, while the cost received by each agent is defined to meet the total transmit power constraint and make the SINR of each served UE approach a target value. In each iteration of the algorithm, each PBS first selects the action leading to the smallest Q value for the current state, since the Q value here accumulates cost rather than reward, and then the MBS selects its action in the same way. In the case of frequency-domain ICIC, the only difference lies in the action definition of Q-learning. Utilizing Q-learning, the pico and macro tiers can autonomously optimize system performance with only a little coordination. In , the authors use Q-learning to optimize the transmit power of SBSs in order to reduce the interference on each RB. With learning capabilities, each SBS does not need to acquire the strategies of other players explicitly. Instead, the experience is preserved in the Q values during the interaction with other SBSs. To apply Q-learning, the state of each SBS is represented as a binary variable that indicates whether the QoS requirement is violated, and the action is the selection of a power level. The reward is defined as the achieved instantaneous rate when the QoS requirement is met, and as zero otherwise. The simulation results demonstrate that Q-learning can increase the long-term expected data rate of SBSs.
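The SBS formulation in the second work above (binary state, power-level actions, rate-or-zero reward) can be sketched as follows. This is only an illustrative skeleton: the power levels, the use of the Shannon formula for the instantaneous rate, the learning parameters, and all function names are assumptions and are not taken from the cited work.

```python
import math

POWER_LEVELS_DBM = [0.0, 10.0, 20.0]  # hypothetical action set (transmit power levels)

def shannon_rate(sinr_linear, bandwidth_hz=1.0):
    """Instantaneous rate, assumed here to follow the Shannon formula."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)

def sbs_state(qos_met):
    """Binary state: 0 if the QoS requirement is met, 1 if it is violated."""
    return 0 if qos_met else 1

def sbs_reward(sinr_linear, qos_met):
    """Reward equals the achieved rate when QoS is met and zero otherwise."""
    return shannon_rate(sinr_linear) if qos_met else 0.0

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update for one SBS; returns the new Q value."""
    best_next = max(q_table[next_state])
    q_table[state][action] += alpha * (reward + gamma * best_next
                                       - q_table[state][action])
    return q_table[state][action]

# One learning step: QoS was violated, the SBS picked the highest power level,
# and afterwards the QoS requirement is met with a linear SINR of 3.
q_table = [[0.0] * len(POWER_LEVELS_DBM) for _ in range(2)]
reward = sbs_reward(sinr_linear=3.0, qos_met=True)
print(q_update(q_table, state=sbs_state(False), action=2,
               reward=reward, next_state=sbs_state(True)))
```

Since each SBS keeps its own table and observes only its own reward, no explicit exchange of strategies between SBSs is needed in this sketch.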
Another important scenario requiring power control is the CR scenario. In , distributed Q-learning is conducted to manage the aggregated interference generated by multiple CRs at the receivers of primary (licensed) users, and the secondary BSs are taken as learning agents. The state set defined for each agent is composed of a binary variable indicating whether the secondary system generates excess interference to primary receivers, the approximate distance between the secondary user and the protection contour, and the transmission power corresponding to the current secondary user. The action set of the secondary BS is the set of power levels that can be adopted, with the cost function designed to limit the interference at the primary receivers. Taking into account that the agent cannot always obtain an accurate observation of the interference indicator, the author discusses two cases, namely complete information and partial information, and handles the latter by involving belief states in Q-learning. Moreover, two different ways of representing Q values are discussed, namely look-up tables and neural networks, and their memory and computation overheads are also examined. Simulations show that the proposed scheme allows agents to learn a series of optimization policies that keep the aggregated interference under a desired value.
In addition, some research utilizes reinforcement learning to achieve the equilibrium state of wireless networks, where power control problems are modeled as non-cooperative games among multiple nodes. In , the power control of femto BSs (FBSs) is conducted to mitigate the cross-tier interference to macrocell UEs (MUEs). Specifically, the power control and carrier selection is modeled as a normal form game among FBSs in mixed strategies, and a reinforcement learning algorithm based on joint utility and strategy estimation is proposed to help FBSs reach Logit equilibrium. In each iteration of the algorithm, each FBS first selects an action according to its current strategy, and then receives a reward that equals its data rate if the QoS of the MUE is met and zero otherwise. Based on the received reward, each FBS updates its utility estimation and strategy by the process proposed in , which is proved to converge to an approximated version of mixed-strategy Nash equilibrium. Numerical simulations demonstrate that the proposed algorithm achieves satisfactory performance even though each FBS needs no information about the game other than its own reward, and that system performance benefits when the FBSs adopt identical utilities.
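As a rough illustration of learning based on joint utility and strategy estimation, the sketch below keeps a per-action utility estimate and a mixed strategy, and moves the strategy toward a logit (Boltzmann-Gibbs) response to the estimates. The exact update rules, step sizes, and the toy deterministic rewards are assumptions chosen for illustration; the surveyed algorithm may differ in its details.

```python
import numpy as np


def logit_strategy(utility_estimates, temperature):
    """Boltzmann-Gibbs (logit) distribution over actions."""
    z = (utility_estimates - utility_estimates.max()) / temperature
    weights = np.exp(z)
    return weights / weights.sum()


def juse_step(u_hat, pi, action, reward, alpha, beta, temperature):
    """One iteration: refine the utility estimate of the played action,
    then move the mixed strategy toward the logit response."""
    u_hat = u_hat.copy()
    u_hat[action] += alpha * (reward - u_hat[action])
    pi = pi + beta * (logit_strategy(u_hat, temperature) - pi)
    return u_hat, pi


def run_demo(mean_rewards, iterations=2000, alpha=0.1, beta=0.05,
             temperature=0.1, seed=0):
    """Single-agent toy run with hypothetical deterministic rewards."""
    rng = np.random.default_rng(seed)
    n = len(mean_rewards)
    u_hat = np.zeros(n)
    pi = np.full(n, 1.0 / n)
    for _ in range(iterations):
        action = rng.choice(n, p=pi)      # sample from the current strategy
        reward = mean_rewards[action]     # only the own reward is observed
        u_hat, pi = juse_step(u_hat, pi, action, reward,
                              alpha, beta, temperature)
    return u_hat, pi
```

In this sketch the agent uses nothing but its own received reward, which matches the information requirement stated above. A smaller `temperature` makes the logit response closer to a best response.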
In , the author models the channel and power level selection of D2D pairs in a heterogeneous cellular network as a stochastic non-cooperative game. The utility of each pair is defined by considering the difference between its achieved data rate and the cost of power consumption as well as the SINR constraint. To avoid the considerable amount of information exchange among pairs incurred by the conventional multi-agent Q-learning, an autonomous Q-learning algorithm is developed based on the estimation of pairs’ beliefs about the strategies of all the other pairs. Finally, simulation results indicate that the proposal possesses a relatively fast convergence rate and can achieve near-optimal performance.
Iii-A2 Supervised Learning Based Approaches
Considering the high complexity of traditional optimization based resource allocation algorithms, authors in [9, 49] propose to utilize deep NNs to develop power allocation algorithms that can achieve real-time processing. Specifically, different from the traditional ways of approximating iterative algorithms where each iteration is approximated by a single layer of the neural network, the author in  adopts a generic dense neural network to approximate the classic WMMSE algorithm for power control in a scenario with multiple transceiver pairs. Notably, the number of ReLUs and binary units as well as the number of layers that are needed for achieving a given approximation error are rigorously analyzed, from which it can be concluded that the approximation error has only a small impact on the size of the deep NN. As for NN training, the training data set is generated by running the WMMSE algorithm under varying channel realizations, and each labeled sample consists of a channel realization and the corresponding power allocation result output by the WMMSE algorithm. Via simulations, it is demonstrated that the adopted fully connected neural network can achieve similar performance with much lower computation time compared with the WMMSE algorithm.
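The data generation step described above can be sketched as follows: a scalar-channel WMMSE routine labels each random channel realization with a power allocation, giving (channel, power) pairs for supervised training. The Rayleigh channel model, unit noise power, unit rate weights, iteration count, and the function names are assumptions made for this sketch.

```python
import numpy as np


def wmmse_power(H, p_max=1.0, noise=1.0, iters=100):
    """Scalar-channel WMMSE for sum-rate maximization (unit weights assumed).

    H[k, j] is the channel magnitude from transmitter j to receiver k.
    Returns the transmit power of each pair, in [0, p_max].
    """
    G = H ** 2                      # power gains
    d = np.diag(H)                  # direct-link magnitudes
    v = np.full(H.shape[0], np.sqrt(p_max))
    for _ in range(iters):
        u = d * v / (G @ v ** 2 + noise)          # receive coefficients
        w = 1.0 / (1.0 - u * d * v)               # MSE weights
        # The small constant only guards against division by zero.
        v = w * u * d / (G.T @ (w * u ** 2) + 1e-12)
        v = np.clip(v, 0.0, np.sqrt(p_max))       # power constraint
    return v ** 2


def make_dataset(num_samples, num_pairs, seed=0):
    """Label random channel realizations with WMMSE power allocations."""
    rng = np.random.default_rng(seed)
    X = np.empty((num_samples, num_pairs * num_pairs))
    Y = np.empty((num_samples, num_pairs))
    for n in range(num_samples):
        real = rng.normal(size=(num_pairs, num_pairs))
        imag = rng.normal(size=(num_pairs, num_pairs))
        H = np.abs(real + 1j * imag) / np.sqrt(2.0)   # assumed Rayleigh fading
        X[n] = H.ravel()            # network input: channel realization
        Y[n] = wmmse_power(H)       # label: WMMSE power allocation
    return X, Y
```

A dense network trained on `X` and `Y` with a regression loss would then replace the iterative loop at inference time, which is where the reduction in computation time comes from.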
In , a CNN based power control scheme is developed for the same scenario considered in , where the full channel gain information is normalized and taken as the input of the CNN, while the output is the power allocation vector. Similar to , the CNN is first trained to approximate traditional WMMSE to guarantee a basic level of performance. Then, according to optimization objectives such as maximizing SE and EE, these performance metrics are further used as the loss functions. Simulation results show that the proposal can achieve almost the same or even higher SE and EE than WMMSE at a faster computing speed. Also using NNs for power allocation, the author in  adopts an NN architecture formed by stacking multiple encoders from pre-trained auto-encoders and a pre-trained softmax layer. The architecture takes the CSI and the location indicators, each of which represents whether a user is a cell-edge user, as input, and the output is the resource allocation result. The training data set is generated by solving a sum rate maximization problem under different CSI realizations via a genetic algorithm.
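To make the second training stage concrete, the sketch below evaluates SE and EE for a given power vector so that their negatives can serve as loss values. The interference model, the circuit power term in the EE definition, and the function names are assumptions for illustration; the surveyed scheme may define these metrics differently.

```python
import numpy as np


def sum_rate(G, p, noise=1.0):
    """Sum spectral efficiency in bit/s/Hz.

    G[k, j] is the power gain from transmitter j to receiver k and p[j]
    is the transmit power of pair j.
    """
    signal = np.diag(G) * p
    interference = G @ p - signal
    return float(np.sum(np.log2(1.0 + signal / (interference + noise))))


def se_loss(G, p, noise=1.0):
    """Loss to minimize when the objective is to maximize SE."""
    return -sum_rate(G, p, noise)


def ee_loss(G, p, noise=1.0, circuit_power=0.1):
    """Loss to minimize when the objective is to maximize EE.

    EE is taken here as sum rate over total consumed power, with an
    assumed constant circuit power.
    """
    return -sum_rate(G, p, noise) / (float(np.sum(p)) + circuit_power)
```

For two non-interfering pairs with unit gains, powers 1 and 3, and unit noise, `sum_rate` gives log2(2) + log2(4) = 3 bit/s/Hz. In practice these losses would be written in an automatic differentiation framework so that gradients flow back to the network weights.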
Iii-A3 Transfer Learning Based Approaches
In heterogeneous networks, when femtocells share the same radio resources with macrocells, power control is needed to limit the inter-tier interference to MUEs. However, faced with a dynamic environment, it is difficult for femtocells to meet the QoS constraints of MUEs during the entire operation time. In , distributed Q-learning is utilized for inter-tier interference management, where the femtocells, as learning agents, aim at optimizing their own capacity while satisfying the data rate requirement of MUEs. Due to the backhaul latency and the frequent changes in RB scheduling, i.e., the RB allocated to each UE varies over time, the power control policy learned by femtocells may no longer be useful and can cause the violation of the data rate constraints of MUEs. To deal with this problem, the author proposes to let the MBS inform femtocells about the future RB scheduling, which facilitates the transfer of power control knowledge between different environments, and hence femtocells can still avoid interference to the MUE even if its RB allocation is changed. In this study, the power control knowledge is represented by the complete Q-table learned for a given RB. System level simulations demonstrate that this scheme works properly in a multi-user OFDMA network and is superior to the traditional power control algorithm in terms of the average capacity of cells.
Iii-A4 Lessons Learned
For the power control problem in wireless networks, the main learning technique used is reinforcement learning, including distributed Q-learning and learning based on joint utility and strategy estimation. These learning techniques help to develop self-organizing and autonomous power control schemes, which are expected to be a part of future wireless networks. Moreover, Q values can be represented in a tabular form or by a neural network, and the two representations have different memory and computation overheads. In addition, distributed Q-learning can be enhanced by transfer learning so that agents adapt better to a dynamic environment, and better system performance can be achieved by making the utility of agents identical to the system’s goal. Finally, approximating traditional iterative resource allocation algorithms using NNs helps develop schemes with real-time processing capabilities.
|Literature||Scenario||Objective||Machine learning technique||Main conclusion|
|||Heterogeneous network with picocells underlaying macrocells||Achieve a target SINR for each UE under total transmission power constraints||Two-level Q-learning||The algorithm makes the average throughput improve significantly|
|||Small cell network||Optimize the data rate of each SBS||Distributed Q-learning||The long-term expected data rates of SBSs are increased|
|||Cognitive radio network||Keep the interference at the primary receivers below a threshold||Distributed Q-learning||The proposals outperform comparison schemes in terms of outage probability|
|||Heterogeneous network comprised of FBSs and MBSs||Optimize the throughput of FUEs under the QoS constraints of MUEs||Reinforcement learning with joint utility and strategy estimation||The algorithm can converge to the logit equilibrium, and the spectral efficiency is higher when FBSs take the system performance as their utility|
|||D2D enabled cellular network||Optimize the reward of each D2D pair defined as the difference between achieved data rate and transmit power cost under QoS constraints||Distributed Q-learning||The algorithm is proved to converge to the optimal Q values and improve the average throughput significantly|
|||Cognitive radio network||Optimize transmit power level selection to reduce interference||SVM||The proposed algorithm not only achieves a tradeoff between energy efficiency and satisfaction index, but also satisfies the probabilistic interference constraint|
|||Cellular network||Minimize the total transmit power of devices in the network||SVM||The scheme can trade off the chosen transmit power against the user SINR|
|||Heterogeneous network with femtocells and macrocells||Optimize the capacity of femtocells under transmit power constraints and QoS constraints of MUEs||Knowledge transfer based Q-learning||The proposed scheme works properly in multi-user OFDMA networks and outperforms conventional power control algorithms|
|||The scenario with multiple transceiver pairs coexisting||Optimize system throughput||Densely connected neural networks||The proposal can achieve almost the same performance compared to WMMSE at a faster computing speed|
|||The scenario with multiple transceiver pairs coexisting||Optimize the SE and EE of the system||Convolutional neural networks||The proposal can achieve almost the same or even higher SE and EE than WMMSE at a faster computing speed|
|||A multi-cell downlink cellular network||Optimize system throughput||A multi-layer neural network based on auto-encoders||The proposal can successfully predict the solution of the genetic algorithm in most cases|
Iii-B Machine Learning Based Spectrum Management
With the explosive increase of data traffic, spectrum shortage has drawn considerable concern in the wireless communication community, and efficient spectrum management is desired to improve spectrum utilization. In the following, reinforcement learning and unsupervised learning based spectrum management are introduced.
Iii-B1 Reinforcement Learning Based Approaches
In , spectrum management in millimeter-wave, ultra-dense networks is investigated, and temporal-spatial reuse is considered as a method to improve spectrum utilization. The spectrum management problem is formulated as a non-cooperative game among devices, which is proved to be an ordinary potential game guaranteeing the existence of Nash equilibrium (NE). To help devices achieve NE without global information, a novel, distributed Q-learning algorithm is designed, which enables devices to learn the environment from their individual action-reward experience. The action and reward of each device are channel selection and channel capacity, respectively. Different from traditional Q-learning where the Q value is defined over state-action pairs, the Q value in the proposal is defined over actions only, that is, each action corresponds to a single Q value. In each time slot, the Q value of the played action is updated as a weighted sum of the current Q value and the immediate reward, while the Q values of other actions remain the same. In addition, based on rigorous analysis, a key conclusion is drawn that less coupling among learning agents can help speed up the convergence of learning. Simulations demonstrate that the proposal converges faster and is more stable than several baselines, and also leads to a small latency.
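The action-only Q value update described above can be written in a few lines. In this minimal sketch, `step` is the weight given to the immediate reward, and the epsilon-greedy selection rule is an assumption added for illustration, since the text does not state how devices pick channels from their Q values.

```python
import random


def update_q(q, action, reward, step):
    """Update only the played action's Q value as a weighted sum of its
    current value and the immediate reward; other entries are unchanged."""
    q = list(q)
    q[action] = (1.0 - step) * q[action] + step * reward
    return q


def select_channel(q, eps, rng):
    """Epsilon-greedy channel selection (assumed exploration rule)."""
    if rng.random() < eps:
        return rng.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)
```

For example, `update_q([0.0, 2.0], action=1, reward=4.0, step=0.5)` leaves the first entry untouched and moves the second halfway toward the observed channel capacity.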
Similar to , the authors in  focus on temporal-spatial spectrum reuse but with the use of MAB theory. In order to overcome the high computation cost brought by the centralized channel allocation policy, a distributed three-stage policy is proposed, in which the goal of the first two stages is to help each SU find the optimal channel access rank, while the third stage, based on MAB, is for the optimal channel allocation. Specifically, with probability $1-\epsilon$, each SU chooses a channel based on the channel access rank and empirical idle probability estimates, and otherwise chooses a channel uniformly at random. Then, the SU senses the selected channel and receives a reward equal to 1 if neither the primary user nor other SUs transmit over this channel. Simulations show that the proposal achieves significantly smaller regret than the baselines in the spectrum temporal-spatial reuse scenario, where the regret is defined as the difference between the total reward of a genie-aided rule and the expected reward of all SUs.
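The channel selection step of the third stage can be sketched as below. The sketch assumes that the exploration probability is a parameter `eps`, that "based on the channel access rank and empirical idle probability estimates" means picking the channel whose estimated idle probability is the rank-th largest, and that unsensed channels start with an optimistic estimate of 1. The class name and these interpretations are illustrative and not confirmed by the surveyed work.

```python
import random


class ChannelLearner:
    """Per-SU learner tracking empirical idle probabilities of channels."""

    def __init__(self, num_channels, eps, seed=0):
        self.idle_counts = [0] * num_channels
        self.sense_counts = [0] * num_channels
        self.eps = eps
        self.rng = random.Random(seed)

    def idle_estimate(self, channel):
        """Empirical idle probability; 1.0 before any sensing (assumption)."""
        n = self.sense_counts[channel]
        return self.idle_counts[channel] / n if n else 1.0

    def choose(self, rank=0):
        """With probability 1 - eps exploit the rank-th best channel,
        otherwise pick a channel uniformly at random."""
        num_channels = len(self.sense_counts)
        if self.rng.random() < self.eps:
            return self.rng.randrange(num_channels)
        order = sorted(range(num_channels), key=self.idle_estimate,
                       reverse=True)
        return order[rank]

    def observe(self, channel, reward):
        """Record the sensing result: reward 1 if the channel was free of
        the primary user and of other SUs, 0 otherwise."""
        self.sense_counts[channel] += 1
        self.idle_counts[channel] += reward
```

Giving each SU a distinct `rank` is one simple way to steer SUs toward different channels, which is the role the channel access rank plays in the description above.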
In , the author focuses on a multi-objective spectrum access problem in a heterogeneous network. Specifically, the studied problem aims at simultaneously minimizing the intra-tier and inter-tier interference received at the femtocells and the inter-tier interference from femtocells to eNBs under QoS constraints. The formulated problem is very challenging owing to the lack of global and complete channel information, the unknown number of nodes, and so on. To handle this issue, a reinforcement learning approach based on joint utility and strategy estimation is proposed, which contains two sequential levels. The purpose of the first level is to identify the available spectrum resources for femtocells, while the second level is responsible for the optimization of resource selection. A different utility is designed for each level, namely the spectrum modeling and spectrum selection utilities. In addition, three different learning algorithms, including the gradient follower, the modified Roth-Erev, and the modified Bush and Mosteller learning algorithms, are available for each femtocell. To determine the action selection probabilities based on the propensity of each action output by the learning process, logistic functions are utilized, which are commonly used in machine learning to transform full-range variables into the limited range of a probability. By using the proposed approach, higher cell throughputs are achieved owing to the significant reduction in intra-tier and inter-tier interference. In , the joint communication mode selection and subchannel allocation of D2D pairs is solved by joint utility and strategy estimation based reinforcement learning for a D2D enabled C-RAN, and the impacts of fronthaul capacity as well as computing capability at the BBU pool on the system SE are evaluated.
To overcome the challenges existing in the current solutions to spectrum sharing between operators, an inter-operator proximal spectrum sharing (IOPSS) scheme is presented in , in which a BS is able to intelligently offload users to its neighboring BSs based on spectral proximity. To achieve this goal, a Q-learning framework is proposed, resulting in a self-organizing, spectrally efficient network. The state of a BS is the experienced load whose value is discretized, while an action of a BS is a tuple of spectral sharing parameters related to each neighboring BS, including the number of RBs requiring each neighboring BS to be reserved, the probability of each user served by the neighboring BS with the strongest SINR, and the reservation proportion of the requested RBs. The cost function of each BS is related to both the QoE of its users and the change in the number of RBs it requests. Extensive simulations under different loads show that the distributed, dynamic IOPSS based on Q-learning can help mobile network operators provide users with a high QoE and reduce operational costs.
In addition to adopting classical reinforcement learning methods, a novel reinforcement learning approach involving recurrent neural networks is utilized in  to handle the management of both licensed and unlicensed frequency bands in LTE-U systems. Specifically, the problem is formulated as a non-cooperative game with SBSs and an MBS as game players. Solving the game is a challenging task since each SBS may know only a little information about the network, especially in a dense deployment scenario. To achieve the mixed-strategy NE, multi-agent reinforcement learning based on echo state networks (ESNs) is proposed, since ESNs are easy to train and can track the state of a network over time. Each BS is an ESN agent using two ESNs to approximate the immediate and the expected rewards, respectively. The input of the first ESN comprises the action profile of all the other BSs, while the input of the second is the user association of the BS. Compared to traditional RL approaches, the proposal can quickly learn to allocate resources with little training data. During the algorithm execution, each BS needs to broadcast only the action currently taken and its optimal action. The simulation results show that the proposed approach improves the sum-rate of the 50th percentile of users by up to 167% compared to Q-learning. A similar idea that combines RNNs with reinforcement learning has been adopted in  to address resource block allocation for a wireless network supporting virtual reality services.
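The claim that ESNs are easy to train comes from the fact that only the linear readout is fitted while the recurrent reservoir stays fixed. The following generic sketch shows this with a ridge-regression readout. The reservoir size, spectral radius, weight ranges, and the class name are assumptions, and the sketch is not the reward approximator of the surveyed work.

```python
import numpy as np


class EchoStateNetwork:
    """Minimal ESN: fixed random reservoir, ridge-regression readout."""

    def __init__(self, n_in, n_res=50, spectral_radius=0.9, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
        w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
        self.w_res = w
        self.ridge = ridge
        self.w_out = None

    def _features(self, inputs):
        """Run the reservoir over the input sequence and collect
        [bias, input, reservoir state] for every time step."""
        x = np.zeros(self.w_res.shape[0])
        rows = []
        for u in inputs:
            x = np.tanh(self.w_in @ u + self.w_res @ x)
            rows.append(np.concatenate(([1.0], u, x)))
        return np.array(rows)

    def fit(self, inputs, targets):
        """Train only the readout weights by ridge regression."""
        F = self._features(inputs)
        A = F.T @ F + self.ridge * np.eye(F.shape[1])
        self.w_out = np.linalg.solve(A, F.T @ targets)
        return self

    def predict(self, inputs):
        return self._features(inputs) @ self.w_out
```

Because fitting reduces to one linear solve, the readout can be retrained cheaply as new observations arrive, which is consistent with the small training data requirement noted above.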
Iii-B2 Lessons Learned
The advent of the next generation of wireless networks gives rise to new challenges, especially in spectrum utilization, and machine learning techniques can help with this issue in the following ways. First, by endowing SUs with learning capability, SUs can intelligently utilize the licensed spectrum without causing excessive interference to PUs. Second, reinforcement learning can facilitate more efficient temporal-spatial spectrum reuse. Finally, RNNs can help network nodes learn from the interaction with other nodes, leading to better decisions on spectrum allocation.
|Literature||Scenario||Objective||Machine learning methods||Main conclusion|
|||Ultra-dense networks with millimeter-wave||Improve spectrum utilization with temporal-spatial spectrum reuse||Distributed Q-learning||Less coupling in learning agents can help speed up the convergence of learning|
|||Cognitive radio network||Improve spectrum utilization with temporal-spatial spectrum reuse||Multi-armed Bandit||The scheme has less regret than other methods when temporal-spatial spectrum reuse is allowed|
|||Heterogeneous network||Reduce inter-tier and intra-tier interference||Reinforcement learning||Higher cell throughputs are achieved owing to the significant reduction in intra-tier and inter-tier interference|
|||Spectrum sharing among multi-operators||Fully reap the benefits of multi-operator spectrum sharing||Q-learning||By the proposal, mobile network operators can serve users with high QoE even when all operators’ BSs are equally loaded|
|||LTE-U||Optimize network throughput by user association, spectrum allocation, and load balancing||Multi-agent reinforcement learning based on ESNs||The proposed approach improves the system performance significantly in terms of the sum-rate of the 50th percentile of users, compared with a Q-learning algorithm|
|||Wireless networks supporting virtual reality||Maximize the users’ QoS||Multi-agent reinforcement learning based on ESNs||The proposed algorithm can achieve significant gains, in terms of VR QoS|
Iii-C Machine Learning Based Backhaul Management
In wireless networks, although radio resources like power and spectrum need to be allocated effectively to satisfy user requirements, the management of backhaul links connecting SBSs and MBSs or connecting BSs and the core network is essential as well to achieve better system performance. This subsection will introduce works related to backhaul management based on reinforcement learning.
Iii-C1 Reinforcement Learning Based Approaches
In , Jaber et al. propose a backhaul-aware cell range extension (CRE) method based on RL to adaptively set the CRE offset value. In this method, the observed state for each small cell is defined as a value reflecting the violation of its backhaul capacity, and the action to be taken is the CRE bias of a cell considering whether the backhaul is available or not. The cost for each small cell is defined so as to maximize the utilization of the total backhaul capacity while keeping the backhaul capacity constraint of each cell satisfied. In addition, Q-learning is adopted to minimize this cost through an iterative process, and the simulation results show that the proposal relieves the backhaul congestion in macrocells and improves the QoE of users. In , the authors concentrate on load balancing to improve backhaul resource utilization by learning system bias values via a distributed Q-learning algorithm. In this algorithm, Xu et al. take the backhaul utilization, quantized into several levels, as the environment state, based on which SBSs determine an action, that is, the bias value. Then, with the reward function defined as the weighted difference between the backhaul resource utilization and the outage probability for each SBS, Q-learning is utilized to learn the bias value selection strategy, achieving a balance between system-centric performance and user-centric performance. Numerical results show that this algorithm is able to optimize the utilization of backhaul resources under the premise of guaranteeing user QoS.
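The state and reward design of the load balancing scheme can be sketched with two small helpers. The uniform quantization and the specific form of the weighted difference, with a single weight in [0, 1], are assumptions for illustration, since the text does not give the exact expressions.

```python
def quantize_utilization(utilization, num_levels):
    """Map a backhaul utilization in [0, 1] to a discrete state index."""
    clipped = min(max(utilization, 0.0), 1.0)
    return min(int(clipped * num_levels), num_levels - 1)


def sbs_reward(utilization, outage_probability, weight):
    """Weighted difference between backhaul utilization (system-centric)
    and outage probability (user-centric); the weighting form is assumed."""
    return weight * utilization - (1.0 - weight) * outage_probability
```

A larger `weight` emphasizes backhaul utilization, while a smaller one penalizes outage more strongly, which is one way to express the balance between system-centric and user-centric performance mentioned above.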
Unlike those in  and , the authors in  and  model the backhaul management problems from a game-theoretical perspective. Problems are solved employing an RL approach based on joint utility and strategy estimation. Specifically, the backhaul management problem is formulated as a minority game in , where SBSs are the players and have to decide whether to download files for predicted requests while serving the urgent demands. In order to approximate the mixed-strategy NE, an RL-based algorithm, which enables each SBS to update its strategy based on only the received utility, is proposed. In contrast to previous, similar RL algorithms, this scheme is mathematically proved to converge to a unique equilibrium point for the formulated game. In , MUEs can communicate with the MBS with the help of SBSs utilized as relays. The backhaul links between SBSs and the MBS are heterogeneous, including both wired and wireless backhaul. The competition among MUEs is modeled as a non-cooperative game, where their actions are the selections of transmission power, the assisting SBSs, and rate splitting parameters. Using the proposed RL approach, a coarse correlated equilibrium of the game can be reached. In addition, it is demonstrated that the proposal achieves better average throughput and delay for the MUEs than existing benchmarks do.
Iii-C2 Lessons Learned
In future wireless networks, intelligent management of backhaul capacity is essential for improving system performance. In particular, RL based approaches can achieve favorable system performance by properly utilizing limited backhaul resources.
|Literature||Scenario||Objective||Machine learning methods||Main conclusion|
|||Cellular network||Minimize energy consumption in low traffic scenarios based on a backhaul link selection||Q-learning||The total energy consumption can be reduced by up to 35% with marginal QoS compromises|
|||Cellular network||Optimize joint access and in-band backhaul||Joint utility and strategy estimation based RL||The proposed approach improves system performance by 40% under completely autonomous operating conditions when compared to benchmark approaches|
|||Cellular network||Maximize the backhaul resource utilization of small base stations and minimize the outage probability of users||Q-learning||The proposed approach effectively utilizes the backhaul resource for load balancing|
|||Cellular network||Improve the throughput and delay of MUEs under the limitations of heterogeneous backhaul||Joint utility and strategy estimation based RL||The proposed scheme can significantly improve the overall performance|
|||Cellular network||Maximize system-centric and user-centric performance indicators depending on different performance requirements and backhaul capabilities||Q-learning||The proposed scheme shows considerable improvement in users’ QoE when compared to state-of-the-art user-cell association schemes|
|||Cellular network||Optimize cell range extension with backhaul awareness||Q-learning||The proposed approach alleviates the backhaul congestion of the macro-cell and improves the user QoE|
|||Cellular network||Minimize the traffic conflict in the backhaul||Joint utility and strategy estimation based RL||The proposed scheme is proved to converge to the Nash equilibrium|
Iii-D Machine Learning Based Cache Management
Due to the proliferation of smart devices and intelligent applications, such as augmented reality, virtual reality, ubiquitous social networking, and IoT, wireless communication systems have experienced a tremendous data traffic increase over the past couple of years. Additionally, it has been envisioned that the cellular network will produce about 30.6 exabytes of data per month by 2020 . Faced with the explosion of data demands, the caching paradigm is introduced for the future wireless network to shorten latency and alleviate the transmission burden on backhaul . Recently, many research studies have adopted ML techniques to manage cache resources with great success.
Iii-D1 Reinforcement Learning Based Approaches
Considering the various spatial and temporal content demands among different small cells, the author in  develops a decentralized caching update scheme based on joint utility-and-strategy-estimation RL. With this approach, each SBS can optimize a caching probability distribution over content classes using only the received instantaneous utility feedback. In addition, by taking a weighted sum of the caching strategies of each SBS and the cloud, a tradeoff between local content popularity and global popularity can be achieved. In , the author also focuses on the distributed design of a caching scheme. Different from , BSs are allowed to cooperate with each other in the sense that each BS can get locally missing content from other BSs via backhaul, which can be a more cost-efficient solution, and D2D offloading is also considered to improve the cache utilization. Then, to minimize system transmission cost, a distributed Q-learning algorithm is utilized. For each BS, the content placement is taken as the observed state, and the adjustment of cached contents is taken as the action. The convergence of the proposal is proved by utilizing the sequential stage game model, and the superior performance is verified via simulations.
Instead of adopting traditional RL approaches, in , the author proposes a novel framework based on DRL for a connected vehicular network to orchestrate computing, networking, and caching resources. Particularly, the DRL agent decides which BS is assigned to the vehicle and whether to cache the requested content in the BS. Simulation results reveal that by utilizing DRL, the system achieves much better performance than existing works. In addition, the author in  investigates a cache update algorithm based on the Wolpertinger DRL architecture for a single BS. Concretely, the request frequencies of each file over different time durations and the current file requests from users constitute the input state, and the action decides whether to cache the requested content. The proposed scheme is compared with several traditional cache update schemes including Least Recently Used and Least Frequently Used schemes, and it is shown that the proposal can achieve improved short-term cache hit rate as well as long-term cache hit rate.
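For reference, the two baseline cache update schemes and the cache hit rate metric can be sketched as follows. This is a simple illustration with an assumed request trace format (a list of file identifiers) and an assumed LFU variant that always admits the requested file; it is not the evaluation code of the surveyed work. The capacity is assumed to be at least one file.

```python
from collections import Counter, OrderedDict


def lru_hit_rate(requests, capacity):
    """Cache hit rate of Least Recently Used replacement."""
    cache = OrderedDict()
    hits = 0
    for item in requests:
        if item in cache:
            hits += 1
            cache.move_to_end(item)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[item] = True
    return hits / len(requests)


def lfu_hit_rate(requests, capacity):
    """Cache hit rate of Least Frequently Used replacement."""
    cache = set()
    freq = Counter()
    hits = 0
    for item in requests:
        freq[item] += 1
        if item in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                # Evict the cached file with the fewest requests so far.
                victim = min(cache, key=lambda f: freq[f])
                cache.remove(victim)
            cache.add(item)
    return hits / len(requests)
```

Evaluating these functions over short windows of the trace gives a short-term hit rate, while evaluating them over the whole trace gives a long-term hit rate, the two metrics mentioned above.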
Iii-D2 Supervised Learning Based Approaches
In order to develop an adaptive caching scheme, extreme learning machine (ELM)  has been employed to estimate content popularity. Afterwards, mixed-integer linear programming is used to compute the content placement. Moreover, a simultaneous perturbation stochastic approximation method is proposed to reduce the number of neurons for ELM, while guaranteeing a certain level of prediction accuracy. Based on real-world data, it is shown that the proposal can improve both the QoE of users and network performance.
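An ELM is a single-hidden-layer network whose hidden weights are drawn at random and kept fixed, so that only the output weights are computed, typically by least squares. The generic sketch below illustrates this idea. The feature and target definitions, the tanh activation, the number of hidden neurons, and the class name are assumptions, and the sketch does not reproduce the popularity estimator of the surveyed work.

```python
import numpy as np


class ExtremeLearningMachine:
    """Minimal ELM regressor: random hidden layer, least-squares output."""

    def __init__(self, n_hidden=30, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)
        self.w = None
        self.b = None
        self.beta = None

    def _hidden(self, X):
        return np.tanh(X @ self.w + self.b)

    def fit(self, X, y):
        # Hidden-layer parameters are random and never trained.
        self.w = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights from the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Since training is a single pseudoinverse computation, the cost grows with `n_hidden`, which is why reducing the number of neurons while keeping the prediction accuracy acceptable is of practical interest.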
In , the joint optimization of content placement, user association, and unmanned aerial vehicles’ positions is studied, aiming at minimizing the total transmit power of UAVs while satisfying the requirement of user quality of experience. To solve the formulated problem, which spans a whole time duration, it is essential to predict the user content request distribution. To this end, an echo state network (ESN), a kind of recurrent neural network, is employed, which can quickly learn the distribution from little training data. Specifically, the input of the ESN is a vector consisting of user context information like gender and device type, while the output is the vector of user content request probabilities. A similar idea as in  is adopted in  to optimize the contents cached at RRHs and the BBU pool in a cloud radio access network.
Different from [72, 73, 74], which predict the content popularity directly using neural networks, the authors in  propose a scheme integrating a 3D CNN for generic video feature extraction, an SVM for generating representation vectors of videos, and a regression model for predicting the video popularity taking the corresponding representation vector as input. After the popularity of each video is obtained, the optimal portion of each video cached at the BS can be derived to minimize the backhaul load in each time period. The advantage of the proposal lies in the ability to predict the popularity of newly uploaded videos with no statistical information required.
Iii-D3 Transfer Learning Based Approaches
Generally, the content popularity profile plays a key role in deriving efficient caching policies, but estimating it with high accuracy requires a long time to collect user file request samples. To overcome this issue, the author in  involves the idea of transfer learning by integrating the file request samples from the social network domain into the file popularity estimation formula. By theoretical analysis, the training time is expressed as a function of the “distance” between the probability distribution of the files requested and that of the source domain samples. In addition, transfer learning based approaches are adopted as well in  and  to handle the over-fitting problems in estimating the content popularity profile matrix.
Iii-D4 Lessons Learned
For cache resource management, knowing the content popularity profile is beneficial, and it can be obtained or estimated by supervised learning techniques like recurrent neural networks and extreme learning machines. Meanwhile, involving the content request information from other domains can help reduce the time needed for popularity estimation. Finally, reinforcement learning can directly optimize the caching policy without explicitly acquiring the popularity profile.
|Literature||Scenario||Optimization objective||Machine learning methods||Main conclusion|
|||Small cell networks||Minimize latency||Joint utility and strategy estimation based RL||The proposal can achieve 15% and 40% gains compared to various baselines|
|||D2D enabled cellular networks||Minimize system transmission cost||Distributed Q-learning||The proposal outperforms traditional caching strategies including LRU and LFU|
|||Vehicular networks||Enhance network efficiency and traffic control||Deep reinforcement learning||The proposed scheme can significantly improve the network performance|
|||A scenario with a single BS||Maximize the long-term cache hit rate||Deep reinforcement learning||The proposal can achieve improved short-term cache hit rate as well as long-term cache hit rate|
|||Small cell networks||Minimize the latency caused by the unavailability of the requested file||Transfer learning||The training time for popularity distribution estimation can be reduced by transfer learning|
|||Cellular networks||Improve the users’ quality of experience and reduce network traffic||Extreme learning machine||The QoE of users and network performance can be improved compared with industry standard caching schemes|
|||Cloud radio access networks with UAVs||Minimize the transmit power of UAVs and meanwhile satisfy the QoE of users||Echo state networks||The proposal achieves significant gains compared to baselines without cache and UAVs|
|||Cloud radio access networks||Maximize the long-term sum effective capacity||Echo state networks||The proposed approach can considerably improve the sum effective capacity|
|||Cellular networks||Minimize the average backhaul load||3D CNN, SVM, regression model||Content-aware proactive caching is cost-effective for dealing with the backhaul bottleneck|
III-E Machine Learning Based Computation Resource Management
In , the author investigates a wireless network that provides MEC services and formulates a computation offloading decision problem for a representative mobile terminal, where multiple BSs are available for computation offloading. More specifically, the problem takes environmental dynamics into account, including the time-varying channel quality as well as the task arrival and energy status at the mobile device. To develop the optimal offloading decision policy, a double DQN based learning approach is proposed, which does not need complete information about the network dynamics and can handle high-dimensional state spaces. Simulation results show that the proposal can significantly improve computation offloading performance compared with several baseline policies.
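To make the double DQN idea concrete, the following minimal sketch computes the learning target for a single transition. It is only an illustration under stated assumptions: the function name `double_dqn_target`, the discount factor, and the three Q-values per network (standing for hypothetical offloading choices such as local execution, BS 1, and BS 2) are not taken from the surveyed work. The point it shows is the defining step of double DQN: the online network selects the next action, while the target network evaluates it.

```python
from typing import Sequence


def double_dqn_target(reward: float,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.9,
                      terminal: bool = False) -> float:
    """Return the double DQN learning target for one transition."""
    if terminal:
        return reward
    # The online network selects the greedy action in the next state ...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ... while the target network evaluates that selected action.
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for three offloading choices (illustrative numbers only).
target = double_dqn_target(reward=1.0,
                           next_q_online=[0.2, 0.8, 0.5],
                           next_q_target=[0.3, 0.4, 0.9],
                           gamma=0.9)
```

With these numbers, the online network picks the second action, so the target uses the target network's value 0.4 rather than its maximum 0.9. Decoupling selection from evaluation in this way is what reduces the over-estimation of Q-values in double DQN compared with a standard DQN target.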
III-E1 Lessons Learned
DRL based on double DQN can be used to optimize the computation offloading policy without knowing the statistics of network dynamics, while also handling the issue of state space explosion. In the future, it would be interesting to study DRL based computation offloading for multiple users, whose offloading decisions can be coupled due to interference and constrained MEC resources.
|Literature||Scenario||Optimization objective||Machine learning methods||Main conclusion|
|||An MEC scenario with a representative user and multiple BSs||Optimize a long-term utility that is related to task execution delay, task queuing delay, and so on||DRL||The proposal can improve computation offloading performance significantly compared with several baseline policies|
III-F Machine Learning Based Beamforming
Considering the ever-increasing QoS requirements and the need for real-time processing in practical systems, the author in  proposes a supervised learning based resource allocation framework that quickly outputs optimal or near-optimal resource allocation solutions for the current scenario. Specifically, data related to historical scenarios is collected, and a feature vector is extracted for each scenario. Then, the optimal or near-optimal resource allocation for each historical scenario is searched off-line by taking advantage of cloud computing. After that, feature vectors with the same resource allocation solution are labeled with the same class index. The remaining task of determining the resource allocation for a new scenario is then to identify the class of its feature vector; that is, the resource allocation problem is transformed into a multi-class classification problem, which can be handled by supervised learning. To illustrate the proposal, an example of optimizing beam allocation in a single cell with multiple users is presented, and simulation results show an improvement in sum rate compared to a state-of-the-art method.
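The classification step of such a framework can be sketched with a K-nearest-neighbor classifier. This is a minimal illustration, not the cited implementation: the two-dimensional feature vectors, the class indices, and the function name `knn_predict` are assumptions made for the example, and each class index is assumed to stand for one resource (beam) allocation solution found off-line.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train: List[Tuple[Sequence[float], int]],
                query: Sequence[float],
                k: int = 3) -> int:
    """Return the majority class index among the k nearest historical scenarios."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical scenarios: (feature vector, index of the allocation
# solution that the off-line search found for that scenario).
history = [
    ([0.10, 0.20], 0), ([0.15, 0.25], 0), ([0.20, 0.10], 0),
    ([0.90, 0.80], 1), ([0.85, 0.90], 1), ([0.80, 0.95], 1),
]

allocation_a = knn_predict(history, [0.12, 0.22])
allocation_b = knn_predict(history, [0.88, 0.85])
```

At run time only the distance computation and the vote are needed, which is why the on-line decision can be fast even when the off-line search for labels is expensive.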
III-F1 Lessons Learned
The work in  sheds light on fully utilizing cloud computing to facilitate real-time and high-performance resource allocation for 5G wireless networks. The key to the success of the proposed framework lies in the feature vector construction for the communication scenario and the design of low-complexity multi-class classifiers.
|Literature||Scenario||Optimization objective||Machine learning methods||Main conclusion|
|||A scenario with a single cell and multiple users||Optimize system sum rate||KNN||The proposal can further raise system performance compared to a state-of-the-art approach|
IV Machine Learning Based Networking
With the rapid growth of data traffic and the expansion of the network, networking in future wireless communications requires more efficient solutions. In particular, the imbalance of traffic loads among heterogeneous BSs needs to be addressed, and meanwhile, wireless channel dynamics and newly emerging vehicular networks both pose a big challenge for traditional networking algorithms that are mainly designed for static networks. To overcome these issues, research on ML based user association, BS switching control, routing, and clustering has been conducted.
IV-A Machine Learning Based BS Association
IV-A1 Reinforcement Learning Based Approaches
In the vehicle network, the introduction of economical SBSs greatly reduces the network operation cost. However, proper association schemes between vehicles and BSs are needed for load balancing among SBSs and MBSs. Most previous algorithms of load balancing assume static channel quality, which is not feasible in the real world. Fortunately, the traffic flow in vehicular networks possesses spatial-temporal regularity. Based on this observation, Li et al. in  propose an online reinforcement learning approach (ORLA). The proposal is divided into two learning phases: initial reinforcement learning and history-based reinforcement learning. In the initial learning model, the vehicle-BS association problem is seen as a multi-armed bandit problem, where the action of each BS is the decision on the association with vehicles and the reward is defined to minimize the deviation of the data rates of the vehicles served from the average rate of all the vehicles. In the second learning phase, considering the spatial-temporal regularities of vehicle networks, the association patterns obtained in the initial RL stage enable the load balancing of BSs through history-based RL when the environment dynamically changes. Specifically, each BS will calculate the similarity between the current environment and each historical pattern, and the association matrix is output based on the historical association pattern. Compared with the max-SINR scheme and distributed dual decomposition optimization, the proposed ORLA reaches the minimum load variance of multiple cells.
Besides the information related to SINR, backhaul capacity constraints and diverse attributes related to the QoE of users should also be taken into account for user association. In , the authors propose a distributed, user-centric, backhaul-aware user association scheme based on fuzzy Q-learning to enable each cell to autonomously maximize its throughput under backhaul capacity constraints and user QoE constraints. More concretely, each cell broadcasts a set of bias values to guide users to associate with preferred cells, and each bias value reflects the capability to satisfy a particular performance metric such as throughput or resilience. Using fuzzy Q-learning, each cell tries to learn the optimal bias values for each of the fuzzy rules through iterative interaction with the environment. The proposal is shown to outperform traditional Q-learning solutions in both computational efficiency and system capacity. In , Kudo et al. focus on another cell selection scheme, which is optimized by using Q-learning to add bias values to small cells and which decreases the number of user communication outages and improves system throughput compared to a trial-and-error method.
In , an uplink user association problem for energy harvesting devices in an ultra-dense small cell network is studied. Faced with uncertainty in channel gains, interference, and/or user traffic, which directly affects the probability of receiving a positive reward, the association problem is formulated as an MAB problem, where each device selects an SBS for transmission in each transmission round.
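A bandit formulation of this kind can be sketched as follows. The summary above does not state which bandit algorithm is used, so the UCB1 index is swapped in here purely as a well-known example. The Bernoulli success probabilities, the function names, and the random seed are hypothetical and serve only the simulation.

```python
import math
import random
from typing import List


def ucb1_select(counts: List[int], means: List[float], t: int) -> int:
    """Pick the SBS (arm) with the largest UCB1 index at round t (1-based)."""
    for arm, n in enumerate(counts):
        if n == 0:  # try every SBS once before using the index
            return arm
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))


def run_association(success_prob: List[float], rounds: int, seed: int = 0) -> List[int]:
    """Simulate one device and return how often each SBS was selected."""
    rng = random.Random(seed)
    counts = [0] * len(success_prob)
    means = [0.0] * len(success_prob)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        # Reward 1 models a successful transmission round, 0 a failed one.
        reward = 1.0 if rng.random() < success_prob[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts


# Hypothetical per-SBS probabilities of receiving a positive reward.
selection_counts = run_association([0.2, 0.5, 0.9], rounds=2000)
```

The device never needs the channel, interference, or traffic statistics themselves. It only observes whether each round produced a positive reward, and the exploration bonus shrinks for an SBS as it is selected more often.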
Considering the trend to integrate cellular-connected UAVs in future wireless networks, the authors in  focus on a joint optimization of optimal paths, transmission power levels, and cell associations for cellular-connected UAVs to minimize the wireless latency of UAVs and their interference to the ground network. The problem is modeled as a dynamic game with UAVs as game players, and an ESN based deep reinforcement learning approach is proposed to solve the game. Specifically, the deep ESN after training enables each UAV to decide an action based on the observation of the network state. Once the approach converges, a subgame perfect Nash equilibrium is reached.
IV-A2 Collaborative Filtering Based Approaches
The historical information of the network is beneficial for learning the service capabilities of BSs, from which the similarities between the preferences of users in selecting cooperating BSs can also be obtained. Meng et al. propose an association scheme considering both the historical QoS information of BSs and user social interactions in heterogeneous networks . This solution contains a recommendation system based on collaborative filtering (CF), which is composed of the rating matrix, UEs, BSs, and operators. The rating matrix, formed by the measurement information received by UEs from their connecting BSs, is the core of the system. The voice over internet protocol service is used as an example to describe the network recommendation system in this scheme. The E-model proposed by ITU-T is used to map the SNR, delay, and packet loss rate measured by UEs in real time to an objective mean opinion score of QoS as a rating, thus generating the rating matrix. With the ratings from UEs, the recommendation system guides the user association through user-oriented neighborhood-based CF. Simulations show that a moderate expectation value needs to be set for the recommendation system to converge within a minimum number of iterations. In addition, this scheme outperforms selecting the BS with the strongest RSSI or QoS.
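The user-oriented neighborhood-based prediction step can be sketched as below. This is a minimal illustration under assumptions: the rating values, the UE and BS identifiers, and the choice of cosine similarity over commonly rated BSs are made up for the example and are not taken from the cited scheme.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # UE -> {BS -> MOS-like rating}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity computed over the BSs rated by both UEs."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[bs] * b[bs] for bs in common)
    norm_a = math.sqrt(sum(a[bs] ** 2 for bs in common))
    norm_b = math.sqrt(sum(b[bs] ** 2 for bs in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(ratings: Ratings, user: str, bs: str) -> Optional[float]:
    """Similarity-weighted average of the ratings that other UEs gave to this BS."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or bs not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[bs]
        den += abs(sim)
    return num / den if den > 0 else None  # None: no neighbor has rated this BS


# Hypothetical rating matrix with mean-opinion-score-like values.
ratings = {
    "ue1": {"bs_a": 4.2, "bs_b": 2.0},
    "ue2": {"bs_a": 4.0, "bs_b": 2.2, "bs_c": 4.5},
    "ue3": {"bs_a": 1.5, "bs_b": 4.4, "bs_c": 2.0},
}
predicted = predict_rating(ratings, "ue1", "bs_c")
```

Because `ue1` has rated the known BSs much like `ue2`, the predicted rating for `bs_c` is pulled toward the rating of `ue2` rather than sitting at the plain average of the two available ratings.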
IV-A3 Lessons Learned
The QoS received by users, backhaul throughput, and regular features of data traffic in the network can be taken as useful information for user association. Through the frequent interaction with the environment, reinforcement learning can help optimize the bias values for user association, where the tradeoff between the exploitation of learned knowledge and the exploration of unknown situations is made.
|Literature||Scenario||Optimization objective||Machine learning methods||Main conclusion|
|||Vehicular networks||Optimize the user association strategy through learning the temporal dimension regularities in the network||Reinforcement learning||The proposed scheme can well balance the traffic load|
|||Heterogeneous networks||Optimize the association strategy between UEs and BSs considering the multiple factors affecting the QoS||Collaborative filtering||The proposed scheme achieves satisfaction equilibrium between the profits and costs|
|||Heterogeneous network||Optimize the user-cell association in a user-centric and backhaul-aware manner||Fuzzy Q-learning||The proposed scheme improves the users’ performance by 12% at the cost of 33.3% additional storage memory|
|||Heterogeneous network||Optimize the user-cell association through learning the cell range expansion||Q-learning||The proposed scheme minimizes the number of UE outages and improves the system throughput|
|||Small cell networks with energy harvesting devices||Guarantee the minimum data rate of each device||MAB||The proposal is applicable in hyper-dense networks|
|||Cellular networks with UAV-UEs||Minimize the wireless latency of UAVs and their interference to the ground network||ESN based DRL||The altitude of the UAVs greatly affects the objective optimization|
IV-B Machine Learning Based BS Switching
Deploying a large number of BSs is seen as an effective way to meet the explosive growth of traffic demand. However, much energy can be consumed to maintain the operation of BSs. Nowadays, the concept of green wireless networks, which aims to reduce energy consumption and minimize the OPEX , has attracted a lot of attention. To this end, BS switching is considered a promising solution to lower energy consumption by switching off unnecessary BSs . Nevertheless, traditional switching strategies have some drawbacks. For example, some do not consider the cost caused by on-off state transitions, while others assume that traffic loads are constant, which is impractical. In addition, those methods rely heavily on precise prior knowledge about the environment, which is hard to collect. Facing these challenges, some researchers have revisited BS switching problems from the perspective of ML.
IV-B1 Reinforcement Learning Based Approaches
In , an actor-critic learning based method is proposed, which avoids heavy reliance on prior knowledge of the environment. In this method, BS on-off switching operations are defined as the actions of the controller, with the traffic load as the state, aiming at minimizing overall energy consumption. At a given traffic load state, the controller chooses a BS switching action stochastically based on policy values. After executing a switching operation, the system transitions to a new state and calculates the energy cost of the former state. When the energy cost of the executed action is smaller than those of other actions, the controller updates the policy value to make this action more likely to be selected, and vice versa. By gradually interacting with the environment, an optimal switching strategy is considered to be obtained when the policy values converge. Simulation results show that the energy consumption of the proposal is only slightly higher than that of a state-of-the-art scheme that assumes prior knowledge of the environment, which is hard to acquire in practice. Similarly, the authors in  propose a Q-learning method to reduce the overall energy consumption, which also defines BS switching operations as actions and defines the state as the number of users and the number of active SBSs. After choosing a switching action at a given state, a reward that takes into account both energy consumption and transmission gains is obtained. The system then updates the corresponding Q value with the calculated reward. This iterative process continues until all the Q values converge. Results show that the proposed method is able to save energy.
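The Q value update mentioned above follows the standard tabular Q-learning rule, which can be sketched as follows. The state encoding (number of users, number of active SBSs) mirrors the description above, but the action names, the reward value, the learning rate, and the discount factor are hypothetical choices for illustration only.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one tabular Q-learning update and return the new Q value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical action set and transition; state = (users, active SBSs).
actions = ["switch_off_sbs", "keep", "switch_on_sbs"]
q_table: QTable = defaultdict(float)  # unseen entries start at 0
first = q_update(q_table, state=(12, 3), action="switch_off_sbs", reward=1.5,
                 next_state=(12, 2), actions=actions, alpha=0.5)
second = q_update(q_table, state=(12, 3), action="switch_off_sbs", reward=1.5,
                  next_state=(12, 2), actions=actions, alpha=0.5)
```

Repeating the update with the same reward moves the Q value step by step toward the temporal-difference target, which is the iterative process that continues until the Q values converge.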
Although the above works have achieved good performance, the power cost incurred by the on-off state transitions of BSs is not taken into account. To make the results more rigorous, the authors in  include the transition power in the cost function and propose a Q-learning method with the action defined as a pair of thresholds, namely the upper user threshold and the lower user threshold. When the number of users in a small cell is higher than the upper threshold, the small cell is switched on, while the cell is switched off once the number of users falls below the lower threshold. Simulation results show that this method can avoid frequent BS on-off state transitions, thus reducing energy consumption. In , a more advanced power consumption optimization framework based on DRL is proposed for a downlink cloud radio access network, where the power consumption caused by the on-off state transitions of RRHs is also considered. Simulation results reveal that the proposed framework can achieve significant power savings while satisfying user demands and adapting to a dynamic environment.
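The effect of a threshold pair can be illustrated with a small sketch of the switching rule alone; the Q-learning part that selects the thresholds is omitted. The threshold values and the user-count trace are hypothetical, and keeping the current state whenever the user count lies between the two thresholds is an assumption that the description above implies but does not state explicitly.

```python
from typing import List


def next_cell_state(is_on: bool, num_users: int, lower: int, upper: int) -> bool:
    """Return the new on/off state of a small cell under a two-threshold rule."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if num_users > upper:
        return True   # enough users: switch the cell on
    if num_users < lower:
        return False  # too few users: switch the cell off
    return is_on      # assumed: between the thresholds the state is kept


def simulate(user_counts: List[int], lower: int, upper: int) -> List[bool]:
    """Apply the rule along a user-count trace, starting from the off state."""
    states, is_on = [], False
    for n in user_counts:
        is_on = next_cell_state(is_on, n, lower, upper)
        states.append(is_on)
    return states


# Hypothetical user-count trace and thresholds.
trace = [3, 6, 9, 7, 5, 4, 2, 5, 9]
states = simulate(trace, lower=3, upper=8)
transitions = sum(a != b for a, b in zip([False] + states[:-1], states))
```

The gap between the two thresholds acts as hysteresis: fluctuations of the user count inside the gap do not toggle the cell, which is how frequent on-off transitions and their power cost are avoided.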
Moreover, fuzzy Q-learning is utilized in  to find the optimal sensing probability of the SBS, which directly impacts its on-off operations. Specifically, an SBS operates in sleep mode when there are no active users to serve, and it wakes up randomly to sense MUE activity. Once the activity of an MUE is detected, the SBS goes back to active mode. Simulations demonstrate that the proposal can handle user density fluctuations well and can improve energy efficiency while guaranteeing network capacity as well as coverage probability. In addition, the authors in  investigate the influence of different learning rates and discount factors in Q-learning on the energy consumption of BS switching.
IV-B2 Transfer Learning Based Approaches
Although the discussed studies based on RL show favorable results, the performance and convergence time can be further improved by fully utilizing prior knowledge. In , which is an enhanced study of , the RL agent not only considers the experience in the current environment to make on-off switching decisions, but also refers to similar knowledge from a past period or a neighboring region; that is, the traffic load knowledge of a similar environment is transferred to the current environment for action selection. Simulation results show that combining RL with transfer learning outperforms the method using RL alone in terms of both energy saving and convergence speed. Similar to , the authors in  adopt TL to transfer previous HetNet traffic load knowledge to the current environment. Although some TL based methods have been employed to develop BS sleeping strategies, the WiFi network scenario is not covered by them. In the context of WiFi networks, the knowledge of real-time data gathered from the APs in the present environment is utilized to develop an on-off switching policy in , where the actor-critic algorithm is used. These works indicate that TL can offer much help in finding optimal BS switching strategies, but it should be noted that TL may have a negative influence on the network, since there are still differences between the source task and the target task. To resolve this problem, the authors in  propose to diminish the impact of the prior knowledge on decision making over time.
IV-B3 Unsupervised Learning Based Approaches
In addition to RL algorithms, unsupervised learning techniques such as K-means can improve BS on-off switching strategies as well. In , based on the similarity of the locations and traffic loads of BSs, K-means clustering is used to group BSs into different clusters. Within each cluster, interference is mitigated by allocating orthogonal resources among communication links, and the traffic of off-BSs can be offloaded to on-BSs. In , K-means is applied to group different values of RSRQ into different clusters. Consequently, the users are grouped into clusters depending on their corresponding RSRQ values. After that, the cluster information is considered as a part of the system state in Q-learning to find the optimal BS switching strategy. With the help of K-means, the proposed method achieves lower average energy consumption than the method without K-means.
IV-B4 Lessons Learned
Network energy saving has been listed as one of the visions of 5G, and BS on-off switching is regarded as an effective approach to achieve this goal. Machine learning can help develop effective switching strategies in the following ways. First, by using reinforcement learning, including Q-learning and deep reinforcement learning, the network controller or BSs can intelligently learn the environment and make switching decisions with less information. Meanwhile, by integrating transfer learning, RL based approaches can achieve better initial performance as well as faster convergence. Second, by properly clustering BSs and users with K-means before optimizing BS on-off states, better performance can be gained.
|Literature||Scenario||Optimization objective||Machine learning methods||Main conclusion|
|||HetNets||Minimize the network energy consumption||Q-learning||The proposal can avoid the frequent on-off state transitions of SBSs|
|||Cellular networks||Improve energy efficiency||Actor-critic||The proposed method achieves similar energy consumption under dynamic traffic loads compared with the method that has full traffic load knowledge|
|||Cellular networks||Improve energy efficiency||Transfer learning based actor-critic||Transfer learning based RL outperforms classic RL method in energy saving|
|||HetNets||Optimize the trade-off between total delay experienced by users and energy savings||Transfer learning based actor-critic||Transfer learning based RL outperforms classic RL method in energy saving|
|||WiFi networks||Improve energy efficiency||Transfer learning based actor-critic||Transfer learning based RL can achieve higher energy efficiency|
|||HetNets||Improve energy efficiency||Q-learning||Q-learning based approach outperforms the traditional static policy|
|||HetNets||Improve the system performance in terms of drop rate, throughput, and energy efficiency||Q-learning||The proposed multi-agent Q-learning method outperforms the traditional greedy method|
|||Green wireless networks||Maximize energy saving||Q-learning||Different learning rates and discount factors of Q-learning will significantly influence the energy consumption|
|||Cloud radio access networks||Minimize system power consumption||Deep reinforcement learning||The proposed framework can achieve significant power savings while satisfying user demands and can adapt to a dynamic environment|
|||HetNets||Improve energy efficiency||Q-learning||The proposed method improves energy efficiency while maintaining network capacity and coverage probability|
|||Small cell networks||Reduce energy consumption||K-means||The proposed clustering method reduces the overall energy consumption|
|||Opportunistic mobile broadband networks||Optimize the spectrum allocation, load balancing, and energy saving||Q-learning, K-means, and transfer learning||The proposed method achieves lower average energy consumption than the method without K-means|
IV-C Machine Learning Based Routing
To fulfill stringent traffic demands in the future, many new RAN technologies are continuously emerging, including C-RANs, CR networks (CRNs), and ultra-dense networks (UDNs). To achieve effective networking in these scenarios, routing strategies play a key role. Specifically, by deriving proper paths for data transmission, transmission delay and other performance metrics can be optimized. Recently, machine learning has emerged as a breakthrough for providing efficient routing protocols to enhance the overall network performance . In this vein, we provide a summary of novel machine learning based routing schemes.
IV-C1 Reinforcement Learning Based Approaches
To overcome several challenges for multi-hop routing in cognitive radio networks, such as dynamic channel availability, the author in  proposes a clustering and reinforcement learning based routing scheme, which provides high stability and scalability. Using Q-learning, the availability of the bottleneck channel along the route can be well estimated, which guides the routing node selection. Also focusing on multi-hop routing in CRNs, two different routing schemes based on reinforcement learning are investigated in , whose superior performance is verified on a test bed. In , the authors study the influence of several network characteristics, such as network size, on the performance achieved by Q-learning based routing for a cognitive radio ad hoc network, and it is found that network characteristics have only a slight impact on the end-to-end delay and packet loss rate of secondary users. In addition, reinforcement learning is also a promising paradigm for developing routing protocols for unmanned robotic networks. Specifically, to reduce network overhead in high-mobility scenarios, a Q-learning based geographic routing strategy is introduced in . Simulations using NS-3 confirm a better packet delivery ratio with a lower network overhead compared to existing methods.
IV-C2 Supervised Learning Based Approaches
In , a routing scheme based on DNNs is developed, which enables each router in heterogeneous networks to predict the whole path to the destination. More concretely, each router trains a DNN to predict the next proper router for each potential destination using training data generated by following the Open Shortest Path First (OSPF) protocol, and the input and output of the DNN are the traffic patterns of all the routers and the index of the next router, respectively. Moreover, instead of training all the weights of the DNN at the same time, a greedy layer-wise training method is adopted. Simulations show lower signaling overhead and higher throughput compared with the OSPF routing strategy.
To improve the routing performance for the wireless backbone by learning from the experienced congestion, deep CNNs are exploited in . Specifically, a CNN is constructed for each routing strategy, and the CNN takes traffic pattern information collected from routers, such as traffic generation rate, as input to predict whether the corresponding routing strategy can cause congestion. If yes, the next routing strategy will be evaluated until it is predicted that there will be no congestion. Meanwhile, it should be noted that these constructed CNNs are trained in an on-line manner with the training data set continuously updated, and hence the routing decision becomes more accurate.
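The sequential evaluation of routing strategies described above can be sketched as a simple loop. The trained CNNs are replaced here by hypothetical stand-in predictors (plain threshold functions), since the network architecture is not detailed in this summary. Returning `None` when every strategy is predicted to cause congestion is also an assumption.

```python
from typing import Callable, Optional, Sequence

# A predictor maps a traffic pattern to True if congestion is predicted.
CongestionPredictor = Callable[[Sequence[float]], bool]


def select_routing_strategy(traffic_pattern: Sequence[float],
                            predictors: Sequence[CongestionPredictor]) -> Optional[int]:
    """Return the index of the first strategy predicted to be congestion-free."""
    for index, predicts_congestion in enumerate(predictors):
        if not predicts_congestion(traffic_pattern):
            return index
    return None  # assumed fallback: no strategy is predicted to be safe


def threshold_predictor(limit: float) -> CongestionPredictor:
    """Hypothetical stand-in for a trained CNN: flags congestion above a rate limit."""
    return lambda pattern: max(pattern) > limit


# One stand-in predictor per routing strategy (illustrative limits only).
predictors = [threshold_predictor(0.5), threshold_predictor(0.7), threshold_predictor(0.9)]
chosen = select_routing_strategy([0.6, 0.3, 0.8], predictors)
```

In an on-line setting, each observed outcome (congestion or not) would be appended to the training set of the corresponding predictor, which is how the decisions become more accurate over time.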
IV-C3 Lessons Learned
To develop intelligent routing schemes for future wireless networks with massive traffic demand, Q-learning and deep NNs can be utilized. Specifically, wireless systems can use Q-learning to learn routing strategies only based on the interaction with the environment and simple update rules, while the NN based approaches can lead to effective routing by extracting useful information from raw network data directly.
|Literature||Scenario||Objective||Machine learning methods||Main conclusion|
|||HetNets||Improve network performance by designing a novel routing strategy||Dense Neural Networks||Lower signaling overhead and higher throughput are observed compared with the OSPF routing strategy|
|||Cognitive radio networks||Overcome the challenges faced by multi-hop routing in CRNs||Q-learning||The SUs’ interference to PUs is minimized, and more stable routes are selected|
|||CRNs||Choose routes with high QoS||Q-learning||RL based approaches can achieve a higher throughput and packet delivery ratio compared to the highest-channel route selection approach|
|||Unmanned robotic networks||Mitigate the network overhead requirement for route selection||Q-learning||The proposal achieves a better packet delivery ratio with a lower network overhead|
|||Wireless backbone||Realize real-time intelligent traffic control||Deep convolution neural networks||The proposal can significantly improve the average delay and packet loss rate compared to existing approaches|
|||Cognitive radio ad hoc networks||To minimize interference and operating cost||Q-learning||The proposed method can improve the routing efficiency and help minimize interference as well as operating cost|
IV-D Machine Learning Based Clustering
In wireless networking, it is common to divide nodes or users into different clusters to enable cooperation or coordination within each cluster, which can further improve network performance. Based on the introduction to ML, it can be seen that the clustering problem can be naturally dealt with by the K-means algorithm, as some papers do. Moreover, supervised learning and reinforcement learning can be utilized to derive the cluster formation as well.
IV-D1 Supervised Learning Based Approaches
To reduce content delivery latency in a cache enabled small cell network, a user clustering based TDMA transmission scheme is proposed in  under pre-determined user association and content placement, where the user cluster formation and the time duration to serve each cluster need to be optimized. Since the number of potential clusters grows exponentially with the number of users served by an SBS, a DNN is constructed to predict for each user whether it is in a cluster, taking the user channel gains and user demands as input. In this manner, users joining clusters can be quickly identified, reducing the search space for the optimal user cluster formation.
IV-D2 Unsupervised Learning Based Approaches
In , a modified version of the constrained K-means clustering algorithm is considered for clustering hotspots in densely populated areas with the goal of maximizing spectrum utilization. In this scheme, a mobile device accessing the cellular network can act as a hotspot to provide broadband access to nearby users called slaves. The fundamental problem to be solved is to identify which devices play the role of hotspots and the set of users associated with each hotspot. To this end, the author first adopts a modified version of the constrained K-means clustering algorithm to group the set of users into different clusters based on their locations, with both the maximum and the minimum number of users in a cluster being set. Then, the user with the minimum average distance to both the center of the cluster and the BS is selected as the hotspot in each cluster. After that, a graph-coloring approach is utilized to assign spectrum resources to each cluster, and power and spectrum resource allocation for all slaves and hotspots is performed later. Simulations show that the proposal can significantly increase the total number of users that can be served in the system with lower cost and complexity.
In , the author examines the clustering of SBSs to realize coordination among them. The similarity between two BSs considers both their distance and the heterogeneity between their traffic loads, meaning that two BSs with a shorter distance and a higher load difference have a greater chance to cooperate. Since the similarity matrix possesses the properties of a Gaussian similarity matrix, the SBS clustering problem can be handled by the K-means algorithm, with each SBS corresponding to an attribute vector composed of its coordinates and traffic load. Through intra-cluster coordination, the number of switched-off BSs can be increased compared to the case without clustering and coordination by efficiently offloading UEs from switched-off SBSs to active SBSs.
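Clustering SBSs by attribute vectors can be sketched with a plain K-means (Lloyd) iteration. The coordinates, load values, and initial centroids below are hypothetical. The attribute vectors are assumed to be already scaled and encoded as required, since the exact construction used in the cited work is not given in this summary.

```python
import math
from typing import List, Optional, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]],
           init_centroids: Sequence[Sequence[float]],
           iterations: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Run Lloyd's algorithm; return (cluster label per point, centroids)."""
    centroids = [list(c) for c in init_centroids]
    labels: Optional[List[int]] = None
    for _ in range(iterations):
        # Assignment step: attach each SBS to its nearest centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels or [], centroids


# Hypothetical SBS attribute vectors: (x, y, normalized traffic load).
sbs = [(0.0, 0.0, 0.20), (1.0, 0.0, 0.30), (0.0, 1.0, 0.25),
       (10.0, 10.0, 0.80), (11.0, 10.0, 0.70), (10.0, 11.0, 0.90)]
labels, centroids = kmeans(sbs, init_centroids=[sbs[0], sbs[3]])
```

Once the clusters are formed, coordination decisions such as which SBSs to switch off and where to offload their UEs can be taken within each cluster.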
IV-D3 Reinforcement Learning Based Approaches
In , to mitigate the interference in a downlink wireless network containing multiple transceiver pairs operating in the same frequency band, a cache-enabled opportunistic interference alignment (IA) scheme is adopted. Faced with dynamic channel state information and content availability at each transmitter, a deep reinforcement learning based approach is developed to determine communication link scheduling at each time slot, and the scheduled transceiver pairs then perform IA. To efficiently handle the raw collected data, such as channel state information, the deep Q-network is built using a convolutional neural network. Simulation results verify the improved performance in terms of system sum rate and energy efficiency compared to an existing scheme.
IV-D4 Lessons Learned
Machine learning can help with user clustering or BS clustering in the following ways. First, supervised learning can help identify those users or BSs that do not need to join clusters, which facilitates the search for the optimal cluster formation by reducing the search space. Second, the clustering problem can be naturally solved using K-means clustering. Finally, reinforcement learning can be used to directly control the members of a cluster in a dynamic network environment.
|Literature||Scenario||Objective||Machine learning technique||Main conclusion|
|||Cache-enabled small cell networks||Optimize energy consumption for content delivery||Deep neural networks||The solution quality led by the designed DNN can be approximated to around 90% of the optimum|
|||Cellular network with devices serving as access points||Study the performance gain obtained by making some devices provide broadband access to other users in a densely populated area||Constrained K-means clustering||Given a fixed amount of network resources, the proposed algorithm can significantly improve the overall performance of network users|
|||Small cell network||Minimize the network cost||K-means clustering||Significant gains can be achieved in energy expenditure and load reduction compared to the conventional transmission techniques|
|||Cache-enabled opportunistic IA networks||Maximize system sum rate||Deep reinforcement learning||The proposal can achieve improved performance in terms of system sum rate and energy efficiency compared to an existing scheme|
V Machine Learning Based Mobility Management
In wireless networks, mobility management is a key component in guaranteeing successful service delivery. Recently, machine learning has shown significant advantages in user mobility prediction, handover process optimization, and so on. In this section, machine learning based mobility management schemes are comprehensively surveyed.
V-A Reinforcement Learning Based Approaches
The fuzzy logic controller (FLC) is an effective mobility robustness optimization technique, which has been utilized to adjust handover parameters such as hysteresis margin (Hys) and time-to-trigger (TTT). In , the authors focus on a two-tier network composed of macro cells and small cells, and propose a dynamic fuzzy Q-learning algorithm for mobility management. To apply Q-learning, the call drop rate together with the signaling load caused by handover constitutes the system state, while the action space is defined as the set of possible values for the adjustment of handover margin. The aim is to achieve a tradeoff between the signaling cost incurred by handover and the user experience affected by call dropping ratio (CDR). Simulation results show that the proposed scheme is effective in minimizing the number of handovers while keeping the CDR at a desired level. In addition, Klein et al. in  also apply the framework based on fuzzy Q-learning to jointly optimize TTT and Hys, and the superiority of the scheme compared with Q-learning strategy is shown.
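To make the state, action, and reward structure concrete, the sketch below shows a single tabular Q-learning update for handover margin adjustment. It is plain Q-learning rather than the fuzzy variants discussed above, and the discretized state labels, the action set, the reward value, and the learning parameters are all assumptions.

```python
import random

ACTIONS = [-1.0, 0.0, 1.0]  # hypothetical adjustments of the handover margin in dB

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

def choose_action(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy selection over the margin adjustments."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

q = {}
# State: (call drop level, signaling load level); labels are assumptions.
state = ("low_cdr", "high_load")
next_state = ("low_cdr", "low_load")
value = q_update(q, state, 1.0, reward=1.0, next_state=next_state)
greedy = choose_action(q, state, epsilon=0.0)
```

In a fuzzy Q-learning controller, the crisp state lookup would be replaced by fuzzy rules whose activation degrees weight the Q-values, but the temporal-difference update keeps the same form.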
Moreover, achieving load balancing during the handover process is essential to ensuring the stability of the communication process. In , a fuzzy-rule based RL system is proposed for small cell networks, which aims to balance traffic load by selecting transmit power (TXP) and Hys. Considering that the call blocking ratio (CBR) and the outage ratio (OR) change significantly when the load in a cell is heavy, these two parameters jointly comprise the observation state. The adjustments of Hys and TXP are the system actions, and the reward is defined such that user satisfaction is optimized. As a result, the optimal adjustment strategy for Hys and TXP is generated by the Q-learning system based on fuzzy rules, which can minimize the localized congestion of small cell networks.
For an LTE network with multiple SON functions, optimization conflicts are inevitable. In , the authors propose a comprehensive solution for SON functions including handover optimization and load balancing. In this scheme, a fuzzy Q-learning controller is utilized to adjust the Hys and TTT parameters simultaneously, while the heuristic Diff-Load algorithm optimizes the handover offset according to different load measurements in the cell. To apply fuzzy Q-learning, radio link failure, handover failure, and handover ping-pong, which are key performance indicators (KPIs) of the handover process, are defined as the inputs to the fuzzy system, with three fuzzy labels assigned to each of them by membership functions. After that, the truth value and initial state value are calculated for each input variable, and the system rewards corresponding to each state-action pair are recorded through an iterative process. Finally, the optimization strategy for Hys and TTT is obtained. LTE-Sim simulation results show that the proposal enables the joint optimization of the KPIs above.
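The fuzzification of a KPI into three fuzzy labels can be sketched as follows. The label names, the normalization of the KPI to [0, 1], and the membership breakpoints are assumptions chosen for illustration; the surveyed work may use different membership functions.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a and c, 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(kpi):
    """Map a KPI normalized to [0, 1] to membership degrees of three labels."""
    return {
        "low": max(0.0, 1.0 - kpi / 0.5),
        "medium": triangular(kpi, 0.0, 0.5, 1.0),
        "high": max(0.0, (kpi - 0.5) / 0.5),
    }

# Example: a normalized handover failure ratio of 0.25 (hypothetical value).
degrees = fuzzify(0.25)
```

With these breakpoints the degrees of the three labels sum to one for any input in [0, 1], so a KPI value of 0.25 is regarded as half "low" and half "medium".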
In addition to Q-learning, the authors in  and  utilize deep reinforcement learning for mobility management. In , to overcome the challenge of intelligent wireless network management when a large number of RANs and devices are deployed, Cao et al. formulate an artificial intelligence framework based on DRL. The framework is divided into four parts: the real environment, the environment capsule, the feature extractor, and the policy network. Wireless facilities in the real environment upload information such as the RSSI to the environment capsule. Then, the capsule transmits the stacked data to the wireless signal feature extractor, which consists of a CNN and an RNN. After that, the extracted feature vectors are input to the policy network, which is based on a deep Q-network, to select the best action for real network management. Finally, this framework is applied to a seamless handover scenario with one user and multiple APs. Using RSSI measurements as input, the user is guided to select the best AP, which maximizes network throughput.
In contrast, in , the authors propose a two-layer framework to optimize the handover process and strike a balance between the handover rate and system throughput. The first step is to apply a centralized control method that classifies the UEs according to their mobility patterns with unsupervised learning. Then, the multi-user handover process in each cluster is optimized in a distributed manner using DRL. Specifically, the RSRQ received by the user from the candidate BS and the index of the current serving BS make up the state vector, while a weighted sum of the average handover rate and throughput is defined as the system reward. In addition, since exploration in DRL may start from unexpected initial points, the performance of UEs can fluctuate greatly at the early stage of learning. To compensate for this, Wang et al. use the output of the traditional 3GPP handover scheme as training data to initialize the deep Q-networks through supervised learning.
V-B Supervised Learning Based Approaches
Beyond knowing the current location of the mobile equipment, learning an individual’s next location enables novel mobile applications and a seamless handover process. In general, location prediction utilizes the user’s historical trajectory information to infer the user’s next position. To cope with a lack of historical information, Yu et al. propose a supervised learning based prediction method that relies on user activity patterns . The core idea is to first predict the user’s next activity and then predict the next location. Simulation results demonstrate the robust performance of the proposal.
V-C Unsupervised Learning Based Approaches
Considering the impact of RF conditions at the cell edge on the setting of handover parameters, the author in  proposes an unsupervised-shapelets based method that helps BSs become automatically aware of the RF conditions at their cell edge by finding useful patterns in the RSRP information reported by users. In addition, RSRP information can be employed to derive the position at the point of a handover trigger. In , the authors propose a modified self-organizing map (SOM) based method to determine whether indoor users should be switched to another external BS based on their location information. A SOM is a type of unsupervised NN that generates a low dimensional output space from high dimensional discrete input. The input data in this scheme are the RSRP together with the angle of arrival of the mobile terminal, based on which the real physical location of a user can be determined by the SOM algorithm. After that, the handover decision can be made for the user according to pre-defined prohibited and permitted areas. Evaluation using Network Simulator 3 shows that the proposal decreases the number of unnecessary handovers by 70%.
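For readers unfamiliar with SOMs, a single training step of a standard SOM on a one-dimensional chain of nodes is sketched below. This is not the modified SOM of the surveyed work; the normalization of the (RSRP, angle of arrival) input to [0, 1], the learning rate, and the neighbourhood radius are assumptions.

```python
import math

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to the input x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def som_step(weights, x, lr=0.5, radius=1):
    """One SOM update: the best matching unit (BMU) and its neighbours within
    `radius` on the chain move toward x, scaled by a Gaussian neighbourhood."""
    bmu = best_matching_unit(weights, x)
    for i, w in enumerate(weights):
        d = abs(i - bmu)
        if d <= radius:
            h = 1.0 if d == 0 else math.exp(-(d * d) / (2.0 * radius * radius))
            weights[i] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu

# Three nodes; each input is (normalized RSRP, normalized angle of arrival).
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
bmu = som_step(weights, [1.0, 0.8])
```

After training on many measurements, each node represents a region of the input space, and a new measurement can be mapped to a region by looking up its BMU; the permitted or prohibited status of that region would then drive the handover decision.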
V-D Lessons Learned
Q-learning based on the FLC and the unsupervised self-organizing map can both decrease unnecessary handovers, the former by optimizing mobility parameters such as Hys and TTT and the latter by identifying user location. Furthermore, DNNs, CNNs, and RNNs can help extract robust features from raw data collected by the network, such as RSRP, which helps the deep reinforcement learning agent make intelligent handover decisions.
|Literature||Scenario||Objective||Machine learning methods||Main conclusion|
|||Small cell networks||Achieve a tradeoff between the signaling cost led by handover and the user experience influenced by CDR||Fuzzy Q-learning||The proposal can minimize the number of handovers while keeping the CDR at a desired level|
|||Cellular networks||Minimize the weighted difference between the target KPIs and achieved KPIs||Fuzzy Q-learning||The proposal outperforms existing methods in terms of HO failures and ping-pong HOs|
|||Small cell networks||Achieve load balancing||Fuzzy Q-learning||The proposal can minimize the localized congestion of small-cell networks|
|||LTE networks||Realize the coordination of multi-SON functions||Fuzzy Q-learning||The proposal can enable the joint optimization of different KPIs|
|||WLAN||Optimize system throughput||Deep reinforcement learning||The proposal can improve system throughput and reduce the number of handovers|
|||Cellular networks||Minimize the handover rates and ensure the system throughput||Deep reinforcement learning||The proposal outperforms the state-of-the-art online schemes in terms of HO rates|
|||Wireless networks||Predict users’ next location||Supervised learning||The proposed approach can predict more accurately and perform robustly|
|||Heterogeneous networks||Classify the user trajectories||Unsupervised shapelets||The proposed approach provides clustering results with an average accuracy of 95%|
|||LTE networks||Identify whether an indoor user should be switched to an external BS||Self-organizing map||The proposed approach can reduce unnecessary handovers by up to 70%|
VI Machine Learning Based Localization
In recent years, we have witnessed an explosive proliferation of location based services, whose service quality is highly dependent on the accuracy of localization. The mature Global Positioning System (GPS) technique has been widely used for outdoor localization. However, when it comes to indoor localization, GPS signals from satellites are heavily attenuated, which makes GPS unsuitable for indoor use. Furthermore, indoor environments are more complex because of the many obstacles such as tables and wardrobes, which makes localization more difficult. In this situation, to locate indoor mobile users precisely, many wireless technologies can be utilized, such as WLAN, ultra-wide bandwidth (UWB), and Bluetooth. Moreover, the common measurements used in indoor localization include time of arrival (TOA), time difference of arrival (TDOA), channel state information (CSI), and received signal strength (RSS). To solve various problems associated with indoor localization, research has been conducted by applying machine learning to these measurements in scenarios with different wireless technologies.
VI-1 Supervised Learning Based Approaches
Instead of assuming that equal signal differences account for equal geometrical distances, as traditional KNN based localization approaches do, the author in  proposes a feature scaling based KNN (FS-KNN) localization algorithm, considering that the relation between signal differences and geometrical distances actually depends on the measured RSS. Specifically, when calculating the signal distance between the fingerprint of each RP and the RSS vector reported by the user, the square of each element-wise difference is multiplied by a weight that is a function of the corresponding RSS value measured by the user. To identify the parameters of the weight function, an iterative training procedure is used, in which each iteration evaluates the performance on a test set, and the resulting metric serves as the objective function of a simulated annealing algorithm that tunes those parameters. After the model is trained, the distance between a newly received RSS vector and each fingerprint is calculated, and the location of the user is then determined as a weighted mean of the locations of the k nearest RPs.
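The online stage of such a feature scaling based KNN scheme can be sketched as follows. The form of the weight function, the inverse-distance weighting used for the final mean, and the radio map values are assumptions made for illustration; the trained weight function of the surveyed work is not reproduced.

```python
import math

def example_weight(rss_dbm):
    """Hypothetical weight function: stronger RSS values get a larger weight."""
    return 1.0 + 0.01 * (rss_dbm + 100.0)

def fs_distance(fingerprint, rss, weight_fn):
    """Signal distance in which each squared difference is scaled by a weight
    that depends on the RSS value measured by the user."""
    return math.sqrt(sum(weight_fn(r) * (f - r) ** 2
                         for f, r in zip(fingerprint, rss)))

def fs_knn_locate(radio_map, rss, k, weight_fn, eps=1e-9):
    """radio_map: list of (fingerprint, (x, y)) pairs. Returns the
    inverse-distance weighted mean of the k nearest RP locations."""
    scored = sorted((fs_distance(fp, rss, weight_fn), loc)
                    for fp, loc in radio_map)[:k]
    inv = [1.0 / (d + eps) for d, _ in scored]
    total = sum(inv)
    x = sum(w * loc[0] for w, (_, loc) in zip(inv, scored)) / total
    y = sum(w * loc[1] for w, (_, loc) in zip(inv, scored)) / total
    return (x, y)

# Hypothetical radio map with two APs and three RPs (RSS in dBm).
radio_map = [([-50.0, -60.0], (0.0, 0.0)),
             ([-60.0, -50.0], (10.0, 0.0)),
             ([-80.0, -80.0], (5.0, 10.0))]
position = fs_knn_locate(radio_map, [-55.0, -55.0], k=2, weight_fn=example_weight)
```

In this symmetric example the two nearest RPs are equally distant in signal space, so the estimate falls midway between them.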
To deal with the high energy consumption incurred by frequent AP scanning via WiFi interfaces, an energy-efficient indoor localization system is proposed in , where ZigBee interfaces are used to collect WiFi signals. To improve localization accuracy, three KNN based localization approaches adopting different distance metrics are evaluated, including weighted Euclidean distance, weighted Manhattan distance, and relative entropy. The principle for weight setting is that an AP with more redundant information is assigned a lower weight. In , the author theoretically analyzes the optimal number of nearest RPs used to identify the user location in a KNN based localization algorithm, and it is shown that and outperform the other KNN algorithms for the application of static localization.
To avoid regularly training localization models from scratch, the author in  proposes an online independent support vector machine (OISVM) based localization system that employs the RSS of Wi-Fi signals. Compared to the traditional SVM, OISVM is capable of learning in an online fashion and allows a trade-off between accuracy and model size, facilitating its adoption on mobile devices. The constructed system includes two phases, i.e., the offline phase and the online phase. The offline phase further includes kernel parameter selection, data undersampling to deal with the imbalanced data problem, and offline training using a pre-collected RSS data set in which RSS samples are labeled with the corresponding RPs. In the online phase, location estimation is conducted for new RSS samples, while online learning is performed as new training data arrive, which can be collected via crowdsourcing. The simulation results show that the proposal can reduce the location estimation error by 0.8m, while the prediction time and training time are decreased significantly compared to traditional methods.
Given that non-line-of-sight (NLOS) radio blockage can lower the localization accuracy, it is beneficial to identify NLOS signals. To this end, the author in  develops a relevance vector machine (RVM) based method for UWB TOA localization. Specifically, an RVM based classifier is used to distinguish the LOS and NLOS signals received by an agent with unknown position from an anchor whose position is known, while an RVM regressor is adopted for ranging error prediction. Both models take as input a feature vector consisting of received energy, maximum amplitude, and so on. The advantage of RVM over SVM is that RVM uses fewer relevance vectors than the number of support vectors in the SVM case, hence reducing the computational complexity. In contrast, the author in  proposes an SVM based method, where a mapping between features extracted from the received waveform and the ranging error is directly learned. Hence, explicit LOS or NLOS signal identification is no longer needed.
In addition to KNN and SVM, some researchers have applied NNs to localization. In order to reduce the time cost of the training procedure, the authors in  utilize ELM. The RSS fingerprints and their corresponding physical coordinates are used to train the output weights. After the model is trained, it can predict the physical coordinates for a new RSS vector. Also adopting an NN with a single hidden layer, the method proposed in  targets an LTE downlink system and aims at estimating the UE position. The employed NN contains three layers, namely an input layer, a hidden layer, and an output layer, with the input and the output being channel parameters and the corresponding coordinates, respectively. The Levenberg-Marquardt algorithm is applied to iteratively adjust the weights of the neural network based on the mean squared error between the output and the corresponding known label. Once the NN is trained, it can predict the location given new data. Preliminary experimental results show that the proposed method yields a median positioning error distance of 6 meters for the indoor scenario.
In a WiFi network with RSS-based measurement, a deep NN is utilized in  for indoor localization without using the pathloss model or comparing with the fingerprint database. The training set contains the RSS vectors labeled with the central coordinates and indexes of the corresponding grid areas. The NN can be divided into three parts, namely transforming, denoising, and localization. In particular, the transforming and denoising sections are pre-trained using autoencoder blocks. Experiments show that the proposed method achieves higher localization accuracy than maximum likelihood estimation, the generalised regression neural network, and fingerprinting methods.
VI-2 Unsupervised Learning Based Approaches
To reduce computation complexity and save storage space for fingerprinting based localization systems, the author in  first divides the radio map into multiple sub radio maps, and then uses kernel principal component analysis (KPCA) to extract features of each sub radio map to get a low dimensional version. Results show that the size of the radio map can be reduced by 72% with 2m localization error. In , PCA is also employed together with linear discriminant analysis to extract lower dimensional features from raw RSS measurements.
Considering the drawbacks of adopting RSS measurements as fingerprints, such as their high randomness and loose correlation with propagation distance, the author in  proposes to utilize calibrated CSI phase information for indoor fingerprinting. Specifically, in the offline phase, a deep autoencoder network is constructed for each position to reconstruct the collected calibrated CSI phase information, and the weights are recorded as the fingerprints. In the online phase, a new location is obtained by a probabilistic method that performs a weighted average of all the reference locations. Simulation results show that the proposal outperforms traditional CSI or RSS based methods in two representative scenarios. Moreover, similar ideas are adopted in  and . In , the author uses a denoising autoencoder for Bluetooth low energy (BLE) indoor localization to provide high performance 3-D localization in large indoor places.
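The weighted-average step of the online phase can be sketched as follows, assuming that a likelihood has already been computed for each reference location (for example, from the reconstruction error of the corresponding autoencoder). How the likelihoods are obtained is not reproduced here, and the function name and values are hypothetical.

```python
def estimate_location(ref_locations, likelihoods):
    """Weighted average of reference locations, with the likelihoods
    normalized so that the weights sum to one."""
    total = sum(likelihoods)
    if total <= 0:
        raise ValueError("at least one likelihood must be positive")
    dims = len(ref_locations[0])
    return tuple(
        sum(l * loc[d] for l, loc in zip(likelihoods, ref_locations)) / total
        for d in range(dims)
    )

# Three hypothetical reference locations and their likelihoods.
refs = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
estimate = estimate_location(refs, [0.5, 0.25, 0.25])
```

The same function applies unchanged to 3-D reference locations, since the averaging is done per coordinate.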
VI-3 Lessons Learned
In order to improve localization accuracy, ML based methods can be applied in the following ways. First, the most intuitive approach is to compare the RSS vector measured by the user with pre-collected fingerprints at different RPs in terms of similarity, which is the basic principle of KNN based localization methods. Second, an SVM classifier can be used to map the input RSS vector to the RP index, and an RVM classifier can be adopted to identify LOS and NLOS signals with low computation complexity. Third, KPCA is an effective approach to reducing the size of the radio map, which saves storage capacity at the terminal. Finally, the NN is a powerful tool for extracting useful information from RSS or CSI data to help infer user locations.
|Literature||Scenario||Objective||Machine learning methods||Main conclusion|
|||WLAN||Improve localization accuracy by considering a more practical similarity metric for RSS vectors||FS-KNN||Simulation shows that FS-KNN can achieve an average distance error of 1.93m|
|||WLAN||Improve the energy efficiency of indoor localization||KNN||The proposal can achieve high localization accuracy with the average energy consumption significantly reduced compared with WiFi interface based methods|
|||WLAN||Find the optimal number k of nearest RPs for KNN based localization||KNN||The accuracy performance cannot be further improved by increasing the parameter k|
|||WLAN||Reduce training and prediction time||Online independent support vector machine||The proposal can improve localization accuracy and meanwhile decrease the prediction time and training time|
|||UWB||Identify NLOS signals||Relevance vector machine||Simulation results show that the proposal can improve the range estimates of NLOS signals|
|||UWB||Mitigate ranging error without NLOS signal identification||SVM||The proposal can achieve considerable performance improvements in various practical localization scenarios|
|||WLAN||Reduce the size of radio map||KPCA||The proposal can reach high localization accuracy while reducing the size of the radio map by 74%|
|||WLAN||Extract low dimensional features of RSS data||PCA and LDA||The combination method outperforms the LDA-based method in terms of the accuracy of floor classification|
|||WLAN||Reduce localization model training time||Extreme learning machine||The Consensus-based Parallel ELM performs similarly compared to centralized ELM in terms of localization accuracy with no additional computation cost and more robustness|
|||LTE networks||Reduce calculation time||Dense neural network||By using only one LTE eNodeB, the proposal can achieve an error distance of 6 meters in indoor environments|
|||WLAN||Achieve high localization accuracy without using the radio pathloss model or comparing the radio map||Auto-encoder||The proposal outperforms maximum likelihood estimation, the generalised regression neural network, and fingerprinting methods|
|||WLAN||Achieve higher localization accuracy by utilizing CSI phase information for fingerprinting||Deep autoencoder network||The proposal outperforms three benchmark schemes based on CSI or RSS in two typical indoor scenarios|
|||WLAN||Overcome the drawbacks of RSS based localization methods by utilizing CSI information for fingerprinting||Deep autoencoder network||Experimental results show that the proposal can localize the target effectively|
|||WLAN||Achieve higher localization accuracy by utilizing CSI amplitudes and estimated angles of arrival for fingerprinting||Deep autoencoder network||The proposal has superior performance compared to several baseline schemes like FIFS in |
|||WLAN||Achieve high performance 3-D localization in large indoor places||Deep autoencoder network||Positioning accuracy can be effectively improved by 3-D space fingerprinting|
VII Challenges and Open Issues
Although many research studies have been conducted on the applications of ML in wireless communications, several challenges and open issues are identified in this section to facilitate further study on the applications of ML.
VII-A Machine Learning Based Heterogeneous Backhaul/Fronthaul Management
In future wireless networks, various backhaul/fronthaul solutions will coexist , including wired backhaul/fronthaul like fiber and cable as well as wireless backhaul/fronthaul like the sub-6 GHz band. Each solution has a different amount of energy consumption and different bandwidth, and hence the management of backhaul/fronthaul is important to the whole system performance. In this case, ML based techniques can be utilized to select suitable backhaul/fronthaul solutions based on the extracted traffic patterns and performance requirements of users.
VII-B Infrastructure Update
To facilitate the adoption of ML in practical systems, current wireless network infrastructures should evolve to accommodate the deployment of ML based systems. For example, servers equipped with GPUs can be deployed at the network edge to implement deep learning based signal processing, resource management, and localization, and storage devices are needed at the network edge as well to achieve timely data analysis. Moreover, network function virtualization should be introduced into the wireless network; it decouples network functions from hardware so that they can be implemented in software. In this way, machine learning can be adopted to realize flexible network control and configuration.
VII-C Machine Learning Based Network Slicing
As a cost-efficient way to support diverse use cases, network slicing has been advocated by both academia and industry . The core of network slicing is to allocate appropriate resources including computing, caching, backhaul/fronthaul, and radio resources on demand to guarantee the performance requirements of different slices under slice isolation constraints. Generally speaking, network slicing can benefit from ML in the following aspects. First, ML can be used to learn the mapping between the system state and resource allocation plans, and hence a new network slice can be quickly constructed. Second, by employing transfer learning, knowledge about resource allocation plans for different use cases in one environment can act as useful knowledge in another environment, which can speed up the learning process. Recently, the authors in  and  have applied DRL to network slicing, and the advantages of DRL are demonstrated via simulations.
VII-D Standard Datasets and Environments for Research
To let researchers concentrate on learning algorithm design and conduct fair comparisons between different ML based approaches, it is essential to identify some common problems in wireless networks together with corresponding labeled or unlabeled data for supervised or unsupervised learning based approaches, similar to the open dataset MNIST, which is often used in computer vision. For reinforcement learning based approaches, standard network control problems together with well defined environments should be built, similar to the standard environment MountainCar-v0.
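To give a flavour of what a well defined environment for a network control problem could look like, the following sketch defines a toy two-cell handover environment with a reset/step interface in the style of common reinforcement learning toolkits. The class, its dynamics, the handover cost, and the episode length are entirely hypothetical and are not taken from any existing benchmark.

```python
import random

class ToyHandoverEnv:
    """Hypothetical two-cell environment with a reset/step interface."""

    def __init__(self, seed=0, horizon=10, handover_cost=0.2):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.handover_cost = handover_cost
        self.reset()

    def reset(self):
        """Start a new episode with the user served by cell 0."""
        self.serving = 0
        self.t = 0
        return self._observe()

    def _observe(self):
        # Noisy signal quality per cell; cell 1 improves as the user moves.
        base = [1.0 - 0.1 * self.t, 0.1 * self.t]
        self.quality = [b + self.rng.gauss(0.0, 0.05) for b in base]
        return (self.serving, tuple(self.quality))

    def step(self, action):
        """action: index of the cell that should serve the user.
        Reward: quality of the chosen cell minus a cost if a handover occurs."""
        cost = self.handover_cost if action != self.serving else 0.0
        reward = self.quality[action] - cost
        self.serving = action
        self.t += 1
        done = self.t >= self.horizon
        return self._observe(), reward, done

env = ToyHandoverEnv(seed=1)
obs = env.reset()
steps, done = 0, False
while not done:
    obs, reward, done = env.step(0)  # baseline policy: never hand over
    steps += 1
```

Fixing the seed, the observation format, and the reward definition in this way is what allows different learning algorithms to be compared on equal terms.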
VII-E Theoretical Guidance for Algorithm Implementation
It is known that the performance of ML algorithms is affected by the selection of hyperparameters like the learning rate, loss functions, and so on. Trying different hyperparameters directly is a time-consuming task, especially when the training time for the model under a fixed set of hyperparameters is long. Moreover, the theoretical analysis of the size of dataset needed for training, the performance bound of deep learning architectures, and the ability of generalization of different learning models are still open questions. Since stability is one of the main features of communication systems, rigorous theoretical studies on all these aspects are essential to ensure ML based approaches always work well in practical systems.
VII-F Transfer Learning Based Approaches
Transfer learning promises to transfer the knowledge learned from one task to another similar task. By avoiding retraining learning models from scratch for every individual setting, the learning process in new environments can be sped up, and the ML algorithm can perform well even with a small amount of training data. Therefore, transfer learning is critical for the practical implementation of learning models, considering the cost of retraining without prior knowledge. Using transfer learning, network operators can apply ML to solve new but similar problems in a cost-efficient manner. However, as pointed out in , prior knowledge can also have negative effects on system performance, which need further investigation.
VIII Conclusions
This paper surveys the state-of-the-art applications of machine learning (ML) in wireless communication and outlines several unresolved problems. Faced with the intricacies of these applications, we have broadly divided the body of knowledge into resource management in the MAC layer, networking and mobility management in the network layer, and localization in the application layer. Within each of these topics, we have surveyed the diverse ML based approaches that have been proposed for enabling wireless networks to run intelligently. Nevertheless, considering that the applications of ML in wireless communications are still at the initial stage, there are quite a number of problems that need further investigation. For example, infrastructure update is required for the implementation of ML based paradigms, theoretical analysis on the ML based approaches should be conducted to provide a performance guarantee, and open data sets and environments are expected to facilitate future research on the ML applications in a wide range.
-  M. Agiwal, A. Roy, and N. Saxena, “Next generation 5G wireless networks: A comprehensive survey,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 1617–1655, Thirdquarter 2016.
-  W. H. Chin, Z. Fan, and R. Haines, “Emerging technologies and research challenges for 5G wireless networks,” IEEE Wireless Commun., vol. 21, no. 2, pp. 106–112, Apr. 2014.
-  Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2322–2358, Fourthquarter 2017.
-  X. Wang, M. Chen, T. Taleb, A. Ksentini, and V. C.M. Leung, “Cache in the air: Exploiting content caching and delivery techniques for 5G systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 131–139, Feb. 2014.
-  M. Peng, Y. Sun, X. Li, Z. Mao, C. Wang, “Recent advances in cloud radio access networks: System architectures, key techniques, and open issues,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 2282–2308, Thirdquarter 2016.
-  J. Liu, N. Kato, J. Ma, and N. Kadowaki, “Device-to-device communication in LTE-advanced networks: A survey,” IEEE Commun. Surveys Tuts., vol. 17, no. 4, pp. 1923–1940, Fourthquarter 2015.
-  S. Han, C. I, G. Li, S. Wang, and Q. Sun, “Big data enabled mobile network design for 5G and beyond,” IEEE Commun. Mag., vol. 55, no. 9, pp. 150–157, Jul. 2017.
-  R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Machine Learning: An Artificial Intelligence Approach. Springer Science & Business Media, 2013.
-  H. Sun et al., “Learning to optimize: Training deep neural networks for wireless resource management,” arXiv:1705.09412v2, Oct. 2017, accessed on Apr. 15, 2018.
-  G. Cao, Z. Lu, X. Wen, T. Lei, and Z. Hu, “AIF : An artificial intelligence framework for smart wireless network management,” IEEE Commun. Lett., vol. 22, no. 2, pp. 400–403, Feb. 2018.
-  C. Xiao, D. Yang, Z. Chen, and G. Tan, “3-D BLE indoor localization based on denoising autoencoder,” IEEE Access, vol. 5, pp. 12751–12760, Jun. 2017.
-  R. Li, Z. Zhao, X. Chen, J. Palicot, and H. Zhang, “TACT: A transfer actor-critic learning framework for energy saving in cellular radio access networks,” IEEE Trans. Wireless Commun., vol. 13, no. 4, pp. 2000–2011, Apr. 2014.
-  C. Jiang et al., “Machine learning paradigms for next-generation wireless networks,” IEEE Wireless Commun., vol. 24, no. 2, pp. 98–105, Apr. 2017.
-  M. A. Alsheikh, S. Lin, D. Niyato, and H. P. Tan, “Machine learning in wireless sensor networks: Algorithms, strategies, and applications,” IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 1996–2018, Apr. 2014.
-  T. Park, N. Abuzainab, and W. Saad, “Learning how to communicate in the internet of things: Finite resources and heterogeneity,” IEEE Access, vol. 4, pp. 7063–7073, Nov. 2016.
-  O. B. Sezer, E. Dogdu, and A. M. Ozbayoglu, “Context-aware computing, learning, and big data in internet of things: A survey,” IEEE Internet Things J., vol. 5, no. 1, pp. 1–27, Feb. 2018.
-  M. Bkassiny, Y. Li, and S. K. Jayaweera, “A survey on machine-learning techniques in cognitive radios,” IEEE Commun. Surveys Tuts., vol. 15, no. 3, pp. 1136–1159, Oct. 2013.
-  W. Wang, A. Kwasinski, D. Niyato, and Z. Han, “A survey on applications of model-free strategy learning in cognitive wireless networks,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 1717–1757, Thirdquarter 2016.
-  X. Wang, X. Li, and V. C. Leung, “Artificial intelligence-based techniques for emerging heterogeneous network: State of the arts, opportunities, and challenges,” IEEE Access, vol. 3, pp. 1379–1391, Aug. 2015.
-  P. V. Klaine, M. A. Imran, O. Onireti, and R. D. Souza, “A survey of machine learning techniques applied to self-organizing cellular networks,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2392–2431, Jul. 2017.
-  X. Cao, L. Liu, Y. Cheng, and X. Shen, “Towards energy-efficient wireless networking in the big data era: A survey,” IEEE Commun. Surveys Tuts., vol. 20, no. 1, pp. 303–332, Firstquarter 2018.
-  M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, “Machine learning for wireless networks with artificial intelligence: A tutorial on neural networks,” arXiv:1710.02913v1, Oct. 2017, accessed on Jun. 20, 2018.
-  T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
-  T. Wang et al., “Deep learning for wireless physical layer: Opportunities and challenges,” arXiv:1710.05312v2, Oct. 2017, accessed on Jun. 30, 2018.
-  Q. Mao, F. Hu, and Q. Hao, “Deep learning for intelligent wireless networks: A comprehensive survey,” IEEE Commun. Surveys Tuts., Jun. 2018, doi: 10.1109/COMST.2018.2846401, submitted for publication.
-  C. Zhang, P. Patras, and H. Haddadi, “Deep learning in mobile and wireless networking: A survey,” arXiv:1803.04311v1, Mar. 2018, accessed on Jun. 30, 2018.
-  Z. M. Fadlullah et al., “State-of-the-art deep learning: Evolving machine intelligence toward tomorrow’s intelligent network traffic control systems,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2432–2455, May 2017.
-  V. N. Vapnik, Statistical Learning Theory, vol. 1. New York, NY, USA: Wiley, 1998.
-  B. S. Everitt, S. Landau, M. Leese, and D. Stahl, “Miscellaneous clustering methods,” in Cluster Analysis, 5th ed. Chichester, UK: John Wiley & Sons, Ltd, 2011.
-  H. Abdi and L. J. Williams, “Principal component analysis,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, Jul./Aug. 2010.
-  R. Sutton and A. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
-  P. Y. Glorennec, “Fuzzy Q-learning and dynamical fuzzy Q-learning,” in Proceedings of 1994 IEEE 3rd International Fuzzy Systems Conference, Orlando, FL, USA, Jun. 1994, pp. 474–479.
-  S. Maghsudi and E. Hossain, “Distributed user association in energy harvesting dense small cell networks: A mean-field multi-armed bandit approach,” IEEE Access, vol. 5, pp. 3513–3523, Mar. 2017.
-  S. Bubeck and N. Cesa-Bianchi, “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Found. Trends Mach. Learn., vol. 5, no. 1, pp. 1–122, 2012.
-  S. Singh, T. Jaakkola, M. Littman, and C. Szepesvári, “Convergence results for single-step on-policy reinforcement-learning algorithms,” Mach. Learn., vol. 38, no. 3, pp. 287–308, Mar. 2000.
-  S. M. Perlaza, H. Tembine, and S. Lasaulce, “How can ignorant but patient cognitive terminals learn their strategy and utility?” in Proceedings of SPAWC, Marrakech, Morocco, Jun. 2010, pp. 1–5.
-  V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
-  Y. Li, “Deep reinforcement learning: An overview,” arXiv:1701.07274v5, Sep. 2017, accessed on Jun. 30, 2018.
-  S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv:1609.04747v2, Jun. 2017, accessed on Jun. 30, 2018.
-  G. Huang, Q. Zhu, and C. Siew, “Extreme learning machine: Theory and applications,” Neurocomputing, vol. 70, no. 1–3, pp. 489–501, Dec. 2006.
-  M. E. Taylor and P. Stone, “Transfer learning for reinforcement learning domains: A survey,” Journal of Machine Learning Research, vol. 10, pp. 1633–1685, Jul. 2009.
-  M. Simsek, M. Bennis, and I. Guvenc, “Learning based frequency- and time-domain inter-cell interference coordination in HetNets,” IEEE Trans. Veh. Technol., vol. 64, no. 10, pp. 4589–4602, Oct. 2015.
-  T. Sanguanpuak, S. Guruacharya, N. Rajatheva, M. Bennis, and M. Latva-Aho, “Multi-operator spectrum sharing for small cell networks: A matching game perspective,” IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 3761–3774, Jun. 2017.
-  A. Galindo-Serrano and L. Giupponi, “Distributed Q-learning for aggregated interference control in cognitive radio networks,” IEEE Trans. Veh. Technol., vol. 59, no. 4, pp. 1823–1834, May 2010.
-  M. Bennis, S. M. Perlaza, P. Blasco, Z. Han, and H. V. Poor, “Self-organization in small cell networks: A reinforcement learning approach,” IEEE Trans. Wireless Commun., vol. 12, no. 7, pp. 3202–3212, Jul. 2013.
-  A. Asheralieva and Y. Miyanaga, “An autonomous learning-based algorithm for joint channel and power level selection by D2D pairs in heterogeneous cellular networks,” IEEE Trans. Commun., vol. 64, no. 9, pp. 3996–4012, Sep. 2016.
-  L. Xu and A. Nallanathan, “Energy-efficient chance-constrained resource allocation for multicast cognitive OFDM network,” IEEE J. Sel. Areas Commun., vol. 34, no. 5, pp. 1298–1306, May 2016.
-  M. Lin, J. Ouyang, and W. P. Zhu, “Joint beamforming and power control for device-to-device communications underlaying cellular networks,” IEEE J. Sel. Areas Commun., vol. 34, no. 1, pp. 138–150, Jan. 2016.
-  W. Lee, M. Kim, and D. Cho, “Deep power control: Transmit power control scheme based on convolutional neural network,” IEEE Commun. Lett., vol. 22, no. 6, pp. 1276–1279, Apr. 2018.
-  K. I. Ahmed, H. Tabassum, and E. Hossain, “Deep learning for radio resource allocation in multi-cell networks,” arXiv:1808.00667v1, Aug. 2018, accessed on Aug. 15, 2018.
-  A. Galindo-Serrano, L. Giupponi, and G. Auer, “Distributed learning in multiuser OFDMA femtocell networks,” in Proceedings of VTC, Yokohama, Japan, May 2011, pp. 1–6.
-  C. Fan, B. Li, C. Zhao, W. Guo, and Y. C. Liang, “Learning-based spectrum sharing and spatial reuse in mm-wave ultra dense networks,” IEEE Trans. Veh. Technol., vol. 67, no. 6, pp. 4954–4968, Jun. 2018.
-  Y. Zhang, W. P. Tay, K. H. Li, M. Esseghir, and D. Gaiti, “Learning temporal-spatial spectrum reuse,” IEEE Trans. Commun., vol. 64, no. 7, pp. 3092–3103, Jul. 2016.
-  G. Alnwaimi, S. Vahid, and K. Moessner, “Dynamic heterogeneous learning games for opportunistic access in LTE-based macro/femtocell deployments,” IEEE Trans. Wireless Commun., vol. 14, no. 4, pp. 2294–2308, Apr. 2015.
-  Y. Sun, M. Peng, and H. V. Poor, “A distributed approach to improving spectral efficiency in uplink device-to-device enabled cloud radio access networks,” IEEE Trans. Commun., Jul. 2018, doi: 10.1109/TCOMM.2018.2855212, submitted for publication.
-  M. Srinivasan, V. J. Kotagi, and C. S. R. Murthy, “A Q-learning framework for user QoE enhanced self-organizing spectrally efficient network using a novel inter-operator proximal spectrum sharing,” IEEE J. Sel. Areas Commun., vol. 34, no. 11, pp. 2887–2901, Nov. 2016.
-  M. Chen, W. Saad, and C. Yin, “Echo state networks for self-organizing resource allocation in LTE-U with uplink-downlink decoupling,” IEEE Trans. Wireless Commun., vol. 16, no. 1, pp. 3–16, Jan. 2017.
-  M. Chen, W. Saad, and C. Yin, “Virtual reality over wireless networks: Quality-of-service model and learning-based resource management,” IEEE Trans. Commun., Jun. 2018, doi: 10.1109/TCOMM.2018.2850303, submitted for publication.
-  M. Jaber, M. Imran, R. Tafazolli, and A. Tukmanov, “An adaptive backhaul-aware cell range extension approach,” in Proceedings of ICCW, London, UK, Jun. 2015, pp. 74–79.
-  Y. Xu, R. Yin, and G. Yu, “Adaptive biasing scheme for load balancing in backhaul constrained small cell networks,” IET Commun., vol. 9, no. 7, pp. 999–1005, Apr. 2015.
-  K. Hamidouche, W. Saad, M. Debbah, J. B. Song, and C. S. Hong, “The 5G cellular backhaul management dilemma: To cache or to serve,” IEEE Trans. Wireless Commun., vol. 16, no. 8, pp. 4866–4879, Aug. 2017.
-  S. Samarakoon, M. Bennis, W. Saad, and M. Latva-aho, “Backhaul-aware interference management in the uplink of wireless small cell networks,” IEEE Trans. Wireless Commun., vol. 12, no. 11, pp. 5813–5825, Nov. 2013.
-  J. Lun and D. Grace, “Cognitive green backhaul deployments for future 5G networks,” in Proc. Int. Workshop Cognitive Cellular Systems (CCS), Germany, Sep. 2014, pp. 1–5.
-  P. Blasco, M. Bennis, and M. Dohler, “Backhaul-aware self-organizing operator-shared small cell networks,” in Proceedings of ICC, Budapest, Hungary, Jun. 2013, pp. 2801–2806.
-  M. Jaber, M. A. Imran, R. Tafazolli, and A. Tukmanov, “A multiple attribute user-centric backhaul provisioning scheme using distributed SON,” in Proceedings of GLOBECOM, Washington DC, USA, Dec. 2016, pp. 1–6.
-  Cisco Visual Networking Index: “Global mobile data traffic forecast update 2015–2020,” 2016.
-  Z. Chang, L. Lei, Z. Zhou, S. Mao, and T. Ristaniemi, “Learn to cache: Machine learning for network edge caching in the big data era,” IEEE Wireless Commun., vol. 25, no. 3, pp. 28–35, Jun. 2018.
-  S. Hassan, S. Samarakoon, M. Bennis, M. Latva-aho, and C. S. Hong, “Learning-based caching in cloud-aided wireless networks,” IEEE Commun. Lett., vol. 22, no. 1, pp. 137–140, Jan. 2018.
-  W. Wang et al., “Edge caching at base stations with device-to-device offloading,” IEEE Access, vol. 5, pp. 6399–6410, Mar. 2017.
-  Y. He, N. Zhao, and H. Yin, “Integrated networking, caching and computing for connected vehicles: A deep reinforcement learning approach,” IEEE Trans. Veh. Technol., vol. 67, no. 1, pp. 44–55, Jan. 2018.
-  C. Zhong, M. Gursoy, and S. Velipasalar, “A deep reinforcement learning-based framework for content caching,” in Proceedings of CISS, Princeton, NJ, USA, Mar. 2018, pp. 1–6.
-  S. M. S. Tanzil, W. Hoiles, and V. Krishnamurthy, “Adaptive scheme for caching YouTube content in a cellular network: Machine learning approach,” IEEE Access, vol. 5, pp. 5870–5881, Mar. 2017.
-  M. Chen et al., “Caching in the sky: Proactive deployment of cache-enabled unmanned aerial vehicles for optimized quality-of-experience,” IEEE J. Sel. Areas Commun., vol. 35, no. 5, pp. 1046–1061, May 2017.
-  M. Chen, W. Saad, C. Yin, and M. Debbah, “Echo state networks for proactive caching in cloud-based radio access networks with mobile users,” IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 3520–3535, Jun. 2017.
-  K. N. Doan, T. V. Nguyen, T. Q. S. Quek, and H. Shin, “Content-aware proactive caching for backhaul offloading in cellular network,” IEEE Trans. Wireless Commun., vol. 17, no. 5, pp. 3128–3140, May 2018.
-  B. N. Bharath, K. G. Nagananda, and H. V. Poor, “A learning-based approach to caching in heterogenous small cell networks,” IEEE Trans. Commun., vol. 64, no. 4, pp. 1674–1686, Apr. 2016.
-  E. Bastug, M. Bennis, and M. Debbah, “Anticipatory caching in small cell networks: A transfer learning approach,” in Proceedings of Workshop Anticipatory Netw., Germany, Sep. 2014.
-  E. Bastug, M. Bennis, and M. Debbah, “A transfer learning approach for cache-enabled wireless networks,” in Proceedings of WiOpt, Mumbai, India, May 2015, pp. 161–166.
-  X. Chen et al., “Optimized computation offloading performance in virtual edge computing systems via deep reinforcement learning,” arXiv:1805.06146v1, May 2018, accessed on Jun. 15, 2018.
-  J. Wang et al., “A machine learning framework for resource allocation assisted by cloud computing,” IEEE Netw., vol. 32, no. 2, pp. 144–151, Apr. 2018.
-  Z. Li, C. Wang, and C. J. Jiang, “User association for load balancing in vehicular networks: An online reinforcement learning approach,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 8, pp. 2217–2228, Aug. 2017.
-  F. Pervez, M. Jaber, J. Qadir, S. Younis, and M. A. Imran, “Fuzzy Q-learning-based user-centric backhaul-aware user cell association scheme,” IEEE Access, Jul. 2018, doi: 10.1109/ACCESS.2018.2850752, submitted for publication.
-  T. Kudo and T. Ohtsuki, “Cell range expansion using distributed Q-learning in heterogeneous networks,” in Proceedings of VTC, Las Vegas, USA, Sep. 2013, pp. 1–5.
-  Y. Meng, C. Jiang, L. Xu, Y. Ren, and Z. Han, “User association in heterogeneous networks: A social interaction approach,” IEEE Trans. Veh. Technol., vol. 65, no. 12, pp. 9982–9993, Dec. 2016.
-  U. Challita, W. Saad, and C. Bettstetter, “Cellular-connected UAVs over 5G: Deep reinforcement learning for interference management,” arXiv:1801.05500v1, Jan. 2018, accessed on Jun. 15, 2018.
-  H. Yu et al., “Mobile data offloading for green wireless networks,” IEEE Wireless Commun., vol. 24, no. 4, pp. 31–37, Aug. 2017.
-  I. Ashraf, F. Boccardi, and L. Ho, “SLEEP mode techniques for small cell deployments,” IEEE Commun. Mag., vol. 49, no. 8, pp. 72–79, Aug. 2011.
-  R. Li, Z. Zhao, X. Chen, and H. Zhang, “Energy saving through a learning framework in greener cellular radio access networks,” in Proceedings of GLOBECOM, Anaheim, USA, Dec. 2012, pp. 1556–1561.
-  X. Gan et al., “Energy efficient switch policy for small cells,” China Commun., vol. 12, no. 1, pp. 78–88, Jan. 2015.
-  G. Yu, Q. Chen, and R. Yin, “Dual-threshold sleep mode control scheme for small cells,” IET Commun., vol. 8, no. 11, pp. 2008–2016, Jul. 2014.
-  Z. Xu, Y. Wang, J. Tang, J. Wang, and M. Gursoy, “A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs,” in Proceedings of ICC, Paris, France, May 2017, pp. 1–6.
-  S. Fan, H. Tian, and C. Sengul, “Self-optimized heterogeneous networks for energy efficiency,” Eurasip J. Wireless Commun. Netw., vol. 2015, no. 21, pp. 1–11, Feb. 2015.
-  P. Y. Kong and D. Panaitopol, “Reinforcement learning approach to dynamic activation of base station resources in wireless networks,” in Proceedings of PIMRC, London, UK, Sep. 2013, pp. 3264–3268.
-  S. Sharma, S. J. Darak, and A. Srivastava, “Energy saving in heterogeneous cellular network via transfer reinforcement learning based policy,” in Proceedings of COMSNETS, Bengaluru, India, Jan. 2017, pp. 397–398.
-  S. Sharma, S. Darak, A. Srivastava, and H. Zhang, “A transfer learning framework for energy efficient Wi-Fi networks and performance analysis using real data,” in Proceedings of ANTS, Bangalore, India, Nov. 2016, pp. 1–6.
-  S. Samarakoon, M. Bennis, W. Saad, and M. Latva-aho, “Dynamic clustering and on/off strategies for wireless small cell networks,” IEEE Trans. Wireless Commun., vol. 15, no. 3, pp. 2164–2178, Mar. 2016.
-  Q. Zhao, D. Grace, A. Vilhar, and T. Javornik, “Using k-means clustering with transfer and Q-learning for spectrum, load and energy optimization in opportunistic mobile broadband networks,” in Proceedings of ISWCS, Brussels, Belgium, Aug. 2015, pp. 116–120.
-  M. Miozzo, L. Giupponi, M. Rossi, and P. Dini, “Distributed Q-learning for energy harvesting heterogeneous networks,” in Proceedings of ICCW, London, UK, 2015, pp. 2006–2011.
-  Y. Saleem et al., “Clustering and reinforcement-learning-based routing for cognitive radio networks,” IEEE Wireless Commun., vol. 24, no. 4, pp. 146–151, Aug. 2017.
-  A. Syed et al., “Route selection for multi-hop cognitive radio networks using reinforcement learning: An experimental study,” IEEE Access, vol. 4, pp. 6304–6324, Sep. 2016.
-  H. A. A. Al-Rawi, K. L. A. Yau, H. Mohamad, N. Ramli, and W. Hashim, “Effects of network characteristics on learning mechanism for routing in cognitive radio ad hoc networks,” in Proceedings of CSNDSP, Manchester, UK, Jul. 2014, pp. 748–753.
-  W. Jung, J. Yim, and Y. Ko, “QGeo: Q-learning-based geographic ad hoc routing protocol for unmanned robotic networks,” IEEE Commun. Lett., vol. 21, no. 10, pp. 2258–2261, Oct. 2017.
-  N. Kato et al., “The deep learning vision for heterogeneous network traffic control: Proposal, challenges, and future perspective,” IEEE Wireless Commun., vol. 24, no. 3, pp. 146–153, Jun. 2017.
-  F. Tang et al., “On removing routing protocol from future wireless networks: A real-time deep learning approach for intelligent traffic control,” IEEE Wireless Commun., vol. 25, no. 1, pp. 154–160, Feb. 2018.
-  L. Lei et al., “A deep learning approach for optimizing content delivering in cache-enabled HetNet,” in Proceedings of ISWCS, Bologna, Italy, Aug. 2017, pp. 449–453.
-  H. Tabrizi, G. Farhadi, and J. M. Cioffi, “CaSRA: An algorithm for cognitive tethering in dense wireless areas,” in Proceedings of GLOBECOM, Atlanta, USA, Dec. 2013, pp. 3855–3860.
-  Y. He et al., “Deep-reinforcement-learning-based optimization for cache-enabled opportunistic interference alignment wireless networks,” IEEE Trans. Veh. Technol., vol. 66, no. 11, pp. 10433–10445, Nov. 2017.
-  J. Wu, J. Liu, Z. Huang, and S. Zheng, “Dynamic fuzzy Q-learning for handover parameters optimization in 5G multi-tier networks,” in Proceedings of WCSP, Nanjing, China, Oct. 2015, pp. 1–5.
-  A. Klein, N. P. Kuruvatti, J. Schneider, and H. D. Schotten, “Fuzzy Q-learning for mobility robustness optimization in wireless networks,” in Proceedings of GC Wkshps, Atlanta, USA, Dec. 2013, pp. 76–81.
-  P. Muñoz, R. Barco, J. M. Ruiz-Avilés, I. de la Bandera, and A. Aguilar, “Fuzzy rule-based reinforcement learning for load balancing techniques in enterprise LTE femtocells,” IEEE Trans. Veh. Technol., vol. 62, no. 5, pp. 1962–1973, Jun. 2013.
-  K. T. Dinh and S. Kukliński, “Joint implementation of several LTE-SON functions,” in Proceedings of GC Wkshps, Atlanta, USA, Dec. 2013, pp. 953–957.
-  Z. Wang, Y. Xu, L. Li, H. Tian, and S. Cui, “Handover control in wireless systems via asynchronous multi-user deep reinforcement learning,” arXiv:1801.02077v2, May 2018, accessed on Jun. 30, 2018.
-  C. Yu et al., “Modeling user activity patterns for next-place prediction,” IEEE Syst. J., vol. 11, no. 2, pp. 1060–1071, Jun. 2017.
-  D. Castro-Hernandez and R. Paranjape, “Classification of user trajectories in LTE HetNets using unsupervised shapelets and multiresolution wavelet decomposition,” IEEE Trans. Veh. Technol., vol. 66, no. 9, pp. 7934–7946, Sep. 2017.
-  N. Sinclair, D. Harle, I. A. Glover, J. Irvine, and R. C. Atkinson, “An advanced SOM algorithm applied to handover management within LTE,” IEEE Trans. Veh. Technol., vol. 62, no. 5, pp. 1883–1894, Jun. 2013.
-  D. Li, B. Zhang, Z. Yao, and C. Li, “A feature scaling based k-nearest neighbor algorithm for indoor positioning system,” in Proceedings of GLOBECOM, Austin, USA, Dec. 2014, pp. 436–441.
-  J. Niu, B. Wang, L. Shu, T. Q. Duong, and Y. Chen, “ZIL: An energy-efficient indoor localization system using ZigBee radio to detect WiFi fingerprints,” IEEE J. Sel. Areas Commun., vol. 33, no. 7, pp. 1431–1442, Jul. 2015.
-  Y. Xu, M. Zhou, W. Meng, and L. Ma, “Optimal KNN positioning algorithm via theoretical accuracy criterion in WLAN indoor environment,” in Proceedings of GLOBECOM, Miami, USA, Dec. 2010, pp. 1–5.
-  Z. Wu et al., “A fast and resource efficient method for indoor positioning using received signal strength,” IEEE Trans. Veh. Technol., vol. 65, no. 12, pp. 9747–9758, Dec. 2016.
-  T. Van Nguyen, Y. Jeong, H. Shin, and M. Z. Win, “Machine learning for wideband localization,” IEEE J. Sel. Areas Commun., vol. 33, no. 7, pp. 1357–1380, Jul. 2015.
-  H. Wymeersch, S. Marano, W. M. Gifford, and M. Z. Win, “A machine learning approach to ranging error mitigation for UWB localization,” IEEE Trans. Commun., vol. 60, no. 6, pp. 1719–1728, Jun. 2012.
-  Z. Qiu, H. Zou, H. Jiang, L. Xie, and Y. Hong, “Consensus-based parallel extreme learning machine for indoor localization,” in Proceedings of GLOBECOM, Washington DC, USA, Dec. 2016, pp. 1–6.
-  X. Ye, X. Yin, X. Cai, A. Pérez Yuste, and H. Xu, “Neural-network-assisted UE localization using radio-channel fingerprints in LTE networks,” IEEE Access, vol. 5, pp. 12071–12087, Jun. 2017.
-  H. Dai, W. H. Ying, and J. Xu, “Multi-layer neural network for received signal strength-based indoor localisation,” IET Commun., vol. 10, no. 6, pp. 717–723, Apr. 2016.
-  Y. Mo, Z. Zhang, W. Meng, and G. Agha, “Space division and dimensional reduction methods for indoor positioning system,” in Proceedings of ICC, London, UK, Jun. 2015, pp. 3263–3268.
-  J. Yoo, K. H. Johansson, and H. Jin Kim, “Indoor localization without a prior map by trajectory learning from crowdsourced measurements,” IEEE Trans. Instrum. Meas., vol. 66, no. 11, pp. 2825–2835, Nov. 2017.
-  X. Wang, L. Gao, and S. Mao, “CSI phase fingerprinting for indoor localization with a deep learning approach,” IEEE Internet Things J., vol. 3, no. 6, pp. 1113–1123, Dec. 2016.
-  X. Wang, L. Gao, S. Mao, and S. Pandey, “CSI-based fingerprinting for indoor localization: A deep learning approach,” IEEE Trans. Veh. Technol., vol. 66, no. 1, pp. 763–776, Jan. 2017.
-  X. Wang, L. Gao, and S. Mao, “BiLoc: Bi-modal deep learning for indoor localization with commodity 5GHz WiFi,” IEEE Access, vol. 5, pp. 4209–4220, Mar. 2017.
-  J. Xiao, K. Wu, Y. Yi, and L. M. Ni, “FIFS: Fine-grained indoor fingerprinting system,” in Proceedings of IEEE ICCCN, Munich, Germany, Jul. 2012, pp. 1–7.
-  Z. Yan, M. Peng, and C. Wang, “Economical energy efficiency: An advanced performance metric for 5G systems,” IEEE Wireless Commun., vol. 24, no. 1, pp. 32–37, Feb. 2017.
-  H. Xiang, W. Zhou, M. Daneshmand, and M. Peng, “Network slicing in fog radio access networks: Issues and challenges,” IEEE Commun. Mag., vol. 55, no. 12, pp. 110–116, Dec. 2017.
-  Z. Zhao et al., “Deep reinforcement learning for network slicing,” arXiv:1805.06591v1, May 2018, accessed on Jun. 15, 2018.
-  X. Chen et al., “Multi-tenant cross-slice resource orchestration: A deep reinforcement learning approach,” arXiv:1807.09350v1, Jul. 2018, accessed on Jul. 30, 2018.