Self-Organizing mmWave Networks: A Power Allocation Scheme Based on Machine Learning
Abstract
Millimeter-wave (mmWave) communication is anticipated to provide significant throughput gains in urban scenarios. To this end, network densification is a necessity to meet the high traffic volume generated by smartphones, tablets, and sensory devices while overcoming the large path loss and frequent blockages at mmWave frequencies. These denser networks are created by users deploying small mmWave base stations (BSs) in a plug-and-play fashion. Although this deployment method provides the required density, the amorphous deployment of BSs calls for distributed management. To address this difficulty, we propose a self-organizing method to allocate power to mmWave BSs in an ultra-dense network. The proposed method consists of two parts: clustering using fast local clustering and power allocation via Q-learning. The important features of the proposed method are its scalability and self-organizing capability, both key features of 5G. Our simulations demonstrate that the introduced method provides the required quality of service (QoS) for all users independent of the size of the network.
I Introduction
Millimeter-wave (mmWave) communication is one of the main technologies of the next generation of cellular networks (5G). The large bandwidth at mmWave frequencies has the potential to enhance network throughput tenfold [1]. However, large path loss and shadowing limit the performance of mmWave systems and need to be dealt with. One approach to overcoming this problem is to increase the density of access points [2, 3]. However, as the number of access points increases, so does the complexity of network management. Keeping this in mind, one of the features of future mmWave base stations (BSs) is self-deployment by users. In other words, access points can be deployed in a plug-and-play fashion, and the network architecture may change frequently. Considering the above points, 5G needs self-organizing methods to configure, adapt, or heal itself when necessary. In this paper, a self-organizing algorithm is proposed to maximize the sum capacity in a dense mmWave network while providing users with their required quality of service (QoS). The algorithm consists of clustering, based on fast local clustering (FLOC), and distributed power allocation, via Q-learning. The scalability and fast convergence of FLOC, together with the adaptability and distributed nature of Q-learning, make their combination a suitable tool to achieve self-organization in a dense network.
II System Model
The system model considers a dense outdoor urban scenario as an important example of 5G, i.e., we consider the downlink of densely deployed mmWave BSs. To this end, let us consider mmWave BSs distributed according to a homogeneous spatial Poisson point process (SPPP) with density λ [4]. Each BS is associated with one user. The BSs share a single frequency resource block (FRB) to support their associated users. We assume a time-invariant channel model, i.e., slow fading. The channel between BS i and user k can be written as

h_{i,k} = √(PL_{i,k}) · g_{i,k},    (1)

where PL_{i,k} and g_{i,k} denote the path loss and the path gain between BS i and user k, respectively. The path loss between BS k and its associated user k follows free-space propagation based on Friis' law [1]. Here, we consider that the majority of interferers have non-line-of-sight (NLOS) paths [5]. Hence, the path loss (PL_{i,k}) can be written as [1]

PL_{i,k} [dB] = α + β · 10 log₁₀(d_{i,k}) + ξ_{i,k},    (2)

where α and β are factors used to achieve the best fit to channel measurements, d_{i,k} is the distance between BS i and user k, ξ_{i,k} denotes the logarithmic shadowing factor, where ξ_{i,k} ~ N(0, σ_ξ²), and σ_ξ² denotes the log-normal shadowing variance.
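As a concrete check of the model in (2), the following sketch evaluates the NLOS path loss for one link, assuming the 28 GHz best-fit values α = 72.0, β = 2.92, and σ_ξ = 8.7 dB reported in [1]; the function name and default arguments are illustrative, not from the paper.

```python
import math
import random

def path_loss_db(d_m, alpha=72.0, beta=2.92, sigma_db=8.7, rng=None):
    """NLOS path loss of Eq. (2): PL(d)[dB] = alpha + beta*10*log10(d) + xi,
    with xi ~ N(0, sigma^2) a log-normal shadowing term (in dB).
    Defaults are the 28 GHz NLOS fit values from [1]; d_m is in meters."""
    rng = rng or random.Random(0)
    shadowing = rng.gauss(0.0, sigma_db)
    return alpha + beta * 10.0 * math.log10(d_m) + shadowing
```

Setting `sigma_db=0` disables shadowing, which makes the deterministic distance-dependent part easy to sanity-check.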
The received downlink signal at user k includes the desired signal from its associated BS (BS k), interference from neighboring BSs, and thermal noise. Hence, the signal-to-interference-plus-noise ratio (SINR) at user k is given by

SINR_k = p_k |h_{k,k}|² / ( Σ_{i∈I_k} p_i |h_{i,k}|² + σ_n² ),    (3)

where p_i denotes the power transmitted by BS i, I_k is the set of interfering BSs, and σ_n² denotes the variance of the additive white Gaussian noise. Accordingly, the normalized capacity at user k is given by

C_k = log₂(1 + SINR_k).    (4)
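Equations (3) and (4) translate directly into code. The sketch below assumes the channel gains are pre-computed into a `gains[i][k]` matrix (path loss and fading folded together), an arrangement chosen here purely for illustration.

```python
import math

def sinr(k, powers, gains, noise_var):
    """SINR at user k per Eq. (3). gains[i][k] is the total channel gain from
    BS i to user k; user k is served by BS k, and every other BS interferes
    on the shared frequency resource block."""
    desired = powers[k] * gains[k][k]
    interference = sum(powers[i] * gains[i][k]
                       for i in range(len(powers)) if i != k)
    return desired / (interference + noise_var)

def normalized_capacity(k, powers, gains, noise_var):
    # Eq. (4): capacity per unit bandwidth.
    return math.log2(1.0 + sinr(k, powers, gains, noise_var))
```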
III Problem Formulation
The goal of the optimization problem is to find the best power distribution among the mmWave BSs (p = [p_1, …, p_N]) in order to maximize the sum capacity of the network while supporting all users with their required QoS. The optimization problem (P1) can be formulated as

max_p  Σ_k C_k    (5a)
subject to  0 ≤ p_k ≤ p_max,  ∀k,    (5b)
SINR_k ≥ γ_k,  ∀k.    (5c)

Here, the objective (5a) is to maximize the sum capacity of the network while providing all users with their required QoS in (5c). The first constraint, (5b), refers to the power limitation of every BS. The term γ_k in (5c) refers to the minimum required SINR for user k.
Eq. (5a) contains the interference term in the denominator of the SINR. In a dense network the interference term cannot be ignored [6], and due to its presence the objective function (5a) is non-concave [7].
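To build intuition about the problem's structure, a toy cluster can be solved by exhaustive search over a discrete power grid. This baseline is only for illustration; it scales exponentially in the number of BSs and is not the paper's method.

```python
import itertools
import math

def solve_p1_exhaustive(gains, noise_var, power_levels, gamma_min):
    """Grid search over all power combinations: evaluates objective (5a)
    subject to the QoS constraint (5c). Feasible for toy clusters only,
    since it enumerates |power_levels|**n_bs candidates.
    Returns (best_powers, best_sum_capacity); (None, -1.0) if infeasible."""
    n = len(gains)
    best_powers, best_cap = None, -1.0
    for powers in itertools.product(power_levels, repeat=n):
        sinrs = []
        for k in range(n):
            interf = sum(powers[i] * gains[i][k] for i in range(n) if i != k)
            sinrs.append(powers[k] * gains[k][k] / (interf + noise_var))
        if any(s < gamma_min for s in sinrs):   # QoS constraint (5c)
            continue
        total = sum(math.log2(1.0 + s) for s in sinrs)
        if total > best_cap:
            best_powers, best_cap = powers, total
    return best_powers, best_cap
```

Running it on a two-BS example shows why distributed heuristics are needed: the search space grows as the power-grid size to the power of the cluster size.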
The solution to P1 should have certain features. First, it should be distributed, since there is no central authority in this network. Second, the range of mmWave BSs is limited, so each user receives interference only from the BSs in its neighborhood; therefore, the solution should use local clustering to reduce the computational overhead. The third feature is self-healing: the number of BSs in the network changes sporadically, so the solution should adapt to new architectures. Considering the above, in this paper we propose a method with two parts: a fast local clustering method to locally cluster the BSs, and, within each cluster, BSs that choose their transmit power based on Q-learning [8]. Q-learning is model-free (adaptable) and gives the BSs the ability to learn from their environment by interacting with it (self-organization).
IV Cluster-Based Distributed Power Allocation Using Q-Learning (CDPQ)
In our proposed method, mmWave BSs are considered the agents of Q-learning, so the terms agent and mmWave BS are used interchangeably. CDPQ is a distributed method in which multiple agents (mmWave BSs) find a suboptimal policy (power allocation) to maximize the network capacity. CDPQ consists of two parts: (1) clustering and (2) power allocation. Clustering is based on a local clustering method, and power allocation is based on Q-learning. In the following, each part is detailed.
IV-A mmWave BS Clustering
Since mmWave signals suffer from high path loss and shadowing, only neighboring BSs that are close in distance interfere with each other. Consequently, we propose a clustering mechanism that divides the BSs into clusters such that the interference of one cluster on other clusters' users is negligible.
In this paper, we propose to use fast local clustering (FLOC) [9] to divide mmWave BSs into clusters. FLOC is a distributed message-passing clustering method with O(1) complexity, which guarantees scalability, and it produces non-overlapping clusters. Another feature of FLOC is local self-healing, meaning that re-clustering, due to the addition or removal of a node, does not propagate through all clusters. In order to apply FLOC to a mmWave network, the following concepts are defined:

Cluster head (CH): The mmWave BS that is chosen as the head of the cluster. In our algorithm, there is no priority between a cluster head and other members of the cluster.

Inbound (IB) and outband (OB) node: In FLOC, a node is inbound if it is within a unit distance of a CH. The unit distance is a set value, which in this case is the range of mmWave links [1]. Accordingly, we define the inbound range as an indication of strong interference, and the outband range as the edge of the cluster around a CH. Finally, if a node is within outband distance of a cluster and not within inbound distance of any other cluster, the node joins that cluster as an OB node.
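The geometric membership rules above can be sketched as follows. This is a centralized simplification for illustration only: FLOC itself is a distributed message-passing protocol [9], and the deterministic CH choice and two-radius greedy assignment here are assumptions of the sketch.

```python
import math

def floc_like_clusters(positions, r_in, r_out):
    """Centralized sketch of FLOC's geometric membership rule: a candidate CH
    claims unclustered nodes within r_in as in-bound (IB) members, and nodes
    between r_in and r_out that no cluster has yet claimed join as out-bound
    (OB) members. Repeats until every node belongs to a cluster."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    unassigned = set(range(len(positions)))
    clusters = []
    while unassigned:
        ch = min(unassigned)  # deterministic CH choice, for the sketch only
        ib = {i for i in unassigned
              if dist(positions[ch], positions[i]) <= r_in}
        ob = {i for i in unassigned - ib
              if dist(positions[ch], positions[i]) <= r_out}
        unassigned -= ib | ob
        clusters.append({"ch": ch, "ib": ib, "ob": ob})
    return clusters
```

The resulting clusters are non-overlapping, mirroring the property the text attributes to FLOC, although the real protocol reaches this outcome through local message exchange rather than a global sweep.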
IV-B Distributed Power Allocation Using Q-Learning
The output of Q-learning is a decision policy (power allocation), which is represented as a function called the Q-function. Here, the Q-function of agent k is represented as a table called a Q-table (Q_k). The columns of the Q-table are the actions (a), and the rows are the states (s) of agent k.
In multi-agent Q-learning, agents can act independently or cooperatively. In independent learning, each agent interacts with the environment without communicating with the other agents; in effect, it treats the other agents as part of the environment. Independent learning has shown good performance in many applications [10]. Since the environment is then non-stationary, agents may oscillate and take longer to converge, but because independent learning incurs no inter-agent communication overhead, unlike cooperative learning, we adopt it here. Motivated by this, the agents select their actions according to [11]
a_{k,t} = argmax_a Q_k(s_{k,t}, a) with probability 1 − ε, and a random action with probability ε,    (6)

in which the subscript t denotes the time step of Q-learning. The CDPQ algorithm is presented in Algorithm 1.
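A minimal single-agent sketch of this machinery might look as follows, using an ε-greedy realization of the selection step (a standard choice in [11]) and the one-step Q-learning update of [8]. All hyperparameter values here are illustrative, not the paper's.

```python
import random

class PowerAgent:
    """Independent Q-learner for one mmWave BS. Rows of the Q-table are the
    ring states defined below; columns are the discrete power levels."""
    def __init__(self, n_states, n_actions,
                 alpha=0.5, gamma=0.9, eps=0.1, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = random.Random(seed)

    def select_action(self, s):
        # explore with probability eps, otherwise pick the greedy action
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.q[s]))
        row = self.q[s]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, s, a, r, s_next):
        # one-step Q-learning update (Watkins and Dayan [8])
        best_next = max(self.q[s_next])
        self.q[s][a] += self.alpha * (r + self.gamma * best_next - self.q[s][a])
```

In CDPQ each BS in a cluster would run one such agent; because the agents learn independently, no inter-agent messages are needed beyond the clustering phase.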
In the following, the actions, states, and reward function of the proposed Q-learning method are defined.
Actions
The set of actions (powers) is defined as a uniform grid that covers the range between the minimum (p_min) and maximum (p_max) transmit power.
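Such a uniform action grid can be generated as below; the number of levels is a design choice, not a value fixed in the text.

```python
def power_actions(p_min_dbm, p_max_dbm, n_levels):
    """Uniform grid of n_levels transmit powers (dBm) from p_min to p_max,
    inclusive at both ends; these are the columns of each agent's Q-table."""
    step = (p_max_dbm - p_min_dbm) / (n_levels - 1)
    return [p_min_dbm + i * step for i in range(n_levels)]
```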
States
We define equally spaced concentric circles around the cluster head (CH) of each cluster. These circles define rings of equal width around the CH. The state of agent k at time step t, s_{k,t}, is defined as the index of the ring that the agent belongs to. Considering the definition of the Q-table and the states at the beginning of this section, if an agent's location is fixed, the agent searches only a single row of its Q-table for the best action decision.
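The ring-index state can be computed directly from the BS and CH positions, assuming a fixed ring width:

```python
import math

def ring_state(bs_pos, ch_pos, ring_width):
    """State of an agent: the index of the concentric ring around its CH that
    the BS falls in (ring 0 is the disc of radius ring_width around the CH)."""
    d = math.hypot(bs_pos[0] - ch_pos[0], bs_pos[1] - ch_pos[1])
    return int(d // ring_width)
```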
Reward
r_{k,t} is the immediate reward incurred by the selection of action a_{k,t} in state s_{k,t} by agent k at time step t. The constraint in (5c) can equivalently be expressed in terms of capacity as C_{k,t} ≥ C_k^req for every agent k, where C_{k,t} is the normalized capacity of agent k at time step t. Based on this, the proposed normalized reward function for agent k at time step t is defined as
(7) 
The rationale behind the proposed reward function is as follows

The term (a) normalizes the value of the reward function.

The objective of the optimization problem is to maximize the capacity of the network, so the term (b) results in a higher reward for higher capacity for an agent.

To satisfy the QoS constraint for agent , capacity deviation of its associated user from the required QoS, term (c), should result in a lower reward.

There is a maximum reward (R_max) for an agent, which provides fairness between the agents, as shown in Fig. 1.

The proposed reward function is a first-order function of C_{k,t}, which reduces the complexity of each iteration.
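Eq. (7) itself is not reproduced above, but the listed properties constrain its shape. The following is one hypothetical first-order function satisfying all of them; it is an assumption for illustration, not the paper's exact reward.

```python
def reward(c_k, c_req, c_max, r_max=1.0):
    """Hypothetical reward consistent with the listed properties (NOT the
    paper's exact Eq. (7)): dividing by c_max normalizes the value (a), the
    reward grows with capacity (b), the |c_k - c_req| term penalizes deviation
    from the required capacity (c), the cap r_max bounds the reward for
    fairness, and the whole expression is first order in c_k."""
    r = (c_k - abs(c_k - c_req)) / c_max
    return min(r, r_max)
```

With this shape the reward rises until the required capacity is met and then flattens, which discourages an agent from burning power (and interfering with its neighbors) beyond its QoS target.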
V Simulation Results
In this section the simulation setup is detailed and then the results of the simulations are presented.
V-A Simulation Setup
A dense mmWave BS network, with approximately  BSs in a  area, is considered. The BSs are distributed based on an SPPP and operate independently in the network. Each BS supports one user equipment (UE), which is located within a given radius (in meters) around the BS. The QoS for a user is defined as the required SINR to support the user's service. The same required SINR is considered for all users.
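BS locations drawn from a homogeneous SPPP can be generated as sketched below; the density and window-size arguments are placeholders for the elided simulation values, and Knuth's product method is used only to keep the sketch stdlib-only.

```python
import math
import random

def sample_sppp(density, width, height, seed=0):
    """Homogeneous spatial Poisson point process on a width x height window
    [4]: draw the point count N ~ Poisson(density * area), then scatter N
    points uniformly. Knuth's product method for the Poisson draw underflows
    for very large density * area, which is fine for modest expected counts."""
    rng = random.Random(seed)
    lam = density * width * height      # expected number of BSs
    threshold = math.exp(-lam)
    n, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            break
        n += 1
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height))
            for _ in range(n)]
```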
To perform Qlearning, the learning rate is considered as , the discount factor as , , m, and . The maximum number of iterations is set to . The remaining parameters of the simulation are represented in Table I.
Param.  Value  Param.  Value
f  28 GHz  p_min  10 dBm
σ_ξ  8.7 dB  p_max  35 dBm
α  72.0  β  2.92
σ_n²  −120 dBm
V-B Clustering Results
The implementation of the clustering algorithm is an event-driven, message-passing distributed program in C++. Every BS is simulated as an independent thread and is added to the network at a random time. The clustering algorithm converges in less than  for the assumed value of the unit distance. The resulting clusters for two different distributions of BSs are shown in different colors in Figs. 2 and 3. Each cluster head (CH) is marked with a filled color.
V-C Power Allocation Results
According to [2, 1], the coverage range of millimeter-wave communication is on the order of  m, which implies a maximum coverage area for each mmWave BS. Considering the interference-limited assumption and the value of the unit distance, a cluster might contain up to 14 mmWave BSs. Hence, the CDPQ algorithm results in clusters that include 2 to 14 BSs.
The results of power allocation using the proposed reward function are compared to the exponential reward function proposed in [12], which is denoted EXPQ in the simulations. For all possible cluster sizes, power allocation using the proposed reward function is simulated, and the normalized capacities of all BSs in the clusters are plotted in Fig. 4. The same simulations for EXPQ are presented in Fig. 5.
As shown in these figures, while both reward functions satisfy the required QoS for all cluster members at every cluster size, the normalized capacities of users under CDPQ are close to each other, whereas under EXPQ they are far more diverse. This diversity of normalized capacity values in EXPQ affects the fairness index. The fairness in each cluster is measured using Jain's fairness index [13] and is shown in Fig. 6: CDPQ maintains fairness for all cluster sizes, while EXPQ fails to treat users fairly in large clusters. The total capacity of the clusters is shown in Fig. 7 as a function of cluster size; according to this figure, CDPQ provides higher capacity than EXPQ for all cluster sizes.
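Jain's fairness index used above has a simple closed form that is easy to verify:

```python
def jain_fairness(values):
    """Jain's fairness index [13]: (sum x)^2 / (n * sum x^2). It equals 1.0
    when all users get the same capacity and tends to 1/n in the worst case
    (all capacity concentrated on a single user)."""
    n = len(values)
    s = sum(values)
    return (s * s) / (n * sum(v * v for v in values))
```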
VI Conclusion
In this paper, a self-organized distributed power allocation algorithm is presented. The proposed algorithm reduces the optimization complexity by using a distributed clustering method and provides adaptability in power allocation by using Q-learning. The proposed reward function satisfies the required QoS for the users in all resulting cluster sizes and outperforms the exponential reward function.
References
 S. Rangan, T. S. Rappaport, and E. Erkip, “Millimeter-wave cellular wireless networks: Potentials and challenges,” Proceedings of the IEEE, vol. 102, no. 3, pp. 366–385, March 2014.
 R. Baldemair, T. Irnich, K. Balachandran, E. Dahlman, G. Mildh, Y. Selén, S. Parkvall, M. Meyer, and A. Osseiran, “Ultra-dense networks in millimeter-wave frequencies,” IEEE Commun. Mag., vol. 53, no. 1, pp. 202–208, January 2015.
 T. Bai and R. W. Heath, “Coverage in dense millimeter wave cellular networks,” in Asilomar Conference on Signals, Systems and Computers, Nov 2013, pp. 2062–2066.
 D. P. Kroese and Z. Botev, “Spatial process generation,” Aug 2013.
 M. Rebato, M. Mezzavilla, S. Rangan, F. Boccardi, and M. Zorzi, “Understanding noise and interference regimes in 5G millimeter-wave cellular networks,” in 22nd European Wireless Conference, May 2016, pp. 1–5.
 S. Niknam and B. Natarajan, “On the regimes in millimeter wave networks: Noise-limited or interference-limited?” CoRR, Apr. 2018. [Online]. Available: https://arxiv.org/abs/1804.03618
 Z.-Q. Luo and W. Yu, “An introduction to convex optimization for communications and signal processing,” IEEE J. Select. Areas Commun., vol. 24, no. 8, pp. 1426–1438, Aug 2006.
 C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3, pp. 279–292, 1992. [Online]. Available: http://dx.doi.org/10.1007/BF00992698
 M. Demirbas, A. Arora, V. Mittal, and V. Kulathumani, “A fault-local self-stabilizing clustering service for wireless ad hoc networks,” IEEE Trans. Parallel Distrib. Syst., vol. 17, no. 9, pp. 912–922, Sept 2006.
 L. Panait and S. Luke, “Cooperative multi-agent learning: The state of the art,” Autonomous Agents and Multi-Agent Systems, vol. 11, no. 3, pp. 387–434, Nov 2005. [Online]. Available: https://doi.org/10.1007/s10458-005-2631-2
 R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 1st ed. Cambridge, MA, USA: MIT Press, 1998.
 H. Saad, A. Mohamed, and T. ElBatt, “Distributed cooperative Q-learning for power allocation in cognitive femtocell networks,” in Proc. IEEE Veh. Technol. Conf., Sept 2012, pp. 1–5.
 R. Jain, D. Chiu, and W. Hawe, “A quantitative measure of fairness and discrimination for resource allocation in shared computer systems,” CoRR, 1998. [Online]. Available: http://arxiv.org/abs/cs.NI/9809099