Reinforcement Learning in Multiple-UAV Networks: Deployment and Movement Design

Xiao Liu,  Yuanwei Liu, 
and Yue Chen, 
X. Liu, Y. Liu, and Y. Chen are with Queen Mary University of London, London E1 4NS, UK. (email: x.liu@qmul.ac.uk; yuanwei.liu@qmul.ac.uk; yue.chen@qmul.ac.uk). Part of this work was presented in the IEEE Global Communication Conference Workshops 2018 [1].
Abstract

A novel framework is proposed for quality of experience (QoE)-driven deployment and dynamic movement of multiple unmanned aerial vehicles (UAVs). The joint non-convex problem of three-dimensional (3D) deployment and dynamic movement of the UAVs is formulated to maximize the sum mean opinion score (MOS) of ground users, and is proved to be NP-hard. To solve this pertinent problem, a three-step approach is proposed for attaining the 3D deployment and dynamic movement of multiple UAVs. Firstly, a genetic-algorithm-based K-means (GAK-means) algorithm is utilized to obtain the cell partition of the users. Secondly, a Q-learning based deployment algorithm is proposed, in which each UAV acts as an agent, making its own decision to attain its 3D position by learning from trial and error. In contrast to conventional genetic-algorithm-based learning algorithms, the proposed algorithm is capable of training the direction selection strategy offline. Thirdly, a Q-learning based movement algorithm is proposed for the scenario in which the users are roaming. The proposed algorithm is capable of converging to an optimal state. Numerical results reveal that the proposed algorithms converge after a small number of iterations. Additionally, the proposed Q-learning based deployment algorithm outperforms the K-means and Iterative-GAKmean (IGK) algorithms with a low complexity.

Dynamic movement, Q-learning, quality of experience (QoE), three-dimensional deployment, unmanned aerial vehicle (UAV)

I Introduction

The unprecedented demand for high-quality wireless services imposes enormous challenges on conventional cellular communication networks. UAV-assisted communication, in which UAVs are employed as aerial base stations to assist the existing terrestrial communication infrastructure, has been viewed as a promising candidate technique for tackling problems such as quick-response wireless service recovery after unexpected infrastructure damage or natural disasters, as well as cellular network offloading in hot spots such as sports stadiums and outdoor events [2, 3, 4]. Additionally, UAVs can also be employed in Internet of Things (IoT) networks, which require massive connectivity [5, 6]. In IoT networks, UAVs are capable of collecting data from ground IoT devices in a given geographical area in which constructing a complete cellular infrastructure is unaffordable [7, 8].

Thanks to their agility and mobility, low cost, and beneficial line-of-sight (LoS) propagation, UAV-assisted communication networks are highly desirable for the following reasons:

  • Agility feature: UAVs can be deployed quickly and easily to offer high-quality services in quick-response scenarios. Additionally, deploying a UAV as an aerial base station is far more affordable than deploying terrestrial base stations [3].

  • LoS feature: Being deployed in three-dimensional space, UAVs can establish LoS connections towards ground users, which are significantly less affected by shadowing and fading [9].

  • Mobility feature: UAVs are capable of adjusting their 3D positions dynamically and swiftly according to users’ real-time locations, which enables them to provide flexible, on-demand service to users [8].

Sparked by the aforementioned characteristics of UAV-assisted communication networks, the American wireless carrier AT&T [10] has conducted research on employing UAVs to develop airborne LTE services. UAVs have been utilized by Huawei and China Mobile as aerial base stations in emergency communications, and their achievements were exhibited at the 2018 Mobile World Congress (MWC) in Barcelona. Additionally, a UAV-based aerial base station equipped for fifth-generation (5G) networks underwent its first trial in April 2018, conducted by Huawei in Shenzhen. With the number of UAVs and the requirements placed on them increasing more than tenfold year by year, designing UAV-assisted wireless networks faces enormous challenges.

I-A Related Works

In order to fully reap the benefits of deploying UAVs for communication purposes, some core technical challenges need to be addressed first, for instance, the 3D placement of multiple UAVs, interference elimination [11, 12], energy efficiency optimization [13, 14, 15], trajectory/movement design [16], air-ground channel modeling [17], power/bandwidth allocation [8], and the compatibility between aerial wireless networks and existing cellular networks. Among all these challenges, the deployment of UAVs is the fundamental and foremost one. Several research groups have made great contributions towards solving the aforementioned problems.

I-A1 Deployment of Static UAVs

Early research contributions studied the deployment of static UAVs in both single-UAV and multi-UAV scenarios. The authors in [2] proposed a solution for deploying a single aerial base station in the scenario in which ground base stations were destroyed by a natural disaster. In the letter [18], the authors proposed an analytical solution for optimizing a UAV’s altitude to provide maximum radio coverage for the users. In [19], the authors presented an optimal deployment algorithm for a UAV base station that provides wireless service to the maximal number of connected ground users while also minimizing the UAV’s transmit power. However, these contributions mainly focus on the single-UAV problem; cooperative deployment of multiple UAVs needs to be designed to provide reliable service for users. In [20], the authors proposed a novel framework for maximizing the throughput of users while considering fairness between them; the three-dimensional placement of the UAVs was obtained with the aid of circle packing theory. Both the total coverage radius and the operating lifetime of the UAVs were maximized in [21]. However, high mobility and dynamic movement design of multiple UAVs are not considered in the aforementioned studies.

I-A2 Deployment of Mobile UAVs

It is intuitive that a UAV with movement is capable of better performance than a static UAV when users are mobile. When faced with the challenge of trajectory design, current research mainly considers the scenario in which ground users are static. The trajectory of a single UAV was designed in [22] for serving each user via TDMA, even though orthogonal multiple access [23, 24] had been proposed to improve the system capacity of 5G networks. A cyclical trajectory was then designed to provide more reliable services for users, and a significant throughput gain over the case of a single static UAV was obtained. In [25], the authors presented a low-complexity solution for the trajectory design of a single UAV to maximize the minimum average throughput of all users. However, only a single UAV was considered in this research, and its trajectory was restricted to a simple circle. Research on the dynamic movement of multiple UAVs remains largely unexplored.

I-A3 Reinforcement learning in UAV-assisted communications

Machine learning applications have gained remarkable attention in wireless communication networks during recent decades, and reinforcement learning has been shown to be capable of tackling problems in UAV-assisted wireless networks [26]. An interference-aware path planning scheme based on deep reinforcement learning was proposed for a network of cellular-connected UAVs in [27], achieving better wireless latency and transmit rate. In [28], the authors proposed a UAV relay scheme based on both reinforcement learning and deep reinforcement learning. With the aid of these two algorithms, the energy consumption of the UAV is minimized while a better bit error rate (BER) performance is achieved. However, the multiple-UAV scenario was not considered in that article, and the movement of ground users was also neglected in order to simplify the system model.

I-B Our New Contributions

Again, deploying UAVs as aerial base stations has attracted increasing interest in both academia and industry [29]. However, few current research contributions consider the 3D deployment of multiple UAVs. The real-time movement of ground users is also overlooked by current articles; accounting for it, the mobility of the UAVs needs to be considered to attain a better quality of service. In contrast to current articles [22, 27], which design the trajectory of a UAV with a fixed source and destination, the movement of the UAVs in this article has no fixed destination due to the mobility of the users. Additionally, a research gap remains in investigating QoE-driven UAV-assisted communications, which is worth studying to further improve wireless services. The motivations of this work are summarized as follows:

  • The mobility of UAVs has not been designed around the movement of users. Current research contributions address the two-dimensional placement of multiple UAVs or the mobility of a single UAV while ground users remain static. The UAVs need to travel when ground users are roaming continuously in order to maintain good connections.

  • QoE has not been considered in UAV-assisted communications. Most existing research contributions address this issue without considering the specific requirements of different ground users. QoE is invoked to capture the users’ satisfaction, and it ought to be considered in UAV-assisted wireless networks.

  • The three-dimensional dynamic movement of UAVs has not been considered in UAV-assisted wireless networks. Most existing studies only focus on the 2D movement design of UAVs. UAVs are capable of adjusting their 3D positions dynamically and swiftly to maintain high performance. Hence, the 3D dynamic movement design of UAVs needs to be investigated.

Therefore, the problem of 3D deployment and dynamic movement design of multiple UAVs is formulated to improve the users’ QoE instead of throughput. Our new contributions are summarized as follows:

  1. We propose a novel QoE-driven multi-UAV assisted communication framework, in which multiple UAVs are deployed in a 3D space to serve mobile users. Mean opinion score (MOS) is adopted for evaluating the satisfaction of users. Meanwhile, we formulate the sum MOS maximization problem by optimizing the UAVs’ placement and dynamic movement.

  2. We develop a three-step approach for solving the formulated problem. i) We invoke GAK-means algorithm for obtaining initial cell partition. ii) We develop a Q-learning based deployment algorithm for attaining 3D placement of UAVs when users are static at the initial time. iii) We develop a Q-learning based movement algorithm for designing 3D dynamic movement of the UAVs.

  3. We conceive a Q-learning based solution to solve the NP-hard 3D deployment and movement problem of the UAVs. In contrast to conventional genetic-algorithm-based learning algorithms, the proposed algorithm is capable of training the policy by learning from trial and error.

  4. We demonstrate that the proposed algorithms show a fast convergence. Additionally, the proposed Q-learning based deployment algorithm outperforms K-means algorithms and IGK algorithms with a low complexity.

I-C Organization

In Section II, the problem formulation for the QoE-based 3D deployment and dynamic movement of multiple UAVs is presented. In Section III, efficient cell partition and UAV deployment for QoE maximization with static users are proposed. In Section IV, the Q-learning algorithm is utilized to attain the UAVs’ dynamic movement when users are roaming. Numerical results for both the deployment and the dynamic movement of multiple UAVs are illustrated in Section V. The conclusions are presented in Section VI, followed by some essential proofs. Additionally, the list of notations is given in Table I.

TABLE I: List of Notations
Notations Description Notations Description
U Ground users’ number N Cluster number and UAVs’ number
{x_{{k_{n}}}},{y_{{k_{n}}}} Ground users’ coordinates {x_{n}},{y_{n}} UAVs’ coordinate
f_{c} Carrier frequency h_{n} UAVs’ altitudes
{P_{\max}} UAV’s maximum transmit power {g_{{k_{n}}}} Channel power gain from UAVs to users
N_{0} Noise power spectral density B Total bandwidth
{\mu_{LoS}},{\mu_{NLoS}} Additional path loss for LoS and NLoS {P_{LoS}},{P_{NLoS}} LoS and NLoS probability
{\Gamma_{{k_{n}}}} Received SINR of ground users {I_{{k_{n}}}} Received interference of ground users
{r_{{k_{n}}}} Instantaneous achievable rate {R_{sum}} Overall achievable sum rate
b_{1},b_{2} Environmental parameters (dense urban) \alpha Path loss exponent
{\text{MOS}}({k_{n}}) Overall achievable MOS s_{t} State in Q-learning model at time t
a_{t} Action in Q-learning model at time t r_{t} Reward in Q-learning model at time t

II System Model

II-A System Description

Consider the downlink transmission in a UAV-assisted wireless network, where multiple UAVs are deployed as aerial base stations to support the mobile users in a particular area. As illustrated in Fig. 1, this area is divided into N clusters, and the users are denoted by K=\left\{{{K_{1}},\cdots{K_{N}}}\right\}, where {K_{n}} is the set of users partitioned into cluster n, n\in\mathbb{N}=\left\{{1,2,\cdots N}\right\}. The clustering scheme is discussed in Section III. Each user belongs to only one cluster, thus {K_{n}}\cap{K_{{n^{\prime}}}}=\phi,{n^{\prime}}\neq n,\forall{n^{\prime}},n\in\mathbb{N}, while {{\rm K}_{n}}=\left|{{K_{n}}}\right| is the total number of users partitioned into cluster n. In each cluster n, n\in\mathbb{N}, a single UAV is dispatched. The UAVs are able to connect with the core network via satellite or a fully working terrestrial base station. At any time t, multiple users in the same cluster are served by the same UAV simultaneously by employing FDMA [30]. The coordinate of each user is denoted by {w_{{k_{n}}}}={[{x_{{k_{n}}}}(t),{y_{{k_{n}}}}(t)]^{T}}\in{\mathbb{R}^{2\times 1}},{k_{n}}\in{K_{n}}, where {x_{{k_{n}}}}(t) and {y_{{k_{n}}}}(t) are the coordinates of user k_{n} at time t. The movement of ground users is discussed in Section IV. The vertical trajectory is denoted by {h_{n}}(t)\in[{h_{\min}},{h_{\max}}],0\leq t\leq{T_{n}}. The horizontal coordinate of UAV n is denoted by {q_{n}}(t)={[{x_{n}}(t),{y_{n}}(t)]^{T}}\in{\mathbb{R}^{2\times 1}}, with 0\leq t\leq{T_{n}}.

Furthermore, the distance from the UAV n to user k_{n} at time t can be expressed as

\displaystyle{{d_{{k_{n}}}}(t)=\sqrt{{h_{n}}^{2}(t)+{{[{x_{n}}(t)-{x_{{k_{n}}}}(t)]}^{2}}+{{[{y_{n}}(t)-{y_{{k_{n}}}}(t)]}^{2}}}}. (1)
Fig. 1: QoE-driven deployment of multiple UAVs in wireless communications.

II-B Signal Model

Being deployed in 3D space, the UAVs have a higher probability of LoS connections than terrestrial communication infrastructure. The LoS probability is given by [18]

\displaystyle{{P_{{\text{LoS}}}}({\theta_{{k_{n}}}})={b_{1}}{\left(\frac{180}{\pi}{\theta_{{k_{n}}}}-\zeta\right)^{{b_{2}}}}}, (2)

where {\theta_{{k_{n}}}}(t)={\sin^{-1}}[\frac{{{h_{n}}(t)}}{{{d_{{k_{n}}}}(t)}}] is the elevation angle between UAV n and the ground user {k_{n}}, and b_{1}, b_{2} and \zeta are constants determined by the environment. In this particular case, the altitude of UAV n, {h_{n}}(t)\in[{h_{\min}},{h_{\max}}],0\leq t\leq{T_{n}}, is supposed to be properly chosen in practice to balance the probability of a LoS channel against the path loss. The non-line-of-sight (NLoS) probability is then given by {P_{{\text{NLoS}}}}=1-{P_{{\text{LoS}}}}.

Thus, the channel power gain from UAV n to user k_{n} at time t can be expressed as

\displaystyle{{g_{{k_{n}}}}(t)={K_{0}}^{-1}d_{{k_{n}}}^{-\alpha}(t){[{P_{{\text{LoS}}}}{\mu_{{\text{LoS}}}}+{P_{{\text{NLoS}}}}{\mu_{{\text{NLoS}}}}]^{-1}}}, (3)

where {K_{0}}={\left({\frac{{4\pi{f_{c}}}}{c}}\right)^{2}}, \alpha is a constant denoting the path loss exponent, {\mu_{LoS}} and {\mu_{NLoS}} are the different attenuation factors for LoS and NLoS links, f_{c} is the carrier frequency, and c is the speed of light.

Assume that the total available bandwidth B is distributed equally to the users, so that the bandwidth of each user is {B_{{k_{n}}}}={B}/{{{\rm K}_{n}}}. Also assume that the total transmit power is allocated uniformly over the users, namely {p_{{k_{n}}}}={{P_{\max}}}/{{{\rm K}_{n}}}, where {P_{\max}} is the maximum allowed transmit power of each UAV.

As the interference received by the users from other UAVs can be mitigated by using the millimeter-wave (mmWave) band and zero-forcing (ZF) beamforming [16, 10], the received SNR {\Gamma_{{k_{n}}}}(t) of ground user k_{n} connected to UAV n at time t can be expressed as

\displaystyle{{\Gamma_{{k_{n}}}}(t)=\frac{{{p_{{k_{n}}}}(t){g_{{k_{n}}}}(t)}}{{{\sigma^{2}}}}}, (4)

where {\sigma^{2}}={B_{{k_{n}}}}N_{0}, with N_{0} denoting the power spectral density of the additive white Gaussian noise (AWGN) at the receivers. Assuming an SNR target \gamma that must be met by all ground users, we require {\Gamma_{{k_{n}}}}(t)\geq\gamma.
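For concreteness, the air-ground channel model of (1)-(4) can be sketched in Python as follows. All numerical constants here (environmental parameters b_1, b_2, \zeta, attenuation factors, carrier frequency, path loss exponent) are illustrative assumptions, not the paper's simulation settings:

```python
import math

# Assumed (illustrative) parameters -- not the paper's settings.
B1, B2, ZETA = 0.36, 0.21, 0.0       # dense-urban environmental parameters
MU_LOS, MU_NLOS = 2.0, 200.0         # additional path loss (linear), LoS / NLoS
FC, C = 2e9, 3e8                     # carrier frequency [Hz], speed of light [m/s]
ALPHA = 2.0                          # path loss exponent

def received_snr(p_k, h, dx, dy, sigma2):
    """SNR of a user at horizontal offset (dx, dy) from a UAV at altitude h."""
    d = math.sqrt(h ** 2 + dx ** 2 + dy ** 2)            # distance, Eq. (1)
    theta = math.asin(h / d)                             # elevation angle [rad]
    p_los = B1 * (180.0 / math.pi * theta - ZETA) ** B2  # LoS probability, Eq. (2)
    p_los = min(max(p_los, 0.0), 1.0)                    # clip to a valid probability
    k0 = (4.0 * math.pi * FC / C) ** 2
    # channel power gain, Eq. (3)
    g = (d ** -ALPHA) / (k0 * (p_los * MU_LOS + (1.0 - p_los) * MU_NLOS))
    return p_k * g / sigma2                              # SNR, Eq. (4)
```

A user directly below the UAV sees a shorter distance and a higher LoS probability, so its SNR exceeds that of a distant user, matching the model's qualitative behavior.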

Lemma 1.

To guarantee that all the users are capable of connecting to the network, we have a constraint on the UAVs’ transmit power, which can be expressed as

\displaystyle{{P_{\max}}\geq\gamma{\sigma^{2}}{K_{0}}{\mu_{LoS}}\cdot\max\left\{{{h_{1}},{h_{2}},\cdots{h_{n}}}\right\}}. (5)
Proof.

See Appendix A . ∎

Lemma 1 gives the constraint on the transmit power of the UAVs. When choosing the type of UAV for a practical application, the transmit power is supposed to be no less than the lower bound in (5). In this case, the received SNR of every ground user can exceed the SNR threshold.

The user k_{n}’s transmit rate {r_{{k_{n}}}}(t) at time t can be expressed as

\displaystyle{{r_{{k_{n}}}}(t)={B_{{k_{n}}}}{\log_{2}}[1+\frac{{{p_{{k_{n}}}}(t){g_{{k_{n}}}}(t)}}{{{\sigma^{2}}}}]}. (6)

Thus, the total transmit rate of all users at any time t is given by

\displaystyle{{R_{{\text{sum}}}}=\sum\limits_{n=1}^{N}{\sum\limits_{{k_{n}}=1}^{{{\rm K}_{n}}}{{r_{{k_{n}}}}(t)}}}. (7)
Proposition 1.

The altitude of UAV n has to satisfy

\displaystyle{{d_{{k_{n}}}}(t)\sin\left[{\frac{\pi}{{180}}(\zeta+{e^{M}})}\right]\leq{h_{n}}(t)\leq{\left({\frac{{{P_{\max}}}}{{\gamma{K_{0}}{\sigma^{2}}{\mu_{LoS}}}}}\right)^{1/\alpha}}}, (8)

where M=\ln\left[{\left({\frac{S}{{\mu_{LoS}}-{\mu_{NLoS}}}-\frac{{\mu_{NLoS}}}{{\mu_{LoS}}-{\mu_{NLoS}}}}\right)/{b_{1}}}\right]/{b_{2}} and S=\frac{{{P_{\max}}}}{{\gamma{K_{0}}{\sigma^{2}}d_{{k_{n}}}^{\alpha}(t)}}. The other parameters are introduced in Section II.

Proof.

See Appendix B . ∎

Proposition 1 provides the altitude constraint for the UAVs. From Proposition 1, it is observed that the lower bound on the altitude is a function of the distance {d_{{k_{n}}}}(t), while the upper bound is a function of the maximum transmit power {P_{\max}}. Hence, as the distance and transmit power vary, the altitude of the corresponding UAV needs to be adjusted to provide reliable service to the users.
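The bounds in (8) can be evaluated numerically. The sketch below simply transcribes the closed-form expressions of Proposition 1, with every input supplied by the caller (the values in the usage note are illustrative, dimensionless assumptions):

```python
import math

def altitude_bounds(d, p_max, gamma, k0, sigma2, mu_los, mu_nlos,
                    b1, b2, zeta, alpha):
    """Lower and upper altitude bounds of Eq. (8) for a user at distance d."""
    s = p_max / (gamma * k0 * sigma2 * d ** alpha)        # S in Proposition 1
    m = math.log((s / (mu_los - mu_nlos)
                  - mu_nlos / (mu_los - mu_nlos)) / b1) / b2
    h_min = d * math.sin(math.pi / 180.0 * (zeta + math.exp(m)))
    h_max = (p_max / (gamma * k0 * sigma2 * mu_los)) ** (1.0 / alpha)
    return h_min, h_max
```

For instance, with normalized units d=1, P_max=101, \gamma=K_0=\sigma^2=1, \mu_LoS=2, \mu_NLoS=200, b_1=0.36, b_2=0.21, \zeta=0 and \alpha=2, a valid altitude window with h_min < h_max is returned; outside the region where the argument of the logarithm is positive, no feasible altitude exists.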

II-C Quality-of-Experience Model

As illustrated in Fig. 1, the required transmit rates of different users vary. Hence, a QoE model is essential in UAV-assisted communication networks.

Definition 1.

QoE is a metric that depends on a person’s preferences towards a particular object or service related to expectations, experiences, behaviour, cognitive abilities, object’s attributes and the environment [31].

MOS is invoked as the measure of a user’s QoE [32, 33], and the MOS model is defined as follows

\displaystyle{{\text{MOS}}({k_{n}})={\xi_{1}}{\text{MO}}{{\text{S}}_{delay}}({k_{n}})+{\xi_{2}}{\text{MO}}{{\text{S}}_{rate}}({k_{n}})}, (9)

where {\xi_{1}} and {\xi_{2}} are coefficients and {\xi_{1}}+{\xi_{2}}=1.

Definition 2.

We consider five hypotheses corresponding to the five alternatives (on the ordinal scale) for each QoE state, measured by the MOS received by the users: excellent (4.5), very good (3.5\sim 4.5), good (2\sim 3.5), fair (1\sim 2) and poor (1).

Derived from equation (9), the MOS model is simplified as follows [34]

\displaystyle{{\text{MOS}}({k_{n}})=-{C_{1}}\ln[d({r_{{k_{n}}}})]+{C_{2}}}, (10)

where d\left({{r_{{k_{n}}}}}\right) is the delay time related to the transmit rate, and {\text{MOS}}({k_{n}}) denotes the MOS score ranging from 1 to 4.5. Finally, C_{1} and C_{2} are constants determined by analyzing experimental results of web browsing applications, set to 1.120 and 4.6746, respectively [34].

The delay time in (10) is given by [34]

\displaystyle{d({r_{{k_{n}}}})=3{\text{RTT}}+\frac{{{\text{FS}}}}{{{r_{{k_{n}}}}}}+L\left({\frac{{{\text{MSS}}}}{{{r_{{k_{n}}}}}}}\right)+{\text{RTT}}-\frac{{2{\text{MSS}}\left({{2^{L}}-1}\right)}}{{{r_{{k_{n}}}}}}}, (11)

where RTT [s] is the round trip time, FS [bit] is the web page size, and MSS [bit] is the maximum segment size. The parameter L=\left\{{{L_{1}},{L_{2}}}\right\} represents the number of slow-start cycles with idle periods, defined as {L_{1}}={\log_{2}}\left({\frac{{{r_{{k_{n}}}}{\text{RTT}}}}{{{\text{MSS}}}}+1}\right)-1 and {L_{2}}={\log_{2}}\left({\frac{{{\text{FS}}}}{{{\text{2MSS}}}}+1}\right)-1.
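As a sketch, the MOS model of (10)-(11) can be evaluated in Python as follows. C_1 and C_2 are the values from [34]; the RTT, FS and MSS values are illustrative assumptions, and taking the number of slow-start cycles as the non-negative minimum of L_1 and L_2 is our reading of the model, not a statement from the paper:

```python
import math

C1, C2 = 1.120, 4.6746                       # constants from [34]
RTT, FS, MSS = 0.05, 1_000_000.0, 11_680.0   # [s], [bit], [bit] -- assumed values

def delay(r):
    """Page-load delay d(r) of Eq. (11) for transmit rate r [bit/s]."""
    l1 = math.log2(r * RTT / MSS + 1) - 1
    l2 = math.log2(FS / (2 * MSS) + 1) - 1
    l = max(min(l1, l2), 0.0)                # slow-start cycles (assumed: min, >= 0)
    return (3 * RTT + FS / r + l * (MSS / r) + RTT
            - 2 * MSS * (2 ** l - 1) / r)

def mos(r):
    """MOS score of Eq. (10), clipped to the ordinal 1-4.5 scale."""
    return min(max(-C1 * math.log(delay(r)) + C2, 1.0), 4.5)
```

Higher transmit rates shorten the page-load delay and hence raise the MOS, which is exactly the coupling the optimization in Section II-D exploits.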

Remark 1.

When the UAVs fly towards some users to acquire a better channel environment for them, they move farther away from other users. As a result, the MOS of the users farther from the UAVs becomes lower than that of the users closer to the UAVs.

Our approach can accommodate other MOS models without loss of generality. Meanwhile, the model invoked in this paper can also be extended to other applications in particular scenarios.

II-D Problem Formulation

Let Q=\left\{{{q_{n}}(t),0\leq t\leq{T_{n}}}\right\} and H=\left\{{{h_{n}}(t),0\leq t\leq{T_{n}}}\right\}. Again, we aim to optimize the positions of the UAVs at each timeslot, i.e., \left\{{{x_{n}}(t),{y_{n}}(t),{h_{n}}(t)}\right\}, n=1,2,\cdots N, t=0,1,\cdots{T_{n}}, to maximize the MOS of the users. Our optimization problem is then formulated as

\displaystyle\mathop{\max}\limits_{C,Q,H}\ {\text{MO}}{{\text{S}}_{{\text{total}}}}=\sum\limits_{n=1}^{N}{\sum\limits_{{k_{n}}=1}^{{{\rm K}_{n}}}{{\text{MO}}{{\text{S}}_{k_{n}}}({r_{{k_{n}}}})}}, (12a)
\displaystyle{\text{s.t.}}\ {K_{n}}\cap{K_{{n^{\prime}}}}=\phi,\ {n^{\prime}}\neq n,\ \forall n, (12b)
\displaystyle{h_{\min}}\leq{h_{n}}(t)\leq{h_{\max}},\ \forall t,\forall n, (12c)
\displaystyle{\Gamma_{{k_{n}}}}(t)\geq\gamma_{K_{n}},\ \forall t,\forall{k_{n}}, (12d)
\displaystyle\sum\limits_{{k_{n}}=1}^{{{\rm K}_{n}}}{{p_{{k_{n}}}}(t)}\leq{P_{\max}},\ \forall t,\forall{k_{n}}, (12e)
\displaystyle{p_{{k_{n}}}}(t)\geq 0,\ \forall{k_{n}},\forall t, (12f)

where {K_{n}} is the set of users partitioned into cluster n, {h_{n}}(t) is the altitude of UAV n, and {p_{{k_{n}}}}(t) is the transmit power from UAV n to ground user k_{n}. Furthermore, (12b) states that each user is partitioned into a single cluster; (12c) is the altitude constraint of the UAVs; (12d) formulates the minimum SNR target for each user to connect to the network; (12e) and (12f) give the transmit power constraints of the UAVs. Given the bandwidth and power allocation of the UAVs, this problem is simplified into the problem of optimizing the UAVs’ movement. Since the movement of the users affects the transmit rate they receive, and thus whether their QoE requirements are satisfied, the UAVs have to travel based on the real-time movement of the users to maintain their QoE; hence, the problem of maximizing the MOS in (12a) inherently incorporates the design of dynamic movement for the UAVs.

Theorem 1.

Problem (12a) is non-convex since the objective function is non-convex over {x_{n}}(t), {y_{n}}(t) and {h_{n}}(t) [13, 8].

Proof.

See Appendix C . ∎

From Theorem 1, it can be observed that the MOS of the users is affected by the UAVs’ altitudes. This is because both the distance and the LoS probability between the UAVs and the users are related to the UAVs’ altitudes. Increasing a UAV’s altitude leads to a higher path loss while a higher LoS probability is obtained, so the UAV has to increase its transmit power to satisfy the users’ QoE requirements.

Remark 2.

The sum MOS depends on the transmit power, the number, and the positions (both horizontal positions and altitudes) of the UAVs.

Fig. 2: The procedure for solving the problem of deploying and movement design of multiple UAVs.

Due to the combinatorial nature of cell partition and optimal position searching, exhaustive search provides a straightforward method for obtaining the final results. We propose an efficient GAK-means-based IGK algorithm for solving this problem. In addition, a low-complexity Q-learning based 3D deployment algorithm is also developed to avoid the huge complexity of the IGK algorithm. Furthermore, when the initial positions of the UAVs are fixed, obtaining the dynamic movement is still nontrivial due to the non-convexity of problem (12) in terms of sum MOS maximization. Q-learning has proved to be an efficient solution for problems that have an exact transition formulation, which motivates applying the Q-learning algorithm to this problem. The scheme is discussed in the next section.

III Three-Dimensional deployment of the UAVs

We consider the scenario in which UAV n hovers above the users at an adjustable altitude while the users are static. The bandwidth and the transmit power of each UAV are allocated uniformly to its users. Thus, the optimization problem is simplified to a region segmentation problem, which is formulated as

\displaystyle\mathop{\max}\limits_{C,Q,H}\ {\text{MO}}{{\text{S}}_{{\text{total}}}}=\sum\limits_{n=1}^{N}{\sum\limits_{{k_{n}}=1}^{{{\rm K}_{n}}}{{\text{MO}}{{\text{S}}_{k_{n}}}({r_{{k_{n}}}})}}, (13a)
\displaystyle{\text{s.t.}}\ {K_{n}}\cap{K_{{n^{\prime}}}}=\phi,\ {n^{\prime}}\neq n,\ \forall n, (13b)
\displaystyle{h_{\min}}\leq{h_{n}}\leq{h_{\max}},\ \forall n, (13c)
\displaystyle{\Gamma_{{k_{n}}}}\geq\gamma_{K_{n}},\ \forall{k_{n}}, (13d)
\displaystyle\sum\limits_{{k_{n}}=1}^{{{\rm K}_{n}}}{{p_{{k_{n}}}}}\leq{P_{\max}},\ \forall{k_{n}}, (13e)
\displaystyle{p_{{k_{n}}}}\geq 0,\ \forall{k_{n}}. (13f)
Theorem 2.

Problem (13a) is NP-hard. More specifically, it is NP-hard even when only the user clustering is considered.

Proof.

See Appendix D . ∎

III-A Initial Algorithm for Cell Partition of Ground Users

User clustering is the first step towards obtaining the deployment of the UAVs. It is the process of partitioning the users into different clusters; in each cluster, the users are served by a single UAV. The K-means algorithm, also named Lloyd’s algorithm, can solve the clustering problem and obtain the initial 3D positions of the UAVs with a low complexity. This algorithm partitions users into clusters based on the nearest-neighbor-barycenter policy and recalculates the barycenter of each cluster [35]. Specifically, the K-means algorithm minimizes the squared error between the empirical mean of a cluster and the points in it [36]. By invoking the K-means algorithm, the users are partitioned into N clusters.

However, K-means is a greedy algorithm whose result has a high probability of converging to a local minimum even when the clusters are well separated. The reason is the sensitivity of the K-means algorithm to the initial centers [35]; as a result, its performance is not deterministic over runs. To bridge this gap of the K-means algorithm, the GAK-means algorithm, based on the genetic algorithm, is invoked for obtaining globally optimal clustering results.
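For reference, the plain K-means (Lloyd) iteration described above can be sketched as follows; the genetic-algorithm seeding that distinguishes GAK-means is omitted, and the random initial centers are exactly what makes the plain algorithm's result vary over runs:

```python
import random

def kmeans(users, n_clusters, n_iters=50, seed=0):
    """Lloyd's algorithm on 2-D user positions given as (x, y) tuples."""
    rng = random.Random(seed)
    centers = rng.sample(users, n_clusters)   # random initial centers
    labels = [0] * len(users)
    for _ in range(n_iters):
        # assignment step: each user joins its nearest center
        labels = [min(range(n_clusters),
                      key=lambda c: (u[0] - centers[c][0]) ** 2
                                  + (u[1] - centers[c][1]) ** 2)
                  for u in users]
        # update step: each center moves to the barycenter of its cluster
        for c in range(n_clusters):
            members = [u for u, l in zip(users, labels) if l == c]
            if members:
                centers[c] = (sum(u[0] for u in members) / len(members),
                              sum(u[1] for u in members) / len(members))
    return centers, labels
```

On well-separated user groups this converges to the expected partition, but a different seed can land in a different local minimum, which is the motivation for the GA seeding.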

III-B Q-learning Algorithm for 3D Deployment of UAVs

In this section, given the cell partition of the users, our goal is to obtain the 3D locations of the UAVs that maximize the sum MOS. In the process of cell partition, the GAK-means algorithm obtains a group of UAV locations with minimal total Euclidean distance. As the MOS of a particular user is related to the distance between this user and its UAV, GAK-means can be regarded as a low-complexity scheme for obtaining the deployment of the UAVs. However, the objective of GAK-means is minimizing the sum Euclidean distance, while it can be noticed from (11) that the transmit rate is related not only to the Euclidean distance between users and UAV, but also to the LoS probability. GAK-means is therefore not effective for solving the 3D deployment problem. Although the Iterative-GAKmean (IGK) algorithm is able to obtain the optimal altitudes of the UAVs through an iterative process, its complexity is high: the worst-case computational complexity of this algorithm is O(N({U^{2}}+U+1)). Hence, a more effective algorithm is required to solve the 3D deployment problem.

In the Q-learning model, the UAVs act as agents, and the model consists of four parts: states, actions, rewards and Q-values. The aim of Q-learning is to attain a policy that maximizes the rewards observed over the interaction time of the agents. At each time slot t of an iteration, each agent observes a state, {s_{t}}, from the state space, S. Accordingly, each agent carries out an action, {a_{t}}, from the action space, A, selecting a direction based on the policy, J. The decision policy J is determined by a Q-table, Q({s_{t}},{a_{t}}): at each time slot, the policy chooses the action that yields the maximum Q-value. Following the action, the state of each agent transitions to a new state {s_{t+1}} and the agent receives a reward, {r_{t}}, determined by the instantaneous sum MOS of the ground users.

III-B1 State Representation

Each agent is represented using a three-component UAV state defined as \xi=(x_{{\text{UAV}}},y_{{\text{UAV}}},h_{{\text{UAV}}}), where (x_{{\text{UAV}}},y_{{\text{UAV}}}) is the horizontal position of the UAV and h_{{\text{UAV}}} is its altitude. As the UAVs are deployed in a particular district, the state space can be denoted as {x_{{\rm{UAV}}}}:\left\{{0,1,\cdots{X_{d}}}\right\}, {y_{{\rm{UAV}}}}:\left\{{0,1,\cdots{Y_{d}}}\right\}, {h_{{\rm{UAV}}}}:\left\{{{h_{\min}},\cdots{h_{\max}}}\right\}, where {X_{d}} and {Y_{d}} are the maximum coordinates of this district.

For each trial, the initial state (position) of each UAV is determined randomly; the update rate of the Q-table depends on the initial positions of the UAVs and on the numbers of users and UAVs. The closer a UAV starts to its optimal position, the faster the Q-table is updated.

Algorithm 1 Q-learning algorithm for 3D deployment of UAVs
0:    Q-learning structure, environment simulator.
  Cell partition of ground users based on the GAK-means algorithm.
  Initialize Q(s,a) arbitrarily; deploy the UAVs at random points.
  repeat
      for each step of the episode:
      Choose action a according to the policy derived from Q (maximum Q-value);
      Carry out action a, observe reward r and the next state {s^{\prime}};
      Q(s,a)\leftarrow Q(s,a)+\alpha[r+\beta\max_{a^{\prime}}Q(s^{\prime},a^{\prime})-Q(s,a)];
      s\leftarrow{s^{\prime}};
  until s is terminal
  Evaluate the results.
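The deployment procedure above can be sketched end to end for a single UAV. In this illustrative fragment, sum_mos() is a placeholder objective standing in for the MOS evaluation in (15), and the grid bounds, episode counts and hyper-parameters are our own demo choices, not values from the paper.

```python
import random

# Tabular Q-learning sketch of Algorithm 1 for one UAV on a discretized 3D grid.
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0),
           (0, 0, 1), (0, 0, -1), (0, 0, 0)]          # the 7 flying directions
X_D, Y_D, H_MIN, H_MAX = 20, 20, 2, 10                # district / altitude bounds
ALPHA, BETA, EPS = 0.1, 0.7, 0.2                      # learning rate, discount, exploration

def sum_mos(s):
    # Placeholder objective with a single optimum; the paper evaluates (15) here.
    opt = (12, 7, 5)
    return -sum((a - b) ** 2 for a, b in zip(s, opt))

def clip(s):
    # Keep the UAV inside the district and the altitude band.
    return (min(max(s[0], 0), X_D), min(max(s[1], 0), Y_D),
            min(max(s[2], H_MIN), H_MAX))

def train(episodes=150, steps=120, seed=0):
    random.seed(seed)
    Q = {}                                            # lazily filled Q-table
    for _ in range(episodes):
        s = (random.randint(0, X_D), random.randint(0, Y_D),
             random.randint(H_MIN, H_MAX))            # random initial deployment
        for _ in range(steps):
            if random.random() < EPS:                 # epsilon-greedy exploration
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a: Q.get((s, a), 0.0))
            s2 = clip(tuple(s[i] + a[i] for i in range(3)))
            # Reward as in (15): +1 / -0.1 / -1 by comparing the sum MOS.
            r = 1 if sum_mos(s2) > sum_mos(s) else (-0.1 if sum_mos(s2) == sum_mos(s) else -1)
            best = max(Q.get((s2, b), 0.0) for b in ACTIONS)
            # Bellman update as in (16).
            Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (r + BETA * best - Q.get((s, a), 0.0))
            s = s2
    return Q

def greedy_rollout(Q, s, steps=60):
    # Replay the learned greedy policy from a given starting position.
    for _ in range(steps):
        a = max(ACTIONS, key=lambda a: Q.get((s, a), 0.0))
        s = clip(tuple(s[i] + a[i] for i in range(3)))
    return s
```

Since the table is trained offline, a deployment is then read off by a greedy rollout from any starting point, mirroring how the direction-selection strategy is trained before flight.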

III-B2 Action Space

At each step, each agent (UAV) carries out an action {a_{t}}\in A, choosing a direction for itself according to the current state, {s_{t}}\in S, based on the decision policy J. Each UAV could fly in a continuum of directions, which makes the problem non-trivial to solve. However, by assuming that the UAVs fly with simple coordinated turns, we can reduce the action set to 7 directions. When the directions are appropriately chosen, this yields a tradeoff between accuracy and model complexity.

III-B3 State Transition Model

The transition from {s_{t}} to {s_{t+1}} with reward {r_{t}} when action {a_{t}} is carried out is characterized by the conditional transition probability p({s_{t+1}},{r_{t}}|{s_{t}},{a_{t}}). The goal of the Q-learning model is to maximize the long-term reward, expressed as

\displaystyle{G_{t}}={\rm E}\left[\sum\limits_{n=0}^{\infty}{\beta^{n}}{r_{t+n}}\right], (14)

where \beta is the discount factor.

The agents, states, actions and rewards in the Q-learning model are defined as follows:

  • Agent: UAV n, n\in\mathbb{N}=\left\{{1,2,\cdots N}\right\}.

  • State: The UAVs’ 3D positions (horizontal coordinates and altitudes) are modeled as states.

  • Actions: The action of each agent is a coordinate offset representing the flying direction of the UAV: (1,0,0) denotes a right move; (-1,0,0) a left move; (0,1,0) a forward move; (0,-1,0) a backward move; (0,0,1) an ascent; (0,0,-1) a descent; and (0,0,0) means the UAV stays static.

  • Rewards: The reward function is formulated from the instantaneous MOS of the users. If the action the agent carries out at the current time t improves the sum MOS, the UAV receives a positive reward; otherwise, it receives a negative reward. The reward function can therefore be expressed as

    \displaystyle{r_{t}}=\left\{{\begin{array}{ll}1,&{\text{if }}{\text{MOS}}_{{\text{new}}}>{\text{MOS}}_{{\text{old}}},\\ -0.1,&{\text{if }}{\text{MOS}}_{{\text{new}}}={\text{MOS}}_{{\text{old}}},\\ -1,&{\text{if }}{\text{MOS}}_{{\text{new}}}<{\text{MOS}}_{{\text{old}}}.\end{array}}\right. (15)
Remark 3.

The reward value of -0.1 when {\text{MOS}}_{{\text{new}}}={\text{MOS}}_{{\text{old}}} prevents the model from converging to a local optimum. The reward values can be refined further to increase the convergence rate.
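The rule in (15) can be transcribed directly; a minimal sketch in Python, where mos_new and mos_old stand for the sum MOS after and before the move:

```python
# Reward rule of (15): score an action by comparing the sum MOS after the move
# with the sum MOS before it; the -0.1 tie value discourages idling (Remark 3).
def reward(mos_new, mos_old):
    if mos_new > mos_old:
        return 1
    if mos_new == mos_old:
        return -0.1
    return -1
```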

During the course of learning, the state-action value function of each agent is iteratively updated as given in (16) at the top of the next page.

\displaystyle Q(s,a)\leftarrow(1-\alpha)\cdot\underbrace{Q(s,a)}_{{\text{old value}}}+\underbrace{\alpha}_{{\text{learning rate}}}\cdot\overbrace{[\underbrace{r}_{{\text{reward}}}+\underbrace{\gamma}_{{\text{discount factor}}}\cdot\underbrace{{\max}_{a^{\prime}}Q(s^{\prime},a^{\prime})}_{{\text{estimate of optimal future value}}}]}^{{\text{learned value}}}. (16)

The policy in the Q-learning based deployment algorithm is chosen according to the \epsilon-greedy policy. More specifically, the action that maximizes the Q-value is chosen with a high probability of 1-\epsilon, while the other actions are each selected with a low probability to avoid being trapped in a local maximum, i.e.,

\displaystyle Pr(J=\widehat{J})=\left\{\begin{matrix}1-\epsilon,&\widehat{a}=\arg\max_{a}Q\left(s,a\right),\\ \epsilon/\left(\left|A\right|-1\right),&{\text{otherwise}}.\end{matrix}\right. (17)
Remark 4.

The convergence rate of the proposed algorithm for the 3D deployment of the UAVs varies with the initial positions of the UAVs.
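The \epsilon-greedy rule in (17) can be sketched as a small helper; here q_row holds the Q-values of one state, and the action names are illustrative:

```python
import random

# Epsilon-greedy selection matching (17): the greedy action is returned with
# probability 1 - epsilon, and each remaining action with probability
# epsilon / (|A| - 1), since the non-greedy choice is uniform over them.
def epsilon_greedy(q_row, actions, epsilon):
    greedy = max(actions, key=lambda a: q_row.get(a, 0.0))
    if random.random() < 1 - epsilon:
        return greedy
    others = [a for a in actions if a != greedy]
    return random.choice(others)
```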

III-C Analysis of the Proposed Algorithms

TABLE II: Complexity of four algorithms
Algorithm:  K-means | IGK | Exhaustive search | Q-learning
Complexity: O(TUN) | O(TN({U^{2}}+U+1)) | O(TU{N^{U}}) | O(TUN)
Optimality: Suboptimal | Suboptimal | Optimal | Suboptimal
Stability:  Not stable | Not stable | Stable | Stable

III-C1 Complexity

The computational complexity of evaluating the square error of a given solution string is O(TU), that of the mutation operator is O(TU^{2}), and that of the K-means operator is O(TUN). Thus, the total complexity of GAK-means is O(TN({U^{2}}+U+1)). As for the Q-learning algorithm, since the training process is carried out offline, once the Q-table has been updated its complexity equals that of the K-means algorithm, namely O(TUN).

III-C2 Stability and Convergence

As the K-means algorithm is very sensitive to the initial centers, the performance of the K-means and IGK algorithms varies over runs. On the contrary, it has been proved in [37] that when the learning rate \alpha satisfies the convergence conditions \sum\nolimits_{t\geq 0}{\alpha_{t}}=+\infty and \sum\nolimits_{t\geq 0}{\alpha_{t}^{2}}<+\infty, then regardless of the initial settings of Q(s,a), if the number of trials is sufficiently large, the Q-learning algorithm converges to the optimal value with probability 1. The core of the Q-learning algorithm is updating the Q-table. Once the Q-table has been updated, the positions and altitudes of the UAVs are found, and the algorithm is stable.
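The two convergence conditions can be checked numerically for a concrete schedule. A minimal sketch, where the harmonic schedule \alpha_{t}=1/(t+1) and the 100000-step horizon are our illustrative choices, not values from the paper:

```python
# The harmonic schedule alpha_t = 1/(t+1) satisfies the convergence conditions
# quoted from [37]: its partial sums diverge like the harmonic series, while
# the partial sums of its squares stay bounded above by pi^2/6.
def harmonic_alpha(t):
    return 1.0 / (t + 1)

s1 = sum(harmonic_alpha(t) for t in range(100000))       # grows without bound as the horizon grows
s2 = sum(harmonic_alpha(t) ** 2 for t in range(100000))  # bounded above by pi^2/6
```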

IV Dynamic Movement Design of UAVs

In this section, a Q-learning based algorithm is proposed for obtaining the UAVs’ dynamic movement; the underlying problem can be proved to be non-convex [8]. As the users roam at each time slot, the optimal position of the UAV in each cluster changes with the positions of the users [16]. There is no standard method that solves this problem efficiently, so a reinforcement learning approach is invoked in this section.

Remark 5.

As the users move continuously, the optimal positions of the UAVs change at each time slot. The UAVs have to travel accordingly to offer better downlink service; otherwise, the users’ QoE requirements will not be satisfied.

Please note that, as the UAVs and ground users move continuously, users may leave their initial areas and mix with users initially allocated to other UAVs. In this case, updating the user-UAV association according to the new locations of both users and UAVs can further enhance the quality of service for the ground users. Nevertheless, it requires increasing the size of the action space in the Q-learning algorithm from 7 to 7+2N\sum\nolimits_{n=1}^{N}{\left|{{K_{n}}}\right|}, where N is the number of UAVs and \left|{{K_{n}}}\right| denotes the total number of users in the n-th cluster. The complexity of the algorithm would then increase tremendously, and obtaining a policy that maximizes the observed rewards would become non-trivial because of the huge Q-table. A potential solution is to invoke deep reinforcement learning to reduce the dimension of the Q-table, which will be discussed in our future work. In this paper, we study the trajectory design of the UAVs over a short time span, so we assume that users do not roam into other clusters. Practical user mobility will be considered in our future work.

Algorithm 2 Training Stage Procedure for 3D dynamic movement of UAVs
0:    Initial positions of the UAVs derived from Algorithm 1, real-time movement of ground users.
1:  Deploy the UAVs at the initial positions; randomly initialize the policy J.
2:  Repeat (for each step of the episode):
3:   Randomly choose an action a;
4:   Carry out action a, observe reward r and the next state {s^{\prime}};
5:   Update the positions of the ground users;
6:   Choose action a according to the policy derived from Q (maximum Q-value);
7:   s\leftarrow{s^{\prime}};
8:   Update the Q-table based on reward r.
Output: dynamic movement with maximum r.
Algorithm 3 Testing Stage Procedure for 3D dynamic movement of UAVs
0:    Initial positions of the UAVs derived from Algorithm 1, real-time movement of ground users.
1:  Deploy the UAVs at the initial positions.
2:  Repeat (for each time slot):
3:   Select the action a that obtains \max Q;
4:   Update the positions of the ground users;
5:  until terminal.
Output: 3D dynamic movement of the UAVs.

Before the dynamic movement of the UAVs can be obtained, the mobility of the users has to be modeled. A variety of mobility models exist, such as the deterministic approach, the hybrid approach and the random walk model [38, 39]. In this paper, we adopt the random walk model, also known as the Markovian mobility model (our approach can accommodate other mobility models without loss of generality), to represent the movement of the users. The direction of each user’s movement is uniformly distributed among left, right, forward and backward. The speed of each user is a pedestrian speed drawn from [0,c_{max}], where c_{max} denotes the maximum speed of a user. The random walk model can be replaced by other mobility models, such as the deterministic or hybrid approaches, so the proposed method for obtaining the dynamic movement of the UAVs can be applied regardless of how the ground users move. At each time slot, as the users move, the UAV takes an action {a_{t}}\in A, which again consists of choosing a direction, according to the current state, {s_{t}}\in S, based on the decision policy J; the state at the next time slot is then {s_{t+1}}={s_{t}}+{a_{t}}. Q-learning is used to obtain the action at each time slot.
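The random walk mobility model described above can be sketched as a single per-slot step; the c_max value and the 1 s slot duration below are our illustrative assumptions:

```python
import random

# Random-walk (Markovian) mobility step for one ground user: pick one
# axis-aligned direction uniformly and a pedestrian speed in [0, c_max].
def random_walk_step(pos, c_max=1.5, dt=1.0):
    dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    speed = random.uniform(0.0, c_max)
    return (pos[0] + dx * speed * dt, pos[1] + dy * speed * dt)
```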

In contrast to the Q-learning based deployment algorithm, the state in the proposed Q-learning based dynamic movement design algorithm consists of two parts, the dynamic 3D positions of the UAVs and the dynamic 2D positions of the users, denoted as \xi=(x_{{\text{UAV}}},y_{{\text{UAV}}},h_{{\text{UAV}}},x_{{\text{user}}},y_{{\text{user}}}). ({x_{{\text{user}}}},{y_{{\text{user}}}}) is determined by the initial positions and movement of the users, while ({x_{{\text{UAV}}}},{y_{{\text{UAV}}}},{h_{{\text{UAV}}}}) is determined by the positions of the UAVs and the actions they took in the previous time slot.

Remark 6.

The convergence rate of the proposed algorithm for the dynamic movement of the UAVs varies with the simulation timespan. The smaller the time slot, the more accurate the obtained dynamic movement, but the more complex the model. There is thus a tradeoff between improving the QoE of the users and the complexity of the proposed algorithm.

It is worth noting that the proposed Q-learning algorithm for the 3D dynamic movement of the UAVs builds on the results of the 3D deployment. At the beginning of the first time slot, the UAVs are at their optimal positions obtained by the Q-learning based deployment algorithm. Then, at each time slot, the optimal actions of the UAVs are attained based on the mobility of the users; these actions make up the dynamic movement of the UAVs.

TABLE III: Simulation parameters
Parameter Description Value
f_{c} Carrier frequency 2 GHz
{P_{\max}} UAVs’ maximum transmit power 20 dBm
N_{0} Noise power spectral density -170 dBm/Hz
U Number of ground users 100
N Number of UAVs 4
B Bandwidth 1 MHz
b_{1},b_{2} Environmental parameters (dense urban) 0.36, 0.21 [20]
\alpha Path loss exponent 2
{\mu_{LoS}} Additional path loss for LoS 3 dB [20]
{\mu_{NLoS}} Additional path loss for NLoS 23 dB [20]
\alpha_{l},\beta Learning rate, discount factor 0.01, 0.7

V Numerical Results

In this section, numerical results are presented to evaluate the QoE of the users under the proposed 3D deployment and dynamic movement schemes. In the simulation, 100 users are uniformly and independently distributed within an urban district of size 1{\text{km}}\times 1{\text{km}}, and the users roam following the random walk model at each time slot. The other simulation parameters are given in Table III.

V-A Numerical Results of 3D Deployment Problem

Fig. 3: Three dimensional distribution of UAVs and ground users.
Fig. 4: QoE of each ground user.

Fig. 3 plots the 3D deployment of the UAVs at the initial time. It can be observed that the ground users are divided into 4 asymmetric clusters, each served by one UAV. The UAVs in different clusters settle at different altitudes. This is because, as the altitude of a UAV increases, a higher path loss and a higher LoS probability are obtained simultaneously. The altitude of each UAV is therefore determined by the density and positions of the ground users.

Fig. 4 plots the QoE of each ground user. It can be observed that as the distance from the UAV to a ground user increases, the MOS of that user decreases, in line with the insights in Remark 1. The MOS values differ across clusters because the ground users have various QoE requirements. The MOS of a ground user is thus determined by its distance to the UAV as well as by its QoE requirement.

Fig. 5: Comparisons of QoE over different algorithms and user number.
Fig. 6: Comparisons of QoE over different algorithms and UAV number.
Fig. 7: Rate of convergence over Q-learning algorithm.

Fig. 5 plots the QoE versus the number of users for different algorithms. As analyzed in Section III, the complexity of the exhaustive search solution is non-trivial; to still show its results, different numbers of users are considered. From Fig. 5, it can be observed that the total MOS improves as the UAVs’ transmit power and the number of ground users increase. Additionally, the Q-learning algorithm achieves better performance than the K-means and IGK algorithms as the transmit power of the UAVs increases.

Fig. 6 plots the QoE of the ground users as the maximum transmit power of the UAVs varies. The K-means algorithm, the IGK algorithm and random deployment are shown as benchmarks. One can observe that the sum MOS increases with the transmit power of the UAVs, because the received SINR at the users improves, which in turn improves the sum MOS. Moreover, increasing the number of clusters improves the sum MOS, in line with the insights in Remark 2. When the transmit power of the UAVs is 5 dBm, the MOS is nearly the same for different cluster numbers, because the received SINR cannot satisfy the rate requirements of the users. It can also be observed from Fig. 6 that increasing the transmit power further enlarges the gap between different cluster numbers. Furthermore, the Q-learning algorithm outperforms the K-means and IGK algorithms while approaching exhaustive search. It is also worth noting that, with offline training, once the Q-table has been updated the complexity of the Q-learning algorithm equals that of the K-means algorithm.

Fig. 7 plots the convergence of the Q-learning algorithm, where each line corresponds to a different initial position of the UAVs in each trial. The figure shows that the Q-learning algorithm converges after about 20 iterations of the training process, and that the convergence remains stable thereafter because the Q-table has already been updated during training. This result matches the insights in Remark 4: the optimal positions are found after about 20 trials regardless of the initial positions of the UAVs.

V-B Numerical Results of Dynamic Movement Problem

Fig. 8: Comparisons of QoE when ground users are moving or static.
Fig. 9: Dynamic movement for one of the four UAVs when ground users are roaming.

Fig. 8 plots the sum MOS with the UAVs’ movement derived from the Q-learning algorithm. The sum MOS when the UAVs are static and the sum MOS with the movement derived from the IGK algorithm are illustrated as benchmarks. When the ground users move following the random walk model, the UAVs are supposed to move along with them; otherwise, the sum MOS decreases, as shown in Fig. 8 and anticipated by the insights in Remark 5. One can also observe that the Q-learning algorithm performs better than the IGK algorithm. Furthermore, the computational cost of using the IGK algorithm to obtain the dynamic movement of the UAVs is non-trivial. In contrast, the Q-learning algorithm enables each UAV to gradually learn its movement, from which the dynamic trajectory is obtained.

Fig. 9 plots the dynamic movement of one of the UAVs derived from the proposed approach when the ground users move; the duration is 100 s. In this simulation, we assume all UAVs move at a constant speed. At each time slot, a UAV chooses a direction from the action space of 7 directions, so that the resulting movement maximizes the QoE of the ground users at each time slot. It is important to note that the timespan can be adjusted to improve the accuracy of the dynamic movement; this in turn increases the number of iterations required for convergence. Therefore, a tradeoff exists between improving the QoE of the ground users and the running complexity of the proposed algorithm.

VI Conclusions

The QoE-driven 3D deployment and dynamic movement of multiple UAVs were jointly studied. Algorithms were proposed for maximizing the sum MOS of the users. After proving that the 3D deployment problem is non-convex, a three-step approach was provided to tackle it. More particularly, the GAK-means algorithm was invoked to obtain the initial cell partition, a Q-learning based deployment algorithm was proposed for attaining the 3D placement of the UAVs when the users are static, and a Q-learning based movement algorithm was proposed for obtaining the 3D dynamic movement of the UAVs. It was demonstrated that the proposed 3D deployment scheme outperforms the K-means and IGK algorithms with low complexity. Additionally, with the aid of the proposed approach, the 3D dynamic movement of the UAVs is obtained.

Appendix A: Proof of Lemma 1

The received SINR of user k_{n} at time t while connected to UAV n, denoted by {\Gamma_{{k_{n}}}}(t), can be expressed as

{\Gamma_{{k_{n}}}}(t)=\frac{{p_{{k_{n}}}}(t)\,{g_{{k_{n}}}}(t)}{{\sigma^{2}}}\geq\gamma. (A.1)

Equation (A.1) can be rewritten as

{P_{\max}}\geq\gamma{\sigma^{2}}{K_{0}}\,d_{{k_{n}}}^{\alpha}(t)\left({P_{LoS}}{\mu_{LoS}}+{P_{NLoS}}{\mu_{NLoS}}\right).

It can easily be proved that ({P_{LoS}}{\mu_{LoS}}+{P_{NLoS}}{\mu_{NLoS}})\leq{\mu_{LoS}}, with equality when the UAV-user link has a 100% probability of LoS connection.

Thus, we can obtain that

\begin{array}{l}{P_{\max}}\geq\gamma{\sigma^{2}}{K_{0}}\,d_{{k_{n}}}^{\alpha}(t){\mu_{LoS}}\\ \qquad\;\,\geq\gamma{\sigma^{2}}{K_{0}}{\mu_{LoS}}\cdot\max\left\{{{h_{1}},{h_{2}},\cdots,{h_{n}}}\right\}.\end{array}

The proof is completed.

Appendix B: Proof of Proposition 1

The received SINR of user k_{n} at time t while connected to UAV n, denoted by {\Gamma_{{k_{n}}}}(t), can be expressed as in (A.1), which can be rewritten as

{p_{{k_{n}}}}(t)\,{K_{0}}^{-1}d_{{k_{n}}}^{-\alpha}(t){L^{-1}}\geq\gamma{\sigma^{2}},

where L={P_{LoS}}({\theta_{{k_{n}}}}){\mu_{LoS}}+{P_{NLoS}}({\theta_{{k_{n}}}}){\mu_{NLoS}}. Then, d_{{k_{n}}}^{\alpha}(t)\leq\frac{{p_{{k_{n}}}}(t)}{\gamma{K_{0}}{\sigma^{2}}L}\leq\frac{{P_{\max}}}{\gamma{K_{0}}{\sigma^{2}}L}. In this case, we obtain that

\begin{gathered}{P_{LoS}}({\theta_{{k_{n}}}})={b_{1}}{\left(\frac{180}{\pi}{\theta_{{k_{n}}}}-\zeta\right)^{{b_{2}}}}\geq\frac{S}{({\mu_{LoS}}-{\mu_{NLoS}})}-\frac{{\mu_{NLoS}}}{{\mu_{LoS}}-{\mu_{NLoS}}},\end{gathered}

where S=\frac{{P_{\max}}}{\gamma{K_{0}}{\sigma^{2}}d_{{k_{n}}}^{\alpha}(t)}. Then, we have {\theta_{{k_{n}}}}(t)={\sin^{-1}}\left(\frac{{h_{n}}(t)}{{d_{{k_{n}}}}(t)}\right)\geq\frac{\pi}{180}\left[\zeta+{e^{M}}\right]. Finally, we obtain

{h_{n}}(t)\geq{d_{{k_{n}}}}(t)\sin\left[\frac{\pi}{180}(\zeta+{e^{M}})\right].

Following from (B.4), we have d_{{k_{n}}}(t)\leq{\left[\frac{{p_{{k_{n}}}}(t)}{\gamma{K_{0}}{\sigma^{2}}L}\right]^{1/\alpha}}. It can be proved that {P_{LoS}}({\theta_{{k_{n}}}}){\mu_{LoS}}+{\mu_{NLoS}}(1-{P_{LoS}}({\theta_{{k_{n}}}}))\leq{\mu_{LoS}} because {\mu_{LoS}}<{\mu_{NLoS}}. Hence, we have d_{{k_{n}}}(t)\leq{\left(\frac{{P_{\max}}}{\gamma{K_{0}}{\sigma^{2}}{\mu_{LoS}}}\right)^{1/\alpha}}. Since {h_{n}}(t)\leq d_{{k_{n}}}(t), we then have

{h_{n}}(t)\leq{\left(\frac{{P_{\max}}}{\gamma{K_{0}}{\sigma^{2}}{\mu_{LoS}}}\right)^{1/\alpha}}.

The altitude constraint of UAV n is given by

{d_{{k_{n}}}}(t)\sin\left[\frac{\pi}{180}(\zeta+{e^{M}})\right]\leq{h_{n}}(t)\leq{\left(\frac{{P_{\max}}}{\gamma{K_{0}}{\sigma^{2}}{\mu_{LoS}}}\right)^{1/\alpha}}.

The proof is completed.

Appendix C: Proof of Theorem 1

The sum MOS can be expressed as

\displaystyle\begin{array}{l}{\rm{MOS}}_{{\rm{total}}}=\sum\limits_{n=1}^{N}\sum\limits_{{k_{n}}=1}^{{{\rm{K}}_{n}}}{\rm{MOS}}_{k,n}({r_{{k_{n}}}})\\ \qquad\qquad\;\;=-{C_{1}}\sum\limits_{n=1}^{N}\sum\limits_{{k_{n}}=1}^{{{\rm{K}}_{n}}}\ln(d({r_{{k_{n}}}}))+{C_{2}}.\end{array} (C.1)

Equation (C.1) can be rewritten as (C.2) at the top of the next page.

\displaystyle{\text{MOS}}_{{\text{total}}}={C_{2}}-{C_{1}}\sum\limits_{n=1}^{N}\sum\limits_{{k_{n}}=1}^{{{\text{K}}_{n}}}\ln\Big(3RTT+\frac{FS}{{r_{{k_{n}}}}}+L\Big(\frac{MSS}{{r_{{k_{n}}}}}+RTT\Big)-\frac{2MSS({2^{L}}-1)}{{r_{{k_{n}}}}}\Big). (C.2)

The instantaneous achievable rate {r_{{k_{n}}}} of user k_{n} in (B.1) can be expressed as

\begin{array}{l}{r_{{k_{n}}}}(t)={B_{{k_{n}}}}{\log_{2}}\left(1+\frac{{p_{{k_{n}}}}(t){g_{{k_{n}}}}(t)}{{\sigma^{2}}}\right)\\ \quad=\frac{B}{{K_{n}}}{\log_{2}}\left(1+\frac{{P_{\max}}}{{\sigma^{2}}{K_{n}}{\left(\frac{4\pi{f_{c}}}{c}\right)^{2}}d_{{k_{n}}}^{\alpha}(t)\left[{P_{{\rm{LoS}}}}{\mu_{{\rm{LoS}}}}+{P_{{\rm{NLoS}}}}{\mu_{{\rm{NLoS}}}}\right]}\right).\end{array} (C.3)

It is easy to verify that f(A)=\log(1+\frac{\gamma}{A}) is convex with respect to A for A>0, \gamma>0 and \gamma>A. The non-convexity of (C.3) is therefore equivalent to the non-convexity of d_{{k_{n}}}^{\alpha}(t)({P_{{\text{LoS}}}}{\mu_{{\text{LoS}}}}+{P_{{\text{NLoS}}}}{\mu_{{\text{NLoS}}}}).

It can be proved that d_{{k_{n}}}^{\alpha}(t)({P_{{\text{LoS}}}}{\mu_{{\text{LoS}}}}+{P_{{\text{NLoS}}}}{\mu_{{\text{NLoS}}}}) is convex over the altitude of the UAVs. However, since there are three variables (the X-coordinate, Y-coordinate and altitude of the UAVs), and {x_{n}}(t), {y_{n}}(t) and {h_{n}}(t) vary at each time slot, d_{{k_{n}}}^{\alpha}(t)({P_{{\text{LoS}}}}{\mu_{{\text{LoS}}}}+{P_{{\text{NLoS}}}}{\mu_{{\text{NLoS}}}}) is non-convex. In this case, (C.3) and (C.1) are also non-convex over {x_{n}}(t), {y_{n}}(t) and {h_{n}}(t).

The proof is completed.

Appendix D: Proof of Theorem 2

In order to prove that problem (13a) is NP-hard, three steps are taken, following computational complexity theory. First, a known NP-complete decision problem Q is chosen. Second, a polynomial-time transformation from any instance of Q to an instance of problem (13a) is constructed. Third, the two instances are proven to have the same objective value.

In this article, we prove that (13a) is NP-hard even when the QoE requirements of the ground users are identical and the probability of LoS connection from the UAVs to the ground users is one. Problem (13a) can then be rewritten as

\begin{gathered}\mathop{\max}\limits_{C{\text{,}}Q,H}\;\;{R_{sum}}\\ {\text{s}}{\text{.t}}{\text{.}}\;\;{\text{13(b), 13(c), 13(d), 13(e), 13(f)}}.\end{gathered} (D.1)

As the probability of LoS connection is one, the overall achievable sum rate is

\begin{array}{l}{R_{{\rm{sum}}}}=\sum\limits_{n=1}^{N}\sum\limits_{{k_{n}}=1}^{{{\rm{K}}_{n}}}{r_{{k_{n}}}}(t)\\ \quad=\sum\limits_{n=1}^{N}\sum\limits_{{k_{n}}=1}^{{{\rm{K}}_{n}}}\frac{B}{{K_{n}}}{\log_{2}}\left(1+\frac{{P_{\max}}}{{\sigma^{2}}{K_{n}}{\left(\frac{4\pi{f_{c}}}{c}\right)^{2}}d_{{k_{n}}}^{\alpha}(t){\mu_{{\rm{LoS}}}}}\right).\end{array} (D.2)

It follows from (D.2) that the overall achievable sum rate relates only to the sum Euclidean distance. Furthermore, when the UAVs’ altitudes are assumed to be fixed, problem (D.1) simplifies to

\begin{gathered}\mathop{\min}\limits_{C{\text{,}}Q}\;\;\sum\limits_{n=1}^{N}\sum\limits_{{k_{n}}=1}^{{{\rm K}_{n}}}{d_{{k_{n}}}}\\ {\text{s}}{\text{.t}}{\text{.}}\;\;{\text{13(b), 13(d)}}.\end{gathered} (D.3)

Problem (D.3) is the planar K-means problem. The authors in [40] demonstrated the NP-hardness of the planar K-means problem by a reduction from the planar 3-SAT problem, which is known to be NP-complete; the proof is not reproduced here and can be found in [40]. Since problem (D.3) is a special case of problem (13a), the original problem (13a) is NP-hard.

The proof is completed.

References

  • [1] X. Liu, Y. Liu, and Y. Chen, “Deployment and movement for multiple aerial base stations by reinforcement learning,” in 2018 IEEE Globecom Workshops (GC Wkshps).   IEEE, 2018, pp. 1–6.
  • [2] J. Kosmerl and A. Vilhar, “Base stations placement optimization in wireless networks for emergency communications,” in IEEE Proc. of International Commun. Conf. (ICC), 2014, pp. 200–205.
  • [3] Y. Zeng, R. Zhang, and T. J. Lim, “Wireless communications with unmanned aerial vehicles: Opportunities and challenges,” IEEE Commun. Mag., vol. 54, no. 5, pp. 36–42, May 2016.
  • [4] A. Osseiran, F. Boccardi, V. Braun, Kusume et al., “Scenarios for 5G mobile and wireless communications: the vision of the METIS project,” IEEE Commun. Mag., vol. 52, no. 5, pp. 26–35, May 2014.
  • [5] Z. Qin, J. Fan, Y. Liu, Y. Gao, and G. Y. Li, “Sparse representation for wireless communications,” arXiv preprint arXiv:1801.08206, 2018.
  • [6] Z. Qin, Y. Gao, M. D. Plumbley, and C. G. Parini, “Wideband spectrum sensing on real-time signals at sub-Nyquist sampling rates in single and cooperative multiple nodes,” IEEE Trans. Signal Process., vol. 64, no. 12, pp. 3106–3117, Dec 2016.
  • [7] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Mobile Unmanned Aerial Vehicles (UAVs) for energy-efficient Internet of Things Communications,” IEEE Trans. Wireless Commun., vol. 16, no. 11, pp. 7574–7589, Nov. 2017.
  • [8] Q. Wang, Z. Chen, and S. Li, “Joint power and trajectory design for physical-layer secrecy in the UAV-aided mobile relaying system,” arXiv preprint arXiv:1803.07874, 2018.
  • [9] W. Khawaja, O. Ozdemir, and I. Guvenc, “UAV air-to-ground channel characterization for mmwave systems,” arXiv preprint arXiv:1707.04621, 2017.
  • [10] X. Zhang and L. Duan, “Optimal deployment of UAV networks for delivering emergency wireless coverage,” arXiv preprint arXiv:1710.05616, 2017.
  • [11] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Drone small cells in the clouds: Design, deployment and performance analysis,” in IEEE Proc. of Global Commun. Conf. (GLOBECOM), 2015, pp. 1–6.
  • [12] B. Van der Bergh, A. Chiumento, and S. Pollin, “LTE in the sky: trading off propagation benefits with interference costs for aerial nodes,” IEEE Commun. Mag., vol. 54, no. 5, pp. 44–50, May 2016.
  • [13] Y. Zeng and R. Zhang, “Energy-efficient UAV communication with trajectory optimization,” IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 3747–3760, Mar 2017.
  • [14] S. Kandeepan, K. Gomez, L. Reynaud, and T. Rasheed, “Aerial-terrestrial communications: terrestrial cooperation and energy-efficient transmissions to aerial base stations,” IEEE Trans. Aerosp. Electron. Syst., vol. 50, no. 4, pp. 2715–2735, Dec 2014.
  • [15] K. Li, W. Ni, X. Wang, R. P. Liu, S. S. Kanhere, and S. Jha, “Energy-efficient cooperative relaying for unmanned aerial vehicles,” IEEE Trans. Mobile Computing., vol. 15, no. 6, pp. 1377–1386, Aug 2016.
  • [16] L. Liu, S. Zhang, and R. Zhang, “CoMP in the sky: UAV placement and movement optimization for multi-user communications,” arXiv preprint arXiv:1802.10371, 2018.
  • [17] L. Bing, “Study on modeling of communication channel of UAV,” Procedia Computer Science, vol. 107, pp. 550–557, 2017.
  • [18] A. Al-Hourani, S. Kandeepan, and S. Lardner, “Optimal LAP altitude for maximum coverage,” IEEE Wireless Commun. Lett., vol. 3, no. 6, pp. 569–572, Jul 2014.
  • [19] M. Alzenad, A. El-Keyi, F. Lagum, and H. Yanikomeroglu, “3D placement of an unmanned aerial vehicle base station (UAV-BS) for energy-efficient maximal coverage,” IEEE Wireless Commun. Lett., May 2017.
  • [20] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Wireless communication using unmanned aerial vehicles (UAVs): Optimal transport theory for hover time optimization,” IEEE Trans. Wireless Commun., vol. 16, no. 12, pp. 8052–8066, Dec 2017.
  • [21] M. Mozaffari, W. Saad, and M. Bennis, “Efficient deployment of multiple unmanned aerial vehicles for optimal wireless coverage,” IEEE Commun. Lett., vol. 20, no. 8, pp. 1647–1650, Jun 2016.
  • [22] J. Lyu, Y. Zeng, and R. Zhang, “Cyclical multiple access in UAV-Aided communications: A throughput-delay tradeoff,” IEEE Wireless Commun. Lett., vol. 5, no. 6, pp. 600–603, Dec 2016.
  • [23] Y. Liu, Z. Qin, M. Elkashlan, Z. Ding, A. Nallanathan, and L. Hanzo, “Nonorthogonal multiple access for 5G and beyond,” Proceedings of the IEEE, vol. 105, no. 12, pp. 2347–2381, Nov 2017.
  • [24] Y. Cai, Z. Qin, F. Cui, G. Y. Li, and J. A. McCann, “Modulation and multiple access for 5G networks,” IEEE Commun. Surv. Tut., Oct 2017.
  • [25] Q. Wu and R. Zhang, “Common throughput maximization in UAV-enabled OFDMA systems with delay consideration,” arXiv preprint arXiv:1801.00444, 2018.
  • [26] A. Galindo-Serrano and L. Giupponi, “Distributed Q-Learning for aggregated interference control in cognitive radio networks,” IEEE Trans. Veh. Technol., vol. 59, no. 4, pp. 1823–1834, May 2010.
  • [27] U. Challita, W. Saad, and C. Bettstetter, “Cellular-connected UAVs over 5G: Deep reinforcement learning for interference management,” arXiv preprint arXiv:1801.05500, 2018.
  • [28] X. Lu, L. Xiao, and C. Dai, “UAV-aided 5G communications with deep reinforcement learning against jamming,” arXiv preprint arXiv:1805.06628, 2018.
  • [29] S. Zhang, Y. Zeng, and R. Zhang, “Cellular-enabled UAV communication: Trajectory optimization under connectivity constraint,” arXiv preprint arXiv:1710.11619, 2017.
  • [30] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Unmanned aerial vehicle with underlaid device-to-device communications: Performance and tradeoffs,” IEEE Trans. Wireless Commun., vol. 15, no. 6, pp. 3949–3963, Jun. 2016.
  • [31] Y. Gao, X. Wei, L. Zhou, and M. Guizani, “QoE in multimedia domain: a user-centric quality assessment,” International Journal of Multimedia Intelligence and Security, vol. 3, no. 2, pp. 162–186, 2018.
  • [32] K. Mitra, A. Zaslavsky, and C. Ahlund, “Context-aware QoE modelling, measurement, and prediction in mobile computing systems,” IEEE Trans. Mobile Computing., vol. 14, no. 5, pp. 920–936, May 2015.
  • [33] J. Cui, Y. Liu, Z. Ding, P. Fan, and A. Nallanathan, “QoE-based resource allocation for multi-cell NOMA networks,” IEEE Trans. Wireless Commun., vol. 17, no. 9, pp. 6160–6176, Sep. 2018.
  • [34] M. Rugelj, U. Sedlar, M. Volk, J. Sterle, M. Hajdinjak, and A. Kos, “Novel cross-layer QoE-aware radio resource allocation algorithms in multiuser OFDMA systems,” IEEE Commun. Mag., vol. 62, no. 9, pp. 3196–3208, Sept 2014.
  • [35] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, “An efficient k-means clustering algorithm: Analysis and implementation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 7, pp. 881–892, Aug 2002.
  • [36] J. Cui, Z. Ding, and P. Fan, “The application of machine learning in mmWave-NOMA systems,” in 2018 IEEE 87th Vehicular Technology Conference (VTC Spring), 2018, pp. 1–6.
  • [37] A. Asheralieva and Y. Miyanaga, “An autonomous learning-based algorithm for joint channel and power level selection by D2D pairs in heterogeneous cellular networks,” IEEE Trans. Commun., vol. 64, no. 9, pp. 3996–4012, Sept. 2016.
  • [38] J. Ren, G. Zhang, and D. Li, “Multicast capacity for VANETs with directional antenna and delay constraint under random walk mobility model,” IEEE Access, vol. 5, pp. 3958–3970, 2017.
  • [39] M. Chen, M. Mozaffari, W. Saad, C. Yin, M. Debbah, and C. S. Hong, “Caching in the sky: Proactive deployment of cache-enabled unmanned aerial vehicles for optimized quality-of-experience,” IEEE J. Sel. Areas Commun., vol. 35, no. 5, pp. 1046–1061, May 2017.
  • [40] M. Mahajan, P. Nimbhorkar, and K. Varadarajan, “The planar k-means problem is NP-hard,” Theoretical Computer Science, vol. 442, pp. 13–21, 2012.