Noncooperative Aerial Base Station
Placement via Stochastic Optimization
Abstract
Autonomous unmanned aerial vehicles (UAVs) with onboard base station equipment can potentially provide connectivity in areas where the terrestrial infrastructure is overloaded, damaged, or absent. Use cases include emergency response, wildfire suppression, surveillance, and cellular communications in crowded events, to name a few. A central problem in enabling this technology is to place such aerial base stations (AirBSs) in locations that approximately optimize the relevant communication metrics. To alleviate the limitations of existing algorithms, which require intensive and reliable communications among AirBSs or between the AirBSs and a central controller, this paper leverages stochastic optimization and machine learning techniques to put forth an adaptive and decentralized algorithm for AirBS placement without inter-AirBS cooperation or communication. The approach relies on a careful design of the network utility function and on a stochastic gradient ascent iteration that can be evaluated with information available in practical scenarios. To complement the theoretical convergence properties, a simulation study corroborates the effectiveness of the proposed scheme.
Aerial communications, unmanned aerial vehicles, aerial base stations, stochastic optimization, stochastic gradient, autonomous aerial vehicles, drones.
I Introduction
It is widely anticipated that unmanned aerial vehicles (UAVs) will sooner or later serve a number of applications with a transformative impact on human life. For example, a collection of base stations on board autonomous UAVs can be deployed to provide data connectivity in areas where the communication infrastructure is overloaded, damaged, or absent; see e.g. [1]. Such a technology may benefit wildfire suppression, search and rescue operations, communications in crowded demonstrations or sports events (even outside cities, such as bicycle races), emergency response, and even natural disaster management. This application requires algorithms that allow such aerial base stations (AirBSs) to adopt positions in 3D space that approximately optimize the quality of service (QoS) delivered to the mobile users (MUs) without compromising operational safety.
The problem of AirBS placement has been addressed for a single AirBS e.g. in [2, 3, 4, 5]. To accommodate multiple AirBSs, centralized schemes have been proposed in [6, 7, 8, 9], but they require a central controller that receives all system information in real time and instructs the AirBSs on how to navigate, which may be problematic in practice due to the unreliable nature of wireless communications. To alleviate this limitation, decentralized schemes that require communication and coordination only among neighboring AirBSs as well as between AirBSs and MUs have been developed in [10, 11]. However, failures of inter-AirBS communications or malfunctioning AirBSs may compromise safety. Besides, merely keeping the neighborhood of each AirBS up to date in real time is error-prone and entails considerable overhead. Other works are omitted due to space limitations; see e.g. the references in [1]. To the best of our knowledge, all existing schemes for multiple-AirBS placement are either centralized, or require intensive communication and coordination among AirBSs, or are not amenable to adaptive implementations, which are necessary since practical scenarios are subject to constant change.
The goal of this paper is to address the aforementioned limitations. To the best of our knowledge, it is the first work to propose a framework for multiple-AirBS placement in a fully adaptive and decentralized fashion, without any need for communication or coordination among UAVs. Besides its simplicity, the proposed scheme features low computational and communication requirements. The technical approach involves two steps. First, a network utility is designed so that each AirBS can determine the improvement directions without information from other AirBSs. This naturally renders the developed algorithms noncooperative, decentralized, and robust to communication failures. A key idea is to leverage smooth surrogate functions from machine learning to construct a continuous and differentiable objective. This property is critical for adaptive implementations, since nonsmooth criteria (e.g. mixed-integer programs [7]) may generate oscillations and erratic behaviour when the inputs are slightly perturbed.
Second, the designed utility is maximized by invoking tools from stochastic optimization, which constitute the workhorse of deep neural network training given their simplicity and low computational requirements. The resulting placement algorithm inherits these lightweight features, enjoys convergence guarantees, and can adapt to changes in an online fashion. This includes changes in MU data usage, MU location, or even the number of MUs or AirBSs. Together with the aforementioned smooth surrogate functions, applying stochastic optimization constitutes the main novelty of this work. [Footnote 1: Although [12] claims to perform “stochastic gradient search” for AirBS placement [12, Fig. 2(b)], their approach is not related to stochastic optimization. Instead, artificial noise is added to the trajectory to avoid local optima.]
II Model and Goal
Consider a setup where $M$ AirBSs, each one assembled on board an autonomous rotorcraft [Footnote 2: Examples of rotorcraft, also called multicopters, include quadcopters and hexacopters. The proposed scheme cannot directly accommodate fixed-wing UAVs since they are unable to hover at a fixed location.], must provide connectivity between a collection of $N$ MUs and the terrestrial cellular infrastructure. For clarity, the discussion focuses on the downlink; yet the scheme can readily accommodate the uplink. Each MU is associated with a single AirBS, which receives data packets from the terrestrial infrastructure and sends them to the MU. This process may also be performed in multiple hops, so that a packet is relayed by multiple AirBSs before reaching the MU. The (possibly multihop) link between the infrastructure and the serving AirBS will be referred to as backhaul. Throughout, it will be assumed that the AirBSs can establish a backhaul connection with the terrestrial infrastructure in the entire geographic area of interest. This is a reasonable assumption unless the target area is too large for the number and range of the AirBSs, which would require additional considerations. The AirBSs may operate as relays at the physical layer or even have upper-layer capabilities like eNodeBs in LTE.
The location of the $m$-th AirBS is represented by its vector $\mathbf{x}_m \in \mathbb{R}^3$ of spatial coordinates. Similarly, the location of the $n$-th MU is given by $\mathbf{y}_n \in \mathbb{R}^3$. The downlink channel (i.e. the data channel from the AirBSs to the MUs) is characterized by a function $g$ which provides the channel gain $g(\mathbf{x}_m, \mathbf{y}_n)$ between the $m$-th AirBS and the $n$-th MU when they respectively lie at locations $\mathbf{x}_m$ and $\mathbf{y}_n$. Clearly, this quantity is determined by the antenna gains and propagation phenomena. To simplify the notation, let $P_m$ be the power transmitted by the $m$-th AirBS and let $p_{m,n} := P_m\, g(\mathbf{x}_m, \mathbf{y}_n)$ denote the power received by the $n$-th MU from the $m$-th AirBS. The $m$-th AirBS is assumed to know the gradient of $g(\mathbf{x}_m, \mathbf{y}_n)$ with respect to $\mathbf{x}_m$. In practice, it is not possible to know $g$ exactly, so a certain performance degradation is expected due to errors in this model.
Although the proposed scheme can accommodate any model in the literature (see e.g. [1] for a survey) so long as the channel gain $g(\mathbf{x}_m, \mathbf{y}_n)$ between an AirBS at $\mathbf{x}_m$ and an MU at $\mathbf{y}_n$ is differentiable with respect to $\mathbf{x}_m$, a simple example is free-space propagation. This choice is sometimes reasonable since air-ground channels often have a line of sight; see e.g. [1]. In that case, $g$ is given by
$$g(\mathbf{x}_m, \mathbf{y}_n) = \frac{g_0(\mathbf{x}_m, \mathbf{y}_n)}{\|\mathbf{x}_m - \mathbf{y}_n\|^2} \tag{1}$$
where $g_0(\mathbf{x}_m, \mathbf{y}_n)$ represents the channel gain at unit distance. It depends on $\mathbf{x}_m$ and $\mathbf{y}_n$ due to the influence of the antenna patterns as well as the low-noise and power amplifiers. Although the model (1) is simple and tractable, it is not appropriate for height optimization; see e.g. [1, Sec. III-C]. Still, one may adopt (1) to set the horizontal position, i.e. the first two entries of $\mathbf{x}_m$, and determine a suitable constant height e.g. as in [13].
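As an illustration of the free-space model (1), the following Python sketch evaluates an inverse-square channel gain and its gradient with respect to the AirBS location. It assumes, purely for simplicity, that the unit-distance gain is a constant `g0`, whereas the text allows it to depend on both locations through the antenna patterns. The function names and the numeric positions are hypothetical.

```python
def free_space_gain(x, y, g0=1.0):
    """Inverse-square gain g0 / ||x - y||^2 between an AirBS at x and an MU at y."""
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return g0 / d2


def free_space_gain_grad(x, y, g0=1.0):
    """Gradient of the gain with respect to x: -2 g0 (x - y) / ||x - y||^4."""
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return [-2.0 * g0 * (xi - yi) / d2 ** 2 for xi, yi in zip(x, y)]


# Hypothetical geometry in meters: AirBS hovering at 100 m, MU on the ground.
x = [0.0, 0.0, 100.0]
y = [300.0, 400.0, 0.0]
gain = free_space_gain(x, y)
grad = free_space_gain_grad(x, y)
```

The gradient points from the AirBS towards the MU, so a step along it increases the received power, which is the local information each AirBS is assumed to have.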
Besides the backhaul and downlink channel, note that there must exist a control channel that allows communication between the AirBSs and MUs even before the AirBSs have arrived in the vicinity of the MUs. This is necessary because the AirBSs need to know at least the approximate locations of the MUs to navigate to a suitable position. This can be implemented as a satellite or low-frequency (and therefore long-range and low-rate) terrestrial channel. In this work, only minimum requirements will be imposed on this control channel. Specifically, it is assumed that each MU can send short control packets through this channel and that all AirBSs receive them. This channel need not even allow bidirectional communication.
The problem of AirBS placement is that of selecting the AirBS locations $\mathbf{X} := [\mathbf{x}_1, \ldots, \mathbf{x}_M]$ to maximize a given network utility $U(\mathbf{X})$ that quantifies the QoS experienced by the MUs. The goal of this paper is to solve this problem in an adaptive and decentralized fashion without cooperation or control communication among AirBSs. Achieving this goal would increase safety and enable swift and flexible deployments of AirBSs, possibly with inexpensive equipment. Note that adaptability is not only critical to accommodate changes in the channel, MU locations, or MU data requirements, but also to accommodate changes in the number of AirBSs. The latter aspect is important since these devices are typically powered by batteries which need to be frequently recharged.
The problem of AirBS placement with any reasonable network utility has multiple local optima and therefore is intrinsically nonconvex. To see this, suppose that all AirBSs are identical and transmit with the same power. Then, permuting the locations of the AirBSs arranged in a locally optimal placement would yield another locally optimal placement. There exist approaches intended to find globally optimal placements (see e.g. [7]), but they require a central processor and are not even guaranteed to find global optima. Therefore, the main concern should not be to reach global optima but a reasonable local optimum; see also [1, Sec. IIID].
III Adaptive and Noncooperative Placement
As described in Sec. I and detailed in this section, the proposed framework comprises (i) a suitable network metric designed so that the AirBSs can update their locations without cooperating and (ii) a stochastic optimization algorithm to optimize such a metric.
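Consider a network utility function of the form
$$U(\mathbf{X}) = \frac{1}{N}\sum_{n=1}^{N} u_n(\mathbf{X}) \tag{2}$$
where $u_n(\mathbf{X})$ quantifies the QoS experienced by the $n$-th MU when the AirBSs are placed at $\mathbf{X}$. For instance, $u_n(\mathbf{X})$ may be given by (see also Sec. III-B for more details on this and other functions)
$$u_n(\mathbf{X}) = \log_2\left(1 + \frac{\max_m p_{m,n}}{\sigma^2}\right) \tag{3}$$
where $p_{m,n}$ is the power that the $n$-th MU receives from the $m$-th AirBS and $\sigma^2$ is the noise power.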
If the operator wishes that the AirBSs favor areas with heavier traffic demands, one may generalize the average in (2) to assign a greater weight to those users with higher data rates, as described next. Among all packets received by the AirBS network from the terrestrial infrastructure to be delivered to the MUs, let $\pi_n$ denote the fraction of those packets that are intended for the $n$-th user, $n = 1, \ldots, N$. In that case, one could think of replacing (2) with
$$U(\mathbf{X}) = \sum_{n=1}^{N} \pi_n u_n(\mathbf{X}) \tag{4}$$
which therefore quantifies the average QoS per packet. [Footnote 3: Moving from (2) to (4) is not necessary to apply the proposed scheme, but it will be instructive to understand the pursued stochastic optimization approach.] Since $\pi_n \geq 0$ and $\sum_{n} \pi_n = 1$, one can equivalently express (4) as
$$U(\mathbf{X}) = \mathbb{E}_{\nu}\left[u_{\nu}(\mathbf{X})\right] \tag{5}$$
where $\nu$ in (5) is a random variable that takes the value $n$ with probability $\pi_n$. Thus, $\nu$ can be thought of as a random variable that indicates a packet recipient and $\pi_n$ is the probability that a given packet must go to the $n$-th MU. Note that the probabilities $\pi_n$ are dictated by the infrastructure and thus cannot be modified by the AirBSs.
Although AirBSs may know the functional form of the utility $U$, they may not be able to evaluate it since it depends on unknown variables or parameters. For example, the AirBSs may know that the per-user utility $u_n$ is of the form (3) but they will not generally know the received powers for all MUs at all times. Collecting this information, which furthermore is subject to constant change, would certainly require a complicated methodology and would be challenging to implement in a decentralized fashion. As seen later, stochastic optimization bypasses this difficulty and allows the AirBSs to maximize $U$ without even knowing the number of MUs or AirBSs in the system.
But before delving into that, it is convenient to develop some intuition. To this end, note that the utility $U(\mathbf{X})$ in (5) for each (fixed) value of $\mathbf{X}$ can be estimated by considering packets that the AirBS network receives from the terrestrial infrastructure over a certain time interval. Specifically, suppose that it receives $K$ packets and that the $k$-th packet has to be delivered to the $n_k$-th MU through the downlink of the associated AirBS. Suppose also that, upon receiving the corresponding packet, the $n_k$-th MU sends through the control channel the information that the AirBSs need to calculate $u_{n_k}(\mathbf{X})$. For example, if $u_n$ is given by (3), then the $n_k$-th MU sends the power that it receives from its serving AirBS [Footnote 4: Although the $n_k$-th MU may measure the power of (potentially) all AirBSs using their beacons, it is only associated with one of them.] (assume for simplicity that the AirBSs know the noise power $\sigma^2$). With this information, the AirBSs can obtain $u_{n_k}(\mathbf{X})$ for $k = 1, \ldots, K$ and therefore
$$\hat{U}(\mathbf{X}) = \frac{1}{K}\sum_{k=1}^{K} u_{n_k}(\mathbf{X}) \tag{6}$$
which is an unbiased estimator of $U(\mathbf{X})$ under general conditions. By the law of large numbers, $\hat{U}(\mathbf{X})$ converges to $U(\mathbf{X})$ with probability 1 as $K \to \infty$ if the indices $n_k$ are independent or if they make up an ergodic stochastic process.
The bottom line is that the AirBSs can estimate the utility $U(\mathbf{X})$ by just receiving information from a small fraction of MUs, which is more practical than maintaining a real-time database per AirBS with the information from all MUs. Although the scheme in the next section does not estimate $U(\mathbf{X})$ but its gradient, the underlying idea is the same as illustrated with this toy example.
Finally, note that (6) provides a valid estimator for $U(\mathbf{X})$ even if it is built from a single packet ($K = 1$), yet in this case the estimates will be substantially noisy. Stochastic algorithms, like the one in Sec. III-A, implicitly introduce averaging to counteract this effect.
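The sample-average idea behind (6) can be sketched in a few lines of Python. The sketch below assumes that the per-user utilities at a fixed placement and the packet probabilities are given as plain lists; the values in `u` and `pi` are made up for illustration, and recipients are drawn independently.

```python
import random


def estimate_utility(user_utilities, packet_probs, num_packets, rng):
    """Average the utilities of the recipients of num_packets random packets."""
    recipients = rng.choices(range(len(user_utilities)),
                             weights=packet_probs, k=num_packets)
    return sum(user_utilities[n] for n in recipients) / num_packets


# Hypothetical per-user utilities and packet probabilities for 4 MUs.
u = [0.1, 0.9, 0.5, 0.7]
pi = [0.1, 0.2, 0.3, 0.4]

# Exact weighted utility, which requires information on all MUs.
exact = sum(p * v for p, v in zip(pi, u))

rng = random.Random(0)
noisy = estimate_utility(u, pi, 1, rng)          # single packet: valid but noisy
averaged = estimate_utility(u, pi, 20000, rng)   # many packets: close to exact
```

With a single packet the estimate simply equals the utility of one recipient, whereas averaging over many packets approaches the weighted sum without ever collecting information from all MUs at once.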
III-A Adaptive Stochastic Navigator
As a step towards the targeted technology, this section describes a technique that enables the AirBSs to update their locations using only information that can be easily collected in practice and from only a few MUs at a time. Sec. III-B will design utilities that allow location updates without information on the other AirBSs.
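If there were a central controller with real-time access to all relevant system information and ideal communication links to all AirBSs, then the utility $U(\mathbf{X})$ could be maximized e.g. via gradient ascent as
$$\mathbf{X}[k+1] = \mathbf{X}[k] + \mu \nabla U(\mathbf{X}[k]) \tag{7}$$
where $k$ is the iteration index, $\mu > 0$ is a step size, $\mathbf{X}[0]$ is the initial placement, and (cf. (4))
$$\nabla U(\mathbf{X}) = \sum_{n=1}^{N} \pi_n \nabla u_n(\mathbf{X}). \tag{8}$$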
Constraints on the placement $\mathbf{X}$ could also be accommodated e.g. to impose a minimum safety distance between AirBSs, but this possibility is disregarded here to simplify the exposition. Unfortunately, the centralized approach in (7) is problematic in practice. First, failures in the communication links between the AirBSs and the central controller would limit the capacity of AirBSs to navigate to appropriate locations and could even compromise operational safety. Second, evaluating the gradient $\nabla U$ would generally require information on all MUs and AirBSs such as the communication channel between all MUs and all AirBSs, their locations and so on; see also the discussion earlier in Sec. III. But expecting such a hypothetical central controller to gather this information in real time is generally unrealistic. Besides, evaluating (8) would also require estimates of the (possibly time-varying) packet probabilities $\pi_n$, which entails additional overhead.
A key idea in the proposed framework is to sidestep these difficulties by capitalizing on stochastic optimization methods. These methods stem from the observation that the gradient $\nabla U(\mathbf{X})$ in (8) can be expressed as the expectation $\mathbb{E}_{\nu}[\nabla u_{\nu}(\mathbf{X})]$ over the packet recipient $\nu$ and replaced with an estimate, as done for $U(\mathbf{X})$ in (6). The idea is to update the placement $\mathbf{X}[k]$ every time an MU (or a certain number of MUs) sends the relevant information through the control channel. Specifically, suppose that at time $k$, the AirBS network receives the $k$-th packet from the terrestrial infrastructure and that it must be delivered to the $n_k$-th MU. Upon receiving this packet, the $n_k$-th MU uses the control channel to send the information that the AirBSs need to compute $\nabla u_{n_k}(\mathbf{X}[k])$. The AirBSs may then update their positions through a stochastic gradient step:
$$\mathbf{X}[k+1] = \mathbf{X}[k] + \mu \nabla u_{n_k}(\mathbf{X}[k]). \tag{9}$$
Similarly to what was described around (6), $\nabla u_{n_k}(\mathbf{X}[k])$ constitutes an unbiased estimate of $\nabla U(\mathbf{X}[k])$. This is the same idea utilized by the classical least mean squares (LMS) algorithm in signal processing.
Unlike (7), which requires information from all MUs, the update (9) only involves information from the MU that receives the $k$-th packet. The caveat is that the gradient estimates are noisy. To alleviate this effect, it is customary in stochastic optimization to average several of these gradient estimates before performing each update. In this case, this means that the AirBSs may update their position only every $B$ packets, where $B$ is referred to as the minibatch size.
Clearly, the stochastic update in (9) constitutes a valuable alternative to (7). Since stochastic gradient methods enjoy high popularity, their convergence is well analyzed. Due to space limitations, we omit those results here, but they can be found e.g. in [14]. Note also that almost no memory is required and, in part for this reason, the update can adapt to system changes.
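To make the stochastic update with minibatch averaging concrete, the following Python sketch runs the iteration on a toy problem. The per-user utility is not one of the paper's utilities: as a labeled stand-in, it uses the concave function $-\tfrac{1}{2}\|\mathbf{x} - \mathbf{y}_n\|^2$ for a single AirBS in the plane, whose gradient is $\mathbf{y}_n - \mathbf{x}$ and whose weighted maximizer is the weighted mean of the MU locations. All names and numbers are hypothetical.

```python
import random


def minibatch_sga(x0, mu_locations, packet_probs, step, batch, iters, rng):
    """Stochastic gradient ascent with minibatch averaging on a toy utility.

    Toy per-user utility (an assumption for illustration only):
    u_n(x) = -0.5 * ||x - y_n||^2, whose gradient is y_n - x.
    """
    x = list(x0)
    dim = len(x)
    for _ in range(iters):
        # Recipients of the next `batch` packets, drawn with probabilities pi_n.
        recipients = rng.choices(range(len(mu_locations)),
                                 weights=packet_probs, k=batch)
        # Average the per-packet gradient estimates before moving.
        grad = [0.0] * dim
        for n in recipients:
            for i in range(dim):
                grad[i] += (mu_locations[n][i] - x[i]) / batch
        x = [xi + step * gi for xi, gi in zip(x, grad)]
    return x


mus = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pi = [0.1, 0.2, 0.3, 0.4]
x_final = minibatch_sga([5.0, 5.0], mus, pi, step=0.05, batch=8,
                        iters=2000, rng=random.Random(1))
```

With a constant step size the iterate does not settle exactly but hovers around the maximizer, here the weighted mean (0.6, 0.7); a larger minibatch or smaller step reduces that residual jitter at the cost of slower adaptation.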
Remark 1. The step size $\mu$ must be chosen in accordance with the dynamic restrictions of the UAVs, such as their maximum horizontal velocity. The sequence of placements $\mathbf{X}[k]$ may be interpreted as a sequence of waypoints. The autopilot of each UAV will then issue low-level control commands to the rotors to follow such a sequence. Because of their dynamics, the UAVs are not capable of accurately following arbitrary waypoint sequences and, hence, the actual trajectory may be a smoothed or “filtered” version of the one indicated by the waypoints. A priori, this need not be a limitation since the gradient estimates, and hence the waypoints, are intrinsically noisy. The aforementioned smoothing effect may even be beneficial for maximizing the utility $U$; see [14].
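III-B Utility Functions for Noncooperative Placement
Equation (9) provides the update for all AirBS locations. It implies that the $m$-th AirBS must update its position $\mathbf{x}_m$ as
$$\mathbf{x}_m[k+1] = \mathbf{x}_m[k] + \mu \nabla_{\mathbf{x}_m} u_{n_k}(\mathbf{X}[k]). \tag{10}$$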
To apply this scheme without cooperation or communication among AirBSs, the user utility $u_n$ must be chosen so that each AirBS can compute $\nabla_{\mathbf{x}_m} u_n(\mathbf{X})$ without the need for information from other AirBSs. To this end, the key idea here is to focus on functions that can be expressed as
$$u_n(\mathbf{X}) = f(p_{1,n}, \ldots, p_{M,n}) \tag{11}$$
for some differentiable function $f$ of the powers $p_{m,n} = P_m\, g(\mathbf{x}_m, \mathbf{y}_n)$ that the $n$-th MU receives from the AirBSs. This is not a highly restrictive requirement since many usual network utilities, such as the sum rate, are indeed of this form. To see that such functions achieve this goal, note from the chain rule that
$$\nabla_{\mathbf{x}_m} u_n(\mathbf{X}) = \left[\nabla_{\mathbf{x}_m} p_{m,n}\right] \left[\frac{\partial f(p_{1,n}, \ldots, p_{m-1,n}, p, p_{m+1,n}, \ldots, p_{M,n})}{\partial p}\bigg|_{p = p_{m,n}}\right] \tag{12}$$
where the dummy variable $p$ occupies the $m$-th argument of $f$. Thus, the $m$-th AirBS can obtain $\nabla_{\mathbf{x}_m} u_n(\mathbf{X})$ if it knows both terms in brackets. The first can be obtained if the $n$-th MU reports its location through the control channel since $\nabla_{\mathbf{x}_m} p_{m,n} = P_m \nabla_{\mathbf{x}_m} g(\mathbf{x}_m, \mathbf{y}_n)$ and the $m$-th AirBS already knows its own location and the gradient of $g$; cf. Sec. II. The second term in brackets in (12) can be computed by the $n$-th MU and sent likewise to the AirBSs through the control channel, since it only needs to measure the power received from the AirBSs. This can be done using e.g. their beacons. To sum up, the $n$-th MU sends its own location and the second term in brackets through the control channel. With this information, the AirBSs estimate the gradient, which points in a direction of increasing network utility on average.
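The split of the chain rule between MU and AirBS can be sketched as follows in Python. This is an illustration under explicit assumptions: the channel is the inverse-square free-space model with a constant unit-distance gain, and the function of the received powers is taken to be $\log_2(1 + \sum_m p_m / \sigma^2)$, for which the partial derivative is the same for every AirBS. The helper names `mu_report` and `airbs_gradient` are hypothetical.

```python
import math


def gain(x, y, g0=1.0):
    """Assumed free-space gain g0 / ||x - y||^2."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return g0 / d2


def gain_grad(x, y, g0=1.0):
    """Gradient of the assumed gain with respect to the AirBS location x."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return [-2.0 * g0 * (a - b) / d2 ** 2 for a, b in zip(x, y)]


def mu_report(y_n, received_powers, noise_power):
    """MU side: location plus df/dp for f = log2(1 + sum(p) / noise)."""
    total = sum(received_powers)
    df_dp = 1.0 / ((noise_power + total) * math.log(2.0))
    return y_n, df_dp


def airbs_gradient(x_m, tx_power_m, report):
    """AirBS side: gradient of the user utility w.r.t. its own location only."""
    y_n, df_dp = report
    return [tx_power_m * gi * df_dp for gi in gain_grad(x_m, y_n)]


# Hypothetical scenario: two AirBSs at 50 m height, one MU on the ground.
x = [[0.0, 0.0, 50.0], [100.0, 0.0, 50.0]]
P = [1.0, 1.0]
y = [30.0, 40.0, 0.0]
noise = 1e-4

received = [P[m] * gain(x[m], y) for m in range(2)]  # measured by the MU
report = mu_report(y, received, noise)               # sent on the control channel
grad_0 = airbs_gradient(x[0], P[0], report)          # computed by AirBS 0 alone
```

Note that AirBS 0 never uses the location or transmit power of AirBS 1: everything it needs about the rest of the network is summarized in the scalar reported by the MU.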
It remains to design suitable functions of the form (11). The most direct choice of the user utility $u_n$ is the rate of the $n$-th user, which in turn means that the network utility $U$ is the expected rate of the MUs. To obtain this rate, one may consider two scenarios:

(S1) Each MU is associated with the AirBS from which it receives the most power. Since the area where the AirBSs need to be deployed is typically remote and therefore most of the cellular spectrum is empty, it is reasonable to assume that each AirBS operates in a different band and therefore there exists no inter-AirBS interference. This assumption may also be relaxed, but it will be adopted here for simplicity. Under these circumstances, the rate of the $n$-th MU is proportional to
$$r_n = \log_2\left(1 + \frac{\max_m p_{m,n}}{\sigma^2}\right). \tag{13}$$
(S2) The AirBSs simply relay the signal transmitted by the terrestrial infrastructure. No association is required. To some extent, these relays act as active reflectors. This may be of interest e.g. for broadcasting applications as in sports events. Assuming no carrier phase synchronization among AirBSs, the signals relayed by the AirBSs add at the $n$-th MU in an incoherent fashion, which yields a total received power of $\sum_m p_{m,n}$. The data rate is therefore proportional to
$$r_n = \log_2\left(1 + \frac{\sum_m p_{m,n}}{\sigma^2}\right). \tag{14}$$
Thus, one can directly set $u_n = r_n$, where the rate $r_n$ is given either by (13) or (14). However, it is well known (see e.g. [2]) that the sum rate is typically a poor network metric in terms of fairness since the resulting AirBS placement may yield a high rate for a small subset of MUs to the detriment of the remaining MUs, which may suffer from a low rate. Thus, both in S1 and S2, it may be preferable to pursue AirBS placements where a certain degree of fairness is promoted. The solution proposed here is to assign a utility of 1 to those users whose rate exceeds a preselected nominal value $r_{\min}$ and 0 otherwise. A related idea has also been used in the single-AirBS scheme [2]. This can be implemented by setting $u_n = \mathbb{1}(r_n - r_{\min})$, where $\mathbb{1}(\cdot)$ is the unit-step function, returning 1 for positive arguments and 0 otherwise. Equivalently, one can impose a minimum requirement on the SNR or, directly, a minimum on the received power. This reads as $u_n = \mathbb{1}(\max_m p_{m,n} - p_{\min})$ for (S1) and $u_n = \mathbb{1}(\sum_m p_{m,n} - p_{\min})$ for (S2), where $p_{\min} := \sigma^2 \gamma_{\min}$ and $\gamma_{\min} := 2^{r_{\min}} - 1$.
Although adopting this step function yields a network metric that promotes fairness, two difficulties arise. First, the resulting utility is not differentiable. Second, even if this nondifferentiability is somehow fixed, the resulting functions are flat for almost all [Footnote 5: That is, except for a set with Lebesgue measure 0.] values of the placement $\mathbf{X}$. This means that the gradient is zero at those points and the update (10) would yield no movement of the AirBSs except for very specific values of $\mathbf{X}$. Drawing inspiration from the machine learning literature, a solution proposed here is to replace the step function with an appropriately modified sigmoid function. The well-known sigmoid function is given by $s(z) = 1/(1 + e^{-z})$ and illustrated in Fig. 2. Roughly speaking, it is close to 0 for sufficiently negative $z$ and close to 1 for sufficiently positive $z$. Therefore, a shifted and scaled sigmoid exhibits the same transition as the step function, but spread over an interval around the threshold whose width is selected by the user. By the chain rule, its derivative is a scaled version of $s'$, where $s'(z) = s(z)(1 - s(z))$ is the derivative of $s$. Besides being differentiable everywhere, the sigmoid satisfies $s'(z) > 0$ for all $z$ and therefore the iteration in (10) will not stall unless the AirBSs are already in a locally optimal placement. Additionally, the $\max$ operator in $\max_m p_{m,n}$ is another source of nondifferentiability and flat regions. Drawing inspiration from deep learning, one can replace this function by the log-sum-exp function $\log \sum_m \exp(p_{m,n})$, whose gradient is the well-known softmax function.
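The two smooth surrogates mentioned above can be sketched in Python as follows. The parametrization `sigmoid(scale * (p - p_min))` of the modified sigmoid is an assumption made for illustration, since the exact shift and scaling used by the authors is not reproduced here; the function names and sample values are likewise hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_deriv(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def smooth_step(p, p_min, scale):
    """Assumed smooth surrogate of the unit step 1(p - p_min)."""
    return sigmoid(scale * (p - p_min))


def smooth_step_deriv(p, p_min, scale):
    """Chain rule: derivative of smooth_step with respect to p."""
    return scale * sigmoid_deriv(scale * (p - p_min))


def logsumexp(values):
    """Smooth surrogate of max(values), computed stably."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softmax(values):
    """Gradient of logsumexp with respect to its arguments."""
    m = max(values)
    w = [math.exp(v - m) for v in values]
    total = sum(w)
    return [wi / total for wi in w]


powers = [1.0, 3.0, 2.0]          # hypothetical received powers
soft_max_power = logsumexp(powers)
weights = softmax(powers)
slope_far_below = smooth_step_deriv(p=0.0, p_min=3.0, scale=10.0)
```

Unlike the unit step, the surrogate has a strictly positive (if small) slope far below the threshold, so a user that is poorly served still pulls the AirBSs in its direction. The log-sum-exp value lies between the maximum and the maximum plus the logarithm of the number of arguments, so how closely it tracks the maximum depends on how the powers are scaled.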
IV Simulation Study
To complement the theoretical convergence guarantees inherited from stochastic gradient methods [14], this section validates the proposed scheme in a setup where AirBSs act as picocells of an LTE system. The main area of interest is a square of km. The AirBSs are initially deployed uniformly at random in the southwest quadrant of that area. Their transmitted power is given by dBm per physical resource block (PRB). The AirBSs update their positions via (10) for with a constant step size , , and a minibatch size . The downlink occupies a 20 MHz band at 2.385 GHz (S-band). AirBSs are equipped with an antenna that radiates only downwards with a gain of 6 dBi. These parameters imply that the channel gain at 1 km from the AirBS is approximately dB assuming that the MUs are equipped with a single isotropic antenna and adopting the channel model (1). Since this model does not allow height optimization, the height of the AirBSs is kept fixed to m; see Sec. II for details and alternatives.
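The channel gain at 1 km implied by these parameters can be checked with the standard free-space (Friis) path-loss formula. The Python sketch below assumes an S-band carrier of 2.385 GHz, a 6 dBi transmit antenna, and an isotropic receive antenna; the function name is hypothetical.

```python
import math


def free_space_gain_db(distance_m, freq_hz, antenna_gain_dbi=0.0):
    """Free-space channel gain in dB, including the transmit antenna gain."""
    c = 299_792_458.0  # speed of light in m/s
    path_loss_db = 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / c)
    return antenna_gain_dbi - path_loss_db


# Gain at 1 km for a 2.385 GHz carrier and a 6 dBi downward-looking antenna.
gain_1km_db = free_space_gain_db(1000.0, 2.385e9, antenna_gain_dbi=6.0)
```

Under these assumptions the free-space loss at 1 km is very close to 100 dB, so the gain including the antenna comes out near -94 dB.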
A total of 200 MUs are deployed uniformly at random across the area. Two more users are respectively deployed out of the main area of interest at positions km and km, where the origin is in the bottom left part of the figures. The QoS they receive is quantified through (13). To determine the minimum received power $p_{\min}$, let the noise power be −112.4 dBm per PRB, which is a typical value in LTE [15, Clause 5.2.1.2]. The goal is to attain an SNR of 21.4 dB, which yields 90 % of the maximum throughput of a transport block size (TBS) of 84760 bits [16]. This yields 762 kbps for every PRB. Thus, the minimum received power is set to $p_{\min} = -91$ dBm. The modified sigmoid parameter is such that dBm.
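The arithmetic behind the received-power threshold and the per-PRB throughput can be reproduced with a few lines of Python. The sketch reads the noise power as −112.4 dBm per PRB (noise powers at this bandwidth are negative in dBm) and assumes the standard LTE figures of 100 PRBs in a 20 MHz carrier and one transport block per 1 ms subframe; these two figures are not stated in the text.

```python
noise_dbm = -112.4       # noise power per PRB
target_snr_db = 21.4     # SNR needed for 90 % of the maximum throughput

# In dB units the minimum received power is noise power plus target SNR.
p_min_dbm = noise_dbm + target_snr_db

tbs_bits = 84760         # transport block size
num_prb = 100            # assumption: PRBs in a 20 MHz LTE carrier
subframe_s = 1e-3        # assumption: one transport block per 1 ms subframe

# 90 % of the peak rate, split evenly across the PRBs.
rate_per_prb_bps = 0.9 * tbs_bits / subframe_s / num_prb
```

This gives a threshold of about −91 dBm and roughly 762.8 kbps per PRB, in line with the 762 kbps quoted above.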
Fig. 3 depicts the locations of the MUs (except for the two MUs outside the main area of interest) with dots and the final location of the AirBSs with squares. For visualization purposes, the background color at each point indicates the result of clipping the maximum received power to a fixed interval in dBm, where the maximum is taken over the powers received from all AirBSs at that location when the AirBSs are in their final placement. It is observed that most of the area receives a power above the threshold $p_{\min}$. The paths followed by the AirBSs (solid green lines) show that the AirBSs naturally spread throughout the region even though they do not cooperate or communicate among themselves. Although the paths are somewhat noisy due to the stochastic nature of the update, note that they just correspond to waypoints; the actual trajectories will be smoother, as discussed in Sec. III-A. Observe that the final arrangement accounts for the different transmit power of the AirBSs.
Fig. 4 depicts the histograms of the power received by the MUs under the initial placement (i.e. before applying the proposed algorithm) and under the final placement (i.e. after applying the algorithm). The final arrangement meets the target QoS at 198 out of the 202 MUs. As a benchmark, the histogram is compared with the one obtained if the AirBSs used K-means, which is the algorithm underlying the approaches in [6] and [17]. K-means performs poorly here because the two users outside the area of interest shift the centroids. In contrast, the objective function designed in Sec. III-B allows the proposed algorithm to “give up” those two remote users since serving them would require a placement for which many of the users are not served. Indeed, the K-means algorithm fails to serve 73 users, which is roughly 18 times as many as the 4 users left unserved by the proposed method. Other algorithms in the literature are not fairly comparable with the proposed one since they require inter-AirBS communication or a central controller.
A video with more simulations can be found in [18]. The code will be posted on the first author’s website.
V Conclusions
A framework has been developed for AirBS placement in a fully noncooperative, decentralized, and adaptive fashion. AirBSs move at each iteration in a direction that improves the network utility on average. The gradient of that utility is obtained via short messages transmitted by the MUs through a low-bandwidth control channel. Existing convergence analysis carries over and performance is validated in a simulation study. Future research will accommodate backhaul constraints for scenarios where the area of interest is large relative to the number of AirBSs and their communication range.
References
 [1] Y. Zeng, Q. Wu, and R. Zhang, “Accessing from the sky: A tutorial on UAV communications for 5G and beyond,” arXiv preprint arXiv:1903.05289, 2019.
 [2] I. Bor-Yaliniz, A. El-Keyi, and H. Yanikomeroglu, “Efficient 3-D placement of an aerial base station in next generation cellular networks,” in Proc. IEEE Int. Conf. Commun., 2016, pp. 1–5.
 [3] J. Chen and D. Gesbert, “Optimal positioning of flying relays for wireless networks: A LOS map approach,” in Proc. IEEE Int. Conf. Commun., 2017, pp. 1–6.
 [4] Z. Han, A. L. Swindlehurst, and K. J. R. Liu, “Optimization of MANET connectivity via smart deployment/movement of unmanned air vehicles,” IEEE Trans. Veh. Technol., vol. 58, no. 7, pp. 3533–3546, 2009.
 [5] D.-J. Lee, “Autonomous unmanned flying robot control for reconfigurable airborne wireless sensor networks using adaptive gradient climbing algorithm,” J. Korea Robotics Society, vol. 6, no. 2, pp. 97–107, May 2011.
 [6] X. Liu, Y. Liu, and Y. Chen, “Reinforcement learning in multiple-UAV networks: Deployment and movement design,” arXiv preprint arXiv:1904.05242, 2019.
 [7] D.-Y. Kim and J.-W. Lee, “Integrated topology management in flying ad hoc networks: Topology construction and adjustment,” IEEE Access, vol. 6, pp. 61196–61211, 2018.
 [8] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Efficient deployment of multiple unmanned aerial vehicles for optimal wireless coverage,” IEEE Commun. Lett., vol. 20, no. 8, pp. 1647–1650, 2016.
 [9] Z. Wang, L. Duan, and R. Zhang, “Adaptive deployment for UAV-aided communication networks,” arXiv preprint arXiv:1812.03267, 2018.
 [10] D.-J. Lee and R. Mark, “Decentralized control of unmanned aerial robots for wireless airborne communication networks,” Int. J. Advanced Robotic Syst., vol. 7, no. 3, pp. 22, 2010.
 [11] S. Park, K. Kim, H. Kim, and H. Kim, “Formation control algorithm of multi-UAV-based network infrastructure,” Applied Sciences, vol. 8, no. 10, pp. 1740, 2018.
 [12] O. Andryeyev and A. Mitschele-Thiel, “Increasing the cellular network capacity using self-organized aerial base stations,” in Proc. Workshop Micro Aerial Veh. Netw., Syst., Appl., ACM, 2017, pp. 37–42.
 [13] A. Al-Hourani, S. Kandeepan, and S. Lardner, “Optimal LAP altitude for maximum coverage,” IEEE Wireless Commun. Lett., vol. 3, no. 6, pp. 569–572, 2014.
 [14] L. Bottou, F. E. Curtis, and J. Nocedal, “Optimization methods for large-scale machine learning,” SIAM Review, vol. 60, no. 2, pp. 223–311, 2018.
 [15] 3GPP, “Study on provision of low-cost machine-type communications (MTC) user equipments (UEs) based on LTE,” TR 36.888, Jun. 2013.
 [16] 3GPP TSG RAN WG1, “Discussion on modulation enhancements,” Meeting 92, Mar. 2018.
 [17] H. El Hammouti, M. Benjillali, B. Shihada, and M.-S. Alouini, “A distributed mechanism for joint 3D placement and user association in UAV-assisted networks,” in IEEE Wireless Commun. Netw. Conf., Marrakech, Morocco, Apr. 2019.
 [18] D. Romero, “Noncooperative placement video,” https://youtu.be/ZNQiVQ3TtGI, May 2019, [Online].