Optimal Caching Designs for Perfect, Imperfect and Unknown File Popularity Distributions in Large-Scale Multi-Tier Wireless Networks
Abstract
Most existing caching solutions for wireless networks rest on the ideal assumption that the file popularity distribution is perfectly known. In this paper, we consider optimal random caching designs for perfect, imperfect and unknown file popularity distributions in a large-scale multi-tier wireless network. First, in the case of perfect file popularity distribution, we formulate the successful transmission probability (STP) optimization problem, which is nonconvex. We propose an efficient parallel iterative algorithm to obtain a stationary point based on parallel successive convex approximation (SCA). Then, in the case of imperfect file popularity distribution, we formulate the worst-case STP maximization problem. To solve this challenging robust optimization problem, we transform it to an equivalent complementary geometric program (CGP), and propose an efficient iterative algorithm to obtain a stationary point based on SCA. To the best of our knowledge, this is the first work explicitly considering the estimation error of the file popularity distribution in the optimization of caching design. Next, in the case of unknown file popularity distribution, we formulate the stochastic STP (i.e., the STP in the stochastic form) maximization problem. This is a challenging nonconvex stochastic optimization problem, and we develop an efficient iterative algorithm to obtain a stationary point based on stochastic parallel SCA. As far as we know, this is the first work considering stochastic optimization in a large-scale wireless network. Finally, by numerical results, we show that the proposed solutions achieve notable gains over existing schemes in all three cases, and reveal the value of robust caching optimization and stochastic caching optimization in the cases of imperfect and unknown file popularity distributions, respectively.
I Introduction
The rapid proliferation of smart mobile devices has triggered unprecedented growth in global mobile data traffic. Caching content closer to end users, e.g., at base stations (BSs) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] or even at end users [11, 12], has been proposed as an effective way to support this traffic growth, by reducing the distance between popular contents and requesters and alleviating the backhaul load. In addition, caching has been jointly designed with multicast [5, 7, 10] and cooperation [1, 2]. In this paper, we focus on caching at BSs, and hence a large body of recent research on caching elsewhere (e.g., at end users) is beyond the scope of this paper.
Caching in singletier wireless networks has been actively studied [3, 4, 5]. Specifically, in [3], the authors consider the optimal caching design and transmission strategy to minimize the required link capacity in a square grid wireless network. In [4] and [5], the authors consider random caching at BSs, analyze and optimize the hit probability [4] and the successful transmission probability (STP) [5] in largescale wireless networks that capture the stochastic nature of geographic locations of BSs and users.^{1}^{1}1 The stochastic network model is a widely used tractable model that allows to study the average behavior of a network using mathematical tools from stochastic geometry and is about as accurate as the standard grid model, when compared to an actual network [13]. As caching can successfully alleviate the urgent backhaul requirement for small cells, a significant amount of research effort has been devoted to optimal caching design in multitier wireless networks [6, 7, 8, 9, 10]. For instance, in [6], the authors consider coded and uncoded caching at small BSs to minimize the expected downloading time in a macro cell with multiple small BSs and users at fixed locations. In [7, 8, 9, 10], the authors consider hybrid caching [7] and random caching [8, 9, 10] in largescale multitier wireless networks, and focus on the analysis and optimization of the STP. Specifically, in our previous work [7], we obtain optimal hybrid caching design for a twotier wireless network. In [8] and [9], the authors obtain optimal caching design for a multitier network in the case of uniform signaltointerference ratio (SIR) threshold for all users. In the general case of arbitrary SIR thresholds for users, the optimization problem is nonconvex, and in [9], an optimal caching solution of a simplified convex problem is used as a suboptimal solution of the original nonconvex problem. 
In our previous work [10], a stationary point of the nonconvex problem is obtained only for a two-tier wireless network. Note that most existing works on caching [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] assume that the file popularity distribution is perfectly known. In practice, however, such an assumption cannot be reasonably justified [14].
Some recent works consider caching design in the case where the file popularity distribution is not known and only instantaneous file requests from users can be observed [15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. These works generally fall into two categories. One category adopts two-stage methods for caching design in the case of unknown file popularity distribution [15, 17, 18, 19, 16]. Specifically, in the first stage, the file popularity distribution is estimated based on historical file requests via various learning approaches; in the second stage, caching schemes are proposed based on the estimated popularity distribution. To be specific, in [15] and [19], the file popularity distribution is estimated using transfer learning [15] and file feature space partitioning [19]. Based on the estimated file popularity distribution, the authors consider caching the most popular files, and analyze the backhaul offloading [15] and cache hit [19]. In [17], the request frequency for each file obtained from historical file requests is considered as the popularity of the file, and the performance gap for minimizing the offloading time caused by the estimation error is analyzed. In [18], the file popularity distribution is estimated by learning user preferences via probabilistic latent semantic analysis, and a greedy algorithm with a performance guarantee is proposed to obtain a caching solution of the offloading probability maximization problem (under the estimated file popularity distribution). In [16], given a priori information on the file popularity distribution, the file request rate is estimated using a Bayesian framework, and an asymptotically optimal policy is proposed to maximize the hit probability (under the estimated file popularity distribution).
Note that these works [15, 17, 18, 19, 16] consider either simple caching designs without performance guarantees, such as caching the most popular files at each BS [15, 19], or optimization-based caching designs obtained by optimizing simple performance metrics that may not fully reflect the nature of wireless networks (such as fading [17] and stochastic locations of BSs and users [18, 16]). In addition, note that [15, 17, 18, 19, 16] fail to consider estimation errors in designing caching schemes.
The other category adopts single-stage methods for caching design [20, 21, 22, 23, 24] in the case of unknown file popularity, where caching solutions are gradually updated while accumulating file requests using stochastic optimization [20] or reinforcement learning [21, 22, 23, 24]. Compared with two-stage methods, the caching solutions obtained by single-stage methods can be continuously improved as new file requests are observed. Specifically, in [20], the authors optimize power control and caching for video streaming in a multi-cell multi-user MIMO network using techniques for stochastic optimization. In [21] and [22], the authors formulate the optimal caching design problem for a single-cell wireless network [21] and the optimal cooperative caching design problem for a multi-cell wireless network [22], and develop low-complexity algorithms to obtain approximate solutions using results for multi-armed bandit problems. In [23] and [24], the authors formulate the dynamic optimal caching design problem for a single-cell wireless network [23] and a multi-cell wireless network [24], and develop approximate solutions using Q-learning techniques. However, the fixed network topologies considered in [20, 21, 22, 24, 23] cannot fully capture the geographic features of the locations of BSs and users. In addition, the caching solutions in [20, 21, 22, 24, 23] are for one BS [21, 23] or a single tier of BSs [20, 22, 24].
Therefore, further studies are needed to optimize caching design in multi-tier wireless networks when the file popularity distribution is not perfectly known. In this paper, we consider optimal random caching design in a large-scale multi-tier wireless network in three cases: the case of perfect file popularity distribution (where the file popularity distribution has been estimated, and the estimation error is negligible), the case of imperfect file popularity distribution (where the file popularity distribution has been estimated, and a deterministic bound of the estimation error is known) and the case of unknown file popularity distribution (where there is no prior information of the file popularity distribution, but instantaneous file requests from some users can be observed over time). Our main contributions are summarized below.

In the case of perfect file popularity distribution, we formulate the STP maximization problem, which is nonconvex with a complicated objective function. We propose an efficient parallel iterative algorithm to obtain a stationary point based on parallel successive convex approximation (SCA) [25]. Specifically, by carefully designing an approximation function for each tier, we obtain closed-form optimal solutions for the approximate functions of all tiers at each iteration, and hence can significantly reduce the computational complexity and improve the convergence speed of the iterative algorithm.

In the case of imperfect file popularity distribution, we formulate the worst-case STP (over all possible values of the true file popularity distribution) maximization problem. This is a challenging robust optimization problem which does not lie in the category of convex-concave games that can be easily solved. We transform it to an equivalent complementary geometric program (CGP) and propose an efficient iterative algorithm to obtain a stationary point using SCA [26]. Note that this case corresponds to that considered in the second stage of the two-stage methods proposed in [15, 17, 18, 19, 16]. The essential difference, however, is that we explicitly consider the estimation error of the file popularity distribution in the optimization of caching design.

In the case of unknown file popularity distribution, we formulate the stochastic STP (i.e., the STP in the stochastic form) maximization problem. This is a challenging nonconvex stochastic optimization problem, and we develop an efficient parallel iterative algorithm to obtain a stationary point based on stochastic parallel SCA [27]. Specifically, by carefully designing an approximation function for each tier, we obtain closed-form optimal solutions for the approximate functions of all tiers at each iteration, make full use of instantaneous file requests from users at each slot, and significantly improve the convergence speed of the iterative algorithm. Note that this case corresponds to that considered in the single-stage methods proposed in [20, 21, 22, 23, 24]. The key difference is that we consider stochastic optimization in a large-scale wireless network which captures the channel fading, interference and stochastic nature of wireless networks.

Finally, by numerical simulations, we show the convergence of the proposed solutions. We also show that the proposed solutions achieve notable gains over existing schemes in all three cases.
II System Model
II-A Network Model
In this part, we first elaborate on the network model which extends those in [8, 9, 10] in the sense that besides perfect file popularity distribution, it also models imperfect and unknown file popularity distributions. We consider a largescale tier network consisting of tiers of BSs, where ,^{2}^{2}2Note that for , the case of perfect file popularity distribution has been studied in our previous work [5], and the cases of imperfect and unknown file popularity distributions can be easily investigated following the results for in this paper. as shown in Fig. 1. The locations of the BSs in tier are spatially distributed as an independent homogeneous Poisson point process (PPP), denoted as , with density , for all . The locations of the users are also distributed as an independent homogeneous PPP . Consider a discretetime system with time being slotted. Let denote the slot index. Each BS in tier has one transmit antenna with fixed transmission power . Each user has one receive antenna. All BSs are operating on the same frequency band and each BS adopts an orthogonal transmission mechanism over frequency or time at each slot. Both path loss and smallscale fading are considered: for path loss, a transmitted signal from either tier with distance is attenuated by a factor , where is the path loss exponent [7, 8, 9, 10]; for smallscale fading, at each slot, Rayleigh fading channels are adopted. Since a multitier network is primarily interferencelimited, we ignore the thermal noise for simplicity[9].
Let denote the set of files in the tier network.
For ease of illustration, as in [6, 7, 8, 9, 10, 15], assume that all files have the same size.^{3}^{3}3The results in this paper can be easily extended to the case of different file sizes by considering file combinations of the same total size, but formed by files of possibly different sizes [10].
At each slot, a user requests at most one file at random. Let denote the file request status of user at slot , where if user does not request any file and if user requests file .
For ease of analysis, assume , , are i.i.d. with respect to and [7, 8, 9, 10, 15]. (Footnote 4: The results in this paper can be extended to the case where there are multiple classes of users and the file requests of users in the same class are i.i.d.)
Denote , , , where .
Thus, represents the file popularity distribution, which usually evolves at a slower timescale.
In this paper, we consider the following three cases of file popularity distribution.
Perfect file popularity distribution: In this case, we assume that the file popularity distribution has been estimated by some learning methods, and the estimation error is negligible. That is, the exact value of is known.
Imperfect file popularity distribution: In this case, we assume that the file popularity distribution has been estimated by some learning methods, and a deterministic bound of the estimation error is known. Note that this case corresponds to that considered in the second stage of the twostage methods proposed in [15, 17, 18]. Let denote the estimated file popularity distribution, and let denote the estimation error. Assume , , and for some known , for all , which are usually satisfied for effective learning methods. The (true) file popularity distribution is given by , and satisfies , where , and for all .
Unknown file popularity distribution: In this case, we assume that there is no prior information of the file popularity distribution , but the file requests from the users in some set can be observed over time without bias. Here can represent the set of users located in the physical area of a BS or a cluster of BSs. (Footnote 5: For example, a user can submit its file request to its nearest BS, and each BS can gather file requests corresponding to an unbiased observation of the file popularity distribution.)
Note that this case corresponds to that considered in the singlestage methods proposed in [20, 21, 22, 23, 24].
The tier network consists of cacheenabled BSs. In tier , each BS is equipped with a cache of size (in number of files) to store different popular files out of . We say every different files form a combination. Thus, there are in total different combinations, each with different files. Let denote the set of indices for the combinations. More detailed descriptions of and can be found in [5, 7, 10].
II-B Caching and User Association
To provide high spatial file diversity, we consider random caching in the cacheenabled tier network [8, 9, 10], as illustrated in Fig. 1. In particular, each BS in tier stores different files with certain probability [8, 9, 10]. The probability that combination is stored at each BS in tier is , where satisfies:
(1)  
(2) 
A random caching design is specified by the caching distributions for file combinations . Let denote the set of indices for the combinations containing file . Based on , we also define the probability that file is stored at a BS in tier , i.e.,
(3) 
From [4, 5, 7, 10], we know that the constraints on in (1), (2) and (3) can be equivalently rewritten as the following constraints on :
(4)  
(5) 
The constraints in (5) are due to the fact that each file combination in tier contains different files and the sum of the caching probabilities for all file combinations is one. The details can be found in [5]. For any satisfying (4) and (5), one corresponding satisfying (1), (2) and (3) can be easily obtained using the method proposed in [4].^{6}^{6}6When the file popularity changes, the cache content of each BS can be easily updated based on its previous cached content. Random caching design is specified by the caching distributions for file combinations , the size of which is . However, later we shall see that the performance metrics and optimal random caching design problems considered in this paper depend only on the caching probabilities for files , the size of which is .
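The relation between the caching distribution over file combinations and the per-file caching probabilities can be checked numerically. The following is a minimal sketch, not the construction used in [4] or [5]: the function name `file_probabilities`, the toy sizes (4 files, cache size 2) and the combination probabilities are all illustrative assumptions. It sums, for each file, the probabilities of the combinations containing that file, and confirms that the resulting per-file probabilities lie in [0, 1] and add up to the cache size.

```python
from itertools import combinations
from math import comb

def file_probabilities(num_files, cache_size, combo_probs):
    """Map probabilities of file combinations to per-file caching probabilities.

    Hypothetical helper: files are indexed 0..num_files-1 and combinations are
    enumerated in the lexicographic order produced by itertools.combinations.
    """
    combos = list(combinations(range(num_files), cache_size))
    if len(combos) != len(combo_probs):
        raise ValueError("one probability is needed per combination")
    per_file = [0.0] * num_files
    for combo, prob in zip(combos, combo_probs):
        for n in combo:
            # File n is cached whenever any combination containing it is cached.
            per_file[n] += prob
    return combos, per_file

# Toy example (assumed values): 4 files, each BS caches 2 of them.
N, K = 4, 2
combo_probs = [0.4, 0.1, 0.1, 0.1, 0.1, 0.2]  # sums to one
combos, T = file_probabilities(N, K, combo_probs)
print(len(combos), comb(N, K))
print(T, sum(T))
```

The sketch also makes the size argument concrete: the combination-level description needs one probability per combination, whereas the file-level description needs only one probability per file.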
If a file is stored in a tier, a user requesting the file is associated with the BS which provides the maximum longterm average received power among all BSs storing the file [8, 9, 10]. Otherwise, the user will be served through other service mechanisms [8, 9, 10], the investigation of which is beyond the scope of this paper.^{7}^{7}7 As in [8, 9, 10], we assume that user association can be done through some signaling mechanisms. For example, a user can submit its file request to its nearest BS and associates with its serving BS via the help of its nearest BS. The probability that an arbitrary user requesting file is associated with tier is given by [8, 9, 10]:
(6) 
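As an illustration of the association rule described above, the following sketch picks, among the BSs that store the requested file, the one with the largest long-term average received power, computed as transmit power times distance raised to the negative path loss exponent (fading is averaged out). The dictionary layout, the function name `serving_bs` and all numerical values are hypothetical and chosen only for illustration.

```python
def serving_bs(base_stations, file_index, alpha):
    """Return the BS storing file_index with the largest average received power.

    Hypothetical data layout: each BS is a dict with keys "id", "tier",
    "power" (transmit power), "distance" (to the user) and "cache" (set of files).
    Returns None when no BS stores the file, i.e., the user must be served
    through some other mechanism.
    """
    candidates = [bs for bs in base_stations if file_index in bs["cache"]]
    if not candidates:
        return None
    return max(candidates, key=lambda bs: bs["power"] * bs["distance"] ** (-alpha))

# Assumed two-tier snapshot around one user.
base_stations = [
    {"id": "macro-0", "tier": 1, "power": 20.0, "distance": 100.0, "cache": {0, 1}},
    {"id": "pico-0", "tier": 2, "power": 0.13, "distance": 10.0, "cache": {1, 2}},
    {"id": "pico-1", "tier": 2, "power": 0.13, "distance": 5.0, "cache": {2, 3}},
]
alpha = 4.0
for f in range(5):
    bs = serving_bs(base_stations, f, alpha)
    print(f, None if bs is None else bs["id"])
```

Note that the serving BS depends on the requested file: a user may be attached to a distant macro BS for one file and to a nearby small BS for another, which is why the association probability in (6) depends on the caching probabilities.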
II-C Performance Metrics
In this paper, we study w.l.o.g. the performance of a typical user , which is located at the origin. Suppose requests file and is associated with tier . Let denote the index of the serving BS of . We denote and as the distance and the smallscale channel between BS and , respectively. For analytical tractability, as in [9, 10], we assume all BSs are active for serving their own users.^{8}^{8}8This assumption corresponds to the worstcase interference strength for the typical user. The performance obtained under this assumption provides a lower bound on the performance of the practical network where some void BSs may be shut down. In this case, the SIR of , denoted by , is given by [10]:
(7) 
Note that the distribution of is affected by .
We assume that file delivered from tier can be decoded correctly at if , where represents a threshold for tier [9].
Requesters are mostly concerned about whether their desired files can be successfully received.
In the following, we introduce the performance metrics in the three cases of file popularity distribution.
Perfect file popularity distribution: When the exact value of is known, we adopt the probability that a randomly requested file by is successfully transmitted, called the STP [10]:
(8) 
as the performance metric (Footnote 9: Note that the STP with fixed SIR thresholds can be interpreted as the STP under multicast in the high user density region [5, 7, 10], and is a widely used performance metric in wireless caching [5, 7, 8, 9, 10], as it is tractable and is also closely related to some other important performance metrics, such as average transmission rate and average transmission delay.), where and are given by:
(9) 
(10) 
respectively.
Here, and denote the complementary incomplete Beta function and the Beta function, respectively.
Note that is a linear function of and a nonconcave function of .
Imperfect file popularity distribution: When the exact value of is not known except that it falls within a known set , we adopt the worstcase STP:
(11) 
as the performance metric.^{10}^{10}10Note that one of the major techniques for designing systems that are robust against modeling uncertainties is to optimize the worstcase performance.
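Because the STP is linear in the file popularity distribution, the worst case for a fixed caching design is the solution of a small linear program. The sketch below is an illustration under stated assumptions rather than the paper's method: it assumes the uncertainty set is a box around the estimate intersected with the probability simplex, and that the per-file success probabilities (the hypothetical list `per_file_stp`) are given for a fixed caching design. Under these assumptions the minimum is attained greedily, by starting every file at its lower bound and assigning the remaining probability mass to the files with the smallest success probability first.

```python
def worst_case_popularity(a_hat, eps, per_file_stp):
    """Minimize sum_n a_n * per_file_stp[n] over the assumed uncertainty set
    {a : a_hat[n] - eps[n] <= a_n <= a_hat[n] + eps[n], sum_n a_n = 1}."""
    lower = [a - e for a, e in zip(a_hat, eps)]
    upper = [a + e for a, e in zip(a_hat, eps)]
    if min(lower) < 0.0 or not sum(lower) <= 1.0 <= sum(upper):
        raise ValueError("uncertainty set is empty or not a set of distributions")
    a = lower[:]
    remaining = 1.0 - sum(lower)
    # Push the leftover mass toward the files that are least likely to succeed.
    for n in sorted(range(len(a)), key=lambda i: per_file_stp[i]):
        step = min(upper[n] - lower[n], remaining)
        a[n] += step
        remaining -= step
    worst = sum(x * q for x, q in zip(a, per_file_stp))
    return a, worst

# Assumed toy values.
a_hat = [0.5, 0.3, 0.2]
eps = [0.05, 0.05, 0.05]
per_file_stp = [0.9, 0.6, 0.3]
a_worst, stp_worst = worst_case_popularity(a_hat, eps, per_file_stp)
stp_nominal = sum(a * q for a, q in zip(a_hat, per_file_stp))
print(a_worst, stp_worst, stp_nominal)
```

The gap between the nominal and worst-case values indicates how much performance a design tuned to the estimate alone may lose when the true popularity deviates within the bound.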
Unknown file popularity distribution: When there is no prior information of , but can be obtained at each slot for some , we adopt the STP in the stochastic form, called the stochastic STP:
(12) 
as the performance metric, where with denoting the indicator function, , and the expectation is taken over . Note that as for all and .
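The role of the indicator function in the stochastic form can be illustrated by simulation: averaging the request indicators of observed users gives an unbiased estimate of the file popularity. The following is a minimal sketch under simplifying assumptions (every observed user requests exactly one file, requests are i.i.d., and the popularity vector is a made-up Zipf-like example); the function names are hypothetical.

```python
import random

def sample_requests(popularity, num_users, rng):
    """Draw one i.i.d. file request (file index 1..N) per observed user."""
    files = list(range(1, len(popularity) + 1))
    return rng.choices(files, weights=popularity, k=num_users)

def indicator_average(requests, num_files):
    """Average of the indicators 1[request == n] over the observed users."""
    return [sum(1 for x in requests if x == n) / len(requests)
            for n in range(1, num_files + 1)]

# Assumed true popularity, unknown to the caching algorithm.
popularity = [0.5, 0.25, 0.15, 0.10]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
requests = sample_requests(popularity, num_users=20000, rng=rng)
estimate = indicator_average(requests, len(popularity))
print(estimate)
```

With few observed users per slot the estimate from a single slot is noisy, which is why a stochastic optimization method that accumulates observations over slots is needed.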
In Section III, Section IV and Section V, we shall maximize the STP, the worst-case STP and the stochastic STP in the cases of perfect, imperfect and unknown file popularity distribution, respectively, as shown in Fig. 2. (Footnote 11: Based on the solutions in this paper, we can obtain promising caching designs under multicast in the general user density region using the method proposed in our previous work [5, 7].)
III Performance Optimization for Perfect File Popularity Distribution
In this section, we consider the case of perfect file popularity distribution. In this case, we would like to maximize the STP, by optimizing the caching probabilities. We formulate the optimal random caching design problem as follows.
Problem 1 (Optimization for Perfect File Popularity Distribution)
s.t. 
where is given by (8).
Problem 1 is equivalent to Problem 0 in [9]. It is nonconvex (as the objective function is nonconvex in ), and in [9] a suboptimal solution of it is obtained by solving an approximate convex problem. In the following, we extend the technique in [10] for the case of to the case of , and develop an efficient parallel iterative algorithm to obtain a stationary point of Problem 1 using parallel SCA. Different from the cyclic computation mechanism in [10], the parallel computation mechanism here can speed up the computation, especially for large . Specifically, this algorithm updates the caching probabilities of the tiers, i.e., , , at each iteration in a parallel manner, by maximizing approximate functions of .
For notational convenience, define:
(13) 
where . Note that can be rewritten as:
Let denote the caching probabilities of tier obtained at iteration , and denote . At iteration , choose:
(14) 
as an approximation function of for updating . Note that the strongly concave component function of , i.e., , is left unchanged, and the other nonconcave (actually convex) component functions, i.e., , , , are linearized at . This choice of the approximate function is beneficial in several respects [10]. Firstly, it can guarantee the convergence of the algorithm to a stationary point of Problem 1, which will be shown in Theorem 1. Secondly, it usually leads to fast convergence of the algorithm by exploiting the partial concavity of the objective function, which will be shown in Fig. 3. Thirdly, it yields a closed-form optimal solution of the optimization problem for each tier at each iteration, which will be shown in Lemma 1, and hence a low-complexity algorithm.
Specifically, at iteration , we first solve the following problem for each tier separately, in a parallel manner.
Problem 2 (Approximate Convex Problem of Problem 1 for Tier at Iteration )
s.t.  (15)  
(16) 
Problem 2 is a convex optimization problem and Slater’s condition is satisfied, implying that strong duality holds. Based on KKT conditions, we can obtain a closedform optimal solution of Problem 2.
Lemma 1 (Optimal Solution of Problem 2)
For all , the optimal solution of Problem 2 is given by:
(17) 
where and is the Lagrange multiplier that satisfies .
Note that in Lemma 1 can be efficiently obtained by using bisection search which achieves a desired accuracy with computational complexity . Then, we update the caching probabilities of tier by:
(18) 
where is a positive diminishing stepsize satisfying
(19) 
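The smoothed update with a diminishing stepsize can be sketched as follows. This is an illustration of the generic parallel SCA update from the SCA literature, assumed here to have the convex-combination form "current point plus stepsize times (best response minus current point)"; the stepsize rule `1 / (t + 1) ** beta` with `beta` in (0.5, 1] is one common diminishing choice and is an assumption, not necessarily the rule used in this paper. The function names and toy vectors are hypothetical.

```python
def step_size(t, beta=0.6):
    """Assumed diminishing stepsize: tends to zero, is not summable,
    and is square summable for 0.5 < beta <= 1."""
    return 1.0 / (t + 1) ** beta

def smoothed_update(current, best_response, gamma):
    """Move from the current caching probabilities toward the solution of the
    approximate problem by a convex combination with weight gamma in (0, 1]."""
    return [c + gamma * (b - c) for c, b in zip(current, best_response)]

# Toy tier with 4 files and cache size 2 (assumed values).
current = [0.5, 0.5, 0.5, 0.5]
best_response = [1.0, 1.0, 0.0, 0.0]
gamma = step_size(1)
updated = smoothed_update(current, best_response, gamma)
print(gamma, updated, sum(updated))
```

Because both points satisfy the box and cache-size constraints and the update is a convex combination, the updated caching probabilities remain feasible without any projection step.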
Finally, the details of the proposed parallel iterative algorithm are summarized in Algorithm 1. Based on [25, Theorem 1], we can show the following result.
Theorem 1 (Convergence of Algorithm 1)
Proof:
Please refer to Appendix A.
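The bisection search used to find the Lagrange multiplier in Lemma 1 can be sketched generically. Since the closed-form expression in (17) is not reproduced here, the sketch substitutes a simple stand-in: each caching probability is taken to be `weights[n] - nu` clipped to [0, 1], which is nonincreasing in the multiplier `nu`, as any such expression must be for bisection to apply. The stand-in, the function names and the toy weights are assumptions for illustration only.

```python
def clip01(x):
    """Clip a value to the interval [0, 1]."""
    return min(max(x, 0.0), 1.0)

def solve_multiplier(weights, cache_size, tol=1e-10):
    """Find nu such that sum_n clip01(weights[n] - nu) equals cache_size.

    The per-file map clip01(w - nu) is a stand-in for the paper's closed form.
    The sum is continuous and nonincreasing in nu, so bisection applies.
    """
    def total(nu):
        return sum(clip01(w - nu) for w in weights)

    lo = min(weights) - 1.0  # every term equals 1 here, so total(lo) = N
    hi = max(weights)        # every term equals 0 here, so total(hi) = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total(mid) > cache_size:
            lo = mid  # too much cached in total: increase the multiplier
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    return nu, [clip01(w - nu) for w in weights]

# Assumed toy weights for one tier, cache size 2.
weights = [0.9, 0.7, 0.4, 0.1]
nu, T = solve_multiplier(weights, cache_size=2)
print(nu, T, sum(T))
```

Each bisection step costs one pass over the files, and the number of steps grows only logarithmically with the inverse of the target accuracy, which is what makes the per-tier subproblem cheap to solve at every iteration.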
IV Robust Optimization for Imperfect File Popularity Distribution
In this section, we consider the case of imperfect file popularity distribution. In this case, we would like to maximize the worstcase STP, by optimizing the caching probabilities. We formulate the robust optimal random caching design problem as follows.
Problem 3 (Robust Optimization for Imperfect File Popularity Distribution)
s.t. 
where is given by (8).
Problem 3 is a challenging maximin problem, which does not lie in the category of convex-concave games that can be easily solved (as is a nonconcave function of ). In the following, we solve it in two steps.
Firstly, we transform the maximin problem in Problem 3 to an equivalent maximization problem. As the inner problem is a linear program (LP) with respect to and strong duality holds for LPs, the inner problem shares the same optimal value with its dual problem. Thus, we can transform Problem 3 to the following equivalent maximization problem by replacing the inner problem with its dual problem.
Problem 4 (Equivalent Problem of Problem 3)
s.t.  
(20) 
where and .
Note that , and are dual variables for the dual problem of the inner problem, corresponding to , , , and , respectively.
Proof:
Please refer to Appendix B.
Based on Lemma 2, we can solve Problem 4 instead of Problem 3. Problem 4 is nonconvex, as the constraints in (20) are nonconvex. In what follows, we show how to obtain a stationary point of Problem 4 using SCA. We first rewrite as with , and define new variables :^{12}^{12}12Note that , as , and in most practical cases, [10]. Thus, in the rest of this paper, we consider the case where for all and .
(21) 
We also introduce a new variable which serves as a lower bound of the objective function of Problem 4:
(22) 
Therefore, Problem 4 can be equivalently transformed to the following problem.^{13}^{13}13For ease of analysis, in Problem 5, we consider instead of , which does not change the optimal value or affect the numerical solution.
Problem 5 (Equivalent Problem of Problem 4)
s.t.  (23)  
(24)  
(25)  
(26)  
(27) 
Note that the inequality constraints in (24), (25) and (27) are active at any optimal solution of Problem 5, and hence can replace the equality constraints in (20), (69) and (5), respectively. In Problem 5, a monomial is maximized subject to upper bounds on posynomials (i.e., (25), (26) and (27)) and upper bounds on the ratios of posynomials (i.e., (23) and (24)). Thus, Problem 5 is a CGP, and can be solved by the method proposed in [26], which is based on SCA. The main idea is to solve a sequence of successively refined geometric programs (GPs), each of which is obtained by approximating the denominators of the ratios of posynomials in (23) and (24) with monomials. Specifically, at iteration , update by solving the following approximate GP of Problem 5, which is parameterized by obtained at iteration .
Problem 6 (Approximate GP at Iteration )
s.t.  
(28)  
(29) 
where
Problem 6 is a standard GP, which can be readily transformed into a convex problem and solved using standard convex optimization techniques, such as the barrier method which achieves a desired accuracy with computational complexity . The details for solving Problem 5 are summarized in Algorithm 2. By the convergence result in [26, Proposition 3], and by comparing the KKT conditions of Problem 4 and Problem 5, we have the following result.
Theorem 2 (Convergence of Algorithm 2)
Proof:
Please refer to Appendix C.
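The monomial approximation that underlies the sequence of GPs in this section can be illustrated with the standard arithmetic-geometric mean construction: a posynomial is bounded from below by a monomial that is tight at the current point, so replacing a denominator by this monomial yields a conservative upper bound on the ratio. The sketch below is a generic illustration, not the specific approximation of (23) and (24); the term representation, function names and toy posynomial are assumptions.

```python
import math

def term_value(coeff, exps, x):
    """Value of the monomial coeff * prod_i x_i ** exps[i]."""
    return coeff * math.prod(xi ** e for xi, e in zip(x, exps))

def posynomial(terms, x):
    """Sum of monomial terms with positive coefficients."""
    return sum(term_value(c, e, x) for c, e in terms)

def monomial_approximation(terms, x0):
    """Best local monomial lower bound of a posynomial at the point x0.

    With beta_k = u_k(x0) / g(x0), the weighted AM-GM inequality gives
    g(x) >= prod_k (u_k(x) / beta_k) ** beta_k, with equality at x = x0.
    Returns the coefficient and exponents of that monomial.
    """
    values = [term_value(c, e, x0) for c, e in terms]
    total = sum(values)
    betas = [v / total for v in values]
    coeff = math.prod((c / b) ** b for (c, _), b in zip(terms, betas))
    exps = [sum(b * e[i] for (_, e), b in zip(terms, betas))
            for i in range(len(x0))]
    return coeff, exps

# Assumed toy posynomial g(x) = 1 + 2*x1*x2**0.5 + 0.5*x2**2/x1.
terms = [(1.0, (0.0, 0.0)), (2.0, (1.0, 0.5)), (0.5, (-1.0, 2.0))]
x0 = (1.0, 2.0)
coeff, exps = monomial_approximation(terms, x0)
for x in [x0, (0.5, 1.0), (2.0, 3.0), (1.5, 0.2)]:
    print(x, posynomial(terms, x), term_value(coeff, exps, x))
```

Re-centering the monomial at each new iterate is what makes the approximate GPs "successively refined": the bound is exact at the point where it was built and loosens away from it.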
V Stochastic Optimization for Unknown File Popularity Distribution
In this section, we consider the case of unknown file popularity distribution. In this case, we would like to maximize the stochastic STP, by optimizing the caching probabilities. We formulate the stochastic optimal random caching design problem as follows.
Problem 7 (Stochastic Optimization for Unknown File Popularity Distribution)
s.t. 
where is given by (8).
Note that although cannot be calculated without knowledge of the statistics of , it can be optimized using stochastic optimization.^{14}^{14}14The basic idea of stochastic optimization is to optimize a function in the presence of randomness based on the fact that realizations of random parameters can be obtained. Problem 7 is a nonconvex stochastic optimization problem, which is more challenging than a convex one. In the following, we develop an efficient parallel iterative algorithm to obtain a stationary point of Problem 7, using stochastic parallel SCA [27]. Similarly, the parallel computation mechanism here can speed up the computation, especially for large . Specifically, this algorithm updates the caching probabilities of the tiers, i.e., , , at each slot in a parallel manner, by maximizing approximate functions of .
Let denote the caching probabilities of tier obtained at slot , and denote . At slot , choose:
(30) 
as an approximation function of for updating . Here, , (Footnote 15: We consider a simple way of making use of the instantaneous file requests at each slot without assuming any a priori information on the file popularity distribution. Our focus here is to optimize the random caching design using stochastic optimization, rather than to purely estimate the file popularity.) , is a positive diminishing stepsize satisfying:
(31) 
and is given by:
(32) 
where , , .
Note that the strongly concave component function of , i.e., , is left unchanged, and the other nonconcave (actually convex) component functions, i.e., , , , are linearized at . In addition, note that the approximation of at each slot , i.e., , becomes more accurate as increases, and the approximation of based on accumulated instantaneous file requests , , i.e., , becomes more accurate as increases. This choice of the approximate function, , given in (30), is beneficial for similar reasons as in the case of perfect file popularity distribution.
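The way accumulated instantaneous requests sharpen the approximation over slots can be sketched with a recursive average, the generic device used in stochastic SCA. The sketch assumes an update of the form "new estimate = (1 - rho) * old estimate + rho * new observation" and uses the stepsize `rho = 1 / t`, under which the recursion reduces to the running mean of the per-slot observations; both the form and the stepsize are illustrative assumptions and need not match (31) and (32). The per-slot observations are made-up request indicators.

```python
def recursive_average(previous, observation, rho):
    """One smoothing step: blend the previous estimate with the new observation."""
    return [(1.0 - rho) * p + rho * o for p, o in zip(previous, observation)]

# Assumed per-slot request indicators for 3 files (one observed request per slot).
slot_observations = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
estimate = [0.0, 0.0, 0.0]
for t, observed in enumerate(slot_observations, start=1):
    estimate = recursive_average(estimate, observed, rho=1.0 / t)
    print(t, estimate)
```

Only the current estimate has to be stored, so the memory cost does not grow with the number of slots, while every past request still contributes to the current approximation.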
Specifically, at slot , we first solve the following problem for each tier separately, in a parallel manner.
Problem 8 (Approximate Convex Problem of Problem 7 for Tier at Slot )
s.t. 