# Optimal Caching Designs for Perfect, Imperfect and Unknown File Popularity Distributions in Large-Scale Multi-Tier Wireless Networks

Chencheng Ye, Ying Cui, Yang Yang and Rui Wang

C. Ye and Y. Cui are with Shanghai Jiao Tong University, China. Y. Yang is with University of Luxembourg, Luxembourg. R. Wang is with Southern University of Science and Technology, China.

###### Abstract

Most existing caching solutions for wireless networks rest on the ideal assumption that the file popularity distribution is perfectly known. In this paper, we consider optimal random caching designs for perfect, imperfect and unknown file popularity distributions in a large-scale multi-tier wireless network. First, in the case of perfect file popularity distribution, we formulate the successful transmission probability (STP) optimization problem, which is nonconvex. We propose an efficient parallel iterative algorithm to obtain a stationary point based on parallel successive convex approximation (SCA). Then, in the case of imperfect file popularity distribution, we formulate the worst-case STP maximization problem. To solve this challenging robust optimization problem, we transform it to an equivalent complementary geometric programming (CGP), and propose an efficient iterative algorithm to obtain a stationary point based on SCA. To the best of our knowledge, this is the first work explicitly considering the estimation error of file popularity distribution in the optimization of caching design. Next, in the case of unknown file popularity distribution, we formulate the stochastic STP (i.e., the STP in the stochastic form) maximization problem. This is a challenging nonconvex stochastic optimization problem, and we develop an efficient iterative algorithm to obtain a stationary point based on stochastic parallel SCA. As far as we know, this is the first work considering stochastic optimization in a large-scale wireless network. Finally, by numerical results, we show that the proposed solutions achieve notable gains over existing schemes in all three cases, and reveal the values of the robust caching optimization and stochastic caching optimization in the cases of imperfect file popularity distribution and unknown file popularity distribution, respectively.

###### Index Terms

Cache, multi-tier wireless network, stochastic geometry, robust optimization, stochastic optimization, complementary geometric programming.

## I Introduction

The rapid proliferation of smart mobile devices has triggered an unprecedented growth of global mobile data traffic. Caching content closer to end users, e.g., at base stations (BSs) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] or even at end users [11, 12], has been proposed as an effective way to support the dramatic traffic growth, by reducing the distance between popular contents and requesters, and alleviating the backhaul load. In addition, caching has also been jointly designed with multicast [5, 7, 10] and cooperation [1, 2]. In this paper, we focus on caching at BSs, so a large body of recent research falls outside its scope.

Caching in single-tier wireless networks has been actively studied [3, 4, 5]. Specifically, in [3], the authors consider the optimal caching design and transmission strategy to minimize the required link capacity in a square grid wireless network. In [4] and [5], the authors consider random caching at BSs, and analyze and optimize the hit probability [4] and the successful transmission probability (STP) [5] in large-scale wireless networks that capture the stochastic nature of geographic locations of BSs and users.[^1] As caching can successfully alleviate the urgent backhaul requirement for small cells, a significant amount of research effort has been devoted to optimal caching design in multi-tier wireless networks [6, 7, 8, 9, 10]. For instance, in [6], the authors consider coded and uncoded caching at small BSs to minimize the expected downloading time in a macro cell with multiple small BSs and users at fixed locations. In [7, 8, 9, 10], the authors consider hybrid caching [7] and random caching [8, 9, 10] in large-scale multi-tier wireless networks, and focus on the analysis and optimization of the STP. Specifically, in our previous work [7], we obtain the optimal hybrid caching design for a two-tier wireless network. In [8] and [9], the authors obtain the optimal caching design for a multi-tier network in the case of a uniform signal-to-interference ratio (SIR) threshold for all users. In the general case of arbitrary SIR thresholds for users, the optimization problem is nonconvex, and in [9], an optimal caching solution of a simplified convex problem is used as a sub-optimal solution of the original nonconvex problem. In our previous work [10], a stationary point of the nonconvex problem is obtained only for a two-tier wireless network. Note that most existing works on caching [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] assume that the file popularity distribution is perfectly known. In practice, however, such an assumption cannot be reasonably justified [14].

[^1]: The stochastic network model is a widely used tractable model that allows one to study the average behavior of a network using mathematical tools from stochastic geometry, and it is about as accurate as the standard grid model when compared to an actual network [13].

Existing works that do not assume perfect knowledge of the file popularity distribution include two-stage methods [15, 16, 17, 18, 19], which first estimate the file popularity distribution and then optimize the caching design based on the estimate. The other category adopts single-stage methods for caching design [20, 21, 22, 23, 24] in the case of unknown file popularity, where caching solutions are gradually updated while accumulating file requests using stochastic optimization [20] or reinforcement learning [21, 22, 23, 24]. Compared with two-stage methods, the caching solutions obtained by single-stage methods can be continuously improved as new file requests are observed. Specifically, in [20], the authors optimize power control and caching for video streaming in a multi-cell multi-user MIMO network using techniques for stochastic optimization. In [21] and [22], the authors formulate the optimal caching design problem for a single-cell wireless network [21] and the optimal cooperative caching design problem for a multi-cell wireless network [22], and develop low-complexity algorithms to obtain approximate solutions using results for multi-armed bandit problems. In [23] and [24], the authors formulate the dynamic optimal caching design problem for a single-cell wireless network [23] and a multi-cell wireless network [24], and develop approximate solutions using Q-learning techniques. However, the fixed network topologies considered in [20, 21, 22, 23, 24] cannot fully capture the geographic features of the locations of BSs and users. In addition, the caching solutions in [20, 21, 22, 23, 24] are for one BS [21, 23] or a single tier of BSs [20, 22, 24].

Therefore, further studies are needed to optimize caching design in multi-tier wireless networks when perfect file popularity distribution is not known. In this paper, we consider optimal random caching design in three cases, i.e., the case of perfect file popularity distribution (where the file popularity distribution has been estimated, and the estimation error is negligible), the case of imperfect file popularity distribution (where the file popularity distribution has been estimated, and a deterministic bound of the estimation error is known) and the case of unknown file popularity distribution (where there is no prior information of the file popularity distribution, but instantaneous file requests from some users can be observed over time), in a large-scale multi-tier wireless network. Our main contributions are summarized below.

• In the case of perfect file popularity distribution, we formulate the STP maximization problem, which is nonconvex with a complicated objective function. We propose an efficient parallel iterative algorithm to obtain a stationary point based on parallel successive convex approximation (SCA) [25]. Specifically, by carefully designing an approximation function for each tier, we obtain closed-form optimal solutions for the approximate functions of all tiers at each iteration, and hence can significantly reduce the computational complexity and improve the convergence speed of the iterative algorithm.

• In the case of imperfect file popularity distribution, we formulate the worst-case STP (over all possible values of the true file popularity distribution) maximization problem. This is a challenging robust optimization problem which does not lie in the category of convex-concave games that can be easily solved. We transform it to an equivalent complementary geometric programming (CGP) problem and propose an efficient iterative algorithm to obtain a stationary point using SCA [26]. Note that this case corresponds to that considered in the second stage of the two-stage methods proposed in [15, 16, 17, 18, 19]. The essential difference, however, is that we explicitly consider the estimation error of file popularity distribution in the optimization of caching design.

• In the case of unknown file popularity distribution, we formulate the stochastic STP (i.e., the STP in the stochastic form) maximization problem. This is a challenging nonconvex stochastic optimization problem, and we develop an efficient parallel iterative algorithm to obtain a stationary point based on stochastic parallel SCA [27]. Specifically, by carefully designing an approximation function for each tier, we obtain closed-form optimal solutions for the approximate functions of all tiers at each iteration, make full use of instantaneous file requests from users at each slot, and significantly improve the convergence speed of the iterative algorithm. Note that this case corresponds to that considered in the single-stage methods proposed in [20, 21, 22, 23, 24]. The key difference is that we consider stochastic optimization in a large-scale wireless network which captures the channel fading, interference and stochastic nature of wireless networks.

• Finally, by numerical simulations, we show the convergence of the proposed solutions. We also show that the proposed solutions achieve notable gains over existing schemes in all three cases.

## II System Model

### II-A Network Model

In this part, we first elaborate on the network model, which extends those in [8, 9, 10] in the sense that, besides perfect file popularity distribution, it also models imperfect and unknown file popularity distributions. We consider a large-scale $M$-tier network consisting of $M$ tiers of BSs, where $M \geq 2$,[^2] as shown in Fig. 1. The locations of the BSs in tier $m$ are spatially distributed as an independent homogeneous Poisson point process (PPP), denoted as $\Phi_m$, with density $\lambda_m$, for all $m \in \mathcal{M} \triangleq \{1, \ldots, M\}$. The locations of the users are also distributed as an independent homogeneous PPP. Consider a discrete-time system with time being slotted. Let $t$ denote the slot index. Each BS in tier $m$ has one transmit antenna with fixed transmission power $P_m$. Each user has one receive antenna. All BSs operate on the same frequency band, and each BS adopts an orthogonal transmission mechanism over frequency or time at each slot. Both path loss and small-scale fading are considered: for path loss, a signal transmitted from any tier over distance $D$ is attenuated by a factor $D^{-\alpha}$, where $\alpha$ is the path loss exponent [7, 8, 9, 10]; for small-scale fading, at each slot, Rayleigh fading channels are adopted. Since a multi-tier network is primarily interference-limited, we ignore the thermal noise for simplicity [9].

[^2]: Note that for $M = 1$, the case of perfect file popularity distribution has been studied in our previous work [5], and the cases of imperfect and unknown file popularity distributions can be easily investigated following the results for $M \geq 2$ in this paper.

Let $\mathcal{N} \triangleq \{1, \ldots, N\}$ denote the set of files in the $M$-tier network. For ease of illustration, as in [6, 7, 8, 9, 10, 15], assume that all files have the same size.[^3] At each slot, a user requests at most one file at random. The file request status of a user at a slot indicates either that the user does not request any file or which file $n \in \mathcal{N}$ the user requests. For ease of analysis, assume that the file request statuses are i.i.d. with respect to users and slots [7, 8, 9, 10, 15].[^4] Denote by $a_n$ the probability that file $n \in \mathcal{N}$ is requested, and let $\mathbf{a} \triangleq (a_n)_{n \in \mathcal{N}}$, where $\sum_{n \in \mathcal{N}} a_n = 1$. Thus, $\mathbf{a}$ represents the file popularity distribution, which usually evolves at a slower timescale. In this paper, we consider the following three cases of file popularity distribution.

Perfect file popularity distribution: In this case, we assume that the file popularity distribution has been estimated by some learning methods, and the estimation error is negligible. That is, the exact value of $\mathbf{a}$ is known.

Imperfect file popularity distribution: In this case, we assume that the file popularity distribution has been estimated by some learning methods, and a deterministic bound of the estimation error is known. Note that this case corresponds to that considered in the second stage of the two-stage methods proposed in [15, 17, 18]. The (true) file popularity distribution is the sum of the estimated file popularity distribution and the estimation error. Assume that the estimation error for each file is bounded by a known constant, which is usually satisfied for effective learning methods. The (true) file popularity distribution then satisfies $\mathbf{a} \in \mathcal{A} \triangleq \{\mathbf{a} : \underline{a}_n \leq a_n \leq \overline{a}_n,\ n \in \mathcal{N},\ \sum_{n \in \mathcal{N}} a_n = 1\}$, where $\underline{a}_n$ and $\overline{a}_n$ denote the resulting lower and upper bounds on $a_n$, for all $n \in \mathcal{N}$.

Unknown file popularity distribution: In this case, we assume that there is no prior information about the file popularity distribution $\mathbf{a}$, but the file requests from the users in some set can be observed without bias over time. Here, this set can represent the users located in the physical area of a BS or a cluster of BSs.[^5] Note that this case corresponds to that considered in the single-stage methods proposed in [20, 21, 22, 23, 24].

[^3]: The results in this paper can be easily extended to the case of different file sizes by considering file combinations of the same total size, but formed by files of possibly different sizes [10].

[^4]: The results in this paper can be extended to the case where there are multiple classes of users and file requests of users in the same class are i.i.d.

[^5]: For example, a user can submit its file request to its nearest BS, and each BS can gather file requests corresponding to an unbiased observation of the file popularity distribution.

The $M$-tier network consists of cache-enabled BSs. In tier $m$, each BS is equipped with a cache of size $K_m$ (in number of files) to store different popular files out of $\mathcal{N}$. We say every $K_m$ different files form a combination. Thus, there are in total $\binom{N}{K_m}$ different combinations, each with $K_m$ different files. Let $\mathcal{I}_m$ denote the set of indices for the combinations. More detailed descriptions of the combinations and their indices can be found in [5, 7, 10].

### II-B Caching and User Association

To provide high spatial file diversity, we consider random caching in the cache-enabled $M$-tier network [8, 9, 10], as illustrated in Fig. 1. In particular, each BS in tier $m$ stores $K_m$ different files with certain probability [8, 9, 10]. The probability that combination $i \in \mathcal{I}_m$ is stored at each BS in tier $m$ is $p_{m,i}$, where $p_{m,i}$ satisfies:

$$0 \leq p_{m,i} \leq 1, \quad m \in \mathcal{M},\ i \in \mathcal{I}_m, \tag{1}$$

$$\sum_{i \in \mathcal{I}_m} p_{m,i} = 1, \quad m \in \mathcal{M}. \tag{2}$$

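A random caching design is specified by the caching distributions for file combinations $\mathbf{p} \triangleq (p_{m,i})_{m \in \mathcal{M}, i \in \mathcal{I}_m}$. Let $\mathcal{I}_{m,n}$ denote the set of indices for the combinations containing file $n$. Based on $\mathbf{p}$, we also define the probability that file $n$ is stored at a BS in tier $m$, i.e.,

$$T_{m,n} \triangleq \sum_{i \in \mathcal{I}_{m,n}} p_{m,i}, \quad m \in \mathcal{M},\ n \in \mathcal{N}. \tag{3}$$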

From [4, 5, 7, 10], we know that the constraints on $\mathbf{p}$ in (1), (2) and (3) can be equivalently rewritten as the following constraints on $\mathbf{T} \triangleq (T_{m,n})_{m \in \mathcal{M}, n \in \mathcal{N}}$:

$$0 \leq T_{m,n} \leq 1, \quad m \in \mathcal{M},\ n \in \mathcal{N}, \tag{4}$$

$$\sum_{n \in \mathcal{N}} T_{m,n} = K_m, \quad m \in \mathcal{M}. \tag{5}$$

The constraints in (5) are due to the fact that each file combination in tier $m$ contains $K_m$ different files and the sum of the caching probabilities for all file combinations is one. The details can be found in [5]. For any $\mathbf{T}$ satisfying (4) and (5), one corresponding $\mathbf{p}$ satisfying (1), (2) and (3) can be easily obtained using the method proposed in [4].[^6] Random caching design is specified by the caching distributions for file combinations $\mathbf{p}$, the size of which is $\sum_{m \in \mathcal{M}} \binom{N}{K_m}$. However, later we shall see that the performance metrics and optimal random caching design problems considered in this paper depend only on the caching probabilities for files $\mathbf{T}$, the size of which is $MN$.

[^6]: When the file popularity changes, the cache content of each BS can be easily updated based on its previous cached content.
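The mapping from combination probabilities to file probabilities in (3) can be checked numerically. The following is a minimal sketch for a single tier, assuming a small hypothetical instance (5 files, cache size 2, randomly drawn combination probabilities); the function name and the test values are illustrative and not taken from the paper.

```python
from itertools import combinations
import random


def file_caching_probabilities(p, n_files, cache_size):
    """Map combination probabilities p, as in (1)-(2), to file probabilities T, as in (3)."""
    combos = list(combinations(range(n_files), cache_size))
    assert len(p) == len(combos), "one probability per file combination"
    T = [0.0] * n_files
    for prob, combo in zip(p, combos):
        for n in combo:  # file n belongs to this combination
            T[n] += prob
    return T


# Hypothetical single-tier instance: N = 5 files, cache size K = 2.
random.seed(0)
N, K = 5, 2
weights = [random.random() for _ in combinations(range(N), K)]
total = sum(weights)
p = [w / total for w in weights]  # satisfies (1) and (2)
T = file_caching_probabilities(p, N, K)
```

Each combination contributes its probability to exactly `K` files, which is why the entries of `T` sum to `K`, in line with (5).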

If a file is stored in a tier, a user requesting the file is associated with the BS which provides the maximum long-term average received power among all BSs storing the file [8, 9, 10]. Otherwise, the user will be served through other service mechanisms [8, 9, 10], the investigation of which is beyond the scope of this paper.[^7] The probability that an arbitrary user requesting file $n$ is associated with tier $m$ is given by [8, 9, 10]:

$$A_{m,n}(\mathbf{T}) = \frac{\lambda_m T_{m,n}}{\lambda_m T_{m,n} + \sum_{l \in \mathcal{M} \setminus \{m\}} \lambda_l T_{l,n} \left(\frac{P_l}{P_m}\right)^{\frac{2}{\alpha}}}, \quad m \in \mathcal{M},\ n \in \mathcal{N}. \tag{6}$$

[^7]: As in [8, 9, 10], we assume that user association can be done through some signaling mechanisms. For example, a user can submit its file request to its nearest BS and associate with its serving BS with the help of its nearest BS.
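As an illustration of (6), the sketch below evaluates the tier association probability for a hypothetical two-tier setting. The densities, powers, path loss exponent and caching probabilities are made-up values chosen only for the example.

```python
def association_probability(T, lam, P, alpha, m, n):
    """Probability in (6) that a user requesting file n is associated with tier m.

    T[l][n] is the caching probability of file n in tier l, lam[l] the BS
    density of tier l, and P[l] its transmission power.
    """
    numerator = lam[m] * T[m][n]
    # The l = m term has power ratio 1, so one sum covers the whole denominator.
    denominator = sum(
        lam[l] * T[l][n] * (P[l] / P[m]) ** (2.0 / alpha) for l in range(len(lam))
    )
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical two-tier example (macro tier 0, small-cell tier 1).
lam = [0.5, 2.0]
P = [20.0, 1.0]
alpha = 4.0
T = [[0.8, 0.2], [0.3, 0.7]]
A = [[association_probability(T, lam, P, alpha, m, n) for n in range(2)]
     for m in range(2)]
```

For a file cached in at least one tier, the association probabilities over the tiers sum to one, which can be used as a quick sanity check of an implementation.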

### II-C Performance Metrics

In this paper, we study w.l.o.g. the performance of a typical user $u_0$, which is located at the origin. Suppose $u_0$ requests file $n$ and is associated with tier $m$. Let $\ell_0 \in \Phi_m$ denote the index of the serving BS of $u_0$. We denote $D_{j,\ell,0}$ and $h_{j,\ell,0}$ as the distance and the small-scale channel between BS $\ell \in \Phi_j$ and $u_0$, respectively. For analytical tractability, as in [9, 10], we assume all BSs are active for serving their own users.[^8] In this case, the SIR of $u_0$, denoted by $\mathrm{SIR}_{m,n,0}$, is given by [10]:

$$\mathrm{SIR}_{m,n,0} = \frac{D_{m,\ell_0,0}^{-\alpha} \left|h_{m,\ell_0,0}\right|^2}{\sum_{\ell \in \Phi_m \setminus \{\ell_0\}} D_{m,\ell,0}^{-\alpha} \left|h_{m,\ell,0}\right|^2 + \sum_{j \in \mathcal{M} \setminus \{m\}} \sum_{\ell \in \Phi_j} D_{j,\ell,0}^{-\alpha} \left|h_{j,\ell,0}\right|^2 \frac{P_j}{P_m}}. \tag{7}$$

Note that the distribution of $\mathrm{SIR}_{m,n,0}$ is affected by $\mathbf{T}$.

[^8]: This assumption corresponds to the worst-case interference strength for the typical user. The performance obtained under this assumption provides a lower bound on the performance of the practical network where some void BSs may be shut down.

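We assume that file $n$ delivered from tier $m$ can be decoded correctly at $u_0$ if $\mathrm{SIR}_{m,n,0} \geq \tau_m$, where $\tau_m$ represents a threshold for tier $m$ [9]. Requesters are mostly concerned about whether their desired files can be successfully received. In the following, we introduce the performance metrics in the three cases of file popularity distribution.

Perfect file popularity distribution: When the exact value of $\mathbf{a}$ is known, we adopt the probability that a file randomly requested by $u_0$ is successfully transmitted, called the STP [10]:

$$q(\mathbf{a}, \mathbf{T}) \triangleq \sum_{m \in \mathcal{M}} \sum_{n \in \mathcal{N}} a_n A_{m,n}(\mathbf{T}) \Pr\left[\mathrm{SIR}_{m,n,0} \geq \tau_m\right] = \sum_{m \in \mathcal{M}} \sum_{n \in \mathcal{N}} \frac{a_n T_{m,n}}{\sum_{l \in \mathcal{M}} \theta_{l,m} T_{l,n} + \eta_m}, \tag{8}$$

as the performance metric,[^9] where $\theta_{l,m}$ and $\eta_m$ are given by:

$$\theta_{l,m} = \frac{2\lambda_l}{\alpha \lambda_m} \left(\frac{P_l}{P_m} \tau_m\right)^{\frac{2}{\alpha}} \left(B'\left(\frac{2}{\alpha}, 1 - \frac{2}{\alpha}, \frac{1}{1 + \tau_m}\right) - B\left(\frac{2}{\alpha}, 1 - \frac{2}{\alpha}\right)\right) + \frac{\lambda_l}{\lambda_m} \left(\frac{P_l}{P_m}\right)^{\frac{2}{\alpha}}, \tag{9}$$

$$\eta_m = \sum_{l \in \mathcal{M}} \frac{2\lambda_l}{\alpha \lambda_m} \left(\frac{P_l}{P_m} \tau_m\right)^{\frac{2}{\alpha}} B\left(\frac{2}{\alpha}, 1 - \frac{2}{\alpha}\right), \tag{10}$$

respectively. Here, $B'(\cdot, \cdot, \cdot)$ and $B(\cdot, \cdot)$ denote the complementary incomplete Beta function and the Beta function, respectively. Note that $q(\mathbf{a}, \mathbf{T})$ is a linear function of $\mathbf{a}$ and a nonconcave function of $\mathbf{T}$.

[^9]: Note that the STP with fixed SIR thresholds can be interpreted as the STP under multicast at the high user density region [5, 7, 10], and is a widely used performance metric in wireless caching [5, 7, 8, 9, 10], as it is tractable and is also closely related to some other important performance metrics, such as average transmission rate and average transmission delay.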

Imperfect file popularity distribution: When the exact value of $\mathbf{a}$ is not known except that it falls within a known set $\mathcal{A}$, we adopt the worst-case STP:

$$q_{wt}(\mathcal{A}, \mathbf{T}) \triangleq \min_{\mathbf{a} \in \mathcal{A}} q(\mathbf{a}, \mathbf{T}), \tag{11}$$

as the performance metric.[^10]

[^10]: Note that one of the major techniques for designing systems that are robust against modeling uncertainties is to optimize the worst-case performance.
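
Unknown file popularity distribution: When there is no prior information about $\mathbf{a}$, but the file requests of the observed users can be obtained at each slot, we adopt the STP in the stochastic form, called the stochastic STP:

$$q_{st}(\mathbf{a}, \mathbf{T}) \triangleq \mathbb{E}\left[q(\boldsymbol{\xi}, \mathbf{T})\right], \tag{12}$$

as the performance metric, where $\boldsymbol{\xi} \triangleq (\xi_n)_{n \in \mathcal{N}}$ is formed from the observed file requests using the indicator function, and the expectation is taken over $\boldsymbol{\xi}$. Note that $q_{st}(\mathbf{a}, \mathbf{T}) = q(\mathbf{a}, \mathbf{T})$, as $q(\boldsymbol{\xi}, \mathbf{T})$ is linear in $\boldsymbol{\xi}$ and $\mathbb{E}[\xi_n] = a_n$ for all $n \in \mathcal{N}$.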

In Section III, Section IV and Section V, we shall maximize the STP, the worst-case STP and the stochastic STP in the cases of perfect, imperfect and unknown file popularity distribution, respectively, as shown in Fig. 2.[^11]

[^11]: Based on the solutions in this paper, we can obtain promising caching designs under multicast at the general user density region using the method proposed in our previous work [5, 7].
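To make the closed form in (8) concrete, the sketch below evaluates the STP for given coefficients and checks its linearity in the popularity vector. The coefficients `theta` and `eta` are treated as precomputed inputs with hypothetical values, rather than being evaluated from (9) and (10); all numbers are illustrative.

```python
def stp(a, T, theta, eta):
    """STP in (8). theta[l][m] stands for theta_{l,m} and eta[m] for eta_m."""
    M, N = len(T), len(a)
    total = 0.0
    for m in range(M):
        for n in range(N):
            denom = sum(theta[l][m] * T[l][n] for l in range(M)) + eta[m]
            total += a[n] * T[m][n] / denom
    return total


# Hypothetical two-tier, four-file instance.
theta = [[1.2, 0.4], [0.6, 1.5]]
eta = [0.8, 0.9]
T = [[0.9, 0.6, 0.3, 0.2], [0.4, 0.3, 0.2, 0.1]]
a1 = [0.5, 0.25, 0.15, 0.10]
a2 = [0.25, 0.25, 0.25, 0.25]
a_mix = [0.5 * x + 0.5 * y for x, y in zip(a1, a2)]

q1, q2, q_mix = (stp(a, T, theta, eta) for a in (a1, a2, a_mix))
```

Because (8) is linear in the popularity vector, `q_mix` equals the average of `q1` and `q2`. The same linearity is what lets the expectation in (12) be taken term by term over the file requests.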

## III Performance Optimization for Perfect File Popularity Distribution

In this section, we consider the case of perfect file popularity distribution. In this case, we would like to maximize the STP, by optimizing the caching probabilities. We formulate the optimal random caching design problem as follows.

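###### Problem 1 (Optimization for Perfect File Popularity Distribution)

$$q^*(\mathbf{a}) \triangleq \max_{\mathbf{T}}\ q(\mathbf{a}, \mathbf{T}) \quad \text{s.t.}\ (4),\ (5),$$

where $q(\mathbf{a}, \mathbf{T})$ is given by (8).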

Problem 1 is equivalent to Problem 0 in [9]. It is nonconvex (as the objective function is nonconcave in $\mathbf{T}$), and in [9] a suboptimal solution of it is obtained by solving an approximate convex problem. In the following, we extend the technique in [10] for the two-tier case to the general $M$-tier case, and develop an efficient parallel iterative algorithm to obtain a stationary point of Problem 1 using parallel SCA. Different from the cyclic computation mechanism in [10], the parallel computation mechanism here can speed up the computation, especially for large $M$. Specifically, this algorithm updates the caching probabilities of the $M$ tiers, i.e., $\mathbf{T}_m \triangleq (T_{m,n})_{n \in \mathcal{N}}$, $m \in \mathcal{M}$, at each iteration in a parallel manner, by maximizing $M$ approximate functions of $q(\mathbf{a}, \mathbf{T})$.

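For notational convenience, define:

$$q_j(\mathbf{a}, \mathbf{T}_m, \mathbf{T}_{-m}) \triangleq \sum_{n \in \mathcal{N}} \frac{a_n T_{j,n}}{\sum_{l \in \mathcal{M}} \theta_{l,j} T_{l,n} + \eta_j}, \tag{13}$$

where $\mathbf{T}_{-m} \triangleq (\mathbf{T}_l)_{l \in \mathcal{M}, l \neq m}$. Note that $q(\mathbf{a}, \mathbf{T})$ can be rewritten as:

$$q(\mathbf{a}, \mathbf{T}_m, \mathbf{T}_{-m}) = \sum_{j \in \mathcal{M}} q_j(\mathbf{a}, \mathbf{T}_m, \mathbf{T}_{-m}).$$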

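Let $\mathbf{T}_m^{(k)}$ denote the caching probabilities of tier $m$ obtained at iteration $k$, and denote $\mathbf{T}^{(k)} \triangleq (\mathbf{T}_m^{(k)})_{m \in \mathcal{M}}$. At iteration $k+1$, choose:

$$h_m(\mathbf{a}, \mathbf{T}_m, \mathbf{T}^{(k)}) \triangleq q_m(\mathbf{a}, \mathbf{T}_m, \mathbf{T}_{-m}^{(k)}) - \sum_{j \in \mathcal{M}, j \neq m} \sum_{n \in \mathcal{N}} \frac{a_n \theta_{m,j} T_{j,n}^{(k)} \left(T_{m,n} - T_{m,n}^{(k)}\right)}{\left(\sum_{l \in \mathcal{M}} \theta_{l,j} T_{l,n}^{(k)} + \eta_j\right)^2} \tag{14}$$

as an approximation function of $q(\mathbf{a}, \mathbf{T})$ for updating $\mathbf{T}_m$. Note that the strongly concave component function of $q(\mathbf{a}, \mathbf{T}_m, \mathbf{T}_{-m}^{(k)})$, i.e., $q_m(\mathbf{a}, \mathbf{T}_m, \mathbf{T}_{-m}^{(k)})$, is left unchanged, and the other nonconcave (actually convex) component functions, i.e., $q_j(\mathbf{a}, \mathbf{T}_m, \mathbf{T}_{-m}^{(k)})$, $j \in \mathcal{M}$, $j \neq m$, are linearized at $\mathbf{T}_m = \mathbf{T}_m^{(k)}$. This choice of the approximate function is beneficial in several respects [10]. Firstly, it can guarantee the convergence of the algorithm to a stationary point of Problem 1, which will be shown in Theorem 1. Secondly, it usually leads to fast convergence of the algorithm by exploiting the partial concavity of the objective function, which will be shown in Fig. 3. Thirdly, it yields a closed-form optimal solution of the optimization problem for each tier at each iteration, which will be shown in Lemma 1, and hence a low-complexity algorithm.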

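Specifically, at iteration $k$, we first solve the following problem for each tier $m \in \mathcal{M}$ separately, in a parallel manner.

###### Problem 2 (Approximate Convex Problem of Problem 1 for Tier m at Iteration k)

$$\bar{\mathbf{T}}_m^{(k)} \triangleq \arg\max_{\mathbf{T}_m}\ h_m(\mathbf{a}, \mathbf{T}_m, \mathbf{T}^{(k-1)})$$

$$\text{s.t.} \quad 0 \leq T_{m,n} \leq 1, \quad n \in \mathcal{N}, \tag{15}$$

$$\sum_{n \in \mathcal{N}} T_{m,n} = K_m. \tag{16}$$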

Problem 2 is a convex optimization problem and Slater’s condition is satisfied, implying that strong duality holds. Based on KKT conditions, we can obtain a closed-form optimal solution of Problem 2.

###### Lemma 1 (Optimal Solution of Problem 2)

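For all $m \in \mathcal{M}$ and $n \in \mathcal{N}$, the optimal solution of Problem 2 is given by:

$$\bar{T}_{m,n}^{(k)} = \left[\frac{1}{\theta_{m,m}} \sqrt{\frac{a_n \left(\sum_{l \in \mathcal{M}, l \neq m} \theta_{l,m} T_{l,n}^{(k-1)} + \eta_m\right)}{\nu_m^{*(k)} + \sum_{j \in \mathcal{M}, j \neq m} \frac{a_n \theta_{m,j} T_{j,n}^{(k-1)}}{\left(\sum_{l \in \mathcal{M}} \theta_{l,j} T_{l,n}^{(k-1)} + \eta_j\right)^2}}} - \frac{\sum_{l \in \mathcal{M}, l \neq m} \theta_{l,m} T_{l,n}^{(k-1)} + \eta_m}{\theta_{m,m}}\right]_0^1, \tag{17}$$

where $[x]_0^1 \triangleq \min\{\max\{x, 0\}, 1\}$ and $\nu_m^{*(k)}$ is the Lagrange multiplier that satisfies $\sum_{n \in \mathcal{N}} \bar{T}_{m,n}^{(k)} = K_m$.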
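The structure of (17) can be traced back to the stationarity condition of Problem 2. The derivation below is a sketch rather than the proof from the paper; the shorthand $c_n$ and $g_n$ is introduced here only for illustration, and the objective is written up to terms that do not depend on $\mathbf{T}_m$.

```latex
% Illustrative shorthand (not from the paper):
%   c_n = \sum_{l \neq m} \theta_{l,m} T^{(k-1)}_{l,n} + \eta_m,
%   g_n = \sum_{j \neq m} \frac{a_n \theta_{m,j} T^{(k-1)}_{j,n}}
%                              {(\sum_{l} \theta_{l,j} T^{(k-1)}_{l,n} + \eta_j)^2}.
\begin{align*}
  &\max_{\mathbf{T}_m}\ \sum_{n \in \mathcal{N}}
     \left( \frac{a_n T_{m,n}}{\theta_{m,m} T_{m,n} + c_n} - g_n T_{m,n} \right)
     \quad \text{s.t. (15), (16)}, \\
  &\text{stationarity in } T_{m,n} \text{ with multiplier } \nu_m \text{ for (16):}\quad
     \frac{a_n c_n}{(\theta_{m,m} T_{m,n} + c_n)^2} - g_n - \nu_m = 0, \\
  &\Longrightarrow\quad
     T_{m,n} = \frac{1}{\theta_{m,m}} \sqrt{\frac{a_n c_n}{\nu_m + g_n}}
               - \frac{c_n}{\theta_{m,m}} .
\end{align*}
```

Projecting this expression onto $[0, 1]$ to account for (15) gives the form in (17), and the multiplier is then fixed by the cache size constraint (16).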

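Note that $\nu_m^{*(k)}$ in Lemma 1 can be efficiently obtained by bisection search to a desired accuracy. Then, we update the caching probabilities of tier $m$ by:

$$\mathbf{T}_m^{(k)} = \left(1 - \gamma^{(k)}\right) \mathbf{T}_m^{(k-1)} + \gamma^{(k)} \bar{\mathbf{T}}_m^{(k)}, \tag{18}$$

where $\gamma^{(k)}$ is a positive diminishing stepsize satisfying

$$\gamma^{(k)} > 0, \quad \lim_{k \to \infty} \gamma^{(k)} = 0, \quad \sum_{k=1}^{\infty} \gamma^{(k)} = \infty, \quad \sum_{k=1}^{\infty} \left(\gamma^{(k)}\right)^2 < \infty. \tag{19}$$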
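One family of stepsizes that meets the conditions in (19) is a power law with exponent in $(0.5, 1]$. The sketch below uses the exponent 0.6 as an assumed example (the paper's own choice is not stated in this section) and inspects the two partial sums over a finite horizon.

```python
def stepsize(k, p=0.6):
    """gamma^(k) = k^(-p); any p in (0.5, 1] is an assumed choice that meets (19)."""
    return k ** (-p)


horizon = 10_000
gammas = [stepsize(k) for k in range(1, horizon + 1)]
sum_gamma = sum(gammas)                    # keeps growing as the horizon grows
sum_gamma_sq = sum(g * g for g in gammas)  # bounded by zeta(1.2), about 5.59
```

The first sum diverges because the exponent is at most 1, while the sum of squares converges because twice the exponent exceeds 1.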

Finally, the details of the proposed parallel iterative algorithm are summarized in Algorithm 1. Based on [25, Theorem 1], we can show the following result.

###### Theorem 1 (Convergence of Algorithm 1)

If the stepsize $\gamma^{(k)}$ satisfies (19), then every limit point of $\{\mathbf{T}^{(k)}\}$ generated by Algorithm 1 is a stationary point of Problem 1.

###### Proof:

Please refer to Appendix A.
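The steps of this section (solve Problem 2 for every tier from the same previous iterate, then apply the update (18)) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' Algorithm 1: the coefficients `theta` and `eta` are hypothetical positive inputs, the stepsize `k ** -0.6` is an assumed choice satisfying (19), and the bisection bracket is one simple option.

```python
import math


def solve_tier(m, a, T, theta, eta, K_m, n_bisect=100):
    """Closed-form maximizer of the form (17) for tier m, with nu found by bisection."""
    M, N = len(T), len(a)
    # c[n]: contribution of the other tiers plus eta_m in the denominator of q_m.
    c = [sum(theta[l][m] * T[l][n] for l in range(M) if l != m) + eta[m]
         for n in range(N)]
    # g[n]: magnitude of the slope of the linearized components q_j, j != m.
    g = [sum(a[n] * theta[m][j] * T[j][n]
             / (sum(theta[l][j] * T[l][n] for l in range(M)) + eta[j]) ** 2
             for j in range(M) if j != m)
         for n in range(N)]

    def candidate(nu):
        out = []
        for n in range(N):
            s = nu + g[n]
            # s <= 0 means the objective is increasing in T_{m,n} on [0, 1].
            x = 1.0 if s <= 0 else (math.sqrt(a[n] * c[n] / s) - c[n]) / theta[m][m]
            out.append(min(max(x, 0.0), 1.0))
        return out

    lo = -max(g)                                     # every entry equals 1 here
    hi = max(a[n] / c[n] - g[n] for n in range(N))   # every entry equals 0 here
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if sum(candidate(mid)) > K_m:
            lo = mid
        else:
            hi = mid
    return candidate(0.5 * (lo + hi))


def parallel_sca(a, theta, eta, K, n_iter=200):
    """Parallel SCA iteration: per-tier solves followed by the averaging step (18)."""
    M, N = len(K), len(a)
    T = [[K[m] / N] * N for m in range(M)]           # feasible uniform start
    for k in range(1, n_iter + 1):
        gamma = k ** -0.6                            # assumed stepsize obeying (19)
        # Every tier uses the same previous iterate, so the solves are independent.
        T_bar = [solve_tier(m, a, T, theta, eta, K[m]) for m in range(M)]
        T = [[(1 - gamma) * T[m][n] + gamma * T_bar[m][n] for n in range(N)]
             for m in range(M)]
    return T


# Hypothetical two-tier, four-file instance.
a = [0.5, 0.25, 0.15, 0.10]
theta = [[1.2, 0.4], [0.6, 1.5]]
eta = [0.8, 0.9]
K = [2, 1]
T_opt = parallel_sca(a, theta, eta, K)
```

Since each per-tier solution and the starting point satisfy (4) and (5), every iterate produced by the convex combination in (18) stays feasible.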

## IV Robust Optimization for Imperfect File Popularity Distribution

In this section, we consider the case of imperfect file popularity distribution. In this case, we would like to maximize the worst-case STP, by optimizing the caching probabilities. We formulate the robust optimal random caching design problem as follows.

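###### Problem 3 (Robust Optimization for Imperfect File Popularity Distribution)

$$q_{wt}^* \triangleq \max_{\mathbf{T}}\ q_{wt}(\mathcal{A}, \mathbf{T}) = \max_{\mathbf{T}} \min_{\mathbf{a} \in \mathcal{A}} q(\mathbf{a}, \mathbf{T}) \quad \text{s.t.}\ (4),\ (5),$$

where $q(\mathbf{a}, \mathbf{T})$ is given by (8).

Problem 3 is a challenging maximin problem, which does not lie in the category of convex-concave games that can be easily solved (as $q(\mathbf{a}, \mathbf{T})$ is a nonconcave function of $\mathbf{T}$). In the following, we solve it in two steps.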

Firstly, we transform the maximin problem in Problem 3 to an equivalent maximization problem. As the inner problem $\min_{\mathbf{a} \in \mathcal{A}} q(\mathbf{a}, \mathbf{T})$ is a linear program (LP) with respect to $\mathbf{a}$ and strong duality holds for LP, the inner problem shares the same optimal value with its dual problem. Thus, we can transform Problem 3 to the following equivalent maximization problem by replacing the inner problem with its dual problem.

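###### Problem 4 (Equivalent Problem of Problem 3)

$$q_{wt}^{\star} \triangleq \max_{\mathbf{T},\ \boldsymbol{\lambda} \succeq 0,\ \boldsymbol{\mu} \succeq 0,\ \nu}\ \sum_{n \in \mathcal{N}} \left(\lambda_n \underline{a}_n - \mu_n \overline{a}_n\right) - \nu$$

$$\text{s.t.} \quad (4),\ (5),$$

$$\sum_{m \in \mathcal{M}} \frac{T_{m,n}}{\sum_{l \in \mathcal{M}} \theta_{l,m} T_{l,n} + \eta_m} + \mu_n - \lambda_n + \nu = 0, \quad n \in \mathcal{N}, \tag{20}$$

where $\boldsymbol{\lambda} \triangleq (\lambda_n)_{n \in \mathcal{N}}$ and $\boldsymbol{\mu} \triangleq (\mu_n)_{n \in \mathcal{N}}$.

Note that $\boldsymbol{\lambda}$, $\boldsymbol{\mu}$ and $\nu$ are dual variables for the dual problem of the inner problem, corresponding to $a_n \geq \underline{a}_n$, $a_n \leq \overline{a}_n$, $n \in \mathcal{N}$, and $\sum_{n \in \mathcal{N}} a_n = 1$, respectively.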

###### Lemma 2 (Equivalence between Problem 3 and Problem 4)

Problem 3 and Problem 4 have the same optimal value and optimal caching probabilities.

###### Proof:

Please refer to Appendix B.
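For a fixed caching design, the inner problem of Problem 3 is an LP over a box intersected with the probability simplex, so its minimizer can also be found directly by a greedy fill. The sketch below is an illustration of the inner problem only, not the dual-based method used in this section; the per-file success probabilities `c` and the bounds are hypothetical numbers.

```python
def worst_case_popularity(c, a_low, a_up):
    """Minimize sum_n a_n * c_n subject to a_low <= a <= a_up and sum(a) = 1.

    c[n] plays the role of the coefficient of a_n in (8) for a fixed T.
    Greedy rule: start from the lower bounds and give the remaining mass
    to the files with the smallest coefficients first.
    """
    a = list(a_low)
    remaining = 1.0 - sum(a)
    assert remaining >= -1e-12 and sum(a_up) >= 1.0 - 1e-12, "infeasible bounds"
    for n in sorted(range(len(c)), key=lambda i: c[i]):
        add = min(a_up[n] - a[n], max(remaining, 0.0))
        a[n] += add
        remaining -= add
    return a


# Hypothetical example with three files.
c = [0.9, 0.6, 0.3]
a_low = [0.4, 0.2, 0.1]
a_up = [0.6, 0.4, 0.3]
a_hat = [0.5, 0.3, 0.2]  # a nominal estimate inside the bounds

a_worst = worst_case_popularity(c, a_low, a_up)
q_worst = sum(x * y for x, y in zip(a_worst, c))
q_nominal = sum(x * y for x, y in zip(a_hat, c))
```

In this example the worst case shifts probability mass toward the files with the lowest success probability, so `q_worst` is below `q_nominal`, which is the gap that the robust design aims to control.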

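Based on Lemma 2, we can solve Problem 4 instead of Problem 3. Problem 4 is nonconvex, as the constraints in (20) are nonconvex. In what follows, we show how to obtain a stationary point of Problem 4 using SCA. We first rewrite $\nu$ as $\nu = \nu_1 - \nu_2$ with $\nu_1, \nu_2 > 0$, and define new variables $\mathbf{x} \triangleq (x_{m,n})_{m \in \mathcal{M}, n \in \mathcal{N}}$:[^12]

$$x_{m,n} = \sum_{l=1}^{M} \theta_{l,m} T_{l,n} + \eta_m, \quad m \in \mathcal{M},\ n \in \mathcal{N}. \tag{21}$$

We also introduce a new variable $y > 0$ which serves as a lower bound of the objective function of Problem 4:

$$y \leq \sum_{n \in \mathcal{N}} \left(\lambda_n \underline{a}_n - \mu_n \overline{a}_n\right) - \nu. \tag{22}$$

Therefore, Problem 4 can be equivalently transformed to the following problem.[^13]

[^12]: Note that $\eta_m > 0$, and in most practical cases, $\theta_{l,m} > 0$ [10]. Thus, in the rest of this paper, we consider the case where $\theta_{l,m} > 0$ for all $l \in \mathcal{M}$ and $m \in \mathcal{M}$.

[^13]: For ease of analysis, in Problem 5, we consider $\boldsymbol{\lambda}, \boldsymbol{\mu} \succ 0$ instead of $\boldsymbol{\lambda}, \boldsymbol{\mu} \succeq 0$, which does not change the optimal value or affect the numerical solution.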

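###### Problem 5 (Equivalent Problem of Problem 4)

$$\max_{\mathbf{T},\ \mathbf{x},\ \boldsymbol{\lambda},\ \boldsymbol{\mu} \succ 0,\ \nu_1, \nu_2, y > 0}\ y$$

$$\text{s.t.} \quad \frac{y + \sum_{n \in \mathcal{N}} \mu_n \overline{a}_n + \nu_1}{\sum_{n \in \mathcal{N}} \lambda_n \underline{a}_n + \nu_2} \leq 1, \tag{23}$$

$$\frac{\lambda_n + \nu_2}{\sum_{m \in \mathcal{M}} T_{m,n} x_{m,n}^{-1} + \mu_n + \nu_1} \leq 1, \quad n \in \mathcal{N}, \tag{24}$$

$$\frac{\sum_{l \in \mathcal{M}} \theta_{l,m} T_{l,n} + \eta_m}{x_{m,n}} \leq 1, \quad m \in \mathcal{M},\ n \in \mathcal{N}, \tag{25}$$

$$T_{m,n} \leq 1, \quad m \in \mathcal{M},\ n \in \mathcal{N}, \tag{26}$$

$$K_m^{-1} \sum_{n \in \mathcal{N}} T_{m,n} \leq 1, \quad m \in \mathcal{M}. \tag{27}$$

Note that the inequality constraints in (24), (25) and (27) are active at any optimal solution of Problem 5, and hence can replace the equality constraints in (20), (21) and (5), respectively. In Problem 5, a monomial is maximized subject to upper bounds on posynomials (i.e., (25), (26) and (27)) and upper bounds on the ratios of posynomials (i.e., (23) and (24)). Thus, Problem 5 is a CGP, and can be solved by the method proposed in [26], which is based on SCA. The main idea is to solve a sequence of successively refined geometric programs (GPs), each of which is obtained by approximating the denominators of the ratios of posynomials in (23) and (24) with monomials. Specifically, at iteration $k$, update $(\mathbf{T}^{(k)}, \mathbf{x}^{(k)}, \boldsymbol{\lambda}^{(k)}, \boldsymbol{\mu}^{(k)}, \nu_1^{(k)}, \nu_2^{(k)}, y^{(k)})$ by solving the following approximate GP of Problem 5, which is parameterized by $(\mathbf{T}^{(k-1)}, \mathbf{x}^{(k-1)}, \boldsymbol{\lambda}^{(k-1)}, \boldsymbol{\mu}^{(k-1)}, \nu_1^{(k-1)}, \nu_2^{(k-1)}, y^{(k-1)})$ obtained at iteration $k-1$.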

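###### Problem 6 (Approximate GP at Iteration k)

$$\left(\mathbf{T}^{(k)}, \mathbf{x}^{(k)}, \boldsymbol{\lambda}^{(k)}, \boldsymbol{\mu}^{(k)}, \nu_1^{(k)}, \nu_2^{(k)}, y^{(k)}\right) \triangleq \arg\max_{\mathbf{T},\ \mathbf{x},\ \boldsymbol{\lambda},\ \boldsymbol{\mu} \succ 0,\ \nu_1, \nu_2, y > 0}\ y$$

$$\text{s.t.} \quad (25),\ (26),\ (27),$$

$$\frac{y + \sum_{n \in \mathcal{N}} \mu_n \overline{a}_n + \nu_1}{\prod_{n \in \mathcal{N}} \left(\frac{\lambda_n \underline{a}_n}{\sigma_n^{(k)}}\right)^{\sigma_n^{(k)}} \left(\frac{\nu_2}{\gamma_1^{(k)}}\right)^{\gamma_1^{(k)}}} \leq 1, \tag{28}$$

$$\frac{\lambda_n + \nu_2}{\prod_{m \in \mathcal{M}} \left(\frac{T_{m,n} x_{m,n}^{-1}}{\beta_{m,n}^{(k)}}\right)^{\beta_{m,n}^{(k)}} \left(\frac{\mu_n}{\gamma_{2,n}^{(k)}}\right)^{\gamma_{2,n}^{(k)}} \left(\frac{\nu_1}{\gamma_{3,n}^{(k)}}\right)^{\gamma_{3,n}^{(k)}}} \leq 1, \quad n \in \mathcal{N}, \tag{29}$$

where

$$\sigma_n^{(k)} \triangleq \frac{\lambda_n^{(k-1)} \underline{a}_n}{\sum_{n' \in \mathcal{N}} \lambda_{n'}^{(k-1)} \underline{a}_{n'} + \nu_2^{(k-1)}}, \qquad \gamma_1^{(k)} \triangleq \frac{\nu_2^{(k-1)}}{\sum_{n' \in \mathcal{N}} \lambda_{n'}^{(k-1)} \underline{a}_{n'} + \nu_2^{(k-1)}},$$

$$\beta_{m,n}^{(k)} \triangleq \frac{T_{m,n}^{(k-1)} \left(x_{m,n}^{(k-1)}\right)^{-1}}{\sum_{m' \in \mathcal{M}} T_{m',n}^{(k-1)} \left(x_{m',n}^{(k-1)}\right)^{-1} + \mu_n^{(k-1)} + \nu_1^{(k-1)}},$$

$$\gamma_{2,n}^{(k)} \triangleq \frac{\mu_n^{(k-1)}}{\sum_{m' \in \mathcal{M}} T_{m',n}^{(k-1)} \left(x_{m',n}^{(k-1)}\right)^{-1} + \mu_n^{(k-1)} + \nu_1^{(k-1)}}, \qquad \gamma_{3,n}^{(k)} \triangleq \frac{\nu_1^{(k-1)}}{\sum_{m' \in \mathcal{M}} T_{m',n}^{(k-1)} \left(x_{m',n}^{(k-1)}\right)^{-1} + \mu_n^{(k-1)} + \nu_1^{(k-1)}}.$$

Problem 6 is a standard GP, which can be readily transformed into a convex problem and solved using standard convex optimization techniques, such as the barrier method. The details for solving Problem 5 are summarized in Algorithm 2. By the convergence result in [26, Proposition 3], and by comparing the KKT conditions of Problem 4 and Problem 5, we have the following result.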
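The monomial approximation of a posynomial denominator rests on the weighted arithmetic-geometric mean inequality: a sum of positive terms is bounded below by a product of powers, with equality at the point where the weights are computed. The sketch below checks this numerically for generic positive terms; the helper names and numbers are illustrative and the terms stand in for the summands of the denominators in (23) and (24).

```python
import math


def condensation_weights(terms0):
    """Weights u_i / sum(u) at the expansion point, in the same way as the
    exponents of the approximate GP are formed from the previous iterate."""
    s = sum(terms0)
    return [u / s for u in terms0]


def monomial_lower_bound(terms, weights):
    """prod_i (u_i / w_i)^{w_i}, a lower bound on sum_i u_i for positive terms."""
    return math.prod((u / w) ** w for u, w in zip(terms, weights))


terms0 = [2.0, 1.0, 0.5]   # posynomial terms at the previous iterate
weights = condensation_weights(terms0)
at_expansion = monomial_lower_bound(terms0, weights)

terms = [1.0, 2.0, 0.7]    # the same terms at some other point
elsewhere = monomial_lower_bound(terms, weights)
```

Replacing a denominator by this smaller monomial makes the ratio larger, so the approximate constraint is more restrictive than the original one, and any point feasible for the approximate GP remains feasible for the CGP.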

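###### Theorem 2 (Convergence of Algorithm 2)

$(\mathbf{T}^{(k)}, \mathbf{x}^{(k)}, \boldsymbol{\lambda}^{(k)}, \boldsymbol{\mu}^{(k)}, \nu_1^{(k)}, \nu_2^{(k)}, y^{(k)})$ obtained by Algorithm 2 converges to a stationary point of Problem 5, as $k \to \infty$. Furthermore, the limit point of $(\mathbf{T}^{(k)}, \boldsymbol{\lambda}^{(k)}, \boldsymbol{\mu}^{(k)}, \nu_1^{(k)} - \nu_2^{(k)})$ is a stationary point of Problem 4.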

###### Proof:

Please refer to Appendix C.

## V Stochastic Optimization for Unknown File Popularity Distribution

In this section, we consider the case of unknown file popularity distribution. In this case, we would like to maximize the stochastic STP, by optimizing the caching probabilities. We formulate the stochastic optimal random caching design problem as follows.

###### Problem 7 (Stochastic Optimization for Unknown File Popularity Distribution)

$$\max_{\mathbf{T}}\ q_{st}(\mathbf{a}, \mathbf{T}) = \mathbb{E}\left[q(\boldsymbol{\xi}, \mathbf{T})\right] \quad \text{s.t.}\ (4),\ (5),$$

where $q(\boldsymbol{\xi}, \mathbf{T})$ is given by (8).

Note that although $q_{st}(\mathbf{a}, \mathbf{T})$ cannot be calculated without knowledge of the statistics of $\boldsymbol{\xi}$, it can be optimized using stochastic optimization.[^14] Problem 7 is a nonconvex stochastic optimization problem, which is more challenging than a convex one. In the following, we develop an efficient parallel iterative algorithm to obtain a stationary point of Problem 7, using stochastic parallel SCA [27]. Similarly, the parallel computation mechanism here can speed up the computation, especially for large $M$. Specifically, this algorithm updates the caching probabilities of the $M$ tiers, i.e., $\mathbf{T}_m$, $m \in \mathcal{M}$, at each slot in a parallel manner, by maximizing $M$ approximate functions of $q_{st}(\mathbf{a}, \mathbf{T})$.

[^14]: The basic idea of stochastic optimization is to optimize a function in the presence of randomness based on the fact that realizations of random parameters can be obtained.

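Let $\mathbf{T}_m^{(t)}$ denote the caching probabilities of tier $m$ obtained at slot $t$, and denote $\mathbf{T}^{(t)} \triangleq (\mathbf{T}_m^{(t)})_{m \in \mathcal{M}}$. At slot $t$, choose:

$$\hat{h}_m(\boldsymbol{\xi}^{(t)}, \mathbf{T}_m, \mathbf{T}^{(t-1)}) = \rho^{(t)} \left(q_m(\boldsymbol{\xi}^{(t)}, \mathbf{T}_m, \mathbf{T}_{-m}^{(t-1)}) - \sum_{j \in \mathcal{M}, j \neq m} \sum_{n \in \mathcal{N}} \frac{\xi_n^{(t)} \theta_{m,j} T_{j,n}^{(t-1)} \left(T_{m,n} - T_{m,n}^{(t-1)}\right)}{\left(\sum_{l \in \mathcal{M}} \theta_{l,j} T_{l,n}^{(t-1)} + \eta_j\right)^2}\right) + \left(1 - \rho^{(t)}\right) \sum_{n \in \mathcal{N}} \left(T_{m,n} - T_{m,n}^{(t-1)}\right) f_{m,n}^{(t-1)} \tag{30}$$

as an approximation function of $q_{st}(\mathbf{a}, \mathbf{T})$ for updating $\mathbf{T}_m$. Here, $\boldsymbol{\xi}^{(t)} \triangleq (\xi_n^{(t)})_{n \in \mathcal{N}}$ is formed from the instantaneous file requests observed at slot $t$,[^15] $\rho^{(t)}$ is a positive diminishing stepsize satisfying:

$$\rho^{(t)} > 0, \quad \lim_{t \to \infty} \rho^{(t)} = 0, \quad \sum_{t=1}^{\infty} \rho^{(t)} = \infty, \quad \sum_{t=1}^{\infty} \left(\rho^{(t)}\right)^2 < \infty, \tag{31}$$

and $f_{m,n}^{(t)}$ is given by:

$$f_{m,n}^{(t)} = \left(1 - \rho^{(t)}\right) f_{m,n}^{(t-1)} + \rho^{(t)} \left(\frac{\xi_n^{(t)}}{\sum_{l \in \mathcal{M}} \theta_{l,m} T_{l,n}^{(t)} + \eta_m} - \sum_{j \in \mathcal{M}} \frac{\xi_n^{(t)} \theta_{m,j} T_{j,n}^{(t)}}{\left(\sum_{l \in \mathcal{M}} \theta_{l,j} T_{l,n}^{(t)} + \eta_j\right)^2}\right), \tag{32}$$

for all $m \in \mathcal{M}$ and $n \in \mathcal{N}$.

[^15]: We consider a simple way of making use of instantaneous file requests at each slot without assuming any a priori information about the file popularity distribution. Our focus here is to optimize random caching design using stochastic optimization, instead of pure estimation of file popularity.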

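Note that the strongly concave component function of $q(\boldsymbol{\xi}^{(t)}, \mathbf{T}_m, \mathbf{T}_{-m}^{(t-1)})$, i.e., $q_m(\boldsymbol{\xi}^{(t)}, \mathbf{T}_m, \mathbf{T}_{-m}^{(t-1)})$, is left unchanged, and the other nonconcave (actually convex) component functions, i.e., $q_j(\boldsymbol{\xi}^{(t)}, \mathbf{T}_m, \mathbf{T}_{-m}^{(t-1)})$, $j \in \mathcal{M}$, $j \neq m$, are linearized at $\mathbf{T}_m = \mathbf{T}_m^{(t-1)}$. In addition, note that the approximation at each slot $t$ based on the instantaneous file requests $\boldsymbol{\xi}^{(t)}$ becomes more accurate as more file requests are observed at that slot, and the approximation based on the file requests accumulated over the previous slots, i.e., $f_{m,n}^{(t-1)}$, becomes more accurate as $t$ increases. This choice of the approximate function, $\hat{h}_m(\boldsymbol{\xi}^{(t)}, \mathbf{T}_m, \mathbf{T}^{(t-1)})$, given in (30), is beneficial for similar reasons as in the case of perfect file popularity distribution.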
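The term in parentheses in (32) is the gradient of the sampled STP with respect to a caching probability, so the recursion is a running average of sampled gradients. The sketch below implements this recursion for hypothetical inputs; `theta`, `eta`, the caching probabilities and the request vector `xi` are illustrative values, and `stp` evaluates (8) with `xi` in place of the popularity vector.

```python
def stp(xi, T, theta, eta):
    """Sampled STP, i.e., (8) evaluated at the request vector xi."""
    M, N = len(T), len(xi)
    return sum(
        xi[n] * T[m][n] / (sum(theta[l][m] * T[l][n] for l in range(M)) + eta[m])
        for m in range(M) for n in range(N)
    )


def stp_gradient(xi, T, theta, eta):
    """Partial derivatives of the sampled STP, matching the bracket in (32)."""
    M, N = len(T), len(xi)
    # x[j][n] is the denominator associated with tier j and file n.
    x = [[sum(theta[l][j] * T[l][n] for l in range(M)) + eta[j] for n in range(N)]
         for j in range(M)]
    return [[xi[n] / x[m][n]
             - sum(xi[n] * theta[m][j] * T[j][n] / x[j][n] ** 2 for j in range(M))
             for n in range(N)]
            for m in range(M)]


def update_f(f, xi, T, theta, eta, rho):
    """One step of the recursion (32): average old f with the new sampled gradient."""
    g = stp_gradient(xi, T, theta, eta)
    return [[(1 - rho) * f[m][n] + rho * g[m][n] for n in range(len(xi))]
            for m in range(len(T))]


# Hypothetical two-tier, four-file instance.
theta = [[1.2, 0.4], [0.6, 1.5]]
eta = [0.8, 0.9]
T = [[0.9, 0.6, 0.3, 0.2], [0.4, 0.3, 0.2, 0.1]]
xi = [0.5, 0.25, 0.25, 0.0]   # e.g. fractions of the requests observed in one slot
f0 = [[0.0] * 4 for _ in range(2)]
f1 = update_f(f0, xi, T, theta, eta, rho=1.0)
```

With `rho=1.0` the update returns the sampled gradient itself, and with a diminishing stepsize as in (31) it blends each new sample into the accumulated estimate.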

Specifically, at slot , we first solve the following problem for each tier separately, in a parallel manner.

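###### Problem 8 (Approximate Convex Problem of Problem 7 for Tier m at Slot t)

$$\hat{\mathbf{T}}_m^{(t)} \triangleq \arg\max_{\mathbf{T}_m}\ \hat{h}_m(\boldsymbol{\xi}^{(t)}, \mathbf{T}_m, \mathbf{T}^{(t-1)}) \quad \text{s.t.}\ (15),\ (16).$$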

Problem 8 is a convex optimization problem and Slater’s condition is satisfied, implying that strong duality holds. Based on KKT conditions, we can obtain a closed-form optimal solution of Problem 8.

###### Lemma 3 (Optimal Solution of Problem 8)

For all $m \in \mathcal{M}$ and $n \in \mathcal{N}$, the optimal solution of Problem 8 is given by:

 ˆT(t)m,n=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣1θm,m         ⎷ρ(t)ξ(t)n(∑l≠m,l∈Mθl,mT(t−1)l,n+ηm)ν∗(t)m+ρ(t)∑j≠mj∈Mξ(t)nθm,jT(t−1)j,n(∑l∈Mθl,jT(t−1)l,n+ηj)2−(1−ρ(t))f(t−1)m,n−∑l