Gibbsian On-Line Distributed Content Caching Strategy for Cellular Networks
Abstract
In this paper, we develop Gibbs sampling based techniques for learning the optimal placement of contents in a cellular network. We consider the situation where a finite collection of base stations is scattered on the plane, each covering a cell (possibly overlapping with other cells). Mobile users request downloads from a finite set of contents according to some popularity distribution, which may be known or unknown to the base stations. Each base station has a fixed memory space that can store only a strict subset of the contents at a time; hence, if a user requests a content that is not stored at any of its serving base stations, the content has to be downloaded from the backhaul. We therefore consider the problem of optimal content placement, which minimizes the rate of download from the backhaul or, equivalently, maximizes the cache hit rate. It is known that, when multiple cells can overlap with one another (e.g., under dense deployment of base stations in small cell networks), it is not optimal to place the most popular contents in each base station. However, the optimal content placement problem is NP-complete. Using ideas from Gibbs sampling, we propose simple sequential content update rules that decide whether to store a content at a base station (if it is requested from that base station) and which content has to be removed from the corresponding cache, based on knowledge of the contents stored in its neighbouring base stations. The update rule is shown to converge asymptotically to the optimal content placement for all nodes when content popularities are known. Next, we extend the algorithm to the situation where content popularities and cell topology are initially unknown but are estimated as new requests arrive at the base stations; we show that our algorithm, working with the running estimates of content popularities and cell topology, also converges asymptotically to the optimal content placement.
Finally, we demonstrate the improvement in cache hit rate compared to most popular content placement and independent content placement strategies via numerical exploration.
Content caching, online cache update, cellular network, hit rate maximization, Gibbs sampling.
1 Introduction
The proliferation of smartphones and tablets equipped with 3G and 4G connectivity and the fast-growing demand for downloading multimedia files have resulted in severe overload of the internet backhaul, and the situation is expected to worsen with the advent of 5G in the near future. The recent idea of densifying cellular networks will improve wireless throughput, but this will eventually push the backhaul bandwidth to its limit. In order to alleviate this problem, the idea of caching popular multimedia contents has recently been proposed. Since popular contents are requested many times, which results in network congestion, one way to reduce the congestion is to cache the popular contents at various intermediate nodes in the network. In the case of a cellular network, this requires adding physical memory to base stations (BSs): macro, micro, nano and pico. This has several advantages: (i) Caching contents at base stations reduces the backhaul load. (ii) Caching reduces the delay in fetching the content, thereby reducing the multimedia playback time. (iii) Caching will allow the end user to download a lower quality content in case his channel quality is bad or in case he wants to control his total amount of download.
Under dense placement of base stations, it is often the case that the cells
(a cell is defined to be a region around a BS where
the user is able to get sufficient downlink data rate from the BS) of different
BSs might overlap with each other in an arbitrary manner
(see [1]).
Hence, if a user is covered by multiple BSs, she has the option to download a content
from any one of the serving BSs. This gives rise to the
problem of optimal content placement in the caches of cellular BSs (see [2], [3]);
the tradeoff is that, ideally, the caching strategy should avoid placing the same
content in two BSs whose cells have a significant overlap,
while avoiding such duplication is not desirable for the non-overlapping regions, where a user can reach only one BS.
1.1 Related Work
A considerable amount of work in the literature has been dedicated to cellular caching. Benefits and challenges of caching in 5G networks have been described in [6]. The authors of [7] have developed a method to analyze the performance of caches (isolated or networked), and have shown that placing the most popular subset of contents in each cache is not optimal in the case of interconnected caches. The paper [3] deals with optimal content placement in wireless caches given the BS-user association. The authors of [8] have addressed the problem of optimal content placement under user mobility. The authors of [2] have proposed a randomized content placement scheme in cellular BS caches in order to maximize cache hit rate, but their scheme assumes that the contents are placed independently across the caches, which is obviously suboptimal. This work was later extended to the case of heterogeneous networks in [9]. The authors of [10] have again considered independent probabilistic caching in a random heterogeneous network. The paper [11] has addressed the problem of cache miss minimization in a random network setting. The authors of [12] have studied the problem of distributed caching in ultra-dense wireless small cell networks using mean field games; however, this formulation requires taking the base station density to infinity (which may not hold in practice), and it does not provide any guarantee on the optimality of the caching strategy. The paper [13] proposes a pricing based scheme for jointly assigning content requests to cellular BSs and updating the cellular caches; but this paper focuses on a certain cost minimization instead of hit rate maximization, and it is optimal only when the data can be represented by a very large number of chunks, which can be used in employing rateless codes.
The problem of collaborative but decentralized caching among small base stations for a certain cost minimization has been analyzed in [14], under the assumption that the caches have access to the contents of other caches connected to the same gateway; their formulation involves a certain cost for retrieval of a content from another cache. The authors of [15] address the problem of minimizing energy consumption under multicell transmission cooperation for interference reduction and content caching in heterogeneous networks. Since content providers might have to pay cellular network operators for caching their contents, an important question is how to cache contents among multiple base stations so that the content placement charge is minimized; this problem has been addressed by the work reported in [16]. The authors of [17] have considered the problem of collaborative content placement at caches of multiple base stations, but under the assumption that cache sizes at base stations are unlimited. The paper [18] discusses a cooperative content caching and delivery policy among multiple base stations. The paper [19] provides a fast but suboptimal solution, based on a potential game formulation, to the problem of minimizing cache miss rate when multiple base stations have overlapping cells. [19] also provides a simulated annealing-based algorithm (different from our Gibbs sampling approach) that minimizes the cache miss rate.
The paper [20] analyzes a stochastic geometry framework where cache-enabled small base stations are randomly placed on the infinite two-dimensional plane, and calculates expressions for the outage probability of a typical user (jointly in terms of SINR and content availability at the cache), as well as the delivery rate. The authors of [21], for a randomly deployed heterogeneous network, derive approximate expressions for the average delivery rate considering inter-tier and intra-tier dependence. The authors of [22] analyze the average delay of users for a random two-tier network under perfect knowledge of content popularity distribution and randomized caching policies. All of these papers consider a stochastic geometry framework for base station locations, and assume limited backhaul. However, in our current paper, we consider known placement of a finite number of base stations, and seek to maximize the cache hit rate over the entire network; thus, our work seeks to reduce the load in the backhaul without imposing a hard constraint on the backhaul capacity. It is worth mentioning that, under this setting, we provide decentralized cache update schemes which are hit-rate optimal for a finite network in a time-average sense.
The authors of [23] and [24] propose learning schemes for unknown time-varying popularity of contents, but their schemes do not have a theoretical guarantee of convergence to the optimal content placement across the network when cells of different BSs overlap with each other. The paper [25] establishes that, when popularity is dynamic, any scheme that separates content popularity estimation and cache update (i.e., control) phases is strictly order-wise suboptimal in terms of hit rate. A big data approach has been taken in [26] for estimating content popularities empirically from mobile traffic data collected from a telecom operator. The authors of [27] proposed simulated annealing based caching for a single cache, and also addressed the issue of unknown content popularities by proposing an algorithm that avoids direct popularity estimation.
In contrast to the prior literature, our current paper provides a theoretical guarantee of convergence for an optimal distributed cellular cache update scheme that maximizes the time-average cache hit rate over a network involving caches in multiple base stations with overlapping cells; this minimizes the amount of data downloaded from the backhaul. The results also hold when popularities and cell topology are unknown initially and are learnt over time using the information of request arrivals at the base stations.
1.2 Organization and Our Contribution
The rest of the paper is organized as follows.

The system model has been defined in Section 2.

In Section 3, we propose an update scheme for the caches based on the knowledge of the contents cached in neighbouring BSs. The update scheme is based on Gibbs sampling techniques, and cache updates are made only when new content requests arrive. Since the scheme is proposed for a finite “inverse temperature” (to be defined later), it asymptotically converges to a near-optimal content placement in the network. We prove convergence of the proposed scheme. To the best of our knowledge, such a scheme has never been used in the context of caching in cellular networks.

In Section 5, we discuss how to adapt the update schemes to the situation when unknown content popularities and cell topology are learnt over time as new content requests arrive to the BSs over time.

In Section 6, we numerically demonstrate that the proposed Gibbs sampling approach has the potential to significantly improve the cache hit rate in cellular networks.

Finally, we conclude in Section 7.
2 System Model and Notation
2.1 Network Model
We consider a finite set of base stations (BSs) on the two-dimensional Euclidean space. The locations of the base stations are deterministic and arbitrary; for example, the locations could come from a given realization of a point process over a finite geographical region. The set of points covered by a BS constitutes the cell of the corresponding BS. This coverage could be signal-to-noise ratio (SNR) based coverage, where a point is covered by a BS if and only if the SNR at that point from the BS exceeds some threshold. We denote the cell of BS () by . Let us define . The area of any subset of is denoted by . We allow the cells of various BSs to have arbitrary and different finite areas. The cells of two BSs might have a nonzero intersection; any downlink mobile user located at such an intersection is covered by more than one BS. Let us denote by the collection of all subsets of , and let denote one such generic subset. Let us denote by the region in which is covered only by the BSs from the subset . See Figure 1 for a better understanding of the cell model.
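The partition of the covered area into regions served by exactly one subset of BSs can be illustrated numerically. The following is a minimal sketch, assuming disc-shaped cells of a common radius and a Monte Carlo area estimate; the disc shape, the function name `region_areas` and the sample size are illustrative assumptions, not part of the model above.

```python
import random

def region_areas(centers, radius, n_samples=20000, seed=0):
    """Estimate the area of each region covered by exactly one BS subset.

    centers: list of (x, y) BS positions; each cell is ASSUMED to be a disc
    of the given radius (illustrative only). Returns a dict mapping
    frozenset(BS indices) -> estimated area of the region covered by
    exactly those BSs.
    """
    rng = random.Random(seed)
    xs = [c[0] for c in centers]
    ys = [c[1] for c in centers]
    x_lo, x_hi = min(xs) - radius, max(xs) + radius
    y_lo, y_hi = min(ys) - radius, max(ys) + radius
    box_area = (x_hi - x_lo) * (y_hi - y_lo)
    counts = {}
    for _ in range(n_samples):
        x, y = rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)
        # Subset of BSs whose disc contains the sampled point.
        s = frozenset(j for j, (cx, cy) in enumerate(centers)
                      if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2)
        if s:  # ignore points covered by no BS
            counts[s] = counts.get(s, 0) + 1
    return {s: box_area * c / n_samples for s, c in counts.items()}

# Two overlapping unit discs: three regions ({0}, {1}, and the overlap {0, 1}).
areas = region_areas([(0.0, 0.0), (1.0, 0.0)], radius=1.0)
```

Summing the estimated areas of all regions whose subset contains a given BS recovers the area of that BS's cell, which gives a quick sanity check on the partition.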
2.2 Content Request Process
Contents from a set are requested by users located inside . We assume that each of these contents has the same size, though we will explain at the end of Section 3 how to easily take care of unequal content sizes in our analysis. Content () is requested by users according to a homogeneous Poisson point process in space (inside ) and time with intensity ; this is the expected number of requests for content per second per square meter inside . Let . Note that, denotes the probability that a content request is for content ; in other words, is the popularity of content . We also assume that .
It is worth mentioning that the cache request process essentially follows the popularly known independent request model (IRM) as described in [28].
2.3 Content Caching at BSs
We assume that each BS can store number of contents, where . Let denote a generic configuration of content placement in caches of the network. is defined as a matrix with if content is stored at the cache of BS , and otherwise. Note that, any feasible must satisfy for all ; we rule out the possibility of since that will be a waste of cache memory resources in BSs. Let us denote the set of all feasible configurations by . Clearly, the cardinality of is . Apart from , we will also use the symbol for a generic configuration belonging to set .
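As a small illustration of how quickly the feasible set grows, the sketch below enumerates all placements in which every BS stores exactly the same number of distinct contents. The names `M`, `N`, `K` (number of contents, number of BSs, cache size) and the set-based representation are assumptions of this sketch; by direct counting, the number of such placements is the binomial coefficient "M choose K" raised to the power N.

```python
from itertools import combinations, product
from math import comb

def feasible_configurations(num_contents, num_bs, cache_size):
    """Return an iterator over placements, each a tuple of frozensets (one per BS)."""
    per_bs = [frozenset(c)
              for c in combinations(range(num_contents), cache_size)]
    return product(per_bs, repeat=num_bs)

M, N, K = 4, 3, 2  # hypothetical toy sizes
configs = list(feasible_configurations(M, N, K))
num_configs = len(configs)
expected = comb(M, K) ** N  # choices per BS, raised to the number of BSs
```

Even for modest sizes this count becomes very large, which is one reason exhaustive search is not practical here.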
2.4 Cache Hit Rate Maximization Problem
We assume that, whenever a new request for a content arrives, it is served
by one BS covering that point and having the content in its cache; if a content request is served from the cache, we call
the event a cache hit. In case no covering BS has the content (i.e., no cache hit, or cache miss),
the content needs to be downloaded by one of the covering BSs and served to the user
(this will be explained later). The requests do not tolerate any delay; i.e., we do not consider
the possibility of holding the requests in a queue and serving the content to users in batch
once the content becomes available in a BS. Also, we assume infinite bandwidth available for all downlink transmissions; i.e.,
each content is assumed to be served instantaneously.
Let the random variable denote the number of cache hits in the entire network in unit time, under configuration . We define the cache hit rate where the expectation is over the randomness in the content request arrival process. Clearly,
(1) 
In this paper, we are interested in finding an optimal configuration which achieves:
(2) 
Cache hit rate has been considered as the objective function in prior literature; see [11], [2], [7] and [29] for reference. The authors of [29] considered hit rate maximization under coded caching. However, one can consider other objective functions, such as latency in content delivery as in [30]; we choose cache hit rate since it is a commonly used objective function. Cache hit rate is a suitable objective function when the requested content needs to be served instantaneously; if the requests are delay-tolerant, then queueing of the requests and contents is allowed, and latency in content delivery would then be a more suitable objective function. It is worth mentioning that, in case requests and contents are allowed to be queued at the base station, there is no formal proof that maximizing hit rate will minimize the latency in content delivery, though intuitively one can expect so.
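To make the objective concrete, here is a minimal sketch of the hit rate under the model described above: requests for each content arrive at a given rate per unit area, and a request from a region is a hit if at least one BS covering that region stores the content. The dictionary-of-regions representation, the function names and the toy numbers are assumptions of this sketch. A brute-force search over a tiny instance shows a case where storing the most popular content everywhere is not optimal.

```python
from itertools import combinations, product

def hit_rate(placement, rates, areas):
    """Hit rate of a placement (tuple of frozensets, one per BS).

    rates[i]: request rate of content i per unit area per unit time.
    areas[s]: area of the region covered by exactly the BS subset s.
    """
    total = 0.0
    for s, area in areas.items():
        # Contents available from at least one BS covering this region.
        cached = set().union(*(placement[j] for j in s))
        total += area * sum(rates[i] for i in cached)
    return total

def brute_force_optimum(rates, areas, num_bs, cache_size):
    """Exhaustive search; only viable for very small instances."""
    per_bs = [frozenset(c)
              for c in combinations(range(len(rates)), cache_size)]
    return max(product(per_bs, repeat=num_bs),
               key=lambda p: hit_rate(p, rates, areas))

# Hypothetical toy instance: 2 BSs with a large overlap, 3 contents, cache size 1.
rates = [3.0, 2.0, 1.0]
areas = {frozenset({0}): 1.0, frozenset({1}): 1.0, frozenset({0, 1}): 2.0}
best = brute_force_optimum(rates, areas, num_bs=2, cache_size=1)
most_popular = (frozenset({0}), frozenset({0}))
best_rate = hit_rate(best, rates, areas)
most_popular_rate = hit_rate(most_popular, rates, areas)
```

In this toy instance the optimum stores different contents at the two BSs, because users in the overlap can reach either cache.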
Problem (2) is an optimization problem with integer variables, a nonlinear objective function and linear constraints. This class of problems has been shown to be NP-complete (see [31]), and hence, we cannot expect any polynomial-time algorithm to solve (2). Hence, in this paper, we provide an iterative, distributed cache update scheme that asymptotically solves the problem. However, since the algorithm is iterative, we cannot use the optimal configuration over the entire infinite time horizon. Hence, we seek to design a randomized iterative cache update scheme which yields
(3) 
where is the configuration of all caches in the network at time . Our iterative scheme is randomized, which renders a random variable; hence, we work with the expectation .
It is important to note that, by maximizing the cache hit rate, we seek to minimize the download rate from the backhaul; this is necessary because backhaul capacity is limited in practice, and, also, downloading a content from a server via the backhaul link might involve certain cost. However, we do not consider any specific upper limit on the backhaul link capacity. If the backhaul link is blocked due to heavy load or due to finite backhaul capacity, a content request which is not able to find a match in the caches of its covering base stations can either be dropped or kept waiting for service hoping that the backhaul load will be reduced later. If the content request arrival statistics is approximately known to the network operator prior to cache installation at the base stations, the operator can simulate the cache update scheme and estimate the average download rate required for the backhaul under the scheme; this estimate can be used as a design guideline for choosing the backhaul capacity. Hence, for the rest of the paper, we assume sufficient backhaul capacity to deal with cache miss.
3 Cache Update via Basic Gibbs Sampling
In this section, we propose an iterative, randomized cache update scheme so that the time-average occupancy of each under the scheme follows a certain distribution called the Gibbs distribution. In Section 4, we explain how tuning a certain parameter of the Gibbs distribution helps us in solving problem (3).
Let us rewrite (1) as where
(4) 
We call to be the cache hit rate seen by BS under configuration . This will be the true cache hit rate seen by BS under configuration if a new content request is served by one covering BS chosen uniformly from the set of covering BSs having that content. Note that, if more than one covering BS has that content, the choice of the serving BS will not affect the hit rate; hence, we can safely assume uniform selection of the serving BS.
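The per-BS decomposition can be checked numerically. In the sketch below, a hit in a region is split equally among the covering BSs that hold the requested content, which matches the uniform choice of the serving BS described above; the function names and the toy instance are assumptions of this sketch. The per-BS rates then sum to the network-wide hit rate.

```python
def per_bs_hit_rate(j, placement, rates, areas):
    """Hit rate attributed to BS j when hits are split uniformly among holders."""
    total = 0.0
    for s, area in areas.items():
        if j not in s:
            continue
        for i in placement[j]:
            # Number of covering BSs in this region that also hold content i.
            holders = sum(1 for k in s if i in placement[k])
            total += rates[i] * area / holders
    return total

def network_hit_rate(placement, rates, areas):
    total = 0.0
    for s, area in areas.items():
        cached = set().union(*(placement[j] for j in s))
        total += area * sum(rates[i] for i in cached)
    return total

rates = [3.0, 2.0, 1.0]
areas = {frozenset({0}): 1.0, frozenset({1}): 1.0, frozenset({0, 1}): 2.0}
placement = (frozenset({0}), frozenset({0}))  # both BSs hold content 0
per_bs = [per_bs_hit_rate(j, placement, rates, areas) for j in range(2)]
total = network_hit_rate(placement, rates, areas)
```

Here each BS is credited with half of the hits generated in the overlap region, and the two shares add up to the total.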
In order to solve , we propose to employ Gibbs sampling techniques (see [32, Chapter ]). Let us assume that each BS maintains a virtual cache capable of storing contents. The broad idea is that one can update the virtual cache contents in an iterative fashion using Gibbs sampling. Whenever a content is requested from a BS not having the content in its physical (real) cache, the BS will download it from the backhaul and, at the same time, will decide to store it in the real cache depending on whether it is stored in its virtual cache or not.
We will update the virtual caches according to a stochastic iterative algorithm so that the steady state probability of configuration becomes:
(5) 
where is called the “inverse temperature” (motivated by literature from statistical Physics), and is called the partition function.
Note that, . Hence, if we choose configuration for all virtual caches with probability , then, for sufficiently large , the chosen configuration will belong to with probability close to . If the real cache configuration closely follows the virtual cache configuration, we can achieve a near-optimal cache hit rate for the real caching system.
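The concentration of a Gibbs distribution on the maximizers as the inverse temperature grows can be seen on a handful of numbers. This is a minimal sketch, assuming the usual form in which the probability of a configuration is proportional to the exponential of the inverse temperature times its hit rate; the list `hit_rates` is a hypothetical set of hit-rate values, one per configuration.

```python
import math

def gibbs_distribution(values, beta):
    """Probabilities proportional to exp(beta * value)."""
    shift = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - shift)) for v in values]
    z = sum(weights)     # partition function (after the shift)
    return [w / z for w in weights]

def mass_on_maximizers(values, beta):
    """Total probability assigned to configurations with the largest value."""
    probs = gibbs_distribution(values, beta)
    best = max(values)
    return sum(p for p, v in zip(probs, values) if v == best)

hit_rates = [12.0, 15.0, 12.0, 15.0, 9.0]  # hypothetical values
masses = [mass_on_maximizers(hit_rates, b) for b in (0.1, 1.0, 5.0)]
```

For a small inverse temperature the distribution is close to uniform; for a large one almost all mass sits on the maximizers, split equally among them when there are several.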
3.1 Gibbs sampling approach for “virtual” cache update
Let us consider discrete time instants when virtual cache contents are updated; this is different from the continuous time used before. Let us denote the configuration in all virtual caches in the network after the -th decision instant by , where . The Gibbs sampling algorithm simulates a discrete-time Markov chain on state space , whose stationary probability distribution is given by .
Let us define the set of neighbours of BS (including BS ) as . Let us denote by the restriction of configuration to all BSs except BS , i.e., is obtained by deleting the -th column of . Let denote the conditional distribution of the network-wide configuration conditioned on , under the joint distribution . Clearly, if .
If , then
(6) 
where is the sum of all components of the vector .
Note that, there is common factor in both numerator and denominator of the expression in (6), since this term does not depend on the contents in the virtual cache in BS . Hence, (6) can be further simplified as:
(7) 
Now, let us define to be the hit rate seen by BS under configuration due to the content requests generated from the region . Clearly, , since the hit rate at BS under configuration is equal to the sum of hit rates by requests generated from all possible segments . Now, note that, the term is a common factor in the numerator and denominator of the expression in (7), since this factor does not depend on the contents in the virtual cache of BS . Hence, when , we can simplify (7) further as follows:
(8) 
where
(9) 
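The cancellation steps described above can be sketched in one possible notation. All symbols here are assumptions of this sketch, since the original notation is not recoverable from the text: $B$ is a configuration, $B_{-j}$ its restriction to all BSs except $j$, $h(B)=\sum_k h_k(B)$ the hit rate split across BSs, $\mathcal{N}_j$ the set of BSs whose cells intersect that of BS $j$ (including $j$), and $\beta$ the inverse temperature. The sums in the denominators run over feasible configurations $B'$ that agree with $B$ outside BS $j$.

```latex
\begin{align*}
\Pi_\beta(B \mid B_{-j})
  &= \frac{\exp\bigl(\beta \sum_{k} h_k(B)\bigr)}
          {\sum_{B' :\, B'_{-j} = B_{-j}} \exp\bigl(\beta \sum_{k} h_k(B')\bigr)} \\
  &= \frac{\exp\bigl(\beta \sum_{k \in \mathcal{N}_j} h_k(B)\bigr)}
          {\sum_{B' :\, B'_{-j} = B_{-j}}
             \exp\bigl(\beta \sum_{k \in \mathcal{N}_j} h_k(B')\bigr)}
\end{align*}
```

The second equality holds because, for any BS $k$ whose cell does not intersect that of BS $j$, $h_k(B') = h_k(B)$ for every $B'$ in the sum, so the corresponding exponential factor is common to the numerator and to every term of the denominator. The same argument applied region by region removes the contributions of regions not covered by BS $j$.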
We now describe an algorithm for sequentially updating the network-wide virtual cache configuration .
Algorithm 1
Start with an arbitrary . At discrete time , pick a node randomly having uniform distribution from . Then, update the contents in the virtual cache of BS by picking up a network-wide virtual cache configuration with probability . Only contents in the virtual cache of BS are modified by this operation.
Proposition 1
Under Algorithm 1, is a reversible Markov chain, and it achieves the steady-state probability distribution .
The proof is standard, and it follows from the theory in [32, Chapter ].
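A single-site Gibbs update of the kind used in Algorithm 1 can be sketched as follows. This is a minimal illustration under assumptions: placements are tuples of frozensets, the conditional weights are computed from the full hit rate rather than from the neighbour-local expression (the common factors cancel on normalization, so the sampled distribution is the same), and all function names and toy numbers are hypothetical.

```python
import math
import random
from itertools import combinations

def hit_rate(placement, rates, areas):
    total = 0.0
    for s, area in areas.items():
        cached = set().union(*(placement[j] for j in s))
        total += area * sum(rates[i] for i in cached)
    return total

def gibbs_step(placement, rates, areas, cache_size, beta, rng):
    """Pick one BS uniformly and resample its whole virtual cache."""
    j = rng.randrange(len(placement))
    candidates = [frozenset(c)
                  for c in combinations(range(len(rates)), cache_size)]
    scores = []
    for cand in candidates:
        trial = placement[:j] + (cand,) + placement[j + 1:]
        scores.append(beta * hit_rate(trial, rates, areas))
    top = max(scores)  # shift exponents for numerical stability
    weights = [math.exp(sc - top) for sc in scores]
    chosen = rng.choices(candidates, weights=weights, k=1)[0]
    return placement[:j] + (chosen,) + placement[j + 1:]

rng = random.Random(1)
rates = [3.0, 2.0, 1.0]
areas = {frozenset({0}): 1.0, frozenset({1}): 1.0, frozenset({0, 1}): 2.0}
state = (frozenset({2}), frozenset({2}))  # deliberately poor start
visits = {}
for _ in range(2000):
    state = gibbs_step(state, rates, areas, cache_size=1, beta=2.0, rng=rng)
    visits[state] = visits.get(state, 0) + 1
most_visited = max(visits, key=visits.get)
```

With a moderately large inverse temperature, the chain in this toy instance spends most of its time in configurations that store different contents at the two overlapping BSs.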
Remark 1
In Algorithm 1, in order to make an update at time , BS needs to know the contents of the virtual caches only from . This requires information exchange between BS and its neighbours in each slot. Such information exchange may happen through the backhaul network, but this does not exert much load on the backhaul since the actual contents are not exchanged via the backhaul in this process.
Remark 2
The denominator in the simplified sampling probability expression in (8) requires a summation over all possible virtual cache configurations in . This allows the system to avoid the huge combinatorial problem of calculating which requires addition operations. The advantage will be even more visible if we consider the possibility of varying with time or learning over time if they are not known; the optimization problem will change over time in this case, and it will require calculation of the partition function in each slot. However, for large and , the computations per iteration in (8) can still be large; in this case, at each , one can randomly remove one content from the virtual cache of and then replace it by one content (from contents not present in the virtual cache of ) using Gibbs sampling; this will involve a summation in the denominator of (8) over all possible configurations that can possibly result from this replacement, and it will require only computations. One can easily show that will be a reversible Markov chain with stationary distribution under this variant. However, for the sake of notational simplicity, we do not consider this variant in the theory part of the paper.
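The lower-cost variant mentioned in this remark can be sketched as a single-content swap. This is one reading of the remark, stated as an assumption: one cached content is evicted uniformly at random, and its replacement is drawn by Gibbs sampling from all contents not among the remaining cached ones (so the evicted content may be drawn again). The function names and toy sizes are hypothetical.

```python
import math
import random
from math import comb

def hit_rate(placement, rates, areas):
    total = 0.0
    for s, area in areas.items():
        cached = set().union(*(placement[j] for j in s))
        total += area * sum(rates[i] for i in cached)
    return total

def swap_step(placement, rates, areas, beta, rng):
    """Evict one content uniformly, then Gibbs-sample its replacement."""
    j = rng.randrange(len(placement))
    kept = set(placement[j])
    kept.remove(rng.choice(sorted(kept)))  # uniform eviction
    # Replacement candidates: everything not currently kept (assumption:
    # this includes the content that was just evicted).
    candidates = [i for i in range(len(rates)) if i not in kept]
    scores = []
    for i in candidates:
        trial = placement[:j] + (frozenset(kept | {i}),) + placement[j + 1:]
        scores.append(beta * hit_rate(trial, rates, areas))
    top = max(scores)
    weights = [math.exp(sc - top) for sc in scores]
    new_item = rng.choices(candidates, weights=weights, k=1)[0]
    return placement[:j] + (frozenset(kept | {new_item}),) + placement[j + 1:]

M, K = 4, 2  # hypothetical toy sizes
rates = [4.0, 3.0, 2.0, 1.0]
areas = {frozenset({0}): 1.0, frozenset({1}): 1.0, frozenset({0, 1}): 2.0}
rng = random.Random(0)
state = (frozenset({2, 3}), frozenset({2, 3}))
for _ in range(200):
    state = swap_step(state, rates, areas, beta=1.0, rng=rng)
full_candidates = comb(M, K)   # terms per update when resampling the whole cache
swap_candidates = M - K + 1    # terms per update in this swap sketch
```

The per-update cost drops from a number of terms that is combinatorial in the cache size to one that is linear in the number of contents, at the price of slower movement through the configuration space.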
3.2 The real cache update scheme for fixed
Now we propose a cache update scheme for the real caches present in the BSs. Our scheme decides to store a content in the cache of a BS only when the content is requested from that BS. This eliminates the necessity of any unnecessary download from the backhaul.
Let us consider content request arrivals at continuous time (denoted by again) to the BS. Let us recall that the virtual caches are updated only at discrete times . We assume that these discrete time instants are superimposed on the continuous time axis . Hence, is defined to be equal to for , where .
Let us consider an increasing sequence of positive real numbers (viewed as time durations) such that as . Let . Let and .
The real cache update scheme is given as follows:
Algorithm 2
Start with some arbitrary . At time , if the request for content arrives at BS (either because no other covering BS has this content or because BS has been chosen from among the covering BSs having content ), then BS does the following: If BS has content , it will serve that. If BS does not have content , it serves the content by downloading from the backhaul. Then content is stored in the real cache of BS if and only if (i.e., if content was stored in the virtual cache of BS at time ). If the BS decides to store content then, in order to make room for the newly stored content , any content such that and , is removed from the real cache of BS .
Remark 3
The idea behind taking as in Algorithm 2 is as follows. We know that reaches the distribution as . As , the fraction of time spent during in copying the contents present in to real caches becomes negligible, and the real caches are allowed to operate larger and larger fraction of time under content distribution close to .
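The request-handling rule of Algorithm 2 at a single BS can be sketched as follows. This is a minimal illustration under assumptions: caches are Python sets of content identifiers, `virtual_snapshot` stands for the virtual cache contents at the reference time used by the algorithm, and the evicted content is chosen as the smallest eligible identifier (the algorithm allows any eligible content).

```python
def serve_request(content, real_cache, virtual_snapshot):
    """Handle one request at a BS; return True on a cache hit.

    real_cache is modified in place. On a miss the content is assumed to be
    fetched from the backhaul and served; it is stored only if it appears in
    the virtual snapshot.
    """
    if content in real_cache:
        return True
    if content in virtual_snapshot:
        # Evict something the virtual cache does not want, if anything.
        evictable = [c for c in real_cache if c not in virtual_snapshot]
        if evictable:
            real_cache.remove(min(evictable))
        real_cache.add(content)
    return False

real = {0, 1}
virtual = {1, 2}
first = serve_request(2, real, virtual)   # miss; 2 replaces 0
second = serve_request(2, real, virtual)  # hit
third = serve_request(0, real, virtual)   # miss; 0 is not stored
```

When the real and virtual caches have the same size and the requested content is in the virtual cache but not in the real one, at least one evictable content always exists, so the cache size is preserved.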
Now we make the following assumption:
Assumption 1
for all .
Theorem 1
See Appendix B.
Remark 4
Note that, Assumption 1 is very crucial in the proof of Theorem 1, because this assumption ensures that every BS gets content requests at some nonzero arrival rate, and hence can update its real cache at strictly positive rate. If Assumption 1 is not satisfied, then one can still achieve near optimal hit rate in real caches. It is achieved under a scheme where a new content request is sent to any of its covering BSs with very small probability , and otherwise the request is sent to a covering BS having that content. Similar analysis as in this paper can show that the timeaverage expected hit rate under this scheme differs from the optimal hit rate only by a small margin which goes to as .
Remark 5
Note that, Algorithm 2 will work for any sequence so long as the sequence increases to infinity. However, the speed of convergence will depend on the specific choice of the sequence, and also on system parameters such as content popularities, arrival rates and cellular network topology. An analytical characterization of the speed of convergence as a function of is hard, so we leave it for future research endeavours on this topic.
Incorporating unequal content sizes in our model: If is the size of content in bytes and is the memory of a cache in bytes, then any feasible configuration for unequal content sizes must satisfy the condition for all (instead of for all as required for equal content sizes with being the maximum possible number of contents per cache); the collection of such feasible matrices is called . Clearly, the set of feasible configurations is redefined for unequal content sizes. However, given this new , Algorithm 2 will still work since the virtual and real cache update schemes depend on the set (which is a collection of matrices) and not on the actual content sizes. Convergence of all algorithms proposed later will also hold in case content sizes are unequal, though the convergence rates will vary depending on the exact . Note that, for unequal content sizes, the best choice of is the mean cache hit rate in bytes per second, i.e., . This new objective function is separable across base stations and hence the virtual cache update rules for fixed will have similar form as (7) and (8); as a result, this modified will not alter the structures of the algorithms at all. For the rest of the paper, we will use (1) as a definition of for the sake of simplicity.
4 Varying to Reach Optimality
In this section, we discuss how to increase the inverse temperature to infinity with time so that the Gibbs sampling algorithm (used to update virtual caches) converges to the optimizer of (2). Here the intuition is that Gibbs sampling with increasing , combined with Algorithm 2 for real cache update, will achieve the optimal time-average expected cache hit rate for problem (3).
Let us define
Algorithm 3
This algorithm is analogous to Algorithm 1 except that, at discrete time instant , we use instead of fixed , where is the initial inverse temperature satisfying and .
Theorem 2
Under Algorithm 3 for virtual cache update, the discrete time nonhomogeneous Markov chain is strongly ergodic, and the limiting distribution satisfies:
See Appendix C for the proof. The definition of strong ergodicity can be found in Appendix A. We have used some results from [32, Chapter ] in the proof.
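An increasing inverse-temperature schedule can be sketched as below. The exact schedule and the condition on the initial inverse temperature are not recoverable from the text here, so this sketch ASSUMES a logarithmic schedule, which is the classical choice in simulated annealing results; the names `beta0` and `inverse_temperature` are hypothetical.

```python
import math

def inverse_temperature(t, beta0):
    """Assumed logarithmic schedule: beta_t = beta0 * log(1 + t)."""
    return beta0 * math.log(1 + t)

beta0 = 0.5  # hypothetical initial scale
schedule = [inverse_temperature(t, beta0) for t in range(6)]
```

Such a schedule grows without bound but slowly, so early iterations explore widely while later iterations concentrate on high hit-rate configurations. In a sampler, the value at step `t` would simply replace the fixed inverse temperature in the single-site update.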
Theorem 3
The first part of the proof follows using similar arguments as in the proof of Theorem 1. The second part follows from the first part using the fact that .
Remark 6
From [2, Figure ], we notice that independent placement of contents across BSs can significantly outperform the placement of most popular contents in each BS cache (for a Poisson distributed network). However, our proposed scheme yields the optimal hit rate for every realization of the location of BSs, so long as the number of BSs is finite. Hence, we can safely claim that our proposed scheme significantly outperforms the placement of most popular contents in each BS cache.
4.1 Convergence rate of the virtual cache update scheme
While we are not aware of any closed-form bound on the convergence rate for Algorithm 3, by using [32, Chapter , Theorem ], we can provide a convergence rate guarantee for Algorithm 1. Let us consider the Markov chain , where , evolving under Algorithm 1, and let us denote the corresponding transition probability matrix (t.p.m.) by . Let us denote the Dobrushin ergodic coefficient of by (see the proof of Theorem 2 in Appendix C).
Let us define
Note that, for any , the quantity for does not depend on the contents in the caches of base stations outside .
Now, let us recall Equation (8). In a way similar to the proof of Theorem 2 in Appendix C, we can show . Then, by [32, Chapter , Theorem ], the total variation distance between (i.e., the probability distribution of ) and the steady state distribution is upper bounded as:
We can prove similar results for the Markov chain for any . Clearly, the R.H.S. of the above equation increases with . Hence, under Algorithm 3, we can expect slower convergence rate as time increases. It has to be noted that there is a tradeoff between convergence rate and the accuracy of the virtual cache update scheme using Gibbs sampling; higher accuracy (by taking very large ) obviously requires longer time because of slow convergence rate. It also suggests that the rate of convergence decreases with (provided that other parameters such as and are fixed). Note that, and the difference between these two terms is large for large . Hence, this provides a reasonably tight bound on the convergence rate for large .
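The role of the Dobrushin ergodic coefficient in such bounds can be illustrated on a generic chain. The sketch below uses the standard definition (one minus the minimum overlap between any two rows of the transition matrix) and checks the contraction of total variation distance on a hypothetical two-state chain; it does not reproduce the specific bound of this section.

```python
def dobrushin_coefficient(P):
    """delta(P) = 1 - min over row pairs of sum_k min(P[i][k], P[j][k])."""
    n = len(P)
    overlap = min(sum(min(a, b) for a, b in zip(P[i], P[j]))
                  for i in range(n) for j in range(n))
    return 1.0 - overlap

def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def step(mu, P):
    """One step of the chain: row vector mu times matrix P."""
    n = len(P)
    return [sum(mu[i] * P[i][k] for i in range(n)) for k in range(n)]

P = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical two-state chain
pi = [2.0 / 3.0, 1.0 / 3.0]    # its stationary distribution
delta = dobrushin_coefficient(P)
mu = [1.0, 0.0]
initial_gap = total_variation(mu, pi)
gaps = []
for _ in range(5):
    mu = step(mu, P)
    gaps.append(total_variation(mu, pi))
```

Each step shrinks the distance to stationarity by at least the factor `delta`, so a coefficient close to one (as happens for large inverse temperatures) means slow convergence.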
5 Learning Content Popularities and Cell Topology
In previous sections, we assumed that the content request arrival rates per unit area, , and the areas are known to all BSs. But, in practice, these quantities may not be known a priori, and one has to estimate these quantities over time as new content requests arrive to the system. In this section, we will extend Algorithm 3 to adapt to learning of these quantities.
At time slot , the BS (uniformly chosen from the set of BSs) chooses its virtual contents in such a way that the probability of choosing network-wide configuration at time is .
Let us recall the expression for from (9). Clearly, if one can estimate for all possible , then one can have an estimate of . This can be done by estimating the request arrival rate for content from the region ; this is easy to do because this is a time-homogeneous Poisson process with rate requests per unit time.
Let us assume that each BS has an estimate for in slot . This can be done through continuous message exchange among the BSs which observe the content request arrival process over time.
Now we present the virtual cache update algorithm.
Algorithm 4
This algorithm is the same as Algorithm 3 except that the estimate is used at slot by BS , instead of the actual value of .
Assumption 2
almost surely for all .
Assumption 2 ensures that each BS has an estimate of the total request arrival rate for content in the segment of the plane, and that this estimate converges to the true value as time progresses. This can be achieved simply by recording the number of arrivals for each content at each , and communicating these counts periodically to all base stations in the network. As time progresses, more requests arrive in each segment and the estimates move closer to their respective mean values.
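The counting scheme described above can be sketched as follows. This is only a minimal illustration, assuming that arrivals are logged per (content, segment) pair; the class `RateEstimator`, the helper `simulate_poisson_count`, and all numerical values are hypothetical and do not come from the paper. Dividing the count by the elapsed time gives the empirical rate of a time-homogeneous Poisson process, which converges almost surely to the true rate by the strong law of large numbers.

```python
import random
from collections import defaultdict


class RateEstimator:
    """Running estimate of Poisson request rates per (content, segment) pair."""

    def __init__(self):
        self.counts = defaultdict(int)  # (content, segment) -> requests seen so far
        self.elapsed = 0.0              # total observation time so far

    def observe(self, arrivals, duration):
        """Record the requests seen during a window of length `duration`."""
        for content, segment in arrivals:
            self.counts[(content, segment)] += 1
        self.elapsed += duration

    def rate(self, content, segment):
        """Empirical rate: count divided by elapsed time (0.0 before any observation)."""
        if self.elapsed == 0.0:
            return 0.0
        return self.counts[(content, segment)] / self.elapsed


def simulate_poisson_count(rate, duration, rng):
    """Number of arrivals of a Poisson process with the given rate in [0, duration]."""
    t, n = rng.expovariate(rate), 0
    while t <= duration:
        n += 1
        t += rng.expovariate(rate)
    return n


rng = random.Random(0)
true_rate = 2.0  # hypothetical true rate for content 0 in segment "s0"
est = RateEstimator()
for _ in range(5000):  # 5000 unit-length slots
    n = simulate_poisson_count(true_rate, 1.0, rng)
    est.observe([(0, "s0")] * n, 1.0)
print(est.rate(0, "s0"))  # close to true_rate after many slots
```

In a distributed deployment, the counts rather than the rates would presumably be exchanged among BSs, since counts from different observers can simply be added.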
Assumption 3
is unique.
Theorem 4
See Appendix D.
Remark 7
Assumption 3 is a technical requirement for Theorem 4. The reason is that, when , the limiting transition probability matrix of the nonhomogeneous Markov chain is ergodic if there is a single maximizer in ; otherwise, ergodicity cannot be guaranteed, and ergodicity of is required in the proof of Theorem 4. Note that Theorem 4 is stated for Algorithm 3 as the virtual cache update scheme, since it uses increasing . In practical applications, will be kept constant at a large but finite value, and will then be irreducible and ergodic even when there is more than one maximizer; hence, Algorithm 1 for virtual cache update, combined with popularity and topology learning, will return an optimal configuration with the same high probability even when there is more than one maximizing configuration. Also, uniqueness of the maximizer is a practical assumption since, due to the nonuniform cell structure over a large region, it is highly unlikely that two different configurations will have the same hit rate.
Theorem 5
The proof is similar to that of Theorem 3.
6 Performance Improvement Using Gibbs Sampling
In this section we discuss the performance of the proposed Gibbs sampling content placement (GSCP), which is based on Algorithm 1. We compare it with two popular reference solutions: (i) most popular content placement (MPCP) in each BS, and (ii) independent content placement (ICP) as in [2]. Recall that the latter method supplies all BSs with a common distribution from which each of them randomly chooses its cache contents; this distribution is calculated as a function of the content popularities and BS coverage probabilities, so as to maximize the average cache hit probability of a typical request.
6.1 Optimality
The MPCP is hit rate optimal when there is no cell overlapping.
The ICP maximizes the conditional cache hit probability (given coverage) averaged over all possible locations of BSs in the (infinite, stationary) model, assuming some given coverage probabilities (that is, the distribution of the number of BSs covering a typical point) and independent selection of cache contents at each base station. It outperforms (on average) the MPCP (which can be seen as independent content placement with a particular, non-optimized, deterministic content distribution); see [2] for details. The gain with respect to the MPCP is bigger when there is more cell overlapping in the model. Our GSCP maximizes the hit rate (for a finite network deployment region) for any given placement of BSs.
6.2 Asymptotic performance
It might be interesting to compare first the asymptotic performance of the three solutions under two extremal situations:
Little overlapping cells
By this we mean a network where the overlapping of cells is negligible. A specific example would be a Poisson Boolean model for which the product of the intensity of BSs and the mean area of a cell is small. An extremal non-overlapping model is the Voronoi tessellation or, more generally, any tessellation.
It is easy to see that in this regime all three solutions MPCP, ICP and GSCP are equivalent; all will tend to store the most popular contents in all BSs. Hence, the conditional hit probability of a typical request, given coverage, is equal to , and the cache hit rate per unit covered area becomes .
Highly overlapping cells
By this we mean a network where the number of stations covering the typical location increases in some sense to infinity, as is the case, e.g., for the Poisson Boolean model with the product of the intensity of BSs and the mean area of a cell growing large.
While MPCP always offers the same conditional hit probability given coverage (equal to ), it can be shown under mild conditions that ICP and GSCP are again equivalent with this conditional hit probability tending to , thus significantly outperforming the MPCP.
Sparse network and very dense network scenarios are not of practical interest. Hence, we provide some numerical examples to show potential performance improvement of GSCP with respect to ICP and MPCP. It is to be noted that these numerical examples are provided only to demonstrate the potential for performance improvement via Gibbs sampling approach. Providing guarantees for the actual margin of performance improvement for a more realistic network topology (such as Poisson Boolean model for cells) is left for future research endeavours.
6.3 Distributed nature
The MPCP is completely distributed, i.e., all BSs fill in their caches independently, provided that they know the content popularity distribution. This popularity can be locally estimated, as it is suggested in Section 5.
The ICP is also distributed, provided that the specific model-optimal distribution of the contents is fed to the BSs. This distribution depends on the coverage probabilities, which can be estimated only over the entire network; they cannot be calculated locally. Hence the ICP requires a central authority for the calculation of the optimal content distribution.
Our GSCP is distributed in the sense that each BS updates its cache using only local estimation and local information exchange.
6.4 Numerical example of performance improvement via Gibbs sampling for various values of
We consider six BSs placed inside the unit square bounded by the lines on the plane. There are four contents with popularity vector . Each BS can store at most two contents (i.e., ). Content requests are being generated over the unit square according to a time and space homogeneous Poisson point process with intensity requests per unit time per unit area.
We consider two possible scenarios for the cells of base stations:

Scenario : We assume that the six cells are either square or rectangular in shape, and together cover the entire unit square. The corners of the cells are given by , , , , and .
Scenario : The six base stations are placed uniformly and independently inside the unit square (random placement). The cell of a base station is a circular region centered at it and with radius units. The placement realization in this numerical example left area of the unit square uncovered by base stations; this area does not contribute to the cache hit rate. The locations of the six base stations for this particular realization are , , , , and .
For both scenarios, under most popular content placement, the cache hit rate is multiplied by the fraction of area covered by the base stations (this fraction is for scenario but less than for scenario ).
For scenario , we have also considered the case where all BSs choose the contents independently with the same probability distribution tuned to maximize the expected hit rate; the expected hit rate turned out to be in this case.
If the contents in all caches are chosen probabilistically according to the steady state Gibbs distribution , one can expect that the expected cache hit rate improves as increases, and converges to the maximum possible cache hit rate as .
The above phenomena for scenario and scenario have been captured in Figure 2. This figure also shows that even with finite but large , significantly higher cache hit rate can be achieved asymptotically compared to the most popular content placement strategy for all BSs, and even w.r.t. independent placement of contents in the BSs.
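The dependence of the expected hit rate on the inverse temperature can be reproduced exactly on a very small instance by enumerating all configurations. The sketch below is not the setup of Figure 2: the two-BS network, the areas, the popularities, the cache size of one, and the names `hit_rate` and `expected_hit_rate` are all hypothetical. It also assumes that the hit rate of a configuration is the popularity-weighted area over which a requested content is cached by at least one covering BS, with the total request rate normalized to one, and that the Gibbs distribution puts weight proportional to the exponential of `beta` times the hit rate on each configuration.

```python
import itertools
import math

# Hypothetical toy network: two BSs, cache size 1, three contents.
popularity = [0.5, 0.3, 0.2]
# Segments of the plane as (set of covering BSs, area); values are illustrative only.
segments = [({0}, 0.2), ({1}, 0.2), ({0, 1}, 0.6)]


def hit_rate(config):
    """Hit rate of a configuration; config[b] is the set of contents cached at BS b."""
    total = 0.0
    for covering, area in segments:
        available = set().union(*(config[b] for b in covering))
        total += area * sum(popularity[i] for i in available)
    return total


def expected_hit_rate(beta):
    """Expected hit rate when the configuration is drawn from the Gibbs distribution."""
    configs = [({a}, {b}) for a, b in itertools.product(range(3), repeat=2)]
    rates = [hit_rate(c) for c in configs]
    top = max(rates)
    # Subtracting the maximum before exponentiating avoids overflow for large beta.
    weights = [math.exp(beta * (r - top)) for r in rates]
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)


for beta in (0.0, 10.0, 100.0, 1000.0):
    print(beta, round(expected_hit_rate(beta), 4))
```

In this toy instance the best configuration caches different contents at the two BSs, so the limit for large `beta` exceeds the hit rate of storing the most popular content at both.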


6.5 Effect of finite number of iterations, , and cell overlap
In this subsection, we demonstrate the caching performance of Gibbs sampling with only a finite number of iterations. We consider two different cases: (i) three base stations on the plane, each with unit radius, more overlap among cells, and (ii) three base stations on the plane, each with unit radius, less overlap among cells. The set of contents are with their popularities coming from a Zipf distribution with parameter . Each cache can store at most two contents (i.e., ).
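For readers who wish to reproduce a popularity profile of this kind, a Zipf distribution assigns the content of rank i a popularity proportional to i raised to the power of minus the Zipf parameter. The helper name `zipf_popularities` and the values passed to it below are placeholders for illustration, not the values used in this experiment.

```python
def zipf_popularities(num_contents, alpha):
    """Popularity of the content of rank i is proportional to i ** (-alpha)."""
    weights = [rank ** (-alpha) for rank in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Placeholder values: 5 contents and Zipf parameter 1.0.
popularities = zipf_popularities(5, 1.0)
print(popularities)
```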
For these two cases, for various values of , we simulated the Gibbs sampling algorithm (Algorithm 1) for iterations, noted the configuration obtained after the th iteration, and computed the cache hit rates for these configurations via simulation. Next, we compared them against the cache hit rates of the most popular content placement and independent content placement schemes. The results are summarized in Figure 3, where hit rates are computed per unit area of the entire window and not over the region covered by base stations alone. By the discussion in Section 6.2, we can expect the Gibbs sampling and independent content placement algorithms to both be optimal as the cells become more overlapping. Indeed, Figure 3 shows that Gibbs sampling and independent content placement perform much better than most popular content placement when there is more overlap among cells. Gibbs sampling also tends to perform better than independent content placement for large . However, it is important to remember that we have provided results for only one sample path for each ; since we have taken only iterations for Gibbs sampling, the results will vary if another independent sample path is chosen. Hence, Figure 3 only demonstrates the potential performance improvement by Gibbs sampling over finite time; on the other hand, Section 6.4 demonstrates that Gibbs sampling asymptotically achieves a higher hit rate than the independent content placement and most popular content placement strategies.
6.6 Numerical example for mixing time and performance improvement of Gibbs sampling
Now we demonstrate the speed of convergence of Gibbs sampling for fixed . Locations of base stations are generated independently with uniform distribution over the unit square, and the cell radius is assumed to be . Popularities of contents are generated either independently with uniform distribution, or they are assumed to follow a Zipf distribution with parameter .
Results for small system size: For GSCP, each cache is assumed to be empty at . The performance of GSCP for , averaged over multiple independent sample paths, is compared against MPCP and ICP for various values of , and the cache size ; for GSCP, at each , the hit rate for the current cache configuration is considered. Cache hit rate under GSCP is plotted against in Figure 4. The results show that GSCP outperforms MPCP and ICP significantly and reaches the stationary distribution for even if ; for , the stationary distribution is nearly achieved starting from . Of course, the convergence will be slower if , and are increased further; for large values of , and , one can simply use GSCP with only highly popular contents (for example, most popular contents whose collective popularity is or above).
Results for large system size: As discussed in Remark 2, the computations per iteration in Gibbs sampling (using (8)) can be prohibitive for GSCP to be applied to a large scale system. We alleviate this problem by proposing a simple modified GSCP algorithm (which we call MGSCP) where, at each iteration, only one randomly selected content is removed from a randomly selected BS, and then it is replaced by one content (absent in the cache after the removal) chosen randomly via Gibbs sampling; thus, the denominator in (8) is replaced by a summation over all configurations that can possibly result from this replacement operation. Clearly this requires only computations per iteration of Gibbs sampling and hence is easily implementable. This might reduce the convergence speed, but that can be compensated for by running this iteration multiple times between two successive discrete time instants. However, here we assume that this update is done only once at each . To reduce computation, we compute the hit rate only when is an integer multiple of either or . Figure 5 demonstrates that the MGSCP algorithm may take at most a few hundred iterations before it starts outperforming ICP, and the convergence to the steady state distribution is also clear from the plots; a few hundred iterations is not many for this large scale system (with and or ), especially keeping in mind that multiple iterations can be performed in practice between two successive decision instants. Thus, MGSCP provides a fast, distributed, optimal algorithm for content placement in a large system.
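The single-swap update described above can be sketched as follows. This is an illustrative reading of the description rather than the authors' implementation: the function names `hit_rate` and `mgscp_step`, the two-BS network, and all numerical values (including the inverse temperature `beta`) are hypothetical, and the hit rate is assumed to be the popularity-weighted area over which a requested content is cached by at least one covering BS, with total request rate normalized to one.

```python
import math
import random


def hit_rate(config, segments, popularity):
    """Assumed hit rate: area-weighted popularity of contents reachable in each segment."""
    total = 0.0
    for covering, area in segments:
        available = set().union(*(config[b] for b in covering))
        total += area * sum(popularity[i] for i in available)
    return total


def mgscp_step(config, segments, popularity, beta, rng):
    """Evict one random content at one random BS, then refill the free slot by
    Gibbs sampling over the contents absent from that cache after the eviction."""
    bs = rng.randrange(len(config))
    evicted = rng.choice(sorted(config[bs]))
    kept = config[bs] - {evicted}
    candidates = [c for c in range(len(popularity)) if c not in kept]
    rates = []
    for c in candidates:
        trial = list(config)
        trial[bs] = kept | {c}
        rates.append(hit_rate(trial, segments, popularity))
    top = max(rates)
    # Normalization runs only over configurations reachable by this one swap.
    weights = [math.exp(beta * (r - top)) for r in rates]
    chosen = rng.choices(candidates, weights=weights, k=1)[0]
    new_config = list(config)
    new_config[bs] = kept | {chosen}
    return new_config


rng = random.Random(1)
popularity = [0.4, 0.3, 0.2, 0.1]                    # hypothetical popularities
segments = [({0}, 0.2), ({1}, 0.2), ({0, 1}, 0.6)]   # (covering BSs, area)
config = [{0, 1}, {0, 1}]                            # start from most popular placement
for _ in range(200):
    config = mgscp_step(config, segments, popularity, beta=2000.0, rng=rng)
print(config, hit_rate(config, segments, popularity))
```

Each step evaluates the hit rate once per candidate content, so its cost grows linearly in the number of contents rather than with the number of possible cache contents of a BS. The evicted content is itself a candidate, so a step can leave the configuration unchanged.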
7 Conclusion
In this paper, we have provided algorithms for cache content update in a cellular network, motivated by Gibbs sampling techniques. The algorithms were shown to converge asymptotically to the optimal content placement in the caches. It turns out that the computation and communication cost is affordable for practical cellular network base stations.
While the current paper solves an important problem, there are still possibilities for numerous interesting extensions: (i) We assumed uniform download cost from the backhaul network for all base stations. However, this is not in general true. Depending on the backhaul architecture, backhaul link capacities and congestion scenario, it might be more desirable to avoid download from some specific base stations. Even different base stations might have different link capacities, and in practice, this will result in queueing delay for the download process. Contents might be of various classes, and hence may not have fixed size. Hence, a combined formulation of cache update and backhaul network state evolution will be necessary. (ii) Different cells might witness different content popularities, but this has not been addressed in the current paper. (iii) Once a content becomes irrelevant (e.g., a news video), it has to be removed completely from all caches; one needs to develop techniques to detect when to remove a content from all caches. (iv) Providing convergence rate guarantees when the inverse temperature is increasing and when arrival rates and cell topology are learnt over time, is a very challenging problem. We leave these issues for future research endeavours on this topic.
Arpan Chattopadhyay obtained his B.E. in Electronics and Telecommunication Engineering from Jadavpur University, Kolkata, India in the year 2008, and M.E. and Ph.D. in Telecommunication Engineering from Indian Institute of Science, Bangalore, India in the year 2010 and 2015, respectively. He is currently working in the Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles as a postdoctoral researcher. Previously he worked as a postdoc in INRIA/ENS Paris. His research interests include wireless networks, cyber-physical systems, machine learning and control.
Bartlomiej Blaszczyszyn received his PhD degree and Habilitation qualification in applied mathematics from University of Wroclaw (Poland) in 1995 and 2008, respectively. He is now a Senior Researcher at Inria (France), and a member of the Computer Science Department of Ecole Normale Superieure in Paris. His professional interests are in applied probability, in particular in stochastic modeling and performance evaluation of communication networks. He coauthored several publications on this subject in major international journals and conferences, as well as a two-volume book on Stochastic Geometry and Wireless Networks (NoW Publishers), jointly with F. Baccelli.
H. Paul Keeler graduated in physics and applied mathematics in 2006 from Griffith University. He received a Ph.D. in applied mathematics in 2010 from the University of Melbourne. After stints as a consultant in market research and banking and a guest researcher at the University of Zaragoza, Spain, he worked for two years as a researcher at Inria and École Normale Supérieure, Paris, where he was partly funded by Orange Labs. Currently he is a researcher at Weierstrass Institute (or WIAS), Berlin. His research interests lie in applied probability, numerical and asymptotic methods, communication networks.
Appendix A Definition of weak and strong ergodicity
Let us consider a discrete-time inhomogeneous Markov chain whose transition probability matrix (t.p.m.) between and is given by . Let be the collection of all possible distributions (each element in is assumed to be a row vector) on the state space. Then is called weakly ergodic if, for all ,
where is the total variation distance between two distributions.
is called strongly ergodic if there exists such that, for all ,
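For reference, the standard textbook forms of these two definitions can be written as below. The symbols are assumed here for illustration and may differ from the notation of this paper: $P(m,n)$ denotes the t.p.m. between times $m$ and $n$, $\mathcal{D}$ the set of distributions on the state space, and $d_V$ the total variation distance.

```latex
% Weak ergodicity: for all m >= 0,
\lim_{n \to \infty} \; \sup_{\mu, \nu \in \mathcal{D}} \;
  d_V\bigl(\mu P(m,n), \, \nu P(m,n)\bigr) = 0 .

% Strong ergodicity: there exists \pi \in \mathcal{D} such that, for all m >= 0,
\lim_{n \to \infty} \; \sup_{\mu \in \mathcal{D}} \;
  d_V\bigl(\mu P(m,n), \, \pi\bigr) = 0 .
```

Weak ergodicity only says that the chain forgets its initial distribution; strong ergodicity additionally requires convergence to a fixed limiting distribution.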
Appendix B Proof of Theorem 1
Fix a small . Under configuration of the virtual caches, let us denote the total time (a generic random variable) taken by the arrival process so that, for all possible pairs there is at least one request for content to BS if virtual configuration suggests placing content at BS ; clearly , since we have made Assumption 1. Let us consider large enough such that: (i) for all integer , (ii) .
Now,
where has the same distribution as . The equality step follows from the fact that for , we have , since all real caches are updated to within .
Hence,
Since is arbitrarily small, we have:
On the other hand,
Now,
where the second inequality follows from the fact that for :
Since is arbitrarily small, we can say that:
Hence, .
Appendix C Proof of Theorem 2
In this proof, we will use the notions of weak and strong ergodicity of time-inhomogeneous Markov chains from [32, Chapter , Section ], which are provided in Appendix A.
Fix . We will first show that the Markov chain is weakly ergodic.
Let us consider the transition probability matrix (t.p.m.) for the inhomogeneous Markov chain , where . Then, Dobrushin's ergodic coefficient is given by (see [32, Chapter , Section ] for the definition) . The Markov chain is weakly ergodic if (by [32, Chapter , Theorem ]).
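As a reminder of what this coefficient measures, the sketch below computes it for a finite stochastic matrix using the standard definition: half the largest L1 distance between any two rows. The function name `dobrushin_coefficient` and the example matrix are illustrative and not taken from the paper. A value strictly below one means that one application of the matrix brings any two distributions closer in total variation.

```python
def dobrushin_coefficient(P):
    """Half the maximum L1 distance between two rows of the stochastic matrix P."""
    n = len(P)
    return max(
        0.5 * sum(abs(P[i][k] - P[j][k]) for k in range(len(P[i])))
        for i in range(n)
        for j in range(n)
    )


# Illustrative 2-state transition matrix (each row sums to one).
P = [[0.9, 0.1],
     [0.2, 0.8]]
print(dobrushin_coefficient(P))
```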
Now, with positive probability, virtual caches in all nodes are updated over a period of slots. Hence, any can be reached over a period of slots, starting from any other . Note that, once a base station is chosen in Algorithm 1 at discrete time , the sampling probability for any set of contents in its virtual cache in a slot is lower bounded by