Joint Pushing and Caching for Bandwidth Utilization Maximization in Wireless Networks
Abstract
Joint pushing and caching is recognized as an efficient remedy to the problem of spectrum scarcity incurred by tremendous mobile data traffic. In this paper, by exploiting storage resources at end users and predictability of user demand processes, we design the optimal joint pushing and caching policy to maximize bandwidth utilization, which is of fundamental importance to mobile telecom carriers. In particular, we formulate the stochastic optimization problem as an infinite horizon average cost Markov Decision Process (MDP), for which there generally exist only numerical solutions without many insights. By structural analysis, we show how the optimal policy achieves a balance between the current transmission cost and the future average transmission cost. In addition, we show that the optimal average transmission cost decreases with the cache size, revealing a tradeoff between the cache size and the bandwidth utilization. Then, due to the fact that obtaining a numerical optimal solution suffers from the curse of dimensionality and implementing it requires a centralized controller and global system information, we develop a decentralized policy with polynomial complexity w.r.t. the numbers of users and files as well as cache size, by a linear approximation of the value function and optimization relaxation techniques. Next, we propose an online decentralized algorithm to implement the proposed low-complexity decentralized policy using the technique of Q-learning, when prior knowledge of user demand processes is not available. Finally, using numerical results, we demonstrate the advantage of the proposed solutions over some existing designs. The results in this paper offer useful guidelines for designing practical cache-enabled content-centric wireless networks.
I Introduction
The rapid proliferation of smart mobile devices has triggered an unprecedented growth of global mobile data traffic [1], resulting in the spectrum crunch problem in wireless systems. In order to improve the bandwidth utilization and support the sustainability of wireless systems, researchers have primarily focused on increasing the access rate of wireless systems and the density of network infrastructures, e.g., base stations (BSs). However, the expansive growth of both the access rate and the density of network infrastructures entails prohibitive network costs. On the other hand, modern data traffic exhibits a high degree of asynchronous content reuse [2]. Thus, caching is gradually recognized as a promising approach to further improve the bandwidth utilization by placing contents closer to users, e.g., at BSs or even at end users, for future requests. Recent investigations show that caching can effectively reduce the traffic load of wireless and backhaul links as well as user-perceived latency [6, 7, 3, 4, 8, 5].
Based on whether content placement is updated, caching policies can be divided into two categories, i.e., static caching policies and dynamic caching policies. Static caching policies refer to the caching policies under which content placement remains unchanged over a relatively long time. For example, [3, 4, 5] consider static caching policies at BSs to reduce the traffic load of backhaul links. In addition, in [6, 7], static caching policies at end users are proposed to not only alleviate the backhaul burden but also reduce the traffic load of wireless links. However, all the static caching policies in [6, 7, 3, 4, 5] are designed based on content popularity, e.g., the probability of each file being requested, which is assumed to be known in advance, and cannot exploit temporal correlation of a demand process to further improve performance of cache-assisted systems. Dynamic caching policies refer to the caching policies where content placement may be updated from time to time by making use of instantaneous user request information. In this way, dynamic caching policies can not only operate without prior knowledge of content popularity but also capture the temporal correlation of a demand process. The least recently used (LRU) policy and the least frequently used (LFU) policy [8] are two of the commonly adopted dynamic caching policies, primarily due to ease of implementation. However, they are both heuristic designs and may not guarantee promising performance in general.
Pure dynamic caching policies only focus on caching contents which have been requested and delivered to the local cache, and hence have limitations in smoothing traffic load fluctuations and enhancing bandwidth utilization. To address these limitations, joint pushing (i.e., proactively transmitting) and caching has been receiving more and more attention, as it can further improve bandwidth utilization. Specifically, the underutilized bandwidth at low traffic time can be exploited to proactively transmit contents for satisfying future user demands. Therefore, it is essential to design intelligent joint pushing and caching policies based on the knowledge of user demand processes.
For instance, [9] considers joint pushing and caching to minimize the energy consumption, assuming complete knowledge of future content requests. In most cases, the assumption cannot be satisfied, and hence the proposed joint design has limited applications. To address this problem, [15, 10, 12, 13, 14, 11] consider joint pushing and caching based on statistical information of content requests (e.g., content popularity), while [16] considers online learning-aided joint design adaptive to instantaneous content requests and without prior knowledge of statistical information of content requests. Specifically, [10] optimizes joint pushing and caching to maximize the network capacity in a push-based converged network with limited user storage. In [11], the authors maximize the number of user requests served by small BSs (SBSs) via optimizing the pushing policy using Markov decision process (MDP). Note that in [11], the cache size at each user is assumed to be unlimited, and thus caching design is not considered. In [12] and [13], the optimal joint pushing and caching policies are proposed to maximize the number of user requests served by the local caches in the scenarios of a single user and multiple users, respectively. [14] studies the optimal joint pushing and caching policy to minimize the transmission cost. However, the joint designs in [12, 13, 14] do not take into account future reuse of requested files, and thus cannot be applied to certain applications which involve reusable contents, such as music and video streaming. Moreover, in [12, 13, 14, 10, 11], temporal correlation of a demand process is not captured, and hence the potential of joint pushing and caching cannot be fully unleashed. In contrast, [15] and [16] exploit temporal correlation in the joint designs. In particular, [15] investigates efficient transmission power control and caching to minimize both the access delay and the transmission cost using MDP.
In [16], the authors maximize the average reward obtained by proactively serving user demands and propose an online learning-aided control algorithm. However, in [15] and [16], only a single-user setup is considered without reflecting asynchronous demands for common contents from multiple users, and hence the proposed joint designs may not be directly applied to practical networks with multiple users. Moreover, the pushing policy in [16] can pre-download contents only one time slot ahead.
To further exploit the promises of joint pushing and caching in bandwidth utilization, in this paper, we investigate the optimal joint pushing and caching policy and reveal the fundamental impact of storage resource on bandwidth utilization. Specifically, we consider a cache-enabled content-centric wireless network consisting of a single server connected to multiple users via a shared and errorless link. Each user is equipped with a cache of limited size and generates inelastic file requests. We model the demand process of each user as a Markov chain, which captures both the asynchronous feature and temporal correlation of file requests. By the majorization theory [18], we choose a nondecreasing and strictly convex function of the traffic load as the per-stage cost and consider the time averaged transmission cost minimization. In particular, we formulate the joint pushing and caching optimization problem as an infinite horizon average cost MDP. Note that there generally exist only numerical solutions for MDPs, which suffer from the curse of dimensionality and cannot offer many design insights. Hence, it is a great challenge to design an efficient joint pushing and caching policy with acceptable complexity and offering design insights. In this paper, our main contributions are summarized as below.

First, we analyze structural properties of the optimal joint pushing and caching policy. In particular, by deriving an equivalent Bellman equation, we show that the optimal pushing policy balances the current transmission cost with the future average transmission cost, while the optimal caching policy achieves the lowest future average transmission cost given the optimal pushing policy. In addition, based on coupling and interchange arguments, we prove that the optimal average transmission cost decreases with the cache size, revealing the tradeoff between the cache size and the bandwidth utilization. Moreover, via relative value iteration, we analyze the partial monotonicity of the value function, based on which the sizes of both the state space and the caching action space are reduced, and thereby the complexity of computing the optimal joint design is reduced.

Then, considering that obtaining the optimal policy requires computational complexity exponential in the number of users and combinatorial in the number of files as well as the cache size, and implementing it requires a centralized controller and global system information, we develop a low-complexity (polynomial in the number of users, the number of files and the cache size) decentralized joint pushing and caching policy by using a linear approximation of the value function [19, 20] and optimization relaxation techniques.

Next, noting that our proposed low-complexity decentralized policy requires statistical information of user demand processes, we propose an online decentralized algorithm (ODA) to implement the low-complexity decentralized policy using the technique of Q-learning [21], when prior knowledge of user demand processes is not available.

Finally, by numerical results, we compare the performance of our proposed solutions with some existing designs at different system parameters, including the user number, file number, cache size and some key factors of user demand processes.
The key notations used in this paper are listed in Table I.
II System Model
Table I: Key notation
set of all files, set of all users, user index, file index
file number, user number, cache size
system demand process, system cache state
state of a user, system state
transition matrix of the demand process of a user
reactive transmission action, pushing action, caching action
system action space under X, joint pushing and caching policy
optimal average cost, value function of system state X
II-A Network Architecture
As in [22], we consider a cache-enabled content-centric wireless network with a single server connected through a shared error-free link to $K$ users, denoted as $\mathcal{K} \triangleq \{1, 2, \dots, K\}$, as shown in Fig. 1. (Note that the server can be a BS and each user can be a mobile device or an SBS.)
The server has access to a database of $F$ files, denoted as $\mathcal{F} \triangleq \{1, 2, \dots, F\}$. All the files are of the same size. Each user is equipped with a cache of size $C$ (in files). The system operates over an infinite time horizon and time is slotted, indexed by $t = 0, 1, 2, \dots$. At the beginning of each time slot, each user submits at most one file request, which is assumed to be delay intolerant and must be served before the end of the slot, either by its own cache if the requested file has been stored locally, or by the server via the shared link. At each slot, the server can not only reactively transmit a file requested by some users at the slot but also push (i.e., proactively transmit) a file which has not been requested by any user at the slot. Each transmitted file can be received by all the users concurrently before the end of the time slot. (We assume that the duration of each time slot is long enough to average the small-scale channel fading process, and hence the ergodic capacity can be achieved using channel coding.) After being received, a file can be stored into some user caches.
II-B System State
II-B1 Demand State
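At the beginning of time slot $t$, each user generates at most one file request. Let $A_k(t) \in \bar{\mathcal{F}} \triangleq \mathcal{F} \cup \{0\}$ denote the demand state of user $k$ at the beginning of time slot $t$, where $A_k(t) = 0$ indicates that user $k$ requests nothing, and $A_k(t) = f \in \mathcal{F}$ indicates that user $k$ requests file $f$. Here, $\bar{\mathcal{F}}$ denotes the demand state space of each user, which is of cardinality $F + 1$. Let $\mathbf{A}(t) \triangleq (A_k(t))_{k \in \mathcal{K}} \in \bar{\mathcal{F}}^K$ denote the system demand state (of the $K$ users), where $\bar{\mathcal{F}}^K$ represents the system demand state space. Note that the cardinality of $\bar{\mathcal{F}}^K$ is $(F + 1)^K$, which increases exponentially with $K$.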
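For user $k$, we assume that $A_k(t)$ evolves according to a first-order $(F+1)$-state Markov chain, denoted as $\{A_k(t) : t = 0, 1, 2, \dots\}$, which captures temporal correlation of order one of user $k$'s demand process and is a widely adopted traffic model [16]. Let $\Pr[A_k(t+1) = j \mid A_k(t) = i]$ denote the transition probability of going to state $j \in \bar{\mathcal{F}}$ at time slot $t+1$ given that the demand state at time slot $t$ is $i \in \bar{\mathcal{F}}$ for user $k$'s demand process. Assume that $\{A_k(t)\}$ is time-homogeneous and denote $q^{(k)}_{i,j} \triangleq \Pr[A_k(t+1) = j \mid A_k(t) = i]$. Furthermore, we restrict our attention to an irreducible Markov chain. Denote with $\mathbf{Q}_k \triangleq (q^{(k)}_{i,j})_{i, j \in \bar{\mathcal{F}}}$ the transition probability matrix of $\{A_k(t)\}$. We assume that the $K$ time-homogeneous Markov chains, i.e., $\{A_k(t)\}$, $k \in \mathcal{K}$, are independent of each other. Thus, we have $\Pr[\mathbf{A}(t+1) = \mathbf{a}' \mid \mathbf{A}(t) = \mathbf{a}] = \prod_{k \in \mathcal{K}} q^{(k)}_{a_k, a'_k}$, where $\mathbf{a} \triangleq (a_k)_{k \in \mathcal{K}} \in \bar{\mathcal{F}}^K$ and $\mathbf{a}' \triangleq (a'_k)_{k \in \mathcal{K}} \in \bar{\mathcal{F}}^K$.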
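The demand model above can be illustrated with a short simulation. The following is a minimal sketch, assuming each user's transition matrix is supplied as a row-stochastic NumPy array indexed by demand states 0 (no request) through F; the function names `simulate_demands` and `joint_transition_prob` are hypothetical and not from the paper.

```python
import numpy as np

def simulate_demands(Q_list, T, rng=None):
    """Sample T >= 1 slots of demand states for independent users.

    Q_list[k] is an (F+1) x (F+1) row-stochastic matrix for user k;
    state 0 means "no request", state f >= 1 means "file f requested".
    Returns an integer array A with A[t, k] = demand state of user k at slot t.
    """
    rng = np.random.default_rng(rng)
    K = len(Q_list)
    n_states = Q_list[0].shape[0]
    A = np.zeros((T, K), dtype=int)
    # Initial states are drawn uniformly (an assumption of this sketch).
    A[0] = rng.integers(0, n_states, size=K)
    for t in range(1, T):
        for k, Q in enumerate(Q_list):
            # Each user's chain evolves independently of the others.
            A[t, k] = rng.choice(n_states, p=Q[A[t - 1, k]])
    return A

def joint_transition_prob(Q_list, a, a_next):
    """Probability of moving from system demand state a to a_next,
    i.e. the product of the per-user transition probabilities."""
    return float(np.prod([Q[i, j] for Q, i, j in zip(Q_list, a, a_next)]))
```

Because the chains are independent, the system-level transition probability factorizes across users, which is what `joint_transition_prob` computes.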
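II-B2 Cache State
Let $S_{k,f}(t) \in \{0, 1\}$ denote the cache state of file $f$ in the storage of user $k$ at time slot $t$, where $S_{k,f}(t) = 1$ means that file $f$ is cached in user $k$'s storage and $S_{k,f}(t) = 0$ otherwise. Under the cache size constraint, we have
$$\sum_{f \in \mathcal{F}} S_{k,f}(t) \le C, \quad k \in \mathcal{K}. \tag{1}$$
Let $\mathbf{S}_k(t) \triangleq (S_{k,f}(t))_{f \in \mathcal{F}} \in \mathcal{S}$ denote the cache state of user $k$ at time slot $t$, where $\mathcal{S} \triangleq \{(S_f)_{f \in \mathcal{F}} \in \{0, 1\}^F : \sum_{f \in \mathcal{F}} S_f \le C\}$ represents the cache state space of each user. Here, the user index is suppressed considering that the cache state space is the same across all the users. Let $\mathbf{S}(t) \triangleq (S_{k,f}(t))_{k \in \mathcal{K}, f \in \mathcal{F}} \in \mathcal{S}^K$ denote the system cache state at time slot $t$, where $\mathcal{S}^K$ represents the system cache state space. The cardinality of $\mathcal{S}^K$ is $N_S^K$, where $N_S \triangleq \sum_{i=0}^{C} \binom{F}{i}$, which increases with the number of users exponentially.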
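II-B3 System State
At time slot $t$, denote with $X_k(t) \triangleq (A_k(t), \mathbf{S}_k(t)) \in \bar{\mathcal{F}} \times \mathcal{S}$ the state of user $k$, consisting of its demand state and its cache state, where $\bar{\mathcal{F}} \times \mathcal{S}$ represents the state space of user $k$. The system state consists of the system demand state and the system cache state, denoted as $\mathbf{X}(t) \triangleq (\mathbf{A}(t), \mathbf{S}(t)) \in \bar{\mathcal{F}}^K \times \mathcal{S}^K$, where $\bar{\mathcal{F}}^K \times \mathcal{S}^K$ represents the system state space. Note that $\mathbf{X}(t) = (X_k(t))_{k \in \mathcal{K}}$.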
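To give a feel for how quickly the system state space grows, the following sketch counts states under the assumptions that each user's demand state takes one of F + 1 values and that a per-user cache state is any binary vector over the F files with at most C ones (the cache size constraint (1)). The function names are hypothetical.

```python
from itertools import product
from math import comb

def per_user_cache_states(F, C):
    """Number of binary vectors of length F with at most C ones."""
    return sum(comb(F, i) for i in range(C + 1))

def enumerate_cache_states(F, C):
    """Brute-force enumeration of the same set, for small F only."""
    return [s for s in product((0, 1), repeat=F) if sum(s) <= C]

def system_state_count(F, K, C):
    """Size of the system state space: demand states times cache states."""
    demand_states = (F + 1) ** K
    cache_states = per_user_cache_states(F, C) ** K
    return demand_states * cache_states
```

For instance, with 3 files, 2 users and unit cache size, `system_state_count(3, 2, 1)` evaluates to 16 × 16 = 256, and the count grows exponentially as users are added.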
II-C System Action
II-C1 Pushing Action
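A file transmission can be reactive or proactive at each time slot. Denote with $R_f(t) \in \{0, 1\}$ the reactive transmission action for file $f$ at time slot $t$, where $R_f(t) = 1$ when there exists at least one user who requests file $f$ but cannot find it in its local cache and $R_f(t) = 0$ otherwise. Thus, we have
$$R_f(t) = \max_{k \in \mathcal{K}} \mathbf{1}(A_k(t) = f)\,(1 - S_{k,f}(t)), \quad f \in \mathcal{F}, \tag{2}$$
which is determined directly by $\mathbf{X}(t)$. (Note that we do not need to design the reactive transmission action.) Denote with $\mathbf{R}(t) \triangleq (R_f(t))_{f \in \mathcal{F}}$ the system reactive transmission action at time slot $t$. Also, denote with $P_f(t) \in \{0, 1\}$ the pushing action for file $f$ at time slot $t$, where $P_f(t) = 1$ denotes that file $f$ is pushed (i.e., transmitted proactively) and $P_f(t) = 0$ otherwise. Considering that file $f$ is transmitted at most once at time slot $t$, we have
$$P_f(t) + R_f(t) \le 1, \quad f \in \mathcal{F}, \tag{3}$$
where $R_f(t)$ is given by (2). Furthermore, if file $f$ has already been cached in each user's storage, there is no need to push it. Hence, we have
$$P_f(t) \le 1 - \min_{k \in \mathcal{K}} S_{k,f}(t), \quad f \in \mathcal{F}. \tag{4}$$
Denote with $\mathbf{P}(t) \triangleq (P_f(t))_{f \in \mathcal{F}} \in \mathbf{U}_P(\mathbf{X}(t))$ the system pushing action at time slot $t$, where $\mathbf{U}_P(\mathbf{X}) \triangleq \{(P_f)_{f \in \mathcal{F}} \in \{0, 1\}^F : \text{(3), (4)}\}$ represents the system pushing action space under X.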
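The reactive transmission rule and the two pushing constraints described above can be sketched in code. This is an illustrative sketch only: demand states are encoded as integers (0 for no request, f for file f), cache states as 0/1 lists indexed by file, and the function names are hypothetical.

```python
from itertools import product

def reactive_action(A, S):
    """Reactive transmission action: file f is sent reactively when some
    user requests it and does not hold it in its own cache.

    A[k] in {0, ..., F} is user k's demand (0 = no request);
    S[k][f - 1] in {0, 1} is the cache state of file f at user k.
    """
    F = len(S[0])
    R = [0] * F
    for k, a in enumerate(A):
        if a != 0 and S[k][a - 1] == 0:
            R[a - 1] = 1
    return R

def feasible_pushing_actions(A, S):
    """All pushing actions satisfying the constraints in the text:
    a file is sent at most once per slot, and a file cached at every
    user is never pushed."""
    R = reactive_action(A, S)
    F = len(R)
    pushable = [R[f] == 0 and min(S_k[f] for S_k in S) == 0 for f in range(F)]
    options = [(0, 1) if ok else (0,) for ok in pushable]
    return [list(P) for P in product(*options)]
```

With two users, demands `[1, 0]` and identical caches `[0, 1, 0]`, file 1 is transmitted reactively, file 2 is cached everywhere, and only file 3 remains a candidate for pushing.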
System pushing action together with reactive transmission action incurs a certain transmission cost. We assume that the transmission cost is an increasing and continuously convex function of the corresponding traffic load, i.e., , denoted by . In accordance with practice, we further assume that . For example, we can choose with or with . (Note that by choosing , can represent the energy consumption at time slot .) Here, we note that the per-stage transmission cost is bounded within set . By the technique of majorization [18], a small time-averaged transmission cost with such a per-stage cost function corresponds to a small peak-to-average ratio of the bandwidth requirement, i.e., a high bandwidth utilization, which is of fundamental importance to a mobile telecom carrier, as illustrated in Fig. .
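The link between a convex per-stage cost and traffic smoothing can be checked numerically. The sketch below assumes a quadratic cost purely for illustration (it is one increasing convex function that is zero at zero load, not necessarily the one used in the paper) and compares two load sequences carrying the same total traffic.

```python
def average_cost(loads, phi=lambda x: x ** 2):
    """Time-averaged cost of a sequence of per-slot traffic loads
    under a convex per-stage cost phi (quadratic by assumption)."""
    return sum(phi(x) for x in loads) / len(loads)

# Same total traffic (4 file transmissions over 4 slots), different shapes.
peaky_loads = [0, 2, 0, 2]
smooth_loads = [1, 1, 1, 1]

peaky_cost = average_cost(peaky_loads)    # (0 + 4 + 0 + 4) / 4
smooth_cost = average_cost(smooth_loads)  # (1 + 1 + 1 + 1) / 4
```

The smoother sequence has the lower average cost, which is why minimizing a convex cost of the load favors moving transmissions from busy slots into idle ones.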
II-C2 Caching Action
After the transmitted files have been received by all the users, the system cache state can be updated. Let $\Delta S_{k,f}(t) \in \{-1, 0, 1\}$ denote the caching action for file $f$ at user $k$ at the end of time slot $t$, where $\Delta S_{k,f}(t) = 1$ means that file $f$ is stored into the cache of user $k$, $\Delta S_{k,f}(t) = 0$ implies that the cache state of file $f$ at user $k$ does not change, and $\Delta S_{k,f}(t) = -1$ indicates that file $f$ is removed from the cache of user $k$. Accordingly, the caching action satisfies the following cache update constraint:
$$-S_{k,f}(t) \le \Delta S_{k,f}(t) \le R_f(t) + P_f(t), \quad f \in \mathcal{F},\ k \in \mathcal{K}, \tag{5}$$
where $R_f(t)$ is given by (2). In (5), the first inequality is to guarantee that file $f$ can be removed from the cache of user $k$ only when it has been stored at user $k$, and the second inequality is to guarantee that file $f$ can be stored into the cache of user $k$ only when it has been transmitted from the server. The cache state evolves according to:
$$S_{k,f}(t+1) = S_{k,f}(t) + \Delta S_{k,f}(t), \quad f \in \mathcal{F},\ k \in \mathcal{K}. \tag{6}$$
Since belongs to and also satisfies (1), we have the following two cache update constraints:
(7) 
(8) 
From (5), (7) and (8), we denote with the caching action of user at the end of time slot , where represents the caching action space of user under its state , system reactive transmission action R and pushing action . Let denote the system caching action at the end of time slot , where represents the system caching action space under system state X and pushing action .
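The cache update rules of this subsection can be summarized as a feasibility check plus a state update for a single user. The sketch below encodes the constraints as described in the prose (removal only of cached files, insertion only of transmitted files, binary next state, cache size limit); the list-based encoding and function names are assumptions made for illustration.

```python
def is_feasible_caching(S_k, dS_k, R, P, C):
    """Check one user's caching action dS_k against the cache update rules.

    S_k[f] in {0, 1}: current cache state; dS_k[f] in {-1, 0, 1};
    R[f], P[f] in {0, 1}: reactive and pushing actions; C: cache size.
    """
    for s, d, r, p in zip(S_k, dS_k, R, P):
        if d not in (-1, 0, 1):
            return False
        # Remove only what is cached; store only what was transmitted.
        if not (-s <= d <= r + p):
            return False
        # The next cache state of each file must remain binary.
        if s + d not in (0, 1):
            return False
    # The next cache state must respect the cache size.
    return sum(s + d for s, d in zip(S_k, dS_k)) <= C

def update_cache(S_k, dS_k):
    """Cache state evolution: next state = current state + caching action."""
    return [s + d for s, d in zip(S_k, dS_k)]
```

For example, with a unit cache holding file 1 while file 2 is sent reactively and file 3 is pushed, the user may swap file 1 for file 2 or for file 3, but may not keep file 1 and add another file.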
II-C3 System Action
At time slot , the system action consists of both the pushing action and caching action, denoted as , where represents the system action space under system state X.
III Problem Formulation
Given an observed system state X, the joint pushing and caching action, denoted as , is determined according to a policy defined as below.
Definition 1 (Stationary Joint Pushing and Caching Policy).
A stationary joint pushing and caching policy is a mapping from system state X to system action , i.e., . Specifically, we have and .
From the properties of and , we see that the induced system state process under policy is a controlled Markov chain. The time averaged transmission cost under policy is given by
(9) 
where is given by (2) and the expectation is taken w.r.t. the measure induced by the Markov chains. Note that can reflect the bandwidth utilization, as illustrated in Fig. .
In this paper, we aim to obtain an optimal joint pushing and caching policy to minimize the time averaged transmission cost defined in (9), i.e., maximizing the bandwidth utilization. Before formally introducing the problem, we first illustrate a simple example that highlights how the joint pushing and caching policy affects the average cost, i.e., bandwidth utilization.
Motivating Example. Consider a scenario with , , and . The user demand model is illustrated in Fig. 3 (a). A sample path of the user demand processes is shown in Fig. 3 (b). Note that at time slot , there is no file request, while at time slot , the number of file requests achieves the maximum value, i.e., . Fig. 3 (c)(h) illustrate the system cache states and the multicast transmission actions over three time slots under the following three policies: the most popular (MP) caching policy in which the most popular files (i.e., the first files with the maximum limiting probabilities) are cached at each user [4], the LRU caching policy and a joint pushing and caching (JPC) policy. We can calculate the average cost over the three time slots under the aforementioned three policies, i.e., , and . Note that . From Fig. 3 (h), we learn that under the joint pushing and caching policy, the bandwidth at low traffic time (e.g., time slot ) can be exploited to proactively transmit contents for satisfying future user demands (e.g., at time slot ), thereby improving the bandwidth utilization.
Problem 1 (Joint Pushing and Caching Optimization).
where denotes the minimum time averaged transmission cost under the optimal policy , i.e., .
Problem 1 is an infinite horizon average cost MDP. According to Definition and Proposition in [21], we know that there exists an optimal policy that is unichain. Hence, in this paper, we restrict our attention to stationary unichain policies. Moreover, the MDP has finite state and action spaces as well as a bounded per-stage cost. Thus, there always exists a deterministic stationary unichain policy that is optimal and it is sufficient to focus on the deterministic stationary unichain policy space. In the following, we use to refer to a deterministic stationary unichain policy.
IV Optimal Policy
IV-A Optimality Equation
We can obtain the optimal joint pushing and caching policy through solving the following Bellman equation.
Lemma 1 (Bellman equation).
There exist a scalar and a value function satisfying
(10) 
where is given by (2) and . is the optimal value of Problem 1 for all initial system states , and the optimal policy can be obtained from
(11) 
Proof.
Please see Appendix A. ∎
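A Bellman equation of this average-cost form is commonly solved numerically by relative value iteration. The following is a generic sketch for a small finite MDP, not the paper's algorithm: it assumes every action is available in every state (whereas the action space in the paper depends on the state), takes the transition matrices and costs as dense arrays, and uses hypothetical names throughout.

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-9, max_iter=10_000, ref=0):
    """Relative value iteration for a finite average-cost MDP.

    P has shape (U, S, S): P[u] is the transition matrix under action u.
    c has shape (S, U): c[s, u] is the per-stage cost.
    Returns (theta, V, policy), where theta approximates the optimal
    average cost and V the relative value function with V[ref] = 0.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    theta = 0.0
    for _ in range(max_iter):
        # Q[s, u] = current cost + expected relative value of the next state.
        Q = c + np.stack([P[u] @ V for u in range(n_actions)], axis=1)
        TV = Q.min(axis=1)
        theta = TV[ref]
        V_new = TV - theta  # normalize so the reference state has value 0
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = c + np.stack([P[u] @ V for u in range(n_actions)], axis=1)
    return theta, V, Q.argmin(axis=1)
```

The two terms inside `Q` mirror the balance discussed in the text: the immediate cost of an action and the future cost carried by the value function. The number of states in the paper's MDP makes this direct approach impractical beyond toy sizes, which motivates the structural analysis that follows.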
From (10), we see that the optimal policy achieves a balance between the current transmission cost (i.e., the first term in the objective function of (10)) and the future average transmission cost (i.e., the second term in the objective function of (10)). Moreover, how the optimal policy achieves the balance is illustrated in the following corollary.
Corollary 1.
The optimal pushing policy is given by
(12) 
where is a nonincreasing function of . Furthermore, the optimal caching policy is given by
(13) 
where is obtained from (12).
Proof.
Remark 1 (Balance between Current Transmission Cost and Future Average Transmission Cost).
Note that the current transmission cost increases with and the future average transmission cost decreases with . Thus, the optimal pushing policy in (12) achieves the perfect balance between the current transmission cost and the future average transmission cost for all X. In addition, from (13), we learn that the optimal caching policy achieves the lowest future average transmission cost under the optimal pushing policy .
From Lemma 1 and Corollary 1, we note that the optimal policy depends on system state X via the value function. Obtaining the value function involves solving the equivalent Bellman equation in (10) for all X, and there generally exist only numerical results which cannot offer many design insights [21]. In addition, obtaining numerical solutions using value iteration or policy iteration is usually infeasible for practical implementation, due to the curse of dimensionality [21]. Therefore, it is desirable to study optimality properties of the optimal policy and exploit these properties to design low-complexity policies with promising performance.
IV-B Optimality Properties
First, we analyze the impact of cache size on the optimal average transmission cost . For ease of exposition, we rewrite as a function of cache size , i.e., , and obtain the following lemma based on coupling and interchange arguments [15].
Lemma 2 (Impact of Cache Size).
decreases with when and when .
Proof.
Please see Appendix B. ∎
Remark 2 (Tradeoff between Cache Size and Bandwidth Utilization).
As illustrated in Fig. , a lower average transmission cost always corresponds to a higher bandwidth utilization. Hence, Lemma 2 reveals the tradeoff between the cache size and the bandwidth utilization.
In the following, we focus on the case of . By analyzing the partial monotonicity of value function , we obtain the next lemma.
Lemma 3 (Transient System States).
Any with is transient under , where .
Proof.
Please see Appendix C. ∎
Remark 3 (Reduction of System State Space and Caching Action Space).
Lemma 3 reveals that the optimal policy makes full use of available storage resources. Moreover, since the expected sum cost over the infinite horizon incurred by a transient state is finite and negligible in terms of average cost, we restrict our attention to the reduced system state space without loss of optimality. Also, the cache update constraint in (7) is replaced with , and thus the caching action space can be further reduced.
Remark 4 (Computational Complexity and Implementation Requirement).
To obtain the optimal policy from (10) under the reduced system state space given in Lemma 3, we need to compute , , by solving a system of equations in (10), the number of which increases exponentially with the number of users and combinatorially with the number of files as well as the cache size. Moreover, given , computing for all X involves brute-force search over the action space , which requires complexity of . In practice, the numbers of users and files as well as the cache size are relatively large, and hence the complexity of computing the optimal policy is not acceptable. Besides, the implementation of the optimal policy requires a centralized controller and system state information, resulting in large signaling overhead.
V Low-Complexity Decentralized Policy
To reduce the computational complexity and achieve decentralized implementation without much signaling overhead, we first approximate the value function in (10) by the sum of per-user per-file value functions. Based on the approximate value function, we obtain a low-complexity decentralized policy for practical implementation.
V-A Value Approximation
To alleviate the curse of dimensionality in computing , for all , motivated by [19, 20], we approximate in (10) as follows:
(14) 
where and for all , , satisfy:
(15) 
Here, , and . ($\mathbf{1}(\cdot)$ represents the indicator function throughout this paper.) The equation in (15) corresponds to the Bellman equation of a per-user per-file MDP for user with unit cache size. and denote the average cost and value function of the per-user per-file MDP for user , respectively. Specifically, at time slot , denotes the system state, where denotes the demand state and denotes the cached file; denotes the caching action; the demand state evolves according to the Markov chain and the cache state evolves according to ; denotes the per-stage cost. The per-user per-file MDPs are obtained from the original MDP by eliminating the couplings among the users and the cache units of each user, which are due to the multicast transmission and the cache size constraint, respectively.
In the following, we characterize the performance of the value approximation in (14) from the perspectives of the average transmission cost and the complexity reduction, respectively. First, by analyzing the relaxation from the original MDP to the per-user per-file MDPs, we have the following relationship between the average cost of the original MDP and the sum of the average costs of the per-user per-file MDPs.
Lemma 4.
and , satisfy that .
Proof.
Please see Appendix D. ∎
In addition, note that obtaining requires solving a system of equations given in (10), while obtaining , , only requires solving a system of equations given in (15). Therefore, under the value function approximation in (14), the nonpolynomial computational complexity is eliminated.
Remark 5.
The linear value function approximation adopted in (14) differs from most existing approximation methods. Firstly, different from the traditional linear approximation in [23], our approach is not based on specific basis functions. Secondly, compared with the randomized approach proposed in [19, 20], our approach leads to a lower bound of the optimal average cost as illustrated in Lemma 4.
V-B Low-Complexity Decentralized Policy
By replacing in (10) with in (14), the minimization problem in (10) which determines the optimal policy is approximated by:
Problem 2 (Approximate Joint Pushing and Caching Optimization).
For all ,
where , and . Let denote the corresponding optimal solution.
Note that due to the coupling among users incurred by the multicast transmission, solving Problem 2 still calls for complexity of and centralized implementation with system state information, which motivates us to develop a low-complexity decentralized policy. Specifically, given system state X, we first ignore the multicast opportunities in pushing and separately optimize the per-user pushing action of each user under given state and reactive transmission R. Then, the server gathers the information of the per-user pushing actions of all the users and multicasts the corresponding files. Next, each user optimizes its caching action given the files obtained from the multicast transmissions. The details are mathematically illustrated as follows.
First, for all , by replacing with and adding constraints , we obtain a problem equivalent to Problem 2. The constraint in (3) is rewritten as
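The gathering step described above, in which the server collects per-user pushing actions and multicasts the corresponding files, can be sketched as a union over users. This is an illustrative reading of the prose rather than the paper's exact expressions; the masking by the reactive action reflects the rule that a file is transmitted at most once per slot, and the function name is hypothetical.

```python
def aggregate_pushing(per_user_push, R):
    """Combine per-user pushing actions into one multicast pushing action.

    per_user_push[k][f] in {0, 1}: user k asks for file f to be pushed.
    R[f] in {0, 1}: file f is already being transmitted reactively.
    A file is pushed if at least one user asks for it and it is not
    already part of the reactive transmission.
    """
    F = len(R)
    return [
        0 if R[f] == 1 else max(p[f] for p in per_user_push)
        for f in range(F)
    ]
```

Because every transmission is received by all users, a file requested for pushing by several users is still sent only once, which is where the multicast gain re-enters the decentralized design.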
(16) 
which is to guarantee that each file is transmitted at most once to user at each time slot . The constraints in (4) and (5) can be replaced by
(17) 
(18) 
By omitting the constraints , , we attain a relaxed optimization problem of Problem 2. Given R, by (16), (17) and (18), the relaxed problem can be decomposed into separate subproblems, one for each user, as shown in Problem 3.
Problem 3 (Pushing Optimization for User ).
For all state and R,
where . Let denote the optimal solution.
Then, we obtain as follows. Denote with the optimal pushing action for user when the number of pushed files for user is . From the definition of , we learn that user always pushes the first files with the minimum values of , . Hence, we obtain as follows. Given and R in (2), sort the elements in in ascending order, let denote the index of the file with the th minimum in , and we have
(19) 
Based on (19), we can easily obtain , as summarized below.
Optimal Solution to Problem 3: For all state and R, , where is given by (19) and is given by
(20) 
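The sort-and-select structure of this solution can be sketched as follows. The per-file scores, the quadratic cost, and the objective (transmission cost plus the sum of the scores of the pushed files) are all assumptions made for illustration, since the exact expressions in (19) and (20) are not reproduced here; only the step of pushing the files with the smallest scores is taken from the text.

```python
def best_push_for_user(scores, reactive_load, phi=lambda x: x ** 2):
    """Pick the files one user would like pushed.

    scores: dict mapping candidate file -> hypothetical score, where a
        lower score means pushing the file is more beneficial.
    reactive_load: number of files already transmitted reactively.
    phi: assumed convex per-stage cost of the total traffic load.

    For each possible number n of pushed files, the candidate action
    pushes the n files with the smallest scores; the n with the lowest
    assumed objective is returned as a list of files.
    """
    order = sorted(scores, key=scores.get)  # ascending scores
    best_n, best_val = 0, phi(reactive_load)
    running = 0.0
    for n, f in enumerate(order, start=1):
        running += scores[f]
        val = phi(reactive_load + n) + running
        if val < best_val:
            best_n, best_val = n, val
    return order[:best_n]
```

Sorting dominates the work, so the per-user step costs on the order of F log F operations instead of a search over all subsets of files.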
Next, based on , , we propose a low-complexity decentralized policy, denoted as , which reconsiders the multicast opportunities in pushing. Specifically, for all , we have and , where
(21) 
(22) 
Finally, we characterize the performance of . Lemma 5 illustrates the relationship among the optimal values of Problem 2 and Problem 3 as well as the objective value of Problem 2 at .
Lemma 5.
For all , , where the equality holds if and only if