Demand Shaping in Cellular Networks
Abstract
Demand shaping is a promising way to mitigate the wireless cellular capacity shortfall in the presence of ever-increasing wireless data demand. In this paper, we formulate demand shaping as an optimization problem that minimizes the variation in aggregate traffic. We design a distributed and randomized offline demand shaping algorithm under complete traffic information and prove its almost sure convergence. We further consider a more realistic setting where the traffic information is incomplete but the future traffic can be predicted to a certain degree of accuracy. We design an online demand shaping algorithm that updates the schedules of deferrable applications (DAs) each time new information becomes available, by solving at each timeslot an optimization problem over a shrinking horizon from the current time to the end of the day. We compare the performance of the online algorithm against the optimal offline algorithm, and provide numerical examples to complement the theoretical analysis.
Demand shaping, offline algorithm, online algorithm, steepest descent algorithm, supermartingale, deferrable applications, cellular networks.
I Introduction
We have witnessed in recent years a rapid increase in demand for wireless data, driven by the proliferation of smart mobile devices. Global mobile traffic in 2016 reached nearly 84 exabytes, more than 80 times the entire global Internet traffic in 2000; yet this number is expected to keep increasing over the coming five years at a compound annual growth rate (CAGR) that amounts to a sevenfold growth from 2016 to 2021 [20]. However, despite frequent upgrades of cellular network technology from 2G to 4G LTE and beyond, wireless service providers fall short of keeping up with this increasing wireless data demand, leading to congestion in the network, especially in densely populated areas. As a result, users’ data rates have to be throttled to ease congestion [6, 9, 2], at the cost of degraded quality of service (QoS).
Admittedly, the capacity shortfall of cellular networks can be mitigated by allocating more wireless spectrum and deploying more wireless infrastructure, including more and smaller cells and WiFi offloading. However, spectrum allocation and infrastructure upgrades are not only costly but also time-consuming, while WiFi networks may not always be available and secure. A promising alternative, inspired by the similar problem of demand response in power networks, is to improve spectrum and infrastructure efficiency by managing wireless data traffic (i.e., demand). Notice that wireless traffic or demand usually fluctuates with a large peak-to-valley ratio throughout a day; see Fig. 1 for a trace of smartphone web browsing activity over a day. However, wireless capacity needs to be provisioned to meet the peak demand rather than the average. This means that the cellular network is usually stressed in peak hours while largely underutilized at other times. If the demand profile can be shaped to reduce the peak and smooth the time variation, not only can more traffic be accommodated under limited existing capacity constraints, but additional spectrum allocation and infrastructure upgrades can also be slowed down, which together greatly improve wireless network efficiency and QoS, and yield huge savings for service providers.
In this paper, we focus on designing demand shaping algorithms for cellular networks. We divide wireless traffic into two categories: non-deferrable traffic and deferrable traffic. Non-deferrable traffic refers to the traffic of applications such as online gaming that have no or low delay tolerance, and constitutes the base traffic whose profile cannot be shaped. Deferrable traffic refers to the traffic of applications such as file uploading/downloading that are flexible in time and only need to be served by a designated deadline, e.g., finishing photo backup on a cellphone by 12 am. Deferrable applications (DAs) are further divided into two major types: (1) continuous-rate interruptible applications such as photo backup and application updates that allow any data rate, e.g., the delayed offloading in [27, 30], and (2) discrete-rate non-interruptible applications such as online movie streaming and video conferencing that usually require a certain constant data rate [4, 3] and should not be interrupted once started, e.g., users with time flexibility can schedule movie watching or a video conference at “valley” time to enjoy better graphic quality and incur less data cost. See Table I for a summary of traffic types and examples. We seek to schedule the deferrable traffic to flatten the aggregate traffic profile over a day.
Specifically, we formulate cellular traffic demand shaping as an optimization problem that minimizes the (time) variation in the aggregate traffic profile subject to the time and rate specification of each DA. We first assume complete traffic information and design an offline demand shaping algorithm. There are two challenging issues in the offline algorithm design. First, the optimization problem is nonconvex because of discrete-rate non-interruptible applications. We instead solve its convex relaxation and design a randomized scheme based on the solution to the relaxed problem. Second, demand shaping potentially involves a huge number of applications and users, so a centralized algorithm is not scalable. We instead design an iterative and distributed algorithm based on the descent method. We establish the almost sure convergence of the algorithm based on supermartingale theory.
We then consider a more realistic setting with incomplete information where we can only predict future traffic to a certain degree of accuracy, and design an online and distributed demand shaping algorithm that updates the schedules of DAs each timeslot when new information and updated prediction are available, based on the offline algorithm for an optimization problem over a shrinking horizon from the current time to the end of the day. We compare the performance of the online algorithm against the optimal offline algorithm, and provide numerical examples to complement the theoretical analysis.
The rest of the paper is organized as follows. Section II briefly reviews some related work and discusses some related issues. Section III describes the system model and problem formulation. Section IV presents an offline distributed algorithm for demand shaping under the assumption of complete traffic information and characterizes its performance. Section V considers a realistic setting of incomplete traffic information, and presents an online algorithm for demand shaping. Section VI provides numerical examples to complement the theoretical analysis, and Section VII concludes the paper.
TABLE I: Traffic types and examples

Traffic/Application Type            | Examples
Non-deferrable application          | Online gaming, web browsing
Discrete-rate non-interruptible DA  | Movie streaming, video conference
Continuous-rate interruptible DA    | Applications update, photos backup
II Related Work and Issues
Demand shaping in cellular networks is similar to demand response in power networks, in terms of design objectives, problem formulation, and the associated algorithmic challenges. Indeed, we borrow insights from demand response in power networks; see, e.g., [29, 12, 14, 15]. In particular, our online demand shaping algorithm is motivated by the solution approach for online control of continuous load in reference [15], and mathematically can be seen as its extension to incorporate the discrete decision variables considered in reference [14]. However, our model captures a realistic cellular traffic setting, as it includes both continuous and discrete decision variables. Moreover, the integration of discrete decision variables into the online algorithm makes the performance analysis of the algorithm more challenging than that in [15]. Related work also includes Zhao et al. [38], who design a centralized online EV charging algorithm to minimize the peak procurement from the grid under uncertain prediction of future demand and renewable energy supply, and Parise et al. [31], who propose a decentralized charging control for EVs to flatten the aggregate power demand profile. Both consider only continuous decision variables.
To ease the stress from high demand in cellular networks, various demand-shaping-based methodologies as well as traffic offloading strategies have been studied in the existing literature. Tadrous et al. [36] propose a paradigm that proactively serves peak-hour requests during off-peak time based on prediction, smoothing the traffic demand over time without changing customers’ activity patterns. However, such a strategy is limited to routine behaviors only. In [19], Hajiesmaili et al. introduce an online procurement auction framework to incentivize mobile devices to participate in device-to-device load balancing, offloading traffic from a heavily loaded base station to adjacent idle ones. In addition, WiFi and femtocell offloading of cellular data is another major approach to easing the congestion of cellular networks; see [10, 26, 27, 30, 22, 13] for related work.
In this paper we focus on designing demand shaping algorithms based on a general and simplified system model. We do not investigate important practical issues such as the timescale and granularity at which we schedule and reschedule the DAs. We plan to develop a platform to enable automatic demand shaping in the future, and will investigate various practical issues then. Also, demand shaping involves not only the design of control algorithms but also the design of the right mechanisms to incentivize users to move out of their “comfort zone” in wireless applications and data usage. Incentive design for demand shaping is currently an active research area; see, e.g., smart data pricing in wireless networks [18, 35, 37], pricing design in general network services to relieve congestion [32, 23], pricing/reward signals in power distribution systems [40, 28], and the references therein.
Some discussion on the practicality of demand shaping is also in order. People tend to use mobile data services whenever they want, regardless of whether it is peak time or valley time for the cellular network. However, a survey [17] conducted in India and the USA in 2012 shows that, given a proper monetary incentive, many people are willing to postpone their mobile data usage, with acceptable postponement varying from minutes to hours, depending on the type of service and individual preferences [18]. For example, wireless service providers can motivate users to shift their demand by implementing a time-dependent pricing (TDP) strategy. TDP is now applied as a simple two-period plan by many wireless service providers around the world, in voice services and data services; e.g., Verizon [8] and Sprint [5] in the US have “happy hours” at night and on weekends for voice service, TelCom [7] in South Africa has “Night Surfer” plans giving free data from 11pm to 5am, and Airtel [1] in India provides unlimited data at night. More refined TDP strategies can be applied to maximize benefits for both wireless service providers and users, by dynamically adjusting prices according to current data usage and predicted future usage. For instance, Ha et al. [18] have worked on a TDP-based application named TUBE. Trials in cooperation with a local wireless service provider show its effectiveness in shaping the traffic profile [24]. Also refer to [34] for a review of pricing strategies.
III System Model and Problem Formulation
Consider a cellular network that serves users for different applications such as web browsing, file sharing, real-time entertainment, etc. The applications can be broadly divided into two categories: deferrable applications (DAs) and non-deferrable applications (non-DAs). DAs are applications that are flexible in the starting time and/or data rate, while non-DAs are those that should be served immediately and often have stringent data rate requirements. Please refer to the third paragraph of Section I and Table I for a more detailed description and examples of DAs and non-DAs.
This work aims to schedule the traffic of DAs so as to flatten the aggregate traffic profile over a day, subject to the time constraints and rate constraints of each application. We use a discrete-time model where one day is divided equally into timeslots, indexed by . The duration of a timeslot can be, e.g., 30 minutes or 1 hour [18], depending on the time resolution of scheduling decisions.
time index,  
DA index,  
set of continuous DAs  
set of discrete DAs  
set of discrete DAs started earlier  
set of DAs adjustable at time  
base traffic profile,  
data rate profile of DA ,  
upper bounds of DA on the data rate at time  
constant bit rate for DA  
number of timeslots to finish transmission for DA  
virtual deferrable traffic profile  
average traffic profile  
average traffic profile of online ODS  
average traffic profile of online relaxed ODS  
average traffic profile of offline relaxed ODS  
total traffic required from DA ,  
remaining traffic to be served for DA  
change in traffic profile of DA ,  
arrival time of DA  
deadline of DA  
number of feasible profiles of DA  
th feasible profile of DA  
probability corresponding to  
set of all feasible traffic profiles for discrete DAs,  
objective value: (time) variance of d 
III-A Non-Deferrable Applications
Non-DAs include web browsing, online gaming, and real-time chatting with multimedia, etc. The latency tolerated by these applications usually varies from hundreds of milliseconds to seconds. Since these applications should be served immediately upon request, their traffic is inelastic and constitutes the base traffic whose profile cannot be shaped. Denote the base traffic profile by . As we can only predict the base traffic to a certain accuracy, we model it as a random vector with mean and random deviation from the mean, i.e., . We assume that has a mean of and variance of , and may be temporally correlated. We further assume that we can make better predictions for the timeslots that are closer to the current time, modeled by a time-dependent deviation from the mean, i.e., the base traffic at some future time is predicted at current time by
(1) 
where the subscript represents the timeslot when the prediction is made, and has a decreasing variance as approaches . A more concrete prediction model will be introduced in Section VI. The parameters and are specified exogenously, and can be estimated from historical traffic records.
III-B Deferrable Applications
Assume that there are DAs in the network, indexed by . Each DA is characterized by an arrival time when it is requested or after which it can be started, a deadline by which its transmission must be done, and a certain requirement or constraint on data rate . Let denote the total traffic required by DA , i.e., . We can classify DAs into two main categories: continuous-rate interruptible DAs (or continuous DAs for simplicity) that allow any data rate between certain upper and lower bounds and can be interrupted and resumed at any time before the deadline, and discrete-rate non-interruptible DAs (or discrete DAs for simplicity) that require a certain (roughly) constant data rate and cannot be interrupted once they are started. For example, system backup is usually interruptible and allows any continuous data rate, while a video conference is usually preferred to be non-interruptible and runs at a constant (thus discrete) data rate once it is started.
Among the total DAs, we assume there are continuous DAs, indexed by . For each continuous DA, denote by and the lower and upper bounds on its data rate at time , i.e.,
(2) 
Naturally, . The lower bounds are usually zero, and the upper bounds can be set according to, e.g., the available bandwidth. The arrival time and the deadline can be integrated into the rate constraints (2) by setting for and , i.e., no traffic is transmitted before arrival time or after deadline.
Index the remaining discrete DAs by . For a discrete DA such as a streaming application, a constant bit rate corresponds to a certain graphic quality, e.g., for an SD-quality movie on Netflix [4], and for an HD video call on Skype [3]. As the graphic quality usually (preferably) does not change during those applications, this seemingly oversimplified assumption of a single discrete rate is reasonable.
For each DA with its total traffic and the rate , it takes consecutive timeslots (or equivalently the other way around, i.e., we calculate based on and ). Therefore, the number of its feasible traffic profiles is , wherein the th feasible profile is denoted as
We denote the set of all feasible traffic profiles of DA by , i.e., , .
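As a concrete illustration of the feasible set of a discrete DA, the following minimal sketch enumerates every profile that transmits at a constant rate over a fixed number of consecutive timeslots between arrival and deadline. The function name, the 0-based slot indexing, and the convention that both the arrival slot and the deadline slot can carry traffic are assumptions made here for illustration; they are not taken from the paper.

```python
def feasible_profiles(horizon, arrival, deadline, duration, rate):
    """Enumerate constant-rate, non-interruptible profiles of one discrete DA.

    Assumed conventions (illustrative only): slots are 0-based, and the DA
    may transmit in any slot from `arrival` through `deadline` inclusive.
    """
    profiles = []
    last_start = deadline - duration + 1      # latest start that meets the deadline
    for start in range(arrival, last_start + 1):
        profile = [0.0] * horizon
        for t in range(start, start + duration):
            profile[t] = rate                 # constant rate while running
        profiles.append(profile)
    return profiles


# Hypothetical DA: arrives at slot 1, deadline at slot 4, needs 2 slots at rate 3.
profiles = feasible_profiles(horizon=6, arrival=1, deadline=4, duration=2, rate=3.0)
for p in profiles:
    print(p)
```

Under these conventions the number of feasible profiles equals the number of admissible start slots, so a tighter deadline or a longer transmission shrinks the set the scheduler can choose from.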
Remark 1
All the modeled traffic parameters can be reasonably accessed or estimated in practice. For example, information regarding total required traffic and video streaming rate is available from metadata of traffic to be transmitted, parameters like and are specified by the users in advance (and can then be calculated accordingly), whereas data rate bounds and can be either determined by available bandwidth or designated by the users. See, e.g., [18] for an example system involving similar information requirement and implemented with real users and service provider.
III-C Problem Formulation
We aim to schedule the traffic of DAs, so as to flatten the aggregate traffic profile as much as possible. Denote the “average” traffic profile by . Traffic flattening can be achieved by minimizing the time variance of , formulated as the following optimal demand shaping (ODS) problem:
ODS:
(3a)  
s.t.  (3e)  
Notice that the constraints (3e) for discrete DAs are nonconvex. In the next section, we will investigate an offline algorithm together with a randomized scheme for solving the ODS problem under the assumption of complete information on the base traffic and DAs. Then in Section V, we will study an online algorithm for demand shaping under a more realistic setting of incomplete information where we can only predict the future traffic to a certain degree of accuracy. The offline ODS problem and algorithm will later serve as a benchmark to characterize the performance of the online algorithm.
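To make the flattening objective concrete, the sketch below evaluates the time variance of an average traffic profile for two schedules that deliver the same deferrable volume. It assumes that the “average” profile is the base traffic plus all DA profiles divided by the number of DAs, and that the objective is the population variance across timeslots; both normalizations, as well as the function names and the toy numbers, are assumptions for illustration.

```python
def average_profile(base, da_profiles):
    """Per-slot aggregate traffic divided by the number of DAs (assumed normalization)."""
    n = len(da_profiles)
    return [(base[t] + sum(r[t] for r in da_profiles)) / n for t in range(len(base))]


def time_variance(d):
    """Population variance of a profile across timeslots."""
    mean = sum(d) / len(d)
    return sum((x - mean) ** 2 for x in d) / len(d)


base = [4.0, 1.0, 1.0, 4.0]            # peaky base traffic over four slots
peak_time = [[2.0, 0.0, 0.0, 0.0],     # both DAs transmit during the peaks
             [0.0, 0.0, 0.0, 2.0]]
valley_time = [[0.0, 2.0, 0.0, 0.0],   # same volumes moved into the valley
               [0.0, 0.0, 2.0, 0.0]]

v_peak = time_variance(average_profile(base, peak_time))
v_valley = time_variance(average_profile(base, valley_time))
print(v_peak, v_valley)
```

Because the total traffic is the same in both schedules, the mean of the profile is unchanged and only its spread around the mean differs, which is exactly what the objective penalizes.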
IV Offline Demand Shaping Algorithm
In this section, we assume complete traffic information, i.e., the base traffic and arrival of DAs are accurately known, and study how to solve the resulting offline ODS problem. The offline problem and algorithm will provide insights into the online algorithm design for the realistic setting of incomplete information that will be considered in Section V.
IV-A Convex Relaxation and Randomized Scheme
The offline ODS problem is nonconvex, as each discrete DA has to pick a traffic profile from a discrete set; see constraint (3e). Consider the convex hull of , defined as
(4)  
where are the convex combination coefficients, which will be interpreted as a probability distribution in the randomized algorithm introduced below. We will instead solve the convex relaxation of the ODS problem by replacing (3e) with the following constraint:
(5) 
We call the relaxed problem (3a)–(3d), (5) the RODS problem. However, a solution to the RODS problem might not be feasible for the original ODS problem, i.e., . But since by definition (4) a solution can always be written as the convex combination , we will randomly pick a traffic profile with corresponding probability . That is, we will design a randomized algorithm for the offline ODS problem, based on the solution to the RODS problem. We will integrate it into a distributed algorithm next.
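The randomized scheme can be illustrated with a small sketch: a relaxed profile is a convex combination of feasible profiles, and the combination weights are reused as sampling probabilities so that the chosen profile is always feasible. The helper names and the toy profiles below are hypothetical and only serve this illustration.

```python
import random


def mix(profiles, probs):
    """Convex combination of feasible profiles, i.e. a point in their convex hull."""
    horizon = len(profiles[0])
    return [sum(p * prof[t] for p, prof in zip(probs, profiles)) for t in range(horizon)]


def sample_profile(profiles, probs, rng):
    """Pick one feasible profile, using the combination weights as probabilities."""
    return rng.choices(profiles, weights=probs, k=1)[0]


# Three feasible profiles of a hypothetical discrete DA (rate 3, two consecutive slots).
profiles = [
    [3.0, 3.0, 0.0, 0.0],
    [0.0, 3.0, 3.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
]
probs = [0.5, 0.25, 0.25]

relaxed = mix(profiles, probs)          # solves the relaxation but is not a valid schedule
chosen = sample_profile(profiles, probs, random.Random(0))
print(relaxed, chosen)
```

By construction the expected value of the sampled profile equals the relaxed profile, so the randomization introduces variance around the relaxed solution but no bias.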
IV-B Distributed Algorithm
Solving the RODS problem (and the offline ODS problem) directly in a centralized way requires collecting information on all DAs, which may incur too much communication overhead and is impractical in a real network. Moreover, users may not be willing to reveal information on DAs due to privacy concerns. Therefore, we seek to solve it in a distributed way. Noting that the RODS problem has decoupled constraints, we design an iterative and distributed algorithm based on the descent method [11].
Before deriving the algorithm, we establish the following useful results. At the k-th iteration, let be the traffic profiles of all DAs, the average traffic profile, and the change in traffic profile of DA from iteration to . We have:
(6) 
where the variance , and denotes the average. (Footnote 1: Notice that we consider a randomized scheme only for discrete DAs. That is, for continuous DAs there is no randomness and their variance is zero.) By Jensen’s inequality,
(7) 
Therefore, one has
(8) 
And it follows that
(9)  
Denote by the first term in (9) and the second. For , we choose so as to minimize , i.e., to solve
(10a)  
s.t.  (10b) 
On the other hand, after some mathematical manipulations, we have
where is a constant given . For , we choose so as to minimize , i.e., to solve
(11) 
In essence, what we have done is to maximize the expected incremental decrease in the objective value at each iteration (i.e., steepest descent). This motivates a distributed demand shaping algorithm with the collaboration of a coordinator; see Algorithm 1. The wireless service provider can implement a logical coordinator at the base station.
Algorithm 1 (OffDS). At the k-th iteration:

1) Upon gathering traffic profiles from DAs, the coordinator calculates the average traffic profile and announces it to DAs (or the end users) over a signaling or control channel.

2) Upon receiving the average traffic profile ,

   a) DA updates its traffic profile by
      s.t. and submits it to the coordinator.

   b) DA calculates the average traffic profile by
      which is , and then randomly chooses a traffic profile with probability and submits it to the coordinator.

The OffDS algorithm is a distributed algorithm wherein each DA solves its own simple optimization problem based on its previous decision, the average traffic profile , and local constraints, while the coordinator collects the proposed traffic profiles and updates the average traffic profile. Therefore, this algorithm not only preserves the privacy of the users, but is also scalable and thus capable of quick response, which is crucial especially for the real-time implementation in Section V.
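The message pattern between the coordinator and the DAs can be sketched as follows for continuous DAs only; the randomized choice made by discrete DAs is omitted. The local step used here, a move against the announced average profile followed by a Euclidean projection onto the DA's own rate and volume constraints, is an assumption standing in for the per-DA problem (10), and all function names and numbers are hypothetical.

```python
def project(x, upper, total):
    """Euclidean projection of x onto {r : 0 <= r(t) <= upper(t), sum(r) = total}."""
    lo, hi = min(x) - max(upper), max(x)
    for _ in range(100):                      # bisection on the common shift
        shift = (lo + hi) / 2
        r = [min(max(xi - shift, 0.0), ui) for xi, ui in zip(x, upper)]
        if sum(r) > total:
            lo = shift
        else:
            hi = shift
    return [min(max(xi - hi, 0.0), ui) for xi, ui in zip(x, upper)]


def time_variance(d):
    mean = sum(d) / len(d)
    return sum((v - mean) ** 2 for v in d) / len(d)


def coordinator(base, profiles):
    """Average traffic profile announced to the DAs (assumed: aggregate / number of DAs)."""
    n = len(profiles)
    return [(base[t] + sum(r[t] for r in profiles)) / n for t in range(len(base))]


def local_update(r_prev, d, upper, total):
    """One DA's step: move against the announced profile, then restore feasibility."""
    return project([ri - di for ri, di in zip(r_prev, d)], upper, total)


base = [5.0, 1.0, 1.0, 5.0]
uppers = [[2.0] * 4, [2.0] * 4]               # per-slot rate caps of two continuous DAs
totals = [2.0, 2.0]                            # traffic each DA must deliver
profiles = [[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]]   # feasible starting point

history = []
for k in range(50):
    d = coordinator(base, profiles)            # step 1: gather and announce
    history.append(time_variance(d))
    profiles = [local_update(r, d, u, R)       # step 2: each DA updates locally
                for r, u, R in zip(profiles, uppers, totals)]
print(history[0], history[-1])
```

Each DA only needs the announced average profile and its own constraints, so no DA reveals its deadline, volume, or rate caps to the others, which mirrors the privacy and scalability argument above.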
The computational complexity of the OffDS algorithm is estimated as follows for completeness. Given a certain accuracy requirement on the objective function value, the descent method requires iterations [11]. At each iteration, each DA solves a simple quadratic program with a polynomial complexity of [33]. On the other hand, the coordinator calculates the average traffic profile, which requires complexity per iteration. As a result, the OffDS algorithm has an overall computational complexity of .
Remark 2
For simpler expression, we use as the decision variable for DA in algorithm design and analysis, while in real implementation, it is more convenient to use the probability distribution as the equivalent decision variable. Also notice that, if there is no continuous DA, Algorithm 1 reduces to the stochastic algorithm in [14]. We expect that the solution approach laid out in Sections IV-A and IV-B (a randomized algorithm based on the “steepest” descent method for the convex relaxed problem) will find broad application in designing efficient algorithms for optimization problems that involve both continuous and discrete decision variables.
IV-C Convergence
Before showing the convergence of the OffDS algorithm, we first establish two useful relations. For each DA , since solves the problem (10), we have the first-order optimality condition
(12) 
for any feasible . Set to obtain
(13) 
For each DA , recalling that , by the first-order optimality condition, we have
(14) 
for any feasible . Set to get
(15)  
Now, construct a filtration of the probability space , where the sample space is the feasible set specified by the constraints (3b)–(3e), the σ-algebra , and , i.e., determined by the th iteration of the OffDS algorithm.
Theorem 1
The pair is a supermartingale.
First, notice that is bounded from below. So, . Second, applying relations (13)–(15) to equation (9), we obtain
i.e., . By definition, is a supermartingale [16].
Notice that is a nonnegative supermartingale. By the martingale convergence theorem [16], the following result is immediate.
Corollary 1
exists almost surely, where is some random variable.
Theorem 2
Denote by an “equilibrium” distribution over traffic profiles that converges to. The support of is a singleton.
When converges, . This requires , , and for (8), (13), and (15) to hold with equality. Notice that implies , as different feasible traffic profiles of DA are linearly independent. Thus, . So, the support of contains only one point.
Denote by an “equilibrium” traffic profile of the OffDS algorithm, i.e., if , then . Obviously the set of equilibrium profiles is not empty, as an optimum of the offline ODS problem is an equilibrium. The following result follows immediately from Theorem 2 and Corollary 1.
Theorem 3
The OffDS algorithm converges almost surely to an equilibrium traffic profile.
IV-D Performance Analysis of the Offline Algorithm
We now characterize the performance of the OffDS algorithm with respect to the relaxed problem RODS, which has a better and usually unachievable performance than the ODS problem. Specifically, denoting by the solution of RODS, we bound the gap between the equilibrium of the OffDS algorithm and the solution of the RODS problem: , where and . Denote by the relative gap achieved by the OffDS algorithm.
Theorem 4
The gap is bounded as follows:
(17) 
Moreover, the relative gap diminishes as the number of discrete DAs increases, i.e.,
(18) 
For notational simplicity, let , which is a constant given the total amount of traffic. The objective value can be written as
where only the part contains decision variables. We can thus write the gap as
where the second inequality follows from (16). Note that is a constant for . Then the relative gap can be bounded as
(19)  
whose numerator increases linearly with and denominator increases linearly with the square of . Equation (18) follows.
V Online Demand Shaping Algorithm
In this section, we consider a realistic setting with incomplete information where we can only predict future traffic to a certain degree of accuracy, and study online demand shaping that makes decision based on the prediction of future traffic and updates the decision as new information becomes available.
A typical algorithm used in this setting is receding horizon control; see, e.g., [25]. However, as the objective function (3a) does not have a nice additive structure, the receding horizon control algorithm does not admit an easy analysis. We will instead extend the shrinking horizon control algorithm of [15], which studies mathematically the same problem with only continuous DAs, to include discrete DAs, and apply it to our online demand shaping (Online DS) problem.
V-A Online Algorithm
We assume that the number of DAs arriving at time is randomly distributed with a mean and variance , and the total amount of traffic of each DA is randomly distributed with a mean and variance . Denote by the set of continuous DAs and the set of discrete DAs that have arrived by time , and let and . Notice that we cannot reschedule the remaining traffic of a discrete DA that has already started. Denote by the set of discrete DAs that have not been started by time . For DA , denote by the set of feasible traffic profiles at time . Let be the set of DAs whose profiles are still adjustable at time (i.e., all the continuous DAs and the discrete DAs that have not started by time ).
At time , we make a prediction of base traffic for the remaining timeslots of the day, and we also have the information on DA and the expected total future deferrable traffic . Following [15], we introduce a virtual deferrable traffic profile with and , to emulate the impact of the future deferrable traffic upon the current demand shaping decision. With the above setup, we aim to schedule and reschedule the DAs, so as to solve the following problem at each timeslot .
ODS:  
(20a)  
over  
s.t.  (20f)  
where , , and is the amount of traffic to be served at or after time .
We can solve the ODS problem at each timeslot the same way as we solve the offline ODS problem (3), constituting an online demand shaping algorithm; see Algorithm 2, wherein the convergence (and computational complexity) of Step 2) can be established (and analyzed) in the same way as Algorithm 1.
Algorithm 2 (OnDS). At each timeslot :

1) Denote by the schedules determined by time , and by the set of discrete DAs that have been started before time . For each DA , set its schedule as .

2) Solve the ODS problem iteratively: at the k-th iteration,

   a) Upon gathering traffic profiles from DAs , the coordinator solves
      to obtain a virtual deferrable traffic , and then calculates the average traffic for and announces it to DA over a signaling or control channel.

   b) Upon receiving the average traffic profile ,

      i) DA obtains by
         and submits the updated profile to the coordinator.

      ii) DA calculates by
          represents it as a convex combination , and randomly chooses a traffic profile with probability and submits it to the coordinator.
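The shrinking-horizon idea behind the online algorithm can be sketched in a heavily simplified setting: a single aggregated continuous deferrable volume with no rate caps, no discrete DAs, and no virtual deferrable traffic. At each timeslot the remaining volume is re-planned over the remaining slots using the latest prediction, and only the decision for the current slot is committed. The water-filling inner solver and the predictor (current slot observed, later slots replaced by their mean) are assumptions chosen for illustration, not the paper's method.

```python
def water_fill(base, volume):
    """Spread `volume` over the slots so that every filled slot reaches one level."""
    lo, hi = min(base), max(base) + volume
    for _ in range(100):                       # bisection on the water level
        level = (lo + hi) / 2
        if sum(max(level - b, 0.0) for b in base) > volume:
            hi = level
        else:
            lo = level
    return [max(lo - b, 0.0) for b in base]


def shrinking_horizon(actual_base, mean_base, volume):
    """Re-plan over slots t..T-1 at every t, then commit only slot t."""
    served, remaining = [], volume
    for t in range(len(actual_base)):
        # Hypothetical predictor: slot t is observed, later slots use the mean.
        predicted = [actual_base[t]] + list(mean_base[t + 1:])
        plan = water_fill(predicted, remaining)
        served.append(plan[0])                 # only the current slot is final
        remaining -= plan[0]
    return served


def time_variance(d):
    mean = sum(d) / len(d)
    return sum((v - mean) ** 2 for v in d) / len(d)


actual = [4.0, 1.0, 2.0, 2.0]    # base traffic that actually materializes
mean = [4.0, 2.0, 2.0, 4.0]      # expected base traffic known in advance
volume = 4.0                     # deferrable traffic to deliver within the day

online = shrinking_horizon(actual, mean, volume)
offline = water_fill(actual, volume)           # benchmark with full information
v_online = time_variance([b + s for b, s in zip(actual, online)])
v_offline = time_variance([b + s for b, s in zip(actual, offline)])
print(v_online, v_offline)
```

In this toy instance the last slot turns out lighter than predicted, so the online schedule has already committed traffic to earlier slots and ends with a larger variance than the full-information benchmark, which is the kind of gap the performance analysis quantifies.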


V-B Performance Analysis of the Online Algorithm
We now characterize the performance of the OnDS algorithm with respect to the result of the OffDS algorithm (i.e., the offline ODS problem), which serves as a benchmark. We make the following assumptions to simplify the analysis and obtain insights into how uncertainties affect the performance of the OnDS algorithm.
Assumption 1
The amount of deferrable traffic is large and flexible enough so that a valley-filling schedule exists at every time , i.e., there exists some constant such that
(21)  
Remark 4
Assumption 1 may look like a strong assumption, and we do not have empirical evidence to support it, as demand shaping has not been widely adopted in current cellular networks. However, with increasing penetration of deferrable traffic and users, this assumption is expected to hold. One purpose of algorithm design as in this paper and incentive design as in [18] is to facilitate and incentivize wide adoption of demand shaping. On the other hand, valley-filling represents the scenario where demand shaping is most useful and presents a benchmark for the potential of demand shaping. Mathematically, it is very difficult to analyze the performance of the online algorithm under a more general assumption than Assumption 1. However, notice that in the numerical examples in Section VI, we do not impose Assumption 1, while the results still fall within the bound specified in Theorem 5.
Assumption 2
The base traffic prediction at is modeled as the following causal filter