Waiting but not Aging: Age-of-Information and Utility Optimization Under the Pull Model
Abstract
The Age-of-Information (AoI) has recently been proposed as an important metric for investigating the timeliness performance in information-update systems. In this paper, we introduce a new Pull model and study the AoI optimization problem under replication schemes. Interestingly, we find that under this new Pull model, replication schemes capture a novel tradeoff between different levels of information freshness and different response times across the servers, which can be exploited to minimize the expected AoI at the user's side. Specifically, assuming Poisson updating processes for the servers and exponentially distributed response times, we derive a closed-form formula for computing the expected AoI and obtain the optimal number of responses to wait for to minimize the expected AoI. Then, we extend our analysis to the setting where the user aims to maximize the AoI-based utility, which represents the user's satisfaction level with respect to the freshness of the received information. Furthermore, we consider a more realistic scenario where the user has no knowledge of the system. In this case, we reformulate the utility maximization problem as a stochastic Multi-Armed Bandit problem with side observations and leverage the unique structure of the problem to design learning algorithms with improved performance guarantees. Finally, we conduct extensive simulations to elucidate our theoretical results and compare the performance of different algorithms. Our findings reveal that under the Pull model, waiting for more than one response can significantly reduce the AoI and improve the AoI-based utility in most scenarios.
I Introduction
The last decades have witnessed the prevalence of smart devices and significant advances in ubiquitous computing and the Internet of Things. This trend is forecast to continue in the years to come [2]. This development has spawned a plethora of real-time services that require timely information/status updates. One practically important example of such services is vehicular networks and intelligent transportation systems [3, 4], where accurate status information (position, speed, acceleration, tire pressure, etc.) of a vehicle needs to be shared with other nearby vehicles and roadside facilities in a timely manner in order to avoid collisions and ensure substantially improved road safety. Further examples include sensor networks for environmental/health monitoring [5, 6], wireless channel feedback [7], news feeds, weather updates, online social networks, fare aggregating sites (e.g., Google Shopping), and stock quote services.
For systems providing such real-time services, commonly used performance metrics, such as throughput and delay, exhibit significant limitations in measuring the system performance [8]. Instead, the timeliness of information updates becomes a major concern. To that end, a new metric called the Age-of-Information (AoI) has been proposed for studying the timeliness performance [3]. The AoI is defined as the time elapsed since the most recent update occurred (see Eq. (1) for a formal definition). Using this new AoI metric, the work of [8] employs a simple system model to analyze and optimize the timeliness performance of an information-update system. This seminal work has attracted significant interest from the research community and has inspired a series of studies on AoI analysis and optimization (see [9, 10] and references therein).
While all prior studies consider a Push model, which concerns when and how to “push” (i.e., generate and transmit) the updated information to the user, in this paper we introduce a new Pull model, under which a user sends requests to the servers to proactively “pull” the information of interest. The Pull model is more relevant for many important applications where the user cares about the freshness of information at the moment it is requested rather than continuously monitoring the freshness of information. One application of the Pull model is the real-time stock quotes service, where a customer (i.e., user) submits a query to multiple stock quotes providers (i.e., servers) and each provider responds with the most up-to-date information it has.
To the best of our knowledge, however, none of the existing work on timeliness optimization has considered such a Pull model. In stark contrast, we focus on the Pull model and propose to employ request replication to minimize the AoI or to maximize the AoI-based utility at the user's side. Although a similar Pull model is considered for data synchronization in [11, 12], the problems are quite different, and request replication is not exploited there. Note that the concept of replication is not new and has been extensively studied for various applications (e.g., cloud computing and datacenters [13, 14], storage clouds [15], parallel computing [16, 17], and databases [18, 19]). However, for the AoI minimization problem under the Pull model, replication schemes exhibit a unique property and capture a novel tradeoff between different levels of information freshness and different response times across the servers. This tradeoff reveals the power of waiting for more than one response and can be exploited to minimize the AoI or to maximize the AoI-based utility at the user's side.
Next, we explain the above key tradeoff through a comparison with cloud computing systems. It has been observed that in a cloud or a datacenter, the processing time of the same job can be highly variable on different servers [14]. Due to this important fact, replicating a job on multiple servers and waiting for the first finished copy can help reduce the latency [14, 13]. Apparently, in such a system it is not beneficial to wait for more copies of the job to finish, as all the copies would give the same outcome. By contrast, in the information-update system we consider, although the servers may possess the same type of information (weather forecast, stock prices, etc.), they could have different versions of the information with different levels of freshness due to the random updating processes. In fact, the first response may come from a server with stale information; waiting for more than one response has the potential of receiving fresher information and thus helps reduce the AoI. Hence, it is no longer optimal to stop waiting after receiving the first response (as it is in the other aforementioned applications). On the other hand, waiting for too many responses leads to a longer total waiting time and thus incurs a larger AoI at the user's side. Therefore, it is challenging to determine the optimal number of responses to wait for in order to minimize the expected AoI (or to maximize the AoI-based utility) at the user's side. The problem is further exacerbated by the fact that the updating rate and the mean response time, which are important for making such decisions, are typically unknown to the user a priori.
We summarize our key contributions as follows.

To the best of our knowledge, this work, for the first time, introduces the Pull model for studying the timeliness optimization problem and proposes to employ request replication to reduce the AoI.

Assuming Poisson updating processes at the servers and exponentially distributed response times, we derive a closed-form formula for computing the expected AoI and obtain the optimal number of responses to wait for to minimize the expected AoI. We also discuss some extensions to account for more general replication schemes and different types of response time distributions.

We further consider scenarios where the user aims to maximize the utility, which is an exponential function of the negative AoI. The utility represents the user’s satisfaction level with respect to freshness of the received information. We derive a set of similar theoretical results for the utility maximization problem.

Note that the above results require the knowledge of the system parameters (i.e., the updating rate and the mean response time), which is often difficult, if not impossible, for the user to obtain. Hence, we consider a more realistic scenario where the user has no knowledge of the system. In this case, we formulate the utility maximization problem as a stochastic Multi-Armed Bandit (MAB) problem with side observations. The feedback graph associated with side observations has a special linear structure, which can be leveraged to design learning algorithms with improved regret upper bounds.

Finally, we conduct extensive simulations to elucidate our theoretical results. We also investigate the impact of the system parameters on the achieved gain. Our results show that waiting for more than one response can significantly reduce the AoI and improve the utility in most scenarios. In the case of unknown system parameters, we also perform simulations and compare the performance of various learning algorithms. The results show that algorithms that exploit the special linear feedback graph indeed outperform the classic algorithms.
The remainder of this paper is organized as follows. We first discuss related work in Section II and then describe our new Pull model in Section III. In Section IV, we analyze the expected AoI under replication schemes and obtain the optimal number of responses for minimizing the expected AoI. In Section V, we consider the utility maximization problem in the settings where the updating rate and the mean response time are known and unknown, respectively. Section VI presents the simulation results, and we conclude the paper in Section VII.
II Related Work
Since the seminal work on AoI [3], there has been a large body of work focusing on AoI analysis and optimization in a wide variety of settings and applications (see [9, 10] for surveys). However, almost all prior work considers the Push model, in contrast to the Pull model we consider in this paper.
A series of works (e.g., [8, 20, 21, 22, 23, 24, 25, 26, 27]) has focused on analyzing the AoI performance of various queueing models. In [8], the authors analyze the expected AoI in M/M/1, M/D/1, and D/M/1 systems under the First-Come-First-Served (FCFS) policy. A follow-up work in [22] extends the analysis to M/M/2 and M/M/∞ models. The expected AoI is also characterized for the M/M/1 Last-Come-First-Served (LCFS) model, with and without preemption, for single-source and multi-source systems [20, 21]. Furthermore, controlling the AoI through packet deadlines is studied in [23, 24]; the effect of packet management (e.g., prioritizing new arrivals and discarding old packets) on the AoI is considered in [25, 26]. In [27], the authors show that the preemptive Last-Generated-First-Served (LGFS) policy achieves the optimal (or near-optimal) AoI performance in a multi-server queueing system.
There have also been many recent efforts devoted to the design and analysis of AoI-oriented scheduling algorithms in various network settings (e.g., [28, 29, 30, 31, 32, 33, 34, 35, 36]). In [28], the authors aim to minimize the weighted-sum AoI of the clients in a broadcast wireless network with unreliable channels. A similar problem with throughput constraints is considered in a follow-up study [29]. In [30, 31, 33], the authors consider AoI-optimal scheduling problems in ad hoc wireless networks under interference constraints. Considering a similar network setting, the authors of [32] aim to design AoI-aware algorithms for scheduling real-time traffic with hard deadlines. Recently, the study of AoI has also been pushed towards more challenging settings with multi-hop flows [34, 35, 36].
We want to point out that the preliminary version of this paper [1] is the first work that employs the Pull model and replication schemes to study the AoI at the user's side. Since then, the idea of replication has also been adopted for studying the AoI under different models (see, e.g., [37, 38, 27]). Recently, the authors of [39] also aim to minimize the AoI from the users' perspective by considering multiple users. Note that outside the AoI area, similar Pull models have been investigated for decades (e.g., for data synchronization [11, 12]). However, the problems they study are very different, and request replication is not exploited.
Besides the linear AoI considered in the above work, there are several studies that investigate more general functions of the AoI (e.g., [40, 41, 42]). Such functions are often used to model utility/penalty, which represents the user's satisfaction/dissatisfaction level with respect to the freshness of the received information. In this extended journal version (see Section V), we also consider the AoI-based utility, which is an exponential function of the negative AoI. However, the model and the problem we consider are quite different from those in the existing work. Furthermore, we study the scenario where the system parameters are unknown and formulate the utility maximization problem as an online learning problem.
III System Model
We consider an information-update system where a user pulls time-sensitive information from n servers. These servers are connected to a common information source and update their data asynchronously. We call such a model the Pull model (see Fig. 1). Let N = {1, 2, …, n} be the set of indices of the servers, and let i ∈ N be the server index. We assume that the information updating process at each server is Poisson with rate λ and is independent and identically distributed (i.i.d.) across the servers. This implies that the inter-update time (i.e., the time duration between two successive updates) at each server follows an exponential distribution with mean 1/λ. Here, the inter-update time at a server can be interpreted as the time required for the server to receive an information update from the source. Let U_i(t) denote the time when the most recent update at server i occurred, and let Δ_i(t) denote the AoI at server i, which is defined as the time elapsed since the most recent update at this server:
Δ_i(t) = t − U_i(t).   (1)
Therefore, the AoI at a server drops to zero when an update occurs at this server; otherwise, the AoI increases linearly as time goes by until the next update occurs. Fig. 2 provides an illustration of the AoI evolution at server i.
In this work, we consider the (n, k) replication scheme, under which the user sends replicated copies of the request to all n servers and waits for the first k responses. Let R_i denote the response time of server i. Note that each server may have a different response time, which is the time elapsed from when the request is sent out by the user until the user receives the response from this server. We assume that the time for the requests to reach the servers is negligible compared to the time for the user to download the data from the servers. Hence, the response time can be interpreted as the downloading time. Let t_s denote the downloading start time, which is the same for all the servers, and let D_i denote the downloading finish time for server i. Then, the response time of server i is R_i = D_i − t_s. We assume that the response time is exponentially distributed with mean 1/μ and is i.i.d. across the servers. Note that the model we consider above is simple, but it suffices to capture the key aspects and novelty of the problem we study.
Under the (n, k) replication scheme, when the user receives the first k responses, it uses the freshest information among these responses to make certain decisions (e.g., stock trading decisions based on the received stock price information). Let i_j denote the index of the server corresponding to the j-th response received by the user. Then, the set K_k = {i_1, i_2, …, i_k} contains the indices of the servers that return the first k responses, and the following is satisfied: K_k ⊆ N and D_{i_1} ≤ D_{i_2} ≤ ⋯ ≤ D_{i_k}. Let i* be the index of the server that provides the freshest information (i.e., that has the smallest AoI) among these k responses when downloading starts at time t_s, i.e.,
i* ∈ argmin_{i ∈ K_k} Δ_i(t_s).   (2)
Here, we are interested in the AoI at the user's side when it receives the k-th response, denoted by Δ(k), which is the time difference between when the k-th response is received and when the information at server i* was updated, i.e.,
Δ(k) = D_{i_k} − U_{i*}(t_s).   (3)
Then, there are two natural questions of interest:
(Q1): For a given k, can one obtain a closed-form formula for computing the expected AoI at the user's side, E[Δ(k)]?
(Q2): How can one determine the optimal number of responses k* to wait for, such that E[Δ(k)] is minimized?
The second question can be formulated as the following optimization problem:
k* ∈ argmin_{k ∈ {1, 2, …, n}} E[Δ(k)].   (4)
We will answer these two questions in Section IV.
Furthermore, we will generalize the proposed framework and consider maximizing an AoIbased utility function at the user’s side. The utility maximization problem will be studied in Section V, where we consider both cases of known and unknown system parameters (i.e., the updating rate and the mean response time).
IV AoI Minimization
In this section, we focus on the AoI minimization problem under the Pull model. We first derive a closed-form formula for computing the expected AoI at the user's side under the (n, k) replication scheme (Section IV-A). Then, we find the optimal number of responses to wait for in order to minimize the expected AoI (Section IV-B). Finally, we discuss some immediate extensions (Section IV-C).
IV-A Expected AoI
In this subsection, we focus on answering Question (Q1) and derive a closed-form formula for computing the expected AoI under the (n, k) replication scheme.
To begin with, we provide a useful expression for the AoI at the user's side under the (n, k) replication scheme (i.e., Δ(k), as defined in Eq. (3)):

Δ(k) = D_{i_k} − U_{i*}(t_s) = (D_{i_k} − t_s) + (t_s − U_{i*}(t_s)) = R_{(k)} + Δ_{i*}(t_s) = R_{(k)} + min_{j ∈ K_k} Δ_j(t_s),   (5)

where R_{(k)} = D_{i_k} − t_s denotes the k-th smallest response time, the second-to-last equality is from the definitions of R_{(k)} and Δ_{i*}(t_s) (i.e., Eq. (1)), and the last equality is from Eq. (2). As can be seen from the above expression, under the (n, k) replication scheme the AoI at the user's side consists of two terms: (i) R_{(k)}, the total waiting time for receiving the first k responses, and (ii) min_{j ∈ K_k} Δ_j(t_s) (i.e., Δ_{i*}(t_s)), the AoI of the freshest information among these k responses when downloading starts at time t_s. An illustration of these two terms is shown in Fig. 3.
Taking the expectation of both sides of Eq. (5), we have
E[Δ(k)] = E[R_{(k)}] + E[min_{j ∈ K_k} Δ_j(t_s)].   (6)
In the above equation, the first term (i.e., the expected total waiting time) can be viewed as the cost of waiting, while the second term (i.e., the expected AoI of the freshest information among the k responses) can be viewed as the benefit of waiting. Intuitively, as k increases (i.e., as the user waits for more responses), the expected total waiting time (i.e., the first term) increases. On the other hand, upon receiving more responses, the expected AoI of the freshest information among these responses (i.e., the second term) decreases. Hence, there is a natural tradeoff between these two terms, which is a unique property of our newly introduced Pull model.
Next, we formalize this tradeoff by deriving the closed-form expressions of the above two terms as well as the expected AoI. We state the main result of this subsection in Theorem 1.
Theorem 1.
Under the (n, k) replication scheme, the expected AoI at the user's side can be expressed as

E[Δ(k)] = (1/μ)(H_n − H_{n−k}) + 1/(kλ),   (7)

where H_j := Σ_{l=1}^{j} 1/l is the j-th partial sum of the diverging harmonic series (with H_0 := 0).
Proof.
We first analyze the first term on the right-hand side of Eq. (6) and want to show E[R_{(k)}] = (H_n − H_{n−k})/μ. Note that the response time is exponentially distributed with mean 1/μ and is i.i.d. across the servers. Hence, random variable R_{(k)} is the k-th smallest value of n i.i.d. exponential random variables with mean 1/μ. The order statistics results for exponential random variables give that R_{(j)} − R_{(j−1)} is an exponential random variable with mean 1/((n−j+1)μ) for any j ∈ {1, 2, …, k}, where we set R_{(0)} := 0 for ease of notation [43]. Hence, we have the following:

E[R_{(k)}] = Σ_{j=1}^{k} 1/((n−j+1)μ) = (H_n − H_{n−k})/μ.   (8)
Next, we analyze the second term on the right-hand side of Eq. (6) and want to show the following:

E[min_{j ∈ K_k} Δ_j(t_s)] = 1/(kλ).   (9)

Note that the updating process at each server is a Poisson process with rate λ and is i.i.d. across the servers. Hence, the inter-update time at each server is exponentially distributed with mean 1/λ. Due to the memoryless property of the exponential distribution, the AoI at each server has the same distribution as the inter-update time, i.e., random variable Δ_j(t_s) is also exponentially distributed with mean 1/λ and is i.i.d. across the servers [44]. Therefore, random variable min_{j ∈ K_k} Δ_j(t_s) is the minimum of k i.i.d. exponential random variables with mean 1/λ, which is also exponentially distributed, with mean 1/(kλ). This implies Eq. (9). ∎
Remark. The above analysis indeed agrees with our intuition: while the expected total waiting time for receiving the first k responses (i.e., Eq. (8)) is a monotonically increasing function of k, the expected AoI of the freshest information among these responses (i.e., Eq. (9)) is a monotonically decreasing function of k.
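As a sanity check on Eq. (7), the closed form can be compared against a Monte Carlo simulation of the model. The sketch below is illustrative (the parameter values n = 10, k = 3, λ = 1, μ = 2 and the helper names are ours): it draws response times and server ages, forms the AoI as in Eq. (5), and compares the empirical average with the formula.

```python
import math
import random

def expected_aoi(n, k, lam, mu):
    """Closed form of Eq. (7): (H_n - H_{n-k})/mu + 1/(k*lam)."""
    harmonic = lambda m: sum(1.0 / i for i in range(1, m + 1))
    return (harmonic(n) - harmonic(n - k)) / mu + 1.0 / (k * lam)

def simulate_aoi(n, k, lam, mu, rng):
    """One sample of Eq. (5): the k-th smallest response time plus the
    freshest age among the k earliest responders."""
    responses = sorted((rng.expovariate(mu), i) for i in range(n))
    ages = [rng.expovariate(lam) for _ in range(n)]
    waiting = responses[k - 1][0]                      # R_(k)
    freshest = min(ages[i] for _, i in responses[:k])  # min age among first k
    return waiting + freshest

rng = random.Random(42)
n, k, lam, mu = 10, 3, 1.0, 2.0
mc = sum(simulate_aoi(n, k, lam, mu, rng) for _ in range(200_000)) / 200_000
print(expected_aoi(n, k, lam, mu))  # approx 0.5014
print(mc)
```

With 200,000 samples, the two values should agree up to Monte Carlo error.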
IV-B Optimal Replication Scheme
In this subsection, we will exploit the aforementioned tradeoff and focus on answering Question (Q2) raised at the end of Section III. Specifically, we aim to find the optimal number of responses k* to wait for in order to minimize the expected AoI at the user's side.
First, due to Eq. (7), we can rewrite the optimization problem in Eq. (4) as
k* ∈ argmin_{k ∈ {1, 2, …, n}} { (1/μ)(H_n − H_{n−k}) + 1/(kλ) }.   (10)
Let k* be an optimal solution to Problem (10). We state the main result of this subsection in Theorem 2.
Theorem 2.
An optimal solution to Problem (10) can be computed as
k* = min{⌈x̂⌉, n},  where x̂ := (−(λ+μ) + √((λ+μ)² + 4nλμ)) / (2λ).   (11)
Proof.
We first define d(k) as the difference of the expected AoI between the (n, k+1) and (n, k) replication schemes, i.e., d(k) := E[Δ(k+1)] − E[Δ(k)] for any k ∈ {1, 2, …, n−1}. From Eq. (7), we have the following:

d(k) = 1/(μ(n−k)) − 1/(λk(k+1))   (12)

for any k ∈ {1, 2, …, n−1}. It is easy to see that d(k) is a monotonically increasing function of k.
We now extend the domain of d(·) to the set of positive real numbers and want to find x̂ such that d(x̂) = 0. With some standard calculations (solving the quadratic equation λx² + (λ+μ)x − μn = 0) and dropping the negative solution, we derive the following:

x̂ = (−(λ+μ) + √((λ+μ)² + 4nλμ)) / (2λ).   (13)
Next, we discuss two cases: (i) x̂ ≥ n−1 and (ii) x̂ < n−1.

In Case (i), we have d(n−1) ≤ 0. This implies that d(k) ≤ 0 for all k ∈ {1, 2, …, n−1}, as d(k) is monotonically increasing. Hence, the expected AoI, E[Δ(k)], is a monotonically decreasing function for k ∈ {1, 2, …, n}. Therefore, k* = n must be an optimal solution.

In Case (ii), we have x̂ < n−1. We consider two subcases: x̂ is an integer in {1, 2, …, n−2}, and x̂ is not an integer.

If x̂ is an integer in {1, 2, …, n−2}, we have d(k) < 0 for k < x̂ and d(k) > 0 for k > x̂, as d(k) is monotonically increasing. Hence, the expected AoI, E[Δ(k)], is first decreasing (for k ≤ x̂) and then increasing (for k ≥ x̂+1). Therefore, there are two optimal solutions: x̂ and x̂+1, since E[Δ(x̂)] = E[Δ(x̂+1)] (due to d(x̂) = 0).

If x̂ is not an integer, we have d(k) < 0 for k ≤ ⌊x̂⌋ and d(k) > 0 for k ≥ ⌈x̂⌉, as d(k) is monotonically increasing. Hence, the expected AoI, E[Δ(k)], is first decreasing (for k ≤ ⌈x̂⌉) and then increasing (for k ≥ ⌈x̂⌉). Therefore, k* = ⌈x̂⌉ must be an optimal solution.

Combining the two subcases, we have k* = ⌈x̂⌉ in Case (ii). Then, combining Cases (i) and (ii), we have k* = min{⌈x̂⌉, n}. ∎
Remark. There are two special cases of particular interest: (i) waiting for the first response only (i.e., k* = 1) and (ii) waiting for all the responses (i.e., k* = n). In Corollary 1, we provide a sufficient and necessary condition for each of these two special cases.
Corollary 1.
(i) k* = 1 is an optimal solution if and only if λ/μ ≥ (n−1)/2; (ii) k* = n is an optimal solution if and only if λ/μ ≤ 1/(n(n−1)).
Proof.
The proof follows straightforwardly from Theorem 2. A little thought gives the following: k* = 1 is an optimal solution if and only if d(1) ≥ 0. Solving 1/(μ(n−1)) − 1/(2λ) ≥ 0 gives λ/μ ≥ (n−1)/2. Similarly, k* = n is an optimal solution if and only if d(n−1) ≤ 0. Solving 1/μ − 1/(λn(n−1)) ≤ 0 gives λ/μ ≤ 1/(n(n−1)). ∎
Remark. The above results agree well with the intuition. For a given number of servers, if the inter-update time is much smaller than the response time (i.e., 1/λ ≪ 1/μ), then the difference in freshness levels among the servers is relatively small. In this case, it is not beneficial to wait for more responses. On the other hand, if the inter-update time is much larger than the response time (i.e., 1/λ ≫ 1/μ), then one server may possess much fresher information than another. In this case, it is worth waiting for more responses, which leads to a significant gain in the AoI reduction.
Note that Theorem 2 also implies how the optimal solution k* scales as the number of servers n increases: when n becomes large, we have x̂ ≈ √(nμ/λ), and hence k* grows on the order of √n.
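The closed-form solution in Eq. (11) can be cross-checked by brute-force minimization of Eq. (7) over k ∈ {1, …, n}. A small sketch (the parameter values and function names are illustrative, not from the paper):

```python
import math

def expected_aoi(n, k, lam, mu):
    """Closed form of Eq. (7): (H_n - H_{n-k})/mu + 1/(k*lam)."""
    harmonic = lambda m: sum(1.0 / i for i in range(1, m + 1))
    return (harmonic(n) - harmonic(n - k)) / mu + 1.0 / (k * lam)

def k_star_closed_form(n, lam, mu):
    """Eq. (11): k* = min{ceil(x_hat), n}, where x_hat is the positive
    root of lam*x*(x+1) = mu*(n - x)."""
    x_hat = (-(lam + mu) + math.sqrt((lam + mu) ** 2 + 4 * n * lam * mu)) / (2 * lam)
    return min(math.ceil(x_hat), n)

def k_star_brute_force(n, lam, mu):
    return min(range(1, n + 1), key=lambda k: expected_aoi(n, k, lam, mu))

# Fast updates (large lam relative to mu) push k* towards 1;
# slow updates push k* towards n, matching Corollary 1.
for n, lam, mu in [(10, 1.0, 2.0), (5, 10.0, 1.0), (5, 0.01, 1.0), (50, 0.5, 3.0)]:
    assert k_star_closed_form(n, lam, mu) == k_star_brute_force(n, lam, mu)
print(k_star_closed_form(10, 1.0, 2.0))  # 4
```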
IV-C Extensions
In this subsection, we discuss some immediate extensions of the considered model, including more general replication schemes and different types of response time distributions.
(m, k) replication schemes

So far, we have only considered the (n, k) replication scheme. One limitation of this scheme is that it requires the user to send a replicated request to every server, which may incur a large overhead when there are a large number of servers (i.e., when n is large). Instead, a more practical scheme would be to send the replicated requests to a subset of the servers. Hence, we consider the (m, k) replication schemes, under which the user sends a replicated request to each of m servers that are chosen from the n servers uniformly at random and waits for the first k responses, where 1 ≤ k ≤ m ≤ n. Making the same assumptions as in Section III, we can derive the expected AoI at the user's side in a similar manner. Specifically, reusing the proof of Theorem 1 and replacing n with m in the proof, we can show the following:
E[Δ(k)] = (1/μ)(H_m − H_{m−k}) + 1/(kλ).   (14)
Uniformly distributed response time
Note that our current analysis relies on the memoryless property of the Poisson updating process. However, the analysis can be extended to uniformly distributed response times. We make the same assumptions as in Section III, except that the response time is now uniformly distributed on the interval [a, b] with 0 ≤ a < b. In this case, it is easy to derive E[R_{(k)}] = a + (b−a)k/(n+1) (see, e.g., [43]). Since Eq. (9) still holds, from Eq. (6) we have
E[Δ(k)] = a + (b−a)k/(n+1) + 1/(kλ).   (15)
Following a similar line of analysis to that in the proof of Theorem 2, we can show that an optimal solution can be computed as
k* = min{⌈x̂⌉, n},  where x̂ := (−1 + √(1 + 4(n+1)/(λ(b−a)))) / 2.   (16)
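The uniform-response-time variant can be checked the same way: the sketch below (illustrative parameters; helper names are ours) compares the ceiling rule of Eq. (16) against brute-force minimization of Eq. (15).

```python
import math

def expected_aoi_uniform(n, k, lam, a, b):
    """Eq. (15): E[R_(k)] for Uniform(a, b) is a + (b-a)*k/(n+1);
    the freshest-age term 1/(k*lam) is unchanged."""
    return a + (b - a) * k / (n + 1) + 1.0 / (k * lam)

def k_star_uniform(n, lam, a, b):
    """Eq. (16): positive root of lam*(b-a)*x*(x+1) = n+1, capped at n."""
    x_hat = (-1 + math.sqrt(1 + 4 * (n + 1) / (lam * (b - a)))) / 2
    return min(math.ceil(x_hat), n)

for n, lam, a, b in [(9, 1.0, 0.0, 1.0), (4, 5.0, 1.0, 2.0), (30, 0.2, 0.5, 3.0)]:
    brute = min(range(1, n + 1),
                key=lambda k: expected_aoi_uniform(n, k, lam, a, b))
    assert k_star_uniform(n, lam, a, b) == brute
print(k_star_uniform(9, 1.0, 0.0, 1.0))  # 3
```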
V AoI-based Utility Maximization
In Section IV, our study focused on minimizing the expected AoI at the user's side. For certain practical applications, however, the user might be more interested in maximizing a utility that depends on the AoI than in minimizing the AoI itself. Such an AoI-based utility function can serve as a Quality of Experience (QoE) metric, which measures the user's satisfaction level with respect to the freshness of the received information. To that end, in this section we investigate the problem of AoI-based utility maximization. Specifically, we will consider the cases of known and unknown system parameters (i.e., the updating rate and the mean response time) in Sections V-B and V-C, respectively.
V-A AoI-based Utility Function
Consider a function U(·), which maps the AoI at the user's side under the (n, k) replication scheme (i.e., Δ(k)) to a utility U(Δ(k)) obtained by the user. Such a function is called a utility function. Similar to [45], we assume that the utility function is measurable, non-negative, and non-increasing. The specific choice of the utility function depends on the application under consideration in practice.
We consider the same model as that in Section III. From the analysis in the proof of Theorem 1, it is easy to see that the AoI, Δ(k), is the sum of k+1 independent exponential random variables: the spacings R_{(j)} − R_{(j−1)} for j ∈ {1, 2, …, k}, and min_{j ∈ K_k} Δ_j(t_s). Therefore, the AoI, Δ(k), is a hypoexponential random variable (also called a generalized Erlang random variable). The probability density function of a hypoexponential random variable with distinct rate parameters r_1, r_2, …, r_m can be expressed as f(x) = Σ_{i=1}^{m} c_i r_i e^{−r_i x}, where r_i is the rate of the i-th exponential stage and c_i = Π_{j≠i} r_j/(r_j − r_i). For the AoI, Δ(k), we have m = k+1, and the rate parameters are r_j = (n−j+1)μ for j ∈ {1, 2, …, k} and r_{k+1} = kλ. Then, the expected utility can be calculated as

E[U(Δ(k))] = ∫_0^∞ U(x) f(x) dx.   (17)
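Because the rate parameters are distinct, the coefficients c_i can be computed directly, and expectations of the form (17) reduce to sums over the stages. A small sketch (with illustrative parameters λ = 1, μ = 2, n = 10, k = 3) that verifies the pdf against two known identities: Σ_i c_i = 1 (the pdf integrates to one) and Σ_i c_i/r_i = Σ_i 1/r_i (the mean, which is exactly Eq. (7)).

```python
def hypoexp_coeffs(rates):
    """Mixture-form coefficients c_i of the hypoexponential pdf
    f(x) = sum_i c_i * r_i * exp(-r_i * x), valid for distinct rates."""
    coeffs = []
    for i, ri in enumerate(rates):
        c = 1.0
        for j, rj in enumerate(rates):
            if j != i:
                c *= rj / (rj - ri)
        coeffs.append(c)
    return coeffs

n, k, lam, mu = 10, 3, 1.0, 2.0
rates = [(n - j + 1) * mu for j in range(1, k + 1)] + [k * lam]
c = hypoexp_coeffs(rates)

# E[AoI] via the pdf, term by term: integral of x*f(x) = sum_i c_i / r_i,
# which must coincide with sum_i 1/r_i, i.e., the value given by Eq. (7).
mean_via_pdf = sum(ci / ri for ci, ri in zip(c, rates))
print(mean_via_pdf)
print(sum(1.0 / r for r in rates))
```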
Now, the problem is to find the optimal value k* that achieves the maximum expected utility:
k* ∈ argmax_{k ∈ {1, 2, …, n}} E[U(Δ(k))].   (18)
In the following subsections, we will consider a specific AoI-based utility function, which is an exponential function of the negative AoI, and aim to maximize the expected utility. We will consider both cases of known and unknown system parameters (i.e., the updating rate and the mean response time).
V-B Case with Known System Parameters
In this subsection, we consider a specific AoI-based utility function of the following exponential form:

U(Δ(k)) = e^{−Δ(k)}.   (19)
The above exponential utility function implies that the user receives the full utility when the AoI is zero (which is an ideal case) and the utility decreases exponentially as the AoI increases. Such a utility function decreases very quickly with respect to the AoI and is desirable for realtime applications that require extremely fresh information to provide satisfactory service to the users (e.g., stock quotes service).
Assuming that the updating rate λ and the mean response time 1/μ are known, we first derive a closed-form formula for computing the expected utility E[U(Δ(k))]. Then, we find an optimal k* that yields the maximum expected utility. The main results of this subsection are stated in Theorems 3 and 4, whose proofs follow a similar line of analysis to that for Theorems 1 and 2, respectively. The detailed proofs are provided in Appendices A and B, respectively.
Theorem 3.
Under the (n, k) replication scheme, the expected utility can be expressed as

E[U(Δ(k))] = (kλ/(kλ+1)) · Π_{j=n−k+1}^{n} (jμ/(jμ+1)).   (20)
Theorem 4.
An optimal solution to Problem (18) (i.e., achieving the maximum expected utility) can be computed as
k* = min{⌈x̂⌉, n},  where x̂ is the unique positive solution of ((x+1)λ/((x+1)λ+1)) · ((xλ+1)/(xλ)) · ((n−x)μ/((n−x)μ+1)) = 1.   (21)
Remark. Similar to the AoI minimization problem studied in Section IV-B, there are also two interesting special cases: (i) waiting for the first response only (i.e., k* = 1) and (ii) waiting for all the responses (i.e., k* = n). In Corollary 2, we provide a sufficient and necessary condition for each case.
Corollary 2.
Let ρ(k) := E[U(Δ(k+1))]/E[U(Δ(k))]. Then, (i) k* = 1 is an optimal solution if and only if ρ(1) ≤ 1; (ii) k* = n is an optimal solution if and only if ρ(n−1) ≥ 1.
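Assuming the exponential utility U(δ) = e^{−δ}, the expected utility is the product of r_i/(r_i + 1) over the k+1 exponential stages of Δ(k) (the moment generating function of a sum of independent exponentials evaluated at −1). The sketch below (illustrative parameters; function names are ours) checks this product form against Monte Carlo and brute-forces the maximizing k.

```python
import math
import random

def expected_utility(n, k, lam, mu):
    """E[exp(-AoI)] as a product of r_i/(r_i + 1) over the stages:
    rates (n-j+1)*mu for j = 1..k, plus k*lam for the freshest age."""
    value = k * lam / (k * lam + 1)
    for j in range(n - k + 1, n + 1):
        value *= j * mu / (j * mu + 1)
    return value

rng = random.Random(7)
n, k, lam, mu = 10, 3, 1.0, 2.0

def sample_aoi():
    """Draw AoI as the sum of its independent exponential stages."""
    waiting = sum(rng.expovariate((n - j + 1) * mu) for j in range(1, k + 1))
    return waiting + rng.expovariate(k * lam)

mc = sum(math.exp(-sample_aoi()) for _ in range(200_000)) / 200_000
print(expected_utility(n, k, lam, mu))  # approx 0.637
print(mc)

# Brute-force maximizer of the expected utility over k = 1..n.
best_k = max(range(1, n + 1), key=lambda kk: expected_utility(n, kk, lam, mu))
print(best_k)  # 3
```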
V-C Case with Unknown System Parameters
In Section V-B, we addressed the utility maximization problem in Eq. (18), assuming knowledge of the updating rate λ and the mean response time 1/μ. Similar assumptions are also made in prior work to obtain a good understanding of the studied theoretical problems (see, e.g., [9, 45] and references therein). However, such information is typically unavailable to the user in practice. For example, the user generally has no knowledge of the updating processes between the information source and the servers. Moreover, it is difficult, if not impossible, for the user to estimate the updating rate, as the user has no direct observation of the updating processes. Therefore, an interesting and important question naturally arises: how can the user maximize the expected utility in the presence of unknown system parameters?
To that end, in this subsection we aim to address the above question through the design of learning-based algorithms. Specifically, in the presence of unknown system parameters, we will reformulate the utility maximization problem as a stochastic Multi-Armed Bandit (MAB) problem. To the best of our knowledge, this is the first work that leverages the MAB formulation to study the AoI problem.
In the following, we will first briefly introduce the basic setup of the stochastic MAB model (Section V-C1). Then, we formulate the utility maximization problem with unknown system parameters as an MAB problem and explain the special linear feedback graph of our problem, which can be exploited to achieve improved performance guarantees compared to the classic MAB setting (Section V-C2). Finally, we introduce several MAB algorithms that can be applied to address our problem (Section V-C3).
The MAB Model
The MAB model has been widely employed for studying many sequential decision-making problems of practical importance (clinical trials, network resource allocation, online ad placement, crowdsourcing, etc.) with unknown parameters (see, e.g., [46, 47, 48, 49]).
In the classic MAB model, a player (i.e., a decision maker) is faced with n options, which are often called arms in the MAB literature. In each round, the player chooses one arm to play and receives the reward generated by the played arm. The reward of playing arm a in round t, denoted by X_a(t), is a random variable distributed on the interval [0, 1]. We assume that the rewards of each arm are i.i.d. over time and independent across arms. Let θ_a be the mean reward of arm a, and let θ* be the highest mean reward among all the arms, i.e., θ* = max_a θ_a. The specific distributions of the X_a(t)'s and the values of the θ_a's are unknown to the player.
An algorithm chooses an arm a(t) to play in each round t ∈ {1, 2, …, T}, where T is the length of the time horizon. The objective is to design an algorithm that maximizes the expected cumulative reward over this time horizon, i.e., E[Σ_{t=1}^{T} X_{a(t)}(t)]. This is equivalent to minimizing the regret, which is the difference between the expected cumulative reward obtained by an optimal algorithm that always plays the best arm and that of the considered algorithm. We use R(T) to denote the regret, which is formally defined as follows:
$$R(T) \triangleq T\mu^* - \mathbb{E}\Big[\sum_{t=1}^{T} X_{a_t,t}\Big]. \qquad (22)$$
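For concreteness, the empirical regret of a fixed play sequence can be computed directly from this definition. The sketch below (the function name and inputs are ours) evaluates the expectation using the true mean rewards:

```python
def regret(mu, choices):
    """Regret of a play sequence against the best fixed arm.

    mu      -- list of mean rewards, one per arm (unknown to the player,
               but available here for evaluation)
    choices -- list of arm indices played in rounds 1..T
    """
    best = max(mu)
    # Expected cumulative reward of always playing the best arm,
    # minus the expected reward actually collected.
    return len(choices) * best - sum(mu[a] for a in choices)

# Playing the best arm every round incurs zero regret; every pull of a
# suboptimal arm adds that arm's gap to the regret.
print(regret([0.2, 0.5, 0.9], [2, 2, 2]))  # approximately 0.0
print(regret([0.2, 0.5, 0.9], [0, 1, 2]))  # approximately 0.7 + 0.4 = 1.1
```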
In order to maximize the reward, or equivalently to minimize the regret, the player faces a key trade-off: how to balance exploitation (i.e., playing the arm with the highest empirical mean reward) and exploration (i.e., trying other arms, which could potentially be better)? Several well-known algorithms address this challenge; we discuss them in Section V-C3.
The MAB Formulation of the Utility Maximization Problem
We now formulate the utility maximization problem with unknown system parameters as an MAB problem. Note that when the updating rate and the mean response time are unknown, one cannot derive a closed-form formula for the expected utility and find the optimal solution as in Section V-B. Therefore, for each sent request the user needs to decide how many responses to wait for in a dynamic manner. In this case, one can naturally reformulate the utility maximization problem using the MAB model: making a decision for each sent request corresponds to a round, and waiting for $k$ responses corresponds to playing arm $k$. Let $\Delta_{k,t}$ be the AoI at the user's side for the $t$-th request when the user waits for $k$ responses. Then, the utility $U(\Delta_{k,t})$, normalized to the interval $[0,1]$, corresponds to the reward obtained by playing arm $k$ in round $t$, and the mean reward of arm $k$ is $\mu_k = \mathbb{E}[U(\Delta_{k,t})]$. Note that in this MAB formulation, the utility function is not limited to the exponential form of Eq. (19); instead, $U(\cdot)$ can be very general, as long as it is a measurable, nonnegative, and nonincreasing function of the AoI.
Recently, MAB models with side observations (also called graphical feedback) have been studied (see, e.g., [50, 51, 52]). In these models, playing an arm not only reveals the reward of the played arm but also that of some other arm(s). Such side observations are typically encoded in a feedback graph, where each node corresponds to an arm, and a directed edge from arm $k$ to arm $k'$ means that playing arm $k$ also reveals the reward of arm $k'$.
We would like to point out that the utility maximization problem with unknown system parameters can be formulated as an MAB problem with side observations. Moreover, the associated feedback graph of this problem has a special linear structure, as illustrated in Fig. 4. Specifically, upon receiving the $k$-th response, the user has the information carried by all of the first $k$ responses. Thus, the user knows the utility she would have obtained had she waited for only $j$ responses, for every $j \le k$. Mapping this property to the MAB model, playing arm $k$ reveals not only the reward of arm $k$ but also the rewards of all arms $j < k$. This special structure can be leveraged to design learning algorithms that perform exploration more efficiently and thus achieve improved regret performance.
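This prefix-revealing property can be made concrete with a small sketch (arm indexing and helper names are ours; arms are 0-indexed, so "arm k" reveals indices 0..k):

```python
def side_observations(played_arm, per_arm_rewards):
    """Rewards revealed by one play under the linear feedback graph.

    played_arm      -- index of the arm actually played
    per_arm_rewards -- per_arm_rewards[j] is the reward the user would
                       have obtained by waiting for j+1 responses this round
    Playing arm k reveals the rewards of arms 0..k, and nothing beyond.
    """
    return {j: per_arm_rewards[j] for j in range(played_arm + 1)}

obs = side_observations(2, [0.9, 0.7, 0.4, 0.1])
print(sorted(obs))  # [0, 1, 2] -- the fourth arm stays unobserved
print(obs[0])       # 0.9
```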
Learning Algorithms
Several well-known learning algorithms address the classic MAB problem, including ε-Greedy [53] and Upper Confidence Bound (UCB) [46, 53, 54]. In the sequel, we introduce these algorithms and explain how to leverage the side observations and the special linear structure of the graphical feedback to design algorithms with improved regret upper bounds.
We begin with ε-Greedy, a very simple algorithm that performs exploration explicitly. Specifically, it plays the arm with the highest empirical mean reward with probability $1-\epsilon_t$ (i.e., exploitation) and plays an arm chosen uniformly at random with probability $\epsilon_t$ (i.e., exploration), where the exploration rate $\epsilon_t$ decreases as $t$ increases. We summarize the operations of ε-Greedy in Algorithm 1. When side observations are available, one can incorporate the additional samples they provide into the update of the empirical mean rewards of non-played arms. We call the variant of ε-Greedy that exploits side observations ε-Greedy-N. Note that ε-Greedy-N is almost the same as ε-Greedy, except that in Line 6 of Algorithm 1 one updates the empirical mean reward of every arm for which a sample is revealed, including the non-played arms. By taking advantage of these additional samples, ε-Greedy-N accelerates the exploration process and is expected to outperform ε-Greedy.
Although ε-Greedy-N leverages side observations and can speed up the exploration process compared to ε-Greedy, it still chooses an arm uniformly at random during exploration, remaining agnostic about the structure of the feedback graph. Therefore, all the arms have to be played in the exploration phase (i.e., each with probability $\epsilon_t/K$). The analysis in [53] suggests that $O(\log T)$ samples are sufficient for accurately estimating the mean reward of an arm. This implies that both ε-Greedy and ε-Greedy-N have a regret upper bounded by $O(K \log T)$. However, many of these explorations are unnecessary in our problem: due to the special linear structure of the feedback graph in Fig. 4, playing arm $K$ reveals a sample for every arm. This suggests that one should always choose arm $K$ for exploration, which leads to a graph-aware algorithm summarized in Algorithm 2. We call this algorithm ε-Greedy-LP, as it is a special case of the ε-Greedy-LP algorithm proposed in [52].
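A minimal sketch of this graph-aware exploration rule follows; the ε schedule and the deterministic reward interface are illustrative choices of ours, not the exact ones of Algorithm 2 or [52]. During exploration the last arm is always played (revealing every arm), and every play updates all arms it reveals:

```python
import random

def eps_greedy_lp(sample_rewards, K, T, seed=0):
    """Sketch of a graph-aware epsilon-greedy rule.

    sample_rewards(k) -- returns one reward sample for each arm 0..k
                         (the prefix revealed by playing arm k)
    Returns the index of the empirically best arm after T rounds.
    """
    rng = random.Random(seed)
    sums, counts = [0.0] * K, [0] * K
    for t in range(1, T + 1):
        eps = min(1.0, K / t)  # illustrative exploration schedule
        if rng.random() < eps:
            arm = K - 1        # exploration: the last arm reveals all arms
        else:                  # exploitation: empirically best arm
            arm = max(range(K), key=lambda k: sums[k] / counts[k])
        for k, x in enumerate(sample_rewards(arm)):
            sums[k] += x       # side observations for arms 0..arm
            counts[k] += 1
    return max(range(K), key=lambda k: sums[k] / counts[k])
```

With noiseless rewards the empirical means are exact, so the routine identifies the best arm even when it is not the last one.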
$$\hat{\mu}_k(t) + \sqrt{\frac{2 \ln t}{n_k(t)}}, \qquad (23)$$
where $\hat{\mu}_k(t)$ is the empirical mean reward of arm $k$ and $n_k(t)$ is the number of samples of arm $k$ available by round $t$.
One can show that the regret of ε-Greedy-LP is upper bounded by $O(\log T)$, which improves upon the $O(K \log T)$ bound of ε-Greedy and ε-Greedy-N. This result follows immediately from Corollary 8 of [52], as the linear feedback graph in Fig. 4 is a special case of the graphs considered in [52]. Note that the improved regret upper bound relies on the assumption that ε-Greedy-LP knows the gap between the reward of the optimal arm and that of the best suboptimal arm, which is needed for choosing its parameters (see Corollary 8 of [52] for the specific form).
$$\hat{\mu}_k + \sqrt{\frac{\ln(T\tilde{\Delta}_m^2)}{2 n_m}} < \max_{j \in B_m} \left( \hat{\mu}_j - \sqrt{\frac{\ln(T\tilde{\Delta}_m^2)}{2 n_m}} \right). \qquad (24)$$
Next, we consider another simple algorithm, called Upper Confidence Bound (UCB). As the name suggests, UCB maintains the upper endpoint of a suitable confidence interval for the mean reward of each arm and plays the arm with the highest such upper confidence bound (see, e.g., Eq. (23)). There are several variants of the UCB algorithm [46, 53, 54]. We present a popular variant, called UCB1, in Algorithm 3. When side observations are available, similar to ε-Greedy-N, a slightly modified UCB algorithm, called UCB-N [51], incorporates the additional samples from side observations into the update of the empirical mean rewards and the sample counts of non-played arms (i.e., Line 4 in Algorithm 3). Like ε-Greedy-N, UCB-N is also agnostic about the structure of the feedback graph. In order to take the graph structure into consideration, we introduce UCB-LP, which builds on another UCB variant, called UCB-Improved [54]. UCB-LP is a special case of the algorithm proposed in [52]. We summarize the operations of UCB-LP in Algorithm 4.
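The UCB1 index trades off the empirical mean against an exploration bonus that grows slowly with the round index and shrinks as an arm accumulates samples; a direct transcription of the standard index (argument names are ours):

```python
import math

def ucb1_index(mean, n, t):
    """UCB1 index of an arm: empirical mean plus a confidence-interval
    width that grows with the round t and shrinks with the sample count n.
    """
    return mean + math.sqrt(2.0 * math.log(t) / n)

# A well-sampled arm receives a smaller bonus than a rarely sampled one,
# so under-explored arms are eventually tried again.
print(ucb1_index(0.5, 100, 1000) < ucb1_index(0.5, 10, 1000))  # True
```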
The key idea of UCB-LP is the following: we divide the time horizon into multiple stages. For each stage $m$, we use $B_m$ to denote the set of arms not yet eliminated and use $\tilde{\Delta}_m$ as the current estimate of the gap. At the beginning, set $B_0$ is initialized to the set of all arms, and $\tilde{\Delta}_0$ is initialized to 1. We ensure that by the end of stage $m$, there are at least $n_m$ samples available for each arm in set $B_m$, obtained from playing either a larger-indexed arm or the arm itself, where $n_m$ is determined by $\tilde{\Delta}_m$ (Lines 8-12). Then, at the end of stage $m$, we obtain set $B_{m+1}$ by eliminating those arms estimated to be suboptimal according to Eq. (24) and obtain $\tilde{\Delta}_{m+1}$ by halving $\tilde{\Delta}_m$.
Under some mild assumptions, one can show that for our problem with a linear feedback graph, UCB-LP achieves an improved regret upper bound of $O(\log T)$, compared to the $O(K \log T)$ bound of UCB1 and UCB-N. This result follows immediately from Proposition 10 of [52]. Note that even without knowledge of the gap, UCB-LP achieves an improved regret upper bound similar to that of ε-Greedy-LP. Although UCB-LP as presented in Algorithm 4 requires knowledge of the time horizon $T$, this requirement can be relaxed using the techniques suggested in [54, 52].
Furthermore, leveraging the linear feedback graph in Fig. 4, we propose a further enhanced UCB algorithm by slightly modifying UCB-LP. We call this new algorithm UCB-LFG (UCB-Linear Feedback Graph) and present it in Algorithm 5. The key difference is in Line 9 (vs. Lines 8-12 in Algorithm 4): during stage $m$, we simply play the arm with the largest index in set $B_m$. Recall that in Algorithm 4, the purpose of Lines 8-12 is to explore the arms in set $B_m$, ensuring that by the end of stage $m$ each arm in $B_m$ has at least $n_m$ samples. Because of the linear feedback graph, this exploration step can be completed in a smaller number of rounds: each time the largest-indexed arm in $B_m$ is played, a sample is generated for every arm in $B_m$. Following the regret analysis for UCB-LP in [52], we can show that UCB-LFG achieves the same improved regret upper bound of $O(\log T)$. Although UCB-LFG and UCB-LP have the same regret upper bound, UCB-LFG typically achieves better empirical performance than UCB-LP, as can be observed from the simulation results in Section VI-B.
(25) 
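The round savings of this graph-aware exploration can be sketched as follows. The function below is a simplified model of one exploration stage (names and the dictionary interface are ours): repeatedly playing the largest-indexed active arm brings every active arm up to the per-arm sample target, costing one play per needed sample instead of one play per sample per arm.

```python
def explore_stage(active_arms, target, counts):
    """One exploration stage under the linear feedback graph.

    active_arms -- set of arm indices not yet eliminated
    target      -- required per-arm sample count by the end of the stage
    counts      -- dict arm -> samples collected so far (mutated in place)
    Returns the number of plays used.
    """
    top = max(active_arms)  # playing `top` reveals every arm j <= top
    plays = 0
    while min(counts[k] for k in active_arms) < target:
        plays += 1          # one play of arm `top` ...
        for k in active_arms:
            counts[k] += 1  # ... yields a side observation for each arm
    return plays

counts = {0: 3, 1: 0, 2: 5}
print(explore_stage({0, 1, 2}, 4, counts))  # 4 plays, not 4 plays per arm
print(counts)                               # {0: 7, 1: 4, 2: 9}
```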
VI Numerical Results
In this section, we perform extensive simulations to elucidate our theoretical results. We present the simulation results for AoI minimization and AoI-based utility maximization in Sections VI-A and VI-B, respectively.
VI-A Simulation Results for AoI Minimization
We first describe our simulation settings. We consider an information-update system with $K$ servers. Throughout the simulations, the updating process at each server is assumed to be Poisson with rate $\lambda$ and is i.i.d. across the servers. The user's request for the information is generated at a time selected uniformly at random over the simulation interval, so that each server has accumulated a number of updates proportional to $\lambda$ on average.
Next, we evaluate the AoI performance through simulations for three types of response time distributions: exponential, uniform, and Gamma. First, we assume that the response time is exponentially distributed with mean $1/\mu$. We consider three representative setups, (i)-(iii), with different values of $\lambda$ and $\mu$. Fig. 4(a) shows how the average AoI changes as the number of responses $k$ varies, where each point represents an average over the simulation runs. We also include plots of our theoretical results (i.e., Eq. (7)) for comparison. A crucial observation from Fig. 4(a) is that the simulation results match our theoretical results perfectly. In addition, we observe three different behaviors of the average AoI: a) If the inter-update time is much larger than the response time, then the average AoI decreases as $k$ increases, and thus it is worth waiting for all the responses so as to achieve a smaller average AoI. b) In contrast, if the inter-update time is much smaller than the response time, then the average AoI increases with $k$, and thus it is not beneficial to wait for more than one response. c) When the inter-update time is comparable to the response time, then as $k$ increases, the AoI first decreases and then increases. In this setup, when $k$ is small, the freshness of the data at the servers dominates, and thus waiting for more responses helps reduce the average AoI; when $k$ becomes large, the total waiting time becomes dominant, and thus the average AoI increases as $k$ grows further.
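The exponential case above can be explored with a short Monte Carlo sketch, written under our reading of the Pull model (parameter and function names are ours): each server's information age at request time is exponential with rate $\lambda$, response times are i.i.d. exponential with the given mean, and the AoI on reception is the freshest age among the $k$ fastest servers plus the waiting time (the $k$-th smallest response time).

```python
import random

def average_aoi(k, K, lam, mean_rt, runs=20000, seed=1):
    """Monte Carlo estimate of the expected AoI when the user waits for
    the first k out of K responses (model reading described above)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        ages = [rng.expovariate(lam) for _ in range(K)]          # ages at request time
        resp = [rng.expovariate(1.0 / mean_rt) for _ in range(K)]  # response times
        fastest_k = sorted(range(K), key=lambda i: resp[i])[:k]
        wait = max(resp[i] for i in fastest_k)   # k-th smallest response time
        total += min(ages[i] for i in fastest_k) + wait
    return total / runs
```

Sweeping `k` for different ratios of inter-update time to response time qualitatively reproduces the three regimes a)-c) above.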
In Section IV-C, we discussed the extension of our theoretical results to the case of uniformly distributed response time. Hence, we also perform simulations for a response time uniformly distributed with mean $1/\mu$. Fig. 4(b) presents the average AoI as the number of responses $k$ changes. In this scenario, the simulation results again match our theoretical results (i.e., Eq. (15)). Also, we observe a phenomenon very similar to that in Fig. 4(a) regarding how the average AoI varies as $k$ increases in the three simulation setups.
In addition, Fig. 4(c) presents the simulation results for a response time with a Gamma distribution, which can be used to model the response time in relay networks [55]. Specifically, we consider a special class of Gamma distributions given by the sum of i.i.d. exponential random variables (also called the Erlang distribution), and we fix the number of summands in the simulations. In this case, although we do not have analytical results, the observations are similar to those under the exponential and uniform distributions.
Finally, we investigate the impact of the system parameters (the updating rate $\lambda$, the mean response time $1/\mu$, and the total number of servers $K$) on the optimal number of responses $k^*$ and on the AoI improvement ratio, which captures the gain in AoI reduction under the optimal scheme compared to a naive scheme that waits for the first response only.
Fig. 5(a) shows the impact of the updating rate $\lambda$. We observe that the optimal number of responses $k^*$ decreases as $\lambda$ increases. This is because when the updating rate is large, the AoI diversity across the servers is small; waiting for more responses is then unlikely to yield a response with much fresher information. Therefore, when the updating rate is relatively large, the optimal scheme degenerates to the naive scheme that waits for the first response only. Fig. 5(b) shows the impact of the mean response time $1/\mu$. We observe that $k^*$ increases with $\mu$. This is because when $\mu$ is large (i.e., when the mean response time is small), the cost of waiting for additional responses becomes marginal, and waiting for more responses is likely to yield a response with fresher information. Fig. 5(c) shows the impact of the total number of servers $K$. We observe that both $k^*$ and the improvement ratio increase with $K$. This is because an increased number of servers leads to larger diversity gains both in the AoI at the servers and in the response times. As discussed at the end of Section IV-B, the optimal solution scales with $K$ as the number of servers becomes large.
VI-B Simulation Results for AoI-based Utility Maximization
In this subsection, we consider the maximization of the AoI-based exponential utility function (in Eq. (19)) and present simulation results for the utility performance under the same settings as in Section VI-A. We first evaluate the utility performance in the setting with known system parameters. Then, we evaluate the various learning algorithms described in Section V-C3 for the case where the system parameters are unknown.
Utility Maximization with Known Parameters
The setups we consider are exactly the same as those in Section VI-A, except that we now focus on the utility performance instead of the AoI performance. In Fig. 7, we present the simulation results for the average utility with a varying number of responses $k$ in the three representative setups. The observations are also similar, except that the utility exhibits the opposite trend to the AoI; this is because the utility is a nonincreasing function of the AoI.
Similarly, we also investigate the impact of the system parameters on the optimal number of responses (with respect to utility maximization) and on the utility improvement ratio, which captures the gain in utility under the optimal scheme compared to a naive scheme that waits for the first response only. We present the results in Fig. 8, from which we can make observations similar to those from Fig. 6 for AoI minimization.
Utility Maximization with Unknown Parameters
Next, we consider a more realistic scenario where the system parameters (i.e., the updating rate and the mean response time) are unknown to the user. Given that the overall behaviors are similar for different types of response time distributions (see Fig. 7), in the following evaluations we focus on the case where the update process is Poisson with rate $\lambda$ and the response time is exponentially distributed with mean $1/\mu$.
As in Section VI-A, we consider the same three representative setups (i)-(iii).
We evaluate the regret performance of the two classes of learning algorithms introduced in Section V-C3: the ε-Greedy algorithms (i.e., ε-Greedy, ε-Greedy-N, and ε-Greedy-LP) and the UCB algorithms (i.e., UCB1, UCB-N, UCB-LP, and UCB-LFG).
First, algorithms that take advantage of side observations generally outperform their counterparts that do not. That is, ε-Greedy-N and UCB-N outperform ε-Greedy and UCB1, respectively, because the additional samples from side observations help accelerate the learning process.
Second, although graph-aware algorithms achieve improved regret upper bounds, their empirical performance may or may not be better than that of their graph-agnostic counterparts; that is, ε-Greedy-LP and UCB-LP may or may not beat ε-Greedy-N and UCB-N, respectively. Consider the ε-Greedy algorithms, for example. In Setup (i), ε-Greedy-LP slightly outperforms ε-Greedy-N, because during exploration ε-Greedy-LP always chooses arm $K$, which happens to be the best arm. However, in Setup (ii), ε-Greedy-LP performs worse than ε-Greedy-N, because arm $K$ is no longer the best arm and, in fact, can be much worse than the optimal arm. This phenomenon is exacerbated in Setup (iii), where arm $K$ is the worst arm. Among all the considered UCB algorithms, UCB-N has the best empirical performance. This is because UCB-LP and UCB-LFG are modified from UCB-Improved, which is an "arm-elimination" algorithm and is very different from UCB1, from which UCB-N is modified. Although UCB-Improved has a better regret upper bound with a smaller constant factor, it has a much worse empirical performance than UCB1 in the setups we consider. Therefore, it is not surprising that UCB-N empirically outperforms UCB-LP and UCB-LFG.
Third, UCB-LFG typically outperforms UCB-LP. This is expected, because UCB-LFG is a further enhanced version of UCB-LP: it explicitly exploits the linear structure of the feedback graph and accelerates the learning process by reducing the number of rounds needed for exploration.
Finally, ε-Greedy-N appears to be quite robust and achieves very good empirical performance in all the setups we consider.
VII Conclusion
In this paper, we introduced a new Pull model for studying the problems of AoI minimization and AoI-based utility maximization under replication schemes. Assuming a Poisson updating process and exponentially distributed response times, we derived a closed-form expression for the expected AoI at the user's side and provided a formula for computing the optimal solution. We also derived a set of similar theoretical results for the utility maximization problem. Furthermore, we considered a more realistic scenario where the user has no knowledge of the system parameters. In this setting, we reformulated the utility maximization problem as a stochastic MAB problem with side observations. Leveraging the special linear structure of the feedback graph associated with the side observations, we introduced several learning algorithms that outperform basic algorithms agnostic to this structure. Not only did our work reveal a novel trade-off between different levels of information freshness and different response times across the servers, but we also demonstrated the power of waiting for more than one response in minimizing the AoI as well as in maximizing the utility at the user's side.
A Proof of Theorem 3
Proof.
Note that using Eq. (17), the expected utility can be computed from the probability density function of the AoI. For the exponential utility function in Eq. (19), however, there is the following more intuitive way of computing the expected utility.
To begin with, we rewrite the expected utility as follows:
(26) 
where the first equality is from Eq. (19), the second equality is from Eq. (5), and the last equality holds because the two random variables involved are independent.
Then, we want to derive the expression for each of the two terms in the last line of Eq. (26).
First, we derive the expression for the first term. Note that for an exponential random variable, it is easy to show the following:
(27) 
Recall from the proof of Theorem 1 that each of the random variables involved in the first term is exponentially distributed and that they are all independent [43]. Then, we can derive the following:
(28) 
where the last equality is from Eq. (27).
Next, we derive the expression for the second term, which involves a single exponential random variable; the result is then straightforward from Eq. (27).
Combining the above results, we complete the proof. ∎
B Proof of Theorem 4
Proof.
We first define the ratio of the expected utility under the $(k+1)$-response scheme to that under the $k$-response scheme. From Eq. (20), we have the following:
(29) 
for any feasible $k$. It is easy to see that this ratio is a monotonically decreasing function of $k$.
We now extend the domain of the ratio to the set of positive real numbers and find the point at which the ratio equals one. With some standard calculations and dropping the negative solution, we derive the following:
(30) 
Next, we discuss two cases, depending on the location of this crossing point relative to the feasible range of $k$.
In Case (i), the crossing point lies beyond the feasible range. This implies that the ratio exceeds one for all feasible $k$, as the ratio is monotonically decreasing. Hence, the expected utility is monotonically increasing in $k$, and waiting for all $K$ responses must be the optimal solution.
In Case (ii), the crossing point falls within the feasible range. We consider two subcases: the crossing point is an integer, and the crossing point is not an integer.
If the crossing point is an integer in the feasible range, we have