Equilibrium and Learning in Queues with Advance Reservations

Eran Simhon and David Starobinski
Boston University, College of Engineering
Abstract

Consider a multi-class preemptive-resume queueing system that supports advance reservations (AR). In this system, strategic customers must decide whether to reserve a server in advance (thereby gaining higher priority) or to avoid AR. Reserving a server in advance bears a cost. In this paper, we conduct a game-theoretic analysis of this system, characterizing the equilibrium strategies. Specifically, we show that the game has two types of equilibria. In one type, none of the customers makes a reservation. In the other type, only customers that realize early enough that they will need service make reservations. We show that the types and number of equilibria depend on the parameters of the queue and on the reservation cost. Specifically, we prove that the equilibrium is unique if the server utilization is below 1/2. Otherwise, there may be multiple equilibria, depending on the reservation cost. Next, we assume that the reservation cost is a fee set by the provider. In that case, we show that the revenue maximizing fee leads to a unique equilibrium if the utilization is below 2/3, but to multiple equilibria if the utilization exceeds 2/3. Finally, we study a dynamic version of the game, where users learn and adapt their strategies based on observations of past actions or strategies of other users. Depending on the type of learning (i.e., action learning vs. strategy learning), we show that the game converges to an equilibrium in some cases, while it cycles in other cases.

1 Introduction

Many services, such as health care, cloud computing and banking, combine a first-come-first-served policy with advance reservations (AR). Advance reservations benefit a service provider, since knowledge about future demand can improve resource management and quality-of-service (e.g., Charbonneau and Vokkarane (2012)). Customers are also motivated to reserve in advance, since it decreases their expected waiting time. However, reservations typically bear an additional cost for customers. This cost can be a reservation fee, the time or resources required for making the reservation, the cost of financing advance payment, or the cost of cancellation if needed.

Since the decision of a customer on whether to reserve a server in advance affects the waiting time of other customers, game theory is the natural framework for studying such systems. Although there exists a rich literature on advance reservations, works that study advance reservation systems as a game are rare. The strategic behavior of customers in a system that supports AR is studied in Simhon and Starobinski (2014) and Simhon et al. (2015). These two papers study a loss system, i.e., a system with no queue. In this paper, instead, we focus on a queueing system (i.e., customers that encounter a busy server wait for service). This leads to a different model and, interestingly, more explicit results. We show that the server utilization (traffic load) plays a key role in the behavior of the system and, specifically, in the number of equilibria.

We assume that the time axis is divided into two time-periods: a reservation period and a service period. This restriction simplifies the analysis and is common in the literature on advance reservations (e.g., Virtamo (1992), Yessad et al. (2007) and Syed et al. (2008)). It can also be found in real-life applications. For example, some service providers do not allow same-day reservations.

During the reservation period, each customer realizes that he/she will need service at a specific future time point. Upon such a realization, the customer decides whether or not to make a reservation. Customers are assumed to be strategic and rational. Thus, a customer will make a reservation only if it reduces his/her expected total cost, which consists of the reservation cost (if making a reservation) and the cost of waiting.

We start the analysis by finding the equilibrium structure of the game. We show that there are two possible types of equilibria. In the first type, none of the customers makes AR, while in the second type customers that realize early enough that they will need future service make AR. We refer to those two types of equilibria as none-make-AR and some-make-AR, respectively. We show that if the utilization of the queue (i.e., the ratio between the arrival rate and the service rate) is smaller than 1/2, then the game has a unique equilibrium. Low AR costs lead to a some-make-AR equilibrium, while high AR costs lead to a none-make-AR equilibrium. If the utilization is greater than 1/2, however, there also exists a middle range of AR costs such that any cost in that range leads to three equilibria, namely one none-make-AR and two some-make-AR equilibria.

Next, we assume that the AR cost is a fee charged by the service provider. We analyze the game from the perspective of a provider aiming to maximize its revenue from AR fees. We show that if the utilization is greater than 2/3, then the revenue maximizing fee leads to multiple equilibria. Thus, charging that fee may yield the highest possible revenue for the provider, but possibly also no revenue.

Finally, we study a dynamic version of the game. We use best response dynamics (as in Fudenberg (1998)) and distinguish between strategy-learning and action-learning. In strategy-learning, customers obtain information about strategies adopted at previous steps, while in action-learning, customers estimate the previous strategies by obtaining information about the actions taken at previous steps. Our analysis shows that, starting from any initial belief about customer behavior: (i) when implementing strategy-learning, the system always converges to an equilibrium; (ii) when implementing action-learning, the system converges to a none-make-AR equilibrium if it exists and cycles otherwise; (iii) if the equilibrium is unique, more customers, on average, make reservations under action-learning than under strategy-learning.

The rest of the paper is structured as follows. In Section 2, we review related work. In Section 3, we formally define the game. In Section 4, we find the equilibrium structure of the game. In Section 5, we derive the revenue maximizing fee and resulting equilibria. In Section 6, we define and analyze dynamic versions of the game. Section 7 concludes the paper and suggests directions for future research.

2 Related Work

Strategic behavior in queues (also known as queueing games) was pioneered by Naor (1969) and has been studied extensively since. In that seminal paper, the author studies an M/M/1 queue where customers decide whether to join or balk after observing the queue length. Hassin and Haviv (2003) and Hassin (2016) conduct an extensive review of the field of queueing games. Most related to our work, Balachandran (1972) analyzes strategic behavior in priority queues, while Qiu and Zhang (2016) and Hayel et al. (2016) study strategic customer behavior in other queueing settings. None of these works consider advance reservations.

Advance reservations have been researched from various other perspectives in the literature, including scheduling and routing algorithms for communication networks, methods for revenue maximization, and performance analysis of queueing systems. The work in Wang et al. (2013) describes a distributed architecture for advance reservation, while Smith et al. (2000) proposes a scheduling model that supports AR and evaluates several performance metrics. The work in Virtamo (1992) analyzes the impact of advance reservations on server utilization under a stochastic arrival model, and Guérin and Orda (2000) analyzes the effect of AR on the complexity of path selection. In Weatherford (1998), the author reviews models for revenue management of perishable assets, such as airline seats and hotel rooms, that extend to various industries. The work in Reiman and Wang (2008) considers admission control strategies in reservation systems with different classes of customers, while Bertsimas and Shioda (2003) deals with policies for accepting or rejecting restaurant reservations. The effects of overbooking, cancellations and regrets on advance reservations are studied in Liberman and Yechiali (1978), Quan (2002), Nasiry and Popescu (2012). None of these prior works considers the strategic behavior of customers in making AR, namely, that decisions of customers are not only influenced by prices and policies set by providers but also by their beliefs about decisions of other customers.

Simhon and Starobinski (2014) introduces AR games. In that paper, the authors consider a loss system (i.e., customers that find all servers busy leave). The authors show that the game may have multiple equilibria, where in one equilibrium the number of reservations is a random variable, while in the other equilibrium, none of the customers makes a reservation. In Simhon et al. (2015), the authors study a dynamic version of the game. The main difference between the model of our paper and the model presented in Simhon and Starobinski (2014) and Simhon et al. (2015) is that our paper focuses on a queueing system, while these papers focus on a loss system. Specifically, our paper shows that the server utilization plays a key role in determining the number and structure of equilibria. The characterization of the equilibrium strategies in our paper is also much more explicit than that provided in Simhon and Starobinski (2014).

The concept of learning an equilibrium is rooted in Cournot’s duopoly model Cournot (1897) and has been extensively researched since. Traditionally, learning models are used for fixed-player games (i.e., the same players participate at each iteration), see Lakshmivarahan (1981), Fudenberg (1998) and Milgrom and Roberts (1991). Several papers have focused on learning under stochastic settings. For example, in Liu and van Ryzin (2011) customers choose between buying a product at full price or waiting for a discount period. Decisions are made based on observing past capacities. Altman and Shimkin (1998) analyze a processor sharing model. In this model, customers choose between joining or balking after observing the history. Zohar et al. (2002) present a model of abandonment from unobservable queues. The decision is based on the expected waiting time which is formed through accumulated experience. Fu and van der Schaar (2009) assume that the same set of players participate in a bid for wireless resources at each stage. However, the number of packets that need to be transmitted at each iteration is a random variable.

Different learning models differ by their learning rules. A learning rule defines what kind of information players gain and how they use it. In this paper, we focus on best response dynamics. According to this rule, which is rooted in Cournot’s work, players observe the most recent actions adopted by other players and assume that the same actions will be adopted at the next step. Another popular learning rule is fictitious play which assumes that at each iteration, players observe actions made by other players at all previous steps and best-respond to the empirical frequency of observed actions. This rule was suggested by Brown (1951). In contrast, Littman (1994) and Tan (1993) assume that players only observe their own payoffs and learn by trial and error. Reinforcement learning is an example of such a learning rule.

Other relevant work includes Niu et al. (2012), which presents a theoretical model for pricing cloud bandwidth reservations in order to maximize social welfare. In that model, the reservation fee of each customer is a function of his/her guaranteed portion, rather than of the actual amount of resources reserved, as considered in our models as well as in many practical services. In Menache et al. (2014), the authors consider the problem of deciding which type of service a customer should buy from a cloud provider. More specifically, that study considers two options: on-demand, which means paying a fixed price for service, and spot, a service offered by Amazon EC2 that allows users to bid for spare instances. They propose a no-regret online learning algorithm to find the best policy. Our approach complements this work in several ways. First, our framework considers advance reservations (similar to Reserved Instances in Amazon EC2). Second, our models integrate the strategic behavior of all participants (i.e., both the customers and the provider).

3 Game Description

We consider a preemptive-resume queue that supports advance reservations. In our model, the time axis is divided into a reservation period followed by a service period. Each customer i is associated with a request time r_i and a desired service starting time (shortly noted as arrival time) a_i. That is, if r_i < r_j, then customer i has the opportunity to reserve the server before customer j. If a_i < a_j, then customer i wishes to be served earlier than customer j. The service period starts only after the reservation period ends. The request time can be interpreted as how much time in advance a customer realizes that he/she will need service at a future time point.

The request times are drawn from a general continuous distribution with cumulative distribution function F. The arrivals follow a Poisson process with rate λ. The service time is deterministic and equal to D, and we assume that the utilization satisfies ρ = λD < 1.

Each customer, at his/her request time, decides whether to make a reservation or not. We denote those two actions by AR and AR', respectively. If a customer makes a reservation but his/her desired service time is already reserved, the nearest future available time will be reserved for that customer. A customer that does not make a reservation is served on a first-come-first-served basis along periods of time over which the server is not reserved.

The total cost of each customer consists of the reservation cost (if making AR) and the cost of waiting, which is a linear function of the waiting time. Without loss of generality, we assume that the cost of waiting is equal to the waiting time. Note that the waiting time when making AR is smaller than when not making AR. However, it may be greater than zero, since it is possible that the server is already reserved at the desired service time. For simplicity, we assume that the service period is long enough that we can ignore the transient phase before the queue reaches its steady state.

In a preemptive-resume queue, if a job is interrupted, then it later resumes and is not restarted. Due to this property, if the server is idle and a customer is waiting for service, the customer will be served even if service cannot be completed due to an existing reservation (in this case the service will be preempted and later resumed). Hence, supporting advance reservations in a preemptive-resume queue does not impact the utilization of the server, which is ρ = λD. Figure 1 illustrates the model.

\includegraphics[scale=1.2]{example.ps}

Figure 1: An illustration of the model with three customers. Customer 1 makes a reservation at time r_1, and is served upon arrival at a_1. Customer 2 also makes a reservation and is served upon arrival, but his/her service is preempted by customer 1, which made a reservation earlier. Customer 3 is served only when the service of customer 2 is completed.

Note that customers do not know a-priori what their waiting time will be if making or if not making AR. The decision is based on statistical information only, namely the values of λ, D and F. However, once a customer decides to make a reservation, the system can provide him/her with the start and end times of the service.

4 Equilibrium Analysis

We can analyze this system as a priority queue where a priority between 0 (lowest priority) and 1 (highest priority) is assigned to each customer. A customer with request time r has priority 0 if not making AR and priority 1-F(r) if making AR (so that customers who realize their need earlier obtain higher priority). Customers that share the same priority are served on a first-come-first-served basis. We refer to 1-F(r) as the potential priority. Due to the probability integral transformation theorem (Dodge 2006, p. 320), we know that the potential priority is a random variable, uniformly distributed in [0, 1].

Since customers are statistically identical, we consider only symmetric behavior. Thus, a decision of a tagged customer is a mapping of his/her potential priority to the probability of making AR. We denote this strategy function by s. Consider a tagged customer with potential priority θ. We define W(s, p) to be a mapping of the strategy s followed by the rest of the customers and the priority p of the tagged customer to his/her expected waiting time. Thus, the expected waiting time of the tagged customer is W(s, θ) if making AR and W(s, 0) otherwise. Since customers are strategic, a customer with potential priority θ will make AR only if

C + W(s, θ) ≤ W(s, 0).    (1)

Next, we define a threshold strategy and show that this is the only strategy that can lead to equilibria.

Definition 1.

Let T ∈ [0, 1]. A strategy function s is said to be a threshold strategy with threshold T if it satisfies

s(θ) = 1 if θ > T, and s(θ) = 0 if θ ≤ T.

Lemma 1.

At equilibrium, all customers follow a threshold strategy.

Proof.

Consider any strategy function s. Since the expected waiting time is non-increasing with the priority, there is either a single potential priority, or an interval of potential priorities, or no potential priority θ such that

C + W(s, θ) = W(s, 0).    (2)

Note that the left hand side of Eq. (2) is the expected total cost if making AR, while the right hand side is the expected total cost if not making AR. If Eq. (2) holds for a single value θ*, then a customer with potential priority greater (respectively, smaller) than θ* is better off making (respectively, not making) AR. Therefore, s is an equilibrium strategy only if it is a threshold strategy with threshold T = θ*.

If Eq. (2) holds for an interval of values [θ1, θ2], then all customers with potential priority in that interval do not make AR (otherwise, the expected waiting time would not be constant over that interval). Therefore, s is an equilibrium strategy only if it is a threshold strategy, with threshold T = θ2.

Finally, suppose that Eq. (2) does not hold for any θ. If C + W(s, θ) > W(s, 0) for all θ, then all customers are better off not making AR. Therefore, s is an equilibrium strategy only if it is a threshold strategy, with threshold T = 1.

Note that a situation where C + W(s, θ) < W(s, 0) for all θ does not exist, since a customer with potential priority zero has the same expected waiting time if making or avoiding AR. ∎

Next, we define two types of equilibria.

Definition 2.

An equilibrium strategy with threshold T is called a some-make-AR equilibrium if T < 1.

Definition 3.

An equilibrium strategy with threshold T is called a none-make-AR equilibrium if T = 1.

Since the structure of the equilibrium depends on the reservation cost, we aim to determine the equilibrium to which a given reservation cost leads. Given that all customers follow a threshold strategy, we define a threshold customer to be a customer whose potential priority is exactly equal to the threshold followed by all other customers.

Given a strategy with threshold T, a threshold customer that makes AR observes three priority classes:

  1. A lower priority class which contains all customers with potential priority smaller than the threshold customer (none of them makes AR). The arrival rate of customers belonging to this class is λT.

  2. A priority class which contains only the threshold customer (since the potential priority is a continuous random variable, the probability that two customers will have the same potential priority is zero). Thus, the arrival rate of customers belonging to this class is zero.

  3. A higher priority class which contains all customers with greater potential priority (they all made AR before the threshold customer). The arrival rate of customers belonging to this class is λ(1-T).

A threshold customer that does not make AR observes only two classes:

  1. A priority class which contains the threshold customer and all customers with smaller potential priority. The arrival rate of customers belonging to this class is λT.

  2. A higher priority class which contains all customers with greater potential priority. The arrival rate of customers belonging to this class is λ(1-T).

Based on the priority classes defined above, we find the expected waiting time of the threshold customer if making or not making AR. We apply the known formula of the waiting time in an M/D/1 queue with preemptive-resume priorities (Conway et al. 2012, p. 175) and obtain the following (a short numerical check appears after the list):

  1. The expected waiting time of the threshold customer if making AR is

     W_AR(T) = ρ(1-T)D/(1-ρ(1-T)) + ρ(1-T)D/(2(1-ρ(1-T))²).    (3)

  2. The expected waiting time of the threshold customer if not making AR is

     W_AR'(T) = ρ(1-T)D/(1-ρ(1-T)) + ρD/(2(1-ρ(1-T))(1-ρ)).    (4)
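As a sanity check on Eqs. (3) and (4), both expressions are easy to evaluate numerically. The following Python sketch is ours (the helper names W_ar and W_no_ar are not from the model); it encodes the two formulas and verifies that their gap equals the indifference fee C(T) introduced in Eq. (7) below:

    # Sketch: threshold customer's expected waiting times, Eqs. (3)-(4).
    def W_ar(T, lam=0.9, D=1.0):
        """Eq. (3): expected waiting time when making AR."""
        rho = lam * D
        sigma = rho * (1 - T)  # load of the higher-priority (AR) classes
        return sigma * D / (1 - sigma) + sigma * D / (2 * (1 - sigma) ** 2)

    def W_no_ar(T, lam=0.9, D=1.0):
        """Eq. (4): expected waiting time when not making AR."""
        rho = lam * D
        sigma = rho * (1 - T)
        return sigma * D / (1 - sigma) + rho * D / (2 * (1 - sigma) * (1 - rho))

    # The gap between the two equals C(T) of Eq. (7):
    T, lam, D = 0.4, 0.9, 1.0
    rho = lam * D
    gap = W_no_ar(T, lam, D) - W_ar(T, lam, D)
    assert abs(gap - rho * D * T / (2 * (1 - rho) * (1 - rho * (1 - T)) ** 2)) < 1e-9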

The condition for a threshold T < 1 to be a some-make-AR equilibrium is

W_AR'(T) - W_AR(T) = C.    (5)

That is, a customer with potential priority equal to the threshold is indifferent between the two actions. The condition for the threshold T = 1 to be a none-make-AR equilibrium is

W_AR'(1) - W_AR(1) ≤ C.    (6)

That is, a customer with potential priority 1 (and hence every customer) is better off not making AR.

By isolating C in Eq. (5), we define C(T) to be the function that maps a threshold to the reservation cost that leads to that threshold:

C(T) = ρDT/(2(1-ρ)(1-ρ(1-T))²).    (7)

We conclude that, given reservation cost C, a threshold T < 1 represents a some-make-AR equilibrium if and only if C(T) = C. The threshold T = 1 represents a none-make-AR equilibrium if and only if C ≥ C(1). In order to find the equilibrium structure, we next find the properties of C(T).

Lemma 2.

If ρ ≤ 1/2, then C(T) is a monotonically increasing function. If ρ > 1/2, then C(T) is a unimodal function with a global maximum at T = (1-ρ)/ρ.

Proof.

First, we compute the derivative of C(T):

dC(T)/dT = ρD(1-ρ(1+T))/(2(1-ρ)(1-ρ(1-T))³).    (8)

Since the denominator is positive for any T ∈ [0, 1], the sign of the derivative is determined by the sign of 1-ρ(1+T). If ρ ≤ 1/2, then this expression is non-negative for any T ∈ [0, 1] and the derivative of C(T) is non-negative for any T. If ρ > 1/2, then the derivative of C(T) is positive for any T < (1-ρ)/ρ; is equal to zero at T = (1-ρ)/ρ; and is negative otherwise. Thus, for any value of ρ > 1/2, C(T) is unimodal with a global maximum. ∎
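Lemma 2 is also easy to confirm numerically: for a utilization below 1/2, the maximizer of C(T) sits at T = 1, and above 1/2 it sits at T = (1-ρ)/ρ. A small sketch under our own naming:

    import numpy as np

    def c_fee(T, rho, D=1.0):
        """Eq. (7): fee that makes the threshold-T customer indifferent."""
        return rho * D * T / (2 * (1 - rho) * (1 - rho * (1 - T)) ** 2)

    for rho in (0.4, 0.8):  # one utilization below 1/2, one above
        Ts = np.linspace(0.0, 1.0, 100001)
        print(rho, Ts[np.argmax(c_fee(Ts, rho))])
    # prints ~1.0 for rho = 0.4 (monotone case) and ~0.25 = (1-0.8)/0.8 for rho = 0.8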

Next, we define:

C_1 = C(1) = ρD/(2(1-ρ))    (9)

and

C_2 = C((1-ρ)/ρ) = D/(8(1-ρ)²).    (10)

Note that if ρ ≤ 1/2, then C_1 is the maximum value of C(T), and if ρ > 1/2, then C_2 is the maximum value of C(T). We can now state the main result of this section:

Theorem 1.

The game has the following equilibrium structure.
When ρ ≤ 1/2:

  • If C < C_1, then there is a unique some-make-AR equilibrium.

  • If C ≥ C_1, then there is a unique none-make-AR equilibrium.

When ρ > 1/2:

  • If C < C_1, then there is a unique some-make-AR equilibrium.

  • If C_1 ≤ C ≤ C_2, then there are two some-make-AR equilibria (which coincide when C = C_2) and a none-make-AR equilibrium.

  • If C > C_2, then there is a unique none-make-AR equilibrium.

Proof.

We begin with ρ ≤ 1/2. If C < C_1, then there is a single value of T ∈ [0, 1) for which the equation C(T) = C has a solution. Hence, there is one some-make-AR equilibrium. A none-make-AR equilibrium does not exist since C < C(1). If C ≥ C_1, then there is no value of T ∈ [0, 1) for which C(T) = C has a solution. Hence, a some-make-AR equilibrium does not exist. On the other hand, a none-make-AR equilibrium exists since C ≥ C(1).

Next, consider ρ > 1/2. In the range [0, (1-ρ)/ρ], the function C(T) is monotonically increasing from 0 to C_2. Thus, if C < C_1, then there is a single value of T for which C(T) = C has a solution, and hence there is one some-make-AR equilibrium (the decreasing branch only takes values in [C_1, C_2], and a none-make-AR equilibrium does not exist since C < C(1)). If C_1 ≤ C ≤ C_2, then there exist two values of T that solve C(T) = C, one on each side of (1-ρ)/ρ, and hence there are two some-make-AR equilibria. The condition for the existence of a none-make-AR equilibrium is the same as in the case of ρ ≤ 1/2. ∎
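To illustrate Theorem 1, the set of equilibria for a given reservation cost can be enumerated numerically: locate the solutions of C(T) = C on a grid and append the none-make-AR equilibrium whenever C ≥ C_1. A rough sketch (the helper equilibria is our own):

    import numpy as np

    def equilibria(C, rho, D=1.0, grid=100000):
        """Enumerate equilibrium thresholds of the static game (sketch)."""
        Ts = np.linspace(1e-9, 1.0, grid)
        vals = rho * D * Ts / (2 * (1 - rho) * (1 - rho * (1 - Ts)) ** 2)  # Eq. (7)
        roots = [float(Ts[i]) for i in range(1, grid)
                 if (vals[i - 1] - C) * (vals[i] - C) <= 0]  # some-make-AR thresholds
        if C >= rho * D / (2 * (1 - rho)):  # C >= C_1, Eq. (9)
            roots.append(1.0)               # none-make-AR equilibrium
        return roots

    # rho = 0.8 > 1/2 and C_1 = 2.0 <= C = 2.5 <= C_2 = 3.125: three equilibria.
    print(equilibria(2.5, rho=0.8))  # ~[0.095, 0.655, 1.0]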

5 Revenue Maximization

In this section, we assume that the reservation cost is a fee determined by the service provider. We show that the fee that maximizes the revenue leads to a unique equilibrium if the utilization is smaller than 2/3 and to multiple equilibria if the utilization is greater than 2/3. We also show that the revenue from AR fees depends only on the utilization of the queue and the fee itself. Thus, if the demand and the number of servers increase proportionally, then the revenue from AR fees does not change.

The revenue per time unit, at an equilibrium with threshold T, is the rate of customers making AR multiplied by the AR fee that leads to that equilibrium. The expected revenue is

R(T) = λ(1-T)C(T).    (11)

With some manipulation, we get that the revenue function does not depend on the individual values of λ and D but only on the utilization ρ:

R(T) = ρ²T(1-T)/(2(1-ρ)(1-ρ(1-T))²).    (12)

At first glance, this result seems surprising since it implies that the revenue does not increase when scaling the system (i.e., increasing both arrival and service rates). However, in an M/D/1 queue, the waiting time decreases as the system gets larger, and hence customers are less motivated to make AR. Therefore, scaling the system involves a trade-off: for a given threshold, as we scale the system, more customers will make AR but they will pay a smaller fee.

By solving the equation dR(T)/dT = 0, we find that the optimal threshold is T_opt = (1-ρ)/(2-ρ). By substituting T_opt into Eq. (7), we get that the optimal fee is

C_opt = ρD(2-ρ)/(8(1-ρ)²).    (13)

Similarly, by substituting T_opt into Eq. (12), we get that the maximum possible revenue is

R_max = ρ²/(8(1-ρ)²).    (14)

Next, we find the number of equilibria when the provider charges C_opt.

Theorem 2.

The revenue maximizing fee leads to a unique some-make-AR equilibrium if ρ < 2/3 and to multiple equilibria, including a none-make-AR equilibrium, otherwise.

Proof.

The optimal reservation cost leads to multiple equilibria only if ρ > 1/2 and C_opt ≥ C_1 (see Theorem 1). Using Eq. (9) and Eq. (13), we deduce that C_opt ≥ C_1 if and only if

ρD(2-ρ)/(8(1-ρ)²) ≥ ρD/(2(1-ρ)).    (15)

One can show that the inequality above holds only if ρ ≥ 2/3. ∎
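The boundary ρ = 2/3 in Theorem 2 can be checked directly by comparing the optimal fee of Eq. (13) against C_1 of Eq. (9); a quick sketch with our helper names:

    def c_opt(rho, D=1.0):
        """Eq. (13): revenue-maximizing fee."""
        return rho * D * (2 - rho) / (8 * (1 - rho) ** 2)

    def c_1(rho, D=1.0):
        """Eq. (9): smallest fee for which none-make-AR is an equilibrium."""
        return rho * D / (2 * (1 - rho))

    for rho in (0.60, 2 / 3, 0.70):
        print(rho, c_opt(rho) >= c_1(rho))  # multiple equilibria iff rho >= 2/3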

Figure 2 illustrates the game outcome for utilization values below and above 2/3.

\includegraphics[scale=0.4]{rev-ex1.ps} (a)
\includegraphics[scale=0.4]{rev-ex2.ps} (b)

Figure 2: When the utilization is smaller than 2/3, the optimal fee C_opt leads to a unique equilibrium (a). When the utilization is greater than 2/3, C_opt leads to multiple equilibria (b).

5.1 Price of Conservatism

Assuming that ρ > 2/3, the provider can either be risk-averse and charge a fee that leads to a unique equilibrium with guaranteed revenue, or it can be risk-taking and charge a higher fee that may lead to greater revenue but also to zero revenue. To compare the two options, we use the Price of Conservatism (PoC) metric, which was introduced in Simhon and Starobinski (2017). PoC is the ratio between the maximum possible revenue R_max and the maximum guaranteed revenue R_g:

PoC = R_max/R_g.    (16)

Since R(T) has exactly one extreme point (the maximum at T_opt), it is increasing in the range [0, T_opt]. Therefore, the maximum guaranteed revenue is achieved by choosing the largest threshold for which the equilibrium is still unique; in other words, the threshold should be slightly smaller than the smaller solution T̂ of C(T) = C_1 (note that T̂ ≤ T_opt when ρ ≥ 2/3). By solving C(T) = C_1, we get two solutions: T = 1 and

T̂ = ((1-ρ)/ρ)².    (17)

By substituting T̂ into Eq. (12), we get

R_g = (2ρ-1)/(2(1-ρ)),    (18)

and by dividing R_max by R_g, we get

PoC = ρ²/(4(1-ρ)(2ρ-1)).    (19)
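Putting Eqs. (14), (18) and (19) together, a few lines suffice to tabulate the PoC (a sketch; Eq. (18) applies for ρ > 2/3):

    def poc(rho):
        """Eq. (19): ratio of maximum possible to maximum guaranteed revenue."""
        r_max = rho ** 2 / (8 * (1 - rho) ** 2)          # Eq. (14)
        r_guaranteed = (2 * rho - 1) / (2 * (1 - rho))   # Eq. (18)
        return r_max / r_guaranteed

    for rho in (2 / 3, 0.75, 0.9, 0.99):
        print(rho, poc(rho))  # 1.0 at rho = 2/3; grows without bound as rho -> 1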

We conclude with the following theorem.

Theorem 3.

If ρ ≤ 2/3, then PoC = 1. Else, PoC = ρ²/(4(1-ρ)(2ρ-1)).

Figure 3 shows the maximum possible revenue and the maximum guaranteed revenue as functions of the utilization.

\includegraphics[scale=1.2]{MD1_POC.ps}

Figure 3: The maximum possible revenue and the maximum guaranteed revenue as functions of the utilization.

By computing the derivative of PoC with respect to ρ, we get that for any ρ > 2/3

∂PoC/∂ρ = ρ(3ρ-2)/(4(1-ρ)²(2ρ-1)²) > 0.    (20)

Thus, we obtain the following corollary:

Corollary 1.

The price of conservatism increases with the utilization.

That is, as the utilization increases, the ratio between the potential revenue when the provider is risk-taking and the revenue when the provider is risk-averse increases, and it tends to infinity as ρ → 1.

6 Dynamic Games

6.1 Learning Models

In this section, we study dynamic versions of the game. In dynamic games (also known as learning models, since players learn the behavior of other players over time), it is assumed that the game repeats many times and that, initially, players do not necessarily follow an equilibrium strategy. The goal is to find the long-term behavior of the customers. In our analysis, we use a best response dynamics model, which is rooted in Cournot's study of duopoly (Cournot 1897). We next describe the learning models.

At each step (game), a new set of customers participates (or the same set of participants, but with new realizations of request times). At the first step, all customers have an initial belief about the strategy that is followed by all customers. Next, we assume:

Assumption 1.

Customers that are indifferent between actions AR and AR' choose action AR'.

Based on this assumption, and using the proof of Lemma 1, one can show that the best response of all customers to any initial belief is a threshold strategy. In order to simplify the analysis and since a threshold strategy is followed at all steps, we also assume:

Assumption 2.

The initial belief is a threshold strategy.

We denote by T_k the threshold of the strategy followed at step k. We denote by T̃_k the estimate of this strategy, and we distinguish between two types of learning:

  1. Strategy learning. In this type of learning, the analysis assumes that at each step k, T̃_k = T_k. That is, customers observe past strategies.

  2. Action learning. In this type of learning, customers observe previous actions and use the proportion of customers that chose AR' at the previous step as an estimate of the strategy that was followed at that step. Namely, if the demand and the number of reservations at step k are n_k and m_k respectively, then

     T̃_k = (n_k - m_k)/n_k.    (21)

Since the best response of all customers to any belief is a threshold strategy, we can define a joint best response function BR(·). The input is a belief about the threshold strategy that will be followed by all customers. The output is the best response threshold to that belief. Thus, we can describe the best response dynamics of the game as the following process:

T_{k+1} = BR(T̃_k), k = 0, 1, 2, ...,    (22)
T̃_0 = T_0,    (23)

where T_0 represents the initial belief. Note that under strategy-learning this process is deterministic, while under action-learning this process is a Markov process (Gardiner et al. 1985, Chapter 3). In the following sections we analyze this dynamic process.
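The best response function BR(·) has no closed form in general, but it is straightforward to evaluate numerically: given a belief T, the responding threshold is the potential priority p at which the believed gain from AR, W_AR'(T) - W_AR(max(p, T)), equals the cost C, clipped to [0, 1]. A bisection sketch under our naming, reusing Eqs. (3) and (4), with ties broken toward AR' per Assumption 1:

    def best_response(T, C, rho, D=1.0, tol=1e-10):
        """BR(T): threshold best response to a believed threshold T (sketch)."""
        def W_ar(t):    # Eq. (3)
            s = rho * (1 - t)
            return s * D / (1 - s) + s * D / (2 * (1 - s) ** 2)
        def W_no_ar(t): # Eq. (4)
            s = rho * (1 - t)
            return s * D / (1 - s) + rho * D / (2 * (1 - s) * (1 - rho))
        def gain(p):    # believed benefit of AR for potential priority p
            return W_no_ar(T) - W_ar(max(p, T))
        if gain(T) > C:        # even the believed threshold customer strictly gains:
            return 0.0         # everyone reserves (cf. part 2 of Lemmas 3 and 4)
        if gain(1.0) <= C:     # nobody strictly gains: no one reserves
            return 1.0
        lo, hi = T, 1.0        # gain is increasing in p on [T, 1]; bisect the root
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if gain(mid) <= C:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2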

Next, we focus on the behavior of customers at a given step. Thus, we remove the subscript k. We begin the analysis with the following observations:

  1. Given a belief T (i.e., assuming that all other customers follow the threshold T), if a tagged customer with potential priority p ≥ T chooses AR, then all customers with greater potential priority have higher priority and all customers with smaller potential priority have lower priority. Therefore, the (believed) expected waiting time of the tagged customer is equal to the expected waiting time of a threshold customer that chooses AR in a system where all customers follow the threshold p. Hence,

     W̃_AR(p) = W_AR(p),    (24)

     where W_AR is defined in Eq. (3). (If p < T, the believed waiting time when choosing AR is W_AR(T), since all AR customers still have higher priority than the tagged customer.)

  2. Given a belief T, if a tagged customer with potential priority p chooses AR', then his/her (believed) expected waiting time is the same as the expected waiting time of the threshold customer (recall that each customer believes that he/she is the only one deviating). Hence,

     W̃_AR'(p) = W̃_AR'(T).    (25)

  3. The expected waiting times of all customers that choose AR' are equal. Hence,

     W̃_AR'(p) = W_AR'(T),    (26)

     where W_AR' is defined in Eq. (4).

These properties will be used later to prove our main results. Next, we separately explore the case of a unique some-make-AR equilibrium and the case of multiple equilibria.

6.2 Learning with Unique Some-make-AR Equilibrium

Consider a some-make-AR equilibrium with equilibrium threshold T_eq. By computing the derivatives of W_AR and W_AR' (given in Eqs. (3) and (4)), one can verify that both functions are decreasing with T. This property will be used in the proof of the following lemma.

Lemma 3.

Under a unique some-make-AR equilibrium with threshold T_eq:

  1. If a belief T < T_eq, then T < BR(T) < T_eq.

  2. If a belief T > T_eq, then BR(T) = 0.

Proof.

From Eq. (3) and Eq. (4), we deduce that W_AR'(T) - W_AR(T) = C(T) for every T. Since, under a unique some-make-AR equilibrium, the curves W_AR' and W_AR + C intersect once, we conclude that

W_AR'(T) - W_AR(T) < C, for all T < T_eq,    (27)

and

W_AR'(T) - W_AR(T) > C, for all T > T_eq.    (28)

For proving part 1, assume that T < T_eq. Since W_AR' is a decreasing function, we deduce that

W_AR'(T) - W_AR(T_eq) > W_AR'(T_eq) - W_AR(T_eq) = C.    (29)

From Eq. (27) and Eq. (29) we deduce that there exists a T' ∈ (T, T_eq) such that

W_AR'(T) - W_AR(T') = C.    (30)

Given the equation above and using Eq. (24) and Eq. (25), one can see that all customers with potential priority p > T' choose AR, while all customers with potential priority p ≤ T' choose AR'. This completes the proof of the first part of the lemma.

Now, let us assume that T > T_eq. Based on Eq. (24) and Eq. (25) and since W_AR is a decreasing function, we deduce that

W_AR'(T) - W̃_AR(p) ≥ W_AR'(T) - W_AR(T), for all p.    (31)

From Eq. (28) and Eq. (31), we deduce that

W_AR'(T) - W̃_AR(p) > C, for all p.    (32)

That is, all customers are better off choosing AR and the best response to T is BR(T) = 0. ∎

Next, we study the long-term outcome of the dynamic game and establish the following result.

Theorem 4.

A game with a unique some-make-AR equilibrium converges to the equilibrium under strategy-learning and cycles under action-learning.

Proof.

We begin with the first part of the theorem. Let us assume that the initial belief T_0 < T_eq. From Lemma 3, we deduce that T_0 < T_1 < T_eq. Hence, T_1 < T_2 < T_eq. By induction, we deduce that, for any k,

T_k < T_{k+1},    (33)
T_{k+1} < T_eq.    (34)

The sequence {T_k} is monotonically increasing and bounded by T_eq. Thus, it has a limit, denoted by T_∞. Since T_{k+1} = BR(T_k), and since BR(·) is continuous, we conclude that the limit is a fixed point of BR(·), and hence it must be the equilibrium point T_eq.

Next, we assume that T_0 > T_eq. In this case, based on Lemma 3, T_1 = 0 and the game converges to equilibrium as in the case of T_0 < T_eq.

From Eq. (21), we deduce that, under action-learning, at any step k, if T_k > 0, then with positive probability T̃_k > T_eq (i.e., if the threshold followed at step k is greater than zero, then there is a positive probability that the fraction of customers not making AR will be greater than T_eq). Once T̃_k > T_eq, then T_{k+1} = 0. Thus, the customers' strategy cycles between 0 and values in (0, T_eq]. ∎

Next, we determine, for a given reservation cost, whether a service provider who wishes to maximize the number of reservations is better off under strategy-learning or under action-learning.

Theorem 5.

In a dynamic game with a unique some-make-AR equilibrium, the average number of customers making AR under action-learning is greater than under strategy-learning.

Proof.

Denote the threshold of the unique equilibrium by T_eq. Consider an arbitrary step k and assume that action-learning is applied. If at step k the estimated strategy T̃_k > T_eq, then all customers will choose AR at the next step, i.e., T_{k+1} = 0. If T̃_k ≤ T_eq, then T_{k+1} ≤ T_eq. Thus, in any realization, the strategy followed by all customers in all steps is a random variable that takes values between 0 and T_eq. In strategy-learning, the strategy followed by all customers converges to T_eq. Thus, the average fraction of customers not making AR converges to a value between 0 and T_eq under action-learning and to T_eq under strategy-learning. ∎

Next, we present a simulated example that compares the revenue under action-learning and under strategy-learning. The pseudo-code of the simulation is given in Algorithm 1. The inputs of the procedure are the arrival rate λ, the initial belief T_0, the reservation cost C and the number of steps K.

  
  T̃ ← T_0 {initialize the belief}
  for k = 1 to K {iterating over all steps} do
      m ← 0 {variable counting the number of reservations}
      n ← Poisson(λ) {generate the number of customers}
      for i = 1 to n {iterating over all customers} do
          θ ← U(0,1) {generate the potential priority}
          if θ > T̃ {check if the potential priority is greater than the current belief} then
              if W_AR'(T̃) - W_AR(θ) > C {check if the customer is better off making AR} then
                  m ← m + 1 {increase the number of reservations by one}
              end if
          else
              if W_AR'(T̃) - W_AR(T̃) > C {check if the customer is better off making AR} then
                  m ← m + 1 {increase the number of reservations by one}
              end if
          end if
      end for
      if strategy-learning then
          T̃ ← BR(T̃) {compute the current strategy}
      end if
      if action-learning then
          T̃ ← (n - m)/n {estimate the current strategy}
      end if
  end for
Algorithm 1 Learning Simulation (λ, T_0, C, K)
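For concreteness, the following Python transcription of Algorithm 1 is a sketch under our naming; it reuses the best_response helper sketched above:

    import numpy as np

    def simulate(lam, T0, C, K, D=1.0, action_learning=True, seed=0):
        """Algorithm 1: K steps of best-response learning (sketch)."""
        rng = np.random.default_rng(seed)
        rho = lam * D
        def W_ar(t):    # Eq. (3)
            s = rho * (1 - t)
            return s * D / (1 - s) + s * D / (2 * (1 - s) ** 2)
        def W_no_ar(t): # Eq. (4)
            s = rho * (1 - t)
            return s * D / (1 - s) + rho * D / (2 * (1 - s) * (1 - rho))
        belief, history = T0, []
        for _ in range(K):
            m = 0
            n = rng.poisson(lam)       # demand at this step
            for _ in range(n):
                theta = rng.uniform()  # potential priority
                if W_no_ar(belief) - W_ar(max(theta, belief)) > C:
                    m += 1             # the customer is better off making AR
            history.append(m)
            if action_learning:
                if n > 0:
                    belief = (n - m) / n  # Eq. (21): estimate strategy from actions
            else:
                belief = best_response(belief, C, rho, D)  # observe the strategy itself
        return history

    # Example run (hypothetical parameters), comparing the two learning modes:
    # simulate(lam=0.9, T0=0.5, C=1.0, K=1000, action_learning=False)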
Example 1.

Consider a queue with given parameters λ and D, and let the reservation cost be C. The unique equilibrium threshold (computed using Eq. (5) and Eq. (6)) is some T_eq (i.e., in a static game, on average, a fraction 1-T_eq of the customers makes AR). We run a simulation of K steps. Each step lasts for one time unit (i.e., the average demand at each step is λ). We set the initial belief to T_0. In our runs, the average number of reservations per time unit is larger under action-learning than under strategy-learning. We conclude that, as Theorem 5 states, when customers base their decisions on historic actions and not strategies, more customers make AR. Statistical analysis (one-tailed t-test) shows that the difference between the mean number of reservations under action-learning and under strategy-learning is statistically significant. In Figure 4, we plot the number of reservations under action-learning and under strategy-learning. We use the same realization of customer arrivals in each case, and we can see that at each iteration the number of reservations is greater (or equal) under action-learning.

\includegraphics[scale=1.2]{rplot.ps}

Figure 4: Simulation results. In a game with unique equilibrium, more customers make AR under action-learning than under strategy-learning.

We conclude that if the provider's interest is to have as many customers as possible make reservations, then it is better off if customers gain information about previous actions rather than previous strategies.

6.3 Learning with Multiple Equilibria

Lemma 4.

In a game with multiple equilibria with thresholds T_L, T_H and 1 (where T_L < T_H):

  1. If a belief T < T_L, then T < BR(T) < T_L.

  2. If a belief T ∈ (T_L, T_H), then BR(T) = 0.

  3. If a belief T ∈ (T_H, 1), then BR(T) > T.

Proof.

From Eq. (3) and Eq. (4), we deduce that W_AR'(T) - W_AR(T) = C(T) for every T. Since, in a game with multiple equilibria, the curves W_AR' and W_AR + C intersect twice, at T_L and T_H, we conclude that

W_AR'(T) - W_AR(T) < C, for all T ∈ [0, T_L) ∪ (T_H, 1],    (35)

and

W_AR'(T) - W_AR(T) > C, for all T ∈ (T_L, T_H).    (36)

For proving part 1, assume that T < T_L. Since W_AR' is a decreasing function, we deduce that

W_AR'(T) - W_AR(T_L) > W_AR'(T_L) - W_AR(T_L) = C.    (37)

From Eq. (35) and Eq. (37) we deduce that there exists a T' ∈ (T, T_L) such that

W_AR'(T) - W_AR(T') = C.    (38)

Given the equation above and using Eq. (24) and Eq. (25), one can see that all customers with potential priority p > T' choose AR, while all customers with potential priority p ≤ T' choose AR'. This completes the proof of the first part of the lemma.

Now, assume that T ∈ (T_L, T_H). Based on Eq. (24) and Eq. (25) and since W_AR is a decreasing function, we deduce that

W_AR'(T) - W̃_AR(p) ≥ W_AR'(T) - W_AR(T), for all p.    (39)

From Eq. (36) and Eq. (39), we deduce that

W_AR'(T) - W̃_AR(p) > C, for all p.    (40)

That is, all customers are better off choosing AR and the best response to T is BR(T) = 0. The third part of the lemma can be proved using the same arguments as in the proof of the first part of the lemma. ∎

Next, we study the long-term outcome of a dynamic game with multiple equilibria.

Theorem 6.

A game with multiple equilibria converges to a some-make-AR or a none-make-AR equilibrium (depending on the initial belief) under strategy-learning, and to the none-make-AR equilibrium under action-learning.

Proof.

Using the same arguments as in the proof of Theorem 4 and based on Lemma 4, one can show the following. Under strategy-learning, a game with initial belief T_0 < T_H converges to T_L, while a game with initial belief T_0 > T_H converges to 1.

Under action-learning, if at some step none of the customers makes AR, and none-make-AR is an equilibrium, then at all future steps all customers will keep not making AR. Given any threshold strategy T_k > 0 followed by all customers, there is a positive probability that the potential priorities of all customers will be smaller than T_k, and hence that none of the customers will make AR. If the game repeats infinitely many times, then, with probability one, at some point none of the customers will make AR and the game will converge to the none-make-AR equilibrium. ∎

Example 2.

Consider a queue with given parameters λ and D, and let the reservation cost be C. Using Eq. (5) and Eq. (6), we compute the set of equilibria {T_L, T_H, 1}. We set three different initial beliefs, one in each of the ranges identified in Lemma 4. We apply strategy-learning. As Figure 5 shows, within a few steps the system converges to an equilibrium.

\includegraphics[scale=1.2]{ex2Plot.ps}

Figure 5: Convergence to equilibrium under strategy-learning

6.4 Profit Maximization in Dynamic Games

In this section, we assume that the reservation cost is a fee collected by the provider. Our goal is to find the fee that maximizes the provider's revenue in the dynamic game setting. Under action-learning, any fee that leads to multiple equilibria will eventually lead to zero revenue. Hence, we focus, in this section, on strategy-learning.

Under strategy-learning with multiple equilibria, the initial belief determines to which equilibrium the game will converge. To carry out the analysis, we assume that the initial belief is a continuous random variable that takes values between zero and one.

Consider a game with multiple equilibria {T_L, T_H, 1}. From Lemma 3, Lemma 4 and Theorem 6, we deduce that if the initial belief is in [0, T_H), then the game converges to T_L; otherwise, it converges to 1 (i.e., zero reservations). Thus, with probability q = P(T_0 < T_H) the strategy converges to T_L, and with probability 1-q it converges to 1. Thus, the expected revenue of the dynamic game at steady state is

R_d = qλ(1-T_L)C(T_L),    (41)

where C(·) is defined in Eq. (7). Since the expected revenue depends on both T_L and T_H, we next find the relation between those two thresholds. By manipulating the equation C(T_L) = C(T_H) (see Eq. (7) for the definition of C(·)), we get the following relation:

T_H = (1-ρ)²/(ρ²T_L).    (42)

Given the distribution of T_0 and using Eq. (41), Eq. (42) and Eq. (12), one can find the value of T_L that maximizes the revenue and, in turn, the optimal fee. For instance, let us assume that T_0 is uniformly distributed in [0, 1]. In this case, q = T_H and the revenue as a function of T_L is

R_d(T_L) = (1-ρ)(1-T_L)/(2(1-ρ(1-T_L))²).    (43)

By computing the derivative of R_d with respect to T_L, one can show that it decreases with T_L. Thus, when considering multiple equilibria, the optimal value of T_L is the smallest feasible one. From Eq. (17) we know that this value is ((1-ρ)/ρ)² and that it is obtained when the fee equals C_1.
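Under the uniform-belief assumption, Eq. (43) is simple enough to scan numerically (a sketch; dynamic_revenue is our own name):

    import numpy as np

    def dynamic_revenue(T_low, rho):
        """Eq. (43): steady-state expected revenue under strategy-learning."""
        return (1 - rho) * (1 - T_low) / (2 * (1 - rho * (1 - T_low)) ** 2)

    rho = 0.8
    T_hat = ((1 - rho) / rho) ** 2                     # Eq. (17)
    grid = np.linspace(T_hat, (1 - rho) / rho, 1000)   # feasible lower thresholds
    print(grid[np.argmax(dynamic_revenue(grid, rho))]) # maximized at T_hat = 0.0625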

Combining this result with Theorem 2 leads to the following theorem:

Theorem 7.

Under strategy-learning, if the initial belief is uniformly distributed between zero and one, then the optimal fee is C_opt when ρ ≤ 2/3 and C_1 when ρ > 2/3.

7 Conclusion and Future Work

In this paper, we analyzed an M/D/1 queue that supports advance reservations. We associated the act of making a reservation with a fixed reservation cost and studied the impact of this cost on the behavior of customers. First, we showed that if the utilization of the queue is greater than 1/2, then there is a range of reservation costs that lead to multiple equilibria, including one where no customer makes a reservation. Furthermore, if the utilization is greater than 2/3 and the reservation cost is a fee charged by the service provider, then the fee value that maximizes the revenue from AR belongs to the aforementioned range. In order to evaluate whether the provider should charge a lower fee with guaranteed revenue or a higher but riskier fee (yielding several equilibria), we used the price of conservatism (PoC) metric and found the ratio between the two revenues. Specifically, when the utilization exceeds 2/3, we showed that the PoC increases with the utilization and tends to infinity as the utilization approaches 1.

In the second part of the paper, we studied a dynamic version of the game. We showed that if the customers observe previous strategies, then the game converges to an equilibrium. If the customers observe previous actions, then the game converges to a none-make-AR equilibrium, if such an equilibrium exists, and cycles otherwise. Finally, we developed a method to derive the revenue-maximizing fee in dynamic games. This method helps to determine the optimal control parameters in a game with many equilibria. We expect the same kind of methods to prove useful for the analysis of other types of dynamic games with many equilibria.

References

  • Altman and Shimkin (1998) Altman, Eitan, Nahum Shimkin. 1998. Individual equilibrium and learning in processor sharing systems. Operations Research 46(6) 776–784.
  • Balachandran (1972) Balachandran, KR. 1972. Purchasing priorities in queues. Management Science 18(5-Part-1) 319–326.
  • Bertsimas and Shioda (2003) Bertsimas, Dimitris, Romy Shioda. 2003. Restaurant revenue management. Operations Research 51(3) 472–486.
  • Brown (1951) Brown, George W. 1951. Iterative solution of games by fictitious play. Activity analysis of production and allocation 13(1) 374–376.
  • Charbonneau and Vokkarane (2012) Charbonneau, Neal, Vinod M Vokkarane. 2012. A survey of advance reservation routing and wavelength assignment in wavelength-routed wdm networks. Communications Surveys & Tutorials, IEEE 14(4) 1037–1064.
  • Conway et al. (2012) Conway, Richard W, William L Maxwell, Louis W Miller. 2012. Theory of scheduling. Courier Corporation.
  • Cournot (1897) Cournot, Antoine Augustin. 1897. Recherches sur les principes mathématiques de la théorie des richesses (Paris, 1838). English translation by N.T. Bacon: Researches into the Mathematical Principles of the Theory of Wealth. New York.
  • Dodge (2006) Dodge, Yadolah. 2006. The Oxford dictionary of statistical terms. Oxford University Press on Demand.
  • Fu and van der Schaar (2009) Fu, Fangwen, Mihaela van der Schaar. 2009. Learning to compete for resources in wireless stochastic games. Vehicular Technology, IEEE Transactions on 58(4) 1904–1919.
  • Fudenberg (1998) Fudenberg, Drew. 1998. The theory of learning in games, vol. 2. MIT press.
  • Gardiner et al. (1985) Gardiner, Crispin W, et al. 1985. Handbook of stochastic methods, vol. 3. Springer Berlin.
  • Guérin and Orda (2000) Guérin, Roch A, Ariel Orda. 2000. Networks with advance reservations: The routing perspective. INFOCOM 2000. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, vol. 1. IEEE, 118–127.
  • Hassin (2016) Hassin, Refael. 2016. Rational Queueing. CRC Press.
  • Hassin and Haviv (2003) Hassin, Refael J, Moshe Haviv. 2003. To Queue or Not to Queue: Equilibrium Behaviour in Queueing Systems, vol. 59. Kluwer Academic Pub.
  • Hayel et al. (2016) Hayel, Yezekael, Dominique Quadri, Tania Jimenez, Luce Brotcorne. 2016. Decentralized optimization of last-mile delivery services with non-cooperative bounded rational customers. Annals of Operations Research 239(2) 451–469.
  • Lakshmivarahan (1981) Lakshmivarahan, Sivaramakrishnan. 1981. Learning algorithms theory and applications. Springer-Verlag New York, Inc.
  • Liberman and Yechiali (1978) Liberman, Varda, Uri Yechiali. 1978. On the hotel overbooking problem-an inventory system with stochastic cancellations. Management Science 24(11) 1117–1126.
  • Littman (1994) Littman, Michael L. 1994. Markov games as a framework for multi-agent reinforcement learning. Proceedings of the eleventh international conference on machine learning, vol. 157. 157–163.
  • Liu and van Ryzin (2011) Liu, Qian, Garrett van Ryzin. 2011. Strategic capacity rationing when customers learn. Manufacturing & Service Operations Management 13(1) 89–107.
  • Menache et al. (2014) Menache, Ishai, Ohad Shamir, Navendu Jain. 2014. On-demand, spot, or both: Dynamic resource allocation for executing batch jobs in the cloud. 11th International Conference on Autonomic Computing (ICAC 14). USENIX Association, 177–187.
  • Milgrom and Roberts (1991) Milgrom, Paul, John Roberts. 1991. Adaptive and sophisticated learning in normal form games. Games and economic Behavior 3(1) 82–100.
  • Naor (1969) Naor, Pinhas. 1969. The regulation of queue size by levying tolls. Econometrica 37(1) 15–24.
  • Nasiry and Popescu (2012) Nasiry, Javad, Ioana Popescu. 2012. Advance selling when consumers regret. Management Science 58(6) 1160–1177.
  • Niu et al. (2012) Niu, Di, Chen Feng, Baochun Li. 2012. Pricing cloud bandwidth reservations under demand uncertainty. ACM SIGMETRICS Performance Evaluation Review, vol. 40. ACM, 151–162.
  • Qiu and Zhang (2016) Qiu, Chun Martin, Wenqing Zhang. 2016. Managing long queues for holiday sales shopping. Journal of Revenue and Pricing Management 15(1) 52–65.
  • Quan (2002) Quan, Daniel C. 2002. The price of a reservation. Cornell Hotel and Restaurant Administration Quarterly 43(3) 77–86.
  • Reiman and Wang (2008) Reiman, Martin I, Qiong Wang. 2008. An asymptotically optimal policy for a quantity-based network revenue management problem. Mathematics of Operations Research 33(2) 257–282.
  • Simhon et al. (2015) Simhon, Eran, Carrie Cramer, Zachary Lister, David Starobinski. 2015. Pricing in dynamic advance reservation games. Computer Communications Workshops (INFOCOM WKSHPS), 2015 IEEE Conference on. IEEE, 546–551.
  • Simhon and Starobinski (2014) Simhon, Eran, David Starobinski. 2014. Game-theoretic analysis of advance reservation services. Information Sciences and Systems (CISS), 2014 48th Annual Conference on. IEEE, 1–6.
  • Simhon and Starobinski (2017) Simhon, Eran, David Starobinski. 2017. Advance reservation games. ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS) 2(2) 10.
  • Smith et al. (2000) Smith, Warren, Ian Foster, Valerie Taylor. 2000. Scheduling with advanced reservations. Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International. IEEE, 127–132.
  • Syed et al. (2008) Syed, Affan A, Wei Ye, John Heidemann. 2008. T-lohi: A new class of mac protocols for underwater acoustic sensor networks. INFOCOM 2008. The 27th Conference on Computer Communications. IEEE. IEEE.
  • Tan (1993) Tan, Ming. 1993. Multi-agent reinforcement learning: Independent vs. cooperative agents. Proceedings of the tenth international conference on machine learning. 330–337.
  • Virtamo (1992) Virtamo, Jorma T. 1992. A model of reservation systems. Communications, IEEE Transactions on 40(1) 109–118.
  • Wang et al. (2013) Wang, Wei, Di Niu, Baochun Li, Ben Liang. 2013. Dynamic cloud resource reservation via cloud brokerage. Distributed Computing Systems (ICDCS), 2013 IEEE 33rd International Conference on. IEEE, 400–409.
  • Weatherford (1998) Weatherford, Lawrence R. 1998. A tutorial on optimization in the context of perishable-asset revenue management problems for the airline industry. Operations research in the airline industry. Springer, 68–100.
  • Yessad et al. (2007) Yessad, Samira, Farid Nait-Abdesselam, Tarik Taleb, Brahim Bensaou. 2007. R-mac: Reservation medium access control protocol for wireless sensor networks. Local Computer Networks, 2007. LCN 2007. 32nd IEEE Conference on. IEEE, 719–724.
  • Zohar et al. (2002) Zohar, Ety, Avishai Mandelbaum, Nahum Shimkin. 2002. Adaptive behavior of impatient customers in tele-queues: Theory and empirical support. Management Science 48(4) 566–583.