Dynamic Pricing in Competitive Markets
Dynamic pricing of goods in a competitive environment to maximize revenue is a natural objective and has been a subject of research over the years. In this paper, we focus on a class of markets exhibiting the substitutes property with sellers having divisible and replenishable goods. Depending on the prices chosen, each seller observes a certain demand which is satisfied subject to the supply constraint. The goal of the seller is to price her good dynamically so as to maximize her revenue. For the static market case, when the consumer utility satisfies the Constant Elasticity of Substitution (CES) property, we give a regret bound on the maximum loss in revenue of a seller using a modified version of the celebrated Online Gradient Descent algorithm by Zinkevich. For a more specialized set of consumer utilities satisfying the iso-elasticity condition, we show that when each seller uses a regret-minimizing algorithm satisfying a certain technical property, the regret with respect to times optimal revenue is bounded as . We extend this result to markets with dynamic supplies and prove a corresponding dynamic regret bound, whose guarantee deteriorates smoothly with the inherent instability of the market. As a side result, we also extend the previously known convergence results of these algorithms in a general game to the dynamic setting.
The Internet has revolutionized the way goods are bought and sold and has in the process created a range of new possibilities to price goods strategically and dynamically. This is especially true for online retail and apparel stores, where the cost and effort of updating prices has become negligible. This flexibility in pricing has propelled research over the last decade or so in dynamic pricing, informally defined as the study of determining optimal selling prices in an unknown environment to optimize an objective, usually revenue. Coupled with the presence of digitally available and frequently updated sales data, one may also view this as an (online) learning problem.
The inherent hurdles in dynamic pricing arise from a lack of information. In the single-good case, the missing information could be the underlying demand function that maps a given price to the observed demand. Indeed, this problem has been studied in several models in the literature and strong results are now known for it. However, the problem becomes all the more challenging in a realistic setting where multiple sellers independently choose prices for their goods and the demand observed by any single seller is a function of all the prices. For example, a fixed seller might observe completely different demands for the same price depending on the prices chosen by other sellers. Such a seller might falsely conclude that she is in a dynamic environment even when the underlying demand function is static.
Several existing approaches for dynamic pricing assume a parametric form for the underlying demand function and choose a sequence of prices to learn the individual parameters by statistical estimation. This approach is commonly referred to as “learn-and-earn” in the literature. It would, however, be unrealistic in the presence of multiple sellers, since it would imply learning highly nonlinear and possibly unstructured functions in high dimensions. Instead, we view the market as a set of strategic agents (the sellers) choosing successive actions (prices) in order to maximize their utility (revenue), and focus on using the existing rich tool-kit of agnostic learning in game-theoretic models to prove fast convergence to optimal prices.
The advantages of an agnostic learning approach are twofold: first, it does not rely on the precise parametric form of the underlying demand function, and second, it can be easily extended to the case where the market parameters change across rounds. The downside, however, is that even in the best case of static markets with a clean parametric representation, the algorithms might converge to optimal prices only asymptotically. Consequently, to measure the performance of the actions (prices) chosen by such a learning algorithm, we typically compare them to a benchmark sequence of actions; the regret bound represents the loss incurred by the algorithm for not having chosen the benchmark sequence instead. In most such algorithms, this benchmark sequence is a single action that gives the maximum cumulative utility over all rounds.
We base our dynamic pricing approach on the work by Syrgkanis et al., where the authors prove that in a game with multiple agents, if each agent uses a regret-minimizing algorithm with a suitable step-size parameter and satisfying a certain technical property, then the individual regret of each agent is bounded by $O(T^{1/4})$, where $T$ is the total number of rounds. Although the main result is proved in the discrete action setting, the authors show that the same technique can be extended to agents with continuous action sets as well. In a nutshell, these algorithms anticipate the utility vector for the forthcoming round and choose a price such that the cumulative utility over all previous rounds and the forthcoming one is maximized. The regret bound thus obtained holds with respect to the single best price in hindsight and is one of the benchmarks we use to measure the performance of our approach.
Contribution: Our contributions in this paper can be broadly divided into 3 parts:
In the first part, for the class of markets with gross substitutes CES utility functions, we show that a simple modification to the Online Gradient Descent (OGD) algorithm by Zinkevich  can be used to obtain a regret bound on the loss in revenue with respect to the single best price in hindsight of order . We note that CES utilities represent a very general market model used in both economics and CS literature.
Following the analysis in  and using a smoothed revenue objective we obtain a stronger regret bound of order against a multiplicative approximation of the best price in hindsight. This analysis, however, holds only for the more restricted class of iso-elastic markets.
In the last part, we extend the technical property introduced in  to the case of dynamic regret i.e. when the performance of the algorithm is compared to a certain benchmark sequence. For the class of iso-elastic markets we show the existence of learning algorithms satisfying this property and use them to prove a regret bound of order against a multiplicative approximation of the benchmark prices. Here is a measure of the inherent instability of the market. As a side result, we can use this new property in  to prove bounds for dynamic regret.
A key observation in this work is that if the sellers in a market are ready to let go of a small fraction of their revenue, then they can converge to their (approximately) optimal prices (in static market setting) much faster ( instead of ). This faster convergence property is all the more desirable when the markets drift and convergence to optimal strategy in a small number of rounds is not possible. One would then like to achieve good performance with respect to a dynamic benchmark.
The problem of learning an optimal pricing policy for various demand models and inventory constraints has been researched extensively in the last decade. However, many consider the problem of a single good with no competition effects.  study a parametric family of demand functions and design an optimal pricing policy by estimating the unknown parameters by standard techniques such as linear regression or maximum likelihood estimation.  consider Bayesian and non-parametric approaches.
Closer to the theme of this paper, there has also been a considerable amount of research on dynamic pricing in models incorporating competition,  being some of them. However, most of these consider discrete choice models of demand, where a single consumer approaches and buys a discrete bundle of goods. Moreover, they assume that every seller starts with a fixed inventory level that is not replenished during the course of the algorithm. We, on the other hand, consider demand originating from a general mass of consumers; when the volumes are large, the items may be considered divisible. For a more thorough survey of the existing literature, we refer the reader to .
In Section 3 we consider Online Gradient Descent (OGD), first introduced by Zinkevich, as the learning algorithm used by all sellers. At every time step, the learner takes a step in the direction of the gradient observed in that round. Interestingly, the author shows that this simple update rule achieves a regret bound of $O(\sqrt{T})$ over $T$ rounds. While this approach is independent of any game-theoretic considerations, Syrgkanis et al. showed that with certain modified versions of this algorithm the individual regret of each player can be brought down to $O(T^{1/4})$. The analysis is based on the learning algorithm proposed by Rakhlin and Sridharan in a different context. Informally, the algorithm is based on the idea that if the gradient observed in the next round is predictable, then it rules out the worst-case scenario and allows one to achieve a much better regret guarantee.
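The OGD update described above can be illustrated with a minimal sketch. The toy concave utility, the interval of allowed actions, and the step size $1/\sqrt{t}$ are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def project(x, lo, hi):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))

def ogd_ascent(grad, x0, lo, hi, rounds):
    """Projected online gradient ascent with step size 1/sqrt(t).

    Returns the list of actions played in rounds 1..rounds.
    """
    x, played = x0, []
    for t in range(1, rounds + 1):
        played.append(x)
        # Step along the gradient observed this round, then project back.
        x = project(x + grad(x, t) / math.sqrt(t), lo, hi)
    return played

# Hypothetical concave utility, identical in every round, maximized at 0.3.
def toy_utility(x):
    return -(x - 0.3) ** 2

def toy_grad(x, t):
    return -2.0 * (x - 0.3)

played = ogd_ascent(toy_grad, x0=1.0, lo=0.0, hi=1.0, rounds=400)
regret = sum(toy_utility(0.3) - toy_utility(x) for x in played)
```

On this stationary toy problem the cumulative regret stays far below $\sqrt{T}$; the worst-case guarantee matters when the utility changes adversarially from round to round.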
2 Static Market Model
We consider a market with sellers, each selling a single good to a general population of consumers. We assume that the market operates in a round-based fashion. In each round every seller chooses a price for her good. The supply, , of seller , stays the same every round. No left-over supply from previous rounds is carried over (which is the case for example for perishable goods). Depending on the resulting price vector , each seller observes a certain demand for her item given by . These observed demands are governed by an underlying utility function of the consumers. For the purposes of this paper (except Section 3), we assume that these utilities are “IGS” as defined below:
We view this model of utilities as an approximation to the CES utilities (with the parameter ) widely used in the computer science and economics literature. It is a class of gross substitutes utility functions satisfying parts (a) and (b) in Definition ?. Instead of a fixed constant as price elasticity, this parameter depends on the prices of all goods i.e. . We use this more general class of utilities in Section 3.
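To make the remark about price-dependent elasticity concrete, here is a small sketch using the standard CES demand formula. The weights, the elasticity of substitution `sigma` (assumed greater than one), and the unit income are illustrative assumptions, not values from the paper.

```python
import math

def ces_demand(prices, weights, sigma, income=1.0):
    """Standard CES demand: x_i = income * a_i^sigma * p_i^(-sigma) / sum_j a_j^sigma * p_j^(1 - sigma)."""
    denom = sum(a ** sigma * p ** (1.0 - sigma) for a, p in zip(weights, prices))
    return [income * a ** sigma * p ** (-sigma) / denom for a, p in zip(weights, prices)]

def own_price_elasticity(prices, weights, sigma, i, rel_step=1e-6):
    """Central finite-difference estimate of -d log x_i / d log p_i."""
    up, dn = list(prices), list(prices)
    up[i] *= 1.0 + rel_step
    dn[i] *= 1.0 - rel_step
    d_log_x = math.log(ces_demand(up, weights, sigma)[i]) - math.log(ces_demand(dn, weights, sigma)[i])
    d_log_p = math.log(1.0 + rel_step) - math.log(1.0 - rel_step)
    return -d_log_x / d_log_p

def closed_form_elasticity(prices, weights, sigma, i):
    """Closed form sigma - (sigma - 1) * (budget share of good i)."""
    terms = [a ** sigma * p ** (1.0 - sigma) for a, p in zip(weights, prices)]
    return sigma - (sigma - 1.0) * terms[i] / sum(terms)

equal_prices = own_price_elasticity([1.0, 1.0], [1.0, 1.0], 2.0, 0)
unequal_prices = own_price_elasticity([1.0, 2.0], [1.0, 1.0], 2.0, 0)
```

With two symmetric goods and `sigma = 2`, the elasticity of good 0 is 1.5 at equal prices and about 1.33 once the competitor doubles her price, so it is not a fixed constant. Raising the competitor's price also raises the demand for good 0, which is the gross substitutes property.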
In addition to the IGS utilities we make assumptions to ensure that the problem is well defined. Specifically, the optimal revenue of any seller for any profile of prices chosen by others is bounded in . Intuitively, this is equivalent to saying that the set of allowed prices and supplies are such that revenue of any seller is not arbitrarily small or large.
We measure the performance of the pricing strategy used by the seller in terms of regret. Formally, the regret of an algorithm after rounds is defined as the loss with respect to the single best action (here price) in hindsight. For example, if denotes the sequence of revenue functions faced by the seller then the regret with respect to the sequence of prices is defined as: where . Analogously, one can also define dynamic regret as the regret incurred with respect to a dynamic benchmark sequence. For example, if is the sequence of prices against which we measure the loss of our algorithm, then dynamic regret is defined as:
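The two notions of regret can be computed as in the following sketch. The function names, the finite candidate grid used to search for the best fixed price, and the toy revenue functions are hypothetical and only serve to illustrate the definitions.

```python
def static_regret(revenue_fns, played, candidates):
    """Loss against the single best fixed price in hindsight.

    The best fixed price is searched over a finite grid of candidates,
    which is a simplification for illustration.
    """
    best_fixed = max(sum(r(p) for r in revenue_fns) for p in candidates)
    return best_fixed - sum(r(p) for r, p in zip(revenue_fns, played))

def dynamic_regret(revenue_fns, played, benchmark):
    """Loss against a benchmark sequence with one price per round."""
    return sum(r(b) - r(p) for r, b, p in zip(revenue_fns, benchmark, played))

# Toy revenue functions whose maximizer drifts from 1 to 3 over three rounds.
revenue_fns = [lambda p, c=c: 1.0 - (p - c) ** 2 for c in (1.0, 2.0, 3.0)]
played = [2.0, 2.0, 2.0]

fixed = static_regret(revenue_fns, played, candidates=[1.0, 2.0, 3.0])
moving = dynamic_regret(revenue_fns, played, benchmark=[1.0, 2.0, 3.0])
```

Here the seller has zero regret against the best fixed price but a regret of 2 against the per-round optimal prices, which illustrates why a dynamic benchmark is the more demanding one.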
Log-Revenue Objective: In this paper, we take an indirect approach to the problem of revenue optimization by optimizing the log-revenue objective instead of the actual revenue. The log-revenue objective is simply the plot of revenue against the price in the log-scale defined as follows:
Using the definition of IGS utility functions we can derive the following straightforward fact used directly in the rest of the paper. The proposition follows from the definition of log-revenue function and price elasticity of demand.
This proposition implies that the log-revenue function for seller , keeping prices of all other items fixed, is shaped as in Figure ?. It is instructive to keep this general shape in mind as we introduce learning algorithms to optimize it in the following sections.
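The general shape of the log-revenue curve can be reproduced with a small sketch. It assumes a hypothetical iso-elastic demand `scale * price ** (-elasticity)` with elasticity above one, and sales capped at the supply; these functional forms and parameter values are illustrative assumptions.

```python
import math

def log_revenue(log_price, elasticity, supply, scale=1.0):
    """Log of price times units sold, where sales are capped by the supply."""
    price = math.exp(log_price)
    demand = scale * price ** (-elasticity)
    return math.log(price * min(demand, supply))

def log_revenue_slope(log_price, elasticity, supply, scale=1.0, h=1e-5):
    """Central finite-difference slope of the log-revenue curve in log-price."""
    upper = log_revenue(log_price + h, elasticity, supply, scale)
    lower = log_revenue(log_price - h, elasticity, supply, scale)
    return (upper - lower) / (2.0 * h)

# With elasticity 2 and unit supply, demand equals supply at log-price 0.
slope_below = log_revenue_slope(-1.0, elasticity=2.0, supply=1.0)
slope_above = log_revenue_slope(1.0, elasticity=2.0, supply=1.0)
```

Under these assumptions the curve rises with slope 1 while the supply constraint binds and falls with slope `1 - elasticity` afterwards, so it is concave with a kink at the revenue-maximizing price.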
Notation: We shall denote vectors by bold-face letters and log of an entity by tilde, for example, . Often for ease of notation, we shall use to denote demand of good instead of when it is clear from the context. denotes the vector of prices of all sellers excluding . The notation denotes the gradient. All the missing proofs can be found in the Appendix.
In this section, we demonstrate the kind of regret bounds that can be achieved in full generality with CES utilities (with the parameter ). As noted before, since CES utilities do not satisfy the IGS utility model, the gradient of the log-revenue curve, , in the case when is unknown to the seller (see in contrast Proposition ?). To ensure that the problem is well-defined we assume that the price elasticity of demand for any item and any price vector is bounded in . We work around the problem of unknown gradients by using a simple modification to the analysis by Zinkevich (Theorem 1, ) and show that if sellers use online gradient descent (with modified gradient feedback) as their learning algorithm on the log-revenue objective, then they can achieve a regret bound. We start with a claim for general convex functions with modified feedback.
This property allows us to use OGD even with imperfect gradient feedback, up to a multiplicative constant, to obtain regret bounds that are also within this same factor. Since the exact gradient in the case when is not available to the algorithm, we modify the feedback gradient based on the demand observed,
i.e. we work around this problem by choosing as feedback the gradient whenever and otherwise.
Since the price elasticity of demand for any item at any price vector satisfies , for the case when , the gradient of log-revenue curve satisfies:
Using the same idea as in Claim ?, we can pretend to be using OGD on the actual log-revenue curve with a correspondingly modified step size. The following bound then follows directly:
where . The left-hand side of the above inequality can be further lower bounded:
This bound serves as a benchmark and improving upon this is the main focus of our paper. In the next section, we focus on a smaller set of IGS utility functions and show that with specialized learning algorithms the price dynamics converge faster to an approximately optimal configuration.
4 Game Theoretic Interpretation
We start our investigation into this problem by observing that the revenue optimization problem in a market (as defined in Section 2) is equivalent to agents in a game using learning algorithms locally to optimize their utility, where this utility is a function of the strategies of all agents in the game. Problems of this flavour have already been studied in different game-theoretic settings but are not applicable in a black-box fashion to our problem on account of the market specific constraints. Specifically, the log-revenue objective although concave is not smooth, an assumption used in almost all gradient-based learning algorithms. This calls for a different approach than the ones taken in the idealized settings.
With this context in mind, we start from the result of , where it is proved that if all players in a game use learning algorithms satisfying a certain technical property, called the RVU property (see Definition ?), then the regret incurred by each individual agent over $T$ rounds is $O(T^{1/4})$. A natural question is then: can we use the same technique in our revenue optimization problem in markets?
Although this property is defined for linear utility functions, we can extend this definition to concave utilities by using the gradient of the utility with respect to as proxy for i.e. in the context of our problem
As noted in , the standard online learning algorithms such as Online Mirror Descent (a generalization of OGD) and Follow-the-Regularized-Leader (FTRL) do not satisfy the RVU property. However, Rakhlin and Sridharan and Syrgkanis et al. have developed modified versions of these algorithms, namely Optimistic Mirror Descent (OMD) and Optimistic FTRL (OFTRL) respectively, that do satisfy this property.
In the context of continuous games, the utility function (alternatively, the objective) of each player should additionally satisfy some regularity conditions. For ease of presentation, we shall refer to player objectives satisfying these conditions as regular objectives, defined in a general sense as follows:
4.2 Smoothed Log-Revenue Curve
One of the foremost requirements to apply the analysis based on the RVU property is that the utility function should be smooth; specifically, the gradient of the objective should be -Lipschitz continuous.
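As a rough illustration of the optimistic idea, the following sketch implements one OFTRL step for a scalar action on an interval. The quadratic regularizer, the constant step size, and the use of the last observed gradient as the prediction for the next round are assumptions made for this example.

```python
def oftrl_play(past_gradients, step_size, lo, hi, center=0.0):
    """One Optimistic FTRL step with regularizer R(x) = (x - center)^2 / 2.

    Maximizes step_size * x * (sum of past gradients + prediction) - R(x)
    over [lo, hi], where the prediction for the coming round is the most
    recently observed gradient (zero before any feedback).
    """
    prediction = past_gradients[-1] if past_gradients else 0.0
    # Unconstrained maximizer of the concave quadratic objective.
    target = center + step_size * (sum(past_gradients) + prediction)
    # In one dimension, clipping the unconstrained maximizer to the interval
    # gives the constrained maximizer.
    return max(lo, min(hi, target))

first = oftrl_play([], step_size=0.1, lo=-1.0, hi=1.0)
later = oftrl_play([1.0, 2.0], step_size=0.1, lo=-1.0, hi=1.0)
capped = oftrl_play([5.0] * 10, step_size=0.1, lo=-1.0, hi=1.0)
```

Counting the last gradient twice is what distinguishes this step from plain FTRL: when consecutive gradients are similar, the learner has effectively already responded to the feedback it is about to receive.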
For ease of notation, we shall denote by simply when clear from context. For purposes of analysis, we parametrize the threshold parameter of seller as where is a small constant and is a lower bound on optimal revenue of seller . Also, henceforth we shall refer to the actual revenue curve by and the algorithm’s view of smoothed revenue curve by .
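The exact smoothing rule is not reproduced in this text, so the following is only one hypothetical way to obtain a Lipschitz gradient: interpolate linearly between the two slopes of the log-revenue curve over a window whose width is set by a threshold. The argument `log_excess_demand` (log-demand minus log-supply), the interpolation rule, and all parameter values are assumptions.

```python
def smoothed_gradient(log_excess_demand, elasticity, threshold):
    """Hypothetical smoothed gradient feedback for the log-revenue objective.

    Returns 1 when demand exceeds supply by at least the threshold (in logs),
    1 - elasticity when demand is at or below supply, and a linear
    interpolation between the two values inside the window.
    """
    if log_excess_demand >= threshold:
        return 1.0
    if log_excess_demand <= 0.0:
        return 1.0 - elasticity
    return (1.0 - elasticity) + elasticity * log_excess_demand / threshold

# Numerically estimate the largest slope of the smoothed gradient on a grid.
grid = [-0.5 + 0.001 * k for k in range(1001)]
values = [smoothed_gradient(z, elasticity=2.0, threshold=0.1) for z in grid]
max_slope = max(abs(b - a) / 0.001 for a, b in zip(values, values[1:]))
```

In this sketch the gradient changes by `elasticity` over a window of width `threshold`, so a smaller threshold tracks the true curve more closely at the price of a larger Lipschitz constant.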
4.3 Cost of Smoothness
Since our learning algorithm only uses the smoothed gradient feedback, the resulting regret bound also holds only for the smoothed view of the log-revenue curve. That is, the optimal price in this smoothed view is the price for which the smoothed gradient is zero, although this price is clearly suboptimal for the actual revenue curve (see Figure 4). To prove bounds with respect to the actual revenue curve, we need to draw connections between the smoothed and actual revenue for any fixed price.
Since satisfies the regularity condition (Definition ?), if each seller uses a learning algorithm satisfying the RVU property, then the individual regret satisfies:
where . For ease of notation, we denote by . Using Lemma ? to lower bound the left-hand-side above:
The last inequality holds since is the lower bound on revenue. We still have to prove an upper bound on the expression: . Since our learning algorithm satisfies the RVU property, by Definition ? it follows that:
Since the smoothed gradient for any seller is -Lipschitz continuous (Lemma ?), for we can bound as:
In addition to the fact that OFTRL satisfies the RVU property, it is also known that the algorithm satisfies a stability property (Lemma 20, ) i.e. where is the step-size parameter of the algorithm.
We can now bound the regret as: . Finally substituting the parameters of the algorithm (Proposition 7, ) , and with we get:
Combining this with Equation 2 and substituting the value of we get:
Rearranging the inequality and using same steps as in the proof of Lemma ?:
Similar bounds can be shown in the case when sellers use the Optimistic Mirror Descent (OMD) algorithm.
5 Learning with a Dynamic Benchmark
A bound on the loss of revenue of a seller with respect to the single best price in hindsight is a guarantee against a comparatively weak benchmark. Ideally, the sellers would like to choose as benchmark the revenue-optimizing price in every round, i.e. the sequence of prices . Such a benchmark is, however, too strict to obtain meaningful regret bounds. We shall instead focus on a more constrained sequence of benchmark prices. In what follows, we define a class of learning algorithms whose guarantees apply to any game setting where strategic players use regret minimization to maximize their own utility. For generality, we define this class for any sequence of concave utility functions . In the following section, we shall specialize this guarantee to the context of revenue optimization in markets.
This definition is an extension of the RVU property. The difference is in the term that quantifies the hardness of learning with respect to a dynamic strategy. As for the RVU property, this property is defined with respect to linear utilities and can be extended to concave utilities by standard arguments.
Using this new definition we can now extend almost all of the results in  to corresponding results for dynamic regret. We state the following claim for concreteness.
5.1 Revenue Optimization in Dynamic Markets
Dynamic Market Model:
We define a dynamic market , as a sequence of markets with the same set of sellers and buyers, with the same nice utility functions as in Definition ? but with a dynamic supply vector i.e. we characterize the dynamicity of the market by the sequence of supply vectors . In order to achieve a strong dynamic regret bound, we shall assume that the income elasticity parameter of the market is equal to one. This is a standard assumption in many market models and is also satisfied by CES utilities.
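The unit income elasticity assumption can be checked numerically for CES demand. The sketch below uses the standard CES demand formula; the weights, the value of `sigma`, and the incomes are illustrative assumptions.

```python
def ces_demand(prices, weights, sigma, income):
    """Standard CES demand: x_i = income * a_i^sigma * p_i^(-sigma) / sum_j a_j^sigma * p_j^(1 - sigma)."""
    denom = sum(a ** sigma * p ** (1.0 - sigma) for a, p in zip(weights, prices))
    return [income * a ** sigma * p ** (-sigma) / denom for a, p in zip(weights, prices)]

prices, weights, sigma = [1.0, 2.0], [1.0, 1.0], 2.0

base = ces_demand(prices, weights, sigma, income=3.0)
# Income elasticity one: doubling the income doubles every demand.
doubled_income = ces_demand(prices, weights, sigma, income=6.0)
# Scaling all prices by a factor acts like dividing the income by that factor.
scaled_prices = ces_demand([2.0 * p for p in prices], weights, sigma, income=3.0)
reduced_income = ces_demand(prices, weights, sigma, income=1.5)
```

The second comparison is the property used when relating a uniform change in prices to a change in buyers' incomes.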
In this section, we connect the dynamic regret of any seller to the inherent instability of the market by choosing the sequence of equilibrium prices
We analyze the performance of our modified OGD and OMD algorithms when the consumer utility functions satisfy the CES property. Although from a theoretical standpoint we assumed that the price elasticity of the market is a constant, empirically we observed that CES functions approximately satisfy this assumption. In our simulations, we show that the OMD algorithm indeed performs as proved in our analysis, except for slightly worse convergence time.
We consider the scenario with 2 items and the value of . We assume that the market is static in that each seller has a supply of one unit every round and uses the threshold parameter . We observe that the modified OGD algorithm converges quickly to the neighbourhood of the optimal price but then keeps oscillating around it. This is expected since in this neighbourhood the observed gradients might change abruptly. The OMD algorithm on the other hand takes a while before it comes close to the neighbourhood but once there converges to optimum quickly. As described in the analysis, this is precisely the reason for using the smoothed gradient feedback.
In this paper, we presented two dynamic pricing strategies based on regret-minimizing algorithms for static markets. In contrast to a simple approach based on the modified OGD algorithm, we showed that by using specialized learning algorithms the sellers can converge to (approximately) revenue-maximizing prices. We extended the analysis of these algorithms to dynamic markets and proved corresponding dynamic regret bounds. In the process, we defined a property analogous to the RVU property that is satisfied by these learning algorithms and extended their results to the case of dynamic regret.
Our regret analysis with these specialized learning algorithms depends on the assumption that the underlying market is iso-elastic. We believe that extending the analysis to cases where the price elasticity may be dynamic is an important open question. Also, to obtain a regret bound in dynamic markets we needed the assumption of gross substitutes utility function. Obtaining revenue guarantees for more general utility functions would be an interesting future direction.
The update rule of the OGD algorithm when the feedback, , is available is given by: where is the Euclidean projection operator. Since we use instead, we would get a different sequence of decision points according to the update step as follows:
Since , we can re-write the same update step as:
where is such that , i.e. we get the same sequence of steps by using but with a different step-size sequence. Following the same analysis as in Zinkevich and replacing by , the claim follows.
B Regularity of Smoothed Revenue Objective
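The equivalence used in this argument can be checked numerically: running OGD on gradients scaled by per-round positive factors produces the same iterates as running it on the true gradients with correspondingly rescaled step sizes. The gradient values, scale factors, and interval below are arbitrary illustrative numbers, and for simplicity the gradients are given as a fixed sequence.

```python
def ogd_path(gradients, step_sizes, x0, lo, hi):
    """Projected gradient steps on [lo, hi]; returns every iterate including x0."""
    x, path = x0, [x0]
    for g, eta in zip(gradients, step_sizes):
        x = max(lo, min(hi, x + eta * g))
        path.append(x)
    return path

true_grads = [1.0, -0.5, 0.8, -1.2]
scales = [0.5, 2.0, 1.5, 0.25]  # unknown positive multiplicative distortions
eta = 0.1

# Feedback distorted by the scales, constant step size.
scaled_feedback_path = ogd_path(
    [c * g for c, g in zip(scales, true_grads)], [eta] * 4, 0.0, -1.0, 1.0
)
# True gradients, step sizes absorbed the scales.
rescaled_step_path = ogd_path(
    true_grads, [eta * c for c in scales], 0.0, -1.0, 1.0
)
```

Since the two paths coincide, the standard OGD analysis applies with the rescaled step sizes, which is how bounded distortion in the feedback translates into a regret bound that is worse only by the same factor.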
It is known that (Lemma 24, ) if for all ,
then satisfies inequality ?. We shall first prove the case when is equal to .
This is equivalent to proving since the revenue curve is differentiable. By observation, we note that the maximum change in smoothed gradient i.e. occurs for prices when . This implies:
In a similar way, we can show that the smoothed gradient of seller is Lipschitz continuous also with respect to the price of any other seller , i.e. . Using the same arguments as above, we get:
The cross derivative term is exactly the cross-price elasticity of item with respect to item . We shall denote it by and by definition of IGS utility functions is exactly . Therefore,
The corollary follows from the fact that is concave in and Lemma ?. We will need this fact in the next section to obtain a regret bound using smoothed gradient feedback.
C Cost of Smoothness
The lemma follows directly from the following two observations:
For any price , where is the revenue maximizing price of seller , chosen by seller,
This follows from our assumption that the gradient of log-revenue curve for any price is a constant equal to .
For and as defined above, the following holds:
This can be shown by the following sequence of equalities.
Since , it follows that
The claim follows by using this in the above equality.
We are now ready to bound the difference between the actual revenue and the smoothed revenue for any seller and price .
The left-hand side of the inequality follows directly from our construction of the smoothed gradient. For the right-hand side we observe that the difference between the revenue values of the two curves is maximum at . Hence, in the following, we shall focus on bounding . Note that the gradient of the smoothed revenue function changes gradually from to in the price range to and, in the worst case, might change abruptly, i.e.
Using Lemma ? and using the fact that :
D OMD satisfies DRVU Property
Optimistic Mirror Descent (OMD):
Consider the following online convex optimization problem: Let be the convex set of actions of the learner. In each round , the learner chooses an action and observes a linear utility function
where is the Bregman divergence with respect to and is the sequence of step-sizes that can be chosen adaptively.
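For intuition, the two-sequence structure of OMD can be sketched for a scalar action with the Euclidean regularizer, in which case the Bregman divergence is half the squared distance and each update is a projected gradient step. The constant step size and the choice of the last observed gradient as the prediction are assumptions of this example; the analysis above allows adaptive step sizes.

```python
def clip(x, lo, hi):
    """Euclidean projection of a scalar onto [lo, hi]."""
    return max(lo, min(hi, x))

def optimistic_md(utility_gradients, step_size, lo, hi, start=0.0):
    """Optimistic Mirror Descent with the Euclidean regularizer.

    Keeps a secondary sequence updated with observed gradients only; the
    action played is one extra step from it along the predicted gradient.
    Returns the list of actions played.
    """
    secondary, prediction, played = start, 0.0, []
    for u in utility_gradients:
        # Optimistic step using the prediction of this round's gradient.
        action = clip(secondary + step_size * prediction, lo, hi)
        played.append(action)
        # After observing the true gradient, update the secondary sequence.
        secondary = clip(secondary + step_size * u, lo, hi)
        prediction = u
    return played

actions = optimistic_md([1.0, 1.0, 1.0], step_size=0.1, lo=-1.0, hi=1.0)
```

With constant gradients the predictions become exact after the first round, and the played action stays one step ahead of the secondary sequence.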
By Theorem ? instantiated for , we have:
Using Fact ? and by choosing , we can bound the first part of the expression as:
Next, we can sum and rearrange the Bregman divergence terms to get:
where . Using Fact ? in above inequality we get:
Finally, we bound the last part of the expression using Fact ? and observing that in OMD algorithm we choose .
E Revenue Optimization in Dynamic Markets
We derive here a sequence of lemmas required to prove the final theorem.
We only prove Part (a) here. Part (b) follows from identical steps. Note that increasing the prices of all items by a factor of is equivalent to decreasing the income of all buyers by the same factor. Let the income of player be denoted by . Then for any buyer , . Further, for gross substitutes markets with CES utilities, it is known that the income elasticity parameter, , for any player is exactly equal to 1. By definition of income elasticity:
Consider the case where . For the sake of contradiction, assume that for some player , . By the equilibrium condition,
The inequality follows from the definition of gross substitutes markets. Equality is the direct application of Lemma ?. Since this is a contradiction, we conclude that .
For the lower bound suppose that for some item , . Then,
which is a contradiction for any . The inequalities and follow the same reasoning as in Equation 6. This implies that for an increase in supply of item , the price of no item increases and the maximum decrease in the price of any item is at most a factor of . By analogous arguments, we can prove the result for the case when the supply decreases.
First note that we can re-write the result of Lemma ? in log scale as:
where we assumed that the supply of only item changed. Now, for any two supply vectors and , consider the switch from to sequentially in a pre-defined order while keeping the supplies of remaining sellers fixed during this switch. From Lemma ?, we know that for each such intermediate step, where the supply of only item changes, the maximum change in equilibrium is at most . The cumulative change in equilibrium can then simply be upper bounded by the sum of these individual changes.
We can obtain this bound using almost the same steps as in Theorem ? and using Corollary ? to account for the cumulative change in benchmark prices.
- Informally, this is required to ensure that small changes in prices do not lead to large changes in utility gradient.
- Informally, a (Walrasian) equilibrium in this market corresponds to the vector of prices and an allocation of items such that no item is under- or over-demanded. Alternatively, the aggregate demand for each item is exactly equal to its supply.
- For simplicity of presentation, we assume the utility function is linear.
- Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms.
Omar Besbes and Assaf Zeevi. Operations Research, 57(6):1407–1420, 2009.
- On the minimax complexity of pricing in a changing environment.
Omar Besbes and Assaf Zeevi. Operations Research, 59(1):66–79, 2011.
- Dynamic pricing under a general parametric choice model.
Josef Broder and Paat Rusmevichientong. Operations Research, 60(4):965–980, July 2012.
- Learning and pricing in an internet environment with binomial demands.
Alexandre X Carvalho and Martin L Puterman. Journal of Revenue and Pricing Management, 3(4):320–336, 2005.
- Recent Developments in Dynamic Pricing Research: Multiple Products, Competition, and Limited Demand Information.
Ming Chen and Zhi-Long Chen. Production and Operations Management, 24(5):704–731, 2015.
- Simultaneously learning and optimizing using controlled variance pricing.
Arnoud V den Boer and Bert Zwart. Management Science, 60(3):770–783, 2013.
- Dynamic Pricing of Perishable Assets Under Competition.
Guillermo Gallego and Ming Hu. Management Science, 60(5):1241–1259, 2014.
- Multiproduct price optimization and competition under the nested logit model with product-differentiated price sensitivities.
Guillermo Gallego and Ruxian Wang. Operations Research, 62(2):450–461, 2014.
- Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution.
J Michael Harrison, N Bora Keskin, and Assaf Zeevi. Management Science, 58(3):570–586, 2012.
- Dynamic Pricing with an Unknown Demand Model: Asymptotically Optimal Semi-Myopic Policies.
Bora Keskin and Assaf Zeevi. Operations Research, 62(5):1142–1167, 2014.
- Chasing demand: Learning and earning in a changing environment.
N. Bora Keskin and Assaf Zeevi. Mathematics of Operations Research, 42(2):277–307, 2017.
- The value of knowing a demand curve: Bounds on regret for online posted-price auctions.
Robert Kleinberg and Tom Leighton. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’03, pages 594–. IEEE Computer Society, 2003.
- Learning in concave games with imperfect information.
Panayotis Mertikopoulos. 2016.
- The value of product variety when selling to strategic consumers.
Ali K Parlaktürk. Manufacturing & Service Operations Management, 14(3):371–385, 2012.
- Online learning with predictable sequences.
Alexander Rakhlin and Karthik Sridharan. In Shai Shalev-Shwartz and Ingo Steinwart, editors, Proceedings of the 26th Annual Conference on Learning Theory, volume 30 of Proceedings of Machine Learning Research, pages 993–1019, Princeton, NJ, USA, 12–14 Jun 2013. PMLR.
- Optimization, learning, and games with predictable sequences.
Alexander Rakhlin and Karthik Sridharan. In Proceedings of the 26th International Conference on Neural Information Processing Systems, NIPS’13, pages 3066–3074, USA, 2013. Curran Associates Inc.
- Fast convergence of regularized learning in games.
Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS’15, pages 2989–2997, Cambridge, MA, USA, 2015. MIT Press.
- Online convex programming and generalized infinitesimal gradient ascent.
Martin Zinkevich. In Proceedings of the Twentieth International Conference on Machine Learning, ICML’03, pages 928–935. AAAI Press, 2003.