Multi-Product Dynamic Pricing in High Dimensions with Heterogeneous Price Sensitivity
Abstract
We consider the problem of multi-product dynamic pricing in a contextual setting for a seller of differentiated products. In this environment, customers arrive over time and products are described by high-dimensional feature vectors. Each customer chooses a product according to the widely used Multinomial Logit (MNL) choice model, and her utility depends on the product features as well as the prices offered. Our model allows for heterogeneous price sensitivities across products. The seller does not know the parameters of the choice model a priori, but can learn them through interactions with the customers. The seller's goal is to design a pricing policy that maximizes her cumulative revenue. This model is motivated by online marketplaces such as the Airbnb platform and online advertising. We measure the performance of a pricing policy in terms of regret, which is the expected revenue loss with respect to a clairvoyant policy that knows the parameters of the choice model in advance and always sets the revenue-maximizing prices. We propose a pricing policy, named M3P, and bound its $T$-period regret under heterogeneous price sensitivities for products with feature dimension $d$. We also prove a worst-case lower bound on the regret of any policy, showing that M3P is near-optimal.
1 Introduction
Online marketplaces offer a very large number of products described by a large number of features. This contextual information creates differentiation among products and also affects the willingness-to-pay of buyers. To provide more context, consider the Airbnb platform: the products sold in this market are "stays." In booking a stay, the user first selects the destination city, dates of visit, and type of place (entire place, 1 bedroom, shared room, etc.), and hence narrows down her choice to a so-called consideration set. The platform sets the prices for the products in the consideration set. Notably, the products here are highly differentiated. Each product can be described by a high-dimensional feature vector that encodes its properties, such as space, amenities, walk score, house rules, reviews of previous tenants, etc. We study a model where the platform aims to maximize its revenue.
In setting prices, there is a clear trade-off. A high price may drive the user away (it decreases the likelihood of a sale) and hence hurts the revenue. A low price, on the other hand, encourages the user to purchase the product, but results in a smaller revenue from that sale. Therefore, in order to maximize its revenue, the seller must try to learn the purchase behavior of the users. Using the users' interactions and purchasing decisions, the seller can learn how users weigh different features in their purchasing decisions.
In this work, we study a setting where the utility from buying a product is a linear function of the product features and its price. The parameter vector of this linear model represents the users' purchase behavior; namely, it captures the contribution of each feature to the users' valuations of the products. Similar to [2, 24, 22], we focus on a linear utility model:

(1)

where $\langle \cdot, \cdot \rangle$ indicates the inner product of two vectors. The noise term in (1), a.k.a. market noise, captures the idiosyncratic variation in the valuation of each user, and each product has its own price sensitivity parameter. We encode the "no-purchase" option as an additional product with zero utility. We emphasize that the parameters of the utility model are a priori unknown to the seller.
In our model, given a consideration set, the customer chooses the product that yields the highest utility. We study the widely used Multinomial Logit (MNL) choice model [27], which corresponds to having the noise terms in Eq. (1) drawn independently from a standard Gumbel distribution.
We propose a dynamic pricing policy, called M3P (for Multi-Product Pricing Policy), for high-dimensional environments. Our policy uses an $\ell_1$-regularized maximum-likelihood method to estimate the true parameters of the utility model based on the previous purchasing behavior of the users.
We measure the performance of a pricing policy in terms of the regret, which is the difference between the expected revenue obtained by the pricing policy and the revenue gained by a clairvoyant policy that has full information about the parameters of the utility model and always offers the revenue-maximizing prices. We bound the regret of our policy in terms of the feature dimension $d$ and the length of the time horizon $T$. Furthermore, we prove that our policy is near-optimal, in the sense that no policy can achieve a better worst-case regret, up to logarithmic factors.
In the next section, we briefly review the work related to ours. We would like to highlight that our work is distinguished from the previous literature in two major aspects: i) multi-product pricing, which must take into account the interaction of different products, since changing the price of one product may shift the demand for other products, making the pricing problem considerably more complex; ii) heterogeneity and uncertainty in the price sensitivity parameters. We point out that our methods can obtain cumulative regret logarithmic in $T$ if the price sensitivity parameters in Eq. (1) were a priori known, cf. [22].
Related Work
There is a vast literature on dynamic pricing, one of the central problems in revenue management. We refer the reader to [12, 4] for extensive surveys of this area. A popular theme is dynamic pricing with learning, where there is uncertainty about the demand function, but information about it can be obtained via interaction with customers. A line of work [3, 16, 20, 8, 17, 10] took a Bayesian approach. Another related line of work assumes parametric models for the demand function with a small number of parameters, and proposes policies to learn these parameters using statistical procedures such as maximum likelihood [6, 7, 14, 13, 9] or least-squares estimation [6, 18, 23].
Recently, there has been interest in dynamic pricing in the contextual setting. The works [1, 11, 24, 22, 5] consider the single-product setting, where the seller receives a single product to sell at each step (corresponding to consideration sets of size one in our setting) and assume equal price sensitivities for all products. In [1], the authors consider a noiseless valuation model with a strategic buyer and propose a low-regret policy. This setting has been extended to include market noise as well as a market of strategic buyers who are utility maximizers [19]. In [11], the authors propose a pricing policy based on binary search in high dimensions with adversarial features. The work [22] studies dynamic pricing in a high-dimensional contextual setting with a sparsity structure and proposes a low-regret policy, but in a single-product scenario. The problem has also been studied under time-varying coefficient valuation models [21], to address the time-varying purchase behavior of customers and the perishability of sales data. Very recently, [25] studied high-dimensional multi-product pricing with a low-dimensional linear model for the aggregate demand. In that model, the demand vector for all the products is observed at each step, while in our work the seller only sees the index of the product chosen from the buyer's consideration set at each step. Similarly, [26] studies a model where the seller can observe the aggregate demand and proposes a myopic policy based on least-squares estimation that obtains logarithmic regret.
2 Model
We consider a firm which sells a set of products to customers that arrive over time. The products are differentiated and each is described by a wide range of features.
At each step $t$, the customer selects a consideration set of bounded size from the available products. This is the set the customer will actively consider in her purchase decision. The seller sets the price for each of the products in this set, after which the customer may choose (at most) one of the products in it. If she chooses a product, a sale occurs and the seller collects revenue in the amount of the posted price; otherwise, no sale occurs and the seller does not collect any revenue.
Each product is represented by an observable vector of features. Products offered in different rounds can be highly differentiated (their features vary), but we assume that the feature vectors are sampled independently from a fixed, but unknown, distribution.
We assume that the feature vectors in the support of this distribution, as well as the model parameters, are bounded in norm by an arbitrarily large but fixed constant. Throughout the paper, we use $\|\cdot\|$ to indicate the norm.
If an item (at period $t$) is priced at $p$, then the customer obtains utility from buying it as given by the linear model (1).¹

¹ In general, the offered price depends not only on the feature vectors but also on the period $t$, as the estimate of the model parameters may vary across time. We make this explicit in the notation by keeping both in the subscript.
Here, the parameters of the demand curve are a priori unknown to the seller. The first term is the product-based utility, and the noise components represent market shocks; they are modeled as zero-mean random variables drawn independently and identically from a standard Gumbel distribution. This noise distribution gives rise to the well-known multinomial logit (MNL) choice model, which has been widely used in the academic literature and in practice [27, 15]. Under the MNL model, the probability of choosing item $i$ from the consideration set $S$ is given by

\[ q_i = \frac{e^{v_i}}{1 + \sum_{j \in S} e^{v_j}}\,, \qquad (2) \]

where $v_j$ denotes the mean utility (excluding the market shock) of product $j$, for $j \in S$.
We refer to the coefficient of the price in the utility model as the price sensitivity of product $i$. Note that our model allows for heterogeneous price sensitivities. We also encode the no-purchase option as item $0$, whose utility consists only of a zero-mean Gumbel market shock; this random utility can be interpreted as the utility obtained from choosing an option outside the offered ones. Having established the utility model as above, at every step the user chooses the item with maximum utility from her consideration set; ties are broken uniformly at random.
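To make the choice model concrete, the Gumbel-noise utility maximization above can be simulated and checked against the closed-form MNL probabilities of Eq. (2). A minimal sketch in Python (the mean-utility values are hypothetical; the no-purchase option is encoded as an extra alternative with mean utility zero, as in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def mnl_probs(v):
    """Closed-form MNL choice probabilities for products with mean
    utilities v, plus a no-purchase option with mean utility 0."""
    expv = np.exp(np.append(v, 0.0))   # last entry = no-purchase
    return expv / expv.sum()

def simulate_choices(v, n=200_000):
    """Empirical choice frequencies: draw i.i.d. standard Gumbel shocks
    and let each simulated user pick the utility-maximizing option."""
    means = np.append(v, 0.0)
    shocks = rng.gumbel(size=(n, len(means)))
    choices = np.argmax(means + shocks, axis=1)
    return np.bincount(choices, minlength=len(means)) / n

v = np.array([0.5, -0.2, 1.0])   # hypothetical mean utilities
print(mnl_probs(v))              # model probabilities
print(simulate_choices(v))       # empirical frequencies (should agree)
```

The empirical frequencies converge to the softmax form of Eq. (2), reflecting the standard equivalence between Gumbel-perturbed utility maximization and the MNL model.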
To summarize, our setting is as follows. At each period :

The user narrows down her options by forming a consideration set of bounded size.

For each product in the consideration set, the seller offers a price.²

² Equivalently, the seller can determine all the prices in advance and reveal them after the user forms the consideration set. We note that the consideration set of the user does not depend on the prices, but the choice she makes from the consideration set does. In addition, recall that all users share the same model parameters, so the choice of consideration set does not reveal information about these parameters.

The user chooses the item with the highest utility among her consideration set and the no-purchase option.

The seller observes which product is chosen from the consideration set and uses this information to set future prices.
We make the following assumption, which ensures positivity of the products' price sensitivity parameters.
Assumption 2.1.
The price sensitivity of every product is bounded below by a positive constant.
Before proceeding with the policy description, we discuss the benchmark policy, which is used to define the notion of regret and to measure the performance of pricing policies.
3 Benchmark policy
The seller's goal is to minimize her regret, which is defined as the expected revenue loss against a clairvoyant policy that knows the utility model parameters in advance and always offers the revenue-maximizing prices. We next characterize the benchmark policy. Let the feature matrix be obtained by stacking the feature vectors of the products in the consideration set as its rows. The proposition below gives an implicit formula for the vector of optimal prices as a function of the feature matrix and the model parameters; we refer to this mapping as the pricing function.
Proposition 3.1.
The benchmark policy that knows the utility model parameters sets the optimal prices as follows: for product $i$ in the consideration set $S$, the optimal price is given by $p^*_i = 1/\alpha_i + B^0$, where $\alpha_i$ is the price sensitivity of product $i$ and $B^0$ is the unique value of $B$ satisfying the following equation:

\[ B = \sum_{i \in S} \frac{1}{\alpha_i}\, e^{u_i - 1 - \alpha_i B}\,, \qquad (3) \]

with $u_i$ denoting the product-based utility of product $i$.
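Proposition 3.1 characterizes the optimal prices only implicitly, but they can also be recovered numerically by directly maximizing the expected revenue $\sum_i p_i q_i(p)$ under MNL choice probabilities. The sketch below (the utilities, sensitivities, and the use of `scipy.optimize.minimize` are illustrative assumptions, not the paper's procedure) also exhibits a structural consequence of the first-order conditions: at the optimum, the markup $p_i - 1/\alpha_i$ is the same for every product and equals the optimal expected revenue.

```python
import numpy as np
from scipy.optimize import minimize

def mnl_choice_probs(p, base_util, alpha):
    """Purchase probabilities at price vector p under MNL, with
    product-based utilities base_util, heterogeneous price
    sensitivities alpha, and a no-purchase option of utility 0."""
    expv = np.exp(base_util - alpha * p)
    return expv / (1.0 + expv.sum())

def neg_expected_revenue(p, base_util, alpha):
    return -np.dot(p, mnl_choice_probs(p, base_util, alpha))

base_util = np.array([1.0, 0.5])  # hypothetical product-based utilities
alpha = np.array([1.0, 2.0])      # hypothetical price sensitivities

res = minimize(neg_expected_revenue, x0=np.ones(2),
               args=(base_util, alpha), method="Nelder-Mead")
p_star, opt_rev = res.x, -res.fun
markups = p_star - 1.0 / alpha    # equal across products at the optimum
```

Here the two markups coincide (up to solver tolerance) and equal the optimal expected revenue, consistent with the fixed-point characterization in Proposition 3.1.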
We can now formally define the notion of regret. Consider a pricing policy that sets the vector of prices $p_t$ at time $t$ for the products in the consideration set $S_t$. Then, the seller's expected revenue at period $t$ under such a policy is
\[ \mathrm{rev}_t = \sum_{i \in S_t} p_{t,i}\, q_{t,i}\,, \qquad (4) \]
with $q_{t,i}$ being the probability of buying product $i$ from the set $S_t$, as given by Eq. (2).³ Similarly, we let $\mathrm{rev}^*_t$ be the seller's expected revenue under the benchmark policy that sets the optimal price vectors at period $t$. The worst-case cumulative regret of a policy is defined as

³ More precisely, $\mathrm{rev}_t$ is the expected revenue conditional on the filtration generated by the feature matrices and the market shocks.
\[ \mathrm{Regret}(T) = \max \; \sum_{t=1}^{T} \mathbb{E}\big[\mathrm{rev}^*_t - \mathrm{rev}_t\big]\,, \qquad (5) \]

where the maximum is taken over problem instances (model parameters) satisfying our assumptions.
4 MultiProduct Pricing Policy (M3P)
In this section, we provide a formal description of our multi-product dynamic pricing policy (M3P). The policy divides the time horizon into an episodic structure, where the lengths of the episodes grow geometrically. Throughout, we use $E_k$ to denote the set of periods in episode $k$. The policy updates its estimate of the model parameters at the beginning of each episode and adheres to that estimate throughout the episode when setting the prices. At each period during an episode, our policy sets the price vector using the pricing function evaluated at the current estimates of the model parameters, which are obtained by solving a regularized maximum-likelihood problem using solely the observations (the products sold) from the previous episode. Note that the seller can only observe which products were sold in the previous episode. Formally, the estimate is obtained by minimizing the negative log-likelihood function given by
where $i_t$ denotes the product purchased at time $t$, and

(9)

with the choice probabilities as in (2). We adopt the convention that $i_t = 0$ in the "no-purchase" case.
The log-likelihood loss can be written in a more compact form. Let $y_t$ be the response vector that indicates which product is purchased at time $t$. Then, the log-likelihood loss can be written as

(10)
We also add an $\ell_1$-regularization term to the cost function to promote a sparsity structure in the estimator:

(11)

for an appropriate choice of the regularization parameter $\lambda$ (set in terms of a constant specified in the analysis).
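The per-period log-likelihood terms and the $\ell_1$ penalty can be written out directly. A minimal sketch (the data layout, the known unit price sensitivity, the outside option encoded as the last index rather than index $0$, and the use of a derivative-free SciPy solver are simplifying assumptions for illustration; this is not the paper's estimator):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, X_list, p_list, choices, alpha=1.0):
    """Average MNL negative log-likelihood over periods.
    X_list[t]: (K, d) features of the consideration set at period t;
    p_list[t]: (K,) posted prices; choices[t]: index in 0..K of the
    chosen option, where index K encodes 'no purchase' (utility 0)."""
    nll = 0.0
    for X, p, c in zip(X_list, p_list, choices):
        v = np.append(X @ theta - alpha * p, 0.0)  # mean utilities + outside option
        nll -= v[c] - np.log(np.exp(v).sum())
    return nll / len(choices)

def fit_l1(X_list, p_list, choices, lam, d):
    """l1-regularized maximum-likelihood sketch (a proximal or
    coordinate-descent solver would be preferable in practice)."""
    obj = lambda th: (neg_log_likelihood(th, X_list, p_list, choices)
                      + lam * np.abs(th).sum())
    return minimize(obj, np.zeros(d), method="Powell").x
```

The $\ell_1$ penalty shrinks coordinates of the estimate toward zero, which is what promotes the sparsity structure mentioned above.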
The policy terminates at time $T$; note, however, that it does not need to know $T$ in advance. Further, in our policy, exploration and exploitation are mixed. At the beginning of each episode, the policy exploits the observations from the previous episode to update its estimates of the model parameters. Meanwhile, the market shocks in the utilities provide a sufficient amount of exploration, so we do not need to actively randomize prices to learn the parameters. Also, by design, when the policy does not have much information about the model parameters it updates its estimates frequently (since the episodes are short), but as time proceeds the policy gathers more information about the parameters, updates its estimates less frequently, and uses them over longer episodes.
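The episodic schedule can be made concrete with a doubling schedule, one natural instantiation of "geometrically growing" episode lengths (the exact lengths used by M3P are an assumption here). The point of the geometric growth is that the number of estimate updates over a horizon $T$ is only logarithmic in $T$:

```python
def episode_starts(T, first_len=2):
    """Periods at which the policy refreshes its parameter estimates,
    assuming episode k has length first_len * 2**(k-1)."""
    starts, t, length = [], 1, first_len
    while t <= T:
        starts.append(t)
        t += length
        length *= 2
    return starts

print(episode_starts(100))         # [1, 3, 7, 15, 31, 63]
print(len(episode_starts(10**6)))  # 19 updates over a million periods
```

Early episodes are short, so estimates are refreshed frequently while little data is available; later episodes are long, so mature estimates are reused over many periods, exactly as described above.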
5 Regret Analysis for M3P
We next state our result on the regret of M3P policy.
Theorem 5.1.
(Regret upper bound) Consider the choice model (2). Then, the $T$-period regret of M3P grows sublinearly in the length of the time horizon $T$ and only logarithmically in the feature dimension $d$. Further, no pricing policy can achieve a substantially better worst-case regret (see Theorem 5.5).
Note that, as stated by the theorem, the regret of M3P scales only logarithmically in the feature dimension $d$, making the algorithm applicable in high-dimensional settings. Below, we state the key lemmas in the proof of Theorem 5.1 and refer to the appendices for the proofs of the technical steps.
Let $p_t$ be the vector of prices posted at time $t$ for the products in the consideration set $S_t$. Recall that M3P sets the prices via the pricing function whose implicit characterization is given by Proposition 3.1, evaluated at the current parameter estimates.
Our next lemma shows that the pricing function is Lipschitz.
Lemma 5.2.
Suppose that the model parameters and their estimates satisfy the boundedness assumptions above. Then, there exists a constant such that the following holds:
(12) 
We next upper bound the right-hand side of Eq. (12) by bounding the estimation error of the proposed regularized estimator. Denote by $X_k$ the matrix obtained by stacking the feature matrices of all periods belonging to episode $k$.
Proposition 5.3.
Let the parameter estimate be the solution of the optimization problem (11), for an appropriate choice of the regularization parameter $\lambda$. Then, with high probability, we have

(13)

where the constant in the bound depends on the problem parameters.
The last part of the proof relates the regret of the policy at each period to the distance between the posted price vector and the price vector posted by the benchmark. Recall the definition of the revenue from (4), and define the per-period regret as the gap between the benchmark revenue and the policy's revenue.
Lemma 5.4.
Let $p^*_t$ be the optimal price vector posted by the benchmark policy that knows the model parameters in advance. There exists a constant $C > 0$ (depending on the problem parameters) such that

\[ \mathrm{rev}^*_t - \mathrm{rev}_t \le C\, \|p_t - p^*_t\|^2\,. \]
The reason that, in Lemma 5.4, the revenue gap depends on the squared difference of the price vectors is that $p^*_t$ is the optimal price, and hence the gradient of the revenue function vanishes at $p^*_t$. Therefore, in a Taylor expansion of the revenue function around $p^*_t$, the first-order term vanishes and the second-order term dominates.
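Schematically (with $\bar{p}$ a point on the segment between $p_t$ and $p^*_t$, and notation only suggestive of the paper's):

\[
\mathrm{rev}(p_t) \;=\; \mathrm{rev}(p^*_t)
\;+\; \underbrace{\nabla \mathrm{rev}(p^*_t)^\top (p_t - p^*_t)}_{=\,0 \text{ at the optimum}}
\;+\; \tfrac{1}{2}\, (p_t - p^*_t)^\top \nabla^2 \mathrm{rev}(\bar{p})\, (p_t - p^*_t)\,,
\]

so the per-period revenue gap is bounded by a constant times $\|p_t - p^*_t\|^2$ whenever the Hessian of the revenue function is bounded.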
The proof of Theorem 5.1 follows by combining Lemma 5.2, Proposition 5.3 and Lemma 5.4. We refer to Appendix B.1 for its proof.
Our next theorem provides a lower bound on the regret of any pricing policy. The proof of Theorem 5.5 is given in Appendix B.2 and employs the notion of ‘uninformative prices’, introduced by [7].
Theorem 5.5.
(Regret lower bound) Consider the choice model (2). Then, the worst-case $T$-period regret of any pricing policy is lower bounded by the rate achieved by M3P, up to logarithmic factors.
Theorem 5.5 implies that M3P achieves the optimal cumulative regret up to logarithmic factors.
Acknowledgments
A. J. was partially supported by an Outlier Research in Business (iORB) grant from the USC Marshall School of Business (2018). This work was supported in part by a Google Faculty Research award (2016).
References
 [1] K. Amin, A. Rostamizadeh, and U. Syed. Learning prices for repeated auctions with strategic buyers. In Advances in Neural Information Processing Systems, pages 1169–1177, 2013.
 [2] K. Amin, A. Rostamizadeh, and U. Syed. Repeated contextual auctions with strategic buyers. In Advances in Neural Information Processing Systems, pages 622–630, 2014.
 [3] V. F. Araman and R. Caldentey. Dynamic pricing for nonperishable products with demand learning. Operations research, 57(5):1169–1188, 2009.
 [4] Y. Aviv and G. Vulcano. Dynamic list pricing. In The Oxford handbook of pricing management. 2012.
 [5] G.-Y. Ban and N. B. Keskin. Personalized dynamic pricing with machine learning. 2017.
 [6] O. Besbes and A. Zeevi. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407–1420, 2009.
 [7] J. Broder and P. Rusmevichientong. Dynamic pricing under a general parametric choice model. Operations Research, 60(4):965–980, 2012.
 [8] N. Cesa-Bianchi, C. Gentile, and Y. Mansour. Regret minimization for reserve prices in second-price auctions. IEEE Transactions on Information Theory, 61(1):549–564, 2015.
 [9] X. Chen, Z. Owen, C. Pixton, and D. SimchiLevi. A statistical learning approach to personalization in revenue management. 2015.
 [10] W. C. Cheung, D. SimchiLevi, and H. Wang. Dynamic pricing and demand learning with limited price experimentation. Operations Research, 65(6):1722–1731, 2017.
 [11] M. Cohen, I. Lobel, and R. Paes Leme. Feature-based dynamic pricing. 2016.
 [12] A. V. den Boer. Dynamic pricing and learning: historical origins, current research, and new directions. Surveys in Operations Research and Management Science, 20(1):1–18, 2015.
 [13] A. V. den Boer and A. P. Zwart. Mean square convergence rates for maximum quasi-likelihood estimation. Stochastic Systems, 4:1–29, 2014.
 [14] A. V. den Boer and B. Zwart. Simultaneously learning and optimizing using controlled variance pricing. Management science, 60(3):770–783, 2013.
 [15] O. Elshiewy, D. Guhl, and Y. Boztug. Multinomial logit models in marketing - from fundamentals to state-of-the-art. Marketing ZFP, 39(3):32–49, 2017.
 [16] V. F. Farias and B. Van Roy. Dynamic pricing with a prior on market response. Operations Research, 58(1):16–29, 2010.
 [17] K. J. Ferreira, D. SimchiLevi, and H. Wang. Online network revenue management using thompson sampling. 2016.
 [18] A. Goldenshluger and A. Zeevi. A linear response bandit problem. Stochastic Systems, 3(1):230–261, 2013.
 [19] N. Golrezaei, A. Javanmard, and V. Mirrokni. Dynamic incentiveaware learning: Robust pricing in contextual auctions. http://dx.doi.org/10.2139/ssrn.3144034, 2018.
 [20] J. M. Harrison, N. B. Keskin, and A. Zeevi. Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Science, 58(3):570–586, 2012.
 [21] A. Javanmard. Perishability of data: dynamic pricing under varyingcoefficient models. The Journal of Machine Learning Research, 18(1):1714–1744, 2017.
 [22] A. Javanmard and H. Nazerzadeh. Dynamic pricing in high-dimensions. arXiv preprint arXiv:1609.07574 (accepted for publication in the Journal of Machine Learning Research), 2016.
 [23] B. Keskin. Optimal dynamic pricing with demand model uncertainty: A squaredcoefficientofvariation rule for learning and earning. Working Paper, 2014.
 [24] I. Lobel, R. P. Leme, and A. Vladu. Multidimensional binary search for contextual decisionmaking. arXiv preprint arXiv:1611.00829, 2016.
 [25] J. Mueller, V. Syrgkanis, and M. Taddy. Low-rank bandit methods for high-dimensional dynamic pricing. arXiv preprint arXiv:1801.10242, 2018.
 [26] S. Qiang and M. Bayati. Dynamic pricing with demand covariates. arXiv preprint arXiv:1604.07463, 2016.
 [27] K. T. Talluri and G. J. Van Ryzin. The theory and practice of revenue management, volume 68. Springer Science & Business Media, 2006.
 [28] H. Zhang, P. Rusmevichientong, and H. Topaloglu. Multi-product pricing under the generalized extreme value models with homogeneous price sensitivity parameters. 2018.
Appendix A Proof of Technical Lemmas
A.1 Proof of Proposition 3.1
In the benchmark policy, the seller knows the model parameters. For simplicity, we introduce shorthand notation for the utilities, the price sensitivities, and their sum. The revenue function can then be written as
where we used (2). Writing the stationarity condition for the optimal price vector, we get, for each product:
which is equivalent to
Since the choice probabilities are strictly positive, the above equation implies that
(14) 
Define $B$ to be the optimal expected revenue. We next show that $B$ is the solution to Equation (3). By multiplying both sides of (14) by the corresponding choice probabilities and summing over the products, we have

By definition of $B$, the left-hand side of the above equation is equal to $B$. By rearranging the terms we obtain

where the second line follows from Equation (14).
A.2 Proof of Lemma 5.2
Define the following function:
(15) 
By characterization of the pricing function , given in Proposition 3.1, we have and , where and are the solution of and .
By the implicit function theorem, at a point where the function defined in (15) vanishes, there exist an open set around that point and a unique differentiable function whose graph coincides with the zero set of (15) on that open set. Furthermore, its partial derivatives can be computed as
where in the last step we use the normalization assumption, and $\alpha_{\min}$ is the lower bound on the price sensitivities. Likewise, we have
where we used the fact that the solution of the fixed-point equation is nonnegative. (This follows readily by noting that the right-hand side of (15) is nonincreasing in its argument.) This shows that the pricing function is Lipschitz in the model parameters, with a Lipschitz constant depending on the problem parameters. Therefore,
(16) 
Hence, the desired bound holds for some constant. This completes the proof.
A.3 Proof of Proposition 5.3
We start by recalling the notation and defining the augmented parameter vector that stacks the unknown model parameters. To prove Proposition 5.3, we first rewrite the loss function in terms of this augmented parameter vector. (Recall our convention for the "no-purchase" option.)
(17) 
The gradient and the Hessian of the loss are then given by
(18) 
(19) 
We proceed by bounding the gradient and the Hessian of the loss function. Before that, we establish an upper bound on the prices set by the pricing function.
Lemma A.1.
Suppose the boundedness assumptions on the parameters hold, and let $B_{\max}$ be the solution to the following equation:

(20)

Then, the prices set by the pricing function are bounded by $1/\alpha_{\min} + B_{\max}$, with $\alpha_{\min}$ the lower bound on the price sensitivities.
The proof of the above lemma follows readily by noting that the right-hand side of (20) upper bounds the right-hand side of (3), and therefore $B^0 \le B_{\max}$. The result then follows by recalling that the pricing function sets the price of each product as $1/\alpha_i + B^0$.
To bound the gradient of the loss function at the true model parameters, note that
(21) 
We also have
(22) 
for a constant. We also note that, by (18), the gradient is a sum of terms in which the purchased index is random, with randomness coming from the market-noise distribution. By a straightforward calculation, one can verify that each of these terms has zero expectation. Using (22) and applying the Azuma-Hoeffding inequality to the right-hand side of (21), followed by a union bound over the coordinates of the feature vectors, we obtain

(23)

with high probability. (Note that the extra factor can be absorbed into the constant, since the constant already depends on the problem parameters.)
We next pass to lower bounding the Hessian of the loss, and write
(24) 
where (a) and (b) follow from Jensen's inequality, (c) holds by (22), and in the last step we use the notation