# Contextual Reserve Price Optimization in Auctions

## Abstract

We study the problem of learning a linear model to set the reserve price in an auction so as to maximize expected revenue, given contextual information. First, we show that it is not possible to solve this problem in polynomial time unless the Exponential Time Hypothesis fails. Second, we present a strong mixed-integer programming (MIP) formulation for this problem, which is capable of exactly modeling the nonconvex and discontinuous expected reward function. Moreover, we show that this MIP formulation is ideal (the strongest possible formulation) for the revenue function. Since it can be computationally expensive to exactly solve the MIP formulation, we also study the performance of its linear programming (LP) relaxation. We show that, unfortunately, in the worst case the objective of the linear programming relaxation can be of order $n$ times larger than the optimal objective of the actual problem, where $n$ is the number of samples. Finally, we present computational results showcasing that the mixed-integer programming formulation, along with its linear programming relaxation, outperform the state-of-the-art algorithms in both in-sample and out-of-sample performance, on both real and synthetic datasets.

## 1 Introduction

Digital advertising has been a tremendously fast-growing industry in recent years: worldwide digital advertising expenditure reached $283 billion in 2018, and it is estimated to grow further to $517 billion in 2023.

Among all advertisement allocation mechanisms, real time bidding (RTB) is perhaps one of the most significant developments during the past decade, and it is widely applied at the major online advertising platforms, including–but not limited to–Google, Facebook, and Amazon. In RTB for display ads, an auction held by an Ad Exchange is triggered once a user visits a webpage, and the winner of the auction earns the ad slot and pays the publisher a certain price.

A form of auction commonly used in practice by Ad Exchanges is a second-price auction with reserve price [22]. In such auctions, the highest bidder wins the ad slot and pays the maximum of the second price and a reserve price set by the publisher or the Ad Exchange. In particular, the reserve price of an ad slot can help improve the revenue if it is between the top two bids.

One central question for Ad Exchanges is how to set the reserve price for each incoming impression in order to maximize the total revenue. In general, the reserve price is set based on the contextual information of the ad campaign, including data pertaining to the publisher (e.g. ad site and ad size), user (e.g. device type and various geographic information), or time (e.g. date and hour). In this paper, we study an offline linear model to set the reserve price for each individual ad slot by utilizing its contextual information in order to maximize the total revenue on the seller side. This maximization problem can be formalized as:

$$\max_{\beta \in \mathcal{X}}\; R(\beta) := \frac{1}{n}\sum_{i=1}^{n} r\big(w_i \cdot \beta;\; b^{(1)}_i, b^{(2)}_i\big), \qquad (1)$$

where $b^{(1)}_i$ and $b^{(2)}_i$ are the (nonnegative) highest bidding price and second highest bidding price of impression $i$, respectively, $w_i$ is the contextual feature vector of impression $i$, and $\mathcal{X} = [L, U]^d$ is a bounded hypercube which serves as a feasible region for the model parameters $\beta$. Additionally, $r$ is a discontinuous reward function given as

$$r(v; b^{(1)}, b^{(2)}) := \begin{cases} b^{(2)} & v \le b^{(2)} \\ v & b^{(2)} < v \le b^{(1)} \\ 0 & v > b^{(1)}. \end{cases} \qquad (2)$$

Figure 1 plots the reward function $r(\cdot; b^{(1)}, b^{(2)})$, which is a simple univariate (though discontinuous) function for given constants $b^{(1)}$ and $b^{(2)}$. The revenue function is constant if the reserve price $v$ is set either below $b^{(2)}$ or above $b^{(1)}$, and it increases linearly if $v$ is between $b^{(2)}$ and $b^{(1)}$. In other words, by setting the reserve price between $b^{(2)}$ and $b^{(1)}$, the seller can potentially capture more revenue from the auctions. However, the reserve price is set before observing the bidding prices $b^{(1)}$ and $b^{(2)}$, and the seller must be cautious to not set the reserve price too high, as an unsuccessful auction results in a significant drop in revenue when $v > b^{(1)}$. At the two extremes, this setting recovers a first price auction (by setting $b^{(2)} = b^{(1)}$) or a pure price-setting problem (by setting $b^{(2)} = 0$).
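For concreteness, the reward function (2) and the empirical objective $R(\beta)$ of (1) can be sketched in a few lines of code. This is an illustrative Python sketch with our own naming; the implementation used in our computational study (Section 4) is written in Julia.

```python
# Illustrative sketch (not the paper's code): the discontinuous reward
# r(v; b1, b2) of equation (2), and the empirical average revenue R(beta).

def reward(v, b1, b2):
    """Seller revenue from a second-price auction with reserve price v,
    highest bid b1 and second-highest bid b2 (0 <= b2 <= b1)."""
    if v <= b2:
        return b2      # reserve not binding: seller collects the second price
    elif v <= b1:
        return v       # reserve between the top two bids: seller collects v
    else:
        return 0.0     # reserve exceeds the highest bid: the auction fails

def average_revenue(beta, samples):
    """R(beta) = (1/n) * sum_i r(w_i . beta; b1_i, b2_i), where samples is a
    list of (w_i, b1_i, b2_i) triples and w_i is a list of features."""
    total = 0.0
    for w, b1, b2 in samples:
        v = sum(wj * bj for wj, bj in zip(w, beta))
        total += reward(v, b1, b2)
    return total / len(samples)
```

Note how the reward jumps from $b^{(1)}$ down to zero as the reserve crosses the highest bid; this discontinuity is the source of the nonconvexity of $R$.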

Although the univariate function $r$ is simple, the average revenue function $R$ can be extremely complicated, even for small problem instances. Figure 2 plots the average revenue with a single feature (i.e. $d = 1$) and samples randomly drawn from a log-normal distribution as specified in Section 4. As we can see in Figure 2, the average revenue function has many local maximizers and is discontinuous, even in this small-sample, univariate setting. This complexity is only exacerbated in the large-sample, multivariate case which is the focus of this paper.

### 1.1 Our Results

Our contribution in this work is threefold.

Hardness (Section 2). Our first main result builds on the intuition gleaned from Figure 2 to show that (1) is, indeed, a hard problem. In particular, we show that there is no algorithm that solves (1) in polynomial time unless the Exponential Time Hypothesis fails. The Exponential Time Hypothesis is a very popular assumption in computational complexity, and it is the basis of many hardness results [34, 42, 29, 13, 18, 36, 10, 1, 11]. This computational complexity assumption is based on the 3-SAT problem, a famous problem at the core of NP-completeness [31]. The Exponential Time Hypothesis states that 3-SAT cannot be solved in subexponential time in the worst case. In order to show this result, we reduce the classic $k$-densest subgraph problem to our problem.

New algorithms (Section 3). Knowing that there is no polynomial-time algorithm for solving (1), we model the problem exactly using Mixed-Integer Programming (MIP). MIP is an optimization methodology capable of modeling complex, nonconvex feasible regions, and which is widely used in practice. In particular, MIP allows us to exactly model the underlying discontinuous reward function, without relying on convex or continuous proxies which may be poor approximations or require careful hyperparameter tuning.

One issue with MIP is that it is not scalable beyond medium-sized instances (roughly speaking, we can potentially solve a MIP with hundreds to thousands of variables, but not with tens of thousands of variables). In order to deal with the large-scale problems in daily auctions, we propose a Linear Programming (LP) relaxation of our proposed MIP formulation. Modern LP solvers, such as Gurobi, are capable of solving very large LPs with millions of variables. The solution to the LP not only provides a valid upper bound on the optimal expected revenue, but can also lead to acceptable solutions to (1). On the other hand, we show that there exist pathological instances where the LP relaxation can produce arbitrarily bad bounds on the true optimal reward.

Computational validation (Section 4). Finally, we present a thorough computational study on both synthetic and real data. We start with a low-dimensional artificially generated data set where we observe that existing methods, while exhibiting low generalization error, are substantially outperformed by our MIP-based approaches. We also perform an analysis on a real data set comprised of eBay sports memorabilia auctions, where we observe a consistent improvement of our MIP-based methods over existing techniques. In both studies, we observe that our MIP formulation substantially outperforms the LP relaxation, its convex counterpart, suggesting the merit of using principled nonconvex approaches for this problem.

### 1.2 Related Work

Reserve Price Optimization Reserve price optimization has been widely studied in both academia and industry due to its critical role in online advertisement. A major difference between our setting and previous works on reserve price optimization is how the contextual information $w_i$ is utilized. Most previous theoretical works proceed under the assumption that the bidding prices come from a certain distribution, without consideration of contextual information. For example, [12] presents a regret minimization algorithm under the assumption that all bids are independently drawn from the same unknown distribution; [30] shows that a constant reserve is optimal when the distribution is known and satisfies certain regularity assumptions; and [2] studies the case when the buyers are strategic and would like to maximize their long-term surplus.

In practice, however, an Ad Exchange logs the contextual information of every auction and utilizes that to determine the future reserve price. For example, in a large field study at Yahoo! [39], the contextual information of auctions is used to learn the bidding distribution of buyers, which is then utilized to set future reserve prices. This is an indirect use of contextual information. In contrast, our optimization problem (1) builds a linear model for reserve price optimization by directly using the contextual information.

To the best of our knowledge, the only work which directly uses the contextual information to set the reserve price is that of Mohri and Medina [38]. In order to handle the discontinuity in the revenue function $r$, [38] presents a continuous piecewise linear surrogate function, and optimizes over this surrogate function using difference-of-convex programming. There are several difficulties with the method proposed in [38]: (i) it is highly non-trivial to tune the hyperparameter in the surrogate function, which controls both the closeness of the two problems and the difficulty of solving the surrogate problem; (ii) the global convergence of difference-of-convex programming is slow (requiring, e.g., a cutting plane or branch-and-bound method) and requires an exceedingly careful implementation [26]; and (iii) it can only find a local optimizer of the surrogate problem. In contrast, we directly solve the reserve price optimization problem (1) by mixed-integer programming.

Mixed-Integer Programming for Piecewise Linear Functions Mixed-integer programming has long been used to model piecewise linear functions in optimization problems arising in application areas as disparate as operations [16, 17, 33], analytics [7, 8], engineering [23, 24], and robotics [19, 20, 32, 37]. In this literature, our approach is most closely related to a recent stream of work applying mixed-integer programming to model high-dimensional piecewise linear functions arising as trained neural networks, for tasks such as verification and reinforcement learning [4, 3, 40, 41]. Moreover, there are incredibly sophisticated and mature implementations of algorithms for mixed-integer programming (i.e. solvers) that can reliably solve many instances of practical interest in reasonable time frames.

Hardness We study the hardness of the reserve price optimization problem (1) and show that it is impossible to solve this optimization problem in polynomial time unless the Exponential Time Hypothesis [27] fails. The Exponential Time Hypothesis is a very popular assumption in computational complexity, and it is the basis for many hardness results such as approximating the best Nash equilibrium [11], $k$-densest subgraph [10, 27], SVP [1], network design [14], and many others [34, 42, 29, 13, 18, 36].

## 2 Hardness

In this section we show the hardness of the reserve price optimization problem (1). Specifically, we show that it is not possible to solve this problem in polynomial time unless the Exponential Time Hypothesis fails. We prove this by showing that a polynomial time optimal algorithm for this problem implies a polynomial time constant approximation algorithm for the $k$-densest subgraph problem.

###### Definition 1 (k-densest subgraph problem).

In the $k$-densest subgraph problem, we are given a graph $G = (V, E)$, where $V$ represents the vertex set and $E$ represents the edge set. The goal is to find a subgraph $H = (V_H, E_H)$ of $G$ with $|V_H| = k$ that maximizes $|E_H|$.

In fact, there is no polynomial time constant-factor approximation algorithm for the $k$-densest subgraph problem unless the Exponential Time Hypothesis fails [36], and hence our reduction implies that there is no polynomial time algorithm for the reserve price optimization problem (1), unless the Exponential Time Hypothesis fails.
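For concreteness, Definition 1 can be made executable for tiny graphs by brute force. The following illustrative Python sketch (our own naming) enumerates all size-$k$ vertex subsets; its exponential running time is consistent with the hardness results discussed here.

```python
# Brute-force reference implementation of the k-densest subgraph problem
# (Definition 1), practical only for tiny graphs: it enumerates all
# C(|V|, k) vertex subsets and counts the edges each subset induces.
from itertools import combinations

def k_densest_subgraph(vertices, edges, k):
    """Return (vertex subset of size k, number of induced edges) maximizing
    the induced edge count."""
    best_set, best_edges = None, -1
    for subset in combinations(vertices, k):
        s = set(subset)
        count = sum(1 for (a, b) in edges if a in s and b in s)
        if count > best_edges:
            best_set, best_edges = s, count
    return best_set, best_edges
```

For example, on a triangle with a pendant vertex, the densest 3-vertex subgraph is the triangle itself.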

###### Theorem 1.

There is no polynomial time algorithm for the reserve price optimization problem (1), unless the Exponential Time Hypothesis fails.

###### Proof.

Let $G = (V_G, E_G)$ be an arbitrary input graph to the $k$-densest subgraph problem, where $V_G$ is the vertex set of the graph and $E_G$ is the edge set of the graph. We construct an input to the reserve price optimization problem (1) based on $G$, so that if it were possible to solve the reserve price optimization problem for this input in polynomial time, this would imply that it is possible to find a constant-factor approximate solution to the $k$-densest subgraph problem on $G$ in polynomial time. However, it is known that it is impossible to give such a polynomial time approximation algorithm for the $k$-densest subgraph problem unless the Exponential Time Hypothesis fails [36]. This implies that it is impossible to solve the reserve price optimization problem (1) in polynomial time unless the Exponential Time Hypothesis fails.

Next, we explain how to construct an input to the reserve price optimization problem (1) based on $G$. In the optimization problem we set $d = |V_G|$, so that each feature corresponds to a vertex of $G$. We have two types of impressions, as explained below.

• We have impressions , where .

• For each edge , we have one impression , where is a feature vector in which the components corresponding to and are , and all other components are .

First, we lower bound the optimal value of the optimization problem (1) for this input. Consider a densest subgraph $H$ of $G$, where $V_H$ is the vertex set of $H$ and $E_H$ is the edge set of $H$. We define $\beta_H$ to be a feature vector in which the features corresponding to the vertices of $H$ are set to one fixed value, and all other features are set to zero. Next we bound $R(\beta_H)$, which serves as a lower bound on the optimal value of the optimization problem (1).

Note that , and hence the contribution of each of the first type of impressions to is . Also, for each edge we have and hence the contribution of each of the second type of impressions corresponding to an edge in to is . Therefore, we have

$$R(\beta_H) = \frac{1}{n}\Big(k|V_G|^2 + 1.5|E_G| + 0.5|E_H|\Big). \qquad (3)$$

Next, we upper bound the optimal value of the optimization problem (1) for our input. Let $\beta$ be the vector that maximizes $R$. Note that if , the contribution of the first type of impressions is . This means that , which is a contradiction. Therefore, without loss of generality we can assume that .

Let $V_\beta$ be the set of vertices in $G$ with . Let $G_\beta$ be the subgraph of $G$ induced by $V_\beta$, with edge set $E_\beta$. Note that if for a vertex we have , then for each edge neighboring , we have . Therefore, we have

$$R(\beta) \le \frac{1}{n}\Big(k|V_G|^2 + 1.5|E_G| + 0.5|E_\beta|\Big). \qquad (4)$$

Now, we put inequalities (3) and (4) together to complete the proof. By the optimality of $\beta$ we have $R(\beta) \ge R(\beta_H)$. This together with inequalities (3) and (4) implies that . Moreover, recall that for every vertex in we have . Also, we have . Hence, we have . Given a graph with vertices, one can easily cover the edges with subgraphs of size . By the pigeonhole principle one of these subgraphs contains edges, and hence it is a constant-factor approximate solution to the $k$-densest subgraph problem. ∎

## 3 Mixed-Integer Programming Formulation

In this section, we develop a mixed-integer programming (MIP) formulation for solving (1), study its important computational properties, and discuss how it can be practically used to solve (1).

MIP is a common optimization methodology capable of modeling complex, nonconvex feasible regions. In general, a MIP formulation can model a set $S$ as

$$S = \left\{ x \;\middle|\; \exists\, y \in \mathbb{R}^m,\ z \in \mathbb{Z}^r \ \text{such that}\ (x, y, z) \in R \right\},$$

where $R$ is a polyhedron in the combined space of the $x$, $y$, and $z$ variables.

In order to model (1) with MIP, we first start with the graph of the revenue function $r$, which is defined as:

$$\operatorname{gr}\big(r(\cdot; b^{(1)}, b^{(2)});\, D\big) := \left\{ (v, y) \;\middle|\; v \in D,\ y = r(v; b^{(1)}, b^{(2)}) \right\}.$$

This set is not closed, due to the discontinuity of $r$ at the input $v = b^{(1)}$. However, it is straightforward to compute its closure.

###### Lemma 1.

The closure of $\operatorname{gr}(r(\cdot; b^{(1)}, b^{(2)}); D)$ is $S_1 \cup S_2 \cup S_3$, where

$$\begin{aligned} S_1 &= \left\{ (v, y) \in D \times \mathbb{R} \;\middle|\; y = b^{(2)},\ v \le b^{(2)} \right\} &\quad (5a)\\ S_2 &= \left\{ (v, y) \in D \times \mathbb{R} \;\middle|\; y = v,\ b^{(2)} \le v \le b^{(1)} \right\} &\quad (5b)\\ S_3 &= \left\{ (v, y) \in D \times \mathbb{R} \;\middle|\; y = 0,\ v \ge b^{(1)} \right\}. &\quad (5c) \end{aligned}$$

Moreover, working with the closure does not alter the optimization problem. That is, (1) can be reformulated as the following optimization problem:

$$\begin{aligned} \max_{\beta, v, y}\quad & \frac{1}{n}\sum_{i=1}^{n} y_i & &(6a)\\ \text{s.t.}\quad & v_i = w_i \cdot \beta & \forall i \in \llbracket n \rrbracket\quad &(6b)\\ & (v_i, y_i) \in \operatorname{cl}\big(\operatorname{gr}(r(\cdot; b^{(1)}_i, b^{(2)}_i);\, [l_i, u_i])\big) & \forall i \in \llbracket n \rrbracket\quad &(6c)\\ & \beta \in \mathcal{X}, & &(6d) \end{aligned}$$

where the bounds on the variables are computed as $l_i = \min_{\beta \in \mathcal{X}} w_i \cdot \beta$ and $u_i = \max_{\beta \in \mathcal{X}} w_i \cdot \beta$. The next proposition shows this formally.

###### Proposition 1.

If a point $(\beta, v, y)$ is an optimal solution for (6), then $\beta$ is an optimal solution for (1). Conversely, if $\beta$ is an optimal solution for (1), then there exists some $v$ and $y$ such that $(\beta, v, y)$ is an optimal solution for (6).

###### Proof.

First, we show that each optimal solution for (1) has a corresponding feasible point for (6) with equal objective value. Take some optimal $\beta$ for (1). Setting $v_i = w_i \cdot \beta$ for each $i$, the feasibility of $\beta$ (i.e. $\beta \in \mathcal{X}$) implies that $l_i \le v_i \le u_i$ from the definition of $l_i$ and $u_i$. Now take $y_i = r(v_i; b^{(1)}_i, b^{(2)}_i)$ for each $i$; clearly (6c) is satisfied. Therefore, $(\beta, v, y)$ is feasible for (6) and has objective value $R(\beta)$.

Next, we show that each optimal solution $(\beta, v, y)$ for (6) corresponds to a feasible point for (1) with the same objective value. Clearly $\beta$ is feasible for (1). Additionally, (6c) means that for each $i$, if $v_i \ne b^{(1)}_i$ then $y_i = r(v_i; b^{(1)}_i, b^{(2)}_i)$, whereas if $v_i = b^{(1)}_i$ then $y_i \in \{0, b^{(1)}_i\}$. As $b^{(1)}_i \ge 0$, the optimality of $(\beta, v, y)$ implies that we must have $y_i = b^{(1)}_i = r(v_i; b^{(1)}_i, b^{(2)}_i)$. Therefore, the objective value of $\beta$ in (1) is $\frac{1}{n}\sum_{i=1}^{n} y_i$, giving the result. ∎

Given the representation of the closure of the graph of $r$ as a union of three polyhedral sets in Lemma 1, we can now construct a mixed-integer programming formulation for (6c).

###### Proposition 2.

A valid MIP formulation for the constraint

$$(v, y) \in \operatorname{cl}\big(\operatorname{gr}(r(\cdot; b^{(1)}, b^{(2)});\, [l, u])\big) \qquad (7)$$

is:

$$\begin{aligned} y &\le b^{(2)} z_1 + b^{(1)} z_2 & &(8a)\\ y &\ge b^{(2)} (z_1 + z_2) & &(8b)\\ y &\le v + (b^{(2)} - l) z_1 - b^{(1)} z_3 & &(8c)\\ y &\ge v - u z_3 & &(8d)\\ l &\le v \le u & &(8e)\\ 1 &= z_1 + z_2 + z_3 & &(8f)\\ z &\in [0, 1]^3 & &(8g)\\ z &\in \mathbb{Z}^3. & &(8h) \end{aligned}$$
###### Proof.

Suppose $(v, y, z)$ is feasible for (8). It follows from (8f)-(8h) that exactly one component of $z$ is equal to one, with the other two components equal to zero. We now consider each of these three cases.

If $z_1 = 1$, then the constraints (8a)-(8e) reduce to $y \le b^{(2)}$, $y \ge b^{(2)}$, $y \le v + b^{(2)} - l$, $y \ge v$, and $l \le v \le u$. Plugging the equation $y = b^{(2)}$ into the second pair of inequalities yields $l \le v \le b^{(2)}$, which are the constraints defining $S_1$.

If $z_2 = 1$, the constraints (8a)-(8e) reduce to $y \le b^{(1)}$, $y \ge b^{(2)}$, $y \le v$, $y \ge v$, and $l \le v \le u$, which together give $y = v$ and $b^{(2)} \le v \le b^{(1)}$, i.e. they are equivalent to the constraints defining $S_2$.

Finally, if $z_3 = 1$, the constraints (8a)-(8e) reduce to $y \le 0$, $y \ge 0$, $y \le v - b^{(1)}$, $y \ge v - u$, and $l \le v \le u$. Plugging the equation $y = 0$ into the pair of inequalities yields $b^{(1)} \le v \le u$, i.e. the constraints are equivalent to those defining $S_3$. ∎
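The case analysis above can also be checked mechanically. The following illustrative Python sketch (our own naming, not the paper's implementation) encodes the constraints (8a)-(8e) with $z$ fixed to a unit vector, and compares the resulting feasible set against the pieces $S_1$, $S_2$, $S_3$ of Lemma 1:

```python
# Check that formulation (8), with z fixed to each unit vector, carves out
# exactly the corresponding piece of the closed graph of r (Proposition 2).
# Illustrative sketch; small tolerances guard against float noise.

def mip_feasible(v, y, z, b1, b2, l, u, tol=1e-9):
    """Constraints (8a)-(8e) for a fixed z = (z1, z2, z3)."""
    z1, z2, z3 = z
    return (y <= b2 * z1 + b1 * z2 + tol                 # (8a)
            and y >= b2 * (z1 + z2) - tol                # (8b)
            and y <= v + (b2 - l) * z1 - b1 * z3 + tol   # (8c)
            and y >= v - u * z3 - tol                    # (8d)
            and l - tol <= v <= u + tol)                 # (8e)

def on_piece(v, y, piece, b1, b2, l, u, tol=1e-9):
    """Membership in S1, S2, or S3 from Lemma 1 (with D = [l, u])."""
    if piece == 1:   # S1: y = b2, l <= v <= b2
        return abs(y - b2) <= tol and l - tol <= v <= b2 + tol
    if piece == 2:   # S2: y = v, b2 <= v <= b1
        return abs(y - v) <= tol and b2 - tol <= v <= b1 + tol
    return abs(y) <= tol and b1 - tol <= v <= u + tol   # S3: y = 0, b1 <= v <= u
```

For instance, with $b^{(1)} = 2$, $b^{(2)} = 1$, $l = 0$, and $u = 3$, fixing $z = (1, 0, 0)$ admits exactly the points with $y = b^{(2)} = 1$ and $0 \le v \le 1$, which is $S_1$.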

Piecing it all together, we can present a MIP formulation for the original problem (1).

###### Corollary 1.

Take $F(b^{(1)}, b^{(2)}, l, u)$ as the set of all points $(v, y, z)$ feasible for (8), given the data $b^{(1)}$, $b^{(2)}$, $l$, and $u$. Then (1) is equivalent to

$$\begin{aligned} \max_{\beta, v, y}\quad & \frac{1}{n}\sum_{i=1}^{n} y_i & &(9a)\\ \text{s.t.}\quad & v_i = w_i \cdot \beta & \forall i \in \llbracket n \rrbracket\quad &(9b)\\ & (v_i, y_i) \in F(b^{(1)}_i, b^{(2)}_i, l_i, u_i) & \forall i \in \llbracket n \rrbracket\quad &(9c)\\ & \beta \in \mathcal{X}, & &(9d) \end{aligned}$$

in the sense that: (i) if $(\beta, v, y)$ is an optimal solution to (9), then $\beta$ is an optimal solution to (1), and (ii) if $\beta$ is an optimal solution to (1), then there exists some $v$ and $y$ such that $(\beta, v, y)$ is an optimal solution to (9).

In the rest of this section, we discuss different aspects of our MIP formulation (8) and its LP relaxation, and in particular how we can utilize it in practice.

### The Tightness of Formulation (8)

In general, there will exist many different possible MIP formulations for a given set. One way to measure the quality of a MIP formulation is by inspecting how tightly the LP relaxation approximates the underlying nonconvex set, as MIP formulations with tight relaxations are likely to solve much more quickly than those with looser relaxations [43]. The tightest possible MIP formulation is an ideal formulation, where the extreme points of the LP relaxation are integral. The next proposition shows that (8) is an ideal formulation of set (7).

###### Proposition 3.

The MIP formulation (8) is an ideal formulation for (7), in the sense that the linear programming relaxation (8a)-(8g) is a description of the convex hull of all points $(v, y, z)$ feasible for (8).

###### Proof.

Take $P$ as the set of all $(v, y, z)$ feasible for (8). Using Lemma 1, we can infer that $P = (S_1 \times \{e^1\}) \cup (S_2 \times \{e^2\}) \cup (S_3 \times \{e^3\})$, where $e^i$ is the $i$-th unit vector of all zeros except a 1 in the $i$-th coordinate. Therefore, $P$ can be expressed as a finite union of bounded polyhedra. Applying techniques due to Balas [5, 6], we can write a lifted representation for the convex hull of $P$, i.e. one with auxiliary $v^i$ and $y^i$ variables:

$$\begin{aligned} v &= \sum_{i=1}^{3} v^i & &(10a)\\ y &= \sum_{i=1}^{3} y^i & &(10b)\\ y^1 &= b^{(2)} z_1 & &(10c)\\ l z_1 &\le v^1 \le b^{(2)} z_1 & &(10d)\\ y^2 &= v^2 & &(10e)\\ b^{(2)} z_2 &\le v^2 \le b^{(1)} z_2 & &(10f)\\ y^3 &= 0 & &(10g)\\ b^{(1)} z_3 &\le v^3 \le u z_3 & &(10h)\\ 1 &= z_1 + z_2 + z_3 & &(10i)\\ z &\in [0, 1]^3. & &(10j) \end{aligned}$$

Moreover, if $Q$ is the set of all points feasible for (10), it is known that $\operatorname{Proj}_{v, y, z}(Q) = \operatorname{Conv}(P)$, i.e. the orthogonal projection eliminating the auxiliary variables $v^i$ and $y^i$ yields the convex hull of the set of interest $P$. Therefore, the result follows by explicitly computing this projection, yielding a system of linear constraints equivalent to the LP relaxation of (8), i.e. (8a)-(8g).

Use the three equations (10c), (10e), and (10g) to eliminate the $y^i$ variables. Then we may use the remaining equations (10a)-(10b) to eliminate $v^1$ and $v^2$, leaving the system

$$\begin{aligned} l z_1 &\le v - y + b^{(2)} z_1 - v^3 \le b^{(2)} z_1\\ b^{(2)} z_2 &\le y - b^{(2)} z_1 \le b^{(1)} z_2\\ b^{(1)} z_3 &\le v^3 \le u z_3\\ 1 &= z_1 + z_2 + z_3\\ z &\in [0, 1]^3. \end{aligned}$$

We may then apply the Fourier–Motzkin elimination procedure (e.g. [15, Chapter 3.1]) to project out the last remaining auxiliary variable $v^3$, giving the result. ∎

### The feasible region $\mathcal{X} = [L, U]^d$

While the statement of the problem (1) constrains the model parameters $\beta$ to lie within a bounded hypercube $\mathcal{X}$, it may be difficult to infer the correct size of this domain a priori. To illustrate this, we present a simple low-dimensional family of instances where the problem data remains bounded in magnitude, but nevertheless the magnitude of the optimal model parameters goes to infinity.

###### Proposition 4.

Fix the number of samples and features, and consider $\mathcal{X} = \mathbb{R}^d$, i.e. the unbounded variant of (1). There exists a sequence of instances where the problem data is bounded in magnitude by one, and yet the magnitude of the unique optimal solution to (1) grows arbitrarily large.

###### Proof.

Parameterize the sequence of instances by . For each , define , , , and . Note that , and so all the problem data is bounded in magnitude by one. The unique optimal solution to (1) is , giving the result. ∎

In other words, we cannot bound the magnitude of the components of an optimal solution solely as a function of $n$, $d$, and the magnitude of the data. However, due to existential representability results [28], applying MIP formulation techniques to model (1) will require a bounded domain for the model parameters. To circumvent this, we model the magnitude of the bounding box as a hyperparameter, and tune it using a validation data set. This is the same approach taken in the difference-of-convex algorithm due to Mohri and Medina [38].

### How to use the MIP formulation in practice

In general, mixed-integer programming encompasses a difficult class of problems, in both a theoretical and a practical sense. Nonetheless, there exist exceedingly mature, robust, and sophisticated solvers that are often capable of producing high-quality solutions and proofs of optimality for problems of practical interest. These implementations use a variant of branch-and-bound (e.g. [15, Chapter 1.2]), which performs an enumerative tree search in an efficient manner. However, the solver can be terminated before the search has been exhausted (and optimality proven), and will return the best solution found. In Section 4, we present two variants of a MIP-based algorithm that use this basic property. The first terminates the algorithm after a pre-specified time budget is exceeded. The second terminates the solver at the root node, before the enumerative procedure begins. Up to this point, the solver will have run a bevy of heuristic methods to generate solutions and strengthen its LP relaxation, but will not have begun its enumerative tree search procedure. Crucially, these heuristics rely on the knowledge that the underlying model is a MIP to produce better solutions and tighter relaxations than are possible with a pure linear programming model like the LP relaxation.

Our MIP formulation (8) comprises two types of constraints: linear equality or inequality constraints (8a)-(8g), and integrality constraints (8h). The linear programming relaxation comprises only the linear constraints, and provides a valid dual upper bound on the optimal reward. Furthermore, for this particular problem, each feasible solution for the linear programming relaxation corresponds to a feasible solution for the original problem (1).

###### Proposition 5.

Take $W(b^{(1)}, b^{(2)}, l, u)$ as the set of all points feasible for the linear programming relaxation (8a)-(8g), given the data $b^{(1)}$, $b^{(2)}$, $l$, and $u$. Then a linear programming relaxation for (1) is

$$\begin{aligned} \max_{\beta, v, y}\quad & \frac{1}{n}\sum_{i=1}^{n} y_i & &(11a)\\ \text{s.t.}\quad & v_i = w_i \cdot \beta & \forall i \in \llbracket n \rrbracket\quad &(11b)\\ & (v_i, y_i) \in W(b^{(1)}_i, b^{(2)}_i, l_i, u_i) & \forall i \in \llbracket n \rrbracket\quad &(11c)\\ & \beta \in \mathcal{X}, & &(11d) \end{aligned}$$

in the sense that the optimal reward of (11) upper bounds the reward of any feasible solution for (1). Moreover, for any feasible solution $(\beta, v, y, z)$ to (11), $\beta$ is a feasible solution to (1).

###### Proof.

The bound on objective values follows immediately from Proposition 1 and the fact that $F(b^{(1)}, b^{(2)}, l, u) \subseteq W(b^{(1)}, b^{(2)}, l, u)$ for any choice of data. Additionally, as (1) only constrains $\beta \in \mathcal{X}$, (11d) gives the feasibility result. ∎

Therefore, a third approach to solve (1) is simply to solve the linear programming relaxation. Linear programming problems can be solved in polynomial time, and there exist algorithms that can very efficiently solve large scale problem instances. Therefore, the approach of Proposition 5 can be applied to very large scale instances of the problem (1).

### The quality of the LP relaxation

As shown in Proposition 5, the linear programming relaxation offers an alternative approach for heuristically solving the problem (1). Roughly, the quality of the resulting solution will depend on the strength of the relaxation, i.e. how closely it approximates the convex hull of all feasible points for the MIP formulation (9). Additionally, modern MIP solvers depend heavily on the quality of this relaxation to converge quickly by pruning large swaths of the search tree in the hopes of keeping computation times manageable.

A straightforward corollary of Proposition 3 is that, if $n = 1$, the LP relaxation (11) is exact, and so exactly represents the convex hull of feasible points for (9). Unfortunately, the composition of ideal formulations will, in general, fail to be ideal. In this subsection, we show that when $n$ is permitted to grow, the LP relaxation (11) can be of arbitrarily poor quality.

###### Proposition 6.

There exists a family of instances of (1), parameterized by the sample size $n$, where the true optimal reward in (1) decreases as $O(1/n)$, but the optimal reward for the LP relaxation (11) is at least $1$.

###### Proof.

Consider the following problem instance parameterized by a positive integer $T$. Take , , and . Furthermore, for each , define , , and . Similarly, for each , define , , and . From inspection, we can observe that for any , there is at most one with . Therefore, we can infer that the optimal reward for (1) is , which can be attained by setting for any .

In contrast, the LP relaxation bound can be bounded below by a constant. By projecting out the auxiliary variables from the LP relaxation (8a-8g), we can compute that the convex hull of is

$$Q(l, u) := \left\{ (v, y) \in [l, u] \times \mathbb{R}_{\ge 0} \;\middle|\; y \le \frac{1}{1 - l}(v - l),\ y \le \frac{1}{u - 1}(u - v) \right\}.$$

Furthermore, for each we can compute valid bounds on as and . Similarly, valid bounds for each are and . Piecing it all together, we now fix , which due to (11b) will fix for each . Accordingly, the largest value we may set such that is . Similarly, the maximum allowed value for each such that (11c) is satisfied is . The reward at this LP feasible point is then

$$\frac{1}{n}\sum_{i=1}^{n}\big(y_{+,i} + y_{-,i}\big) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{T}{T+i} + \frac{T}{T+i}\right) = \sum_{i=1}^{n}\frac{1}{T+i} \ge \sum_{i=1}^{n}\frac{1}{2T} = 1. \qquad ∎$$

## 4 Computational study

In this section, we perform a computational study on the efficacy of our proposed methods on both synthetic data and real data.

### 4.1 Implementation Details

#### Methods

Throughout, we compare six methods:

1. CP: – This is an optimal constant reserve price policy (i.e., set the reserve price as a constant for all samples, without using contextual information). It is used as a benchmark to measure the gain from contextual information.

2. LP: (11) – The linear programming relaxation of (9).

3. MIP: (9) – The MIP formulation terminated after a time limit (to be specified in subsequent subsections).

4. MIP-R: (9) – The MIP formulation, terminated at the root node.

5. DC: The difference-of-convex algorithm of Mohri and Medina [38].

6. UB: – This is a perfect information upper bound equal to the average first bid price. This is the largest reward that can possibly be garnered from the auction. Note that this may be quite a loose upper bound, as in general there will not exist a linear model capable of setting such reserve prices given the contextual information.
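The CP benchmark above can be computed exactly: as a function of a single constant price $p$, the average reward is piecewise linear with breakpoints only at the observed bids, so it suffices to evaluate it at every $b^{(1)}_i$ and $b^{(2)}_i$. A hedged Python sketch (our own naming, not the paper's implementation):

```python
# The CP benchmark: the best single reserve price p applied to every sample.
# Since the average reward is piecewise linear in p with breakpoints only at
# the observed bids, evaluating p at each bid value finds an exact optimum.

def auction_revenue(p, b1, b2):
    """Revenue of one auction under constant reserve p (equation (2))."""
    if p <= b2:
        return b2
    if p <= b1:
        return p
    return 0.0

def optimal_constant_price(bids):
    """bids: list of (b1_i, b2_i) pairs; returns (best price, average reward)."""
    candidates = {b for pair in bids for b in pair}
    best = max(candidates,
               key=lambda p: sum(auction_revenue(p, b1, b2) for b1, b2 in bids))
    avg = sum(auction_revenue(best, b1, b2) for b1, b2 in bids) / len(bids)
    return best, avg
```

For example, with bids $[(2, 1), (2, 1), (0.5, 0.2)]$ the best constant price is $2$: it sacrifices the cheapest impression but extracts the full top bid from the other two.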

#### Hyperparameter Tuning

The DC, LP, MIP-R, and MIP algorithms require that the model domain $\mathcal{X}$ is explicitly specified. We utilize cross validation to tune the size of this domain; this cross validation step is the same as is done by Mohri and Medina [38].

Additionally, the DC algorithm utilizes a continuous piecewise linear function to approximate the discontinuous reward function $r$. Thus, it requires another hyperparameter for the “slope” of the linear approximation. We perform the same cross-validation on this hyperparameter as suggested in Mohri and Medina [38].

#### Evaluation

For each experiment, we report the average reward (i.e. $R(\beta)$) of the final model from each algorithm on both the train and test data sets. Additionally, we report the proportion of sold impressions, namely, the proportion of impressions for which the set reserve price is below the highest bid price. Finally, we use the “gap closed” metric to measure the improvement of our proposed MIP algorithm over DC, the best existing algorithm from the literature. Mathematically, we compute the gap closed as $(\mathrm{MIP} - \mathrm{DC}) / (\mathrm{UB} - \mathrm{DC})$, where in an abuse of notation we use the algorithm names to denote their respective rewards. Note that UB serves as an upper bound on the best possible linear model, which can be a conservative estimate.
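For concreteness, the gap closed metric, under the natural reading of the description above (the improvement of MIP over DC, normalized by the remaining gap between DC and UB; the exact formula is our assumption), can be written as:

```python
# "Gap closed" by MIP over the DC baseline, relative to the upper bound UB.
# The formula (MIP - DC) / (UB - DC) is our reading of the metric described
# in the text; a value of 1.0 would mean MIP reaches the upper bound.

def gap_closed(mip_reward, dc_reward, ub_reward):
    return (mip_reward - dc_reward) / (ub_reward - dc_reward)
```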

#### Implementation

We implement our experiment in Julia [9], using JuMP [21, 35] for modeling the MIP, MIP-R, LP, and DC formulations. We use Gurobi v8.1.1 [25] to solve the optimization problems underlying MIP, MIP-R, LP, and DC. We intend to open-source our implementation of the methods in the near future.

### 4.2 Synthetic data

#### Data Generation.

Here we describe how we generate our synthetic data . First, the feature vectors are generated i.i.d. from a Gaussian distribution with identity covariance matrix, i.e., , normalized so that . In order to generate the bidding prices and , we assume there are two buyers, and they have underlying generative parameters and , such that their bids come from log-normal distributions as and , where controls the signal-to-noise ratio of the log-normal distribution. We then set and , where is a dilation factor to enlarge the difference between and . Moreover, the underlying parameters and of the two buyers should be correlated, since the bidding prices for high-valued slots should be high for all buyers. In order to model this, we set and , where and controls the correlation between and .

Overall, we have three parameters in the data generation process: one controls the signal-to-noise level of the model, one controls the similarity between the two buyers, and one controls the degree of flexibility the seller has when setting a reserve price.
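The scheme above can be sketched in code as follows. This is an illustrative Python sketch, not the generator used in our experiments (which is written in Julia); the parameter names `sigma` (noise level), `rho` (buyer correlation), and `gamma` (dilation factor), along with the exact functional form, are our assumptions for illustration.

```python
# Hedged sketch of a synthetic data generator in the spirit of the scheme
# described above: normalized Gaussian features, two buyers with correlated
# generative parameters, log-normal bids, and a dilated top bid.
import math
import random

def generate_data(n, d, sigma=0.5, rho=0.5, gamma=1.2, seed=0):
    rng = random.Random(seed)
    # Correlated generative parameters for the two buyers (rho in [0, 1]).
    base = [rng.gauss(0, 1) for _ in range(d)]
    beta1 = base
    beta2 = [rho * bj + (1 - rho) * rng.gauss(0, 1) for bj in base]
    samples = []
    for _ in range(n):
        w = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        w = [wj / norm for wj in w]  # normalize the feature vector
        # Log-normal bids driven by the linear scores of the two buyers.
        bid1 = math.exp(sum(a * b for a, b in zip(w, beta1)) + sigma * rng.gauss(0, 1))
        bid2 = math.exp(sum(a * b for a, b in zip(w, beta2)) + sigma * rng.gauss(0, 1))
        hi, lo = max(bid1, bid2), min(bid1, bid2)
        samples.append((w, gamma * hi, lo))  # dilate the top bid so b1 >= b2
    return samples
```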

#### Experimentation

We fix features, training samples, along with test and validation data sets comprising 5000 samples each. We first set a “baseline” configuration for our generative model with , , and . To explore the robustness of our model to changes in the data generation scheme, we then study three variants of this baseline with “high noise” (), “low correlation” (), and “low margin” (). For each of these four parameter settings, we present aggregate results over three trials in Tables 1-4. In these experiments, we use a time limit for MIP of 3 minutes.

In all four experiments, we observe that MIP offers a considerable improvement over DC. On the baseline configuration, MIP closes an average of 80.82% of the gap left by DC on the training set, and 80.04% of the gap remaining on the test set. Unsurprisingly, the high noise configuration leads to degradation of performance with respect to the perfect information upper bound, but MIP is still able to close 57.5% and 56.87% of the gap on the training and test data sets, respectively. The low correlation configuration sees MIP closing 72.68% and 71.76% of the remaining gap on training and test data sets, respectively, while on the low margin configuration MIP closes 56.76% of the training gap and 54.67% of the testing gap.

While MIP-R does not quite attain the same level of performance as MIP, it is quite close and still handily outperforms DC both in- and out-of-sample. The LP method also outperforms DC on three of the four experiments, albeit by a smaller margin. Indeed, the DC algorithm is unable to recover the performance of the constant policy that completely disregards contextual information on three of the four experiments, despite the fact that its model leads to a sale on nearly every impression. In other words, the DC model fails by not setting reserve prices aggressively enough. In contrast, the LP algorithm sets reserve prices too aggressively, leading to a model that successfully completes an auction on only slightly more than half of all impressions. The MIP and MIP-R methods both attain proportions sold near 1 while also attaining very high reward. This indicates that they are not exploiting a small number of impressions that garner a high reward, but instead are intelligently setting a reserve price policy that captures excess reward across the population, without setting prices so aggressively that many impressions fail to sell.

### 4.3 eBay auctions for sports memorabilia

In this section, we turn our attention to a real data set. In particular, to illustrate the performance of our algorithms, we utilize a published medium-size eBay data set comprising sports memorabilia auctions, chosen for reproducibility. The data set is provided by Jay Grossman and was subsequently studied in the context of reserve price optimization in [38].5 The features in the data set include seller information (e.g. seller rating and seller location), as well as item information. We refer the reader to [38] for a more detailed description of the data set. Finally, we set a time limit of 5 minutes for MIP, and note that we preprocess the data by normalizing the bidding prices by the mean of the first prices.
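The normalization step can be sketched as follows (a hypothetical helper; we assume "first prices" refers to the highest bid of each auction):

```python
def normalize_bids(first_bids, second_bids):
    """Rescale both bid columns by the mean of the first (highest) bids,
    so that prices across auctions are on a common unit scale."""
    scale = sum(first_bids) / len(first_bids)
    return ([b / scale for b in first_bids],
            [b / scale for b in second_bids])
```

After this step the first bids have mean 1, which puts rewards from different random splits of the data on a comparable scale.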

Table 5 and Table 6 depict the average and the confidence interval of the cumulative reward and of the proportion sold using the different algorithms on both the training and testing data sets, over a number of random runs. In both, we use 2000 randomly selected samples from the data set for testing and validation. In Table 5, we train using 2000 randomly selected samples, while in Table 6 we utilize 5000 training samples.

In Table 5, MIP outperforms all other methods, producing the best performing models as measured on both the training and testing data sets. The DC algorithm is the next best performer, producing higher quality models than both LP and MIP-R. Indeed, MIP closes 7.39% of the gap left by DC on the training data set, with respect to the conservative UB upper bound. However, due to a lack of generalization, this number shrinks considerably to 1.66% on the test data set. DC clearly has a smaller generalization gap; one plausible explanation is the additional hyperparameters tuned over in the DC method. Moreover, we emphasize that these gaps are computed based on a conservative upper bound (i.e., UB) which, as observed in Section 4.2, may be quite loose.

In order to understand the behavior of the algorithms in a larger data context, we increase the training data sample size to 5000 and repeat the eBay experiments. The results are depicted in Table 6. While the ranking of the algorithms remains the same, MIP is able to extract more information from the larger data set: not only does the training reward grow, but the resulting models also generalize much more successfully to the testing data set. In contrast, the DC algorithm appears unable to exploit the extra available data, with training and testing rewards that remain nearly identical to those in the previous experiment. Indeed, MIP is able to close 9.11% of the remaining gap on the training data set, and 7.01% on the testing data set.

Comparing Table 5 and Table 6, we can clearly see that the difference in reward produced by MIP between the training and testing data sets decreases as the number of samples increases. This is intuitively consistent with what could be expected from a learning theory analysis, and we expect that this gap will likely keep shrinking in the "big data" regime as we further enlarge the training sample size.

## 5 Conclusion

In this paper, we study linear models for reserve price optimization in a second-price auction. We first show that this is indeed a hard problem: unless the Exponential Time Hypothesis fails, there is no polynomial time optimal algorithm. Then we propose a mixed-integer programming formulation that exactly models this problem, and we show that this formulation is ideal (i.e. the strongest possible formulation) for the revenue function. Since it can be computationally expensive to exactly solve the mixed-integer program, we study the performance of its linear programming relaxation. Unfortunately, we provide a counterexample showing that, in the worst case, the objective gap between the linear programming relaxation and the true problem can scale linearly in the number of samples. Finally, we present a computational study of our methods on both synthetic and real datasets, showcasing the advantages of our proposed methods.
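For concreteness, the nonconvex and discontinuous per-sample revenue function that the formulation models exactly can be written as follows (notation assumed from context: β is the linear model, x_i the features, and b_i^1 ≥ b_i^2 the top two bids, under the standard second-price-with-reserve rule):

```latex
r(\beta;\, x_i, b_i^1, b_i^2) =
\begin{cases}
b_i^2, & \text{if } \langle \beta, x_i \rangle \le b_i^2, \\
\langle \beta, x_i \rangle, & \text{if } b_i^2 < \langle \beta, x_i \rangle \le b_i^1, \\
0, & \text{if } \langle \beta, x_i \rangle > b_i^1.
\end{cases}
```

The discontinuity at ⟨β, x_i⟩ = b_i^1, where revenue drops from b_i^1 to 0, is what makes the problem both hard and ill-suited to convex relaxation.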

### Footnotes

4. We note that the dilation factor is similar to the scaling of the linear functions in the synthetic data generative model of [38].
5. The dataset can be accessed at https://cims.nyu.edu/~munoz/data/.

### References

1. D. Aggarwal and N. Stephens-Davidowitz (2018) (Gap/S)ETH hardness of SVP. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 228–238. Cited by: §1.1, §1.2.
2. K. Amin, A. Rostamizadeh and U. Syed (2013) Learning prices for repeated auctions with strategic buyers. In Advances in Neural Information Processing Systems, pp. 1169–1177. Cited by: §1.2.
3. R. Anderson, J. Huchette, W. Ma, C. Tjandraatmadja and J. P. Vielma (To appear) Strong mixed-integer programming formulations for trained neural networks. Mathematical Programming. Cited by: §1.2.
4. R. Anderson, J. Huchette, C. Tjandraatmadja and J. P. Vielma (2019) Strong mixed-integer programming formulations for trained neural networks. In Proceedings of the 20th Conference on Integer Programming and Combinatorial Optimization, A. Lodi and V. Nagarajan (Eds.), Cham, pp. 27–42. Note: https://arxiv.org/abs/1811.08359 Cited by: §1.2.
5. E. Balas (1985) Disjunctive programming and a hierarchy of relaxations for discrete optimization problems. SIAM Journal on Algorithmic Discrete Methods 6 (3), pp. 466–486. Cited by: §3.
6. E. Balas (1998) Disjunctive programming: Properties of the convex hull of feasible points. Discrete Applied Mathematics 89, pp. 3–44. Cited by: §3.
7. D. Bertsimas and J. Dunn (2017-07) Optimal classification trees. Machine Learning 106 (7), pp. 1039–1082. Cited by: §1.2.
8. D. Bertsimas and A. King (2015) An algorithmic approach to linear regression. Operations Research 64 (1), pp. 2–16. Cited by: §1.2.
9. J. Bezanson, A. Edelman, S. Karpinski and V. B. Shah (2017) Julia: A fresh approach to numerical computing. SIAM Review 59 (1), pp. 65–98. Cited by: §4.1.4.
10. M. Braverman, Y. K. Ko, A. Rubinstein and O. Weinstein (2017) ETH hardness for densest-k-subgraph with perfect completeness. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1326–1341. Cited by: §1.1, §1.2.
11. M. Braverman, Y. K. Ko and O. Weinstein (2014) Approximating the best Nash equilibrium in n^{o(log n)}-time breaks the exponential time hypothesis. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 970–982. Cited by: §1.1, §1.2.
12. N. Cesa-Bianchi, C. Gentile and Y. Mansour (2014) Regret minimization for reserve prices in second-price auctions. IEEE Transactions on Information Theory 61 (1), pp. 549–564. Cited by: §1.2.
13. Y. Chen, K. Eickmeyer and J. Flum (2012) The exponential time hypothesis and the parameterized clique problem. In International Symposium on Parameterized and Exact Computation, pp. 13–24. Cited by: §1.1, §1.2.
14. R. H. Chitnis, H. Esfandiari, M. Hajiaghayi, R. Khandekar, G. Kortsarz and S. Seddighin (2014) A tight algorithm for strongly connected steiner subgraph on two terminals with demands. In International Symposium on Parameterized and Exact Computation, pp. 159–171. Cited by: §1.2.
15. M. Conforti, G. Cornuéjols and G. Zambelli (2014) Integer programming. Springer. Cited by: §3, §3.
16. K. L. Croxton, B. Gendron and T. L. Magnanti (2003-09) A comparison of mixed-integer programming models for nonconvex piecewise linear cost minimization problems. Management Science 49 (9), pp. 1268–1273. Cited by: §1.2.
17. K. L. Croxton, B. Gendron and T. L. Magnanti (2007-January-February) Variable disaggregation in network flow problems with piecewise linear costs. Operations Research 55 (1), pp. 146–157. Cited by: §1.2.
18. M. Cygan, F. V. Fomin, Ł. Kowalik, D. Lokshtanov, D. Marx, M. Pilipczuk, M. Pilipczuk and S. Saurabh (2015) Lower bounds based on the exponential-time hypothesis. In Parameterized Algorithms, pp. 467–521. Cited by: §1.1, §1.2.
19. R. Deits and R. Tedrake (2014) Footstep planning on uneven terrain with mixed-integer convex optimization. In 2014 14th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pp. 279–286. Cited by: §1.2.
20. R. Deits and R. Tedrake (2015) Efficient mixed-integer planning for UAVs in cluttered environments. In IEEE International Conference on Robotics and Automation, pp. 42–49. Cited by: §1.2.
21. I. Dunning, J. Huchette and M. Lubin (2017) JuMP: A modeling language for mathematical optimization. SIAM Review 59 (2), pp. 295–320. Cited by: §4.1.4.
22. D. Easley and J. Kleinberg (2010) Networks, crowds, and markets. Vol. 8, Cambridge university press Cambridge. Cited by: §1.
23. A. Fügenschuh, C. Hayn and D. Michaels (2014) Mixed-integer linear methods for layout-optimization of screening systems in recovered paper production. Optimization and Engineering 15, pp. 533–573. Cited by: §1.2.
24. T. Graf, P. V. Hentenryck, C. Pradelles-Lasserre and L. Zimmer (1990) Simulation of hybrid circuits in constraint logic programming. Computers and Mathematics with Applications 20 (9–10), pp. 45–56. Cited by: §1.2.
25. Gurobi Optimization, LLC (2020) Gurobi optimizer reference manual. Cited by: §4.1.4.
26. R. Horst and N. V. Thoai (1999) DC programming: overview. Journal of Optimization Theory and Applications 103 (1), pp. 1–43. Cited by: §1.2.
27. R. Impagliazzo and R. Paturi (2001) On the complexity of k-SAT. Journal of Computer and System Sciences 62 (2), pp. 367–375. Cited by: §1.2.
28. R.G. Jeroslow and J.K. Lowe (1984) Modelling with integer variables. Mathematical Programming Study 22, pp. 167–184. Cited by: §3.
29. P. Jonsson, V. Lagerkvist, G. Nordh and B. Zanuttini (2013) Complexity of SAT problems, clone theory and the exponential time hypothesis. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1264–1277. Cited by: §1.1, §1.2.
30. Y. Kanoria and H. Nazerzadeh (2017) Dynamic reserve prices for repeated auctions: learning from bids. Available at SSRN 2444495. Cited by: §1.2.
31. R. M. Karp (1975) On the computational complexity of combinatorial problems. Networks 5 (1), pp. 45–68. Cited by: §1.1.
32. S. Kuindersma, R. Deits, M. Fallon, A. Valenzuela, H. Dai, F. Permenter, T. Koolen, P. Marion and R. Tedrake (2016) Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot. Autonomous Robots 40 (3), pp. 429–455. Cited by: §1.2.
33. H. Liu and D. Z.W. Wang (2015-02) Global optimization method for network design problem with stochastic user equilibrium. Transportation Research Part B: Methodological 72, pp. 20–39. Cited by: §1.2.
34. D. Lokshtanov, D. Marx and S. Saurabh (2011) Lower bounds based on the exponential time hypothesis. Bulletin of the EATCS (105), pp. 41–72. Cited by: §1.1, §1.2.
35. M. Lubin and I. Dunning (2015-Spring) Computing in operations research using Julia. INFORMS Journal on Computing 27 (2), pp. 238–248. Cited by: §4.1.4.
36. P. Manurangsi (2017) Almost-polynomial ratio eth-hardness of approximating densest k-subgraph. In STOC, pp. 954–961. Cited by: §1.1, §1.2, §2, §2.
37. D. Mellinger, A. Kushleyev and V. Kumar (2012) Mixed-integer quadratic program trajectory generation for heterogeneous quadrotor teams. In IEEE International Conference on Robotics and Automation, pp. 477–483. Cited by: §1.2.
38. M. Mohri and A. M. Medina (2016) Learning algorithms for second-price auctions with reserve. The Journal of Machine Learning Research 17 (1), pp. 2632–2656. Cited by: §1.2, §3, item 5, §4.1.2, §4.1.2, §4.3, footnote 2.
39. M. Ostrovsky and M. Schwarz (2011) Reserve prices in internet advertising auctions: a field experiment. EC 11, pp. 59–60. Cited by: §1.2.
40. M. Ryu, Y. Chow, R. Anderson, C. Tjandraatmadja and C. Boutilier (2019) CAQL: Continuous action Q-learning. Note: https://arxiv.org/abs/1909.12397 Cited by: §1.2.
41. V. Tjeng, K. Xiao and R. Tedrake (2019) Verifying neural networks with mixed integer programming. In International Conference on Learning Representations, Cited by: §1.2.
42. V. Vassilevska Williams (2015) Hardness of easy problems: basing hardness on popular conjectures such as the strong exponential time hypothesis (invited talk). In 10th International Symposium on Parameterized and Exact Computation (IPEC 2015), Cited by: §1.1, §1.2.
43. J. P. Vielma (2015) Mixed integer linear programming formulation techniques. SIAM Review 57 (1), pp. 3–57. Cited by: §3.