Sponsored Search Auctions with Rich Ads
The generalized second price (GSP) auction has served as the core selling mechanism for sponsored search ads for over a decade. However, recent trends expanding the set of allowed ad formats—to include a variety of sizes, decorations, and other distinguishing features—have raised critical problems for GSP-based platforms. Alternatives such as the Vickrey-Clarke-Groves (VCG) auction raise different complications because they fundamentally change the way prices are computed. In this paper we report on our efforts to redesign a search ad selling system from the ground up in this new context, proposing a mechanism that optimizes an entire slate of ads globally and computes prices that achieve properties analogous to those held by GSP in the original, simpler setting of uniform ads. A careful algorithmic coupling of allocation-optimization and pricing-computation allows our auction to operate within the strict timing constraints inherent in real-time ad auctions. We report performance results of the auction in Yahoo’s Gemini Search platform.
Very early in the history of sponsored-search advertising, all the major platforms settled on some version of the Generalized Second Price (GSP) auction as the mechanism used to sell search ad spots. GSP has had remarkable staying power, apparently serving search ad marketplaces well for over a decade. However, recent trends expose problems stemming from the rigidity of traditional GSP-bound platforms: ads now come in various sizes and formats, and a mechanism that simply sorts ads and prices each based on competition from the ad below will have significant inefficiencies and unwanted incentive properties.
For instance, imagine that a search platform has a priori allotted 12 lines at the top of the search page for advertisements. In the “old world” all ads were three lines long (just a title, url, and description), and so in this 12-line example there would be precisely four available ad slots, regardless of which advertisers bid. But in the “new world” there may be ads that have the basic three lines plus additional lines of sitelinks (taking the user directly to specific sections of the advertiser’s landing page), star-ratings, location information, a phone number, etc. An example of some of these ad extensions and decorations on Yahoo’s search platform is given in Figure 1.
In the new world of heterogeneous ads, an ad packing problem emerges. In a context where ads can vary in length, the search platform faces the richer problem of deciding which versions of which ads—and how many—to show. Whether it’s best to show a larger or smaller version of an ad may depend on which size-variants of competing ads are available. Perhaps one giant ad should be chosen to fill the entire space, or perhaps it’s better for the giant ad to be “trimmed” to a more moderate size and paired with a second small ad below it, or perhaps a slate of several three-line ads is best, etc. An illustration of the packing problem is given in Figure 2.
To deal with this new world, in this paper we rethink the search ad allocation and pricing problems from the ground up, proposing a mechanism that optimizes an entire page of ads globally. The efficiency-maximizing ad allocation problem can be formulated as an integer program; however, for a number of reasons, solving it this way is unwieldy in practice. We instead approach the problem explicitly as a search through the space of possible ad slates; the specific solution we implement is local-search based and not guaranteed to find an optimal configuration, but in practice its distance from optimality is negligible.
The main feature of the classic approach that we retain is a separable click-probability model, wherein it is assumed that the probability that a given ad will be clicked is equal to the product of an ad-variant-specific “ad clickability” number and an ad-variant-independent “location clickability” number. Even here, though, innovations are required: the “location clickability” number can no longer be associated with an ad slot, since the starting-position of the ad now depends on what kind of ad variants were shown above—location clickability for us is now a function of starting-line-position rather than slot-number.
The most technically novel contribution of this work is probably the approach we take to computing prices. It is most common to think of an allocation and pricing mechanism as proceeding in two stages—an allocation is computed, and from it (and the bids that generated it) prices are subsequently calculated. This two-phase approach makes a great deal of conceptual sense, but in our setting even fractions of a millisecond of compute-time are critical, and so a more integrated solution was required. Noting that the most salient pricing schemes can all be described in terms of the allocation function $x_i(b)$ (where $x_i(b)$ is the probability of bidder $i$ receiving a good—here, an ad click—given that he bids $b$), during the local search phase of the algorithm we “log” key information from which we can, ultimately, very quickly compute each bidder’s allocation function. Then, whatever pricing scheme is chosen, it can be applied by essentially just “reading off” prices from the computed allocation functions.
1.1 Related work
The problem of rich ads in search is well known, but not as well studied. In one sense, there is little to do — the elegant Vickrey-Clarke-Groves (VCG) auction reduces any problem related to rich ads to a modeling and optimization problem if one buys into it, and Facebook and Google have both leveraged VCG for this very reason.
The primary challenge for rich search ads is that marketplaces have been running GSP auctions for years; two thin lines of work consider the consequences. The first line of work studies GSP-like mechanisms in more complex optimization domains: papers by Deng et al., Bachrach et al., and Cavallo and Wilkens generally show negative results that equilibria may be poor or nonexistent. The second line of work instead aims to convert GSP bidders to VCG bidders with minimal issues.
More broadly, a long line of work starting with Varian and Edelman, Ostrovsky and Schwarz studies GSP and attempts to rationalize its use, e.g., by showing the existence of good equilibria or showing that GSP is more robust when click-through-rates have error [13, 6]. More recently, we argue that advertisers do not have quasilinear utilities and that GSP may in fact be the truthful auction [3, 16].
These prior works are all largely theoretical in nature. In the current paper, while we do make some conceptual and modeling contributions, a large emphasis is on reporting about what we think is an interesting large-scale engineering task: how to solve a computationally hard market-based problem in a feasible amount of time under severe runtime constraints. That dimension of our work strongly connects with many other studies from very different domains, such as [9, 10, 12], to name a few.
2 The ad allocation problem
We start with a formal description of the ad allocation problem. There is a set $A$ of ad “candidates”; each $a \in A$ has a height $h_a$ (a number of lines) and is associated with an advertiser $s(a) \in N$, where $N$ is the set of all advertisers. At most one ad per advertiser can appear on the page. There are also configurable limits on the number and cumulative height of ads that can be shown on a single page: no more than ADLIM ads occupying a total of H lines can be selected.
Each ad $a$ has an associated bid $b_a$ and click probability density $q_a$. The bid can be interpreted as the advertiser’s claim about how much value he will receive should one of his ads be clicked (it is the same for all of his ads). The click probability density is a more novel concept: it can be thought of as a kind of normalized “click probability per line” for the ad. In combination with the line-specific location-clickability parameters (these may be calculated naively based on empirical click-through-rates for every line of the page; more sophisticated approaches that seek to avoid selection bias may also be applied, but we do not delve into such details here), it determines $ctr_{a,\ell}$ for each line $\ell$; $ctr_{a,\ell}$ is the search platform’s estimate of the probability with which $a$ will be clicked if it is placed at starting line $\ell$.
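As a concrete illustration, here is a minimal sketch of how a separable model could turn a click probability density into a placement-dependent click probability. The functional form used (summing per-line location clickability over the lines the ad covers) is our assumption for illustration, not the paper's exact model, and the `loc` table is hypothetical:

```python
def ctr(q_a, height, start_line, loc):
    """Estimated click probability for an ad of `height` lines whose first
    line is `start_line` (0-indexed). Assumes the separable model multiplies
    the ad's per-line density q_a by the location clickability of each
    covered line (an illustrative choice of functional form)."""
    covered = loc[start_line:start_line + height]
    return q_a * sum(covered)

# Hypothetical location-clickability table: clickability decays down the page.
loc = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
print(ctr(0.05, 3, 0, loc))  # 0.05 * (1.0 + 0.9 + 0.8)
```

Under this form, the same ad placed lower on the page receives a strictly smaller click probability, which is the qualitative behavior the model is meant to capture.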
Each ad also has an associated vector of “costs”, where cost can loosely be thought of as the externality the ad imposes (on the user, the search platform, etc.) if it is shown; $c_{a,\ell}$ denotes this cost when ad $a$ is placed at starting line $\ell$. One simple way costs may be deployed in practice is to assign a constant value to every $c_{a,\ell}$, for every $a$ and $\ell$, using that constant as a knob with which to tune the average ad footprint.
For each ad $a$ and pair of lines $\ell, \ell'$, let $cov(a, \ell, \ell') = 1$ if $\ell \le \ell' \le \ell + h_a - 1$ and $0$ otherwise; i.e., $cov(a, \ell, \ell') = 1$ if an ad $a$ starting on line $\ell$ covers line $\ell'$.
Our goal is to maximize efficiency, i.e., total advertiser value net of costs imposed by the chosen configuration of ads. Letting $x_{a,\ell}$ be a boolean variable denoting whether or not ad $a$ is placed at starting line $\ell$ (defined only for starting lines with $\ell + h_a - 1 \le H$), we can formulate the problem as follows:

maximize    $\sum_{a \in A} \sum_{\ell} (b_a\, ctr_{a,\ell} - c_{a,\ell})\, x_{a,\ell}$    (1)
subject to  $\sum_{\ell} \sum_{a :\, s(a)=i} x_{a,\ell} \le 1$  for all $i \in N$    (2)
            $\sum_{a \in A} \sum_{\ell' :\, cov(a,\ell',\ell)=1} x_{a,\ell'} \le 1$  for all $\ell \in \{1,\ldots,H\}$    (3)
            $\sum_{a \in A} \sum_{\ell} x_{a,\ell} \le \mathrm{ADLIM}$    (4)
            $x_{a,\ell} \in \{0,1\}$  for all $a, \ell$    (5)
Constraint (2) says that we can choose at most one ad variant per advertiser. Constraint (3) says that each line can be covered by at most one ad. This constraint also implicitly encodes the fact that our solution can use at most H lines. Constraint (4) limits the total number of ads chosen by the solution.
The above is an integer program that can be solved with standard methods. Even though the problem is strongly NP-hard (by a reduction from 3-PARTITION), the number of possible ad candidates is bounded, and so asymptotic runtime analysis is really not relevant. However, the runtime constraints of this environment are extremely severe—to create an experience of “instant service” for search users, every millisecond counts, and there may not always be time to solve this integer program.
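For intuition, on a tiny instance the optimization can simply be stated as an exhaustive search over ordered slates. The sketch below enumerates slates subject to the one-ad-per-advertiser, height, and ADLIM constraints; the `ctr` callback and tuple layout are illustrative assumptions, and the cost terms $c_{a,\ell}$ are omitted for brevity:

```python
from itertools import permutations

def best_slate(ads, H, adlim, ctr):
    """Exhaustively find the efficiency-maximizing slate (tiny instances only).
    ads: tuples whose first three fields are (advertiser, height, bid);
    ctr(ad, line): click probability for `ad` starting at 0-indexed `line`.
    Cost terms are omitted for brevity."""
    best, best_val = (), 0.0
    for k in range(1, adlim + 1):
        for slate in permutations(range(len(ads)), k):
            advs = [ads[i][0] for i in slate]
            if len(set(advs)) < len(advs):
                continue  # at most one ad per advertiser
            if sum(ads[i][1] for i in slate) > H:
                continue  # total height must fit within H lines
            line, val = 0, 0.0
            for i in slate:
                val += ads[i][2] * ctr(ads[i], line)  # bid * click probability
                line += ads[i][1]  # the next ad starts below this one
            if val > best_val:
                best, best_val = slate, val
    return best, best_val
```

This brute force is exponential in the number of candidates and serves only to make the search space concrete; it stands in for the integer program above, not for the production algorithm.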
2.1 Our algorithm
Motivated by runtime constraints, we opt for a local-search based heuristic approach to the problem. Our algorithm, described below in Figure 3, virtually always obtains an optimal solution, but in a much shorter period of time; moreover, it has an “anytime” property — in the rare event of an instance that cannot be solved within our time-constraints, the local search can be shut down and the intermediate solution taken.
The algorithm starts by doing something akin to traditional GSP: it orders ads by bid times click probability—except here we use click probability density since ads vary in length—and then chooses a slate greedily. But while this is where traditional GSP ends, it is only a starting point for us. The core of the algorithm iteratively modifies the slate through a series of ad swaps until no improving swaps can be made. We find solutions in this way for every possible size slate, and then choose the best one.
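A simplified sketch of this style of local search follows: greedy seeding by bid times density, single-ad swap moves, and one restart per slate cardinality. The `score` callback, which returns a slate's efficiency or `None` when the slate is infeasible, is our abstraction of the details in Figure 3:

```python
def local_search(ads, adlim, score):
    """Simplified sketch of the slate local search.
    ads: tuples (advertiser, height, bid, density);
    score(slate): efficiency of an ordered tuple of ad indices, or None if
    the slate is infeasible (height limit, repeated advertiser, ...)."""
    n = len(ads)
    # rank candidates by bid * click-probability-density
    order = sorted(range(n), key=lambda i: -ads[i][2] * ads[i][3])
    best, best_val = (), 0.0
    for k in range(1, adlim + 1):  # one local optimum per slate cardinality
        slate = []
        for i in order:  # greedy seed: top-ranked ads that keep the slate feasible
            if len(slate) == k:
                break
            if score(tuple(slate + [i])) is not None:
                slate.append(i)
        slate = tuple(slate)
        val = score(slate) or 0.0
        improved = True
        while improved:  # single-ad swap moves until a local optimum
            improved = False
            for pos in range(len(slate)):
                for j in range(n):
                    if j in slate:
                        continue
                    cand = slate[:pos] + (j,) + slate[pos + 1:]
                    v = score(cand)
                    if v is not None and v > val:
                        slate, val, improved = cand, v, True
        if val > best_val:
            best, best_val = slate, val
    return best, best_val
```

The sketch omits the production system's richer move set and the anytime shutdown; its per-cardinality restarts mirror the paper's strategy of finding a local optimum for every possible slate size.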
What are the possible vulnerabilities of this algorithm—i.e., in what cases might we get stuck in a local optimum that is not globally optimal? This may happen only in cases where swapping more than one ad at a time is required. Note that the absence of “1-for-2” swaps and the like is strongly mitigated by the fact that we find a local optimum for every possible cardinality ad slate. We will report detailed performance statistics in Section 4. For now, suffice it to say that the algorithm rarely leaves significant efficiency on the table.
Our pricing implementation maximizes flexibility by estimating each bidder $i$’s allocation curve $x_i(b)$. (In our setting there are a variety of non-null outcomes, ranging over ad variants and the slots they may appear in, that any given bidder may receive; but an “allocation” can be reduced to the one dimension that determines advertiser value: probability of click.) $x_i(b)$ is thus the probability with which $i$ receives a click in the outcome yielded when he bids $b$ and all other bidders’ bids are held constant. The allocation curve is a common tool in theory because it fully captures what an advertiser needs to know when selecting a bid. However, auctions in practice rarely construct $x_i$ explicitly; instead, they rely on computations that indirectly reference it. For example, externality pricing in the VCG auction is computed by removing each bidder one at a time and computing the negative effect on others — this computation happens to be equal to the area above $x_i$.
In our case, having direct access to $x_i$ is important for two reasons. First and foremost, as we will discuss later, we strive to maintain GSP-like pricing, and our formulation effectively requires full knowledge of the curve $x_i$. Second, having access to $x_i$ gives substantial flexibility if Yahoo wishes to change its pricing in the future, say, if competitors switch to a different pricing function such as VCG and Yahoo feels compelled to follow suit.
We will first discuss how we estimate $x_i$ efficiently; then we will discuss a handful of possible pricing strategies and motivate GSP-like prices.
3.1 Estimating allocation curves
The local search optimization explores a wide variety of slates; we want to use these slates to efficiently construct an approximation of $x_i$. Since the allocation curve will be piecewise-constant, our desired output is a sequence of thresholds $t_1 < t_2 < \cdots < t_k$ and a sequence of allocations $x_1 < x_2 < \cdots < x_k$, where the final estimated allocation is given by: $\hat{x}_i(b) = x_j$ for $b \in [t_j, t_{j+1})$ (taking $t_{k+1} = \infty$), and $\hat{x}_i(b) = 0$ for $b < t_1$.
This is conceptually easy to compute in a naïve way: identify the breakpoints by repeated binary search. Unfortunately, this will require too much time, as the allocation algorithm must be run at every stage of the binary search. We must therefore leverage the work of local search to construct an approximation.
The approximate allocation $\hat{x}_i$
Note that if we run the optimal algorithm, bidder $i$’s allocation can be written as $x_i(b) = x_i^{S^*(b)}$, where $x_i^S$ denotes the allocation probability (probability of a click) on $i$’s ad in slate $S$ and $S^*(b)$ denotes the optimal slate when $i$ bids $b$. Given any subset of possible slates $\mathcal{S}$, we can then define an approximation by taking the maximum over only those slates in $\mathcal{S}$, i.e., $\hat{x}_i^{\mathcal{S}}(b) = x_i^{S}$ for the slate $S$ that maximizes the objective over $\mathcal{S}$ when $i$ bids $b$. We use this idea to define $\hat{x}_i$:
The local search approximation of the allocation curve is $\hat{x}_i = \hat{x}_i^{\mathcal{S}_{LS}}$,
where $\mathcal{S}_{LS}$ denotes the set of slates considered by the local search algorithm.
Note that the approximation $\hat{x}_i$ is the exact allocation assuming that the mechanism always explored the fixed set of slates $\mathcal{S}_{LS}$ and selected the optimal one. (Said in terms of another standard mechanism, $\hat{x}_i$ is the allocation of a maximal-in-range mechanism on the set of slates $\mathcal{S}_{LS}$.) However, since the mechanism will explore different slates for different bids, $\hat{x}_i$ can both over- and under-estimate $x_i$. The accuracy of $\hat{x}_i$ as an approximation of $x_i$ will be discussed in Section 4.
Note that for any slate $S$, the objective value is linear in bidder $i$’s bid.
In particular, fixing bidder $i$, it can be written as $V(S, b) = b\, x_i^S + C_{-i}^S$, where $C_{-i}^S$ is independent of $b$. If we let $V^*(b)$ denote the optimal objective value when $i$ reports $b$ (holding all other bids fixed), we can write: $V^*(b) = \max_S \big( b\, x_i^S + C_{-i}^S \big)$.
Each slate $S$ yields a distinct line $b \mapsto b\, x_i^S + C_{-i}^S$ (i.e., objective value as a function of $b$), and $V^*$ is the upper-envelope of these lines. Importantly, $i$’s allocation when reporting $b$ is the slope of the upper envelope at $b$: $x_i(b) = \frac{d}{db} V^*(b)$.
The optimal objective value $V^*(b)$, as a function of bid $b$, for a set of slates $\mathcal{S}$ is the upper-envelope of the lines associated with the slates $S \in \mathcal{S}$. The associated allocation function is the slope of the upper envelope.
This implies a straightforward method to compute $\hat{x}_i$, illustrated in Figure 4.
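A sketch of that method: treat each logged slate as a line whose slope is the bidder's click probability in that slate and whose intercept is the bid-independent part of the objective, compute the upper envelope, and read the piecewise-constant allocation off the envelope's slopes. The `(slope, intercept)` representation below is an assumption about how slates would be logged:

```python
def allocation_curve(lines):
    """Build bidder i's approximate allocation curve from logged slates.
    Each entry of `lines` is (x, c): slate S contributes the line b*x + c,
    where x is i's click probability in S and c is the bid-independent part
    of S's objective. Returns (thresholds, allocations): for bids in
    [thresholds[j], thresholds[j+1]) the bidder's allocation is allocations[j]."""
    best = {}
    for x, c in lines:  # per slope, only the highest line can matter
        if x not in best or c > best[x]:
            best[x] = c
    pts = sorted(best.items())  # lines in order of increasing slope

    def ix(l1, l2):  # bid at which line l2 overtakes line l1
        return (l1[1] - l2[1]) / (l2[0] - l1[0])

    hull = []  # upper envelope, left to right
    for ln in pts:
        while len(hull) >= 2 and ix(hull[-2], ln) <= ix(hull[-2], hull[-1]):
            hull.pop()  # middle line is never strictly on top
        hull.append(ln)

    # slopes of the envelope = allocations; intersections = bid thresholds
    thresholds, allocations = [0.0], [hull[0][0]]
    for j in range(1, len(hull)):
        t = ix(hull[j - 1], hull[j])
        if t <= 0:  # earlier envelope pieces matter only for negative bids
            thresholds, allocations = [0.0], [hull[j][0]]
        else:
            thresholds.append(t)
            allocations.append(hull[j][0])
    return thresholds, allocations
```

Since the envelope is convex, the returned allocations are nondecreasing in the bid, matching the monotonicity-by-construction property discussed below.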
The upper envelope is convex; therefore its slope is nondecreasing and the allocation $\hat{x}_i$ is nondecreasing. (This should not be confused with a claim that $x_i$ is nondecreasing: if the local search fails to consider the right set of possible slates, its suboptimalities may lead to non-monotonicities in the actual allocation function $x_i$; $\hat{x}_i$ remains monotonic by construction.)
3.2 Pricing methodologies
The beauty of an approach like this, which efficiently constructs an accurate representation of an entire allocation curve for each bidder, is that an array of diverse pricing functions can be accommodated—all with the same underlying infrastructure—with only a quick swap of the final “price read-off” stage (step 2 in Figure 4).
While we will ultimately settle on prices that mimic GSP, three pricing strategies are worthy of discussion here: first-pricing, VCG pricing, and GSP pricing. Each strategy has its own strengths and weaknesses.
First-price auctions (advertisers pay exactly what they bid) are convenient to implement but create major issues. Simple implementations are proven to be unstable both in theory and in practice. While stability can be restored, bidders must adopt a new bidding language. Perhaps more damningly, first-price semantics would likely upset advertisers, who are generally accustomed to a second-price discount in search.
Running a traditional Vickrey-Clarke-Groves (VCG) auction is appealing for many reasons, but is ultimately an undesirable solution. On the plus side, first, standard theory says that it is the truthful auction. Second, VCG prices can be efficiently computed as externalities — it is sufficient to rerun the optimization as a black box once per bidder, then compute the negative effect each bidder has on the others. This mathematical abstraction naturally leads to a practical implementation abstraction, making VCG prices easy to implement. As a result, VCG has become the industry standard auction when facing a complex optimization problem.
However, VCG is not a perfect solution. Practically speaking, the marketplaces that use VCG pricing have generally done so from an early stage — we are unaware of any mature markets that have transitioned from GSP to VCG. The main challenge is that advertisers will need to change their bidding strategies; until they do, the auctioneer will generally lose money. Even assuming bidders eventually react, obtaining a smooth transition is a tricky task. Even worse in our particular circumstance, it is unclear that advertisers will be responsive given Yahoo’s market share.
More subtly, it is not clear that VCG is truly the best auction from a theoretical viewpoint for reasons having to do with questions regarding which utility model best reflects advertiser preferences. In particular, our prior work even suggests that GSP might be the appropriate incentive compatible auction.
Generalized GSP pricing
A natural solution is to stay with GSP pricing; the challenge is to define what that means. A traditional GSP auction sorts ads by a ranking score and charges each bidder the minimum bid required to hold its rank. This is sensible when the auction is simply assigning ads to ranks; but when the auction makes a complex trade-off over the features of an ad, this is no longer well-defined. A theoretical literature strives to justify GSP’s use; however, it fails to identify the defining properties of GSP that one would need in order to generalize it.
Based on our prior work, we propose that GSP be generalized as the truthful auction for value-maximizing bidders (see [3, 16] for a thorough treatment). A value-maximizing bidder wants to get as many clicks as possible without paying more than its value, i.e., to maximize its expected clicks $x_i$ while keeping its price per click $p_i \le v_i$. In contrast, a traditional model assumes bidders maximize expected profit $x_i (v_i - p_i)$.
Defining truthful prices for these bidders in our auction leads to a pricing intuition often given for the GSP price (this property of GSP prices is not a new observation, but our prior work is the first to give a solid foundation for why it is significant):
The truthful price per click for a value maximizer is $t_j$ when $i$ gets allocation $x_j$.
That is, $t_j$ is the minimum bid advertiser $i$ must submit to maintain the same allocation.
This gives us a candidate auction: when $i$ gets allocation $x_j$, charge price per click $t_j$. Unfortunately, this auction may “overcharge.” For example, if $x_{j+1} \approx x_j$ (there’s a tiny step in the allocation function) but $t_{j+1} = 2 t_j$ (there’s a large difference in the minimum bids that yield the two allocations), an advertiser might not care whether it gets allocation $x_j$ or $x_{j+1}$, but this GSP auction could charge a 2x premium for the higher allocation. This is illustrated in Figure 5. The problem arises because the value-maximizing model assumes bidders are willing to pay an unrealistically large price for a tiny increase in allocation.
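Given a piecewise-constant allocation curve, this candidate price can be read off directly; a small sketch (function and variable names are ours):

```python
import bisect

def gsp_price(thresholds, allocations, bid):
    """Read off the GSP-style price per click from a piecewise-constant
    allocation curve: the minimum bid that maintains the bidder's current
    allocation. Returns (price_per_click, allocation)."""
    j = bisect.bisect_right(thresholds, bid) - 1  # flat region containing bid
    return thresholds[j], allocations[j]
```

For a curve with thresholds [0, 4, 6] and allocations [0, 0.5, 1.0], a bid of 5 receives allocation 0.5 at a price per click of 4, since 4 is the lowest bid that still yields that allocation.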
To refine our version of GSP, we choose a middle ground between VCG pricing and GSP pricing using ideas introduced in [3, 16]. Our approach is to start with a hybrid preference model — a model of bidder preferences that lies between quasilinear utilities and value maximizing preferences — and set prices so that bidders of the chosen type would be truthful. We propose two different hybrids.
Our first hybrid model adds a return on investment (ROI) constraint to existing quasilinear utilities. We refer the reader to [3, 16] for details, but the prices are as follows:
The ROI-constrained truthful price $p_j$ when $i$ gets allocation $x_j$ is computed by the following recursive formula: $p_1 = t_1 x_1$, and for all $j > 1$, $p_j = \min\big( t_j x_j,\; p_{j-1} + t_j (x_j - x_{j-1}) \big)$.
This formula has a natural interpretation: a bidder is charged the lesser of the GSP price ($t_j x_j$) and the price one computes starting with allocation $x_{j-1}$ at price $p_{j-1}$ and assuming a marginal cost-per-click of $t_j$ for the extra expected clicks. This is illustrated in Figure 6.
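The recursion can be sketched directly over the computed curve. This follows the verbal description above, with the exact formula deferred to [3, 16]; treating the prices as expected payments per impression is an assumption of this sketch:

```python
def roi_prices(thresholds, allocations):
    """Sketch of ROI-constrained truthful prices at each allocation level of
    a piecewise-constant curve, per the verbal recursion: the first level
    pays its minimum sustaining bid per expected click, and each subsequent
    level pays the lesser of the GSP price and the previous price plus a
    marginal cost-per-click of t_j for the extra clicks. Prices here are
    expected payments (per impression), an assumption of this sketch."""
    prices = []
    for j, (t, x) in enumerate(zip(thresholds, allocations)):
        if j == 0:
            prices.append(t * x)
        else:
            gsp = t * x  # pay the full sustaining bid for every click
            marginal = prices[-1] + t * (x - allocations[j - 1])
            prices.append(min(gsp, marginal))
    return prices
```

On the curve with thresholds [0, 4, 6] and allocations [0, 0.5, 1.0], the top level pays 5 rather than the GSP price of 6, because the marginal-cost path from the level below is cheaper.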
The second type of preferences we propose is continuous and assumes that bidders optimize a hybrid utility function that trades off profit maximization against value maximization via a parameter $\alpha$. Again, we refer the reader to [3, 16] for details of the model and of the closed-form $\alpha$-hybrid truthful price it induces.
At one extreme of the hybrid parameter, both models describe traditional VCG prices (truthful prices for profit maximizers); at the other extreme, both models converge to GSP prices (truthful for value maximizers). We choose a hybrid model to mimic GSP while curtailing extremely high marginal prices.
4.1 Allocation accuracy
The first results we present regard how well our heuristic ad allocation algorithm approximates the optimal solution. (The optimal allocation, given any set of ad candidates, can be computed by a variety of methods including integer programming; it is easy for us to compute statistics about an optimal algorithm offline, despite it not being suitable for online use due to the severe runtime constraints of our domain.) We report results on a random selection of 100,000 auction instances drawn from Yahoo’s Gemini search platform for Desktop devices. An “instance” consists of a set of candidate ads and all relevant accompanying information (bids, clickability predictions, size, decorations, etc.). We ran the algorithm with a variety of different maximum ad cardinality limits (ADLIM), in each case applying a maximum number of ad lines (H) equal to 18.
When a maximum of two ads may be shown, the heuristic misses the optimal allocation in less than one out of every 10,000 instances. When three ads may be shown, this rises to about one out of every 150 instances. As the ADLIM increases the heuristic diverges from the optimal solution in more and more cases; however, when it does diverge, it still finds a solution that is negligibly worse than the optimal one. Even for an ADLIM of 5 (which is the upper limit of what is currently seen on any of the major search platforms), our heuristic algorithm obtains more than 99.5% of the optimal efficiency on average.
4.2 Pricing accuracy
Completely apart from the potential suboptimality of the allocation that our algorithm computes, there is approximation in the prices we compute. As discussed, to precisely compute prices (say, according to Eq. (6) or Eq. (7)) one needs to compute the allocation curve for the bidder, from which the price can be quickly deduced. One can do so in a brute-force manner, panning across the space of possible bids and observing how the bidder’s allocation (and predicted number of clicks) changes, holding all other bids constant. Since there are only a finite number of allocations, one can do better than this by using binary search to determine the “break points” in the allocation curve—i.e., the distinct flat regions that constitute it. But even this is too computationally costly to use in real time. Hence our online method for computing allocation curves, described in Section 3.
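The offline binary-search procedure can be sketched as follows; here `alloc` stands in for a full rerun of the allocation algorithm at a given bid, which is exactly what makes this reference method too slow for production:

```python
def find_breakpoints(alloc, lo, hi, tol=1e-6):
    """Offline reference method: recover the breakpoints of a monotone,
    piecewise-constant allocation function `alloc(bid)` on [lo, hi] by
    recursive binary search. Each call to `alloc` stands in for a full rerun
    of the allocation algorithm with all other bids held constant."""
    points = []

    def recurse(a, b):
        if alloc(a) == alloc(b):
            return  # monotone and equal at both ends: flat on [a, b]
        if b - a <= tol:
            points.append(b)  # breakpoint located to within tol
            return
        m = (a + b) / 2.0
        recurse(a, m)
        recurse(m, b)

    recurse(lo, hi)
    return sorted(points)
```

Correctness relies on monotonicity: equal allocations at an interval's endpoints imply the function is constant on the whole interval, so such intervals can be pruned without further calls.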
Determining those prices is computationally feasible, but are the prices any good? Yes. Figure 7 illustrates the accuracy of our “approximate prices” by comparing them to the exact prices, computed offline via binary search.
The top illustration in Figure 7 is a histogram of the ratios of approximated price to exact price. The giant peak at 1 indicates that we get the price precisely right a large percentage of the time. There is a significant volume just to the right of the peak, where we slightly overestimate prices, and there is a fairly long, light tail below 1. The bottom illustration conveys the cumulative distribution of the absolute-value distance of our approximate prices from the exact prices. We’re within 5% of the exact price about 85% of the time, and we’re within 15% almost 95% of the time.
4.3 Online bucket results
In live bucket tests our algorithm substantially improves over its predecessor, packing more ads into less space. Selected results from one test are given in Table 2: the test packed more ads into fewer total lines. Meanwhile, it increased advertiser value by 8% (assuming bids are truthful) with clicks remaining nearly neutral. Revenue was also neutral, but this metric has little meaning in a small bucket test when the auction rules change.
|Metric||Test vs. Control|
|Value per Search||+8%|
|Revenue per Search||neutral|
In this paper we provided a formal introduction to the rich ads problem for sponsored search, and described our recent efforts to address it. The major part of the paper focused on presenting the details of our engineered solution. We described a local search based heuristic method that achieves performance that is practically identical to that of an optimal algorithm, while meeting the tight runtime constraints of the sponsored search domain. We described a method that couples the algorithmic determination of a near-optimal allocation with allocation-curve construction, which allows us to quickly compute prices without repeating work, making the whole system runtime feasible and easily adaptable to future developments.
The claim about our heuristic optimization algorithm being near-optimal is an empirical observation based on the types of decorations available today and the ad real-estate constraint in place. Similarly we have empirically established the close approximation of our allocation curve generation method. In our future work, we will be exploring exact optimization algorithms that are guaranteed to produce allocation curves with approximation bounds.
Online experiments on a fraction of Yahoo search traffic comparing the performance of our algorithm with the standard GSP algorithm on metrics such as revenue, click yield, ad real estate footprint, and user response indicate that our new approach yields improved outcomes in a “win-win-win” fashion, achieving gains in advertiser value and revenue, while at the same time reducing the overall ad footprint. One thing we observe is that, for many ads, after a certain point the click-through-rate (CTR) per line has diminishing returns and thus smaller ads have a higher average click-through-rate per line. Our algorithm therefore often favors smaller ad variants over larger ones, packing more ads for the same total number of lines on a page.
After a version of our algorithm is launched into full-scale production we expect that advertisers will adjust their bids in an effort to have their preferred ad variant appear. While in our present version we assume that we can drop decorations willy-nilly to vary the size of the ad, advertiser preferences over their ad variants may present practical constraints. It may be necessary to allow advertisers to bid separately for each ad variant, so that their preferences can be properly expressed. This would raise a number of challenges, among other things forcing us to modify how we generate allocation curves for pricing. It is an area that we will be studying further.
-  Yoram Bachrach, Sofia Ceppi, Ian A. Kash, Peter Key, and Mohammad Reza Khani. Mechanism design for mixed ads. In Ad Auctions Workshop, January 2015.
-  Yoram Bachrach, Sofia Ceppi, Ian A. Kash, Peter Key, and David Kurokawa. Optimising trade-offs among stakeholders in ad auctions. In Proceedings of the 15th ACM Conference on Economics and Computation, pages 75–92, 2014.
-  Ruggiero Cavallo, Prabhakar Krishnamurthy, and Christopher A. Wilkens. On the truthfulness of GSP. In Eleventh Workshop on Sponsored Search Auctions, 2015.
-  Ruggiero Cavallo and Christopher A. Wilkens. Web and Internet Economics: 10th International Conference, WINE 2014, Beijing, China, December 14-17, 2014. Proceedings, chapter GSP with General Independent Click-through-Rates, pages 400–416. Springer International Publishing, Cham, 2014.
-  Xiaotie Deng, Yang Sun, Ming Yin, and Yunhong Zhou. Mechanism Design for Multi-slot Ads Auction in Sponsored Search Markets, pages 11–22. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
-  Paul Dütting, Felix Fischer, and David C. Parkes. Truthful outcomes from non-truthful position auctions. In Proceedings of the 2016 ACM Conference on Economics and Computation, EC ’16, pages 813–813, New York, NY, USA, 2016. ACM.
-  Benjamin Edelman and Michael Ostrovsky. Strategic bidder behavior in sponsored search auctions. Decis. Support Syst., 43(1):192–198, February 2007.
-  Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American Economic Review, 97(1):242–259, 2007.
-  Yuzo Fujishima, Kevin Leyton-Brown, and Yoav Shoham. Taming the computational complexity of combinatorial auctions: Optimal and approximate approaches. In IJCAI, volume 99, pages 548–553. DTIC Document, 1999.
-  Oktay Günlük, Lászlo Ladányi, and Sven De Vries. A branch-and-price algorithm and new test problems for spectrum auctions. Management Science, 51(3):391–406, 2005.
-  Darrell Hoy, Kamal Jain, and Christopher A. Wilkens. A dynamic axiomatic approach to first-price auctions. In Proceedings of the Fourteenth ACM Conference on Electronic Commerce, EC ’13, pages 583–584, New York, NY, USA, 2013. ACM.
-  Benjamin Lubin, Adam I. Juda, Ruggiero Cavallo, Sébastien Lahaie, Jeffrey Shneidman, and David C. Parkes. ICE: An expressive iterative combinatorial exchange. Journal of Artificial Intelligence Research, 33(1):33–77, 2008.
-  Paul Milgrom. Simplified mechanisms with an application to sponsored-search auctions. Games and Economic Behavior, 70(1):62–70, 2010.
-  Hal R. Varian. Position auctions. International Journal of Industrial Organization, 25:1163–1178, 2007.
-  Hal R. Varian and Christopher Harris. The VCG auction in theory and practice. American Economic Review, 104(5):442–445, 2014.
-  Christopher A. Wilkens, Ruggiero Cavallo, and Rad Niazadeh. Mechanism design for value maximizers. CoRR, abs/1607.04362, 2016.