Dual Based DSP Bidding Strategy and its Application
Abstract.
In recent years, RTB (Real Time Bidding) has become a popular online advertisement trading method. During the auction, each DSP (Demand Side Platform) is supposed to evaluate the current opportunity and respond with an ad and a corresponding bid price. It is essential for a DSP to find an optimal ad selection and bid price determination strategy that maximizes revenue or performance under budget and ROI (Return On Investment) constraints in P4P (Pay For Performance) or P4U (Pay For Usage) mode. We solve this problem by 1) formalizing the DSP problem as a constrained optimization problem, 2) proposing the augmented MMKP (Multi-choice Multidimensional Knapsack Problem) with a general solution, and 3) demonstrating that the DSP problem is a special case of the augmented MMKP and deriving a specialized strategy. Our strategy is verified through simulation and outperforms state-of-the-art strategies in a real application. To the best of our knowledge, our solution is the first dual based DSP bidding framework that is derived from a strict second price auction assumption and is generally applicable to the multiple ads scenario with various objectives and constraints.
1. Introduction
In recent years, RTB (Real Time Bidding) has become a popular online advertisement trading method. There are three major roles in the market, namely SSP (Supply Side Platform), DSP (Demand Side Platform), and AdX (Ad Exchange). An SSP controls a huge number of websites and earns money by supplying impressions. A DSP serves many advertisers and makes a profit by fulfilling their demands. An AdX, i.e. an online advertisement exchange, connects SSPs and DSPs and holds auctions.
In a typical scenario, a user visits one of the SSP’s websites. The AdX is then informed and an auction is initiated. The AdX broadcasts a bid request to DSPs and waits for a short time (e.g. 100ms). Each DSP is supposed to evaluate the current opportunity and respond with an ad and a corresponding bid price. The AdX gathers the bid responses that arrive before the deadline and determines the winner and its bidding cost. Finally, the AdX notifies the SSP of the auction result and the SSP serves the winner’s ad to the user.
There are two popular payment modes for advertisers:

P4P (Pay For Performance): the advertiser sets a CPP (Cost Per Performance) and pays the DSP the CPP times the units of performance delivered by the DSP (e.g. 1$/click × 10 clicks = 10$).

P4U (Pay For Usage): the advertiser sets a CR (Commission Rate) and pays the DSP the total bidding cost plus a fraction of it as commission (e.g. (1 + 10%) × 100$ = 110$).
A DSP is interested in optimizing one of the following objectives:

Revenue: the total amount of money (e.g. 50$) earned from advertisers through either payment mode mentioned above.

Performance: the total units of performance (e.g. 20 clicks) delivered to advertisers.
During the optimization, several constraints must be satisfied:

Budget Upper Bound: the maximum amount of money the advertiser is willing to spend in the DSP over a certain period of time (e.g. 100$/day).

ROI Lower Bound: the minimum value of ROI (Return On Investment). For the DSP, ROI is defined as the revenue earned from advertisers over the bidding cost paid to the AdX (e.g. DSP ROI is 1.1 when the DSP earns 110$ and pays 100$). For the advertiser, it is defined as the performance delivered by the DSP over the money spent in the DSP (e.g. advertiser ROI is 0.8 when the advertiser spends 100$ for 80 clicks).
It is essential for a DSP to find an optimal ad selection and bid price determination strategy that maximizes revenue or performance under budget and ROI constraints in P4P or P4U mode. We solve this problem by

formalizing the DSP problem as a constrained optimization problem (Section 3),

proposing the augmented MMKP (Multi-choice Multidimensional Knapsack Problem) with a general solution (Section 4),

and demonstrating that the DSP problem is a special case of the augmented MMKP and deriving a specialized strategy (Section 5).
Our strategy is verified through simulation (Section 6) and outperforms state-of-the-art strategies in a real application (Section 7). To the best of our knowledge, our solution is the first dual based DSP bidding framework that is derived from a strict second price auction assumption and is generally applicable to the multiple ads scenario with various objectives and constraints. These are the main contributions of this document.
Before further discussion, several points about our problem configuration are worth mentioning. First, PPI (Performance Per Impression) is defined as the expected performance of one impression with a certain ad, and its accurate prediction is of great importance for performance estimation. However, PPI prediction is beyond the scope of this document, and we assume that the PPI is always explicitly provided in the rest of our discussion. Second, we assume that all advertisers agree on the same payment mode and performance metric, and that the DSP prefers to optimize a pure objective rather than a hybrid one. Third, the CPP in P4P mode or the CR in P4U mode is set at the ad level, i.e. an advertiser is able to set different CPPs or CRs for his ads. The constraints, in contrast, are set at the ad group level, e.g. a budget might be shared by the ads of the same advertiser and the DSP might be interested in controlling its global ROI. Finally, the ROI lower bound for the advertiser can also be interpreted as a CPP upper bound, which might be more familiar to some readers.
2. Related Works
(Perlich et al., 2012) suggests a linear bidding strategy which, given a base price, bids in proportion to the relative quality of the impression. However, their method is heuristic and lacks a theoretical foundation.
Based on the calculus of variations, (Zhang et al., 2014) suggests a nonlinear relationship between the optimal bid price and KPIs. However, their strategy is derived from the first price auction assumption, which does not hold in RTB. Besides, the winning rate is explicitly modeled as a function of the bid price in (Zhang et al., 2014). To find an analytical solution for the optimal bid price, the winning rate function must take specific forms, which makes their method inflexible.
Both the win rate and the winning price are estimated in (Li and Guan, 2014), and a corresponding bidding strategy is provided. However, their strategy does not consider any constraints (e.g. budget), which are common in real DSP applications.
While all the above studies consider only one campaign, (Zhang and Wang, 2015) extends (Zhang et al., 2014) and proposes a bidding strategy for multiple campaigns. However, (Zhang and Wang, 2015) also shares the drawbacks of (Zhang et al., 2014) listed above.
(Geyik et al., 2016) studies the joint optimization of multiple objectives with priorities. (Xu et al., 2016) argues that the bid price should be decided based on the performance lift rather than the absolute performance value. Risk management in RTB is discussed and a risk-aware bidding strategy is proposed in (Zhang et al., 2017). By modeling the state transition of auction competition, the optimal bidding policy is derived in (Cai et al., 2017) based on reinforcement learning theory.
The probability estimation of feedbacks of interest plays a central role in performance based advertising. CTR (Click Through Rate) prediction is of great importance and has been extensively studied. FTRL-Proximal, an online learning algorithm, is proposed in (McMahan et al., 2013) to learn sparse models for CTR prediction. In (He et al., 2014), a hybrid model that combines decision trees with logistic regression outperforms either method on its own. In (Juan et al., 2016), field-aware factorization machines are used to predict CTR. Compared with clicks, conversions are even rarer and harder to predict. To tackle the data sparseness, a hierarchical method is proposed in (Lee et al., 2012) for CVR (Conversion Rate) prediction. Feedbacks are usually delayed in practice, and (Chapelle, 2014) tries to distinguish negative training samples that never receive feedback from those whose feedback is merely delayed.
The bidding landscape is studied in (Cui et al., 2011), where a log normal distribution is used to model the winning price. (Wu et al., 2015) predicts the winning price with censored data, utilizing both winning and losing samples in sealed auctions. Traffic prediction for DSPs is discussed in (Lai et al., 2016). Budget pacing is achieved through throttling in (Xu et al., 2015) and through bid price adjustment in (Lee et al., 2013).
Our work is mainly inspired by (Chen et al., 2011). There, the problem is modeled as linear programming and a compact allocation strategy is derived from complementary slackness. The sealed second price auction is studied in (Vickrey, 1961). More broadly, the DSP problem is a sort of online matching problem, and (Mehta et al., 2013) is an informative survey of this area.
3. Formalization
3.1. Primal
The DSP problem can be formalized as follows. Once we bid for impression $i$ with ad $j$ at price $b_{ij}$, the bid results in a gain $g_{ij}$ and resource consumptions $h_{ijk}$, both of which are functions of $b_{ij}$. Our total gain should be maximized under the resource constraints, with $x$ and $b$ as variables. In addition, each impression $i$ should be given to no more than one ad $j$. To overcome the computational hardness, the indicator variable $x_{ij}$ is relaxed from $\{0, 1\}$ to $[0, 1]$. In practice, most kinds of resources (e.g. budget) are private and accessible only to a very limited number of ads. Nevertheless, we assume without loss of generality that all resources are public and shared by all ads in this formalization, since an ad that does not use resource $k$ can simply be given $h_{ijk} \equiv 0$.
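(1)  $\max_{x,\,b} \ \sum_{i,j} x_{ij}\, g_{ij}(b_{ij})$
(2)  s.t.  $\sum_{i,j} x_{ij}\, h_{ijk}(b_{ij}) \le B_k, \quad \forall k$
(3)  $\sum_{j} x_{ij} \le 1, \quad \forall i$
(4)  $x_{ij} \ge 0, \quad \forall i, j$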
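$i$ is the index of impressions
$j$ is the index of ads
$k$ is the index of resources
$x_{ij}$ is a relaxed variable, indicating whether impression $i$ should be given to ad $j$
$b_{ij}$, short for the bid price of ad $j$ on impression $i$, is a variable
$g_{ij}$ is the gain function of $b_{ij}$, defined on $[0, +\infty)$
$h_{ijk}$ is the $k$th resource consumption function of $b_{ij}$, defined on $[0, +\infty)$
$B_k$ is a resource limit constant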
The above formalization might seem too abstract to capture the details of the practical objectives and constraints discussed in Section 1. To make things clearer, we

derive the expected winning probability and bidding cost under the second price auction assumption (Section 3.2),

define the utility function family $\mathcal{U}$ based on the previous derivation (Section 3.3),

and show how to systematically encode the practical objectives and constraints into the above formalization by setting $B_k$ and choosing $g_{ij}$ and $h_{ijk}$ from $\mathcal{U}$ (Section 3.4).
3.2. Second Price Auction
Most AdXes adopt the sealed second price auction mechanism, in which the DSP with the highest bid price wins and pays the second highest bid price. For example, if three DSPs bid 2$, 1$ and 3$ respectively, the third DSP wins and pays 2$. Furthermore, only the winner learns the second highest bid price, while the others observe nothing except the fact that they lost.
Due to the dynamic nature of the auction, the outcome is random. To model this uncertainty, $f_i$ is defined as the probability density of the highest bid price among all other DSPs’ bid prices for impression $i$, with support $[0, +\infty)$. In other words, the most competitive DSP bids $x$ for impression $i$ with probability density $f_i(x)$.
To win impression $i$, our bid $b$ must be higher than $x$, but we eventually pay only $x$. The expected winning probability and bidding cost of our DSP are therefore defined as follows. As our $b$ goes to infinity, we win impression $i$ with probability 1 and our expected bidding cost becomes the mean of $f_i$.
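Definition 3.1 (Winning probability $w_i$).  $w_i(b) = \int_0^{b} f_i(x)\,dx$
Definition 3.2 (Bidding cost $c_i$).  $c_i(b) = \int_0^{b} x\, f_i(x)\,dx$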
It is the nonnegativity of $f_i$ and the integral forms of $w_i$ and $c_i$ that play a central role in our theory (Section 3.3). Beyond that, we make few, if any, assumptions about the distribution family of $f_i$. In addition, in some special cases even explicit modeling of $f_i$ is unnecessary (Sections 5.3 and 5.4), which simplifies the implementation of our strategy.
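As a numerical illustration of Definitions 3.1 and 3.2, the sketch below approximates $w_i(b)$ and $c_i(b)$ with the midpoint rule. Purely for illustration, it assumes a log normal $f_i$ whose logarithm has mean `mu` and standard deviation `sigma`. The helper names are hypothetical and not part of the original work.

```python
import math
from typing import Callable, Tuple

def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Log normal density; mu and sigma are the mean and std of log(x)."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def win_and_cost(b: float, pdf: Callable[[float], float],
                 steps: int = 20000) -> Tuple[float, float]:
    """Approximate w(b) = int_0^b f(x) dx and c(b) = int_0^b x f(x) dx.

    Under a sealed second price auction we win when the highest competing
    bid x is below b and then pay x, so c(b) is the expected cost."""
    if b <= 0.0:
        return 0.0, 0.0
    dx = b / steps
    w = c = 0.0
    for n in range(steps):
        x = (n + 0.5) * dx  # midpoint of the n-th slice
        fx = pdf(x)
        w += fx * dx
        c += x * fx * dx
    return w, c

def f(x: float) -> float:  # illustrative competitor density (an assumption)
    return lognormal_pdf(x, mu=0.0, sigma=0.5)

w_med, c_med = win_and_cost(1.0, f)   # bid at the median of f
w_inf, c_inf = win_and_cost(50.0, f)  # bid far beyond the bulk of f
print(round(w_med, 3), round(w_inf, 3), round(c_inf, 3))
```

With these parameters, $w(1)$ is close to 0.5 because 1 is the median of $f$. Bidding 50 gives $w$ close to 1 and $c$ close to the mean $e^{0.125} \approx 1.133$, which matches the limiting behaviour described above. The inequality $c(b) \le b\,w(b)$ always holds, since we never pay more than our bid.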
Whenever it is required, $f_i$ can be modeled with the method proposed by (Wu et al., 2015). We pick a distribution family (e.g. log normal) with parameter $\phi_i$. We then learn, from historical bidding data, a parameter predictor that maximizes the following likelihood.
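(5)  $\prod_{i \in \mathcal{W}} f_i(p_i; \phi_i) \prod_{i \in \mathcal{L}} \big(1 - w_i(b_i; \phi_i)\big)$. Here $\mathcal{W}$ and $\mathcal{L}$ are the sets of impressions we won and lost, $p_i$ is the price paid for a won impression, and $b_i$ is our bid on a lost one.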
The likelihood separates into two parts, one for the impressions we won and the other for those we lost. For any impression $i$ that we won, the bid price of the most competitive DSP must equal the price $p_i$ we paid, which gives the first part. Otherwise, the only thing we know for sure is that it is higher than our bid $b_i$, which gives the second part.
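The following is a minimal sketch of maximizing the censored likelihood (5) in the spirit of (Wu et al., 2015). It makes two simplifying assumptions: a single log normal $f$ is shared by all impressions instead of being produced by a feature-based parameter predictor, and a coarse grid search replaces gradient-based training. Won impressions contribute the log density at the price paid. Lost impressions contribute the log probability that the competing bid exceeded our bid. All names and the synthetic data are hypothetical.

```python
import math
import random
from typing import List, Tuple

def lognormal_logpdf(x: float, mu: float, sigma: float) -> float:
    """log f(x) for a log normal whose logarithm has mean mu and std sigma."""
    z = (math.log(x) - mu) / sigma
    return -0.5 * z * z - math.log(x * sigma * math.sqrt(2.0 * math.pi))

def lognormal_logsf(x: float, mu: float, sigma: float) -> float:
    """log P(X > x) = log(1 - w(x)); clipped to avoid log(0) far in the tail."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return math.log(max(0.5 * math.erfc(z), 1e-300))

def censored_log_likelihood(won: List[float], lost: List[float],
                            mu: float, sigma: float) -> float:
    """Logarithm of likelihood (5).

    won:  prices paid on won impressions (the competing bid is observed).
    lost: our bids on lost impressions (the competing bid is only known
          to be higher)."""
    return (sum(lognormal_logpdf(p, mu, sigma) for p in won)
            + sum(lognormal_logsf(b, mu, sigma) for b in lost))

def fit_grid(won: List[float], lost: List[float]) -> Tuple[float, float]:
    """Coarse grid search over (mu, sigma); a stand-in for a learned predictor."""
    grid = [(0.05 * m, 0.3 + 0.05 * s) for m in range(-10, 11) for s in range(11)]
    return max(grid, key=lambda p: censored_log_likelihood(won, lost, *p))

# Synthetic history: competing bids ~ log normal(0, 0.5), our bid is always 1.0.
rng = random.Random(0)
won, lost = [], []
for _ in range(2000):
    competing = rng.lognormvariate(0.0, 0.5)
    if competing < 1.0:
        won.append(competing)  # we win and pay the competing (second) price
    else:
        lost.append(1.0)       # we only know that our own bid was too low
mu_hat, sigma_hat = fit_grid(won, lost)
print(mu_hat, sigma_hat)
```

Because the synthetic history always bids 1.0, roughly half of the competing bids are censored. Even so, the estimate stays close to the true `(0.0, 0.5)`. By contrast, fitting only the won prices would bias the estimate towards low prices.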
3.3. Utility Function Family
The practical $g_{ij}$ and $h_{ijk}$ in the DSP problem come from a certain function family, which is defined here and whose properties are stated without proof.
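Definition 3.3 (Utility function family $\mathcal{U}$).
$\mathcal{U}$ is the function family of the form $u(b) = \alpha\, w_i(b) + \beta\, c_i(b) = \int_0^{b} (\alpha + \beta x)\, f_i(x)\,dx$, where $\alpha$ and $\beta$ are constants.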
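Theorem 3.4 (Derivative).
Given $u \in \mathcal{U}$, we have $u'(b) = (\alpha + \beta b)\, f_i(b)$.
Theorem 3.5 (Maximizer).
Given $u \in \mathcal{U}$ with $\beta < 0$, we have $\max(0, -\alpha/\beta) \in \arg\max_{b \ge 0} u(b)$.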
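Theorem 3.6 (Comparison).
Given $u_1, u_2 \in \mathcal{U}$ with shared $f_i$ and shared $\beta < 0$, and with $\alpha_1, \alpha_2 > 0$, we have $\max_{b} u_1(b) \ge \max_{b} u_2(b)$ if and only if $\alpha_1 \ge \alpha_2$.
Theorem 3.7 (Linear combination).
Given $u_1, u_2 \in \mathcal{U}$ with shared $f_i$, we have $\gamma_1 u_1 + \gamma_2 u_2 \in \mathcal{U}$ for any constants $\gamma_1, \gamma_2$, with $\alpha = \gamma_1\alpha_1 + \gamma_2\alpha_2$ and $\beta = \gamma_1\beta_1 + \gamma_2\beta_2$.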
All the above theorems are listed here for reference and will be cited where they are actually used. It is safe to skip them for now and come back later.
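The sketch below checks numerically that, for $\beta < 0$, $u(b) = \alpha\,w_i(b) + \beta\,c_i(b)$ peaks at $\max(0, -\alpha/\beta)$ regardless of $f_i$. This is the maximizer property that Section 5.3 invokes through Theorem 3.5. The sketch assumes an illustrative log normal $f$ and tabulates $u$ on a grid using $u'(b) = (\alpha + \beta b)\,f(b)$. Function names are hypothetical.

```python
import math

def lognormal_pdf(x: float, mu: float = 0.0, sigma: float = 0.5) -> float:
    """Illustrative competitor density (an assumption, not part of the theory)."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def utility_on_grid(alpha, beta, pdf, b_max=10.0, steps=10000):
    """Tabulate u(b) = alpha*w(b) + beta*c(b) on b = 0, dx, 2*dx, ..., b_max."""
    dx = b_max / steps
    grid, values, u = [0.0], [0.0], 0.0
    for n in range(steps):
        x = (n + 0.5) * dx
        u += (alpha + beta * x) * pdf(x) * dx  # integrate u'(x) = (alpha + beta*x) f(x)
        grid.append((n + 1) * dx)
        values.append(u)
    return grid, values

def numeric_argmax(alpha, beta, pdf):
    grid, values = utility_on_grid(alpha, beta, pdf)
    return grid[max(range(len(values)), key=values.__getitem__)]

def closed_form_argmax(alpha, beta):
    """For beta < 0 the integrand is positive below -alpha/beta and negative
    above it; since f >= 0, u peaks there (or at 0 when alpha <= 0)."""
    if beta >= 0.0:
        raise ValueError("this closed form assumes beta < 0")
    return max(0.0, -alpha / beta)

for alpha, beta in [(2.0, -1.0), (0.9, -1.5), (-1.0, -1.0)]:
    print(alpha, beta, numeric_argmax(alpha, beta, lognormal_pdf),
          closed_form_argmax(alpha, beta))
```

Swapping in any other nonnegative density moves the value of the maximum but not its location. This location-only dependence is what lets bids be computed without modeling $f_i$.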
3.4. Objectives and Constraints
There are 4 practical objectives, listed in Table 1: the revenue and performance objectives in P4P and P4U modes. Each can be encoded into the standard form $\alpha\, w_i(b_{ij}) + \beta\, c_i(b_{ij})$ directly from its definition.
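Mode  Type  Definition  $\alpha$  $\beta$
P4P  Revenue  $\mathrm{CPP}_j\,\mathrm{PPI}_{ij}\,w_i(b_{ij})$  $\mathrm{CPP}_j\,\mathrm{PPI}_{ij}$  0
P4P  Performance  $\mathrm{PPI}_{ij}\,w_i(b_{ij})$  $\mathrm{PPI}_{ij}$  0
P4U  Revenue  $(1+\mathrm{CR}_j)\,c_i(b_{ij})$  0  $1+\mathrm{CR}_j$
P4U  Performance  $\mathrm{PPI}_{ij}\,w_i(b_{ij})$  $\mathrm{PPI}_{ij}$  0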
There are 6 practical constraints, listed in Table 2: the budget, DSP ROI and advertiser ROI constraints in P4P and P4U modes. Constraints like budget can be expressed in standard form naturally. The others, though less obvious at first glance, can be rewritten into standard form as well.
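Take the DSP ROI constraint in P4P mode as an example, with $R_{\mathrm{DSP}}$ denoting its lower bound. By definition, we have
(6)  $\dfrac{\sum_{i,j} x_{ij}\,\mathrm{CPP}_j\,\mathrm{PPI}_{ij}\,w_i(b_{ij})}{\sum_{i,j} x_{ij}\,c_i(b_{ij})} \ge R_{\mathrm{DSP}}$
After multiplying both sides by the denominator, subtracting the numerator from both sides, and grouping terms by $x_{ij}$, we have
(7)  $\sum_{i,j} x_{ij} \big( R_{\mathrm{DSP}}\,c_i(b_{ij}) - \mathrm{CPP}_j\,\mathrm{PPI}_{ij}\,w_i(b_{ij}) \big) \le 0$
This constraint is then in standard form with $\alpha = -\mathrm{CPP}_j\,\mathrm{PPI}_{ij}$, $\beta = R_{\mathrm{DSP}}$ and $B_k = 0$.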
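Mode  Type  Definition  $\alpha$  $\beta$  $B_k$ (with $R_{\mathrm{DSP}}$ and $R_{\mathrm{adv}}$ the DSP and advertiser ROI lower bounds)
P4P  Budget  $\mathrm{CPP}_j\,\mathrm{PPI}_{ij}\,w_i(b_{ij})$  $\mathrm{CPP}_j\,\mathrm{PPI}_{ij}$  0  budget
P4P  DSP ROI  $R_{\mathrm{DSP}}\,c_i(b_{ij}) - \mathrm{CPP}_j\,\mathrm{PPI}_{ij}\,w_i(b_{ij})$  $-\mathrm{CPP}_j\,\mathrm{PPI}_{ij}$  $R_{\mathrm{DSP}}$  0
P4P  Advertiser ROI  $(R_{\mathrm{adv}}\,\mathrm{CPP}_j - 1)\,\mathrm{PPI}_{ij}\,w_i(b_{ij})$  $(R_{\mathrm{adv}}\,\mathrm{CPP}_j - 1)\,\mathrm{PPI}_{ij}$  0  0
P4U  Budget  $(1+\mathrm{CR}_j)\,c_i(b_{ij})$  0  $1+\mathrm{CR}_j$  budget
P4U  DSP ROI  $(R_{\mathrm{DSP}} - 1 - \mathrm{CR}_j)\,c_i(b_{ij})$  0  $R_{\mathrm{DSP}} - 1 - \mathrm{CR}_j$  0
P4U  Advertiser ROI  $R_{\mathrm{adv}}(1+\mathrm{CR}_j)\,c_i(b_{ij}) - \mathrm{PPI}_{ij}\,w_i(b_{ij})$  $-\mathrm{PPI}_{ij}$  $R_{\mathrm{adv}}(1+\mathrm{CR}_j)$  0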
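The encodings of Tables 1 and 2 can be expressed mechanically. In the sketch below, each objective or constraint is mapped to the coefficients $(\alpha, \beta)$ of $\alpha\,w_i(b) + \beta\,c_i(b)$, and each constraint additionally gets its limit $B_k$. The sketch relies on the fact that the advertiser's spend is exactly the DSP's revenue. It also moves each ROI ratio into the form $R \cdot \text{denominator} - \text{numerator} \le 0$, as in (7). Function names and string keys are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Coeffs:
    alpha: float  # weight of the winning probability w_i(b)
    beta: float   # weight of the expected bidding cost c_i(b)

def objective(mode: str, kind: str, cpp=0.0, ppi=0.0, cr=0.0) -> Coeffs:
    """Table 1: revenue or performance of one (impression, ad) pair."""
    if kind == "performance":
        return Coeffs(ppi, 0.0)             # PPI * w in both modes
    if kind == "revenue" and mode == "P4P":
        return Coeffs(cpp * ppi, 0.0)       # CPP * PPI * w
    if kind == "revenue" and mode == "P4U":
        return Coeffs(0.0, 1.0 + cr)        # (1 + CR) * c
    raise ValueError(f"unknown objective {mode}/{kind}")

def constraint(mode: str, kind: str, bound: float, cpp=0.0, ppi=0.0, cr=0.0):
    """Table 2: returns (Coeffs of h_ijk, B_k) such that sum x_ij h_ijk <= B_k."""
    spend = objective(mode, "revenue", cpp, ppi, cr)  # advertiser spend = DSP revenue
    perf = objective(mode, "performance", cpp, ppi, cr)
    if kind == "budget":
        return spend, bound
    if kind == "dsp_roi":   # revenue / cost >= R  <=>  R*c - revenue <= 0
        return Coeffs(-spend.alpha, bound - spend.beta), 0.0
    if kind == "adv_roi":   # performance / spend >= R  <=>  R*spend - performance <= 0
        return Coeffs(bound * spend.alpha - perf.alpha,
                      bound * spend.beta - perf.beta), 0.0
    raise ValueError(f"unknown constraint {mode}/{kind}")

print(constraint("P4P", "dsp_roi", 2.0, cpp=1.0, ppi=0.05))
print(constraint("P4U", "adv_roi", 0.5, ppi=0.05, cr=0.1))
```

As a check, take the P4U DSP ROI constraint with $R = 1.1$ and CR = 10%. Both of its coefficients are zero, so it is exactly binding for any bid. This is the 110$/100$ example of Section 1.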
4. Augmented MMKP
4.1. Primal
Now we propose the augmented MMKP, which can be formalized as follows and seen as an extension of MMKP with infinitely many subchoices. In the original MMKP, both $g_{ij}$ and $h_{ijk}$ are constants. In the augmented MMKP, $g_{ij}$ is a variable and $h_{ijk}$ becomes a function of $g_{ij}$. Our main choice and subchoice are indicated by $x_{ij}$ and the corresponding $g_{ij}$ respectively.
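(8)  $\max_{x,\,g} \ \sum_{i,j} x_{ij}\, g_{ij}$
(9)  s.t.  $\sum_{i,j} x_{ij}\, h_{ijk}(g_{ij}) \le B_k, \quad \forall k$
(10)  $\sum_{j} x_{ij} \le 1, \quad \forall i$
(11)  $x_{ij} \ge 0, \quad \forall i, j$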
$i$ is the index of items
$j$ is the index of users
$k$ is the index of resources
$x_{ij}$ is a relaxed variable, indicating whether item $i$ should be given to user $j$
$g_{ij}$ is a gain variable
$h_{ijk}$ is the $k$th resource consumption function of $g_{ij}$
$B_k$ is a resource limit constant
4.2. Dual
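We define several basic functions and derive the dual of the augmented MMKP from them, with $\lambda_k$ and $\mu_i$ as the dual variables of constraints (9) and (10).
Definition 4.1 (Score).  $s_{ij}(g; \lambda) = g - \sum_k \lambda_k\, h_{ijk}(g)$
Definition 4.2 (Best score).  $s^*_{ij}(\lambda) = \max_{g}\, s_{ij}(g; \lambda)$
Definition 4.3 (Best subchoice).  $g^*_{ij}(\lambda) = \arg\max_{g}\, s_{ij}(g; \lambda)$
(12)  $\min_{\lambda,\,\mu} \ \sum_k \lambda_k B_k + \sum_i \mu_i$
(13)  s.t.  $\mu_i \ge s^*_{ij}(\lambda), \quad \forall i, j$
(14)  $\lambda_k \ge 0, \quad \forall k$
(15)  $\mu_i \ge 0, \quad \forall i$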
$s_{ij}(g; \lambda)$ serves as a score function that estimates the utility of giving item $i$ to user $j$ with subchoice $g$. It can be interpreted as a compromised gain function, in which the original gain $g$ is discounted by the resource consumption $h_{ijk}(g)$ at the opportunity price $\lambda_k$.
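To make the score concrete, the sketch below discretizes the subchoice $g$ into a few candidate values. This is an assumption for illustration; the augmented MMKP itself allows a continuum. The sketch evaluates the dual objective (12) using the smallest $\mu_i$ allowed by (13) and (15). It then checks weak duality against a brute-force search over integral assignments on a toy instance. All names are hypothetical.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

# h maps a gain g to its consumption of each of the K resources.
Consumption = Callable[[float], Sequence[float]]
# items[i] lists, for each user j, (candidate gains, h_ij).
Items = List[List[Tuple[Sequence[float], Consumption]]]

def score(g: float, lam: Sequence[float], h: Consumption) -> float:
    """Definition 4.1: gain minus resource consumption priced at lam."""
    return g - sum(l * hk for l, hk in zip(lam, h(g)))

def best_score(candidates: Sequence[float], lam, h: Consumption) -> float:
    """Definition 4.2 restricted to a finite candidate set."""
    return max(score(g, lam, h) for g in candidates)

def dual_objective(lam, limits, items: Items) -> float:
    """Objective (12) with mu_i = max(0, max_j s*_ij(lam)), the smallest
    mu_i satisfying constraints (13) and (15)."""
    value = sum(l * b for l, b in zip(lam, limits))
    for users in items:
        value += max(0.0, max(best_score(c, lam, h) for c, h in users))
    return value

def primal_brute_force(limits, items: Items) -> float:
    """Best integral assignment (each item to at most one user); tiny inputs only."""
    options = [[None] + [(h, g) for c, h in users for g in c] for users in items]
    best = 0.0
    for pick in itertools.product(*options):
        used, gain = [0.0] * len(limits), 0.0
        for choice in pick:
            if choice is not None:
                h, g = choice
                gain += g
                used = [u + hk for u, hk in zip(used, h(g))]
        if all(u <= b + 1e-12 for u, b in zip(used, limits)):
            best = max(best, gain)
    return best

def quadratic(g: float) -> Tuple[float]:
    return (g * g,)  # a convex consumption h(g) = g^2 of a single resource

toy = [[([0.0, 0.5, 1.0], quadratic), ([0.0, 0.5, 1.0], quadratic)] for _ in range(2)]
print(primal_brute_force([1.0], toy),
      [round(dual_objective([l], [1.0], toy), 3) for l in (0.0, 0.5, 2 / 3, 1.0)])
```

On this toy instance, the best integral assignment gains 1.0, while the dual value at $\lambda = 2/3$ is 4/3. The relaxed primal also reaches 4/3 by splitting each item between the two identical users, one at $g = 1$ and one at $g = 0.5$. The gap therefore comes from integrality, not from duality.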
4.3. Strong Duality
Strong duality of the augmented MMKP can be proved under a mild assumption. A brief proof is given here and more details are in the appendix.
Theorem 4.4 (Strong duality).
If $h_{ijk}$ is a convex function of $g_{ij}$, strong duality of the augmented MMKP holds.
Proof.
Several auxiliary problems are defined in Table 3. P is the primal and can be separated into 2 nested steps. The inner step, given $x$, maximizes the objective with $g$ as variables. The outer step maximizes the objective with $x$ as variables.
Name  Description 
P  outer[inner] 
D  dualize(P) 
DD  dualize(dualize(P)) 
d  outer[dualize(inner)] 
dd  outer[dualize(dualize(inner))] 
Since $h_{ijk}$ is a convex function of $g_{ij}$, inner has strong duality, so inner = dualize(inner) and P = d. Since dualize(inner) has strong duality, dualize(inner) = dualize(dualize(inner)) and d = dd. Since D has strong duality, D = DD. dd and DD happen to have the same form, so dd = DD. As a result, P = D, and strong duality of the augmented MMKP holds. ∎
4.4. Dual Based Strategy
With strong duality satisfied, several important properties hold for the optimal solutions of the primal and dual problems (i.e. $x^*$, $g^*$, $\lambda^*$ and $\mu^*$). Based on these properties, we propose the dual based strategy.
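Theorem 4.5.
$\lambda^*_k \big( B_k - \sum_{i,j} x^*_{ij}\, h_{ijk}(g^*_{ij}) \big) = 0, \quad \forall k$.
Theorem 4.6.
$x^*_{ij} \big( \mu^*_i - s_{ij}(g^*_{ij}; \lambda^*) \big) = 0, \quad \forall i, j$.
Corollary 4.7.
If $s^*_{ij}(\lambda^*) < 0$, we have $x^*_{ij} = 0$.
Proof.
Since $\mu^*_i \ge 0$ and $s^*_{ij}(\lambda^*) < 0$, we have $\mu^*_i > s^*_{ij}(\lambda^*) \ge s_{ij}(g^*_{ij}; \lambda^*)$. Now that $x^*_{ij} \ge 0$ and $\mu^*_i - s_{ij}(g^*_{ij}; \lambda^*) > 0$, taking the above theorem into consideration, $x^*_{ij}$ must be 0, that is, item $i$ should not be given to user $j$. ∎
Corollary 4.8.
If $s^*_{ij}(\lambda^*) < s^*_{ij'}(\lambda^*)$ for some $j'$, we have $x^*_{ij} = 0$.
Proof.
Similarly, since $\mu^*_i \ge s^*_{ij'}(\lambda^*)$ and $s^*_{ij'}(\lambda^*) > s^*_{ij}(\lambda^*)$, we have $\mu^*_i > s^*_{ij}(\lambda^*) \ge s_{ij}(g^*_{ij}; \lambda^*)$. Now that $x^*_{ij} \ge 0$ and $\mu^*_i - s_{ij}(g^*_{ij}; \lambda^*) > 0$, taking the above theorem into consideration, $x^*_{ij}$ must be 0, that is, item $i$ should not be given to the dominated user $j$. ∎
Theorem 4.9.
$\mu^*_i \big( 1 - \sum_j x^*_{ij} \big) = 0, \quad \forall i$.
Corollary 4.10.
If $j$ satisfies $s^*_{ij}(\lambda^*) > 0$ and $s^*_{ij}(\lambda^*) > s^*_{ij'}(\lambda^*)$ for all $j' \ne j$, we have $x^*_{ij} = 1$.
Proof.
Since $\mu^*_i \ge s^*_{ij}(\lambda^*)$ and $s^*_{ij}(\lambda^*) > 0$, we have $\mu^*_i > 0$. Now that $\mu^*_i > 0$ and $x^*_{ij'} = 0$ for all $j' \ne j$ (Corollary 4.8), taking the above theorem into consideration, $x^*_{ij}$ must be 1, which means item $i$ should not be discarded. ∎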
In summary, for each item $i$, every user $j$ should propose its own best score $s^*_{ij}(\lambda^*)$, achieved by $g^*_{ij}(\lambda^*)$. Item $i$ should be awarded to the dominating user if its best score is positive, and discarded if it is negative. Theoretically speaking, the above corollaries determine most behaviors, but two special cases remain undefined. First, there might be multiple dominating users with the same best score. Second, the best score of the dominating user might be exactly zero. In practice, however, both cases are probably rare due to the high resolution of items and users, and they are likely to cause relatively limited damage. Ties can be broken randomly or by heuristics.
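The per-item decision rule summarized above can be sketched as follows. The sketch assumes the best scores $s^*_{ij}(\lambda^*)$ have already been computed for one item. Ties are broken uniformly at random. A best score of exactly zero is treated as a discard; this is one arbitrary choice for a case the theory leaves undefined. `decide` is a hypothetical name.

```python
import random
from typing import Dict, Hashable, Optional

def decide(best_scores: Dict[Hashable, float],
           rng: Optional[random.Random] = None) -> Optional[Hashable]:
    """Return the user that should receive the item, or None to discard it.

    best_scores maps each user j to s*_ij(lambda*) for one fixed item i."""
    if rng is None:
        rng = random.Random()
    if not best_scores:
        return None
    top = max(best_scores.values())
    if top <= 0.0:
        # Negative: discard (Corollary 4.7). Zero: undefined in theory, discarded here.
        return None
    # Dominated users never receive the item (Corollary 4.8); ties broken at random.
    dominating = [j for j, s in best_scores.items() if s == top]
    return rng.choice(dominating)

print(decide({"ad1": 0.3, "ad2": 0.1, "ad3": -0.2}))  # prints ad1
```

Passing a seeded `random.Random` makes the tie breaking reproducible, which can help in offline replay.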
4.5. Numeric Optimization
Note that, during the execution of the dual based strategy, only $\lambda^*$ is mandatory. The others (i.e. $\mu^*$, $x^*$ and $g^*$) can be recovered from $\lambda^*$, which makes our strategy storage efficient. Next, we propose a numeric method to solve for $\lambda^*$.
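Definition 4.11 (Best score of item $i$).  $s^*_i(\lambda) = \max_j s^*_{ij}(\lambda)$
Definition 4.12 (Dominating user).  $j^*_i(\lambda) = \arg\max_j s^*_{ij}(\lambda)$
By fixing $\lambda$ in the dual problem, the optimal $\mu_i$ can be calculated as $\max\big(0, s^*_i(\lambda)\big)$. The dual problem can then be rewritten as
(16)  $\min_{\lambda \ge 0} \ \sum_k \lambda_k B_k + \sum_i \max\big(0, s^*_i(\lambda)\big)$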
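and solved by SGD (Stochastic Gradient Descent), using subgradients where the objective is not differentiable. The dual objective is convex in $\lambda$, because each $s_{ij}(g; \lambda)$ is affine in $\lambda$ and pointwise maxima and sums preserve convexity. Hence SGD with suitably decaying step sizes converges to the global optimum $\lambda^*$.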
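Below is a minimal sketch of minimizing (16) by projected stochastic subgradient descent. It assumes finitely many candidate subchoices per user and a step size decaying as $1/\sqrt{t}$. Each step samples one item $i$, whose share of the objective is $\sum_k \lambda_k B_k / N + \max(0, s^*_i(\lambda))$. The toy instance has two identical items, each with two identical users, candidate gains 0, 0.5 and 1, consumption $g^2$, and limit 1. Its dual optimum is $\lambda^* = 2/3$ with value 4/3. All names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def dominating_choice(users, lam) -> Tuple[float, Sequence[float]]:
    """Best score s*_i(lam) over all users and candidate gains of one item,
    together with the consumption vector of the maximizing (user, gain) pair."""
    best_s, best_h = float("-inf"), None
    for candidates, h in users:
        for g in candidates:
            hg = h(g)
            s = g - sum(l * hk for l, hk in zip(lam, hg))
            if s > best_s:
                best_s, best_h = s, hg
    return best_s, best_h

def dual_value(lam, limits, items) -> float:
    """Objective (16)."""
    return (sum(l * b for l, b in zip(lam, limits))
            + sum(max(0.0, dominating_choice(u, lam)[0]) for u in items))

def dual_sgd(limits, items, epochs=200, lr0=0.5, seed=0) -> List[float]:
    rng = random.Random(seed)
    n, lam, t = len(items), [0.0] * len(limits), 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one shuffled pass over the items
            t += 1
            s, hg = dominating_choice(items[i], lam)
            step = lr0 / t ** 0.5
            for k, b in enumerate(limits):
                # Subgradient of item i's share: B_k / N minus its consumption if taken.
                grad = b / n - (hg[k] if s > 0.0 else 0.0)
                lam[k] = max(0.0, lam[k] - step * grad)  # project onto lam_k >= 0
    return lam

def quadratic(g: float) -> Tuple[float]:
    return (g * g,)

toy = [[([0.0, 0.5, 1.0], quadratic), ([0.0, 0.5, 1.0], quadratic)] for _ in range(2)]
lam = dual_sgd([1.0], toy)
print(lam, dual_value(lam, [1.0], toy))
```

Below $\lambda = 2/3$, each item prefers $g = 1$ and the subgradient pushes $\lambda$ up. Above it, each item prefers $g = 0.5$ and $\lambda$ is pushed down. The iterate therefore settles near $\lambda^*$.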
5. Solution
5.1. Dual
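We define the corresponding basic functions and derive the dual of the DSP problem from them.
Definition 5.1 (Score).  $s_{ij}(b; \lambda) = g_{ij}(b) - \sum_k \lambda_k\, h_{ijk}(b)$
Definition 5.2 (Best score).  $s^*_{ij}(\lambda) = \max_{b \ge 0}\, s_{ij}(b; \lambda)$
Definition 5.3 (Best bid).  $b^*_{ij}(\lambda) = \arg\max_{b \ge 0}\, s_{ij}(b; \lambda)$
(17)  $\min_{\lambda,\,\mu} \ \sum_k \lambda_k B_k + \sum_i \mu_i$
(18)  s.t.  $\mu_i \ge s^*_{ij}(\lambda), \quad \forall i, j$
(19)  $\lambda_k \ge 0, \quad \forall k$
(20)  $\mu_i \ge 0, \quad \forall i$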
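Note that in the DSP problem our subchoice is indicated by $b_{ij}$ rather than $g_{ij}$. Since $s_{ij}(b; \lambda)$ is a linear combination of $g_{ij}$ and $h_{ijk}$ from $\mathcal{U}$ with shared $f_i$, it belongs to $\mathcal{U}$ too (Theorem 3.7). Its $\alpha$ and $\beta$ are given below, where the superscripts $g$ and $h$ mark the coefficients of $g_{ij}$ and $h_{ijk}$.
(21)  $\alpha^{s}_{ij} = \alpha^{g}_{ij} - \sum_k \lambda_k\, \alpha^{h}_{ijk}$
(22)  $\beta^{s}_{ij} = \beta^{g}_{ij} - \sum_k \lambda_k\, \beta^{h}_{ijk}$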
In practice, each ad is usually subject to a very limited number of constraints, which makes the calculation of $\alpha^s_{ij}$ and $\beta^s_{ij}$ lightweight.
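For illustration, the sketch below evaluates (21) and (22) and the bid $\max(0, -\alpha^s/\beta^s)$ from Theorem 3.5. It uses the setting of Section 7: revenue maximization in P4P mode under one global DSP ROI constraint with lower bound $R$. In that setting, the gain is $\mathrm{CPP}\,\mathrm{PPI}\,w_i(b)$ and the ROI consumption is $R\,c_i(b) - \mathrm{CPP}\,\mathrm{PPI}\,w_i(b)$. This gives $\alpha^s = (1 + \lambda)\,\mathrm{CPP}\,\mathrm{PPI}$ and $\beta^s = -\lambda R$, hence a bid of $(1+\lambda)\,\mathrm{CPP}\,\mathrm{PPI}/(\lambda R)$. That bid is linear in the expected revenue and does not involve $f_i$. The numeric values are made up, and the names are hypothetical.

```python
from typing import Sequence, Tuple

Pair = Tuple[float, float]  # (alpha, beta) of alpha*w_i(b) + beta*c_i(b)

def score_coeffs(gain: Pair, consumptions: Sequence[Pair],
                 lam: Sequence[float]) -> Pair:
    """Equations (21)-(22): coefficients of s_ij(b; lam) = g_ij - sum_k lam_k h_ijk,
    summed only over the constraints this ad is subject to."""
    alpha, beta = gain
    for (a_h, b_h), l in zip(consumptions, lam):
        alpha -= l * a_h
        beta -= l * b_h
    return alpha, beta

def best_bid(alpha_s: float, beta_s: float) -> float:
    """Maximizer of alpha_s*w(b) + beta_s*c(b) for beta_s < 0; needs no f_i."""
    if beta_s >= 0.0:
        raise ValueError("this closed form assumes beta_s < 0")
    return max(0.0, -alpha_s / beta_s)

cpp, ppi, roi_bound, lam = 1.0, 0.02, 3.5, 0.5   # illustrative values
value = cpp * ppi                                 # expected revenue per impression
coeffs = score_coeffs(gain=(value, 0.0),
                      consumptions=[(-value, roi_bound)],  # R*c - CPP*PPI*w
                      lam=[lam])
bid = best_bid(*coeffs)
print(coeffs, bid)
```

Because the ROI bound exceeds 1 here, the multiplier $(1+\lambda)/(\lambda R)$ is below 1. The bid is therefore shaded under the expected revenue.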
5.2. Strong Duality
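Thanks to the properties of $\mathcal{U}$, it is easy to check that, for the practical objectives and constraints (Section 3.4), $h_{ijk}$ is indeed a convex function of $g_{ij}$. By Theorem 3.4, $dc_i/dw_i = b$, which increases with $w_i$, so $c_i$ is convex in $w_i$ and, likewise, $w_i$ is concave in $c_i$. This immediately establishes the strong duality of the DSP problem.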
Theorem 5.4 (Strong duality).
Strong duality of the DSP problem holds.
5.3. Dual Based Strategy
With strong duality satisfied, the dual based strategy developed for the augmented MMKP also applies to the DSP problem. Generally speaking, $b^*_{ij}(\lambda)$ can be determined without $f_i$ according to Theorem 3.5, i.e. $b^*_{ij}(\lambda) = \max\big(0, -\alpha^s_{ij}/\beta^s_{ij}\big)$ whenever $\beta^s_{ij} < 0$. In certain applications, $\beta^s_{ij}$ is the same for a given $i$ and all $j$. The dominating ad can then also be determined independently of $f_i$ according to Theorem 3.6. Removing $f_i$ completely from the decision process not only simplifies the computation but also enables an $f$-free training process.
5.4. Numeric Optimization
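The numeric method developed for the augmented MMKP also applies to the DSP problem. It is easy to show how impression $i$ contributes to the (sub)gradient of (16) with respect to $\lambda_k$, on top of the constant term $B_k$. Let $j^*_i$ be the dominating ad. If its best score is positive, the contribution is $-h_{i j^*_i k}\big(b^*_{i j^*_i}(\lambda)\big)$; otherwise it is 0. This optimization method, though generally applicable, requires explicit modeling of $f_i$.
Executing our strategy in the production environment reveals a randomized version of $h_{i j^*_i k}$, namely the realized resource consumption, and the gradients can be approximated with this feedback. This optimization method is $f$-free and much easier to implement.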
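The following sketches the $f$-free update for the single-constraint setting of Section 7: P4P revenue under a global DSP ROI bound $R$, so $B = 0$. After each opportunity, the realized consumption $R \cdot \text{cost} - \text{revenue}$ replaces its expectation in the projected subgradient step, so $f_i$ is never modeled. Several assumptions hold for illustration only. The expected revenue per impression is constant. Competing bids are drawn from a log normal that the bidder never sees. Each win is credited with its expected revenue instead of a sampled click. The step size is constant. All names are hypothetical.

```python
import random

def dual_bid(value: float, lam: float, roi_bound: float) -> float:
    """Bid maximizing (1 + lam)*value*w(b) - lam*R*c(b): (1 + lam)*value/(lam*R)."""
    if lam == 0.0:
        return float("inf")  # ROI constraint inactive: the score only rewards winning
    return (1.0 + lam) * value / (lam * roi_bound)

def update_lambda(lam: float, lr: float, realized: float,
                  limit_share: float = 0.0) -> float:
    """Projected step lam <- max(0, lam - lr * (B/N - realized consumption))."""
    return max(0.0, lam - lr * (limit_share - realized))

def simulate(n=20000, roi_bound=2.0, value=1.0, lam=1.0, lr=0.05, seed=0):
    rng = random.Random(seed)
    revenue = cost = 0.0
    for _ in range(n):
        price = rng.lognormvariate(0.0, 0.5)  # highest competing bid, never modeled
        realized = 0.0
        if dual_bid(value, lam, roi_bound) > price:  # win and pay the second price
            revenue += value  # expected revenue credited instead of a sampled click
            cost += price
            realized = roi_bound * price - value  # realized R*cost - revenue
        lam = update_lambda(lam, lr, realized)
    return revenue / cost, lam

roi, lam = simulate()
print(round(roi, 3), round(lam, 3))
```

In this toy run, the realized ROI settles close to the bound $R = 2$. When the ROI falls short, realized consumption is positive, so $\lambda$ grows and the bid multiplier $(1+\lambda)/(\lambda R)$ shrinks towards $1/R$.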
6. Simulation
6.1. Methodology
To eliminate external uncertainty, our strategy is verified in P4P and P4U modes through simulation. Due to limited space, we focus on P4P mode in the rest of Section 6.
There are two simulation cases, one for revenue maximization and the other for performance maximization. Two mocked ads, $j = 1$ and $j = 2$, are created, each with its own CPP. Four mocked constraints are listed in Table 4. The budgets of ads 1 and 2 are 20 and 10 respectively. The global DSP ROI lower bound is 2, while the global advertiser ROI lower bound is 0.5. As suggested by (Cui et al., 2011), $f_i$ is assumed to be a log normal distribution with its mean and standard deviation as parameters. To mock the impressions, 200 tuples are drawn randomly, each holding the PPIs of both ads and the two parameters of $f_i$.
k  Type  Parameter  Scope
1  Budget  20  ad 1
2  Budget  10  ad 2
3  DSP ROI  2  ads 1 and 2 (global)
4  Advertiser ROI  0.5  ads 1 and 2 (global)
Once the configurations are ready, $\lambda^*$ is solved for by SGD (Section 5.4). After that, the dual based strategy is applied to the same cases and the resulting statistics are collected.
6.2. Results and Analysis
The statistics and $\lambda^*$ are listed in Table 5. In both cases, all resources have nonnegative surplus and no constraint is violated. In addition, the gap between the primal and dual objective values is negligible (Theorem 5.4). As mentioned earlier, $\lambda_k$ serves as the so-called opportunity price of resource $k$. Intuitively, leaving part of a resource unused (positive surplus) should not incur any opportunity cost, so the corresponding $\lambda^*_k$ tends to be 0.
Case 1: primal 2.164, dual 2.164
k  Limit  Consumption  Surplus  $\lambda^*_k$
1  20.000  0.865  19.135  0.000
2  10.000  1.299  8.701  0.000
3  0.000  0.000  0.000  4.749
4  0.000  -0.433  0.433  0.000
Case 2: primal 1.590, dual 1.590
k  Limit  Consumption  Surplus  $\lambda^*_k$
1  20.000  1.100  18.900  0.000
2  10.000  0.979  9.021  0.000
3  0.000  -2.747  2.747  3.654
4  0.000  -0.550  0.550  0.000
7. Application
7.1. Scenario
We also deploy our strategy in the DSP platform of Alibaba. In our application, advertisers set budgets and pay for clicks, while the DSP aims to maximize revenue under a daily global DSP ROI constraint. There are so many ads in our inventory that it is impossible to evaluate every ad before the auction deadline. Although each budget is relatively small on average, the budgets are quite large in total.
With well calibrated CPP and PPI predictors, the problem can be transformed equivalently into one in P4P mode. To meet the latency requirement, the whole decision process is decomposed into two stages with so-called logical ads.
A logical ad should be seen as a proxy of physical ads and is bound to a specific ad retrieval algorithm. In the first stage, the DSP makes decisions among just a few logical ads and responds in time. In the second stage, once the chosen logical ad wins the auction, a physical ad is lazily retrieved with the corresponding algorithm.
Our logical ads are based on 4 heterogeneous ad retrieval algorithms, whose details are beyond the scope of this document. These algorithms are sorted by their historical performance in descending order, and 4 logical ads are constructed correspondingly.
In summary, our problem can be approximately modeled as maximizing revenue in P4P mode under a daily global DSP ROI constraint, given 4 logical ads with virtually unlimited budget. Since there is only one resource constraint, the resource index $k$ is omitted and ROI is short for global DSP ROI in the rest of Section 7.
7.2. Experiment Groups
We compare our strategy with a variation of linear bidding strategy. In (Perlich et al., 2012), it’s suggested that with set by operation team. However, unlike which is independent of , indeed varies with it. As a result, we iteratively update with .
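We also apply the optimal RTB theory to our application for comparison. Following (Zhang et al., 2014), we model the win probability as $w(b) = b/(c + b)$ and bid with $b = \sqrt{c\,\theta/\lambda + c^2} - c$. Here $\theta$ is the expected revenue of the impression, $c$ is fitted with the method proposed by (Wu et al., 2015), and $\lambda$ is iteratively tuned with the actual ROI.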
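To illustrate the contrast drawn in Section 7.3, the sketch below compares two bid functions of the expected revenue $\theta$. The first is the ORTB-style form $\sqrt{c\,\theta/\lambda + c^2} - c$; we assume it corresponds to the ORTB1 variant of (Zhang et al., 2014). The second is the dual based form $(1+\lambda)\theta/(\lambda R)$ for P4P revenue under a DSP ROI bound $R$. That form follows from Theorem 3.5 with gain $\theta\,w_i(b)$ and ROI consumption $R\,c_i(b) - \theta\,w_i(b)$. The parameter values are arbitrary, and the function names are hypothetical.

```python
import math

def ortb_bid(theta: float, c: float, lam: float) -> float:
    """ORTB1-style bid sqrt(c*theta/lam + c^2) - c: concave in theta."""
    return math.sqrt(c * theta / lam + c * c) - c

def dual_bid(theta: float, lam: float, roi_bound: float) -> float:
    """Dual based bid (1 + lam) * theta / (lam * R): linear in theta."""
    return (1.0 + lam) * theta / (lam * roi_bound)

thetas = [0.005, 0.02, 0.08]
for theta in thetas:
    # Bid per unit of expected revenue: falls for ORTB, constant for the dual rule.
    print(theta,
          ortb_bid(theta, c=0.05, lam=2.0) / theta,
          dual_bid(theta, lam=0.5, roi_bound=3.5) / theta)
```

A concave curve through the origin bids proportionally more on low-value impressions and less on high-value ones than a linear rule does. This is the bias described in Section 7.3.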
Four experiment groups are shown in Table 6. The first three groups compare different strategies with a single logical ad, while the last group tests our strategy with multiple logical ads.
Group  Inventory  Strategy  Iteration Period
1  1 logical ad  Linear  24 hours
2  1 logical ad  Optimal RTB  10 minutes
3  1 logical ad  Dual Based  10 minutes
4  4 logical ads  Dual Based  10 minutes
To eliminate potential bias, the experiment lasts for a whole ordinary week. Bidding opportunities are distributed to each group randomly with equal probability. For fairness, the same CPP and PPI predictors are shared by all groups. The lower bound of daily ROI is set to 3.5.
Strategy parameters (i.e. the base price of the linear strategy and $\lambda$ of the other two strategies) are randomly initialized and periodically adjusted with the actual ROI since the last update. The period is set to 24 hours for the linear group due to data sparseness, and to 10 minutes for the others for robustness and faster convergence. Note that the more frequent update implicitly introduces a 10-minute ROI constraint, which is stricter than the daily one and might degrade the theoretical optimum.
7.3. Results and Analysis
For each group, the daily statistics of four metrics are plotted in Figure 3, namely revenue, actual ROI, number of winning impressions and revenue per winning impression.
The linear group, though its theoretical optimum is left intact by the daily update, tends to earn less revenue than the others in practice. In addition, it often seriously violates the daily ROI constraint, so it is an inferior strategy.
The dual based strategy implies a linear relationship between bid price and expected revenue, whereas the optimal RTB strategy, derived from the first price auction assumption, suggests a nonlinear one. The optimal RTB strategy is biased towards impressions with low expected revenue and against those with high expected revenue, which leads to more impressions of lower average quality. While the daily ROI constraint is satisfied by both strategies, the dual based strategy earns more revenue than the optimal RTB strategy. As a result, the dual based strategy is superior both theoretically and practically.
The dual based strategy with four logical ads, as an ensemble of four ad retrieval algorithms, achieves the most revenue without violating the daily ROI constraint and is the best strategy.
8. Conclusions and Future Works
In this document, we propose a dual based DSP bidding strategy derived from the second price auction assumption using convex optimization theory. Our strategy is verified through simulation and outperforms state-of-the-art strategies in a real application. It is a theoretically solid and practically effective strategy with a simple implementation and various applications.
Three problems remain unsolved and deserve further study. First, is there a better way to solve for a large-scale $\lambda^*$ in a dynamic environment? On the one hand, a typical DSP has millions of constraints shared by a similar number of ads. Each constraint needs its own $\lambda_k$, which makes the vector $\lambda$ very large. On the other hand, billions of impressions are broadcast by AdXes every day and bid on by hundreds of DSPs simultaneously. The bidding strategies are interactively adjusted by DSPs and the inventories are frequently updated by advertisers, which makes the bidding landscape unstable. Both properties make $\lambda^*$ hard to solve for.
Second, how can logical ads be constructed and indexed automatically in massive ad applications, balancing latency and performance? Both the decision and training processes share the same style of ad evaluation and maximum determination, which makes their computational complexity linear in the number of candidate logical ads. At one extreme, each ad is represented by exactly one logical ad, and the resulting latency is unacceptable. At the other extreme, all ads are represented by a single logical ad, and performance might be seriously degraded. A proper compromise, combined with efficient indexing tricks, could accelerate both processes by orders of magnitude.
Third, how should ties be broken optimally when they are common and critical? Take an imaginary scenario as an example. The DSP aims to maximize its revenue in P4P mode. There are two identical ads with the same CPP and PPI, but they target overlapping sets of impressions and are subject to different budget constraints. In this circumstance, the resolution of impressions and ads is extremely low and ties are very prevalent. To tackle tie breaking, we might try a randomized softmax instead of a hardmax during ad selection. However, the theoretical soundness and practical effectiveness of this tie breaking strategy remain to be verified.
Appendix A Strong Duality Proof
Here we give the detailed proof of strong duality. We first prove that P ≤ D by dualizing P.
Next, we prove that D = DD by dualizing DD.