
# Display advertising: Estimating conversion probability efficiently

Abdollah Safari, Rachel MacKay Altman, and Thomas M. Loughin, Department of Statistics and Actuarial Science
Simon Fraser University
8888 University Drive
Burnaby, BC V5A 1S6
CANADA
###### Abstract

The goal of online display advertising is to entice users to “convert” (i.e., take a pre-defined action such as making a purchase) after clicking on the ad. An important measure of the value of an ad is the probability of conversion. The focus of this paper is the development of a computationally efficient, accurate, and precise estimator of conversion probability. The challenges associated with this estimation problem are the delays in observing conversions and the size of the data set (both number of observations and number of predictors). Two models have previously been considered as a basis for estimation: a logistic regression model and a joint model for observed conversion statuses and delay times. Fitting the former is simple, but ignoring the delays in conversion leads to an under-estimate of conversion probability. On the other hand, the latter is less biased but computationally expensive to fit. Our proposed estimator is a compromise between these two estimators. We apply our results to a data set from Criteo, a commerce marketing company that personalizes online display advertisements for users.

###### keywords:
Display advertising; Conversion probability; Survival; Bias-adjustment
journal: Computational Statistics and Data Analysis

## 1 Introduction

Display advertising is a relatively new type of online advertisement where advertisers pay publishers to present their ads (also known as impressions) on different webpages. Depending on the purpose of the advertisement, different payment options can be used. These options include cost per impression, where the advertisers pay the publishers to display their ads (whether the user clicks the ad or not); cost per click, where the advertisers pay for an impression only if a user clicks on it; and cost per action (CPA), where advertisers pay only if the user takes a predefined action (conversion) after clicking the ad, such as purchasing a product or service (Muthukrishnan, 2009; Chapelle, 2014).

For profitability, the CPA option requires that publishers make a “good” match between advertisers and customers. In particular, they should display ads with high expected earnings per impression, i.e., ads where the customer’s probability of clicking and the subsequent probability of the click’s converting are high (McAfee, 2011). The entire process of ad selection needs to be completed in the time between when a user opens a page and when the page is fully rendered. Thus, the publisher has a very short time in which to choose which ad(s) to display to the user. Great progress has been made in predicting whether a user will click on an impression in the context of search advertising (see, for example, Hillard et al. (2010) or McMahan et al. (2013)) and display advertising (see, for example, Chapelle et al. (2014) or Agarwal et al. (2010)). However, little is known about estimating the probability of conversion. For instance, Rosales et al. (2012) perform an experimental analysis (on a private Yahoo data set) to show the advantage of conversion probability over click probability as a measure of profitability in display advertising, and point out the lack of inference about this new measurement in the literature.

The main issue in conversion probability estimation is the delay between a click and its eventual conversion (called the conversion delay), which can vary from a few milliseconds to months. In other words, eventual conversion status (converted or unconverted) is unknown for clicks where the conversion delay is censored. Chapelle (2014) proposed using the maximum likelihood estimator (MLE) of the conversion probability based on a delay feedback model (DFM), a mixture model for observed conversion status that depends on the delay distribution. Although his estimator is accurate when the model is correctly specified, his approach is not computationally efficient. Efficiency is critical in this Big Data setting where publishers need to re-estimate conversion probability rapidly and frequently as time progresses and more data accrue (i.e., real-time updating). In addition, the performance of his estimator is unknown when the delay distribution is not exponential.

Our goal in this paper is to develop a method for estimating probability of conversion with high accuracy and in a computationally efficient manner. In particular, we introduce a new estimator based on the logistic regression model that (wrongly) treats all conversion statuses as known, and then reduce the bias of this estimator through a novel application of the Kullback-Leibler distance. We evaluate the accuracy and computational efficiency of this new estimator compared to those of Chapelle’s estimator. In addition, we study the performance of these estimators when the delay distribution is misspecified.

In §2 we define some notation and present the DFM of Chapelle (2014). In §3, we introduce our estimator along with an algorithm to evaluate it efficiently for a given data set. §4 presents an application of our results to a data set released by Criteo Chapelle (2014), and §5 describes a simulation study that illustrates the accuracy, precision, and computational efficiency of the estimators.

## 2 Model specification

In this section, we describe the DFM developed by Chapelle (2014). The assumptions of the DFM (and of our methods that follow) are: i. the true conversion probability is fixed over time, ii. the predictors don’t depend on time, iii. a converted click can never become unconverted, iv. an unconverted click with delay time less than a fixed time period $\tau$ (the conversion window) can convert, v. an unconverted click with delay time greater than $\tau$ cannot convert (in other words, an unconverted click can convert at any time within the conversion window $\tau$), and vi. $\tau$ is long enough that only a negligible proportion of conversions occur outside this window.

Let the data collection start at time $0$. Label clicks sequentially in time as $i = 1, 2, \ldots$. Let $s_i$ be the time of click $i$ (treated as non-random for the purposes of this paper). Throughout, we use bold letters to denote vectors. For instance, $\mathbf{x}_i$ is a vector of covariates associated with click $i$, e.g., attributes of the user and/or origin website. We define $\mathbf{x}_i$ so as to include an intercept. Define $C_i$ to be the eventual conversion status indicator for click $i$, i.e., $C_i = 1$ if click $i$ ever converts and $C_i = 0$ otherwise. Let $t_i$ be the time at conversion if $C_i = 1$; if $C_i = 0$, then fix $t_i = \infty$. Then the delay time is defined as $D_i = t_i - s_i$ (so that $D_i = \infty$ if $C_i = 0$). Given $C_i = 1$ and $\mathbf{x}_i$, let $h_i(\cdot)$ and $H_i(\cdot)$ be the conditional pdf and cdf, respectively, of $D_i$.

Now suppose that at a given moment $t$ we wish to estimate the conversion probability of a click with covariates $\mathbf{x}$. Define $a_i := t - s_i$ to be the age of click $i$. Since we treat $s_i$ as non-random, $a_i$ is non-random as well.

For subsequent derivations, we consider a given fixed time $t$ and suppress $t$ in our notation for convenience. At this time, say $n$ clicks have accumulated. Let $y_i$ be the current conversion status indicator of click $i$, i.e., $y_i = 1$ if click $i$ converted prior to time $t$ and $y_i = 0$ otherwise. Note that $D_i$ is observed prior to $t$ if $y_i = 1$, and $D_i$ is greater than or equal to $a_i$ (right censored) if $y_i = 0$.

To the best of our knowledge, the DFM is the only model for conversion probability in the literature that incorporates conversion delays, i.e., that is based on the bivariate response for each click, $(Y_i, D_i)$. In this model, $C_i$ is assumed to follow a logistic regression model with $p_i := P(C_i = 1 \mid \mathbf{x}_i) = 1/(1 + \exp(-\mathbf{x}_i^{\top}\boldsymbol{\beta}_c))$. Given $C_i = 1$ and $\mathbf{x}_i$, delay times are assumed to follow an exponential distribution with rate $\lambda_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta}_d)$. The log-likelihood function of the DFM is then

$$\ell(\boldsymbol{\beta}_c, \boldsymbol{\beta}_d \mid \mathbf{y}, \mathbf{d}) = \sum_{i:\, y_i = 1} \left\{ \log\left[P(C_i = 1 \mid \mathbf{x}_i)\right] + \log(\lambda_i) - \lambda_i d_i \right\} + \sum_{i:\, y_i = 0} \log\left[1 - P(C_i = 1 \mid \mathbf{x}_i) + P(C_i = 1 \mid \mathbf{x}_i)\exp(-\lambda_i a_i)\right].$$
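
To make the optimization problem concrete, here is a minimal numpy sketch of this log-likelihood under the exponential-delay assumption. The function name and data layout are our own illustration, not code from the paper.

```python
import numpy as np

def dfm_loglik(beta_c, beta_d, X, y, d, a):
    """DFM log-likelihood with exponential delays (illustrative sketch).

    X : (n, k) covariate matrix (first column an intercept)
    y : current conversion status at analysis time (0/1)
    d : observed delay for converted clicks (ignored where y == 0)
    a : click ages at analysis time
    """
    p = 1.0 / (1.0 + np.exp(-X @ beta_c))   # P(C_i = 1 | x_i)
    lam = np.exp(X @ beta_d)                # exponential delay rate
    conv = y == 1
    # converted clicks contribute log(p_i) + log(lam_i) - lam_i * d_i
    ll = np.sum(np.log(p[conv]) + np.log(lam[conv]) - lam[conv] * d[conv])
    # unconverted clicks contribute log[1 - p_i + p_i * exp(-lam_i * a_i)]
    ll += np.sum(np.log(1.0 - p[~conv] + p[~conv] * np.exp(-lam[~conv] * a[~conv])))
    return ll
```

Maximizing this function jointly over both coefficient vectors is the non-convex problem discussed in §3.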

For later derivations in this paper, we will require a different form for this likelihood. Specifically, we define $Z_i$ as

$$Z_i(a_i) \equiv Z_i := \min(D_i, a_i), \qquad (1)$$

so that $Z_i = D_i$ if $D_i < a_i$ and $Z_i = a_i$ otherwise. Note that $Z_i$ is a function of a single random variable, $D_i$. We can define an equivalence relationship between $y_i$ and $Z_i$ as

$$Z_i < a_i \iff y_i = 1, \qquad Z_i = a_i \iff y_i = 0. \qquad (2)$$

Then, the likelihood function of the DFM can be rewritten in terms of the $z_i$’s (realizations of the $Z_i$’s) as

$$L_g(\boldsymbol{\beta}_c \mid \mathbf{z}) = \prod_i \left(p_i h_i(z_i)\right)^{I(z_i < a_i)} \left(1 - p_i H_i(a_i)\right)^{I(z_i = a_i)}. \qquad (3)$$

(See Appendix A for the proof.)

## 3 Estimation

As discussed by Chapelle (2014), the likelihood function in (3) is non-convex with no closed form for the MLE. Therefore, its optimization is very slow. For this reason, we consider alternative estimators in this section.

### 3.1 Naive estimator

A simple (but misspecified) model for observed conversion status is the logistic regression model where the current conversion statuses of the clicks are treated as their eventual conversion statuses. In other words, conversion delay time (and the possibility that unconverted clicks with age less than $\tau$ could convert) are ignored. Chapelle (2014) calls this model the “naive model”. The likelihood function of this model is

$$L_f(\boldsymbol{\alpha} \mid \mathbf{z}) = f(\mathbf{z} \mid \boldsymbol{\alpha}) = \prod_{i=1}^{n} f(z_i \mid \boldsymbol{\alpha}) = \prod_{i=1}^{n} \left[\theta_i^{I(z_i < a_i)} (1 - \theta_i)^{I(z_i = a_i)}\right], \qquad (4)$$

where $\theta_i = 1/(1 + \exp(-\mathbf{x}_i^{\top}\boldsymbol{\alpha}))$ is the conversion probability of the $i$th click under this model and $\boldsymbol{\alpha}$ is a vector of regression coefficients. The likelihood function of the naive model is convex and computationally efficient to optimize. However, the MLE of $\theta_i$ is biased low for the true probability of conversion, since some unconverted clicks could convert later.
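
As a concrete sketch (our own illustration, not the paper's code), the naive fit is an ordinary logistic regression of the current statuses on the covariates; a few Newton-Raphson steps suffice:

```python
import numpy as np

def fit_logistic(X, t, weights=None, n_iter=25):
    """Newton-Raphson for weighted logistic regression (illustrative helper).
    The response t may contain any values in [0, 1], not just 0/1."""
    w = np.ones(len(t)) if weights is None else np.asarray(weights, float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (t - mu))                        # score
        hess = X.T @ (X * (w * mu * (1.0 - mu))[:, None])  # information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Naive estimate: treat current statuses y as if they were eventual statuses,
# e.g. alpha_hat = fit_logistic(X, y)
```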

### 3.2 Bias-adjusted estimator

In this section, we introduce a new estimator to adjust for the bias in the naive estimator. We use the Kullback-Leibler information criterion (KLIC) approach of White (1982). Suppose that $g$ is the true data-generating distribution, but that $f$ is the assumed model. Then the KLIC can be computed as follows:

$$\begin{aligned} \mathrm{KLIC}(g:f;\, \boldsymbol{\alpha}, \boldsymbol{\beta}_c) &= E_g\!\left(\ln\left[\frac{g(\mathbf{z} \mid \boldsymbol{\beta}_c)}{f(\mathbf{z} \mid \boldsymbol{\alpha})}\right]\right) \\ &= E_g\!\left(\ln\left[g(\mathbf{z} \mid \boldsymbol{\beta}_c)\right]\right) - E_g\!\left(\ln\left[\prod_i f(z_i \mid \boldsymbol{\alpha})\right]\right) \\ &= E_g\!\left(\ln\left[g(\mathbf{z} \mid \boldsymbol{\beta}_c)\right]\right) - \sum_i E_g\!\left(\ln\left[f(z_i \mid \boldsymbol{\alpha})\right]\right) \\ &= E_g\!\left(\ln\left[g(\mathbf{z} \mid \boldsymbol{\beta}_c)\right]\right) - \sum_i \left[p_i H_i(a_i) \ln(\theta_i) + \left(1 - p_i H_i(a_i)\right) \ln(1 - \theta_i)\right] \end{aligned}$$

White (1982) shows that the MLE of the parameters in the misspecified model is consistent for the minimizer of the KLIC. We use his results to adjust the naive estimator and remove its asymptotic bias relative to the true model. In other words, we assume that the true model is (3) and treat the parameters of the true model, $(\boldsymbol{\beta}_c, \boldsymbol{\beta}_d)$, as known. Then we minimize the KLIC with respect to the parameters of the misspecified model, $\boldsymbol{\alpha}$, resulting in estimating equations that relate $(\boldsymbol{\beta}_c, \boldsymbol{\beta}_d)$ to the unknown KLIC minimizer, $\tilde{\boldsymbol{\alpha}}$. We then invert this relationship: plugging in an estimate of $\tilde{\boldsymbol{\alpha}}$, we solve for $\boldsymbol{\beta}_c$.

The details are as follows. First, we have

$$\left.\frac{\partial\, \mathrm{KLIC}(g:f;\, \boldsymbol{\alpha})}{\partial \alpha_j}\right|_{\boldsymbol{\alpha} = \tilde{\boldsymbol{\alpha}}} = 0 \qquad (5)$$

$$\Rightarrow\quad \sum_i p_i H_i(a_i)\, x_{i,j} = \sum_i x_{i,j}\, \tilde{\theta}_i, \qquad j = 1, \ldots, k, \qquad (6)$$

where $\tilde{\theta}_i = 1/(1 + \exp(-\mathbf{x}_i^{\top}\tilde{\boldsymbol{\alpha}}))$, and $k$ is the number of regression coefficients. Treat $\tilde{\boldsymbol{\alpha}}$ and the $H_i(a_i)$'s as known for the moment. Note that the equations in (6) are algebraically equivalent to the weighted quasi-score equations associated with a logistic regression model (with $\tilde{\theta}_i / H_i(a_i)$ taking the place of the usual response variable and the $H_i(a_i)$'s as weights). Thus, they can be solved efficiently for $\boldsymbol{\beta}_c$.
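
Under our reading of (6), the solve can be sketched as a small Newton iteration (function and variable names are our own): given naive fitted values and estimated delay-cdf values at the click ages, find the coefficient vector whose conversion probabilities balance the equations.

```python
import numpy as np

def solve_eq6(X, theta_hat, H_a, n_iter=50):
    """Solve sum_i p_i(beta) H_i(a_i) x_ij = sum_i x_ij theta_hat_i for beta,
    where p_i(beta) = 1 / (1 + exp(-x_i' beta)). Illustrative sketch of (6).

    theta_hat : naive fitted conversion probabilities
    H_a       : estimated delay cdf evaluated at each click's age
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (theta_hat - H_a * p)                 # residual of (6)
        hess = X.T @ (X * (H_a * p * (1.0 - p))[:, None])  # Jacobian
        beta = beta + np.linalg.solve(hess, grad)
    return beta
```

Because the system is the gradient of a concave function, the iteration behaves like an ordinary weighted logistic fit.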

In the usual case where $\tilde{\boldsymbol{\alpha}}$ and the $H_i$'s are unknown, we plug in consistent estimates. In particular, we compute $\hat{\boldsymbol{\alpha}}$, the MLE of $\boldsymbol{\alpha}$ from (3.1), which is consistent for $\tilde{\boldsymbol{\alpha}}$ (by White's theorem).

To estimate $H_i$, given the family of distributions of the delay (e.g., exponential), we can find the MLE of the delay distribution parameters. However, since the censoring rate could be very high, especially when $t$ is small, this MLE can be quite biased (see, e.g., Shen and Yang (2015), Wan (2015), and Hirose (1999)). As a remedy, we can adjust for the delay rate estimator bias as well. Firth (1993) proposes a general approach to bias reduction based on a penalized score function. Pettitt et al. (1998) apply Firth's approach to obtain the penalized likelihood when the responses are exponentially distributed and possibly censored. In our notation, this penalized likelihood is

$$L^{*}(\boldsymbol{\lambda} \mid \mathbf{z}) = \prod_{i \in S^{*}} (\lambda_i)^{-2}\, h_i(z_i)\, H_i(a_i), \qquad (7)$$

where, as before,

$$h_i(z_i) = \begin{cases} \lambda_i \exp(-\lambda_i z_i), & z_i < a_i \\ \exp(-\lambda_i a_i), & z_i = a_i, \end{cases} \qquad (8)$$

$H_i(a_i) = 1 - \exp(-\lambda_i a_i)$, and $S^{*}$ is the set of clicks that eventually convert. Note that since in the application we don't know the eventual conversion status of clicks (especially for recent clicks), we approximate $S^{*}$ by the set of clicks that are either converted or unconverted with age less than $\tau$; the approximation improves as time goes on. In other words, we exclude from (7) only unconverted clicks with $a_i$ longer than $\tau$, since we assume they never convert and thus don't contribute information about the delay distribution.
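
To illustrate the flavour of this correction in the simplest possible case, the sketch below numerically maximizes a Jeffreys-type penalized log-likelihood for a single censored exponential rate. This is our own scalar stand-in, not the regression form in (7), whose exact penalty follows Pettitt et al. (1998).

```python
import numpy as np

def penalized_rate(z, delta, a):
    """Grid-maximize a Jeffreys-penalized censored-exponential log-likelihood
    for a single rate lam (scalar illustration only).

    z     : min(delay, age) for each click in the estimation set
    delta : 1 if the delay was observed, 0 if censored at the click's age
    a     : click ages (censoring times)
    """
    lam = np.exp(np.linspace(np.log(1e-4), np.log(1e2), 4000))
    # log-likelihood: r * log(lam) - lam * sum(z), with r observed delays
    ll = delta.sum() * np.log(lam) - z.sum() * lam
    # Jeffreys penalty: (1/2) log I(lam), I(lam) = sum_i (1 - exp(-lam a_i)) / lam^2
    expect_events = (1.0 - np.exp(-lam[:, None] * a[None, :])).sum(axis=1)
    ll += 0.5 * np.log(expect_events) - np.log(lam)
    return lam[np.argmax(ll)]
```

In the uncensored limit this shrinks the plain MLE $r/\sum z_i$ toward $(r-1)/\sum z_i$, countering the MLE's upward bias.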

When the delays follow a Weibull distribution, we cannot obtain a closed form for the Firth (1993) penalized likelihood function. However, if we make the usual assumption that only the scale parameter of the Weibull distribution depends on the covariates, we can obtain the Weibull penalized likelihood function for a fixed shape parameter as

$$L_{w}^{*}(\boldsymbol{\gamma}, \nu \mid \mathbf{z}) = \prod_{i \in S^{*}} (\nu \gamma_i)^{-2}\, h_i^{w}(z_i)\, H_i^{w}(z_i), \qquad (9)$$

where $h_i^{w}$ and $H_i^{w}$ are the pdf and cdf, respectively, of the Weibull distribution with scale parameter $\gamma_i$ and shape parameter $\nu$. We suggest first estimating the shape parameter, $\nu$, by its MLE, and then treating it as a known parameter in (9).

We call the conversion probability estimators based on exponential and Weibull distributions for the delays the E-bias-adjusted and W-bias-adjusted estimators, respectively.

To summarize, we obtain our bias-adjusted estimate of $\boldsymbol{\beta}_c$ as follows:

1. Compute the MLE of $\boldsymbol{\alpha}$ based on the naive model (3.1).

2. Compute the maximum penalized likelihood estimates of the delay distribution parameters using Firth’s approach (i.e. (7) if the delay distribution is exponential or (9) if the delay distribution is Weibull).

3. Compute the bias-adjusted estimate by solving the equations in (6), substituting the estimates of $\tilde{\boldsymbol{\alpha}}$ and of the delay distribution parameters for their true values.

Standard GLM software can be used to compute the estimates in Steps 1 and 3, while packages such as brglm in R can be used to compute the estimates in Step 2. Thus, an advantage of the bias-adjusted estimator is that it can be computed efficiently and easily.
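
The intuition behind the adjustment can be seen in an intercept-only toy example (all numbers below are our own simulation, not the Criteo data): with a single intercept, the equations in (6) reduce to dividing the total of observed conversions by the total estimated probability that a conversion would already have been observed.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p_true, lam = 200_000, 0.3, 0.2        # conversion prob.; delay rate (1/days)
ages = rng.uniform(0.0, 60.0, n)          # click ages at the analysis time
C = rng.random(n) < p_true                # eventual conversion status
D = rng.exponential(1.0 / lam, n)         # conversion delays
y = C & (D < ages)                        # conversions observed so far

naive = y.mean()                          # biased low: recent clicks look unconverted
H = 1.0 - np.exp(-lam * ages)             # delay cdf at each age (true rate, for clarity)
adjusted = y.sum() / H.sum()              # intercept-only solution of (6)
```

In practice `H` would use the penalized estimate of the delay rate from Step 2; we plug in the true rate here only to isolate the effect of the adjustment.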

The similarity between (6) and the weighted quasi-score equations associated with a logistic regression model suggests that a SE for the bias-adjusted estimator of $\boldsymbol{\beta}_c$ (or for $\hat{p}_i$, the resulting estimator of $p_i$) could be efficiently computed as a function of the derivative of the left side of (5). We explore the validity of this SE in §5.

## 4 Application

In this section, we apply our results from the previous sections to a publicly available data set released by Criteo, a commerce marketing company that connects publishers and advertisers (the data set is available at http://research.criteo.com/outreach/). The data concern a collection of clicks that accrued over a period of two months, with $\tau = 30$ days Chapelle (2014). The eventual conversion statuses of the clicks are also included in the data set.

In this data set, each row corresponds to a display ad chosen by Criteo and subsequently clicked by the user. The first two columns are click time and conversion time, where the latter is blank for unconverted clicks. The data set has 17 covariates (8 integer-valued and 9 categorical variables). Except for campaign ID (one of the categorical variables), the definitions of the covariates are undisclosed (due to confidentiality issues).

To evaluate the performance of our bias-adjusted estimators (i.e., E-bias-adjusted and W-bias-adjusted estimators) in this section, we investigate their bias, SE, and computation time relative to three other estimators: the naive estimator, the oracle estimator (the MLE of the logistic regression model based on the eventual conversion statuses of the clicks), and the maximizer of the DFM when the distribution of the delays is treated as exponential (Chapelle's estimator). Note that the oracle estimate is not obtainable in practice, since at any time $t$ the eventual conversion statuses of recent clicks are unknown. However, we include this estimator as a "gold standard" to which we compare the other estimators.

Following Chapelle (2014), we use log-loss to measure the bias of each estimator. Log-loss is a measure of the distance between parameter estimates and the true quantity of interest. In our case, log-loss is algebraically equivalent to the negative log-likelihood (NLL) of the logistic regression model (treating eventual conversion statuses as the true quantities of interest).
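
As a small illustrative sketch (our own, with the per-click average shown; a sum differs only by a constant factor), the log-loss scores estimated probabilities against the eventual statuses:

```python
import numpy as np

def log_loss(c, p_hat, eps=1e-12):
    """Average negative log-likelihood of the logistic model at the estimated
    conversion probabilities, scored against eventual statuses c (0/1)."""
    p = np.clip(p_hat, eps, 1.0 - eps)    # guard against log(0)
    return -np.mean(c * np.log(p) + (1.0 - c) * np.log(1.0 - p))
```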

Estimating the parameters in the DFM can be very slow, depending on the number of covariates in the model. For instance, say we choose a subset of the covariates in the full data set such that we have 300 covariate coefficients (corresponding to the continuous covariates and the dummy variables that represent the categorical covariates) in the model. Obtaining the MLE of the DFM is approximately 500 times slower than computing the bias-adjusted estimator.

To keep the parameter estimation time feasible in our data analysis, we first fit a logistic regression model with all possible covariates to the eventual conversion statuses of the clicks, using a LASSO penalty term with regularization parameter large enough that approximately 100 covariates appear in the fitted model. We then use only the selected subset of the covariates in all of our analysis in this section. Note that the purpose of this variable selection is solely to facilitate estimation; in this paper, we are interested in the relative performances of the estimators given a set of covariates, not variable selection, per se. Therefore, we perform this variable selection only once.

Since the data set is huge, we use data splitting: we use only a random sample of the data set as our training set, obtain our estimates on the training set, and compute NLL on the rest of the data set (our test set). We repeat this procedure 40 times and report the average of the NLLs.

Figure 1 shows the average (over the 40 random splits of the data) NLL of the estimators at different time steps. The DFM estimator has convergence problems, especially when the number of known conversions is not large relative to the number of parameters (i.e., over the first two weeks of the observation period). Even after excluding the problematic estimates, the DFM estimator behaves poorly (top plot of figure 1). To illustrate the differences among the other estimators better, we omit the DFM estimator from the plot (bottom plot of figure 1). The E-bias-adjusted estimator appears to outperform the other estimators. Specifically, the E-bias-adjusted estimator appears to outperform the W-bias-adjusted estimator in the first month, and they perform similarly in the second month. In addition, as we obtain more new clicks and more information about the old clicks, the NLL of the estimators appears to decrease and approach that of the oracle estimator.

Table 1 shows the average computation time (in seconds) of the estimates based on repeated data splitting when we have approximately 100 covariates in the model. The computation time of the DFM estimator is about 21 times longer than that of the E-bias-adjusted estimate, and the computation time of the W-bias-adjusted estimator is also longer than that of the E-bias-adjusted estimator.

As a final note, the distribution of the observed delays of converted clicks looks closer to Weibull than exponential (see online material and also, e.g., Ji et al. (2016)), and thus the assumption of exponential delay times in the DFM is unreasonable. However, the MLE under a Weibull delay distribution (i.e., the maximizer of the DFM modified to allow Weibull delays) has serious convergence issues and very long computation time. Thus we did not study the performance of this estimator in detail.

## 5 Accuracy, precision and computational efficiency of the bias-adjusted estimators

In this section, we use a simulation study to evaluate the performance of our estimators. We investigate their bias, SE, and computation time. Besides the estimators mentioned in §4, since we know the delay distribution in our simulation study, we consider the true-bias-adjusted estimator (the bias-adjusted estimator computed using the true cdf of the delay distribution, so that the weights in (6) are known). The true-bias-adjusted estimator helps us to gauge how much we lose by estimating the delay distribution parameters with (7) (or (9)).

We suggest using bias of the estimated probabilities as a measure of error in the simulation study. Average bias at time $t$ is defined as $n^{-1}\sum_{i=1}^{n}(\hat{p}_i - p_i)$ for an estimator $\hat{p}_i$ of $p_i$, $i = 1, \ldots, n$. Recall that the $p_i$'s vary according to covariates; average bias can be interpreted as an estimate of the marginal bias of the estimator of probability of conversion (in contrast with $E(\hat{p}_i) - p_i$, which represents the bias of $\hat{p}_i$ conditional on $\mathbf{x}_i$).

### 5.1 Simulation study design

Since our focus in this paper is display advertising, we use a real data set (the Criteo data described in Section 4) to inform the design of our simulation study. Specifically, we pick approximately 8500 clicks from a campaign with a large number of clicks. For this campaign, the average conversion probability was moderate. We use the covariate values given in the Criteo data set by Chapelle (2014) and keep these values the same across runs. Since for the selected clicks some covariates have only one value (or have only a few values that differ from the mode), we use only three of the categorical variables (resulting in 16 dummy variables) and four of the integer-valued covariates in the original data set.

We conduct two simulation studies. In the first, we generate exponential-distributed conversion delays. In the second, we generate Weibull-distributed conversion delays. The parameters of these distributions are set to their estimated values based on the observed delays in the chosen campaign (using only converted clicks). In other words, the estimated parameters based on the Criteo data become the true parameter values in the simulation study. Similarly, we estimate the regression coefficients of the conversion probability model by fitting a logistic regression to the final conversion status of the clicks in the data set. We then use these estimated coefficients as the true coefficients in the simulation studies (see online material for the covariates and coefficients we use).

We consider two factors affecting the performance of the conversion probability estimators: average conversion probability and average delay time, where average means across all clicks of the campaign. We choose the levels of the factors based on the range of the conversion probabilities and delays in the real data set; see table 2 for details. To keep the simulation study feasible and the number of parameters in the model manageable, we assume no interaction among the covariates – in particular no interaction between campaign and the other covariates. Under this assumption, we can vary the factors of interest (average conversion probability and average delay time) simply by varying the values of the intercepts and of the campaign effects in both the delay and conversion models while keeping the other covariate coefficients (and the shape parameter, in the Weibull case) fixed.

To create a realistic scenario in our simulation studies, we track the clicks from the start of the data collection at time $0$, and evaluate the estimators at 17 different time steps over a two-month period (with time steps spaced far enough apart that approximately equal numbers of clicks occur in each interval). At each time step $t_j$, we consider only clicks that occurred by $t_j$. Similarly, we treat a click as converted only if we observe its conversion by $t_j$ and its delay is less than $\tau$ days. Otherwise, we treat it as unconverted.
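
The choice of evaluation times can be sketched as follows (synthetic click times; the real study uses the Criteo-informed design described above):

```python
import numpy as np

rng = np.random.default_rng(1)
click_times = np.sort(rng.uniform(0.0, 60.0, 8500))   # two months of clicks (days)
# 17 evaluation times chosen so ~equal numbers of clicks fall in each interval
steps = np.quantile(click_times, np.linspace(1 / 17, 1.0, 17))
counts, _ = np.histogram(click_times, bins=np.concatenate(([0.0], steps)))
```

Each of the 17 intervals then contains roughly 8500 / 17 = 500 clicks.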

### 5.2 Study 1

We first consider the case where the conversion delays follow an exponential distribution. In other words, we generate data from Chapelle’s DFM. Thus, the MLE of the DFM and the E-bias-adjusted estimator are based on the correct model.

Figure 2 shows the average bias of the estimators over time when both factors (average conversion probability and average delay) are at their medium level. As expected, since the DFM is the true model in this study, its maximizer (the MLE) outperforms all other estimators (except the oracle estimator). In particular, it appears to be less biased than the E-bias-adjusted estimator (especially over the first month). That said, the bias of both estimators seems quite small in the second month (less than 0.007 on average). The true-E-bias-adjusted estimator appears to perform slightly better than the E-bias-adjusted estimator (especially over the first month), and the naive estimator appears to remain biased even after two months by approximately 0.05. The overall trend in bias is similar when we use other levels of the factors given in table 2. As expected, when the average delay is at its low level, the accuracy of the naive estimator appears to be almost as high as the other estimators. Moreover, the MLE of the DFM behaves poorly when the average delay is high and average conversion probability is low (see online materials).

### 5.3 Study 2

In this study, we consider a Weibull distribution for the delays.

We compute all the estimators (including the W-bias-adjusted estimator) over time as in study 1. Note that in this case, the maximizer of the DFM and the E-bias-adjusted estimator are both based on the (misspecified) exponential distribution for the delay times. Thus, the former is no longer the MLE, and we call it the DFM estimator in this study. We do not study the MLE (i.e., the maximizer of the DFM modified to allow a Weibull distribution for the delays) due to convergence issues and very long computation times. In addition, since the true-bias-adjusted estimator has not been derived for the Weibull case, we do not consider it here.

Figure 3 shows the average bias of the E-bias-adjusted and W-bias-adjusted estimators, along with that of the oracle and naive estimators, over time when both factors (average conversion probability and average delay) are at their medium level. Over the first three weeks, the E-bias-adjusted estimator appears to slightly outperform the W-bias-adjusted estimator. However, both estimators perform similarly after the third week. In addition, the computation time of the W-bias-adjusted estimator is appreciably longer than that of the E-bias-adjusted estimator. Therefore, we consider only the E-bias-adjusted estimator for the remainder of this paper.

Figure 4 shows the bias of the estimators over time when both factors, average conversion probability and average delay, are at their medium level. In contrast with study 1, the E-bias-adjusted estimator appears to outperform the DFM estimator. In particular, as time goes on, the bias of the E-bias-adjusted estimator nearly disappears, whereas the bias of the DFM estimator does not. In addition, the small bias of the E-bias-adjusted estimator shows that the maximum penalized likelihood estimator of the parameters in the delay time model (see (7)) performs well even when the delay distribution is misspecified. The trend in bias is similar for other levels of the factors given in table 2. Again, when the average delay is at its low level, the accuracy of the naive estimator is almost as high as that of the other estimators. Similar to study 1, the DFM estimator behaves poorly when the average delay is high and the average conversion probability is low (see online materials).

### 5.4 Coverage probability of the bias-adjusted estimator

As mentioned in §3.2, we can efficiently compute a SE for the bias-adjusted estimator as a function of the derivative of the left side of (5). In this section, we study the validity of this SE.

Figure 5 shows the average coverage probability (CP) associated with 95% confidence intervals for conversion probability based on the E-bias-adjusted estimator over time when the delays follow exponential or Weibull distributions. In the first month, the average CP is below the nominal level. In the second month, however, the average CP is much closer to the nominal level. To show the closeness of the average CP to the nominal value of 0.95 at each time point more carefully, we add the non-rejection region for the score test of whether CP differs from 0.95. This region is defined as $0.95 \pm 1.96\sqrt{0.95(1 - 0.95)/R}$, where $R$ is the number of replicates. The CP when the delays are exponentially distributed (so that the E-bias-adjusted estimator is based on the correct model) is not significantly different from the nominal coverage level at the last 4 time steps. In contrast, when the delays are Weibull-distributed, the CP differs significantly from 0.95 except at the last time step. In other words, CP is lower when the E-bias-adjusted estimator is based on a misspecified model, but our results suggest that it converges to 0.95.
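
For instance, the half-width of this non-rejection region shrinks with the number of replicates (the value of `R` below is hypothetical, chosen only for illustration):

```python
import numpy as np

R = 1000                                   # hypothetical number of replicates
se = np.sqrt(0.95 * (1.0 - 0.95) / R)      # SE of an observed CP under H0: CP = 0.95
band = (0.95 - 1.96 * se, 0.95 + 1.96 * se)
# an average CP outside `band` differs significantly from the nominal 0.95
```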

To compute the SE (and CP) associated with the DFM estimator, we could compute the Hessian matrix of the estimates for each replicate. However, this matrix was non-positive-definite for most replicates. For this reason, we omit results concerning the DFM estimator. The results for the other runs are similar (see online material).

### 5.5 Computation time

Given the very short time available for choosing an ad and publishing it on the host website – and the huge number of ad requests and new campaigns at any time – publishers need to refit the model and obtain the conversion probabilities frequently. Therefore, computation time is a critical issue in display advertising. Table 3 shows the average computation times of the estimates along with their sample standard deviation (SSD) when the true delay distribution is Weibull (Study 2) and the factors are at their medium level. In particular, the computation time of the DFM estimator is more than 5 times that of the E-bias-adjusted estimate. For the levels of the factors that we considered, this ratio can be between 4 and 8.

The computation time of the estimates is similar for Study 1.

## 6 Discussion

In this paper, we developed a method for estimating probability of conversion efficiently and with high accuracy. In particular, we introduced a bias-adjusted estimator based on a simple (misspecified) logistic model, and evaluated its accuracy and computational efficiency.

As an alternative, we could obtain the MLE and bias-adjusted estimators by assuming a Weibull distribution for the delays, which would allow greater flexibility in the model and would, in particular, provide a better description of the delays in the Criteo data (see online material). However, the MLE of this model suffers from both convergence issues and lengthy computation times. Moreover, the W-bias-adjusted estimator is no more accurate than the E-bias-adjusted estimator and is less computationally efficient. Therefore, we recommend the E-bias-adjusted estimator even when the delays follow a Weibull distribution.

Since clicks have different associated true probabilities of conversion, the estimators of these probabilities (and their biases) have different variances. When computing the average bias, one may account for these differences by weighting each bias by its true SD, especially when the range of the true probabilities is large. In our case, there was no difference between the behaviour of the estimators in terms of bias and weighted bias.

To reduce overall computation time in the example, we used data splitting to obtain the estimates in our application. Comparing the performance of the estimators over the entire data set could be another interesting problem.

Our estimation method incorporates data only from users' final click on an ad. In other words, we ignore users' previous (unconverted) clicks on the same ad. Interesting future work would be a model that captures the information in the historical unconverted clicks of the users.

## Appendix A Likelihood of Z

To prove (3), we first derive the cdf of $Z_i$ as

$$G_{Z_i}(z_i) = P(Z_i \le z_i) = P(Z_i \le z_i \mid C_i = 1)P(C_i = 1) + P(Z_i \le z_i \mid C_i = 0)P(C_i = 0) = \begin{cases} 0 & \text{if } z_i < 0 \\ H_i(z_i)\, p_i & \text{if } 0 \le z_i < a_i \\ 1 & \text{if } z_i \ge a_i, \end{cases}$$

where $p_i = P(C_i = 1 \mid \mathbf{x}_i)$. Therefore, the likelihood function is

$$L_g(\boldsymbol{\beta}_c \mid \mathbf{z}) = g(\mathbf{z}) = \prod_i g(z_i \mid a_i) = \prod_i \left(p_i h_i(z_i)\right)^{I(z_i < a_i)} \left(1 - p_i H_i(a_i)\right)^{I(z_i = a_i)}.$$

## References

• Muthukrishnan (2009) Muthukrishnan, S. (2009). Ad exchange: Research issues. In Proceedings of the 5th International Workshop on Internet and Network Economics.
• Chapelle (2014) Chapelle, O. (2014). Modeling delayed feedback in display advertising. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2014), New York, NY, USA. ACM, 1097–1105.
• McAfee (2011) McAfee, R. P. (2011). The design of advertising exchanges. Review of Industrial Organization. 39(3), 169–185.
• White (1982) White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica. 50(1), 1–25.
• Firth (1993) Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika. 80, 27–38.
• Pettitt et al. (1998) Pettitt, A. N., Kelly, J. M. & Gao, J. T. (1998). Bias correction for censored data with exponential lifetimes. Statistica Sinica. 8, 941–963.
• Rosales et al. (2012) Rosales, R., Cheng, H. & Manavoglu, E. (2012). Post-click conversion modeling and analysis for non-guaranteed delivery display advertising. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, 293–302.
• McMahan et al. (2013) McMahan, H. B., Holt, G., Sculley, D., et al. (2013). Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1222–1230.
• Hillard et al. (2010) Hillard, D., Schroedl, S., Manavoglu, E., et al. (2010). Improving ad relevance in sponsored search. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, 361–370.
• Agarwal et al. (2010) Agarwal, D., Agrawal, R., Khanna, R., et al. (2010). Estimating rates of rare events with multiple hierarchies through scalable log-linear models. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 213–222.
• Chapelle et al. (2014) Chapelle, O., Manavoglu, E. & Rosales, R. (2014). Simple and scalable response prediction for display advertising. ACM Transactions on Intelligent Systems and Technology. 5.
• Ji et al. (2016) Ji, W., Wang, X. & Zhang, D. (2016). A probabilistic multi-touch attribution model for online advertising. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 1373–1382.
• Wan (2015) Wan, X., Peng, L. & Li, Y. (2015). A review and comparison of methods for recreating individual patient data from published Kaplan-Meier survival curves for economic evaluations: A simulation study. PLoS ONE. 10(3).
• Shen and Yang (2015) Shen, Y. & Yang, Z. L. (2015). Bias-correction for Weibull common shape estimation. Journal of Statistical Computation and Simulation. 85(15), 3017–3046.
• Hirose (1999) Hirose, H. (1999). Bias correction for the maximum likelihood estimates in the two-parameter Weibull distribution. IEEE Transactions on Dielectrics and Electrical Insulation. 6, 66–69.