Toward Controlling Discrimination in Online Ad Auctions
(Footnote 1: The code for the simulations is available at https://github.com/AnayMehrotra/Fair-Online-Advertising.)
Online advertising platforms are thriving due to the customizable audiences they offer advertisers. However, recent studies show that advertisements can be discriminatory with respect to the gender or race of the audience that sees the ad, and may inadvertently cross ethical and/or legal boundaries. To prevent this, we propose a constrained ad auction framework that maximizes the platform's revenue conditioned on ensuring that the audience seeing an advertiser's ad is distributed appropriately across sensitive types such as gender or race. Building upon Myerson's classic work, we first present an optimal auction mechanism for a large class of fairness constraints. Finding the parameters of this optimal auction, however, turns out to be a non-convex problem. We show that this non-convex problem can be reformulated as a more structured non-convex problem with no saddle points or local maxima; this allows us to develop a gradient-descent-based algorithm to solve it. Our empirical results on the A1 Yahoo! dataset demonstrate that our algorithm can obtain uniform coverage across different user types for each advertiser at a minor loss to the revenue of the platform, and a small change to the size of the audience each advertiser reaches.
- 1 Introduction
- 2 Our Model
- 3 Other Related Work
- 4 Theoretical Results
- 5 Our Algorithm
- 6 Empirical Study
- 7 Proofs
- 8 Limitations and Future Work
- 9 Conclusion
- A A Simple Example of Competitive Spillover
- B Revenue is Non-Concave in
- C Why Is the TV-Distance Small?
Online advertisements are the main source of revenue for social-networking sites and search engines such as Google. Ad exchange platforms allow advertisers to select the target audience for their ad by specifying desired user demographics, interests and browsing histories. Every time a user loads a webpage or enters a search term, bids are collected from relevant advertisers, and an auction is conducted to determine which ad is shown, and how much the advertiser is charged [35, 55, 49]. As it is not practical for advertisers to place individual bids for every user, the advertiser instead gives some high-level preferences about their budget and target audience, and the platform places bids on their behalf.
More formally, let there be advertisers, and types of users. Each advertiser specifies their target demographic, average bid, and budget to the platform, which then decides a distribution, , of bids of advertiser for user type . These distributions represent the value of the user to the advertiser, and ensure that the advertiser only bids for users in their target demographic, with the expected bid not exceeding the amount specified by the advertiser . At each time step, a user visits a web page (e.g., Facebook or Twitter), the user’s type is observed, and a bid is drawn from , for each advertiser . Receiving these bids as input, the mechanism decides an allocation and price for the advertisement slot. Several ad exchanges, including Google Ads and Facebook Ads, use variants of the second-price auction mechanism. (Footnote 2: If the auction sells a single item, then Myerson’s mechanism reduces to a second-price auction mechanism with a reserve price.)
Overall, such targeted advertising leads to higher utilities for the advertisers who show content to relevant audiences, for the users who view related advertisements, and for the platform which can benefit from selling targeted advertisements [22, 54, 23, 26]. However, targeted advertising can also lead to discriminatory practices. For instance, searches with “black-sounding” names were much more likely to be shown ads suggestive of an arrest record . Another study found that women were shown fewer advertisements for high paying jobs than men with similar profiles . In fact, recent experiments demonstrate that ads can be inadvertently discriminatory;  found that STEM job ads, specifically designed to be unbiased by the advertisers, were shown to more men than women across all major platforms (Facebook Ads, Google Ads, Instagram and Twitter). On Facebook, a platform with women  the advertisement was shown to more men than women.  find that this could be a result of competitive spillovers among advertisers, and is neither a pure reflection of pre-existing cultural bias, nor a result of user input to the algorithm. Such (likely inadvertent) discrimination has led to two recent cases filed against Facebook, which will potentially lead to civil lawsuits alleging employment and housing discrimination [30, 47, 38, 5].
To gain intuition on how inadvertent discrimination could happen, consider the setting in which there are two advertisers with similar bids/budgets, but one advertiser specifically targets women (which is allowed for certain types of ads, e.g., related to clothing), while the second advertiser does not target based on gender (e.g., because they are advertising a job). The first advertiser creates an imbalance on the platform by taking up ad slots for women and, as a consequence, the second advertiser ends up advertising to disproportionately fewer women and is inadvertently discriminatory. Currently, online advertising platforms have no mechanism to check this type of discrimination. In fact, the only way around this would be for the advertiser to set up separate campaigns for different user types and ensure that each campaign reaches a similar number of users in its sub-target audience. However, online platforms often reject such campaigns out of concern about discriminatory practices [34, 17].
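The two-advertiser intuition above can be illustrated with a small simulation. This is a minimal sketch and not the paper's simulation code: the function name `simulate_spillover`, the uniform bid distributions, and the 50/50 gender split are all assumptions chosen for illustration. Advertiser A bids only on women; advertiser B bids on everyone with the same bid distribution, and the highest bid wins.

```python
import random


def simulate_spillover(num_users=20000, seed=0):
    """Return the fraction of users of each gender won by the gender-neutral advertiser B.

    Assumptions (illustrative only): bids are Uniform[0, 1), half of the
    users are women, and the highest bid wins the ad slot.
    """
    rng = random.Random(seed)
    wins = {"women": 0, "men": 0}  # auctions won by advertiser B
    seen = {"women": 0, "men": 0}  # users of each gender who arrived
    for _ in range(num_users):
        gender = "women" if rng.random() < 0.5 else "men"
        seen[gender] += 1
        # Advertiser A targets women only, so it bids zero on men.
        bid_a = rng.random() if gender == "women" else 0.0
        # Advertiser B does not target by gender.
        bid_b = rng.random()
        if bid_b > bid_a:
            wins[gender] += 1
    return {g: wins[g] / seen[g] for g in wins}


if __name__ == "__main__":
    rates = simulate_spillover()
    print("B wins this fraction of women:", round(rates["women"], 3))
    print("B wins this fraction of men:  ", round(rates["men"], 3))
```

Under these assumptions, advertiser B wins roughly half of the auctions for women but nearly all of the auctions for men, even though B never expressed a gender preference.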
Our main contribution is an optimization-based framework which maximizes the revenue of the platform subject to satisfying constraints that prevent the emergence of inadvertent discrimination as described above. The constraints can be formulated as any one of a wide class of “group fairness” constraints as presented in , which constrains the distribution of an ad's audience across the sensitive types to ensure proportionality across types as defined by the platform. The framework allows for intersectionality, allowing constraints across multiple sensitive attributes (e.g., gender, race, geography and economic class) and allows for restricting different advertisers to different constraints.
Formally, building on Myerson’s seminal work , we characterize the truthful revenue-optimal mechanism which satisfies the given constraints (Theorem 4.1). The user types, as defined by their sensitive attributes, are taken as input along with the type-specific bid distributions for each advertiser, and we assume that bids are drawn from these distributions independently. Our mechanism is parameterized by constant “shifts” which it applies to bids for each advertiser-type pair. Finding the parameters of this optimal mechanism, however, is a non-convex optimization problem, both in the objective and the constraints. Towards solving this, we first propose a novel reformulation of the objective as a composition of a convex function constrained on a polytope, and an unconstrained non-convex function (Theorem 4.2). Interestingly, the non-convex function is reasonably well behaved, with no saddle-points or local-maxima. This allows us to develop a gradient descent based scheme (Algorithm 1) to solve the reformulated program, which under mild assumptions has a fast convergence rate of (Theorem 4.3).
We evaluate our approach empirically by studying the effect of the constraints on the revenue of the platform and the advertisers using the Yahoo! Search Marketing Advertising Bidding Data . We find that our mechanism can obtain uniform coverage across different user types for each advertiser while losing less than 5% of the revenue (Figure 3). Further, we observe that the total-variation distance between the fair and unconstrained distributions of total advertisements an advertiser shows on the platform is less than (Figure 4).
To the best of our knowledge, we are the first to give a framework to prevent inadvertent discrimination in online ad auctions.
2 Our Model
A mechanism is defined by its allocation rule , and its payment rule . Truthful mechanisms are those in which revealing the true valuation is optimal for all bidders. Further, it can be shown that the allocation rule , of any truthful mechanism must be monotone in for all .  proved for any mechanism there exists a truthful mechanism such that offers the same revenue to the seller and the same utility to each bidder as . As such, we restrict ourselves to truthful mechanisms. Furthermore, it is a well-known fact  that for any truthful mechanism its payment rule , is uniquely defined by its allocation rule . Hence, for any truthful mechanism our only concern is the allocation rule .
Let be the distribution of valuation of a bidder, be its probability density function, and be its cumulative distribution function, then we define the virtual valuation , as . We say is regular if is non-decreasing in . Likewise, we say is strictly regular if is strictly increasing in .
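For concreteness, the standard form of Myerson's virtual valuation is phi(v) = v - (1 - F(v)) / f(v), where f is the density and F the cumulative distribution function of the valuation. The sketch below evaluates it for two textbook distributions and checks strict regularity on a grid. The function names and the choice of distributions are assumptions made for illustration; they do not come from the paper's code.

```python
import math


def virtual_valuation(v, pdf, cdf):
    """Standard Myerson virtual valuation: phi(v) = v - (1 - F(v)) / f(v)."""
    return v - (1.0 - cdf(v)) / pdf(v)


# Uniform[0, 1]: f(v) = 1 and F(v) = v, so phi(v) = 2v - 1.
def uniform_pdf(v):
    return 1.0


def uniform_cdf(v):
    return v


# Exponential with rate 1: f(v) = exp(-v) and F(v) = 1 - exp(-v), so phi(v) = v - 1.
def exp_pdf(v):
    return math.exp(-v)


def exp_cdf(v):
    return 1.0 - math.exp(-v)


def is_strictly_increasing(values):
    """True if every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    grid = [k / 10 for k in range(11)]
    phis = [virtual_valuation(v, uniform_pdf, uniform_cdf) for v in grid]
    print("uniform is strictly regular on the grid:", is_strictly_increasing(phis))
```

Both example distributions have a strictly increasing virtual valuation, so both are strictly regular in the sense defined above.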
Myerson’s Optimal Mechanism.
Myerson’s mechanism is defined as the VCG mechanism [15, 29, 52] where the virtual valuation , is submitted as the bid for each bidder . If the valuations , and therefore, the virtual valuations are independent, then for any truthful mechanism the virtual surplus , is equal to the revenue in expectation over the bids. Since VCG is surplus maximizing, if Myerson’s mechanism is truthful then it maximizes the revenue. It can be shown that if the bids have a regular distribution, then Myerson’s mechanism is truthful, and therefore, revenue maximizing.
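To make the mechanism concrete, here is a minimal single-item sketch under the assumption that bidder i's valuation is Uniform[0, h_i], so that the virtual valuation is 2b - h_i and its inverse is (x + h_i) / 2. The function name `myerson_uniform` and the uniform assumption are illustrative and not taken from the paper. The item goes to the bidder with the highest non-negative virtual valuation, and the winner pays the smallest bid that would still have won.

```python
def myerson_uniform(bids, highs):
    """Single-item Myerson auction for bidders with Uniform[0, highs[i]] valuations.

    Returns (winner_index, payment), or (None, 0.0) when no virtual
    valuation is non-negative and the item stays unsold.
    """
    # Virtual valuation of a Uniform[0, h] bidder with bid b is 2b - h.
    phis = [2.0 * b - h for b, h in zip(bids, highs)]
    winner = max(range(len(bids)), key=lambda i: phis[i])
    if phis[winner] < 0:
        return None, 0.0
    # The winner must beat both zero and every competing virtual valuation.
    others = [phis[i] for i in range(len(bids)) if i != winner]
    threshold = max([0.0] + others)
    # Invert the winner's virtual valuation at the threshold to get the price.
    payment = (threshold + highs[winner]) / 2.0
    return winner, payment


if __name__ == "__main__":
    print(myerson_uniform([0.8, 0.6], [1.0, 1.0]))  # winner pays about the second bid
    print(myerson_uniform([0.7, 0.2], [1.0, 1.0]))  # winner pays the reserve price
```

With identical Uniform[0, 1] bidders this behaves like a second-price auction with reserve price 0.5: the winner pays the larger of the second-highest bid and the reserve.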
Let be the virtual valuation of advertiser for type , be its probability density function, and be its cumulative distribution function. We denote the joint virtual valuation of all advertisers for type by , and its joint probability density function by . The types are distributed according to a known distribution . Finally, given a user of type , let a mechanism’s allocation rule be .
2.2 Fairness Constraints
We would like to guarantee that advertisers have a fair coverage across user types. We do so by placing constraints on the coverage of an advertiser. Formally, we define advertiser ’s coverage of type , , as the joint probability that advertiser wins the auction and the user is of type
where is the -th component of . Then, we consider the proportional coverage of the advertiser on each type. Given vectors , , we define -fairness constraints for each advertiser and type , as a lower bound , and an upper bound , on the proportion of users of type the advertiser shows ads to, i.e., we impose the following constraints for all and
2.3 Discussion of Fairness Constraints
Returning to the example presented in the introduction, we can ensure that the advertiser shows % of total ads to women, by choosing a lower bound of for this advertiser on women. More generally, for user types, moderately placed lower bounds and upper bounds ( and ), for some subset of advertisers, ensure this subset has a uniform coverage across all types, while allowing other advertisers to target specific types.
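As a sketch of how such bounds can be checked, the snippet below takes a coverage matrix whose entry (i, j) is the probability that advertiser i wins and the user has type j, converts each row to proportions, and tests them against lower and upper bounds. The names `proportional_coverage`, `satisfies_fairness`, `q`, `lower`, and `upper` are assumptions introduced for illustration.

```python
def proportional_coverage(q_row):
    """Fraction of an advertiser's shown ads that go to each user type."""
    total = sum(q_row)
    if total == 0:
        return [0.0 for _ in q_row]
    return [q / total for q in q_row]


def satisfies_fairness(q, lower, upper, tol=1e-12):
    """Check lower <= (proportional coverage) <= upper for every advertiser and type."""
    for q_row, l_row, u_row in zip(q, lower, upper):
        shares = proportional_coverage(q_row)
        for share, lo, hi in zip(shares, l_row, u_row):
            if share < lo - tol or share > hi + tol:
                return False
    return True


if __name__ == "__main__":
    # Two advertisers, two user types (say women and men).
    q = [[0.25, 0.25], [0.10, 0.40]]
    lower = [[0.4, 0.4], [0.4, 0.4]]
    upper = [[0.6, 0.6], [0.6, 0.6]]
    # The second advertiser shows only 20% of its ads to the first type.
    print(satisfies_fairness(q, lower, upper))
```

Setting an advertiser's lower bounds to 0 and upper bounds to 1 leaves that advertiser unconstrained, which matches the remark that other advertisers may still target specific types.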
Importantly, while ensuring fairness across multiple types our constraints allow for targeting within any single type. This is vital as the advertiser may not derive the same utility from each user, and could be willing to pay a higher amount for more relevant users in the same type. For example, if the advertiser is displaying job ads, then a user already looking for job opportunities may be of a higher value to the advertiser than one who is not.
For a detailed discussion on how such constraints can encapsulate other popular metrics, such as statistical parity, we refer the reader to .
2.4 Optimization Problem
We would like to develop a mechanism which maximizes the revenue while satisfying the upper and lower bound constraints in Eq. (2). Towards formally stating our problem, we define the revenue of mechanism , with an allocation rule for type as
where and are the -th component of and respectively. Thus, we can express our optimization problem with respect to functions , or as an infinite dimensional optimization problem as follows.
(Infinite-dimensional fair advertising problem). For all user types , find the optimal allocation rule for
In the above problem, we are looking for a collection of optimal continuous functions . To be able to solve this problem, we need, at the very least, a finite-dimensional formulation of the fair online advertisement problem.
3 Other Related Work
 consider a framework which selects an ad category (e.g., job or housing) every time a user visits the platform. Given fair mechanisms for each category, they construct a fair composition of these mechanisms. However, they do not show how to design fair mechanisms for each category, or study how the composition affects the platform’s ad revenue. Another related problem is to design optimal mechanisms which satisfy contract constraints [25, 7, 41]; these constraints allocate a minimum number of ad spots to advertisers with a contract, and are different from our constraints which control the fraction of each sensitive type the ads are shown to.
Several prior works address the problems of polarization and algorithmic bias, including [24, 10] who control polarization in social-networks and personalized feeds,  who diversify personal feeds, and  who create a diverse and balanced summary of a set of results. In addition, [44, 6, 13] study fair ranking algorithms; these could be used to generate a balanced list of results on job platforms and other search engines. While these works are related to our broad goal of controlling algorithmic bias, their formulation is different since they do not involve a bidding mechanism. Therefore, their solutions cannot be applied to our problem.
4 Theoretical Results
Our first result is structural, and gives a characterization of the optimal solution , to the infinite-dimensional fair advertising problem, in terms of a matrix , making it a finite-dimensional optimization problem with respect to .
(Characterization of an optimal allocation rule). There exists an such that if for all , are strictly regular and independent, then the set of allocation rules , defined below, is optimal for the infinite-dimensional fair advertising problem
where we break ties randomly, if any (this is equivalent to the allocation rule of the VCG mechanism).
We present the proof of Theorem 4.1 in Section 7.1. In the proof, we analyze the dual of the infinite-dimensional fair advertising problem. We reduce the dual problem to one Lagrangian variable by fixing the Lagrangian variables corresponding to the lower bound (5) and upper bound (6) constraints at their optimal values. The resulting problem turns out to be the dual of the unconstrained revenue maximizing problem, for which Myerson’s mechanism is the optimal solution. We interpret the fixed Lagrangian variables as shifting the original virtual valuations . It then follows that for some shift , the -shifted mechanism (8) is the optimal solution to the infinite-dimensional fair advertising problem.
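The shifted allocation rule can be sketched for a single user as follows: add each advertiser's shift for the observed user type to its virtual valuation, and give the slot to the highest shifted value, breaking ties at random. The function name `shifted_allocation` and the argument layout are assumptions made for illustration.

```python
import random


def shifted_allocation(virtual_vals, shifts, rng=random):
    """Pick the advertiser with the highest shifted virtual valuation.

    virtual_vals[i] is advertiser i's virtual valuation for the current
    user, and shifts[i] is advertiser i's shift for that user's type.
    Ties are broken uniformly at random.
    """
    scores = [phi + a for phi, a in zip(virtual_vals, shifts)]
    best = max(scores)
    tied = [i for i, s in enumerate(scores) if s == best]
    return rng.choice(tied)


if __name__ == "__main__":
    phis = [0.5, 0.3]
    print(shifted_allocation(phis, [0.0, 0.0]))  # unshifted: advertiser 0 wins
    print(shifted_allocation(phis, [0.0, 0.3]))  # a shift lets advertiser 1 win
    print(shifted_allocation(phis, [1.0, 1.3]))  # same winner after a uniform shift
```

Adding the same constant to every advertiser's shift for a type leaves the winner unchanged, which is why only the differences between shifts matter.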
Now, our task is reduced from finding an optimal allocation rule, to finding an characterizing the optimal allocation rule. Towards this, let us define the revenue, and coverage as functions of
These follow by observing that (8) selects the advertiser with the highest shifted virtual valuation, and then using this allocation rule in Eq. (3) and Eq. (1) respectively. Depending on the nature of the distribution, the gradients and may not be monotone in (e.g., consider the exponential distribution). Therefore, in general neither is a concave, nor is a convex function of (see Section B for a concrete example). Hence, this optimization problem is non-convex both in its objective and in its constraints. We require further insights to solve the problem efficiently.
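When closed forms are inconvenient, coverage and revenue as functions of the shifts can be approximated by sampling. The sketch below is a Monte Carlo estimate under stated assumptions: the item is always allocated to the highest shifted virtual valuation, revenue is measured as the expected (unshifted) virtual valuation of the winner, and `sample_phi(i, j, rng)` is a hypothetical user-supplied sampler for advertiser i's virtual valuation on type j. None of these names come from the paper.

```python
import random


def estimate_coverage_and_revenue(shifts, type_probs, sample_phi,
                                  num_samples=20000, seed=0):
    """Monte Carlo estimate of the coverage matrix and revenue of a shifted mechanism.

    shifts[i][j] is the shift of advertiser i on user type j, and
    type_probs[j] is the probability that an arriving user has type j.
    """
    rng = random.Random(seed)
    n, m = len(shifts), len(type_probs)
    coverage = [[0.0] * m for _ in range(n)]
    revenue = 0.0
    for _ in range(num_samples):
        # Draw the user's type, then one virtual valuation per advertiser.
        j = rng.choices(range(m), weights=type_probs)[0]
        phis = [sample_phi(i, j, rng) for i in range(n)]
        winner = max(range(n), key=lambda i: phis[i] + shifts[i][j])
        coverage[winner][j] += 1.0 / num_samples
        # Revenue is accumulated as the winner's unshifted virtual valuation.
        revenue += phis[winner] / num_samples
    return coverage, revenue


if __name__ == "__main__":
    def uniform_phi(i, j, rng):
        # Virtual valuation 2v - 1 of a Uniform[0, 1] valuation v.
        return rng.uniform(-1.0, 1.0)

    cov, rev = estimate_coverage_and_revenue(
        [[0.0, 0.0], [0.0, 0.0]], [0.5, 0.5], uniform_phi)
    print(cov, rev)
```

Because the same seed reproduces the same draws, comparing two shift matrices with one seed isolates the effect of the shifts from sampling noise.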
Towards this, we observe that revenue is a concave function of . Consider two optimal allocation rules obtaining coverages and revenues respectively. If we use the first with probability , we achieve a coverage with revenue . Therefore, the optimal allocation rule achieving has a revenue of at least . This shows that for optimal allocation rules revenue is a concave function of the coverage .
Let , be the maximum revenue of the platform as a function of coverage . (Footnote 3: We drop for one and each . This is crucial to calculate (see Remark 5.1). By some abuse of notation we write for instead of using .) Consider the following two optimization problems.
(Optimal coverage problem). Find the optimal for,
(Optimal shift problem). Given the target coverage , find the optimal for
Our next result relates the solution of the above two problems with the infinite-dimensional fair advertising problem.
Given a solution to the optimal coverage problem, the solution to the optimal shift problem with , defines an optimal -shifted mechanism (8) for the infinite-dimensional fair advertising problem.
For any adding the all vector, , to does not change the allocation rule in (8). Thus, it suffices to show that for all , there is a unique with , such that .
We can show that for all , there is at least one such that . In fact, the greedy algorithm which increases all , where and , will find the required .
To prove it is unique consider distinct such that . We can show that . In particular, that for . Now, the uniqueness of follows by contradiction. ∎
The above theorem allows us to find the optimal by solving the optimal coverage and optimal shift problems. First, let us consider the optimal coverage problem. We already know that its objective is concave. We can further observe that its constraints are linear in , and in particular, they define a constraint-polytope . Therefore, it is a convex program, and one approach to solve it is to use gradient-based algorithms.
The problem is that we do not have access to . The key idea is that if we let , then we can calculate by solving the following linear-system,
where is the Jacobian of , with respect to . (Footnote 4: represents the vectorization operator.) It turns out that is invertible for all (see Section 5.1), and therefore, the above linear-system has an exact solution.
Now, let us consider the optimal shift problem. Its objective is non-convex (see Figure 8(b)). is a linear combination of for all and . Since is invertible, its rows , are linearly independent, and the gradient is never zero unless we are at the global minimum where . This guarantees that the objective does not have a saddle-point or local-maximum, and that any local-minimum is a global minimum. Using this we can develop an efficient algorithm to solve the optimal shift problem (Lemma 7.2).
This brings us to our main algorithmic result, which is an algorithm to find the optimal allocation rule for the infinite-dimensional fair advertising problem.
(An algorithm to solve the infinite-dimensional fair advertising problem). There is an algorithm (Algorithm 1) which outputs such that if assumptions (17), (18), (19), and (20) are satisfied, the -shifted mechanism (8) achieves a revenue -close to the optimal for the infinite-dimensional fair advertising problem in
Where the arithmetic calculations in each step are bounded by calculating once and hides factors in and .
Roughly, the above algorithm has a convergence rate of , under the assumptions which we list below.
Assumption (17) guarantees that all advertisers have at least an probability of winning on every type, assumption (18) places lower and upper bounds on the probability density functions of the , assumption (19) guarantees that the probability density functions of the are -Lipschitz continuous, and assumption (20) assumes that the expected is bounded.
We expect Assumptions (17) and (20) to hold in any real-world setting. We can drop the lower bound in Assumption (18) by introducing “jumps” in to avoid ranges where the measure of bids is small. Removing assumption (19) would be an interesting direction for future work.
We inherit the assumption of independent and regular distributions from Myerson. In addition, we require that the distributions of valuations are strictly regular to guarantee that ties between advertisers happen with probability zero. We can drop this assumption by incorporating a randomized tie-breaking rule which retains fairness. The above allocation rule is monotone and allocates the ad spot to the bidder with the highest shifted valuation for a given user. Thus, it defines a unique truthful mechanism and corresponding payment rule.
5 Our Algorithm
Algorithm 1 performs a projected gradient descent to find the optimal (4). It starts with an initial coverage , and the corresponding shift . At step , it calculates the gradient , by solving the linear-system in Eq. (16). To solve this linear-system, we need to calculate and . This can be done in steps if we have (see Remark 5.3). Therefore, the algorithm requires a “good” approximation of at each step; it maintains this by “updating” the previous approximation using Algorithm 2 to approximately solve the optimal-shift problem (4).
After calculating , it takes a gradient step and projects the current iterate on in time (Section 5.2), where is the fast matrix multiplication coefficient. It takes roughly steps to obtain an -accurate solution, and then returns its current shift. We can bound the error introduced by the approximation of at each step by ensuring that Algorithm 2 has sufficient accuracy. In particular, if it is accurate we can prove that Algorithm 1 converges in steps.
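The overall loop has the shape of projected gradient ascent on a concave objective: take a gradient step, then project back onto the feasible set. The sketch below shows only that generic shape on a toy problem. It is not Algorithm 1: the gradient is given in closed form rather than obtained from a linear system, and a simple box projection stands in for the projection onto the constraint polytope. All names are assumptions.

```python
def projected_gradient_ascent(grad, project, x0, step=0.1, iters=200):
    """Generic projected gradient ascent: x <- project(x + step * grad(x))."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = project([xi + step * gi for xi, gi in zip(x, g)])
    return x


def clip_to_box(lo, hi):
    """Return a projection onto the box [lo[k], hi[k]] in each coordinate."""
    def project(x):
        return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]
    return project


if __name__ == "__main__":
    # Toy concave objective: -(x0 - 1)^2 - (x1 + 2)^2, maximized over a box.
    def grad(x):
        return [-2.0 * (x[0] - 1.0), -2.0 * (x[1] + 2.0)]

    project = clip_to_box([0.0, 0.0], [0.5, 1.0])
    print(projected_gradient_ascent(grad, project, [0.0, 1.0]))
```

In the toy problem the unconstrained maximizer (1, -2) lies outside the box, so the iterates settle on the nearest feasible corner.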
Next, we give the details of projecting onto and calculating the gradient .
5.1 Calculating and Bounding
We fix the shift of one advertiser for each type . Let be the Jacobian of the vectorized coverage, , with respect to the vectorized shift, . Then, is a matrix
To obtain , we use the fact that is always invertible (Lemma 5.2). Given for some , we can calculate by solving
Or equivalently by solving the linear-system in Eq. (16).
is invertible iff we fix the shift of one advertiser for each type . Intuitively, if we increase the for all and by the same amount, then remains invariant. This implies that each row of has 0 sum, or that is not invertible.
(Jacobian is invertible). For all , if all advertisers have non-zero coverage for all types , then is invertible.
The coverage remains invariant if the bids of all advertisers are uniformly shifted for any given user type . Therefore, for all we have
Since increasing the shift , does not increase the coverage for any , we have that
Now, from Equation (24) we have
Further since the -th advertiser has non-zero coverage, i.e., there is non-zero probability that advertiser bids higher than all other advertisers, changing must affect all other advertisers. In other words, for all . Using this we have,
By observing that , on user type , is independent of the , of any user type such that , i.e.,
and using Equation (26), we get that the Jacobian, is strictly diagonally dominant. Now, by the properties of strictly diagonally dominant matrices, it is invertible. ∎
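The role of fixing one advertiser's shift can be seen on a small numerical example. The matrices below are made up for illustration (they are not computed from any bid distribution), and the helper name `is_strictly_diagonally_dominant` is an assumption. With all three advertisers included, every row sums to zero, so the matrix is only weakly diagonally dominant and is singular. Dropping one advertiser's row and column leaves a strictly diagonally dominant, hence invertible, matrix.

```python
def is_strictly_diagonally_dominant(matrix):
    """True if |a_ii| exceeds the sum of |a_ik| over k != i in every row."""
    for i, row in enumerate(matrix):
        off_diagonal = sum(abs(x) for k, x in enumerate(row) if k != i)
        if abs(row[i]) <= off_diagonal:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical Jacobian block for one user type with three advertisers.
    # Each row sums to zero because a uniform shift leaves coverage unchanged.
    full = [[0.4, -0.2, -0.2],
            [-0.2, 0.5, -0.3],
            [-0.2, -0.3, 0.5]]
    # Fix the shift of the last advertiser: drop its row and column.
    reduced = [row[:2] for row in full[:2]]
    print(is_strictly_diagonally_dominant(full))     # weakly dominant only
    print(is_strictly_diagonally_dominant(reduced))  # strictly dominant
```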
5.2 Projection on the Constraint Polytope ()
Given any point , by determining the constraints it violates, we can express the projection on the constraint polytope , as a quadratic program with equality constraints. Using this we can construct a projection oracle , which given a point projects it onto in arithmetic operations, where is the fast matrix multiplication coefficient.
6 Empirical Study
We evaluate our approach empirically on the Yahoo! A1 dataset . We vary the strength of the fairness constraint for all advertisers, find an optimal fair mechanism using Algorithm 1 and compare it against the optimal unconstrained (and hence potentially unfair) mechanism , which is given by Myerson . We first consider the impact of the fairness constraints on the revenue of the platform. Let denote the revenue of mechanism . We report the revenue ratio . Note that the revenue of can be at most that of , as it solves a constrained version of the same problem; thus .
We then consider the impact of the fairness constraints on the advertisers. Towards this, we consider the distribution of winners among advertisers in an auction given by and an auction given by . We report the total variation distance between the two distributions, as a measure of how much the winning distribution changes due to the fairness constraints.
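For two discrete distributions over the same set of advertisers, the total variation distance is half the sum of the absolute differences of their probabilities. A minimal sketch, with the function name and the example numbers chosen for illustration:

```python
def total_variation_distance(p, q):
    """Total variation distance between two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Hypothetical winning probabilities of three advertisers under the
    # unconstrained mechanism and under the fair mechanism.
    unconstrained = [0.5, 0.3, 0.2]
    fair = [0.4, 0.4, 0.2]
    print(total_variation_distance(unconstrained, fair))
```

A distance of 0 means the fairness constraints did not change which advertisers win, while a distance of 1 means the two winner distributions have disjoint support.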
Lastly, we consider the fairness of the resultant mechanism . To this end, we measure selection lift () achieved by , . Where , represents perfect fairness among the two user types.
We use the Yahoo! A1 dataset , which contains bids placed by advertisers on the top 1000 keywords on Yahoo! Online Auctions between June 15, 2002 and June 14, 2003. The dataset has 10475 advertisers, and each advertiser places bids on a subset of keywords; there are approximately bids in the dataset.
For each keyword , let be the set of advertisers that bid on it. We infer the distribution of valuation of an advertiser for a keyword by the bids they place on the keyword. In order to retain sufficiently rich valuation profiles for each advertiser, we remove advertisers who place fewer than 1000 bids on or whose valuations have variance lower than from , and then those who win the auction less than of the time. This retains more than bids.
The actual keywords in the dataset are anonymized; hence, in order to determine whether two keywords and are related, we consider whether they share more than one advertiser, i.e., . This allows us to identify keywords that are related (see Figure 2(b)), and hence for which spillover effects may be present as described in . Drawing that analogy, one can think of each keyword in the pair as a different type of user for which the same advertisers are competing, and the goal would be for the advertiser to win an equal proportion of each user.
There are such pairs. However, we observe that spillover does not affect all keyword pairs (see Figure 2(a)). To test the effect of imposing fairness constraints in a challenging setting, we consider only the auctions which are not already fair; in particular there are keyword pairs which are less than fair.
6.2 Experimental Setup
As we only consider pairs of keywords in this experiment, a lower bound constraint is equivalent to an upper bound constraint . Hence, it suffices to consider lower bound constraints. We set , and vary uniformly from to , i.e., from the completely unconstrained case (which is equivalent to Myerson’s auction) to the completely constrained case (which requires each advertiser to win each keyword in the pair with exactly the same probability). We report , , and averaged over all auctions after iterations in Figure 3 and Figure 4; error bars represent the standard error of the mean over iterations and 3282 auctions respectively.
Computationally, we could consider more types (). The bottleneck is empirical: whether the dataset contains enough keywords with overlapping advertisers for the experiment to be meaningful. For we get over 1000 such keyword sets, and observe results similar to case, losing less than of the revenue with a TV-distance smaller than 0.05 even for the setting with .
6.3 Empirical Results
Since the auctions are unbalanced to begin with, we expect the selection lift to increase with the fairness constraint.
We observe a growing trend in the selection lift, eventually achieving perfect fairness for .
We do not expect to outperform the optimal unconstrained mechanism.
However, we observe that even in the perfectly balanced setting with our mechanisms lose less than 5% of the revenue.
Advertiser Displacement. Since the auctions are unbalanced to begin with, we expect TV-distance to grow with the fairness constraint. We observe this growing trend in the TV-distance on lowering the risk-difference. Even for zero risk-difference () our mechanisms obtain a TV-distance smaller than . We present a discussion of this result in Section C.
7.1 Proof of Theorem 4.1
Let us introduce three Lagrangian multipliers, a vector , a vector , and a continuous function , for the lower bound, upper bound, and single item constraints respectively. Then calculating the Lagrangian function we have
The second integral is well defined by the continuity of and the monotonic nature of . In order for the supremum of the Lagrangian over to be bounded, the coefficient of must be non-positive. Therefore we require that for all and
Since and are continuous, we can equivalently require for all and
If this holds, we can express the supremum of as
Now we can express the dual optimization problem as follows:
(Dual of the infinite-dimensional fair advertising problem). For all , find an optimal and for
Since the primal is linear in and the constraints are feasible, strong duality holds. Therefore, the dual optimal is primal optimal.
For any feasible constraints we have for all and . Therefore the coefficient of , , and that of , . Since and are non-negative, an optimal solution to the dual is finite. Let be an optimal solution to the dual, and be an optimal solution to the primal. Fixing and to their optimal values and in the dual, let us define new virtual valuations , for all and
Then the leftover problem has only one Lagrangian multiplier, . Let be the affine transformation of defined on virtual valuations, i.e., , then the problem can be expressed as follows.
(Dual with shifted virtual valuations). For all , find the optimal for
This is the dual of the following unconstrained revenue maximizing problem. Myerson’s mechanism is the revenue maximizing solution to the unconstrained optimization problem. Further, by linearity and feasibility of constraints, strong duality holds. Therefore the -shifted mechanism, for is an optimal fair mechanism.
(Unconstrained primal for the infinite-dimensional fair advertising problem). For all , find the optimal allocation rule for
Further, Myerson’s mechanism is truthful if the distributions of valuations are regular and independent. Since the -shifted mechanism applies a constant shift to all valuations, it follows under the same assumptions that any -shifted mechanism is also truthful, and therefore has a unique payment rule defined by its allocation rule. ∎
7.2 Proof of Theorem 4.3
The next lemma gives an algorithm to solve the optimal shift problem. Its proof is presented in Section 7.4.