Intersectionality: Multiple Group Fairness in Expectation Constraints

Jack Fitzsimons
Machine Learning Research Group
University of Oxford, UK
jack@robots.ox.ac.uk
Michael Osborne
Machine Learning Research Group
University of Oxford, UK
mosb@robots.ox.ac.uk
Stephen Roberts
Machine Learning Research Group
University of Oxford, UK
sjrob@robots.ox.ac.uk
Abstract

Group fairness is an important concern for machine learning researchers, developers, and regulators. However, the strictness to which models must be constrained to be considered fair is still under debate. The focus of this work is on constraining the expected outcome of subpopulations in kernel regression and, in particular, decision tree regression, with application to random forests, boosted trees and other ensemble models. While previous work addressed a single such constraint, this work addresses the concerns that arise when incorporating multiple constraints simultaneously. The proposed solution does not affect the order of computational or memory complexity of the decision trees and is easily integrated into models post-training.

 


Presented at the Workshop on Ethical, Social and Governance Issues in AI at the 32nd Conference on Neural Information Processing Systems (NIPS 2018). Do not distribute.

1 Introduction

The widespread use of machine learning algorithms and fully autonomous systems has greatly transformed the industrial landscape of the twenty-first century. However, with these great advances comes a responsibility for researchers, developers, and regulators to consider the impact of these systems on the broader society. In 2014, the US presidential administration published a report on big data collection and analysis, finding that "big data technologies can cause societal harms beyond damages to privacy" [14]. The report raised the concern that algorithmic decisions inferred from big data may have harmful biases, potentially leading to discrimination against disadvantaged groups.

This drive towards ethical practices in machine learning has led to many developments in algorithmic fairness. One such advance has been towards developing algorithms which display group fairness, also referred to as statistical, conditional or demographic parity. From a regulatory viewpoint, group fairness is particularly interesting as affirmative action policies have already been passed to address discrimination with respect to caste, race and gender [15, 6, 5]. However, it is worth noting that there may be a considerable cost involved in achieving such fairness in some cases [4].

In a machine learning context, there have primarily been two approaches towards developing systems which demonstrate group fairness: data alteration, which modifies the original dataset in order to prevent discrimination between groups [11, 9], and regularisation, which penalizes models for unfair behavior [10, 1, 2, 3, 13].

More recently, there has been an effort towards constraining models such that they prohibit unfair behavior, a stricter assertion than regularisation. This work directly follows [8], in which group fairness in expectation for regression models is investigated, defined as:

Group Fairness in Expectation (GFE): A regressor $f$ achieves group fairness in expectation with respect to groups $A$ and $B$ with generative distributions $p_A$ and $p_B$ respectively iff

$$\mathbb{E}_{x \sim p_A}\left[f(x)\right] = \mathbb{E}_{x \sim p_B}\left[f(x)\right].$$

This work addresses an important issue raised in [7]: a model that satisfies conditional parity with respect to race and gender independently may fail to satisfy conditional parity with respect to the conjunction of race and gender. In the social science literature, concerns about potentially discriminated-against sub-demographics are referred to as intersectionality [12]. More formally, this work proposes a simple approach to ensure group fairness in expectation across an arbitrary set of subgroups. Applied to the popular decision trees, it is shown that, provided the number of parity conditions is negligible compared to the number of training points, the order of computational and memory complexity is not increased.
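To make the intersectional requirement concrete, the following sketch (with hypothetical names: a fitted scikit-learn-style regressor `model`, a feature matrix `X`, and boolean masks for each protected attribute) computes the expected prediction for every conjunction of subgroups; GFE across the intersections holds exactly when these expectations agree.

```python
import numpy as np
from itertools import product

def intersectional_expectations(model, X, groups):
    """Expected prediction per intersection of protected attributes.

    `groups` maps an attribute name to a dict of boolean masks over the rows
    of X, e.g. {"gender": {"F": mask_f, "M": mask_m},
                "veteran": {"vet": mask_v, "non-vet": mask_n}}.
    The GFE gap between any two intersections is the difference of the
    returned expectations.
    """
    preds = model.predict(X)
    expectations = {}
    attr_names = list(groups)
    for combo in product(*(groups[a].items() for a in attr_names)):
        labels, masks = zip(*combo)
        mask = np.logical_and.reduce(masks)   # conjunction of the subgroup masks
        if mask.any():
            expectations[labels] = preds[mask].mean()
    return expectations
```

As Dwork and Ilvento [7] note, these intersectional expectations can differ even when the expectations over each attribute's marginals already agree.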

Figure 1: The above figure visualizes how the kernel matrix of a regressor can incorporate quadrature constraints. In the equations, each constraint encodes the difference between two sub-populations, $A$ and $B$. In the visual matrix representation (right) there are three components: the diagonal sub-matrix represents the leaves of the decision tree, which are independent of one another; the grey cells represent the relationship between the leaves and the constraints (denoted $C$ in this work); and the dark upper-left cells represent the interactions between the constraints (denoted $D$ in this work).

2 Constrained Kernel Methods

As shown in [8], kernel regressors may be constrained in terms of their expectation by adding auxiliary noiseless quadrature observations. Take for instance a two-dimensional Gaussian distribution which, without loss of generality, is zero mean, with correlation $\rho$ and variances $\sigma_1^2$ and $\sigma_2^2$ respectively. With independent, identically distributed noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$ on the observations of $x_1$ and $x_2$, we can constrain the expected outcomes by incorporating a noiseless observation on $x_1 - x_2$, with joint covariance as follows,

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 & \sigma_1^2 - \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 & \rho\sigma_1\sigma_2 - \sigma_2^2 \\ \sigma_1^2 - \rho\sigma_1\sigma_2 & \rho\sigma_1\sigma_2 - \sigma_2^2 & \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2 \end{pmatrix}.$$

The above covariance matrix, $\Sigma$, has rank 2 with one perfectly observed value, namely the mean equality constraint. Thus inference on the two dimensions is constrained such that $\mathbb{E}[x_1] = \mathbb{E}[x_2]$.
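A minimal numerical sketch of this construction, with illustrative values for the variances, correlation and noise that are not taken from the paper: conditioning on noisy observations of $x_1$ and $x_2$ together with a noiseless observation of zero on $x_1 - x_2$ forces the two posterior means to coincide.

```python
import numpy as np

# Prior: zero-mean bivariate Gaussian with correlation rho and variances s1^2, s2^2.
s1, s2, rho, noise = 1.0, 1.5, 0.3, 0.1

# Covariance of (x1, x2, x1 - x2); the third row/column is the quadrature-style
# constraint variable, so this matrix has rank 2.
K = np.array([
    [s1**2,             rho*s1*s2,          s1**2 - rho*s1*s2],
    [rho*s1*s2,         s2**2,              rho*s1*s2 - s2**2],
    [s1**2 - rho*s1*s2, rho*s1*s2 - s2**2,  s1**2 + s2**2 - 2*rho*s1*s2],
])

# Noise only on the two real observations; the constraint x1 - x2 = 0 is noiseless.
K_obs = K + np.diag([noise, noise, 0.0])

y = np.array([2.0, -1.0, 0.0])     # observed x1, x2 and the zero-valued constraint
k_star = K[:2, :]                  # cross-covariance of (x1, x2) with the observations
posterior_mean = k_star @ np.linalg.solve(K_obs, y)

print(posterior_mean)              # the two entries agree: E[x1] = E[x2]
```

Adding further constraint variables simply appends more rows and columns of the same form to the covariance matrix.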

As shown visually in Figure 1, this principle can be extended to Gaussian processes and other kernel regression techniques by using the differences in quadrature observations in order to incorporate mean equality constraints. Multiple constraints can also be created by simply adding more columns and rows to the kernel matrix accordingly.

3 Constrained Decision Trees

While the above constrained kernel inference is interesting, the widespread impact comes when we extend the result to decision tree regression, random forests, boosted trees and other ensemble techniques. This is not to dismiss the importance or value of kernel methods more broadly, but rather reflects the popularity of tree-based models amongst data scientists, a profession more common than machine learning researchers [16].

While decision trees can be represented in either a compressed or an explicit kernel representation [8], for the sake of conciseness this work presents results only for the compressed representation. Thus, we will endeavor to minimize the perturbations induced on a per-leaf basis, irrespective of the number of data points per leaf. The core difference between single and multiple constraints is that we can no longer use the arrowhead matrix lemma; instead, we must work out the update using the block matrix inversion lemma. Importantly, the distributions $p_A$ and $p_B$ for each constraint are defined as the empirical distributions of each subgroup considered. This is an important point, as small subgroups may have empirical distributions which are not good approximations of the true generative distributions, and hence our constrained space for inference may not constrain predictions to equate accordingly.
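A small sketch of how such empirical distributions could be assembled from a fitted tree (assuming a scikit-learn `DecisionTreeRegressor` and boolean masks selecting the two subgroups of each constraint; the helper and its name are ours, not the paper's):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# tree = DecisionTreeRegressor(max_depth=4).fit(X, y)   # assumed fitted upstream

def constraint_columns(tree, X, group_masks):
    """Build one constraint column per (group A, group B) pair.

    Each column holds, per leaf, the difference in empirical probability mass
    between the two sub-populations; these columns form the grey block C of
    Figure 1 in this sketch. Subgroups are assumed to be non-empty.
    """
    leaf_ids = tree.apply(X)                  # leaf index of every training point
    leaves = np.unique(leaf_ids)
    cols = []
    for mask_a, mask_b in group_masks:        # e.g. (African American, everyone else)
        p_a = np.array([(leaf_ids[mask_a] == l).mean() for l in leaves])
        p_b = np.array([(leaf_ids[mask_b] == l).mean() for l in leaves])
        cols.append(p_a - p_b)
    return np.column_stack(cols)
```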

The kernel function in the compressed representation is simply the identity matrix, $I$; that is to say, each leaf is independent of the others. The kernel regression equation can be denoted as

$$\bar{f}(x_*) = k_*^\top \begin{pmatrix} I + \sigma^2 I & C \\ C^\top & D \end{pmatrix}^{-1} \begin{pmatrix} y \\ \mathbf{0} \end{pmatrix},$$

where the first entries of $k_*$, one per leaf, indicate to which leaf $x_*$ belongs and the remaining entries are set to zero, as the point to predict will not contribute to the empirical distributions of the subgroups under an inductive learning paradigm.

Figure 2: The figure shows the effect of GFE constraints on the inferred scores of the ProPublica dataset for African American, Hispanic and all other defendants. Distributions before perturbation are shown in blue and after in orange. Vertical lines indicate the means of the distributions.

Using the block matrix inversion lemma we find,

$$\begin{pmatrix} A & C \\ C^\top & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + A^{-1} C S^{-1} C^\top A^{-1} & -A^{-1} C S^{-1} \\ -S^{-1} C^\top A^{-1} & S^{-1} \end{pmatrix}, \qquad S = D - C^\top A^{-1} C,$$

with $A = I + \sigma^2 I$. By simply inserting this into the kernel regression equation and noting that the constraint entries of the observation vector are necessarily zero, the following update to the expected mean can be found,

$$\bar{f}_i = \frac{1}{1 + \sigma^2}\left( y_i + \frac{1}{1 + \sigma^2}\, c_i^\top S^{-1} C^\top y \right),$$

with $c_i$ indicating the row of $C$ relating to the difference of subgroup distributions on leaf $i$. The effect of the noise can be removed by post-multiplying by $(1 + \sigma^2)$.
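To ground the construction numerically, the sketch below uses hypothetical leaf values and subgroup weights (not taken from the paper) and assumes the constraint block is $D = C^\top C$. Rather than the compressed per-leaf update above, it conditions the latent leaf values directly on the noisy leaf observations and on the zero-valued constraint observations; the perturbed leaf means then satisfy the GFE constraint exactly.

```python
import numpy as np

# Hypothetical per-leaf values of a fitted regression tree (not from the paper).
y = np.array([10.0, 14.0, 8.0, 12.0])
noise = 0.5                                 # illustrative noise on the leaf observations

# One GFE constraint: per-leaf difference in empirical probability mass between
# the two sub-populations (group A minus group B); more constraints = more columns.
C = np.array([[0.4], [-0.2], [-0.2], [0.0]])

n_leaves, n_constraints = C.shape
A = (1.0 + noise) * np.eye(n_leaves)        # identity leaf kernel plus observation noise
D = C.T @ C                                 # assumed constraint-constraint interactions

# Joint covariance over (leaf observations, constraint observations), as in Figure 1,
# with the constraints observed noiselessly at zero.
K = np.block([[A, C], [C.T, D]])
obs = np.concatenate([y, np.zeros(n_constraints)])

# Posterior mean of the latent leaf values given all observations; the (1 + noise)
# factor undoes the shrinkage induced by the observation noise.
cross = np.hstack([np.eye(n_leaves), C])    # cov(latent leaf values, observations)
leaf_means = (1.0 + noise) * (cross @ np.linalg.solve(K, obs))

print(leaf_means)            # perturbed per-leaf predictions
print(C.T @ leaf_means)      # approximately 0: the GFE constraint holds in expectation
```

Under these assumptions the perturbation amounts to projecting the leaf values onto the null space of $C^\top$, i.e. the smallest per-leaf change that satisfies every constraint, in line with the stated goal of minimising per-leaf perturbations.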

4 Experiments

4.1 ProPublica & the COMPAS System

The first experiment reproduces the experiment in [8], which uses a random forest to estimate the recidivism decile scores of the COMPAS algorithm applied to the ProPublica dataset while adding a GFE constraint between African Americans and Non-African Americans. However, it can also be noted that Hispanic defendants are subject to similar discrimination. Figure 2 visualizes the effect of the GFE constraints on the predicted distributions of the three demographics.

4.2 Illinois State Employee Salaries

The Illinois state employee salaries (https://data.illinois.gov/datastore/dump/1a0cd05c-7d17-4e3d-938d-c2bfa2a4a0b1) since 2011 can be seen to exhibit a bias with respect to gender and between veterans and non-veterans. The motivation for this experiment is the setting in which one wishes to predict a fair salary for future employees based on current staff. Gender labels were inferred from the employees' first names using the gender-guesser Python library. GFE constraints were applied between all intersections of gender and veteran status, the marginals of gender, and the marginals of veteran status. Table 1 shows the expected outcome of each group before and after GFE constraints are applied, and Figure 3 visualizes the perturbations to the marginals of each demographic intersection due to the GFE constraints. The train-test split was set to 80%-20%, and incorporating the GFE constraints increases the root mean squared error from $12,086 to $12,772: the cost of fairness.
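For concreteness, the constraint set used here can be enumerated mechanically; in the sketch below the column names `gender` and `veteran` are hypothetical stand-ins for whatever fields the processed salary records actually carry.

```python
import itertools
import pandas as pd

# Hypothetical pre-processed records; the real dataset's fields may be named differently.
df = pd.DataFrame({
    "gender":  ["F", "M", "F", "M", "F", "M"],
    "veteran": [True, False, True, False, False, True],
})

# Masks for every gender/veteran intersection plus both sets of marginals.
masks = {(g, v): (df["gender"] == g) & (df["veteran"] == v)
         for g, v in itertools.product(["F", "M"], [True, False])}
masks.update({(g, "any"): df["gender"] == g for g in ["F", "M"]})
masks.update({("any", v): df["veteran"] == v for v in [True, False]})

# Each GFE constraint equalizes the expected prediction between a pair of masks,
# e.g. female veterans vs. male veterans, or the female vs. male marginals.
constraint_pairs = [
    (("F", True),  ("M", True)),
    (("F", False), ("M", False)),
    (("F", "any"), ("M", "any")),
    (("any", True), ("any", False)),
]
group_masks = [(masks[a].to_numpy(), masks[b].to_numpy()) for a, b in constraint_pairs]
```

Each mask pair then yields one column of the constraint block $C$, as in the Section 3 sketch.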

Figure 3: The figure visualizes the distribution of salaries before and after perturbations due to GFE constraints. It is clear that female veterans benefit the most from such a constraint, while male non-veterans lose out. Colors and lines have the same meaning as in Figure 2. The figures are cropped to the main mode of salaries to facilitate visual comparison.
Group     | Female Non-Vet. | Male Non-Vet. | Female Vet. | Male Vet. | Male   | Female | Vet.   | Non-Vet.
Original  | 47,334          | 52,777        | 41,890      | 51,063    | 46,962 | 52,215 | 49,555 | 49,805
Perturbed | 48,695          | 48,693        | 48,694      | 48,693    | 48,695 | 48,698 | 48,775 | 48,775
Table 1: The above table shows the expected outcome (in USD) of a random tree regressor with and without GFE constraints applied, for the four sub-demographics formed by the intersection of gender and veteran status, together with the gender and veteran-status marginals.

5 Conclusion

Regulatory bodies have shown precedent in developing affirmative action and other group fairness policies. This work extends previous efforts to develop group-fairness-constrained machine learning techniques. While relatively simple to understand and easy to incorporate into models used by practitioners, the methodology of this paper has a direct impact on four of the top ten data mining algorithms according to [16]. All source code used in the experiments is available at https://github.com/OxfordML/Fair_Regression.git.

References

  • [1] R. Berk, H. Heidari, S. Jabbari, M. Joseph, M. Kearns, J. Morgenstern, S. Neel, and A. Roth. A convex framework for fair regression. arXiv preprint arXiv:1706.02409, 2017.
  • [2] T. Calders, A. Karim, F. Kamiran, W. Ali, and X. Zhang. Controlling attribute effect in linear regression. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pages 71–80. IEEE, 2013.
  • [3] T. Calders and S. Verwer. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21(2):277–292, 2010.
  • [4] S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, and A. Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806. ACM, 2017.
  • [5] A. Deshpande. Affirmative action in India. In Race and Inequality, pages 77–90. Routledge, 2017.
  • [6] L. Dumont. Homo hierarchicus: The caste system and its implications. University of Chicago Press, 1980.
  • [7] C. Dwork and C. Ilvento. Group fairness under composition. FATML, 2018.
  • [8] J. Fitzsimons, A. Al Ali, M. Osborne, and S. Roberts. Equality constrained decision trees: For the algorithmic enforcement of group fairness. arXiv preprint arXiv:1810.05041, 2018.
  • [9] F. Kamiran and T. Calders. Classifying without discriminating. In Computer, Control and Communication, 2009. IC4 2009. 2nd International Conference on, pages 1–6. IEEE, 2009.
  • [10] T. Kamishima, S. Akaho, and J. Sakuma. Fairness-aware learning through regularization approach. In Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, pages 643–650. IEEE, 2011.
  • [11] B. T. Luong, S. Ruggieri, and F. Turini. k-nn as an implementation of situation testing for discrimination discovery and prevention. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 502–510. ACM, 2011.
  • [12] L. McCall. The complexity of intersectionality. In Intersectionality and Beyond, pages 65–92. Routledge-Cavendish, 2008.
  • [13] E. Raff, J. Sylvester, and S. Mills. Fair forests: Regularized tree induction to minimize model bias. arXiv preprint arXiv:1712.08197, 2017.
  • [14] United States. Executive Office of the President and J. Podesta. Big data: Seizing opportunities, preserving values. White House, Executive Office of the President, 2014.
  • [15] T. E. Weisskopf. Affirmative Action in the United States and India: A Comparative Perspective. Routledge, London and New York, 2004.
  • [16] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, S. Y. Philip, et al. Top 10 algorithms in data mining. Knowledge and information systems, 14(1):1–37, 2008.