Explainable Fairness

How do we know if an algorithm is fair?

We propose the Explainable Fairness framework: a process for defining and measuring whether an algorithm complies with an anti-discrimination law. The central question is: does the algorithm treat some groups of people differently from others? The framework has three steps: choose a protected class, choose an outcome of interest, and measure and explain differences.

An example from hiring algorithms

Step 1: Identify protected stakeholder groups. Fair hiring rules prohibit employers from discriminating on the basis of gender, race, national origin, and disability, among other protected classes. Each of these is a group for whom fairness needs to be verified.

Step 2: Identify outcomes of interest. In hiring, being offered a job is the obvious topline outcome. But other outcomes also count as employment decisions: whether a candidate is screened out at the resume stage, whether they are invited to interview, or even who applies in the first place, which might reflect bias in recruitment.

Step 3: Run a measure-and-explain loop. Measure the outcomes of interest for each category of the protected class. For example, are fewer women getting interviews? If so, is there a legitimate factor that explains the difference, such as men who apply being more likely to have a relevant credential or more years of experience? If so, account for those legitimate factors and remeasure the outcomes. If large differences remain unexplained, you have a problem.
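
To make the loop concrete, here is a minimal sketch in Python with pandas. It assumes a hypothetical applicant table with columns gender, years_experience, and interviewed; the column names, data, and experience bands are illustrative, not part of the framework. It remeasures the gap after stratifying on one legitimate factor; in practice, a regression that controls for several such factors would play the same role.

```python
import pandas as pd

# Hypothetical applicant records: gender, years of relevant experience, and
# whether the candidate was invited to interview (1 = yes, 0 = no).
applicants = pd.DataFrame({
    "gender":           ["F", "F", "F", "F", "F", "F", "M", "M", "M", "M", "M", "M"],
    "years_experience": [  1,   2,   3,   5,   7,   8,   2,   3,   4,   6,   8,   9],
    "interviewed":      [  0,   0,   0,   1,   1,   0,   0,   1,   1,   1,   0,   1],
})

# Measure: raw interview rates for each category of the protected class.
raw_rates = applicants.groupby("gender")["interviewed"].mean()
print(raw_rates)  # a gap here is the starting point, not the verdict

# Explain: account for a legitimate factor (experience) by stratifying on it,
# then remeasure the gap within each experience band.
applicants["exp_band"] = pd.cut(
    applicants["years_experience"], bins=[0, 3, 6, 100],
    labels=["0-3 yrs", "4-6 yrs", "7+ yrs"],
)
adjusted_rates = (
    applicants.groupby(["exp_band", "gender"], observed=True)["interviewed"].mean()
)
print(adjusted_rates)

# Loop: repeat with every legitimate factor you can justify. A large gap that
# survives all of them is an unexplained difference, and that is a problem.
```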

The process can be applied more generally and looks like this: