Demographic parity

As introduced in the Introduction, and according to [KLRS17], if we have a protected attribute \(A\), then:

A predictor \(\hat{Y}\) satisfies demographic parity if \(P(\hat{Y}|A=0)=P(\hat{Y}|A=1)\).

Additionally, for the different groups of the attribute \(A=\{a_1,a_2,\dots,a_n\}\), we can say that a binary classifier \(C\) satisfies demographic parity when:

\[ P_{a_1}\{C=1\}=P_{a_2}\{C=1\}=\dots=P_{a_n}\{C=1\} \]

An alternative formulation, approximate demographic parity, requires all pairs of group-level selection rates to differ by at most \(\epsilon\):

\[ |P_{a_i}\{C=1\}-P_{a_j}\{C=1\}|\leq\epsilon \quad \text{for all}\ i,j\in\{1,\dots,n\} \]
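
As a minimal sketch of this check (the toy DataFrame, column names, and helper function below are hypothetical, not part of any library), group-level selection rates can be compared pairwise against a tolerance \(\epsilon\):

import itertools

import pandas as pd

# Hypothetical data: group membership A and binary classifier outputs C.
toy = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a2", "a2"],
    "C": [1, 0, 1, 1, 0],
})


def approx_demographic_parity(df, group_col, pred_col, epsilon=0.1):
    """Check |P_{a_i}(C=1) - P_{a_j}(C=1)| <= epsilon for all group pairs."""
    # The selection rate P(C=1) per group is the mean of a 0/1 column.
    rates = df.groupby(group_col)[pred_col].mean()
    return all(
        abs(rates[i] - rates[j]) <= epsilon
        for i, j in itertools.combinations(rates.index, 2)
    )


approx_demographic_parity(toy, "A", "C")
False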

Additionally, we define a sensitive feature \(S\) as a feature of a dataset \(\mathcal{D}\), where \(\mathcal{D}\) has \(N\) data points and \(d\) dimensions.

An advantaged/privileged group is a level \(T_a\) of the feature \(S\), comprising \(N_a\) data points of \(\mathcal{D}\), that has a higher probability of a favourable outcome. It is also denoted \(S=1\) or \(S_a\).

Finally, a disadvantaged/unprivileged group is a level \(T_d\) of the feature \(S\), comprising \(N_d\) data points of \(\mathcal{D}\), that has a lower probability of a favourable outcome. It is also denoted \(S=0\) or \(S_d\).

The actual labels of the \(N\) data points in \(\mathcal{D}\) are denoted by \(y\).
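
To make these definitions concrete, here is a small sketch (with a hypothetical two-level sensitive feature \(S\) and labels \(y\), where \(y=1\) is the favourable outcome) that identifies the privileged and unprivileged levels:

import pandas as pd

# Hypothetical data: levels T_a and T_d of the sensitive feature S.
toy = pd.DataFrame({
    "S": ["T_a", "T_a", "T_a", "T_d", "T_d"],
    "y": [1, 1, 0, 0, 1],
})

# Probability of the favourable outcome per level of S.
favourable = toy.groupby("S")["y"].mean()

# The level with the higher probability is the privileged group (S = 1),
# the other one the unprivileged group (S = 0).
print(f"Privileged: {favourable.idxmax()}, unprivileged: {favourable.idxmin()}")
Privileged: T_a, unprivileged: T_d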

Data

Let’s look at an example 2 of demographic parity using a FAT Forensics (FATF) dataset.

We start by importing one of the bundled FATF example datasets.

import fatf.utils.data.datasets as fatf_datasets

hr_data_dict = fatf_datasets.load_health_records()
hr_data = hr_data_dict["data"]

This dataset contains the following input features:

hr_data.dtype.names
('name', 'email', 'age', 'weight', 'gender', 'zipcode', 'diagnosis', 'dob')

And the following potential outcomes:

hr_data_dict["target_names"]
array(['fail', 'success'], dtype='<U7')

These are encoded numerically as:

import numpy as np

hr_target = hr_data_dict["target"]
np.unique(hr_target)
array([0, 1])

Where:

  • 0 is a failed treatment

  • 1 is a successful one.

Let’s convert the data to a pandas DataFrame.

import pandas as pd

df = pd.DataFrame(hr_data_dict["data"])
df.head()
name email age weight gender zipcode diagnosis dob
0 Heidi Mitchell uboyd@hotmail.com 74 52 female 1121 cancer 03/06/2018
1 Tina Burns stevenwheeler@williams.bi 3 86 female 0323 hip 26/09/2017
2 Justin Brown velasquezjake@gmail.com 26 56 female 0100 heart 26/12/2015
3 Brent Parker kennethsingh@strong-foley 70 57 male 3131 heart 02/10/2011
4 Bryan Norton erica36@hotmail.com 48 57 male 0301 hip 09/09/2012

We now add the outcomes (targets):

df["target"] = hr_data_dict["target"]
df.head()
name email age weight gender zipcode diagnosis dob target
0 Heidi Mitchell uboyd@hotmail.com 74 52 female 1121 cancer 03/06/2018 0
1 Tina Burns stevenwheeler@williams.bi 3 86 female 0323 hip 26/09/2017 0
2 Justin Brown velasquezjake@gmail.com 26 56 female 0100 heart 26/12/2015 0
3 Brent Parker kennethsingh@strong-foley 70 57 male 3131 heart 02/10/2011 0
4 Bryan Norton erica36@hotmail.com 48 57 male 0301 hip 09/09/2012 0

We will now construct a confusion matrix, as defined in Confusion matrix, but for the protected attribute gender.

We start by grouping the outcomes by gender.

Grouping outcomes by gender

f_subset = df[df.gender == "female"]
m_subset = df[df.gender == "male"]

print(f"Female data indices: {list(f_subset.index)}")
print(f"Male data indices: {list(m_subset.index)}")
Female data indices: [0, 1, 2, 6, 9, 12, 13, 14, 16, 17, 18, 19]
Male data indices: [3, 4, 5, 7, 8, 10, 11, 15, 20]

Calculating female treatment outcomes:

f_subset_target_count = f_subset.target.value_counts().to_dict()
print(f_subset_target_count)
{1: 7, 0: 5}

And calculating male treatment outcomes:

m_subset_target_count = m_subset.target.value_counts().to_dict()
print(m_subset_target_count)
{0: 5, 1: 4}
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from plotutils import *

pdf = pd.DataFrame(
    [
        {"gender": "male", "outcome": "fail", "count": m_subset_target_count[0]},
        {"gender": "male", "outcome": "success", "count": m_subset_target_count[1]},
        {"gender": "female", "outcome": "fail", "count": f_subset_target_count[0]},
        {"gender": "female", "outcome": "success", "count": f_subset_target_count[1]},
    ]
)

g = sns.barplot(x="outcome", y="count", hue="gender", data=pdf, palette=[colours[0], colours[1]])
g.legend_.set_title(None)
plt.title("Outcome count per gender")
plt.show()
[Figure: bar plot of outcome counts per gender]
f_fail_ratio = f_subset_target_count[0] / len(f_subset)
f_success_ratio = f_subset_target_count[1] / len(f_subset)
print(f"Female ratios: success = {f_success_ratio}, fail = {f_fail_ratio}")
Female ratios: success = 0.5833333333333334, fail = 0.4166666666666667
m_fail_ratio = m_subset_target_count[0] / len(m_subset)
m_success_ratio = m_subset_target_count[1] / len(m_subset)
print(f"Male ratios: success = {m_success_ratio}, fail = {m_fail_ratio}")
Male ratios: success = 0.4444444444444444, fail = 0.5555555555555556
pdf = pd.DataFrame(
    [
        {"gender": "male", "outcome": "fail", "ratio": m_fail_ratio},
        {"gender": "male", "outcome": "success", "ratio": m_success_ratio},
        {"gender": "female", "outcome": "fail", "ratio": f_fail_ratio},
        {"gender": "female", "outcome": "success", "ratio": f_success_ratio},
    ]
)

g = sns.barplot(x="outcome", y="ratio", hue="gender", data=pdf, palette=[colours[0], colours[1]])
g.legend_.set_title(None)
plt.title("Gender success/fail ratio")
plt.show()
[Figure: bar plot of gender success/fail ratios]
# Absolute difference between the female and male success rates
print(abs(f_success_ratio - m_success_ratio))
0.13888888888888895

Using a K-NN model

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

np.random.seed(23)

gender_encoder = LabelEncoder()
diagnosis_encoder = LabelEncoder()

df.gender = gender_encoder.fit_transform(df.gender)
df.diagnosis = diagnosis_encoder.fit_transform(df.diagnosis)

model = KNeighborsClassifier(n_neighbors=3)
model = model.fit(df[["age", "gender", "weight", "diagnosis"]], df["target"])
predictions = model.predict(df[["age", "gender", "weight", "diagnosis"]])
df["predictions"] = predictions

df.head()
name email age weight gender zipcode diagnosis dob target predictions
0 Heidi Mitchell uboyd@hotmail.com 74 52 0 1121 0 03/06/2018 0 0
1 Tina Burns stevenwheeler@williams.bi 3 86 0 0323 2 26/09/2017 0 1
2 Justin Brown velasquezjake@gmail.com 26 56 0 0100 1 26/12/2015 0 1
3 Brent Parker kennethsingh@strong-foley 70 57 1 3131 1 02/10/2011 0 0
4 Bryan Norton erica36@hotmail.com 48 57 1 0301 2 09/09/2012 0 0
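
Since demographic parity only depends on the rate of positive predictions per group, we can already compare the model's selection rates directly (a quick sketch; recall that gender was label-encoded above, with female mapped to 0 and male to 1):

# Positive prediction (selection) rate of the k-NN model per gender group.
for label, name in [(0, "female"), (1, "male")]:
    rate = df.loc[df.gender == label, "predictions"].mean()
    print(f"Selection rate ({name}): {rate:.3f}")
Selection rate (female): 0.750
Selection rate (male): 0.556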

Confusion matrix

We will now calculate a confusion matrix for each group of the protected attribute (in this example, gender).

unique_labels = df.gender.unique()

grouped_cm = []


def calculate_cm(df, target_label, prediction_label, pos_value, neg_value):
    # Split the data into actual positives and actual negatives.
    positive_class = df[df[target_label] == pos_value]
    negative_class = df[df[target_label] == neg_value]
    # Correctly predicted actual positives are true positives; the
    # misclassified ones are false negatives.
    TP = (positive_class[target_label] == positive_class[prediction_label]).sum()
    FN = (positive_class[target_label] != positive_class[prediction_label]).sum()
    # Correctly predicted actual negatives are true negatives; the
    # misclassified ones are false positives.
    TN = (negative_class[target_label] == negative_class[prediction_label]).sum()
    FP = (negative_class[target_label] != negative_class[prediction_label]).sum()
    return np.array([[TP, FN], [FP, TN]])


for label in unique_labels:
    subgroup = df[df.gender == label]
    cm = calculate_cm(subgroup, "target", "predictions", 1, 0)
    grouped_cm.append(cm)

grouped_cm
[array([[7, 0],
        [2, 3]]),
 array([[3, 1],
        [2, 3]])]
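
As a sanity check, we can cross-check these per-group matrices with scikit-learn (a small sketch; note that sklearn.metrics.confusion_matrix uses a rows-are-actual, columns-are-predicted layout with the negative class first, so the entries are arranged differently):

from sklearn.metrics import confusion_matrix

# scikit-learn's layout for labels [0, 1] is [[TN, FP], [FN, TP]].
for label in unique_labels:
    subgroup = df[df.gender == label]
    print(confusion_matrix(subgroup["target"], subgroup["predictions"]))
[[3 2]
 [0 7]]
[[3 2]
 [1 3]]
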
def accuracy(cm):
    # Unpack the [[TP, FN], [FP, TN]] layout returned by calculate_cm.
    TP = cm[0][0]
    FN = cm[0][1]
    FP = cm[1][0]
    TN = cm[1][1]
    return (TP + TN) / (TP + TN + FP + FN)


print(f"Female accuracy: {accuracy(grouped_cm[0])}")
Female accuracy: 0.8333333333333334
print(f"Male accuracy: {accuracy(grouped_cm[1])}")
Male accuracy: 0.6666666666666666

Demographic parity difference

Demographic Parity Difference (DPD) can be defined as the difference between the largest and the smallest group-level selection rate, that is

\[ \text{DPD} = \max_{a} \mathbb{E}[h(\mathcal{D})\,|\,A=a] - \min_{a} \mathbb{E}[h(\mathcal{D})\,|\,A=a], \]

across all values \(a\) of the sensitive feature(s) \(A\). If all groups have the same selection rate, then the DPD will be \(0\).
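
A minimal sketch of this computation (the helper name and toy inputs are hypothetical):

import numpy as np


def demographic_parity_difference(groups, predictions):
    """Largest minus smallest group-level selection rate."""
    groups = np.asarray(groups)
    predictions = np.asarray(predictions)
    rates = [predictions[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)


# For two groups this reduces to the success-ratio gap computed earlier:
# |7/12 - 4/9| = 5/36 ≈ 0.139.
print(round(demographic_parity_difference(["f"] * 12 + ["m"] * 9,
                                          [1] * 7 + [0] * 5 + [1] * 4 + [0] * 5), 3))
0.139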

import fatf.fairness.models.measures as fatf_fairness_models

gender_equal_accuracy = fatf_fairness_models.equal_accuracy(grouped_cm)
fatf_fairness_models.disparate_impact_check(gender_equal_accuracy)
False
gender_equal_opportunity = fatf_fairness_models.equal_opportunity(grouped_cm)
gender_equal_opportunity
array([[False, False],
       [False, False]])

Disparate Impact

fatf_fairness_models.disparate_impact_check(gender_equal_opportunity)
False
gender_demographic_parity = fatf_fairness_models.demographic_parity(grouped_cm)
gender_demographic_parity
array([[False,  True],
       [ True, False]])
fatf_fairness_models.disparate_impact_check(gender_demographic_parity)
True
abs(f_fail_ratio - m_fail_ratio)
0.1388888888888889

Problems

A criticism of demographic parity is that it only measures whether the algorithm's outputs are fair, not whether the model itself is fair; that is, the training data is not taken into account. Let’s look at the following example.

An abstract “offer” is made to the population, which we will denote as offer, with a binary value: 1 if offered, 0 if not. We will make this setup discriminatory through group representation, using ethnicity as a protected attribute: the offer rate will be 50% for ethnicity “0” and 50% for ethnicity “1”, but the representation of each group will be severely unbalanced.

# 400 data points for ethnicity 1 and only 2 for ethnicity 0;
# both groups receive the offer exactly 50% of the time.
df = pd.DataFrame(
    data={"ethnicity": [1] * 400 + [0] * 2, "offer": [1] * 200 + [0] * 200 + [1] + [0]}
)
df["predictions"] = df["offer"]
df.head()
ethnicity offer predictions
0 1 1 1
1 1 1 1
2 1 1 1
3 1 1 1
4 1 1 1
import pandas as pd
import seaborn as sns

from plotutils import *

pdf = pd.DataFrame(
    [
        {"ethnicity": "1", "offer": 1, "count": 200},
        {"ethnicity": "1", "offer": 0, "count": 200},
        {"ethnicity": "0", "offer": 1, "count": 1},
        {"ethnicity": "0", "offer": 0, "count": 1},
    ]
)

g = sns.barplot(x="offer", y="count", hue="ethnicity", data=pdf, palette=[colours[0], colours[1]])
g.legend_.set_title(None)
plt.title("Outcome count per ethnicity")
plt.show()
[Figure: bar plot of offer counts per ethnicity]

Let’s now calculate the demographic parity as before.

grouped_cm = []

# Note: the ethnicity column holds integers, so we group on 0 and 1.
for ethnicity in [0, 1]:
    subgroup = df[df["ethnicity"] == ethnicity]
    cm = calculate_cm(subgroup, "offer", "predictions", 1, 0)
    grouped_cm.append(cm)

grouped_cm
[array([[1, 0],
        [0, 1]]),
 array([[200,   0],
        [  0, 200]])]
ethnicity_demographic_parity = fatf_fairness_models.demographic_parity(grouped_cm)
ethnicity_demographic_parity
array([[False, False],
       [False, False]])
fatf_fairness_models.disparate_impact_check(ethnicity_demographic_parity)
False
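
Demographic parity is satisfied here (no disparate impact is flagged) despite the severe group imbalance. A quick sketch of the per-group selection rates makes this explicit:

# Both groups receive the offer at the same rate, so parity holds even
# though ethnicity 1 has 200 times more data points than ethnicity 0.
for ethnicity in [0, 1]:
    subgroup = df[df["ethnicity"] == ethnicity]
    rate = subgroup["offer"].mean()
    print(f"Ethnicity {ethnicity}: n={len(subgroup)}, selection rate={rate}")
Ethnicity 0: n=2, selection rate=0.5
Ethnicity 1: n=400, selection rate=0.5

The parity check looks only at the rates, not at how little data supports the rate for the smaller group, which is precisely the criticism raised above.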

References

KLRS17

Matt Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. Advances in Neural Information Processing Systems, 4067–4077, 2017. arXiv:1703.06856.


2

This example is based on https://fat-forensics.org/tutorials/grouping-fairness.html