# Demographic parity

As introduced in the Introduction, and following [KLRS17], given a protected attribute $$A$$:

A predictor $$\hat{Y}$$ satisfies demographic parity if $$P(\hat{Y}|A=0)=P(\hat{Y}|A=1)$$ .

Additionally, for different groups in the attribute $$A=\{a_1,a_2,\dots,a_n\}$$ , we can say that a binary classifier $$C$$ would satisfy:

$P_{a_1}\{C=1\}=P_{a_2}\{C=1\}=\dots=P_{a_n}\{C=1\}$

An alternative formulation, for approximate demographic parity, requires every pairwise difference in selection rates to be bounded:

$|P_{a_i}\{C=1\}-P_{a_j}\{C=1\}|\leq\epsilon\quad\text{for all }i,j\in\{1,\dots,n\}$
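These two conditions can be sketched as a small helper. A minimal sketch, assuming the group selection rates are already known; `demographic_parity_holds` and the example rates are illustrative names, not part of any library:

```python
from itertools import combinations

def demographic_parity_holds(selection_rates, epsilon=0.0):
    """Check (approximate) demographic parity: every pairwise gap between
    group selection rates P_a{C=1} must be at most epsilon."""
    return all(
        abs(p - q) <= epsilon
        for p, q in combinations(selection_rates.values(), 2)
    )

rates = {"a1": 0.58, "a2": 0.44}      # illustrative selection rates
print(demographic_parity_holds(rates))               # exact parity: False
print(demographic_parity_holds(rates, epsilon=0.2))  # approximate parity: True
```

With `epsilon=0` this recovers the exact formulation; with `epsilon>0` it is the approximate one.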

Additionally, we define a sensitive attribute as a feature $$S$$ belonging to a dataset $$\mathcal{D}$$ with $$N$$ data points and $$d$$ dimensions.

An advantageous/privileged group is a level $$T_a$$ of the feature $$S$$, with $$N_a$$ data points in $$\mathcal{D}$$, which has a higher probability of a favourable outcome. It is also denoted $$S=1$$ or $$S_a$$.

Finally, a disadvantageous/unprivileged group is a level $$T_d$$ of the feature $$S$$, with $$N_d$$ data points in $$\mathcal{D}$$, which has a lower probability of a favourable outcome. It is also denoted $$S=0$$ or $$S_d$$.

The actual labels for the $$N$$ data points of $$\mathcal{D}$$ are denoted $$y$$.

### Data

Let’s look at an example of demographic parity using a FAT Forensics (FATF) dataset.

We start by importing one of the bundled FATF example datasets.

import fatf.utils.data.datasets as fatf_datasets

# Load the bundled synthetic health records dataset
hr_data_dict = fatf_datasets.load_health_records()
hr_data = hr_data_dict["data"]


This dataset contains the following input features:

hr_data.dtype.names

('name', 'email', 'age', 'weight', 'gender', 'zipcode', 'diagnosis', 'dob')


And the following potential outcomes:

hr_data_dict["target_names"]

array(['fail', 'success'], dtype='<U7')


These are encoded numerically as:

import numpy as np

hr_target = hr_data_dict["target"]
np.unique(hr_target)

array([0, 1])


Where:

•  0  is a failed treatment

•  1  is a successful one.

Let’s convert the data to a Pandas `DataFrame`.

import pandas as pd

df = pd.DataFrame(hr_data_dict["data"])
df.head()

|   | name | email | age | weight | gender | zipcode | diagnosis | dob |
|---|------|-------|-----|--------|--------|---------|-----------|-----|
| 0 | Heidi Mitchell | uboyd@hotmail.com | 74 | 52 | female | 1121 | cancer | 03/06/2018 |
| 1 | Tina Burns | stevenwheeler@williams.bi | 3 | 86 | female | 0323 | hip | 26/09/2017 |
| 2 | Justin Brown | velasquezjake@gmail.com | 26 | 56 | female | 0100 | heart | 26/12/2015 |
| 3 | Brent Parker | kennethsingh@strong-foley | 70 | 57 | male | 3131 | heart | 02/10/2011 |
| 4 | Bryan Norton | erica36@hotmail.com | 48 | 57 | male | 0301 | hip | 09/09/2012 |

We now add the outcomes (targets):

df["target"] = hr_data_dict["target"]
df.head()

|   | name | email | age | weight | gender | zipcode | diagnosis | dob | target |
|---|------|-------|-----|--------|--------|---------|-----------|-----|--------|
| 0 | Heidi Mitchell | uboyd@hotmail.com | 74 | 52 | female | 1121 | cancer | 03/06/2018 | 0 |
| 1 | Tina Burns | stevenwheeler@williams.bi | 3 | 86 | female | 0323 | hip | 26/09/2017 | 0 |
| 2 | Justin Brown | velasquezjake@gmail.com | 26 | 56 | female | 0100 | heart | 26/12/2015 | 0 |
| 3 | Brent Parker | kennethsingh@strong-foley | 70 | 57 | male | 3131 | heart | 02/10/2011 | 0 |
| 4 | Bryan Norton | erica36@hotmail.com | 48 | 57 | male | 0301 | hip | 09/09/2012 | 0 |

We will now construct a confusion matrix, as defined in Confusion matrix, but for the protected attribute `gender`.

We start by grouping the outcome per gender.

### Grouping outcomes by gender

f_subset = df[df.gender == "female"]
m_subset = df[df.gender == "male"]

print(f"Female data indices: {list(f_subset.index)}")
print(f"Male data indices: {list(m_subset.index)}")

Female data indices: [0, 1, 2, 6, 9, 12, 13, 14, 16, 17, 18, 19]
Male data indices: [3, 4, 5, 7, 8, 10, 11, 15, 20]


Calculating female treatment outcomes:

f_subset_target_count = f_subset.target.value_counts().to_dict()
print(f_subset_target_count)

{1: 7, 0: 5}


And calculating male treatment outcomes:

m_subset_target_count = m_subset.target.value_counts().to_dict()
print(m_subset_target_count)

{0: 5, 1: 4}

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from plotutils import *

pdf = pd.DataFrame(
    [
        {"gender": "male", "outcome": "fail", "count": m_subset_target_count[0]},
        {"gender": "male", "outcome": "success", "count": m_subset_target_count[1]},
        {"gender": "female", "outcome": "fail", "count": f_subset_target_count[0]},
        {"gender": "female", "outcome": "success", "count": f_subset_target_count[1]},
    ]
)

g = sns.barplot(x="outcome", y="count", hue="gender", data=pdf, palette=[colours[0], colours[1]])
g.legend_.set_title(None)
plt.title("Outcome count per gender")
plt.show()

f_fail_ratio = f_subset_target_count[0] / len(f_subset)
f_success_ratio = f_subset_target_count[1] / len(f_subset)
print(f"Female ratios: success = {f_success_ratio}, fail = {f_fail_ratio}")

Female ratios: success = 0.5833333333333334, fail = 0.4166666666666667

m_fail_ratio = m_subset_target_count[0] / len(m_subset)
m_success_ratio = m_subset_target_count[1] / len(m_subset)
print(f"Male ratios: success = {m_success_ratio}, fail = {m_fail_ratio}")

Male ratios: success = 0.4444444444444444, fail = 0.5555555555555556

pdf = pd.DataFrame(
    [
        {"gender": "male", "outcome": "fail", "ratio": m_fail_ratio},
        {"gender": "male", "outcome": "success", "ratio": m_success_ratio},
        {"gender": "female", "outcome": "fail", "ratio": f_fail_ratio},
        {"gender": "female", "outcome": "success", "ratio": f_success_ratio},
    ]
)

g = sns.barplot(x="outcome", y="ratio", hue="gender", data=pdf, palette=[colours[0], colours[1]])
g.legend_.set_title(None)
plt.title("Gender success/fail ratio")
plt.show()

print(abs(f_success_ratio - m_success_ratio))

0.13888888888888895
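The per-group ratios and the gap above can also be obtained in one step with a pandas `groupby`. A minimal sketch on a synthetic frame whose counts mirror the subgroups above (12 female rows with 7 successes, 9 male rows with 4):

```python
import pandas as pd

# Synthetic stand-in mirroring the gender/target counts used above
df = pd.DataFrame({
    "gender": ["female"] * 12 + ["male"] * 9,
    "target": [1] * 7 + [0] * 5 + [1] * 4 + [0] * 5,
})

# The mean of a 0/1 target per group is exactly the group success rate
rates = df.groupby("gender")["target"].mean()
print(rates)
print(f"gap = {abs(rates['female'] - rates['male']):.4f}")  # gap = 0.1389
```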


### Using a K-NN model

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

np.random.seed(23)

gender_encoder = LabelEncoder()
diagnosis_encoder = LabelEncoder()

df.gender = gender_encoder.fit_transform(df.gender)
df.diagnosis = diagnosis_encoder.fit_transform(df.diagnosis)

model = KNeighborsClassifier(n_neighbors=3)
model = model.fit(df[["age", "gender", "weight", "diagnosis"]], df["target"])

predictions = model.predict(df[["age", "gender", "weight", "diagnosis"]])
df["predictions"] = predictions
df.head()

|   | name | email | age | weight | gender | zipcode | diagnosis | dob | target | predictions |
|---|------|-------|-----|--------|--------|---------|-----------|-----|--------|-------------|
| 0 | Heidi Mitchell | uboyd@hotmail.com | 74 | 52 | 0 | 1121 | 0 | 03/06/2018 | 0 | 0 |
| 1 | Tina Burns | stevenwheeler@williams.bi | 3 | 86 | 0 | 0323 | 2 | 26/09/2017 | 0 | 1 |
| 2 | Justin Brown | velasquezjake@gmail.com | 26 | 56 | 0 | 0100 | 1 | 26/12/2015 | 0 | 1 |
| 3 | Brent Parker | kennethsingh@strong-foley | 70 | 57 | 1 | 3131 | 1 | 02/10/2011 | 0 | 0 |
| 4 | Bryan Norton | erica36@hotmail.com | 48 | 57 | 1 | 0301 | 2 | 09/09/2012 | 0 | 0 |

## Confusion matrix

We will now calculate a confusion matrix for each level of the protected attribute (in this example, `gender`).

unique_labels = df.gender.unique()

grouped_cm = []

def calculate_cm(df, target_label, prediction_label, pos_value, neg_value):
    positive_class = df[df[target_label] == pos_value]
    negative_class = df[df[target_label] == neg_value]
    # Rows are actual classes, columns are predictions: [[TP, FN], [FP, TN]].
    # A mispredicted positive is a false negative; a mispredicted negative
    # is a false positive.
    TP = positive_class[
        positive_class[target_label] == positive_class[prediction_label]
    ].count()[0]
    TN = negative_class[
        negative_class[target_label] == negative_class[prediction_label]
    ].count()[0]
    FN = positive_class[
        positive_class[target_label] != positive_class[prediction_label]
    ].count()[0]
    FP = negative_class[
        negative_class[target_label] != negative_class[prediction_label]
    ].count()[0]
    return np.array([[TP, FN], [FP, TN]])

for label in unique_labels:
    subgroup = df[df.gender == label]
    cm = calculate_cm(subgroup, "target", "predictions", 1, 0)
    grouped_cm.append(cm)

grouped_cm

[array([[7, 0],
        [2, 3]]),
 array([[3, 1],
        [2, 3]])]

def accuracy(cm):
    TP = cm[0][0]
    FN = cm[0][1]
    FP = cm[1][0]
    TN = cm[1][1]
    return (TP + TN) / (TP + TN + FP + FN)

print(f"Female accuracy: {accuracy(grouped_cm[0])}")

Female accuracy: 0.8333333333333334

print(f"Male accuracy: {accuracy(grouped_cm[1])}")

Male accuracy: 0.6666666666666666
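As a cross-check, per-group counts like these can be reproduced with scikit-learn's `confusion_matrix`, whose layout is `[[TN, FP], [FN, TP]]` (rows are actual labels, columns predictions, ordered 0 then 1). A sketch with synthetic labels mirroring the female subgroup (7 positives, all predicted correctly; 5 negatives, 2 of them mispredicted):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1] * 7 + [0] * 5)             # 7 positives, 5 negatives
y_pred = np.array([1] * 7 + [0] * 3 + [1] * 2)   # 2 negatives mispredicted

cm = confusion_matrix(y_true, y_pred)            # [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"accuracy = {(tp + tn) / cm.sum():.4f}")  # accuracy = 0.8333
```

The accuracy matches the female figure above; only the matrix layout differs from the one built by hand.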


## Demographic parity difference

Demographic Parity Difference (DPD) can be defined as the difference between the largest and the smallest group-level selection rate, that is

$\mathrm{DPD}=\max_{a}\mathbb{E}\left[h(X)\mid A=a\right]-\min_{a}\mathbb{E}\left[h(X)\mid A=a\right],$

where $$h$$ is the trained predictor and the maximum and minimum are taken across all values $$a$$ of the sensitive feature(s) $$A$$. If all groups have the same selection rate, then the DPD will be $$0$$.
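A minimal sketch of this definition (`demographic_parity_difference` is an illustrative name, not a FATF API):

```python
import numpy as np

def demographic_parity_difference(y_pred, sensitive):
    """Gap between the largest and smallest group selection rate,
    taken over all values a of the sensitive feature."""
    y_pred = np.asarray(y_pred)
    sensitive = np.asarray(sensitive)
    rates = [y_pred[sensitive == a].mean() for a in np.unique(sensitive)]
    return max(rates) - min(rates)

# Two groups with selection rates 0.75 and 0.50
y_pred = [1, 1, 1, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(y_pred, groups))  # 0.25
```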

import fatf.fairness.models.measures as fatf_fairness_models

gender_equal_accuracy = fatf_fairness_models.equal_accuracy(grouped_cm)
fatf_fairness_models.disparate_impact_check(gender_equal_accuracy)


False

gender_equal_opportunity = fatf_fairness_models.equal_opportunity(grouped_cm)
gender_equal_opportunity

array([[False, False],
       [False, False]])


## Disparate Impact

fatf_fairness_models.disparate_impact_check(gender_equal_opportunity)

False

gender_demographic_parity = fatf_fairness_models.demographic_parity(grouped_cm)
gender_demographic_parity

array([[False,  True],
       [ True, False]])

fatf_fairness_models.disparate_impact_check(gender_demographic_parity)

True

abs(f_fail_ratio - m_fail_ratio)

0.1388888888888889


## Problems

A criticism of demographic parity is that it only measures whether the outcomes are fair, not whether the model is fair; that is, the training data is not taken into account. Let’s look at the following example.

An abstract “offer” is made to the population, denoted by a binary variable `offer`: `1` if offered, `0` if not. We use `ethnicity` as the protected attribute and make the offer to 50% of ethnicity `0` and 50% of ethnicity `1`. However, the representation of the two groups will be severely unbalanced, so parity can be satisfied even though one group is barely present in the data.

df = pd.DataFrame(
    data={"ethnicity": [1] * 400 + [0] * 2, "offer": [1] * 200 + [0] * 200 + [1] + [0]}
)
df["predictions"] = df["offer"]
df.head()

|   | ethnicity | offer | predictions |
|---|-----------|-------|-------------|
| 0 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 1 |
| 3 | 1 | 1 | 1 |
| 4 | 1 | 1 | 1 |

pdf = pd.DataFrame(
    [
        {"ethnicity": "1", "offer": 1, "count": 200},
        {"ethnicity": "0", "offer": 1, "count": 1},
        {"ethnicity": "1", "offer": 0, "count": 200},
        {"ethnicity": "0", "offer": 0, "count": 1},
    ]
)

g = sns.barplot(x="offer", y="count", hue="ethnicity", data=pdf, palette=[colours[0], colours[1]])
g.legend_.set_title(None)
plt.title("Outcome count per ethnicity")
plt.show()


Let’s now calculate the demographic parity as before.

grouped_cm = []

# "ethnicity" is stored as an integer, so we group on the integer levels
for ethnicity in [0, 1]:
    subgroup = df[df["ethnicity"] == ethnicity]
    cm = calculate_cm(subgroup, "offer", "predictions", 1, 0)
    grouped_cm.append(cm)

grouped_cm

[array([[1, 0],
        [0, 1]]),
 array([[200,   0],
        [  0, 200]])]

ethnicity_demographic_parity = fatf_fairness_models.demographic_parity(grouped_cm)
ethnicity_demographic_parity

array([[False, False],
       [False, False]])

fatf_fairness_models.disparate_impact_check(ethnicity_demographic_parity)

False

Demographic parity is satisfied: both groups receive the offer at the same 50% rate, even though one group contributes only 2 of the 402 data points. Parity alone says nothing about how well each group is represented in the data.
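The selection rates behind this result can be verified directly. A sketch rebuilding the same imbalanced frame:

```python
import pandas as pd

# Rebuild the imbalanced example: 400 people of ethnicity 1, 2 of ethnicity 0,
# each group receiving the offer at exactly 50%
df = pd.DataFrame({
    "ethnicity": [1] * 400 + [0] * 2,
    "offer": [1] * 200 + [0] * 200 + [1, 0],
})

# Equal selection rates, so demographic parity holds ...
print(df.groupby("ethnicity")["offer"].mean())  # 0.5 for both groups

# ... even though one group contributes only 2 of the 402 data points
print(df["ethnicity"].value_counts())
```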


## References

[KLRS17] Matt Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems 30, 4067–4077, 2017. arXiv:1703.06856.

This example is based on https://fat-forensics.org/tutorials/grouping-fairness.html