Counterfactual Fairness

Building counterfactually fair models

Data

To evaluate counterfactual fairness we will be using the “law school” dataset (McIntyre and Simkovic 2018).

The Law School Admission Council conducted a survey across 163 law schools in the United States. It contains information on 21,790 law students, such as their entrance exam scores (LSAT), their grade-point average (GPA) collected prior to law school, and their first-year average grade (FYA). Given this data, a school may wish to predict whether an applicant will have a high FYA. The school would also like to make sure these predictions are not biased by an individual’s race and sex. However, the LSAT, GPA, and FYA scores may themselves be biased due to social factors.

We start by importing the data into a Pandas DataFrame.

import pandas as pd

df = pd.read_csv("_data/law_data.csv", index_col=0)
df.head()

	race	sex	LSAT	UGPA	region_first	ZFYA	sander_index	first_pf
0	White	1	39.0	3.1	GL	-0.98	0.782738	1.0
1	White	1	36.0	3.0	GL	0.09	0.735714	1.0
2	White	2	30.0	3.1	MS	-0.35	0.670238	1.0
5	Hispanic	2	39.0	2.2	NE	0.58	0.697024	1.0
6	White	1	37.0	3.4	GL	-1.26	0.786310	1.0

Pre-processing

We now pre-process the data. We start by creating categorical “dummy” variables according to the race variable.

df = pd.get_dummies(df, columns=["race"], prefix="", prefix_sep="")
df.iloc[:, :7].head()

	sex	LSAT	UGPA	region_first	ZFYA	sander_index	first_pf
0	1	39.0	3.1	GL	-0.98	0.782738	1.0
1	1	36.0	3.0	GL	0.09	0.735714	1.0
2	2	30.0	3.1	MS	-0.35	0.670238	1.0
5	2	39.0	2.2	NE	0.58	0.697024	1.0
6	1	37.0	3.4	GL	-1.26	0.786310	1.0

We also want to expand the sex variable into male / female categorical variables and remove the original.

df["male"] = df["sex"].map(lambda x: 1 if x == 2 else 0)
df["female"] = df["sex"].map(lambda x: 1 if x == 1 else 0)
df = df.drop(columns=["sex"])
df.iloc[:, :7].head()

	LSAT	UGPA	region_first	ZFYA	sander_index	first_pf	Amerindian
0	39.0	3.1	GL	-0.98	0.782738	1.0	False
1	36.0	3.0	GL	0.09	0.735714	1.0	False
2	30.0	3.1	MS	-0.35	0.670238	1.0	False
5	39.0	2.2	NE	0.58	0.697024	1.0	False
6	37.0	3.4	GL	-1.26	0.786310	1.0	False

We will also convert the entrance exam scores (LSAT) to a discrete variable.

df["LSAT"] = df["LSAT"].astype(int)
df.iloc[:, :6].head()

	LSAT	UGPA	region_first	ZFYA	sander_index	first_pf
0	39	3.1	GL	-0.98	0.782738	1.0
1	36	3.0	GL	0.09	0.735714	1.0
2	30	3.1	MS	-0.35	0.670238	1.0
5	39	2.2	NE	0.58	0.697024	1.0
6	37	3.4	GL	-1.26	0.786310	1.0

Protected attributes

Counterfactual fairness requires that the distribution over possible predictions for an individual remains unchanged in a world where that individual’s protected attributes A had been different in a causal sense. Let’s start by defining the protected attributes. Obvious candidates are the categorical variables for ethnicity (Asian, White, Black, etc.) and gender (male, female).

A = [
    "Amerindian",
    "Asian",
    "Black",
    "Hispanic",
    "Mexican",
    "Other",
    "Puertorican",
    "White",
    "male",
    "female",
]
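For reference, the verbal definition above can be written formally. The following is a sketch of the condition from (Kusner et al. 2017) in the notation of this post: a predictor \hat{Y} is counterfactually fair if, for every context X = x and A = a, every outcome y, and every attainable value a' of A,

```latex
P\left(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a\right)
  = P\left(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a\right)
```

Here U denotes the latent background variables of the causal model, and \hat{Y}_{A \leftarrow a'}(U) is the prediction in the counterfactual world where A is set to a'.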

Training and testing subsets

We will now divide the dataset into training and testing subsets. We will use the same ratio as in (Kusner et al. 2017), that is, 20% of the data is held out for testing.

from sklearn.model_selection import train_test_split

df_train, df_test = train_test_split(df, random_state=23, test_size=0.2)
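As a quick sanity check on the split sizes, the arithmetic can be reproduced by hand. The sketch below assumes that scikit-learn rounds the test set size up (the ceiling of test_size times the number of rows) and assigns the remaining rows to training; the row count of 21791 is taken from the outputs shown later in this post.

```python
import math

n_rows = 21791   # number of rows, as reported by the outputs later in the post
test_size = 0.2

# Assumed rounding rule: the test set size is rounded up, training gets the rest
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test

print(n_train, n_test)
```

Under that assumption the sizes agree with the shapes of the training and test arrays printed further down.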

Models

Unfair model

As detailed in (Kusner et al. 2017), the concept of counterfactual fairness holds under three levels of assumptions of increasing strength.

The first of these levels is Level 1, where \hat{Y} is built using only the observable non-descendants of A. This only requires partial causal ordering and no further causal assumptions, but in many problems there will be few, if any, observables which are not descendants of protected demographic factors.

For this dataset, since LSAT, GPA, and FYA are all biased by ethnicity and gender, we cannot use any of the observed features to construct a Level 1 counterfactually fair predictor.

Instead (and in order to compare the performance with Level 2 and 3 models) we will build two unfair baselines:

A Full model, which will be trained with the totality of the variables.
An Unaware model (FTU), which will be trained with all the variables, except the protected attributes A.

Let’s proceed with calculating the Full model.

Full model

The Full model will be a simple linear regression that predicts ZFYA using all of the variables.

from sklearn.linear_model import LinearRegression

linreg_unfair = LinearRegression()

The inputs will then be the totality of the variables (protected variables A, as well as UGPA and LSAT).

import numpy as np

X = np.hstack(
    (
        df_train[A],
        np.array(df_train["UGPA"]).reshape(-1, 1),
        np.array(df_train["LSAT"]).reshape(-1, 1),
    )
)
X

array([[False, False, False, ..., 1, 3.1, 39],
       [False, False, False, ..., 1, 3.5, 36],
       [False, False, False, ..., 1, 3.9, 46],
       ...,
       [False, False, False, ..., 1, 2.9, 33],
       [False, False, False, ..., 0, 2.9, 31],
       [False, False, False, ..., 0, 3.6, 39]],
      shape=(17432, 12), dtype=object)
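The dtype=object in the output above most likely arises because the stacked columns mix booleans (the race dummies), integers (male, female, LSAT) and floats (UGPA). A minimal sketch with made-up values, showing that an explicit cast yields a plain float array (the names mixed and numeric are illustrative, not from the original notebook):

```python
import numpy as np

# Made-up rows mixing bool, int and float values, stored as Python objects
mixed = np.array(
    [[False, 1, 3.1, 39],
     [True, 0, 2.9, 31]],
    dtype=object,
)

# An explicit cast turns booleans into 0.0 / 1.0 and gives a numeric array
numeric = mixed.astype(np.float64)

print(mixed.dtype, numeric.dtype)
```

The MCMC function later in this post performs the same kind of cast before passing the protected attributes to the model.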

As for our target, we are trying to predict ZFYA (first year average grade).

y = df_train["ZFYA"]
y[:10]

10454    0.56
14108    0.60
20624   -0.14
8316     0.20
14250    0.02
18909   -1.47
8949     1.36
1658     0.39
23340    0.10
26884    0.48
Name: ZFYA, dtype: float64

We fit the model:

linreg_unfair = linreg_unfair.fit(X, y)

And perform some predictions on the test subset.

X_test = np.hstack(
    (
        df_test[A],
        np.array(df_test["UGPA"]).reshape(-1, 1),
        np.array(df_test["LSAT"]).reshape(-1, 1),
    )
)
X_test

array([[False, False, False, ..., 0, 3.4, 32],
       [False, False, False, ..., 1, 3.5, 41],
       [False, False, False, ..., 1, 3.9, 42],
       ...,
       [False, False, False, ..., 0, 2.3, 28],
       [False, False, False, ..., 0, 3.3, 36],
       [False, False, False, ..., 0, 2.9, 37]],
      shape=(4359, 12), dtype=object)

predictions_unfair = linreg_unfair.predict(X_test)
predictions_unfair

array([ 0.08733885,  0.34862023,  0.46017962, ..., -0.25892473,
        0.19366485,  0.14526587], shape=(4359,))

We will also calculate the unfair model’s score and root-mean-square error (RMSE) for future use.

score_unfair = linreg_unfair.score(X_test, df_test["ZFYA"])
score_unfair

0.1270014727399027

from sklearn.metrics import mean_squared_error

RMSE_unfair = np.sqrt(mean_squared_error(df_test["ZFYA"], predictions_unfair))
float(RMSE_unfair)

0.8666783694285809
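For a regressor, score returns the coefficient of determination R², while the RMSE is the square root of the mean squared error. A minimal sketch on made-up numbers (the arrays y_true and y_pred are hypothetical) shows how the two quantities are computed:

```python
import numpy as np

y_true = np.array([0.5, -1.0, 0.2, 1.3])  # hypothetical targets
y_pred = np.array([0.3, -0.4, 0.1, 0.9])  # hypothetical predictions

residual_ss = np.sum((y_true - y_pred) ** 2)       # unexplained variation
total_ss = np.sum((y_true - y_true.mean()) ** 2)   # total variation

r2 = 1.0 - residual_ss / total_ss                  # what .score() reports
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

print(r2, rmse)
```

An R² close to 0, as obtained for the Full model here, means the model explains only a small part of the variance in ZFYA.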

Fairness through unawareness (FTU)

As also mentioned in (Kusner et al. 2017), the second baseline we will use is an Unaware model (FTU), which will be trained with all the variables, except the protected attributes A.

linreg_ftu = LinearRegression()

We will create the inputs as previously, but without using the protected attributes, A.

X_ftu = np.hstack(
    (
        np.array(df_train["UGPA"]).reshape(-1, 1),
        np.array(df_train["LSAT"]).reshape(-1, 1),
    )
)
X_ftu

array([[ 3.1, 39. ],
       [ 3.5, 36. ],
       [ 3.9, 46. ],
       ...,
       [ 2.9, 33. ],
       [ 2.9, 31. ],
       [ 3.6, 39. ]], shape=(17432, 2))

And we fit the model:

linreg_ftu = linreg_ftu.fit(X_ftu, y)

Again, let’s perform some predictions on the test subset.

X_ftu_test = np.hstack(
    (np.array(df_test["UGPA"]).reshape(-1, 1), np.array(df_test["LSAT"]).reshape(-1, 1))
)
X_ftu_test

array([[ 3.4, 32. ],
       [ 3.5, 41. ],
       [ 3.9, 42. ],
       ...,
       [ 2.3, 28. ],
       [ 3.3, 36. ],
       [ 2.9, 37. ]], shape=(4359, 2))

predictions_ftu = linreg_ftu.predict(X_ftu_test)
predictions_ftu

array([-0.06909331,  0.35516229,  0.50304555, ..., -0.53109868,
        0.08204563,  0.0226846 ], shape=(4359,))

As previously, let’s calculate this model’s score.

ftu_score = linreg_ftu.score(X_ftu_test, df_test["ZFYA"])
print(ftu_score)

0.0917442226187073

RMSE_ftu = np.sqrt(mean_squared_error(df_test["ZFYA"], predictions_ftu))
print(RMSE_ftu)

0.8840061503773576

Latent variable model

Still according to (Kusner et al. 2017), a Level 2 approach will model latent ‘fair’ variables which are parents of observed variables.

If we consider a predictor parameterised by \theta, such as:

\hat{Y} \equiv g_\theta (U, X_{\nsucc A})

where X_{\nsucc A} \subseteq X are the non-descendants of A. Assuming a loss function l(\cdot,\cdot) and training data \mathcal{D}\equiv\{(A^{(i)}, X^{(i)}, Y^{(i)})\} for i=1,2,\dots,n, the empirical loss is defined as

L(\theta)\equiv \sum_{i=1}^n \mathbb{E}[l(y^{(i)},g_\theta(U^{(i)}, x^{(i)}_{\nsucc A}))]/n

which has to be minimised with respect to \theta. Each of the n expectations is taken with respect to the random variable U^{(i)} such that

U^{(i)}\sim P_{\mathcal{M}}(U|x^{(i)}, a^{(i)})

where P_{\mathcal{M}}(U|x,a) is the conditional distribution of the background variables as given by a causal model \mathcal{M} that is available by assumption.

If this expectation cannot be calculated analytically, Markov chain Monte Carlo (MCMC) can be used to approximate it.
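Concretely, if m samples u_1^{(i)}, \dots, u_m^{(i)} are drawn from the posterior of U^{(i)} for each individual, the expectation can be replaced by a sample average. As a sketch (m and the sample notation u_j^{(i)} are introduced here for illustration):

```latex
L(\theta) \approx \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m} \sum_{j=1}^{m}
  l\left(y^{(i)}, g_\theta\left(u_j^{(i)}, x^{(i)}_{\nsucc A}\right)\right),
\qquad u_j^{(i)} \sim P_{\mathcal{M}}(U \mid x^{(i)}, a^{(i)})
```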

We will follow the model specified in the original paper, where the latent variable considered is K, which represents a student’s knowledge. K will affect GPA, LSAT and the outcome, FYA. The model can be defined by:

\begin{aligned} GPA &\sim \mathcal{N}(GPA_0 + w_{GPA}^KK + w_{GPA}^RR + w_{GPA}^SS, \sigma_{GPA}) \\ LSAT &\sim \text{Po}(\exp(LSAT_0 + w_{LSAT}^KK + w_{LSAT}^RR + w_{LSAT}^SS)) \\ FYA &\sim \mathcal{N}(w_{FYA}^KK + w_{FYA}^RR + w_{FYA}^SS, 1) \\ K &\sim \mathcal{N}(0,1) \end{aligned}

The priors used will be:

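\begin{aligned} GPA_0 &\sim \mathcal{N}(0, 1) \\ LSAT_0 &\sim \mathcal{N}(0, 1) \\ w_{GPA}^K, w_{LSAT}^K, w_{FYA}^K &\sim \mathcal{N}(0, 1) \\ w_{\cdot}^R, w_{\cdot}^S &\sim \mathcal{N}(0, 1) \\ \sigma_{GPA}^2 &\sim \text{InvGamma}(1, 1) \end{aligned}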
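To make the generative story concrete, the model can be simulated forwards. The sketch below uses arbitrary, hypothetical parameter values and binary stand-ins for race R and sex S; it only illustrates how K, R and S feed into GPA, LSAT and FYA, and is not fitted to the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical parameter values, chosen only for illustration
gpa0, lsat0, sigma_gpa = 3.0, 3.5, 0.4
w_k_gpa, w_k_lsat, w_k_fya = 0.3, 0.1, 0.5
w_r_gpa, w_r_lsat, w_r_fya = -0.1, -0.05, -0.2
w_s_gpa, w_s_lsat, w_s_fya = 0.05, 0.02, 0.1

r = rng.integers(0, 2, size=n)    # binary stand-in for race R
s = rng.integers(0, 2, size=n)    # binary stand-in for sex S
k = rng.normal(0.0, 1.0, size=n)  # latent knowledge K ~ N(0, 1)

# GPA is Gaussian, LSAT is Poisson with a log link, FYA is Gaussian with unit scale
gpa = rng.normal(gpa0 + w_k_gpa * k + w_r_gpa * r + w_s_gpa * s, sigma_gpa)
lsat = rng.poisson(np.exp(lsat0 + w_k_lsat * k + w_r_lsat * r + w_s_lsat * s))
fya = rng.normal(w_k_fya * k + w_r_fya * r + w_s_fya * s, 1.0)

print(gpa, lsat, fya)
```

Inference runs this story in reverse: given the observed GPA, LSAT and FYA, the sampler produces a posterior over K for each student.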

import pymc as pm

K = len(A)

def MCMC(data, samples=100):  # Reduced default samples for faster execution

    N = len(data)
    # Convert to numeric array explicitly to avoid object dtype
    a = np.array(data[A], dtype=np.float64)

    model = pm.Model()

    with model:
        # Priors
        k = pm.Normal("k", mu=0, sigma=1, shape=(N,))
        gpa0 = pm.Normal("gpa0", mu=0, sigma=1)
        lsat0 = pm.Normal("lsat0", mu=0, sigma=1)
        w_k_gpa = pm.Normal("w_k_gpa", mu=0, sigma=1)
        w_k_lsat = pm.Normal("w_k_lsat", mu=0, sigma=1)
        w_k_zfya = pm.Normal("w_k_zfya", mu=0, sigma=1)

        w_a_gpa = pm.Normal("w_a_gpa", mu=np.zeros(K), sigma=np.ones(K), shape=K)
        w_a_lsat = pm.Normal("w_a_lsat", mu=np.zeros(K), sigma=np.ones(K), shape=K)
        w_a_zfya = pm.Normal("w_a_zfya", mu=np.zeros(K), sigma=np.ones(K), shape=K)

        sigma_gpa_2 = pm.InverseGamma("sigma_gpa_2", alpha=1, beta=1)

        mu = gpa0 + (w_k_gpa * k) + pm.math.dot(a, w_a_gpa)

        # Observed data
        gpa = pm.Normal(
            "gpa",
            mu=mu,
            sigma=pm.math.sqrt(sigma_gpa_2),
            observed=data["UGPA"].values,
        )
        lsat = pm.Poisson(
            "lsat",
            mu=pm.math.exp(lsat0 + w_k_lsat * k + pm.math.dot(a, w_a_lsat)),
            observed=data["LSAT"].values,
        )
        zfya = pm.Normal(
            "zfya",
            mu=w_k_zfya * k + pm.math.dot(a, w_a_zfya),
            sigma=1,
            observed=data["ZFYA"].values,
        )

        # Keep sampling cheap: few tuning steps and only two chains
        trace = pm.sample(
            draws=samples,
            tune=100,  # Reduce tuning steps (default is 1000)
            cores=8,  # Upper bound on the number of chains run in parallel
            chains=2,  # Run 2 chains to reduce runtime
            target_accept=0.8,  # PyMC's default target acceptance rate for NUTS
            progressbar=True,
            return_inferencedata=False,
        )

    return trace

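Before building the predictor we need an estimate of k^{(i)} for every student. We run the sampler on the training data and take the posterior mean of each k^{(i)} over the draws.

# Assumed call with the function's default number of samples
train_estimates = MCMC(df_train)
train_k = np.mean(train_estimates["k"], axis=0).reshape(-1, 1)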
train_k

array([[ 0.16546531],
       [-0.16838145],
       [ 0.80717885],
       ...,
       [-0.54693161],
       [-0.70418241],
       [ 0.0439383 ]], shape=(17432, 1))
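The averaging step assumes that the sampled values of k are stored as an array with one row per posterior draw and one column per student, so that averaging over axis 0 yields one estimate per student. A sketch with fake draws (the array draws is a stand-in, not real sampler output):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_students = 200, 3

# Stand-in for sampled k values: rows are draws, columns are students
draws = rng.normal(loc=[-0.5, 0.0, 0.8], scale=0.1, size=(n_draws, n_students))

# Posterior mean per student, reshaped into a single feature column
k_hat = np.mean(draws, axis=0).reshape(-1, 1)

print(k_hat.shape)
```

The reshape into a column is what allows the estimates to be passed directly as the feature matrix of a scikit-learn regressor.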

We can now estimate k using the test data:

test_map_estimates = MCMC(df_test, samples=50)

Using dummy trace for preview - run MCMC to get real results

test_k = np.mean(test_map_estimates["k"], axis=0).reshape(-1, 1)
test_k

array([[ 0.63329743],
       [ 0.66896649],
       [ 0.08234495],
       ...,
       [ 0.64787624],
       [-0.27514218],
       [-0.02546481]], shape=(4359, 1))

We now build the Level 2 predictor, using k as the input.

linreg_latent = LinearRegression()

linreg_latent = linreg_latent.fit(train_k, df_train["ZFYA"])

predictions_latent = linreg_latent.predict(test_k)
predictions_latent

array([ 0.54369892,  0.57003262,  0.13694249, ...,  0.55446214,
       -0.1269826 ,  0.05734886], shape=(4359,))

latent_score = linreg_latent.score(test_k, df_test["ZFYA"])
print(latent_score)

0.24943554209955066

RMSE_latent = np.sqrt(mean_squared_error(df_test["ZFYA"], predictions_latent))
print(RMSE_latent)

0.803609754210961

Additive error model

Finally, in Level 3, we model GPA, LSAT, and FYA as continuous variables with additive error terms independent of race and sex (though these error terms may in turn be correlated with one another).

This corresponds to

\begin{aligned} GPA &= b_G + w^R_{GPA}R + w^S_{GPA}S + \epsilon_{GPA}, \epsilon_{GPA} \sim p(\epsilon_{GPA}) \\ LSAT &= b_L + w^R_{LSAT}R + w^S_{LSAT}S + \epsilon_{LSAT}, \epsilon_{LSAT} \sim p(\epsilon_{LSAT}) \\ FYA &= b_{FYA} + w^R_{FYA}R + w^S_{FYA}S + \epsilon_{FYA} , \epsilon_{FYA} \sim p(\epsilon_{FYA}) \end{aligned}

We estimate the error terms \epsilon_{GPA}, \epsilon_{LSAT} by first fitting two models that each use race and sex to individually predict GPA and LSAT. We then compute the residuals of each model (e.g., \epsilon_{GPA} = GPA - \hat{Y}_{GPA}(R, S)). We use these residual estimates of \epsilon_{GPA}, \epsilon_{LSAT} to predict FYA. In (Kusner et al. 2017) this is called Fair Add.

Since the process is similar for the individual predictions for GPA and LSAT, we will write a function to avoid repetition.

def calculate_epsilon(data, var_name, protected_attr):
    X = data[protected_attr]
    y = data[var_name]

    linreg = LinearRegression()
    linreg = linreg.fit(X, y)

    predictions = linreg.predict(X)

    return data[var_name] - predictions
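To see why these residuals are useful, note that residuals from a least-squares fit on group indicators average to zero within each group, so the group effect has been removed from them. A small sketch on made-up data with a single binary attribute, using numpy's least-squares solver in place of LinearRegression:

```python
import numpy as np

# Made-up data: a binary protected attribute and a score that depends on it
group = np.array([0, 0, 0, 1, 1, 1])
score = np.array([3.0, 3.4, 3.2, 2.5, 2.9, 2.7])

# Design matrix with an intercept column and the group indicator
design = np.column_stack([np.ones_like(group, dtype=float), group])
coef, *_ = np.linalg.lstsq(design, score, rcond=None)

# What is left of the score after removing the fitted group effect
epsilon = score - design @ coef

print(epsilon[group == 0].mean(), epsilon[group == 1].mean())
```

Both group means of the residuals are zero up to floating point error, even though the raw scores differ between the groups.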

Let’s apply it to each variable, individually. First we calculate \epsilon_{GPA}:

epsilons_gpa = calculate_epsilon(df, "UGPA", A)
epsilons_gpa

0       -0.235638
1       -0.335638
2       -0.104541
5       -0.875931
6        0.064362
           ...   
27472    0.795459
27473    0.364362
27474    0.664362
27475   -0.304541
27476   -0.104541
Name: UGPA, Length: 21791, dtype: float64

Next, we calculate \epsilon_{LSAT}:

epsilons_LSAT = calculate_epsilon(df, "LSAT", A)
epsilons_LSAT

0        1.799960
1       -1.200040
2       -7.692552
5        5.018944
6       -0.200040
           ...   
27472   -4.692552
27473    0.799960
27474   -1.200040
27475   -6.692552
27476   -9.692552
Name: LSAT, Length: 21791, dtype: float64

Let’s visualise the \epsilon distribution quickly:

We finally use the calculated \epsilon to train a model in order to predict FYA. We start by getting the subset of the \epsilon which match the training indices.

X = np.hstack(
    (
        np.array(epsilons_gpa[df_train.index]).reshape(-1, 1),
        np.array(epsilons_LSAT[df_train.index]).reshape(-1, 1),
    )
)
X

array([[-0.23563757,  1.79995974],
       [ 0.16436243, -1.20004026],
       [ 0.56436243,  8.79995974],
       ...,
       [-0.43563757, -4.20004026],
       [-0.25102639, -4.79709541],
       [ 0.39545902,  1.30744827]], shape=(17432, 2))

linreg_fair_add = LinearRegression()

linreg_fair_add = linreg_fair_add.fit(
    X,
    df_train["ZFYA"],
)

We now use this model to calculate the predictions:

X_test = np.hstack(
    (
        np.array(epsilons_gpa[df_test.index]).reshape(-1, 1),
        np.array(epsilons_LSAT[df_test.index]).reshape(-1, 1),
    )
)

predictions_fair_add = linreg_fair_add.predict(X_test)
predictions_fair_add

array([-0.04504685,  0.24620604,  0.35729715, ..., -0.38971687,
        0.06035652,  0.01193653], shape=(4359,))

And as previously, we calculate the model’s score:

fair_add_score = linreg_fair_add.score(X_test, df_test["ZFYA"])
print(fair_add_score)

0.044827206244169804

RMSE_fair_add = np.sqrt(mean_squared_error(df_test["ZFYA"], predictions_fair_add))
print(RMSE_fair_add)

0.9065508595292473

Comparison

The scores, so far, are:

print(f"Unfair score:\t{score_unfair}")
print(f"FTU score:\t{ftu_score}")
print(f"L2 score:\t{latent_score}")
print(f"Fair add score:\t{fair_add_score}")

Unfair score:   0.1270014727399027
FTU score:  0.0917442226187073
L2 score:   0.24943554209955066
Fair add score: 0.044827206244169804

print(f"Unfair RMSE:\t{RMSE_unfair}")
print(f"FTU RMSE:\t{RMSE_ftu}")
print(f"L2 RMSE:\t{RMSE_latent}")
print(f"Fair add RMSE:\t{RMSE_fair_add}")

Unfair RMSE:    0.8666783694285809
FTU RMSE:   0.8840061503773576
L2 RMSE:    0.803609754210961
Fair add RMSE:  0.9065508595292473
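Since R² = 1 − MSE / Var(y) on the test set, each (score, RMSE) pair above implies a value for the variance of the test targets, and all four models should imply the same one. A quick check using the numbers printed above (the variable names are introduced here for illustration):

```python
# (R^2, RMSE) pairs copied from the outputs above
results = {
    "Unfair": (0.1270014727399027, 0.8666783694285809),
    "FTU": (0.0917442226187073, 0.8840061503773576),
    "L2": (0.24943554209955066, 0.803609754210961),
    "Fair add": (0.044827206244169804, 0.9065508595292473),
}

# R^2 = 1 - MSE / Var(y)  =>  Var(y) = RMSE^2 / (1 - R^2)
implied_var = {name: rmse**2 / (1.0 - r2) for name, (r2, rmse) in results.items()}

for name, var in implied_var.items():
    print(f"{name}: implied test variance = {var:.4f}")
```

Because the two metrics are tied together in this way, ranking the models by score or by RMSE gives the same ordering.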

Measuring counterfactual fairness

First, we will measure two quantities, the Statistical Parity Difference (SPD)¹ and the Disparate Impact (DI)².

Statistical Parity Difference / Disparate Impact

from fairlearn.metrics import demographic_parity_difference, demographic_parity_ratio

parities = []
impacts = []

for a in A:
    parity = demographic_parity_difference(
        df_train["ZFYA"], df_train["ZFYA"], sensitive_features=df_train[a]
    )
    di = demographic_parity_ratio(
        df_train["ZFYA"], df_train["ZFYA"], sensitive_features=df_train[a]
    )
    parities.append(parity)
    impacts.append(di)

df_parities = pd.DataFrame({"protected": A, "parity": parities, "impact": impacts})

Finding sensitive features

Typically, an SPD > 0.1 and a DI < 0.9 might indicate discrimination on those features. In our dataset, two features (Hispanic and Mexican) clearly fail the DI test (with DI < 0.9), indicating potential discrimination. The SPD values for these features are relatively small, suggesting that while there may be disparate impact, the absolute difference in proportions is not as pronounced.

for a in ["Mexican", "Hispanic"]:
    spd = demographic_parity_difference(y_true=df_train["ZFYA"],
                                        y_pred=df_train["ZFYA"],
                                        sensitive_features = df_train[a])
    print(f"SPD({a}) = {spd}")
    di = demographic_parity_ratio(y_true=df_train["ZFYA"],
                                  y_pred=df_train["ZFYA"],
                                  sensitive_features = df_train[a])
    print(f"DI({a}) = {di}")

SPD(Mexican) = 0.0014017257538768636
DI(Mexican) = 0.5556529360210342
SPD(Hispanic) = 0.003272247102713093
DI(Hispanic) = 0.34227833235466826
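Both metrics are defined in terms of selection rates, that is, the fraction of positive decisions within each group, so they are most naturally applied to binary predictions. The sketch below spells out the two definitions in plain Python on made-up binary decisions (the data and the function names are illustrative assumptions, not part of fairlearn):

```python
def selection_rates(decisions, groups):
    """Fraction of positive (1) decisions within each group."""
    rates = {}
    for g in set(groups):
        members = [d for d, gg in zip(decisions, groups) if gg == g]
        rates[g] = sum(members) / len(members)
    return rates


def statistical_parity_difference(decisions, groups):
    """Largest minus smallest selection rate across groups."""
    rates = selection_rates(decisions, groups).values()
    return max(rates) - min(rates)


def disparate_impact(decisions, groups):
    """Smallest divided by largest selection rate across groups."""
    rates = selection_rates(decisions, groups).values()
    return min(rates) / max(rates)


# Made-up binary decisions for two groups
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

spd = statistical_parity_difference(decisions, groups)
di = disparate_impact(decisions, groups)
print(spd, di)
```

A small SPD together with a low DI, as in the output above, can occur when both selection rates are close to zero: the difference between them is tiny, but their ratio is far from 1.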

References

Kusner, Matt J., Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. “Counterfactual Fairness.” In Advances in Neural Information Processing Systems, 4066–76.

McIntyre, Frank, and Michael Simkovic. 2018. “Are Law Degrees as Valuable to Minorities?” International Review of Law and Economics 53: 23–37.

Footnotes

1. See Statistical parity difference.
2. See Disparate Impact Ratio.