Counterfactuals

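Counterfactual explanations are a special type of explainability: they describe how an input would need to change for a model to produce a different, desired output.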

Desiderata

According to Verma et al. [1], the desiderata for counterfactuals are:

Validity

We assume that a counterfactual is valid if it solves the optimisation problem stated in Wachter et al. [2]. If we define the loss function as

\[ L(x,x^{\prime},y^{\prime},\lambda)=\lambda\cdot(\hat{f}(x^{\prime})-y^{\prime})^2+d(x,x^{\prime}), \]

we can define the counterfactual as

\[ \arg \underset{x^{\prime}}{\min}\underset{\lambda}{\max} \lambda\cdot(\hat{f}(x^{\prime})-y^{\prime})^2+d(x,x^{\prime}) \]

where:

* \(x \in \mathcal{X}\) is the original data point
* \(x^{\prime} \in \mathcal{X}\) is the counterfactual
* \(y^{\prime} \in \mathcal{Y}\) is the desired label
* \(\hat{f}\) is the model being explained
* \(\lambda\) weights the prediction term against the distance term
* \(d\) is a distance metric that measures the distance between \(x\) and \(x^{\prime}\). This could be an L1 or L2 distance, a quadratic distance, etc.
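The loss and the min-max search can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from either reference. The logistic model `predict`, the coarse grid used as the inner minimiser, the tolerance `tol` and the tenfold schedule for \(\lambda\) are all assumptions made for this example.

```python
import math

def predict(x):
    """Hypothetical model f_hat: a fixed logistic regression on two features."""
    z = 1.5 * x[0] - 2.0 * x[1] + 0.25
    return 1.0 / (1.0 + math.exp(-z))

def l1_distance(x, x_prime):
    """d(x, x'): L1 (Manhattan) distance."""
    return sum(abs(a - b) for a, b in zip(x, x_prime))

def loss(x, x_prime, y_prime, lam):
    """L(x, x', y', lambda) = lambda * (f_hat(x') - y')^2 + d(x, x')."""
    return lam * (predict(x_prime) - y_prime) ** 2 + l1_distance(x, x_prime)

def find_counterfactual(x, y_prime, tol=0.05, max_rounds=6):
    """Minimise the loss over a coarse grid, raising lambda until
    f_hat(x') is within tol of y' (assumed stopping rule)."""
    grid = [i / 10 for i in range(-20, 21)]  # candidate values per feature
    lam = 1.0
    for _ in range(max_rounds):
        best = min(((a, b) for a in grid for b in grid),
                   key=lambda c: loss(x, c, y_prime, lam))
        if abs(predict(best) - y_prime) <= tol:
            return best, lam
        lam *= 10.0  # weight the prediction term more heavily and retry
    return None, lam

x = (0.0, 1.0)
counterfactual, lam = find_counterfactual(x, y_prime=0.5)
print(counterfactual, lam, predict(counterfactual))
```

Raising \(\lambda\) only when the current minimiser is not yet valid keeps the counterfactual as close to \(x\) as the grid allows. A gradient-based optimiser could replace the grid search for higher-dimensional inputs.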

Actionability

Still according to [2], actionability refers to the ability of a counterfactual method to distinguish between mutable and immutable features. Immutable features, as well as legally protected ones, should not be changed by a counterfactual implementation. Formally, if we define our set of mutable (or actionable) features as \(\mathcal{A}\), we have

\[ \arg \underset{x^{\prime} \in \mathcal{A}}{\min}\underset{\lambda}{\max} \lambda\cdot(\hat{f}(x^{\prime})-y^{\prime})^2+d(x,x^{\prime}) \]
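One simple way to enforce the actionability constraint is to project each candidate back onto the actionable set by resetting its immutable features to their original values. The sketch below assumes features are addressed by index. The helper names and the example features (age, income, a protected attribute) are hypothetical.

```python
def project_to_actionable(x, x_prime, mutable):
    """Return a copy of x_prime whose immutable features are reset to x.

    `mutable` is the set of feature indices that may change (the set A).
    """
    return [xp if i in mutable else xi
            for i, (xi, xp) in enumerate(zip(x, x_prime))]

def is_actionable(x, x_prime, mutable, eps=1e-9):
    """Check that x_prime differs from x only on mutable features."""
    return all(abs(xi - xp) <= eps
               for i, (xi, xp) in enumerate(zip(x, x_prime))
               if i not in mutable)

# Hypothetical features: [age, income, protected attribute]
x = [30, 50000, 1]
candidate = [25, 60000, 0]   # changes age and the protected attribute
mutable = {1}                # only income may change

projected = project_to_actionable(x, candidate, mutable)
print(projected, is_actionable(x, projected, mutable))
```

Applying the projection after every search step keeps the search inside \(\mathcal{A}\) without changing the loss itself.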

Sparsity

According to [2], shorter counterfactuals are easier to understand, and an effective counterfactual implementation should change as few features as possible. We can add a sparsity penalty term to our definition,

\[ g(x^{\prime}-x), \]

which increases the more features are changed and could be an L0 or L1 metric, for instance. We can then define the counterfactual as

\[ \arg \underset{x^{\prime} \in \mathcal{A}}{\min}\underset{\lambda}{\max} \lambda\cdot(\hat{f}(x^{\prime})-y^{\prime})^2+d(x,x^{\prime})+g(x^{\prime}-x) \]
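To illustrate how the choice of \(g\) affects sparsity, the sketch below compares an L0 and an L1 penalty on two candidates that move the same total distance from \(x\). The function names, the threshold `eps` and the example values are assumptions made for this illustration.

```python
def l0_penalty(x, x_prime, eps=1e-9):
    """g as an L0 'norm': the number of features that changed."""
    return sum(1 for a, b in zip(x, x_prime) if abs(b - a) > eps)

def l1_penalty(x, x_prime):
    """g as an L1 norm: the total absolute change across features."""
    return sum(abs(b - a) for a, b in zip(x, x_prime))

x = [1.0, 2.0, 3.0]
dense = [1.1, 2.1, 3.1]    # small change to every feature
sparse = [1.0, 2.0, 3.3]   # one larger change to a single feature

for name, candidate in (("dense", dense), ("sparse", sparse)):
    print(name, l0_penalty(x, candidate), round(l1_penalty(x, candidate), 3))
```

Both candidates have roughly the same L1 penalty, but the L0 penalty separates them because it counts changed features rather than the size of the change.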

Data manifold closeness

Still according to [2], data manifold closeness is the property that the counterfactual is as close to the training data as possible. This can translate into a more "realistic" counterfactual, since the counterfactual might otherwise take extreme or never-before-seen values in order to satisfy the previous conditions. Formally, we can write a penalty term for adherence to the training data manifold \(\mathcal{X}\) as \(l(x^{\prime};\mathcal{X})\) and then define the counterfactual as

\[ \arg \underset{x^{\prime} \in \mathcal{A}}{\min}\underset{\lambda}{\max} \lambda\cdot(\hat{f}(x^{\prime})-y^{\prime})^2+d(x,x^{\prime})+g(x^{\prime}-x)+l(x^{\prime};\mathcal{X}) \]
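The text does not fix a particular form for \(l(x^{\prime};\mathcal{X})\). One possible choice, assumed here purely for illustration, is the mean Euclidean distance from the counterfactual to its \(k\) nearest training points. The function name and the toy training set are hypothetical.

```python
import math

def manifold_penalty(x_prime, training_data, k=3):
    """Assumed form of l(x'; X): mean Euclidean distance from x_prime
    to its k nearest training points (smaller means closer to the data)."""
    distances = sorted(math.dist(x_prime, t) for t in training_data)
    k = min(k, len(distances))
    return sum(distances[:k]) / k

# Toy training set: the corners of the unit square
training_data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

near = manifold_penalty((0.5, 0.5), training_data)
far = manifold_penalty((10.0, 10.0), training_data)
print(near, far)
```

A candidate far from every training point receives a large penalty, which steers the search away from extreme values. Density estimates or autoencoder reconstruction error are other penalties that could play the same role.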

Causality

Causality refers to the property that changes to one feature will impact the features that depend on it. That is, we no longer assume that all features are independent. This implies that the counterfactual method needs to maintain the causal relations between features.
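As a rough illustration of what maintaining causal relations can mean, the sketch below changes one feature and then recomputes the features that depend on it. The structural equation linking `income` to `years_education` is invented for this example, and the model is assumed to be deterministic and noise-free.

```python
def apply_intervention(x, changes, structural_equations):
    """Apply `changes` to x, then recompute dependent features.

    `structural_equations` maps each dependent feature to a function of
    the full feature dict, listed in causal (topological) order.
    """
    x_prime = dict(x)
    x_prime.update(changes)
    for feature, equation in structural_equations.items():
        if feature not in changes:  # features set directly are kept as given
            x_prime[feature] = equation(x_prime)
    return x_prime

# Hypothetical causal relation: income depends on years of education
structural_equations = {
    "income": lambda v: 20000 + 3000 * v["years_education"],
}

x = {"years_education": 12, "income": 56000}
x_prime = apply_intervention(x, {"years_education": 16}, structural_equations)
print(x_prime)
```

A search that treated the features as independent could propose more education with income unchanged. Propagating the change keeps the counterfactual consistent with the assumed causal model.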

Amortised inference

Amortised inference refers to the ability of a counterfactual method to produce counterfactuals (one or several) for new data points quickly, without solving a separate optimisation problem for each one.

Alternative methods

Constraint solvers

An alternative method to find counterfactuals is to use constraint solvers. This is explored in more depth in Counterfactuals with Constraint Solvers.


  1. Verma, Sahil, John Dickerson, and Keegan Hines. "Counterfactual Explanations for Machine Learning: A Review." arXiv preprint arXiv:2010.10596 (2020). 

  2. Wachter, Sandra, Brent Mittelstadt, and Chris Russell. "Counterfactual explanations without opening the black box: Automated decisions and the GDPR." Harv. JL & Tech. 31 (2017): 841.