Counterfactuals
Counterfactual explanations are a special type of explainability method.
Desiderata
According to Verma et al.^{1}, the desiderata for counterfactuals are:
Validity
We assume that a counterfactual is valid if it solves the optimisation problem stated in Wachter et al.^{2}. If we define the loss function as
\[ L(x,x^{\prime},y^{\prime},\lambda)=\lambda\cdot(\hat{f}(x^{\prime})-y^{\prime})^2+d(x,x^{\prime}), \]
we can define the counterfactual as
\[ \arg \underset{x^{\prime}}{\min}\underset{\lambda}{\max} \lambda\cdot(\hat{f}(x^{\prime})-y^{\prime})^2+d(x,x^{\prime}) \]
where:
* \(x \in \mathcal{X}\) is the original data point
* \(x^{\prime} \in \mathcal{X}\) is the counterfactual
* \(y^{\prime} \in \mathcal{Y}\) is the desired label
* \(\hat{f}\) is the trained model whose prediction is being explained
* \(\lambda\) weights the prediction term against the distance term
* \(d\) is a distance metric that measures the distance between \(x\) and \(x^{\prime}\). This could be an L1 or L2 distance, a quadratic distance, etc.
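To make the loss above concrete, here is a minimal sketch of a counterfactual search in plain Python. Everything in it is an assumption made for illustration: the model \(\hat{f}\) is a hypothetical two-feature logistic regression, \(d\) is taken to be the squared Euclidean distance, and the maximisation over \(\lambda\) is approximated by a simple heuristic that doubles \(\lambda\) until the prediction is within a tolerance of \(y^{\prime}\). The names `predict`, `distance`, `loss` and `find_counterfactual` are illustrative, not taken from any library.

```python
import math


def predict(x, weights, bias):
    """Hypothetical model f_hat: a logistic regression returning P(y = 1)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def distance(x, x_prime):
    """d(x, x'): squared Euclidean distance (one possible choice)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_prime))


def loss(x, x_prime, y_target, lam, weights, bias):
    """L(x, x', y', lambda) = lambda * (f_hat(x') - y')^2 + d(x, x')."""
    return lam * (predict(x_prime, weights, bias) - y_target) ** 2 + distance(x, x_prime)


def find_counterfactual(x, y_target, weights, bias,
                        tol=0.05, lr=0.01, steps=500, max_rounds=20):
    """Minimise the loss over x' by gradient descent, doubling lambda
    until f_hat(x') is within tol of y' (or max_rounds is exhausted)."""
    x_prime = list(x)
    lam = 1.0
    for _ in range(max_rounds):
        for _ in range(steps):
            p = predict(x_prime, weights, bias)
            # Analytic gradient of the loss with respect to each x'_j.
            grad = [
                2.0 * lam * (p - y_target) * p * (1.0 - p) * w + 2.0 * (xp - xi)
                for w, xp, xi in zip(weights, x_prime, x)
            ]
            x_prime = [xp - lr * g for xp, g in zip(x_prime, grad)]
        if abs(predict(x_prime, weights, bias) - y_target) <= tol:
            break
        lam *= 2.0
    return x_prime, lam


weights, bias = [1.0, 2.0], -3.0   # illustrative model parameters
x = [0.5, 0.5]                     # original data point
x_cf, lam = find_counterfactual(x, y_target=0.7, weights=weights, bias=bias)
print(predict(x, weights, bias), predict(x_cf, weights, bias), x_cf, lam)
```

The fixed learning rate is only suitable for small \(\lambda\); a real implementation would typically hand the inner minimisation to a proper optimiser.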
Actionability
Still according to ^{2}, actionability refers to the ability of a counterfactual method to distinguish between mutable and immutable features. Immutable features, as well as legally protected ones, should not be changed by a counterfactual implementation. Formally, if we define our set of mutable (or actionable) features as \(\mathcal{A}\), we have
\[ \arg \underset{x^{\prime} \in \mathcal{A}}{\min}\underset{\lambda}{\max} \lambda\cdot(\hat{f}(x^{\prime})-y^{\prime})^2+d(x,x^{\prime}) \]
Sparsity
According to ^{2}, shorter counterfactuals are easier to understand, so an effective counterfactual implementation should change as few features as possible. To encourage this, we add a sparsity penalty term to our definition,
\[ g(x^{\prime}-x) \]
which increases as more features are changed and could be an L0 or L1 metric, for instance. We can then define the counterfactual as
\[ \arg \underset{x^{\prime} \in \mathcal{A}}{\min}\underset{\lambda}{\max} \lambda\cdot(\hat{f}(x^{\prime})-y^{\prime})^2+d(x,x^{\prime})+g(x^{\prime}-x) \]
Data manifold closeness
Still according to ^{2}, data manifold closeness is the property that the counterfactual lies as close to the training data as possible. This tends to yield a more "realistic" counterfactual, since a counterfactual could otherwise take extreme or never-before-seen values in order to satisfy the previous conditions. Formally, we can write a penalty term for adherence to the training data manifold \(\mathcal{X}\) as \(l(x^{\prime};\mathcal{X})\) and then define the counterfactual as
\[ \arg \underset{x^{\prime} \in \mathcal{A}}{\min}\underset{\lambda}{\max} \lambda\cdot(\hat{f}(x^{\prime})-y^{\prime})^2+d(x,x^{\prime})+g(x^{\prime}-x)+l(x^{\prime};\mathcal{X}) \]
Causality
Causality refers to the property that a change to one feature also affects the features that depend on it. That is, we no longer assume that all features are independent. This implies that the counterfactual method needs to maintain the causal relations between features.
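As a toy illustration of what maintaining causal relations can mean, the sketch below assumes a hypothetical two-feature structural model in which `income` depends linearly on `education_years` plus an individual noise term. The feature names, the coefficients and the helpers `abduct_noise` and `intervene` are all invented for this example. Changing `education_years` in the counterfactual then also updates `income`, while keeping the individual's noise term fixed.

```python
# Assumed structural equation (illustrative only):
#   income = BASE + SLOPE * education_years + noise
BASE = 20.0
SLOPE = 5.0


def abduct_noise(x):
    """Recover the individual's noise term from the observed features."""
    return x["income"] - (BASE + SLOPE * x["education_years"])


def intervene(x, new_education_years):
    """Build a counterfactual in which education changes and the dependent
    feature (income) is recomputed from the structural equation."""
    noise = abduct_noise(x)
    x_prime = dict(x)
    x_prime["education_years"] = new_education_years
    x_prime["income"] = BASE + SLOPE * new_education_years + noise
    return x_prime


x = {"education_years": 12.0, "income": 85.0}
x_cf = intervene(x, 16.0)
print(x_cf)
```

A method that treated the features as independent would instead leave `income` at its original value, producing a counterfactual that is inconsistent with the assumed causal structure.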
Amortised inference
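Amortised inference refers to the property of a counterfactual method that learns to predict counterfactuals, so that one (or several) can be produced quickly for any new data point without solving a new optimisation problem each time.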
Alternative methods
Constraint solvers
An alternative method to find counterfactuals is to use constraint solvers. This is explored in more depth in Counterfactuals with Constraint Solvers.

1. Verma, Sahil, John Dickerson, and Keegan Hines. "Counterfactual Explanations for Machine Learning: A Review." arXiv preprint arXiv:2010.10596 (2020).

2. Wachter, Sandra, Brent Mittelstadt, and Chris Russell. "Counterfactual explanations without opening the black box: Automated decisions and the GDPR." Harv. JL & Tech. 31 (2017): 841.