Counterfactuals with Constraint Solvers
Scoring
An implementation on how to calculate counterfactuals with Constraint Solvers (namely OptaPlanner) is available here.
This implementation satisfies several criteria of the counterfactuals.
The penalisation score is represented with a BendableBigDecimalScore
1, having three “hard” levels and two “soft” levels.
The first hard level component, 1, penalises the score according to the distance between the prediction,
The actionability is score with 2. This component penalises the score according to number of immutable features which were changed in the counterfactual.
A confidence score component, 3 is use to, optionally, impose a minimum confidence threshold to the counterfactual’s associated prediction,
Finally, the feature distance, 4, penalises the score according to the feature distance. This is the representation of
In the concrete implementation linked above, the distance,
Implementation

Entities are defined by classes such as Integer
, Categorical
, Boolean
or Float
, as shown in 5.
Each of the features, shown in 6, is created as an instance of one of these entities. For instance, feature1
would be of type Integer
and feature2
would be of type Categorical
, etc.
The original data point,
A planning solution (PlanningSolution
), illustrated in 7 will produce candidate solutions (shown in 8)
For each solution, we propose a new set of features (
In the following section we will look at how each component is calculated. We will refer to each “hard” level component as
Prediction distance
The first component of the score, 1 is established by sending the proposed counterfactual
Tolerance
For numerical features, the above score (
We can solve this problem by introducing a “tolerance” adjustment, which allows proposed values to be accepted if they are “close enough” to the goal.
To make the tolerance scale-invariant and unit-less we can use a relative change and set the distance to zero, if smaller than the threshold
and compare to the threshold

This would however fail for the edge case where

To solve this, we can introduce a special case for
So that we have now the desired behaviour at

Gower distance
An alternative metric for the outcome distance (and mixed variables in general) is the site/Machine learning/Gower distance.
Actionability score
For the second component, the actionability score, 2. We calculate the number of features for the protected set
Confidence score
For each feature
Feature distance
Considering that each datapoint
Since in many scenarios we might not have access to the training data, the above distance are not normalised. In the event that we do have access to training data, then we can use the standard deviation (
so that, in this case, we scale the numerical features with
Searching
To search for a counterfactual, we start by specifying a search domain for each feature. This will include:
- An upper and lower bounds for numerical features, respectively
- A set of categories for categorical features,
for the specific case of boolean/binary values
Typically these values would be either established by someone with domain knowledge, or by values that might reflect our expectation for the actual counterfactual (for instance, an ~age~ would have realistic values).
The algorithm used for the search is Tabu search3 (Glover, 1989).
