Title: Counterfactual Explanations
Description: Modular and unified R6-based interface for counterfactual explanation methods. The following methods are currently implemented: Brughmans et al. (2022) <doi:10.48550/arXiv.2104.07411>, Dandl et al. (2020) <doi:10.1007/978-3-030-58112-1_31>, and Wexler et al. (2019) <doi:10.1109/TVCG.2019.2934619>. Optional extensions allow these methods to be applied to a variety of models and use cases. Once generated, the counterfactuals can be analyzed and visualized with the provided functionality.
Authors: Susanne Dandl [aut, cre], Andreas Hofheinz [aut], Martin Binder [ctb], Giuseppe Casalicchio [ctb]
Maintainer: Susanne Dandl <[email protected]>
License: LGPL-3
Version: 0.1.6
Built: 2024-11-16 06:08:22 UTC
Source: https://github.com/dandls/counterfactuals
Abstract base class for counterfactual explanation methods.
Child classes: CounterfactualMethodClassif, CounterfactualMethodRegr
new()
Creates a new CounterfactualMethod object.
CounterfactualMethod$new(predictor, lower = NULL, upper = NULL, distance_function = NULL)
predictor
(Predictor)
The object (created with iml::Predictor$new()) holding the machine learning model and the data.
lower
(numeric() | NULL)
Vector of minimum values for numeric features. If NULL (default), the minimum value of each numeric feature in predictor$data$X is used. If not NULL, it should be named with the corresponding feature names.
upper
(numeric() | NULL)
Vector of maximum values for numeric features. If NULL (default), the maximum value of each numeric feature in predictor$data$X is used. If not NULL, it should be named with the corresponding feature names.
distance_function
(character(1) | function())
Either the name of an already implemented distance function (currently 'gower' or 'gower_c') or a function with three arguments: x, y, and data. The function should return a double matrix with nrow(x) rows and at most nrow(y) columns.
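A distance function satisfying this contract can be sketched in base R. The helper below is a hypothetical example (not part of the package): a range-scaled L1 distance that takes the documented x, y, and data arguments and returns a double matrix with nrow(x) rows and nrow(y) columns.

```r
# Hypothetical distance function matching the documented contract:
# three arguments (x, y, data), result is a double matrix with
# nrow(x) rows and nrow(y) columns.
scaled_l1_distance = function(x, y, data) {
  # Per-feature ranges from the reference data, used for scaling
  ranges = vapply(data, function(col) diff(range(col)), numeric(1))
  ranges[ranges == 0] = 1  # guard against constant features
  dists = matrix(0, nrow = nrow(x), ncol = nrow(y))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(y))) {
      dists[i, j] = mean(abs(as.numeric(x[i, ]) - as.numeric(y[j, ])) / ranges)
    }
  }
  dists
}

dat = data.frame(a = c(0, 5, 10), b = c(1, 2, 3))
d = scaled_l1_distance(dat[1, ], dat[2:3, ], dat)
dim(d)  # 1 row (nrow(x)), 2 columns (nrow(y))
```

Such a function could then be passed as the distance_function argument instead of 'gower' or 'gower_c'.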
print()
Prints a CounterfactualMethod object. The method calls a (private) $print_parameters() method which should be implemented by the leaf classes.
CounterfactualMethod$print()
clone()
The objects of this class are cloneable with this method.
CounterfactualMethod$clone(deep = FALSE)
deep
Whether to make a deep clone.
Abstract base class for counterfactual explanation methods for classification tasks.
CounterfactualMethodClassif can only be initialized for classification tasks. Child classes inherit the (public) $find_counterfactuals() method, which calls a (private) $run() method. This $run() method should be implemented by the child classes and return the counterfactuals as a data.table (preferably) or a data.frame.
Child classes: MOCClassif, WhatIfClassif, NICEClassif
counterfactuals::CounterfactualMethod
-> CounterfactualMethodClassif
new()
Creates a new CounterfactualMethodClassif object.
CounterfactualMethodClassif$new(predictor, lower = NULL, upper = NULL, distance_function = NULL)
predictor
(Predictor)
The object (created with iml::Predictor$new()) holding the machine learning model and the data.
lower
(numeric() | NULL)
Vector of minimum values for numeric features. If NULL (default), the minimum value of each numeric feature in predictor$data$X is used. If not NULL, it should be named with the corresponding feature names.
upper
(numeric() | NULL)
Vector of maximum values for numeric features. If NULL (default), the maximum value of each numeric feature in predictor$data$X is used. If not NULL, it should be named with the corresponding feature names.
distance_function
(function() | NULL)
A distance function that may be used by the leaf classes. If specified, the function must have three arguments x, y, and data and return a double matrix with nrow(x) rows and nrow(y) columns.
find_counterfactuals()
Runs the counterfactual method and returns the counterfactuals.
It searches for counterfactuals that have a predicted probability in the interval desired_prob for the desired_class.
CounterfactualMethodClassif$find_counterfactuals(x_interest, desired_class = NULL, desired_prob = c(0.5, 1))
x_interest
(data.table(1) | data.frame(1))
A single row with the observation of interest.
desired_class
(character(1) | NULL)
The desired class. If NULL (default), predictor$class is taken.
desired_prob
(numeric(1) | numeric(2))
The desired predicted probability of the desired_class. It can be a numeric scalar or a vector with two numeric values that specify a probability interval. For hard classification tasks this can be set to 0 or 1, respectively. A scalar is internally converted to an interval.
A Counterfactuals object containing the results.
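The validity rule described above can be illustrated in base R. The helper below is an illustrative sketch of the documented behavior (not the package internals): a scalar desired_prob is treated as a degenerate interval, and a counterfactual is valid if its predicted probability for the desired_class falls inside it.

```r
# Illustrative sketch (not package code): check whether predicted
# probabilities meet a desired_prob specification.
meets_desired_prob = function(pred_prob, desired_prob) {
  if (length(desired_prob) == 1) {
    desired_prob = c(desired_prob, desired_prob)  # scalar -> interval
  }
  pred_prob >= desired_prob[1] & pred_prob <= desired_prob[2]
}

meets_desired_prob(c(0.3, 0.7, 0.95), c(0.5, 1))  # FALSE TRUE TRUE
meets_desired_prob(1, 1)  # hard classification: probability exactly 1 -> TRUE
```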
clone()
The objects of this class are cloneable with this method.
CounterfactualMethodClassif$clone(deep = FALSE)
deep
Whether to make a deep clone.
Abstract base class for counterfactual explanation methods for regression tasks.
CounterfactualMethodRegr can only be initialized for regression tasks. Child classes inherit the (public) $find_counterfactuals() method, which calls a (private) $run() method. This $run() method should be implemented by the child classes and return the counterfactuals as a data.table (preferably) or a data.frame.
Child classes: MOCRegr, WhatIfRegr, NICERegr
counterfactuals::CounterfactualMethod
-> CounterfactualMethodRegr
new()
Creates a new CounterfactualMethodRegr object.
CounterfactualMethodRegr$new(predictor, lower = NULL, upper = NULL, distance_function = NULL)
predictor
(Predictor)
The object (created with iml::Predictor$new()) holding the machine learning model and the data.
lower
(numeric() | NULL)
Vector of minimum values for numeric features. If NULL (default), the minimum value of each numeric feature in predictor$data$X is used. If not NULL, it should be named with the corresponding feature names.
upper
(numeric() | NULL)
Vector of maximum values for numeric features. If NULL (default), the maximum value of each numeric feature in predictor$data$X is used. If not NULL, it should be named with the corresponding feature names.
distance_function
(function() | NULL)
A distance function that may be used by the leaf classes. If specified, the function must have three arguments x, y, and data and return a double matrix with nrow(x) rows and nrow(y) columns.
find_counterfactuals()
Runs the counterfactual method and returns the counterfactuals. It searches for counterfactuals that have a predicted outcome in the interval desired_outcome.
CounterfactualMethodRegr$find_counterfactuals(x_interest, desired_outcome)
x_interest
(data.table(1) | data.frame(1))
A single row with the observation of interest.
desired_outcome
(numeric(1) | numeric(2))
The desired predicted outcome. It can be a numeric scalar or a vector with two numeric values that specify an outcome interval. A scalar is internally converted to an interval.
A Counterfactuals object containing the results.
clone()
The objects of this class are cloneable with this method.
CounterfactualMethodRegr$clone(deep = FALSE)
deep
Whether to make a deep clone.
A Counterfactuals object should be created by the $find_counterfactuals method of CounterfactualMethodRegr or CounterfactualMethodClassif. It contains the counterfactuals and provides several methods for their evaluation and visualization.
desired
(list(1) | list(2))
A list with the desired properties of the counterfactuals. For regression tasks it has one element desired_outcome (CounterfactualMethodRegr) and for classification tasks two elements desired_class and desired_prob (CounterfactualMethodClassif).
data
(data.table)
The counterfactuals for x_interest.
x_interest
(data.table(1))
A single row with the observation of interest.
distance_function
(function())
The distance function used in the second and fourth evaluation measure. The function must have three arguments x, y, and data and return a numeric matrix. If set to NULL (default), Gower's distance (Gower 1971) is used.
method
(character)
Name of the method with which the counterfactuals were generated.
new()
Creates a new Counterfactuals object. This method should only be called by the $find_counterfactuals methods of CounterfactualMethodRegr and CounterfactualMethodClassif.
Counterfactuals$new(cfactuals, predictor, x_interest, param_set, desired, method = NULL)
cfactuals
(data.table)
The counterfactuals. Must have the same column names and types as predictor$data$X.
predictor
(Predictor)
The object (created with iml::Predictor$new()) holding the machine learning model and the data.
x_interest
(data.table(1) | data.frame(1))
A single row with the observation of interest.
param_set
(ParamSet)
A ParamSet based on the features of predictor$data$X.
desired
(list(1) | list(2))
A list with the desired properties of the counterfactuals. It should have one element desired_outcome for regression tasks (CounterfactualMethodRegr) and two elements desired_class and desired_prob for classification tasks (CounterfactualMethodClassif).
method
(character)
Name of the method with which the counterfactuals were generated. Default is NULL, which means that no name is provided.
evaluate()
Evaluates the counterfactuals. It returns the counterfactuals together with the evaluation measures.
Counterfactuals$evaluate(measures = c("dist_x_interest", "dist_target", "no_changed", "dist_train", "minimality"), show_diff = FALSE, k = 1L, weights = NULL)
measures
(character)
The names of one or more evaluation measures. The following measures are available:
dist_x_interest: The distance of a counterfactual to x_interest, measured by Gower's dissimilarity measure (Gower 1971).
dist_target: The absolute distance of the prediction for a counterfactual to the interval desired_outcome (regression tasks) or desired_prob (classification tasks).
no_changed: The number of feature changes w.r.t. x_interest.
dist_train: The (weighted) distance to the k nearest training data points, measured by Gower's dissimilarity measure (Gower 1971).
minimality: The number of changed features that could each be set back to the value of x_interest while keeping the desired prediction value.
show_diff
(logical(1))
Should the counterfactuals be displayed as their differences to x_interest? Default is FALSE. If set to TRUE, positive values for numeric features indicate an increase compared to the feature value in x_interest, and negative values indicate a decrease. For factors, the feature value is displayed if it differs from x_interest; NA means "no difference" in both cases.
k
(integerish(1))
How many nearest training points should be considered for computing the dist_train measure? Default is 1L.
weights
(numeric(k) | NULL)
How should the k nearest training points be weighted when computing the dist_train measure? If NULL (default), all k points are weighted equally. If a numeric vector of length k is given, the i-th element specifies the weight of the i-th closest data point.
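The interaction of k and weights in the dist_train measure can be sketched in base R. This is a hypothetical helper illustrating the documented idea (not the package implementation); it assumes a precomputed vector of distances to all training points and, when weights is NULL, weights all k points equally.

```r
# Hypothetical sketch of the weighted dist_train idea: combine the k
# smallest distances to the training data with the given weights.
weighted_dist_train = function(dists_to_train, k = 1L, weights = NULL) {
  if (is.null(weights)) weights = rep(1 / k, k)  # equal weighting by default
  nearest = sort(dists_to_train)[seq_len(k)]     # k closest training points
  sum(weights * nearest)
}

d = c(0.9, 0.1, 0.4, 0.3)
weighted_dist_train(d, k = 1L)                         # 0.1
weighted_dist_train(d, k = 2L, weights = c(0.8, 0.2))  # 0.8*0.1 + 0.2*0.3 = 0.14
```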
evaluate_set()
Evaluates a set of counterfactuals. It returns the evaluation measures.
Counterfactuals$evaluate_set(measures = c("diversity", "no_nondom", "frac_nondom", "hypervolume"), nadir = NULL)
measures
(character)
The names of one or more evaluation measures. The following measures are available:
diversity: Diversity of the returned counterfactuals in the feature space.
no_nondom: Number of counterfactuals that are not dominated by other counterfactuals.
frac_nondom: Fraction of counterfactuals that are not dominated by other counterfactuals.
hypervolume: Hypervolume of the induced Pareto front.
nadir
(numeric)
Maximum objective values used to calculate the dominated hypervolume. Only considered if hypervolume is one of the measures. May be a scalar, in which case it is used for all four objectives, or a vector of length 4. Default is NULL, meaning the nadir point of Dandl et al. (2020) is used: (minimum distance between the prediction of x_interest and desired_prob/desired_outcome, 1, number of features, 1).
predict()
Returns the predictions for the counterfactuals.
Counterfactuals$predict()
subset_to_valid()
Subsets the data to the counterfactuals that meet the desired prediction. The subsetting can be reverted with revert_subset_to_valid().
Counterfactuals$subset_to_valid()
revert_subset_to_valid()
Reverts a previous subsetting by subset_to_valid() and restores the full set of counterfactuals.
Counterfactuals$revert_subset_to_valid()
plot_parallel()
Plots a parallel plot that connects the (scaled) feature values of each counterfactual and highlights x_interest in blue.
Counterfactuals$plot_parallel(feature_names = NULL, row_ids = NULL, digits_min_max = 2L)
feature_names
(character | NULL)
The names of the (numeric) features to display. If NULL (default), all features are displayed.
row_ids
(integerish | NULL)
The row ids of the counterfactuals to display. If NULL (default), all counterfactuals are displayed.
digits_min_max
Maximum number of digits for the minimum and maximum feature values. Default is 2L.
plot_freq_of_feature_changes()
Plots a bar chart with the frequency of feature changes across all counterfactuals.
Counterfactuals$plot_freq_of_feature_changes(subset_zero = FALSE)
subset_zero
(logical(1))
Should unchanged features be excluded from the plot? Default is FALSE.
get_freq_of_feature_changes()
Returns the frequency of feature changes across all counterfactuals.
Counterfactuals$get_freq_of_feature_changes(subset_zero = FALSE)
subset_zero
(logical(1))
Should unchanged features be excluded? Default is FALSE.
A (named) numeric vector with the frequency of feature changes.
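The returned frequencies can be understood with a small base-R sketch. This is an illustrative reimplementation of the idea (not the package internals): for each feature, the fraction of counterfactuals whose value differs from x_interest.

```r
# Illustrative sketch (not package code): frequency of feature changes
# across counterfactuals relative to x_interest.
freq_of_changes = function(cfactuals, x_interest, subset_zero = FALSE) {
  changed = vapply(names(cfactuals), function(nm) {
    mean(cfactuals[[nm]] != x_interest[[nm]])  # fraction of rows that differ
  }, numeric(1))
  if (subset_zero) changed = changed[changed > 0]
  changed
}

x_interest = data.frame(age = 30, job = "teacher")
cfs = data.frame(age = c(30, 35, 40), job = c("nurse", "teacher", "teacher"))
freq_of_changes(cfs, x_interest)  # age: 2/3, job: 1/3
```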
plot_surface()
Creates a surface plot for two features. x_interest is represented as a white dot, and all counterfactuals that differ from x_interest only in the two selected features are represented as black dots. The tick marks next to the axes show the marginal distribution of the observed data (predictor$data$X). The exact plot type depends on the selected feature types:
2 numeric features: surface plot
2 non-numeric features: heatmap
1 numeric and 1 non-numeric feature: line graph
Counterfactuals$plot_surface(feature_names, grid_size = 250L)
feature_names
(character(2))
The names of the features to plot.
grid_size
(integerish(1))
The grid size of the plot. It is ignored in the case of two non-numeric features. Default is 250L.
print()
Prints the Counterfactuals object.
Counterfactuals$print()
clone()
The objects of this class are cloneable with this method.
Counterfactuals$clone(deep = FALSE)
deep
Whether to make a deep clone.
Gower, J. C. (1971), "A general coefficient of similarity and some of its properties". Biometrics, 27, 623–637.
Computes the (absolute, pairwise) distance between the vector elements and an interval.
dist_to_interval(x, interval)
Arguments: x, interval.
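The described behavior can be sketched as a one-liner in base R. This is an illustrative reimplementation (not the package source), assuming elements inside the interval have distance 0 and elements outside have their absolute distance to the nearest boundary.

```r
# Illustrative sketch of dist_to_interval: absolute distance of each
# element of x to the interval [interval[1], interval[2]]; values
# inside the interval get distance 0.
dist_to_interval_sketch = function(x, interval) {
  pmax(interval[1] - x, x - interval[2], 0)
}

dist_to_interval_sketch(c(0.2, 0.7, 1.3), c(0.5, 1))  # 0.3 0.0 0.3
```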
This function serves as an evaluation wrapper for some distance function. It checks that the output of distance_function is a numeric matrix with nrow(x) rows and nrow(y) columns, as expected.
eval_distance(distance_function, x, y, data = NULL)
Arguments: distance_function, x, y, data.
Creates a ParamSet for the columns of dt. Depending on the class of a column, a different Domain is created:
double: p_dbl()
integer: p_int()
character: p_fct() (with unique values as levels)
factor: p_fct() (with factor levels as levels)
make_param_set(dt, lower = NULL, upper = NULL)
Arguments: dt, lower (numeric() | NULL), upper (numeric() | NULL).
Returns a ParamSet for the features of dt.
MOC (Dandl et al. 2020) solves a multi-objective optimization problem to find counterfactuals. The four objectives to minimize are:
dist_target: Distance to desired_prob (classification tasks) or desired_outcome (regression tasks).
dist_x_interest: Dissimilarity to x_interest, measured by Gower's dissimilarity measure (Gower 1971).
no_changed: Number of feature changes.
dist_train: (Weighted) sum of dissimilarities to the k nearest data points in predictor$data$X.
For optimization, it uses the NSGA-II algorithm (Deb et al. 2002) with mixed integer evolutionary strategies (Li et al. 2013) and some tailored adjustments for the counterfactual search (Dandl et al. 2020). Default values for the hyperparameters are based on Dandl et al. (2020).
Several population initialization strategies are available:
random: Feature values of new individuals are sampled from the feature value ranges in predictor$data$X. Some feature values are randomly reset to their initial value in x_interest.
sd: Like random, except that the sample ranges of numeric features are limited to one standard deviation from their initial value in x_interest.
icecurve: As in random, feature values are sampled from the feature value ranges in predictor$data$X. Then, however, features are reset with probabilities relative to their importance: the higher the importance of a feature, the higher the probability that its value differs from its value in x_interest. The feature importance is measured using ICE curves (Goldstein et al. 2015).
traindata: Contrary to the other strategies, feature values are drawn from (non-dominated) data points in predictor$data$X; if not enough non-dominated data points are available, the remaining individuals are initialized by random sampling. Subsequently, some feature values are randomly reset to their initial value in x_interest (as for random).
If use_conditional_mutator is set to TRUE, a conditional mutator samples feature values from the conditional distribution given the other feature values, with the help of transformation trees (Hothorn and Zeileis 2017).
For details, see Dandl et al. (2020).
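The sd strategy, for instance, can be sketched for a single numeric feature in base R. This is a hypothetical illustration of the documented idea (not the package implementation): candidate values are sampled within one standard deviation of the value in x_interest, clipped to the observed feature range.

```r
# Hypothetical sketch of the "sd" initialization strategy for one
# numeric feature: sample within +/- 1 sd of the value in x_interest,
# clipped to the observed range of the feature.
set.seed(1)
init_sd_feature = function(feature_values, x_interest_value, n) {
  s = sd(feature_values)
  lo = max(min(feature_values), x_interest_value - s)
  hi = min(max(feature_values), x_interest_value + s)
  runif(n, lo, hi)
}

vals = iris$Sepal.Length
candidates = init_sd_feature(vals, x_interest_value = 5.8, n = 5)
all(abs(candidates - 5.8) <= sd(vals))  # TRUE: all within one sd
```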
counterfactuals::CounterfactualMethod
-> counterfactuals::CounterfactualMethodClassif
-> MOCClassif
optimizer
(OptimInstanceBatchMultiCrit)
The object used for optimization.
new()
Create a new MOCClassif
object.
MOCClassif$new( predictor, epsilon = NULL, fixed_features = NULL, max_changed = NULL, mu = 20L, termination_crit = "gens", n_generations = 175L, p_rec = 0.71, p_rec_gen = 0.62, p_mut = 0.73, p_mut_gen = 0.5, p_mut_use_orig = 0.4, k = 1L, weights = NULL, lower = NULL, upper = NULL, init_strategy = "icecurve", use_conditional_mutator = FALSE, quiet = FALSE, distance_function = "gower" )
predictor
(Predictor)
The object (created with iml::Predictor$new()) holding the machine learning model and the data.
epsilon
(numeric(1) | NULL)
If not NULL, candidates whose prediction for the desired_class is farther away from the interval desired_prob than epsilon are penalized. NULL (default) means no penalization.
fixed_features
(character() | NULL)
Names of features that are not allowed to be changed. NULL (default) allows all features to be changed.
max_changed
(integerish(1) | NULL)
Maximum number of feature changes. NULL (default) allows any number of changes.
mu
(integerish(1))
The population size. Default is 20L.
termination_crit
(character(1) | NULL)
Termination criterion. Currently, two criteria are implemented: "gens" (default), which stops after n_generations generations, and "genstag", which stops after the hypervolume has not improved for n_generations generations (the total number of generations is limited to 500).
n_generations
(integerish(1))
The number of generations. Default is 175L.
p_rec
(numeric(1))
Probability with which an individual is selected for recombination. Default is 0.71.
p_rec_gen
(numeric(1))
Probability with which a feature/gene is selected for recombination. Default is 0.62.
p_mut
(numeric(1))
Probability with which an individual is selected for mutation. Default is 0.73.
p_mut_gen
(numeric(1))
Probability with which a feature/gene is selected for mutation. Default is 0.5.
p_mut_use_orig
(numeric(1))
Probability with which a feature/gene is reset to its original value in x_interest after mutation. Default is 0.4.
k
(integerish(1))
The number of data points to use for the fourth objective. Default is 1L.
weights
(numeric(1) | numeric(k) | NULL)
The weights used to compute the weighted sum of dissimilarities for the fourth objective. It is either a single value or a vector of length k. If it has length k, the i-th element specifies the weight of the i-th closest data point. The values should sum up to 1. NULL (default) means all data points are weighted equally.
lower
(numeric() | NULL)
Vector of minimum values for numeric features. If NULL (default), the minimum value of each numeric feature in predictor$data$X is used. If not NULL, it should be named with the corresponding feature names.
upper
(numeric() | NULL)
Vector of maximum values for numeric features. If NULL (default), the maximum value of each numeric feature in predictor$data$X is used. If not NULL, it should be named with the corresponding feature names.
init_strategy
(character(1))
The population initialization strategy. Can be icecurve (default), random, sd, or traindata. For more information, see the Details section.
use_conditional_mutator
(logical(1))
Should a conditional mutator be used? The conditional mutator generates plausible feature values based on the values of the other features. Default is FALSE.
quiet
(logical(1))
Should information about the optimization status be hidden? Default is FALSE.
distance_function
(function() | 'gower' | 'gower_c')
The distance function to be used in the second and fourth objective. Either the name of an already implemented distance function ('gower' or 'gower_c') or a function. If set to 'gower' (default), Gower's distance (Gower 1971) is used; if set to 'gower_c', a more efficient C-based version of Gower's distance is used. A function must have three arguments x, y, and data and should return a double matrix with nrow(x) rows and at most nrow(y) columns.
plot_statistics()
Plots the evolution of the mean and minimum objective values together with the dominated hypervolume over the generations. All values for a generation are computed based on all non-dominated individuals that emerged until that generation.
MOCClassif$plot_statistics(centered_obj = TRUE)
centered_obj
(logical(1)
)
Should the objective values be centered? If set to FALSE
, each objective value is visualized in a separate plot,
since they (usually) have different scales. If set to TRUE
(default), they are visualized in a single plot.
get_dominated_hv()
Calculates the dominated hypervolume of each generation.
MOCClassif$get_dominated_hv()
A data.table
with the dominated hypervolume of each generation.
plot_search()
Visualizes two selected objective values of all emerged individuals in a scatter plot.
MOCClassif$plot_search(objectives = c("dist_target", "dist_x_interest"))
objectives
(character(2))
The two objectives to be shown in the plot. Possible values are "dist_target", "dist_x_interest", "no_changed", and "dist_train".
clone()
The objects of this class are cloneable with this method.
MOCClassif$clone(deep = FALSE)
deep
Whether to make a deep clone.
Dandl, S., Molnar, C., Binder, M., and Bischl, B. (2020). "Multi-Objective Counterfactual Explanations". In: Parallel Problem Solving from Nature – PPSN XVI, edited by Thomas Bäck, Mike Preuss, André Deutz, Hao Wang, Carola Doerr, Michael Emmerich, and Heike Trautmann, 448–469, Cham, Springer International Publishing, doi:10.1007/978-3-030-58112-1_31.
Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. A. M. T. (2002). "A fast and elitist multiobjective genetic algorithm: NSGA-II". IEEE transactions on evolutionary computation, 6(2), 182-197.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015). "Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation". Journal of Computational and Graphical Statistics 24 (1): 44–65. doi:10.1080/10618600.2014.907095.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27, 623–637.
Hothorn, T., Zeileis, A. (2017), "Transformation Forests". Technical Report, arXiv 1701.02110.
Li, Rui, L., Emmerich, M. T. M., Eggermont, J. Bäck, T., Schütz, M., Dijkstra, J., Reiber, J. H. C. (2013). "Mixed Integer Evolution Strategies for Parameter Optimization." Evolutionary Computation 21 (1): 29–64. doi:10.1162/EVCO_a_00059.
if (require("randomForest")) {
  # Train a model
  rf = randomForest(Species ~ ., data = iris)
  # Create a predictor object
  predictor = iml::Predictor$new(rf, type = "prob")
  # Find counterfactuals for x_interest
  moc_classif = MOCClassif$new(predictor, n_generations = 15L, quiet = TRUE)
  cfactuals = moc_classif$find_counterfactuals(
    x_interest = iris[150L, ], desired_class = "versicolor", desired_prob = c(0.5, 1)
  )
  # Print the counterfactuals
  cfactuals$data
  # Plot evolution of hypervolume and mean and minimum objective values
  moc_classif$plot_statistics()
}
MOC (Dandl et al. 2020) solves a multi-objective optimization problem to find counterfactuals. The four objectives to minimize are:
dist_target: Distance to desired_prob (classification tasks) or desired_outcome (regression tasks).
dist_x_interest: Dissimilarity to x_interest, measured by Gower's dissimilarity measure (Gower 1971).
no_changed: Number of feature changes.
dist_train: (Weighted) sum of dissimilarities to the k nearest data points in predictor$data$X.
For optimization, it uses the NSGA-II algorithm (Deb et al. 2002) with mixed integer evolutionary strategies (Li et al. 2013) and some tailored adjustments for the counterfactual search (Dandl et al. 2020). Default values for the hyperparameters are based on Dandl et al. (2020).
Several population initialization strategies are available:
random: Feature values of new individuals are sampled from the feature value ranges in predictor$data$X. Some feature values are randomly reset to their initial value in x_interest.
sd: Like random, except that the sample ranges of numeric features are limited to one standard deviation from their initial value in x_interest.
icecurve: As in random, feature values are sampled from the feature value ranges in predictor$data$X. Then, however, features are reset with probabilities relative to their importance: the higher the importance of a feature, the higher the probability that its value differs from its value in x_interest. The feature importance is measured using ICE curves (Goldstein et al. 2015).
traindata: Contrary to the other strategies, feature values are drawn from (non-dominated) data points in predictor$data$X; if not enough non-dominated data points are available, the remaining individuals are initialized by random sampling. Subsequently, some feature values are randomly reset to their initial value in x_interest (as for random).
If use_conditional_mutator is set to TRUE, a conditional mutator samples feature values from the conditional distribution given the other feature values, with the help of transformation trees (Hothorn and Zeileis 2017).
For details, see Dandl et al. (2020).
counterfactuals::CounterfactualMethod
-> counterfactuals::CounterfactualMethodRegr
-> MOCRegr
optimizer
(OptimInstanceBatchMultiCrit)
The object used for optimization.
new()
Create a new MOCRegr
object.
MOCRegr$new( predictor, epsilon = NULL, fixed_features = NULL, max_changed = NULL, mu = 20L, termination_crit = "gens", n_generations = 175L, p_rec = 0.71, p_rec_gen = 0.62, p_mut = 0.73, p_mut_gen = 0.5, p_mut_use_orig = 0.4, k = 1L, weights = NULL, lower = NULL, upper = NULL, init_strategy = "icecurve", use_conditional_mutator = FALSE, quiet = FALSE, distance_function = "gower" )
predictor
(Predictor)
The object (created with iml::Predictor$new()) holding the machine learning model and the data.
epsilon
(numeric(1) | NULL)
If not NULL, candidates whose prediction is farther away from the interval desired_outcome than epsilon are penalized. NULL (default) means no penalization.
fixed_features
(character() | NULL)
Names of features that are not allowed to be changed. NULL (default) allows all features to be changed.
max_changed
(integerish(1) | NULL)
Maximum number of feature changes. NULL (default) allows any number of changes.
mu
(integerish(1))
The population size. Default is 20L.
termination_crit
(character(1) | NULL)
Termination criterion. Currently, two criteria are implemented: "gens" (default), which stops after n_generations generations, and "genstag", which stops after the hypervolume has not improved for n_generations generations (the total number of generations is limited to 500).
n_generations
(integerish(1))
The number of generations. Default is 175L.
p_rec
(numeric(1))
Probability with which an individual is selected for recombination. Default is 0.71.
p_rec_gen
(numeric(1))
Probability with which a feature/gene is selected for recombination. Default is 0.62.
p_mut
(numeric(1))
Probability with which an individual is selected for mutation. Default is 0.73.
p_mut_gen
(numeric(1))
Probability with which a feature/gene is selected for mutation. Default is 0.5.
p_mut_use_orig
(numeric(1))
Probability with which a feature/gene is reset to its original value in x_interest after mutation. Default is 0.4.
k
(integerish(1))
The number of data points to use for the fourth objective. Default is 1L.
weights
(numeric(1) | numeric(k) | NULL)
The weights used to compute the weighted sum of dissimilarities for the fourth objective. It is either a single value or a vector of length k. If it has length k, the i-th element specifies the weight of the i-th closest data point. The values should sum up to 1. NULL (default) means all data points are weighted equally.
lower
(numeric() | NULL)
Vector of minimum values for numeric features. If NULL (default), the minimum value of each numeric feature in predictor$data$X is used. If not NULL, it should be named with the corresponding feature names.
upper
(numeric() | NULL)
Vector of maximum values for numeric features. If NULL (default), the maximum value of each numeric feature in predictor$data$X is used. If not NULL, it should be named with the corresponding feature names.
init_strategy
(character(1))
The population initialization strategy. Can be icecurve (default), random, sd, or traindata. For more information, see the Details section.
use_conditional_mutator
(logical(1))
Should a conditional mutator be used? The conditional mutator generates plausible feature values based on the values of the other features. Default is FALSE.
quiet
(logical(1))
Should information about the optimization status be hidden? Default is FALSE.
distance_function
(function() | 'gower' | 'gower_c')
The distance function to be used in the second and fourth objective. Either the name of an already implemented distance function ('gower' or 'gower_c') or a function. If set to 'gower' (default), Gower's distance (Gower 1971) is used; if set to 'gower_c', a more efficient C-based version of Gower's distance is used. A function must have three arguments x, y, and data and should return a double matrix with nrow(x) rows and at most nrow(y) columns.
plot_statistics()
Plots the evolution of the mean and minimum objective values together with the dominated hypervolume over the generations. All values for a generation are computed based on all non-dominated individuals that emerged until that generation.
MOCRegr$plot_statistics(centered_obj = TRUE)
centered_obj
(logical(1)
)
Should the objective values be centered? If set to FALSE
, each objective value is visualized in a separate plot,
since they (usually) have different scales. If set to TRUE
(default), they are visualized in a single plot.
get_dominated_hv()
Calculates the dominated hypervolume of each generation.
MOCRegr$get_dominated_hv()
A data.table
with the dominated hypervolume of each generation.
plot_search()
Visualizes two selected objective values of all emerged individuals in a scatter plot.
MOCRegr$plot_search(objectives = c("dist_target", "dist_x_interest"))
objectives
(character(2)
)
The two objectives to be shown in the plot. Possible values are "dist_target", "dist_x_interest", "no_changed",
and "dist_train".
clone()
The objects of this class are cloneable with this method.
MOCRegr$clone(deep = FALSE)
deep
Whether to make a deep clone.
Dandl, S., Molnar, C., Binder, M., and Bischl, B. (2020). "Multi-Objective Counterfactual Explanations". In: Parallel Problem Solving from Nature – PPSN XVI, edited by Thomas Bäck, Mike Preuss, André Deutz, Hao Wang, Carola Doerr, Michael Emmerich, and Heike Trautmann, 448–469, Cham, Springer International Publishing, doi:10.1007/978-3-030-58112-1_31.
Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. A. M. T. (2002). "A fast and elitist multiobjective genetic algorithm: NSGA-II". IEEE transactions on evolutionary computation, 6(2), 182-197.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015). "Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation". Journal of Computational and Graphical Statistics 24 (1): 44–65. doi:10.1080/10618600.2014.907095.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27, 623–637.
Hothorn, T., Zeileis, A. (2017), "Transformation Forests". Technical Report, arXiv 1701.02110.
Li, R., Emmerich, M. T. M., Eggermont, J., Bäck, T., Schütz, M., Dijkstra, J., Reiber, J. H. C. (2013). "Mixed Integer Evolution Strategies for Parameter Optimization." Evolutionary Computation 21 (1): 29–64. doi:10.1162/EVCO_a_00059.
if (require("randomForest")) {
  # Train a model
  rf = randomForest(mpg ~ ., data = mtcars)
  # Create a predictor object
  predictor = iml::Predictor$new(rf)
  # Find counterfactuals for x_interest
  moc_regr = MOCRegr$new(predictor, n_generations = 15L, quiet = TRUE)
  cfactuals = moc_regr$find_counterfactuals(
    x_interest = mtcars[1L, ], desired_outcome = c(22, 26)
  )
  # Print the counterfactuals
  cfactuals$data
  # Plot evolution of hypervolume and mean and minimum objective values
  moc_regr$plot_statistics()
}
NICE (Brughmans and Martens 2021) searches for counterfactuals by iteratively replacing feature values
of x_interest
with the corresponding value of its most similar (optionally correctly classified) instance x_nn
.
NICE starts the counterfactual search for x_interest
by finding its most similar (optionally) correctly classified
neighbor x_nn
.
In the first iteration, NICE creates new instances by replacing a different feature value of x_interest
with the corresponding
value of x_nn
in each new instance. Thus, if x_nn
differs from x_interest
in d
features, d
new instances are created.
Then, the reward values for the created instances are computed with the chosen reward function.
Available reward functions are sparsity
, proximity
, and plausibility
.
In the second iteration, NICE creates d-1
new instances by replacing a different feature value of the highest
reward instance of the previous iteration with the corresponding value of x_nn, and so on.
If finish_early = TRUE
, the algorithm terminates when the predicted desired_class
probability for
the highest reward instance is in the interval desired_prob
; if finish_early = FALSE
, the
algorithm continues until x_nn
is recreated.
Once the algorithm has terminated, return_multiple determines which instances
are returned as counterfactuals: if return_multiple = FALSE
, then only the highest reward instance in the
last iteration is returned as counterfactual; if return_multiple = TRUE
, then all instances (of all iterations)
whose predicted desired_class
probability is in the interval desired_prob
are returned as counterfactuals.
If finish_early = FALSE
and return_multiple = FALSE
, then x_nn
is returned as single counterfactual.
This NICE implementation corresponds to the original version of Brughmans and Martens (2021) when
return_multiple = FALSE
, finish_early = TRUE
, and x_nn_correct = TRUE
.
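The iterative replacement scheme described above can be condensed into the following base R sketch. It is a simplified illustration, not the package implementation: `reward` and `in_target` are assumed placeholder functions, instances are plain named vectors, and only the finish_early-style stopping rule is shown.

```r
# Schematic NICE loop: move from x_interest toward x_nn one feature at a time,
# always keeping the highest-reward candidate of the current iteration.
nice_search_sketch = function(x_interest, x_nn, reward, in_target) {
  current = x_interest
  repeat {
    changed = which(current != x_nn)
    if (length(changed) == 0) return(current)  # x_nn fully recreated
    # one candidate per still-differing feature, taking the value from x_nn
    candidates = lapply(changed, function(j) {
      cand = current
      cand[j] = x_nn[j]
      cand
    })
    rewards = vapply(candidates, reward, numeric(1))
    current = candidates[[which.max(rewards)]]
    if (in_target(current)) return(current)  # early termination
  }
}
```

With d differing features, the first iteration evaluates d candidates, the second d-1, and so on, exactly as described above.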
counterfactuals::CounterfactualMethod
-> counterfactuals::CounterfactualMethodClassif
-> NICEClassif
x_nn
(logical(1)
)
The most similar (optionally) correctly classified instance of x_interest
.
archive
(list()
)
A list that stores the history of the algorithm run. For each algorithm iteration, it has one element containing
a data.table
, which stores all created instances of this iteration together with their
reward values and their predictions.
new()
Create a new NICEClassif object.
NICEClassif$new(
  predictor,
  optimization = "sparsity",
  x_nn_correct = TRUE,
  return_multiple = FALSE,
  finish_early = TRUE,
  distance_function = "gower"
)
predictor
(Predictor)
The object (created with iml::Predictor$new()
) holding the machine learning model and the data.
optimization
(character(1)
)
The reward function to optimize. Can be sparsity
(default), proximity
or plausibility
.
x_nn_correct
(logical(1)
)
Should only correctly classified data points in predictor$data$X
be considered for the most similar instance search?
Default is TRUE
.
return_multiple
(logical(1)
)
Should multiple counterfactuals be returned? If TRUE, the algorithm returns all created instances whose desired_class
prediction is in the interval desired_prob
. For more information, see the Details
section.
finish_early
(logical(1)
)
Should the algorithm terminate after an iteration in which the desired_class
prediction for the highest reward instance
is in the interval desired_prob
? If FALSE
, the algorithm continues until x_nn
is recreated.
distance_function
(function()
| 'gower'
| 'gower_c'
)
The distance function used to compute the distances between x_interest
and the training data points for finding x_nn
. If optimization
is set
to proximity
, the distance function is also used for calculating the
distance between candidates and x_interest
.
Either the name of an already implemented distance function
('gower' or 'gower_c') or a function is allowed as input.
If set to 'gower' (default), then Gower's distance (Gower 1971) is used;
if set to 'gower_c', a C-based more efficient version of Gower's distance is used.
A function must have three arguments x
, y
, and data
and should
return a double
matrix with nrow(x)
rows and maximum nrow(y)
columns.
clone()
The objects of this class are cloneable with this method.
NICEClassif$clone(deep = FALSE)
deep
Whether to make a deep clone.
Brughmans, D., & Martens, D. (2021). NICE: An Algorithm for Nearest Instance Counterfactual Explanations. arXiv:2104.07411v2.
Gower, J. C. (1971), "A general coefficient of similarity and some of its properties". Biometrics, 27, 623–637.
if (require("randomForest")) {
  # Train a model
  rf = randomForest(Species ~ ., data = iris)
  # Create a predictor object
  predictor = iml::Predictor$new(rf, type = "prob")
  # Find counterfactuals
  nice_classif = NICEClassif$new(predictor)
  cfactuals = nice_classif$find_counterfactuals(
    x_interest = iris[150L, ], desired_class = "versicolor", desired_prob = c(0.5, 1)
  )
  # Print the results
  cfactuals$data
  # Print archive
  nice_classif$archive
}
NICE (Brughmans and Martens 2021) searches for counterfactuals by iteratively replacing feature values
of x_interest
with the corresponding value of its most similar (optionally correctly predicted) instance x_nn
.
While the original method is only applicable to classification tasks (see NICEClassif), this implementation extends it to regression tasks.
NICE starts the counterfactual search for x_interest
by finding its most similar (optionally) correctly predicted
neighbor x_nn
with(in) the desired prediction (range). Correctly predicted means that the prediction of x_nn
is less
than a user-specified margin_correct
away from the true outcome of x_nn
.
This mimics the correct-classification requirement of NICEClassif in the regression setting.
If no x_nn
satisfies this constraint, a warning is returned that no counterfactual could be found.
In the first iteration, NICE creates new instances by replacing a different feature value of x_interest
with the corresponding
value of x_nn
in each new instance. Thus, if x_nn
differs from x_interest
in d
features, d
new instances are created.
Then, the reward values for the created instances are computed with the chosen reward function.
Available reward functions are sparsity
, proximity
, and plausibility
.
In the second iteration, NICE creates d-1
new instances by replacing a different feature value of the highest
reward instance of the previous iteration with the corresponding value of x_nn, and so on.
If finish_early = TRUE
, the algorithm terminates when the predicted outcome for
the highest reward instance is in the interval desired_outcome
; if finish_early = FALSE
, the
algorithm continues until x_nn
is recreated.
Once the algorithm has terminated, return_multiple determines which instances
are returned as counterfactuals: if return_multiple = FALSE
, then only the highest reward instance in the
last iteration is returned as counterfactual; if return_multiple = TRUE
, then all instances (of all iterations)
whose predicted outcome is in the interval desired_outcome
are returned as counterfactuals.
If finish_early = FALSE
and return_multiple = FALSE
, then x_nn
is returned as single counterfactual.
The function computes the dissimilarities using Gower's dissimilarity measure (Gower 1971).
counterfactuals::CounterfactualMethod
-> counterfactuals::CounterfactualMethodRegr
-> NICERegr
x_nn
(logical(1)
)
The most similar (optionally) correctly predicted instance of x_interest
.
archive
(list()
)
A list that stores the history of the algorithm run. For each algorithm iteration, it has one element containing
a data.table
, which stores all created instances of this iteration together with their
reward values and their predictions.
new()
Create a new NICERegr object.
NICERegr$new(
  predictor,
  optimization = "sparsity",
  x_nn_correct = TRUE,
  margin_correct = NULL,
  return_multiple = FALSE,
  finish_early = TRUE,
  distance_function = "gower"
)
predictor
(Predictor)
The object (created with iml::Predictor$new()
) holding the machine learning model and the data.
optimization
(character(1)
)
The reward function to optimize. Can be sparsity
(default), proximity
or plausibility
.
x_nn_correct
(logical(1)
)
Should only correctly predicted data points in predictor$data$X
be considered for the most similar instance search?
Default is TRUE
.
margin_correct
(numeric(1)
| NULL
)
The accepted margin for considering a prediction as "correct".
Ignored if x_nn_correct = FALSE
.
If NULL, the accepted margin is set to half the median absolute distance between the true and predicted outcomes in the data (predictor$data
).
return_multiple
(logical(1)
)
Should multiple counterfactuals be returned? If TRUE, the algorithm returns all created instances whose
prediction is in the interval desired_outcome
. For more information, see the Details
section.
finish_early
(logical(1)
)
Should the algorithm terminate after an iteration in which the prediction for the highest reward instance
is in the interval desired_outcome
? If FALSE
, the algorithm continues until x_nn
is recreated.
distance_function
(function()
| 'gower'
| 'gower_c'
)
The distance function used to compute the distances between x_interest
and the training data points for finding x_nn
. If optimization
is set
to proximity
, the distance function is also used for calculating the
distance between candidates and x_interest
.
Either the name of an already implemented distance function
('gower' or 'gower_c') or a function is allowed as input.
If set to 'gower' (default), then Gower's distance (Gower 1971) is used;
if set to 'gower_c', a C-based more efficient version of Gower's distance is used.
A function must have three arguments x
, y
, and data
and should
return a double
matrix with nrow(x)
rows and maximum nrow(y)
columns.
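The default margin described for margin_correct above, half the median absolute distance between true and predicted outcomes, can be reproduced with a short base R sketch. Here `truth` and `preds` are assumed stand-ins for the true and predicted outcomes in predictor$data, not package functions.

```r
# Default accepted margin for "correct" regression predictions:
# half the median absolute difference between true and predicted outcomes.
default_margin = function(truth, preds) {
  0.5 * median(abs(truth - preds))
}
```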
clone()
The objects of this class are cloneable with this method.
NICERegr$clone(deep = FALSE)
deep
Whether to make a deep clone.
Brughmans, D., & Martens, D. (2021). NICE: An Algorithm for Nearest Instance Counterfactual Explanations. arXiv:2104.07411v2.
Gower, J. C. (1971), "A general coefficient of similarity and some of its properties". Biometrics, 27, 623–637.
if (require("randomForest")) {
  set.seed(123456)
  # Train a model
  rf = randomForest(mpg ~ ., data = mtcars)
  # Create a predictor object
  predictor = iml::Predictor$new(rf)
  # Find counterfactuals
  nice_regr = NICERegr$new(predictor)
  cfactuals = nice_regr$find_counterfactuals(
    x_interest = mtcars[1L, ], desired_outcome = c(22, 26)
  )
  # Print the results
  cfactuals$data
  # Print archive
  nice_regr$archive
}
RandomSearch randomly samples a population of candidates and returns the candidates that are non-dominated with respect to the objectives
of MOC (Dandl et al. 2020) as counterfactuals. RandomSearch is equivalent to MOC with zero generations and the random
initialization strategy.
The four objectives of MOC (Dandl et al. 2020) are:
Distance to desired_prob
(classification tasks) or desired_outcome
(regression tasks).
Dissimilarity to x_interest
measured by Gower's dissimilarity measure (Gower 1971).
Number of feature changes.
(Weighted) sum of dissimilarities to the k
nearest data points in predictor$data$X
.
RandomSearch is typically used as a baseline in benchmark comparisons with MOC.
The total number of samples drawn is mu
* n_generations
. Using separate parameters mu
and n_generations
is only required to make certain statistics comparable with MOC (e.g. the evolution of the dominated hypervolume).
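The fourth objective listed above, the (weighted) sum of dissimilarities to the k nearest data points, can be sketched as follows. This is an illustrative assumption about the computation, with `dists` standing in for the (ascending) distances to the k nearest points in predictor$data$X.

```r
# Fourth MOC objective (sketch): weighted sum of dissimilarities to the
# k nearest training points. `weights` is NULL (equal weights, i.e. the mean)
# or a vector of length k summing to 1.
objective_plausibility = function(dists, weights = NULL) {
  if (is.null(weights)) weights = rep(1 / length(dists), length(dists))
  sum(weights * dists)
}
```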
counterfactuals::CounterfactualMethod
-> counterfactuals::CounterfactualMethodClassif
-> RandomSearchClassif
optimizer
(OptimInstanceBatchMultiCrit)
The object used for optimization.
new()
Create a new RandomSearchClassif
object.
RandomSearchClassif$new(
  predictor,
  fixed_features = NULL,
  max_changed = NULL,
  mu = 20L,
  n_generations = 175L,
  p_use_orig = 0.5,
  k = 1L,
  weights = NULL,
  lower = NULL,
  upper = NULL,
  distance_function = "gower"
)
predictor
(Predictor)
The object (created with iml::Predictor$new()
) holding the machine learning model and the data.
fixed_features
(character()
| NULL
)
Names of features that are not allowed to be changed. NULL
(default) allows all features to be changed.
max_changed
(integerish(1)
| NULL
)
Maximum number of feature changes. NULL
(default) allows any number of changes.
mu
(integerish(1)
)
The population size. Default is 20L
. The total number of random samples is set to mu * n_generations
.
See the Details
section for further details.
n_generations
(integerish(1)
)
The number of generations. Default is 175L
. The total number of random samples is set to mu * n_generations
.
See the Details
section for further details.
p_use_orig
(numeric(1)
)
Probability with which a feature/gene is reset to its original value in x_interest
after random sampling. Default is 0.5
.
k
(integerish(1)
)
The number of data points to use for the fourth objective. Default is 1L
.
weights
(numeric(1) | numeric(k)
| NULL
)
The weights used to compute the weighted sum of dissimilarities for the fourth objective. It is either a single value
or a vector of length k
. If it has length k
, the i-th element specifies the weight of the i-th closest data point.
The values should sum up to 1
. NULL
(default) means all data points are weighted equally.
lower
(numeric()
| NULL
)
Vector of minimum values for numeric features.
If NULL
(default), the element for each numeric feature in lower
is taken as its minimum value in predictor$data$X
.
If not NULL
, it should be named with the corresponding feature names.
upper
(numeric()
| NULL
)
Vector of maximum values for numeric features.
If NULL
(default), the element for each numeric feature in upper
is taken as its maximum value in predictor$data$X
.
If not NULL
, it should be named with the corresponding feature names.
distance_function
(function()
| 'gower'
| 'gower_c'
)
The distance function to be used in the second and fourth objective.
Either the name of an already implemented distance function
('gower' or 'gower_c') or a function.
If set to 'gower' (default), then Gower's distance (Gower 1971) is used;
if set to 'gower_c', a C-based more efficient version of Gower's distance is used.
A function must have three arguments x
, y
, and data
and should
return a double
matrix with nrow(x)
rows and maximum nrow(y)
columns.
plot_statistics()
Plots the evolution of the mean and minimum objective values together with the dominated hypervolume over
the generations. All values for a generation are computed based on all non-dominated individuals that emerged until
that generation. The randomly drawn samples are therefore split into n_generations
folds of size mu.
This function mimics MOC's plot_statistics()
method. See the Details
section for further information.
RandomSearchClassif$plot_statistics(centered_obj = TRUE)
centered_obj
(logical(1)
)
Should the objective values be centered? If set to FALSE
, each objective value is visualized in a separate plot,
since they (usually) have different scales. If set to TRUE
(default), they are visualized in a single plot.
get_dominated_hv()
Calculates the dominated hypervolume of each generation. The randomly drawn samples are therefore split
into n_generations
folds of size mu.
This function mimics MOC's get_dominated_hv()
method. See the Details
section for further information.
RandomSearchClassif$get_dominated_hv()
A data.table
with the dominated hypervolume of each generation.
plot_search()
Visualizes two selected objective values of all emerged individuals in a scatter plot.
The randomly drawn samples are therefore split into n_generations
folds of size mu.
This function mimics MOC's plot_search()
method. See the Details
section for further information.
RandomSearchClassif$plot_search(objectives = c("dist_target", "dist_x_interest"))
objectives
(character(2)
)
The two objectives to be shown in the plot. Possible values are "dist_target", "dist_x_interest", "no_changed",
and "dist_train".
clone()
The objects of this class are cloneable with this method.
RandomSearchClassif$clone(deep = FALSE)
deep
Whether to make a deep clone.
Dandl, S., Molnar, C., Binder, M., and Bischl, B. (2020). "Multi-Objective Counterfactual Explanations". In: Parallel Problem Solving from Nature – PPSN XVI, edited by Thomas Bäck, Mike Preuss, André Deutz, Hao Wang, Carola Doerr, Michael Emmerich, and Heike Trautmann, 448–469, Cham, Springer International Publishing, doi:10.1007/978-3-030-58112-1_31.
Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. A. M. T. (2002). "A fast and elitist multiobjective genetic algorithm: NSGA-II". IEEE transactions on evolutionary computation, 6(2), 182-197.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015). "Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation". Journal of Computational and Graphical Statistics 24 (1): 44–65. doi:10.1080/10618600.2014.907095.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27, 623–637.
Li, R., Emmerich, M. T. M., Eggermont, J., Bäck, T., Schütz, M., Dijkstra, J., Reiber, J. H. C. (2013). "Mixed Integer Evolution Strategies for Parameter Optimization." Evolutionary Computation 21 (1): 29–64. doi:10.1162/EVCO_a_00059.
if (require("randomForest")) {
  # Train a model
  rf = randomForest(Species ~ ., data = iris)
  # Create a predictor object
  predictor = iml::Predictor$new(rf, type = "prob")
  # Find counterfactuals for x_interest
  rs_classif = RandomSearchClassif$new(predictor, n_generations = 30L)
  cfactuals = rs_classif$find_counterfactuals(
    x_interest = iris[150L, ], desired_class = "versicolor", desired_prob = c(0.5, 1)
  )
  # Print the counterfactuals
  cfactuals$data
  # Plot evolution of hypervolume and mean and minimum objective values
  rs_classif$plot_statistics()
}
RandomSearch randomly samples a population of candidates and returns the candidates that are non-dominated with respect to the objectives
of MOC (Dandl et al. 2020) as counterfactuals. RandomSearch is equivalent to MOC with zero generations and the random
initialization strategy.
The four objectives of MOC (Dandl et al. 2020) are:
Distance to desired_prob
(classification tasks) or desired_outcome
(regression tasks).
Dissimilarity to x_interest
measured by Gower's dissimilarity measure (Gower 1971).
Number of feature changes.
(Weighted) sum of dissimilarities to the k
nearest data points in predictor$data$X
.
RandomSearch is typically used as a baseline in benchmark comparisons with MOC.
The total number of samples drawn is mu
* n_generations
. Using separate parameters mu
and n_generations
is only required to make certain statistics comparable with MOC (e.g. the evolution of the dominated hypervolume).
counterfactuals::CounterfactualMethod
-> counterfactuals::CounterfactualMethodRegr
-> RandomSearchRegr
optimizer
(OptimInstanceBatchMultiCrit)
The object used for optimization.
new()
Create a new RandomSearchRegr
object.
RandomSearchRegr$new(
  predictor,
  fixed_features = NULL,
  max_changed = NULL,
  mu = 20L,
  n_generations = 175L,
  p_use_orig = 0.5,
  k = 1L,
  weights = NULL,
  lower = NULL,
  upper = NULL,
  distance_function = "gower"
)
predictor
(Predictor)
The object (created with iml::Predictor$new()
) holding the machine learning model and the data.
fixed_features
(character()
| NULL
)
Names of features that are not allowed to be changed. NULL
(default) allows all features to be changed.
max_changed
(integerish(1)
| NULL
)
Maximum number of feature changes. NULL
(default) allows any number of changes.
mu
(integerish(1)
)
The population size. Default is 20L
. The total number of random samples is set to mu * n_generations
.
See the Details
section for further details.
n_generations
(integerish(1)
)
The number of generations. Default is 175L
. The total number of random samples is set to mu * n_generations
.
See the Details
section for further details.
p_use_orig
(numeric(1)
)
Probability with which a feature/gene is reset to its original value in x_interest
after random sampling. Default is 0.5
.
k
(integerish(1)
)
The number of data points to use for the fourth objective. Default is 1L
.
weights
(numeric(1) | numeric(k)
| NULL
)
The weights used to compute the weighted sum of dissimilarities for the fourth objective. It is either a single value
or a vector of length k
. If it has length k
, the i-th element specifies the weight of the i-th closest data point.
The values should sum up to 1
. NULL
(default) means all data points are weighted equally.
lower
(numeric()
| NULL
)
Vector of minimum values for numeric features.
If NULL
(default), the element for each numeric feature in lower
is taken as its minimum value in predictor$data$X
.
If not NULL
, it should be named with the corresponding feature names.
upper
(numeric()
| NULL
)
Vector of maximum values for numeric features.
If NULL
(default), the element for each numeric feature in upper
is taken as its maximum value in predictor$data$X
.
If not NULL
, it should be named with the corresponding feature names.
distance_function
(function()
| 'gower'
| 'gower_c'
)
The distance function to be used in the second and fourth objective.
Either the name of an already implemented distance function
('gower' or 'gower_c') or a function.
If set to 'gower' (default), then Gower's distance (Gower 1971) is used;
if set to 'gower_c', a C-based more efficient version of Gower's distance is used.
A function must have three arguments x
, y
, and data
and should
return a double
matrix with nrow(x)
rows and maximum nrow(y)
columns.
plot_statistics()
Plots the evolution of the mean and minimum objective values together with the dominated hypervolume over
the generations. All values for a generation are computed based on all non-dominated individuals that emerged until
that generation. The randomly drawn samples are therefore split into n_generations
folds of size mu.
This function mimics MOC's plot_statistics()
method. See the Details
section for further information.
RandomSearchRegr$plot_statistics(centered_obj = TRUE)
centered_obj
(logical(1)
)
Should the objective values be centered? If set to FALSE
, each objective value is visualized in a separate plot,
since they (usually) have different scales. If set to TRUE
(default), they are visualized in a single plot.
get_dominated_hv()
Calculates the dominated hypervolume of each generation. The randomly drawn samples are therefore split
into n_generations
folds of size mu.
This function mimics MOC's get_dominated_hv()
method. See the Details
section for further information.
RandomSearchRegr$get_dominated_hv()
A data.table
with the dominated hypervolume of each generation.
plot_search()
Visualizes two selected objective values of all emerged individuals in a scatter plot.
The randomly drawn samples are therefore split into n_generations
folds of size mu.
This function mimics MOC's plot_search()
method. See the Details
section for further information.
RandomSearchRegr$plot_search(objectives = c("dist_target", "dist_x_interest"))
objectives
(character(2)
)
The two objectives to be shown in the plot. Possible values are "dist_target", "dist_x_interest", "no_changed",
and "dist_train".
clone()
The objects of this class are cloneable with this method.
RandomSearchRegr$clone(deep = FALSE)
deep
Whether to make a deep clone.
Dandl, S., Molnar, C., Binder, M., and Bischl, B. (2020). "Multi-Objective Counterfactual Explanations". In: Parallel Problem Solving from Nature – PPSN XVI, edited by Thomas Bäck, Mike Preuss, André Deutz, Hao Wang, Carola Doerr, Michael Emmerich, and Heike Trautmann, 448–469, Cham, Springer International Publishing, doi:10.1007/978-3-030-58112-1_31.
Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. A. M. T. (2002). "A fast and elitist multiobjective genetic algorithm: NSGA-II". IEEE transactions on evolutionary computation, 6(2), 182-197.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015). "Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation". Journal of Computational and Graphical Statistics 24 (1): 44–65. doi:10.1080/10618600.2014.907095.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27, 623–637.
Li, R., Emmerich, M. T. M., Eggermont, J., Bäck, T., Schütz, M., Dijkstra, J., Reiber, J. H. C. (2013). "Mixed Integer Evolution Strategies for Parameter Optimization." Evolutionary Computation 21 (1): 29–64. doi:10.1162/EVCO_a_00059.
if (require("randomForest")) {
  # Train a model
  rf = randomForest(mpg ~ ., data = mtcars)
  # Create a predictor object
  predictor = iml::Predictor$new(rf)
  # Find counterfactuals for x_interest
  rs_regr = RandomSearchRegr$new(predictor, n_generations = 30L)
  cfactuals = rs_regr$find_counterfactuals(
    x_interest = mtcars[1L, ], desired_outcome = c(22, 26)
  )
  # Print the counterfactuals
  cfactuals$data
  # Plot evolution of hypervolume and mean and minimum objective values
  rs_regr$plot_statistics()
}
Returns the indices of the n smallest elements in a vector
smallest_n_indices(x, n = 1L)
x
(numeric())
The vector to search.
n
(integerish(1))
The number of smallest elements whose indices are returned. Default is 1L.
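A minimal base R sketch of such a helper (an assumed implementation; the package's internal tie handling may differ):

```r
# Indices of the n smallest elements of x, in ascending order of value.
smallest_n_indices = function(x, n = 1L) {
  order(x)[seq_len(n)]
}
```

For example, `smallest_n_indices(c(5, 1, 3), 2L)` returns the indices 2 and 3.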
WhatIf returns the n_counterfactuals
most similar observations to x_interest
from observations in predictor$data$X
whose prediction for the desired_class
is in the desired_prob
interval.
By default, the dissimilarities are computed using Gower's dissimilarity measure (Gower 1971).
Only observations whose feature values lie between the corresponding values in lower
and upper
are considered
counterfactual candidates.
counterfactuals::CounterfactualMethod
-> counterfactuals::CounterfactualMethodClassif
-> WhatIfClassif
new()
Create a new WhatIfClassif object.
WhatIfClassif$new(
  predictor,
  n_counterfactuals = 1L,
  lower = NULL,
  upper = NULL,
  distance_function = "gower"
)
predictor
(Predictor)
The object (created with iml::Predictor$new()
) holding the machine learning model and the data.
n_counterfactuals
(integerish(1)
)
The number of counterfactuals to return. Default is 1L
.
lower
(numeric()
| NULL
)
Vector of minimum values for numeric features.
If NULL
(default), the element for each numeric feature in lower
is taken as its minimum value in predictor$data$X
.
If not NULL
, it should be named with the corresponding feature names.
upper
(numeric()
| NULL
)
Vector of maximum values for numeric features.
If NULL
(default), the element for each numeric feature in upper
is taken as its maximum value in predictor$data$X
.
If not NULL
, it should be named with the corresponding feature names.
distance_function
(function()
| 'gower'
| 'gower_c'
)
The distance function used to compute the distances between x_interest
and the training data points for finding x_nn
.
Either the name of an already implemented distance function
('gower' or 'gower_c') or a function.
If set to 'gower' (default), then Gower's distance (Gower 1971) is used;
if set to 'gower_c', a C-based more efficient version of Gower's distance is used.
A function must have three arguments x
, y
, and data
and should
return a double
matrix with nrow(x)
rows and maximum nrow(y)
columns.
clone()
The objects of this class are cloneable with this method.
WhatIfClassif$clone(deep = FALSE)
deep
Whether to make a deep clone.
Gower, J. C. (1971), "A general coefficient of similarity and some of its properties". Biometrics, 27, 623–637.
Wexler, J., Pushkarna, M., Bolukbasi, T., Wattenberg, M., Viégas, F., & Wilson, J. (2019). The what-if tool: Interactive probing of machine learning models. IEEE transactions on visualization and computer graphics, 26(1), 56–65.
if (require("randomForest")) {
  # Train a model
  rf = randomForest(Species ~ ., data = iris)
  # Create a predictor object
  predictor = iml::Predictor$new(rf, type = "prob")
  # Find counterfactuals for x_interest
  wi_classif = WhatIfClassif$new(predictor, n_counterfactuals = 5L)
  cfactuals = wi_classif$find_counterfactuals(
    x_interest = iris[150L, ], desired_class = "versicolor", desired_prob = c(0.5, 1)
  )
  # Print the results
  cfactuals$data
}
WhatIf returns the n_counterfactuals most similar observations to x_interest from the observations in predictor$data$X whose prediction lies in the desired_outcome interval.
Only observations whose feature values lie between the corresponding values in lower and upper are considered counterfactual candidates.
counterfactuals::CounterfactualMethod -> counterfactuals::CounterfactualMethodRegr -> WhatIfRegr
new()
Create a new WhatIfRegr object.
WhatIfRegr$new(
  predictor,
  n_counterfactuals = 1L,
  lower = NULL,
  upper = NULL,
  distance_function = "gower"
)
predictor
(Predictor)
The object (created with iml::Predictor$new()) holding the machine learning model and the data.
n_counterfactuals
(integerish(1))
The number of counterfactuals to return. Default is 1L.
lower
(numeric() | NULL)
Vector of minimum values for numeric features.
If NULL (default), the element for each numeric feature in lower is taken as its minimum value in predictor$data$X.
If not NULL, it should be named with the corresponding feature names.
upper
(numeric() | NULL)
Vector of maximum values for numeric features.
If NULL (default), the element for each numeric feature in upper is taken as its maximum value in predictor$data$X.
If not NULL, it should be named with the corresponding feature names.
distance_function
(function() | 'gower' | 'gower_c')
The distance function used to compute the distances between x_interest and the training data points for finding x_nn.
Either the name of an already implemented distance function ('gower' or 'gower_c') or a function.
If set to 'gower' (default), Gower's distance (Gower 1971) is used;
if set to 'gower_c', a more efficient C-based implementation of Gower's distance is used.
A custom function must have three arguments x, y, and data, and should return a double matrix with nrow(x) rows and at most nrow(y) columns.
clone()
The objects of this class are cloneable with this method.
WhatIfRegr$clone(deep = FALSE)
deep
Whether to make a deep clone.
Gower, J. C. (1971), "A general coefficient of similarity and some of its properties". Biometrics, 27, 623–637.
Wexler, J., Pushkarna, M., Bolukbasi, T., Wattenberg, M., Viégas, F., & Wilson, J. (2019). The what-if tool: Interactive probing of machine learning models. IEEE transactions on visualization and computer graphics, 26(1), 56–65.
if (require("randomForest")) {
  set.seed(123456)
  # Train a model
  rf = randomForest(mpg ~ ., data = mtcars)
  # Create a predictor object
  predictor = iml::Predictor$new(rf)
  # Find counterfactuals for x_interest
  wi_regr = WhatIfRegr$new(predictor, n_counterfactuals = 5L)
  cfactuals = wi_regr$find_counterfactuals(
    x_interest = mtcars[1L, ], desired_outcome = c(22, 26)
  )
  # Print the results
  cfactuals
}