econml.cate_interpreter.SingleTreeCateInterpreter

class econml.cate_interpreter.SingleTreeCateInterpreter(*, include_model_uncertainty=False, uncertainty_level=0.05, uncertainty_only_on_leaves=True, splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0)[source]

Bases: econml.cate_interpreter._interpreters._SingleTreeInterpreter

An interpreter for the effect estimated by a CATE estimator

Parameters

include_model_uncertainty (bool, default False) – Whether to include confidence interval information when building a simplified model of the CATE model. If set to True, then the CATE estimator needs to support the const_marginal_ate_inference method.
uncertainty_level (double, default 0.05) – The uncertainty level for the confidence intervals to be constructed and used in the simplified model creation. If value=alpha then a multitask decision tree will be built such that all samples in a leaf have similar target prediction but also similar alpha confidence intervals.
uncertainty_only_on_leaves (bool, default True) – Whether uncertainty information should be displayed only on leaf nodes. If False, then interpretation can be slightly slower, especially for CATE models that have a computationally expensive inference method.
splitter (str, default “best”) – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.
max_depth (int, optional) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
min_samples_split (int, float, default 2) – The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
min_samples_leaf (int, float, default 1) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
min_weight_fraction_leaf (float, default 0.) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
max_features (int, float, {“auto”, “sqrt”, “log2”}, or None, default None) – The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If “auto”, then max_features=n_features.
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.
random_state (int, RandomState instance, or None, default None) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
max_leaf_nodes (int, optional) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default 0.) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:
```
N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)
```
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
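As a worked illustration of the formula above, the following minimal sketch evaluates the weighted impurity decrease for one candidate split and compares it with min_impurity_decrease. The function name and all sample counts and impurity values are hypothetical, chosen only for the arithmetic; this is not the library's internal implementation.

```python
def weighted_impurity_decrease(n, n_t, n_t_l, n_t_r,
                               impurity, left_impurity, right_impurity):
    """Evaluate the weighted impurity decrease formula for one candidate split.

    n      -- total (weighted) number of samples
    n_t    -- (weighted) number of samples at the current node
    n_t_l  -- (weighted) number of samples in the left child
    n_t_r  -- (weighted) number of samples in the right child
    """
    return n_t / n * (
        impurity
        - n_t_r / n_t * right_impurity
        - n_t_l / n_t * left_impurity
    )


# Hypothetical node holding 40 of 100 samples, split into 10 (left) and 30 (right).
decrease = weighted_impurity_decrease(
    n=100, n_t=40, n_t_l=10, n_t_r=30,
    impurity=0.5, left_impurity=0.2, right_impurity=0.4,
)

# 40/100 * (0.5 - 30/40 * 0.4 - 10/40 * 0.2) = 0.4 * 0.15, i.e. about 0.06
min_impurity_decrease = 0.05  # illustrative threshold
should_split = decrease >= min_impurity_decrease
print(round(decrease, 4), should_split)
```

With these numbers the decrease is roughly 0.06, so a threshold of 0.05 would allow the split while a threshold of 0.1 would block it.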

__init__(*, include_model_uncertainty=False, uncertainty_level=0.05, uncertainty_only_on_leaves=True, splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0)[source]
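The fractional forms of min_samples_split, min_samples_leaf and max_features described in the parameter list resolve to integer counts. The sketch below mirrors those documented rules in plain Python; the helper names are hypothetical, truncating the "sqrt" and "log2" results to integers is an assumption, and the library may handle edge cases differently.

```python
import math


def resolve_min_samples(value, n_samples):
    """Turn min_samples_split / min_samples_leaf into an absolute count."""
    if isinstance(value, int):
        return value                       # int: used as the minimum number
    return math.ceil(value * n_samples)    # float: fraction of n_samples


def resolve_max_features(value, n_features):
    """Turn max_features into the number of features tried per split."""
    if value is None or value == "auto":
        return n_features
    if value == "sqrt":
        return int(math.sqrt(n_features))  # assumption: truncate to int
    if value == "log2":
        return int(math.log2(n_features))  # assumption: truncate to int
    if isinstance(value, int):
        return value
    return int(value * n_features)         # float: fraction of n_features


split_count = resolve_min_samples(0.01, 250)      # ceil(2.5) -> 3
leaf_count = resolve_min_samples(5, 250)          # int passes through -> 5
half_features = resolve_max_features(0.5, 9)      # int(4.5) -> 4
sqrt_features = resolve_max_features("sqrt", 16)  # -> 4
print(split_count, leaf_count, half_features, sqrt_features)
```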

Methods

`__init__`(*[, include_model_uncertainty, ...])
`export_graphviz`([out_file, feature_names, ...])	Export a graphviz dot file representing the learned tree model
`interpret`(cate_estimator, X)	Interpret the heterogeneity of a CATE estimator when applied to a set of features
`plot`([ax, title, feature_names, ...])	Plot the learned tree model with matplotlib
`render`(out_file[, format, view, ...])	Render the tree to a file

Attributes

`node_dict_`
`tree_model_`

export_graphviz(out_file=None, feature_names=None, treatment_names=None, max_depth=None, filled=True, leaves_parallel=True, rotate=False, rounded=True, special_characters=False, precision=3)

Export a graphviz dot file representing the learned tree model

Parameters

out_file (file object or str, optional) – Handle or name of the output file. If None, the result is returned as a string.
feature_names (list of str, optional) – Names of each of the features.
treatment_names (list of str, optional) – Names of each of the treatments
max_depth (int, optional) – The maximum tree depth to plot
filled (bool, default True) – When set to True, paint nodes to indicate majority class for classification, extremity of values for regression, or purity of node for multi-output.
leaves_parallel (bool, default True) – When set to True, draw all leaf nodes at the bottom of the tree.
rotate (bool, default False) – When set to True, orient tree left to right rather than top-down.
rounded (bool, default True) – When set to True, draw node boxes with rounded corners and use Helvetica fonts instead of Times-Roman.
special_characters (bool, default False) – When set to False, ignore special characters for PostScript compatibility.
precision (int, default 3) – Number of digits of precision for floating point in the values of impurity, threshold and value attributes of each node.

interpret(cate_estimator, X)[source]

Interpret the heterogeneity of a CATE estimator when applied to a set of features

Parameters

cate_estimator (LinearCateEstimator) – The fitted estimator to interpret
X (array_like) – The features against which to interpret the estimator; must be compatible shape-wise with the features used to fit the estimator

Returns

self

Return type

object instance

plot(ax=None, title=None, feature_names=None, treatment_names=None, max_depth=None, filled=True, rounded=True, precision=3, fontsize=None)

Plot the learned tree model with matplotlib

Parameters

ax (matplotlib.axes.Axes, optional) – The axes on which to plot
title (str, optional) – A title for the final figure to be printed at the top of the page.
feature_names (list of str, optional) – Names of each of the features.
treatment_names (list of str, optional) – Names of each of the treatments
max_depth (int, optional) – The maximum tree depth to plot
filled (bool, default True) – When set to True, paint nodes to indicate majority class for classification, extremity of values for regression, or purity of node for multi-output.
rounded (bool, default True) – When set to True, draw node boxes with rounded corners and use Helvetica fonts instead of Times-Roman.
precision (int, default 3) – Number of digits of precision for floating point in the values of impurity, threshold and value attributes of each node.
fontsize (int, optional) – Font size for text

render(out_file, format='pdf', view=True, feature_names=None, treatment_names=None, max_depth=None, filled=True, leaves_parallel=True, rotate=False, rounded=True, special_characters=False, precision=3)

Render the tree to a file

Parameters

out_file (str) – File name to save to.
format (str, default ‘pdf’) – The file format to render to; must be supported by graphviz
view (bool, default True) – Whether to open the rendered result with the default application.
feature_names (list of str, optional) – Names of each of the features.
treatment_names (list of str, optional) – Names of each of the treatments
max_depth (int, optional) – The maximum tree depth to plot
filled (bool, default True) – When set to True, paint nodes to indicate majority class for classification, extremity of values for regression, or purity of node for multi-output.
leaves_parallel (bool, default True) – When set to True, draw all leaf nodes at the bottom of the tree.
rotate (bool, default False) – When set to True, orient tree left to right rather than top-down.
rounded (bool, default True) – When set to True, draw node boxes with rounded corners and use Helvetica fonts instead of Times-Roman.
special_characters (bool, default False) – When set to False, ignore special characters for PostScript compatibility.
precision (int, default 3) – Number of digits of precision for floating point in the values of impurity, threshold and value attributes of each node.