econml.policy.DRPolicyForest

class econml.policy.DRPolicyForest(*, model_regression='auto', model_propensity='auto', featurizer=None, min_propensity=1e-06, categories='auto', cv=2, mc_iters=None, mc_agg='mean', n_estimators=100, max_depth=None, min_samples_split=10, min_samples_leaf=5, min_weight_fraction_leaf=0.0, max_features='auto', min_impurity_decrease=0.0, max_samples=0.5, min_balancedness_tol=0.45, honest=True, n_jobs=-1, verbose=0, random_state=None)[source]

Bases: _BaseDRPolicyLearner

Policy learner that uses doubly-robust correction techniques to account for selection bias between treatment arms.

In this estimator, the policy is estimated by first constructing doubly robust estimates of the counterfactual outcomes

\[Y_{i, t}^{DR} = E[Y | X_i, W_i, T_i=t] + \frac{Y_i - E[Y | X_i, W_i, T_i=t]}{Pr[T_i=t | X_i, W_i]} \cdot 1\{T_i=t\}\]

Then optimizing the objective

\[V(\pi) = \sum_i \sum_t \pi_t(X_i) * (Y_{i, t} - Y_{i, 0})\]

with the constraint that only one of \(\pi_t(X_i)\) is 1 and the rest are 0, for each \(X_i\).

Thus if we estimate the nuisance functions \(h(X, W, T) = E[Y | X, W, T]\) and \(p_t(X, W)=Pr[T=t | X, W]\) in the first stage, we can estimate the final stage cate for each treatment t, by running a constructing a decision tree that maximizes the objective \(V(\pi)\)

The problem of estimating the nuisance function \(p\) is a simple multi-class classification problem of predicting the label \(T\) from \(X, W\). The DRLearner class takes as input the parameter model_propensity, which is an arbitrary scikit-learn classifier, that is internally used to solve this classification problem.

The second nuisance function \(h\) is a simple regression problem and the DRLearner class takes as input the parameter model_regressor, which is an arbitrary scikit-learn regressor that is internally used to solve this regression problem.

Parameters:

model_propensity (estimator, default 'auto') – Classifier for Pr[T=t | X, W]. Trained by regressing treatments on (features, controls) concatenated.
- If 'auto', the model will be the best-fitting of a set of linear and forest models
- Otherwise, see Model Selection for the range of supported options
model_regression (estimator, default 'auto') – Estimator for E[Y | X, W, T]. Trained by regressing Y on (features, controls, one-hot-encoded treatments) concatenated. The one-hot-encoding excludes the baseline treatment.
- If 'auto', the model will be the best-fitting of a set of linear and forest models
- Otherwise, see Model Selection for the range of supported options; if a single model is specified it should be a classifier if discrete_outcome is True and a regressor otherwise
featurizer (transformer, optional) – Must support fit_transform and transform. Used to create composite features in the final CATE regression. It is ignored if X is None. The final CATE will be trained on the outcome of featurizer.fit_transform(X). If featurizer=None, then CATE is trained on X.
min_propensity (float, default 1e-6) – The minimum propensity at which to clip propensity estimates to avoid dividing by zero.
categories (‘auto’ or list, default ‘auto’) – The categories to use when encoding discrete treatments (or ‘auto’ to use the unique sorted values). The first category will be treated as the control treatment.
cv (int, cross-validation generator or an iterable, default 2) – Determines the cross-validation splitting strategy. Possible inputs for cv are:
- None, to use the default 3-fold cross-validation,
- integer, to specify the number of folds.
- CV splitter
- An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, if the treatment is discrete StratifiedKFold is used, else, KFold is used (with a random shuffle in either case).

Unless an iterable is used, we call split(concat[W, X], T) to generate the splits. If all W, X are None, then we call split(ones((T.shape[0], 1)), T).
mc_iters (int, optional) – The number of times to rerun the first stage models to reduce the variance of the nuisances.
mc_agg ({‘mean’, ‘median’}, default ‘mean’) – How to aggregate the nuisance value for each sample across the mc_iters monte carlo iterations of cross-fitting.
n_estimators (int, default 100) – The total number of trees in the forest. The forest consists of a forest of sqrt(n_estimators) sub-forests, where each sub-forest contains sqrt(n_estimators) trees.
max_depth (int or None, optional) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split (int, float, default 10) – The minimum number of splitting samples required to split an internal node.
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
min_samples_leaf (int, float, default 5) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf splitting samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. After construction the tree is also pruned so that there are at least min_samples_leaf estimation samples on each leaf.
- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
min_weight_fraction_leaf (float, default 0.) – The minimum weighted fraction of the sum total of weights (of all splitting samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided. After construction the tree is pruned so that the fraction of the sum total weight of the estimation samples contained in each leaf node is at least min_weight_fraction_leaf
max_features (int, float, str, or None, default “auto”) – The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If “auto”, then max_features=n_features.
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.
min_impurity_decrease (float, default 0.) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:
```
N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)
```
where N is the total number of split samples, N_t is the number of split samples at the current node, N_t_L is the number of split samples in the left child, and N_t_R is the number of split samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
max_samples (int or float in (0, 1], default .5,) – The number of samples to use for each subsample that is used to train each tree:
- If int, then train each tree on max_samples samples, sampled without replacement from all the samples
- If float, then train each tree on ceil(max_samples * n_samples), sampled without replacement from all the samples.
min_balancedness_tol (float in [0, .5], default .45) – How imbalanced a split we can tolerate. This enforces that each split leaves at least (.5 - min_balancedness_tol) fraction of samples on each side of the split; or fraction of the total weight of samples, when sample_weight is not None. Default value, ensures that at least 5% of the parent node weight falls in each side of the split. Set it to 0.0 for no balancedness and to .5 for perfectly balanced splits. For the formal inference theory to be valid, this has to be any positive constant bounded away from zero.
honest (bool, default True) – Whether to use honest trees, i.e. half of the samples are used for creating the tree structure and the other half for the estimation at the leafs. If False, then all samples are used for both parts.
n_jobs (int or None, default -1) – The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend() context. -1 means using all processors. See Glossary for more details.
verbose (int, default 0) – Controls the verbosity when fitting and predicting.
random_state (int, RandomState instance, or None, default None) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

__init__(*, model_regression='auto', model_propensity='auto', featurizer=None, min_propensity=1e-06, categories='auto', cv=2, mc_iters=None, mc_agg='mean', n_estimators=100, max_depth=None, min_samples_split=10, min_samples_leaf=5, min_weight_fraction_leaf=0.0, max_features='auto', min_impurity_decrease=0.0, max_samples=0.5, min_balancedness_tol=0.45, honest=True, n_jobs=-1, verbose=0, random_state=None)[source]

Methods

`__init__`(*[, model_regression, ...])
`export_graphviz`(tree_id, *[, out_file, ...])	Export a graphviz dot file representing the learned tree model.
`feature_importances`([max_depth, ...])	Get the feature importances based on the amount of parameter heterogeneity they create.
`fit`(Y, T, *[, X, W, sample_weight, groups])	Estimate a policy model from data.
`plot`(tree_id, *[, feature_names, ...])	Export policy trees to matplotlib.
`policy_feature_names`(*[, feature_names])	Get the output feature names.
`policy_treatment_names`(*[, treatment_names])	Get the names of the treatments.
`predict`(X)	Get recommended treatment for each sample.
`predict_proba`(X)	Predict the probability of recommending each treatment.
`predict_value`(X)	Get effect values for each non-baseline treatment and for each sample.
`render`(tree_id, out_file, *[, format, view, ...])	Render the tree to a flie.

Attributes

`feature_importances_`
`policy_model_`	The trained final stage policy model.

export_graphviz(tree_id, *, out_file=None, feature_names=None, treatment_names=None, max_depth=None, filled=True, leaves_parallel=True, rotate=False, rounded=True, special_characters=False, precision=3)[source]

Export a graphviz dot file representing the learned tree model.

Parameters:

tree_id (int) – The id of the tree of the forest to plot
out_file (file object or str, optional) – Handle or name of the output file. If None, the result is returned as a string.
feature_names (list of str, optional) – Names of each of the features.
treatment_names (list of str, optional) – Names of each of the treatments, starting with a name for the baseline/control/None treatment (alphanumerically smallest in case of discrete treatment)
max_depth (int or None, optional) – The maximum tree depth to plot
filled (bool, default False) – When set to True, paint nodes to indicate majority class for classification, extremity of values for regression, or purity of node for multi-output.
leaves_parallel (bool, default True) – When set to True, draw all leaf nodes at the bottom of the tree.
rotate (bool, default False) – When set to True, orient tree left to right rather than top-down.
rounded (bool, default True) – When set to True, draw node boxes with rounded corners and use Helvetica fonts instead of Times-Roman.
special_characters (bool, default False) – When set to False, ignore special characters for PostScript compatibility.
precision (int, default 3) – Number of digits of precision for floating point in the values of impurity, threshold and value attributes of each node.

feature_importances(max_depth=4, depth_decay_exponent=2.0)

Get the feature importances based on the amount of parameter heterogeneity they create.

The higher, the more important the feature.

Parameters:

max_depth (int, default 4) – Splits of depth larger than max_depth are not used in this calculation
depth_decay_exponent (double, default 2.0) – The contribution of each split to the total score is re-weighted by 1 / (1 + `depth`)**2.0.

Returns:

feature_importances_ – Normalized total parameter heterogeneity inducing importance of each feature

Return type:

ndarray of shape (n_features,)

fit(Y, T, *, X=None, W=None, sample_weight=None, groups=None)

Estimate a policy model from data.

Parameters:

Y ((n,) vector of length n) – Outcomes for each sample
T ((n,) vector of length n) – Treatments for each sample
X ((n, d_x) matrix, optional) – Features for each sample
W ((n, d_w) matrix, optional) – Controls for each sample
sample_weight ((n,) vector, optional) – Weights for each samples
groups ((n,) vector, optional) – All rows corresponding to the same group will be kept together during splitting. If groups is not None, the cv argument passed to this class’s initializer must support a ‘groups’ argument to its split method.

Returns:

self

Return type:

object instance

plot(tree_id, *, feature_names=None, treatment_names=None, ax=None, title=None, max_depth=None, filled=True, rounded=True, precision=3, fontsize=None)[source]

Export policy trees to matplotlib.

Parameters:

tree_id (int) – The id of the tree of the forest to plot
ax (matplotlib.axes.Axes, optional) – The axes on which to plot
title (str, optional) – A title for the final figure to be printed at the top of the page.
feature_names (list of str, optional) – Names of each of the features.
treatment_names (list of str, optional) – Names of each of the treatments, starting with a name for the baseline/control treatment (alphanumerically smallest)
max_depth (int or None, optional) – The maximum tree depth to plot
filled (bool, default False) – When set to True, paint nodes to indicate majority class for classification, extremity of values for regression, or purity of node for multi-output.
rounded (bool, default True) – When set to True, draw node boxes with rounded corners and use Helvetica fonts instead of Times-Roman.
precision (int, default 3) – Number of digits of precision for floating point in the values of impurity, threshold and value attributes of each node.
fontsize (int, optional) – Font size for text

policy_feature_names(*, feature_names=None)

Get the output feature names.

Parameters:: feature_names (list of str of length X.shape[1] or None) – The names of the input features. If None and X is a dataframe, it defaults to the column names from the dataframe.
Returns:: out_feature_names – The names of the output features on which the policy model is fitted.
Return type:: list of str or None

policy_treatment_names(*, treatment_names=None)

Get the names of the treatments.

Parameters:: treatment_names (list of str of length n_categories) – The names of the treatments (including the baseling). If None then values are auto-generated based on input metadata.
Returns:: out_treatment_names – The names of the treatments including the baseline/control treatment.
Return type:: list of str

predict(X)

Get recommended treatment for each sample.

Parameters:: X (array_like of shape (n_samples, n_features)) – The training input samples.
Returns:: treatment – The index of the recommended treatment in the same order as in categories, or in lexicographic order if categories=’auto’. 0 corresponds to the baseline/control treatment. For ensemble policy models, recommended treatments are aggregated from each model in the ensemble and the treatment that receives the most votes is returned. Use predict_proba to get the fraction of models in the ensemble that recommend each treatment for each sample.
Return type:: array_like of shape (n_samples,)

predict_proba(X)

Predict the probability of recommending each treatment.

Parameters:: X (array_like of shape (n_samples, n_features)) – The input samples.
Returns:: treatment_proba – The probability of each treatment recommendation
Return type:: array_like of shape (n_samples, n_treatments)

predict_value(X)

Get effect values for each non-baseline treatment and for each sample.

Parameters:: X (array_like of shape (n_samples, n_features)) – The training input samples.
Returns:: values – The predicted average value for each sample and for each non-baseline treatment, as compared to the baseline treatment value and based on the feature neighborhoods defined by the trees.
Return type:: array_like of shape (n_samples, n_treatments - 1)

render(tree_id, out_file, *, format='pdf', view=True, feature_names=None, treatment_names=None, max_depth=None, filled=True, leaves_parallel=True, rotate=False, rounded=True, special_characters=False, precision=3)[source]

Render the tree to a flie.

Parameters:

tree_id (int) – The id of the tree of the forest to plot
out_file (file name to save to)
format (str, default ‘pdf’) – The file format to render to; must be supported by graphviz
view (bool, default True) – Whether to open the rendered result with the default application.
feature_names (list of str, optional) – Names of each of the features.
treatment_names (list of str, optional) – Names of each of the treatments, starting with a name for the baseline/control treatment (alphanumerically smallest in case of discrete treatment)
max_depth (int or None, optional) – The maximum tree depth to plot
filled (bool, default False) – When set to True, paint nodes to indicate majority class for classification, extremity of values for regression, or purity of node for multi-output.
leaves_parallel (bool, default True) – When set to True, draw all leaf nodes at the bottom of the tree.
rotate (bool, default False) – When set to True, orient tree left to right rather than top-down.
rounded (bool, default True) – When set to True, draw node boxes with rounded corners and use Helvetica fonts instead of Times-Roman.
special_characters (bool, default False) – When set to False, ignore special characters for PostScript compatibility.
precision (int, default 3) – Number of digits of precision for floating point in the values of impurity, threshold and value attributes of each node.

property policy_model_: The trained final stage policy model.