# econml.grf._base_grf.BaseGRF

class econml.grf._base_grf.BaseGRF(n_estimators=100, *, criterion='mse', max_depth=None, min_samples_split=10, min_samples_leaf=5, min_weight_fraction_leaf=0.0, min_var_fraction_leaf=None, min_var_leaf_on_val=False, max_features='auto', min_impurity_decrease=0.0, max_samples=0.45, min_balancedness_tol=0.45, honest=True, inference=True, fit_intercept=True, subforest_size=4, n_jobs=-1, random_state=None, verbose=0, warm_start=False)[source]

Bases: econml._ensemble._ensemble.BaseEnsemble

Base class for Generalized Random Forests for solving linear moment equations of the form:

E[J * theta(x) - A | X = x] = 0


where J is a (d, d) random matrix, A is a (d, 1) random vector, and theta(x) is a local parameter to be estimated, which might contain both relevant and nuisance parameters.
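As a rough illustration of the moment equation above: if the two conditional expectations are approximated by weighted sample averages, theta(x) is the solution of a small linear system. The sketch below is a NumPy toy under that assumption, not the library's implementation; the function name `solve_local_moment` and the arrays `J`, `A`, and `w` are hypothetical.

```python
import numpy as np

def solve_local_moment(J, A, w):
    """Solve (sum_i w_i J_i) theta = sum_i w_i A_i for theta."""
    J_bar = np.einsum("i,ijk->jk", w, J)  # weighted estimate of E[J | X=x], shape (d, d)
    A_bar = np.einsum("i,ij->j", w, A)    # weighted estimate of E[A | X=x], shape (d,)
    return np.linalg.solve(J_bar, A_bar)

# Hypothetical data: 3 samples, d = 2, equal kernel weights.
J = np.array([np.eye(2), 2 * np.eye(2), 3 * np.eye(2)])
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
w = np.array([1 / 3, 1 / 3, 1 / 3])

theta = solve_local_moment(J, A, w)
```

Here the averaged J is 2 times the identity and the averaged A is (2, 4), so the local parameter comes out as (1, 2).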

Warning: This class should not be used directly. Use derived classes instead.

__init__(n_estimators=100, *, criterion='mse', max_depth=None, min_samples_split=10, min_samples_leaf=5, min_weight_fraction_leaf=0.0, min_var_fraction_leaf=None, min_var_leaf_on_val=False, max_features='auto', min_impurity_decrease=0.0, max_samples=0.45, min_balancedness_tol=0.45, honest=True, inference=True, fit_intercept=True, subforest_size=4, n_jobs=-1, random_state=None, verbose=0, warm_start=False)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

• __init__([n_estimators, criterion, …]) – Initialize self.

• apply(X) – Apply trees in the forest to X, return leaf indices.

• decision_path(X) – Return the decision path in the forest.

• feature_importances([max_depth, …]) – The feature importances based on the amount of parameter heterogeneity they create.

• fit(X, T, y, *[, sample_weight]) – Build a forest of trees from the training set (X, T, y) and any other auxiliary variables.

• get_params([deep]) – Get parameters for this estimator.

• get_subsample_inds() – Re-generate the exact same sample indices as those at fit time, using the same pseudo-randomness.

• oob_predict(Xtrain) – Returns the relevant output predictions for each of the training data points, when only trees where that data point was not used are incorporated.

• predict(X[, interval, alpha]) – Return the prefix of relevant fitted local parameters for each x in X.

• predict_alpha_and_jac(X[, slice, parallel]) – Return the value of the conditional jacobian E[J | X=x] and the conditional alpha E[A | X=x] using the forest as kernel weights.

• predict_and_var(X) – Return the prefix of relevant fitted local parameters for each x in X and their covariance matrix.

• predict_full(X[, interval, alpha]) – Return the fitted local parameters for each x in X.

• predict_interval(X[, alpha]) – Return the confidence interval for the relevant fitted local parameters for each x in X.

• predict_moment_and_var(X, parameter[, …]) – Return the value of the conditional expected moment vector at each sample and for the given parameter estimate for each sample.

• predict_projection(X, projector) – Return the inner product of the prefix of relevant fitted local parameters for each x in X with a projector vector.

• predict_projection_and_var(X, projector) – Return the inner product of the prefix of relevant fitted local parameters for each x in X with a projector vector, as well as its variance.

• predict_projection_var(X, projector) – Return the variance of the inner product of the prefix of relevant fitted local parameters for each x in X with a projector vector.

• predict_tree_average(X) – Return the prefix of relevant fitted local parameters for each X.

• predict_tree_average_full(X) – Return the fitted local parameters for each X.

• predict_var(X) – Return the covariance matrix of the prefix of relevant fitted local parameters for each x in X.

• prediction_stderr(X) – Return the standard deviation of each coordinate of the prefix of relevant fitted local parameters for each x in X.

• set_params(**params) – Set the parameters of this estimator.

Attributes

feature_importances_

apply(X)[source]

Apply trees in the forest to X, return leaf indices.

Parameters

X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

X_leaves – For each datapoint x in X and for each tree in the forest, return the index of the leaf x ends up in.

Return type

ndarray of shape (n_samples, n_estimators)

decision_path(X)[source]

Return the decision path in the forest.

Parameters

X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

• indicator (sparse matrix of shape (n_samples, n_nodes)) – A node indicator matrix in which nonzero elements indicate that the sample goes through the corresponding node. The matrix is in CSR format.

• n_nodes_ptr (ndarray of shape (n_estimators + 1,)) – The columns from indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] give the indicator values for the i-th estimator.
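To make the role of n_nodes_ptr concrete, the sketch below slices out the columns belonging to one tree. It uses a small dense NumPy array as a stand-in for the CSR matrix, and writes the column slicing explicitly as `[:, start:stop]`; the node counts, the values, and the helper `tree_indicator` are made up for illustration.

```python
import numpy as np

# Hypothetical forest with 2 trees: tree 0 has 3 nodes, tree 1 has 5 nodes.
n_nodes_ptr = np.array([0, 3, 8])

# Dense stand-in for the indicator matrix: 2 samples x 8 nodes in total.
indicator = np.array([
    [1, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 1, 0, 0, 0],
])

def tree_indicator(indicator, n_nodes_ptr, i):
    """Return the columns of `indicator` that belong to the i-th tree."""
    return indicator[:, n_nodes_ptr[i]:n_nodes_ptr[i + 1]]

tree0 = tree_indicator(indicator, n_nodes_ptr, 0)  # nodes of tree 0
tree1 = tree_indicator(indicator, n_nodes_ptr, 1)  # nodes of tree 1
```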

feature_importances(max_depth=4, depth_decay_exponent=2.0)[source]

The feature importances based on the amount of parameter heterogeneity they create. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total heterogeneity that the feature creates. For each tree, every split on which the feature was chosen adds:

parent_weight * (left_weight * right_weight)
* mean((value_left[k] - value_right[k])**2) / parent_weight**2


to the importance of the feature. Each such quantity is also weighted by the depth of the split. These importances are normalized at the tree level and then averaged across trees.

Parameters
• max_depth (int, default=4) – Splits of depth larger than max_depth are not used in this calculation.

• depth_decay_exponent (double, default=2.0) – The contribution of each split to the total score is re-weighted by 1 / (1 + depth)**depth_decay_exponent, i.e. by 1 / (1 + depth)**2.0 with the default.

Returns

feature_importances_ – Normalized total parameter heterogeneity inducing importance of each feature

Return type

ndarray of shape (n_features,)
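The per-split quantity described for feature_importances can be written out as a short function. This is only a sketch of the stated formula with the depth re-weighting applied; the helper name `split_contribution` and the numeric inputs are hypothetical, and the tree-level normalization and averaging across trees are not shown.

```python
import numpy as np

def split_contribution(parent_weight, left_weight, right_weight,
                       value_left, value_right, depth,
                       depth_decay_exponent=2.0):
    """Heterogeneity added by one split, re-weighted by its depth."""
    diff = np.asarray(value_left) - np.asarray(value_right)
    heterogeneity = (parent_weight * (left_weight * right_weight)
                     * np.mean(diff ** 2) / parent_weight ** 2)
    return heterogeneity / (1 + depth) ** depth_decay_exponent

# Hypothetical split at depth 1 with two parameters per child node.
c = split_contribution(10.0, 4.0, 6.0, [1.0, 0.0], [0.0, 2.0], depth=1)
```

With these inputs the mean squared difference is 2.5, the unweighted term is 6.0, and the depth factor is 1/4, giving 1.5.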

fit(X, T, y, *, sample_weight=None, **kwargs)[source]

Build a forest of trees from the training set (X, T, y) and any other auxiliary variables.

Parameters
• X (array-like of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float64.

• T (array-like of shape (n_samples, n_treatments)) – The treatment vector for each sample.

• y (array-like of shape (n_samples,) or (n_samples, n_outcomes)) – The outcome values for each sample.

• sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.

• **kwargs (dictionary of array-like items of shape (n_samples, d_var)) – Auxiliary random variables that go into the moment function (e.g. instrument, censoring, etc.). Any of these variables will be passed on as is to the get_pointJ and get_alpha methods of the child classes.

Returns

self

Return type

object

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

get_subsample_inds()[source]

Re-generate the exact same sample indices as those at fit time, using the same pseudo-randomness.
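The general idea of re-generating indices from the same pseudo-randomness can be sketched with a seeded NumPy generator. This is not the library's internal procedure, which the text here does not describe; the function `subsample_indices`, its arguments, and the seed are all assumptions made for illustration.

```python
import numpy as np

def subsample_indices(n_samples, n_subsample, n_trees, seed):
    """Draw one subsample (without replacement) per tree from a seeded generator."""
    rng = np.random.RandomState(seed)
    return [rng.choice(n_samples, size=n_subsample, replace=False)
            for _ in range(n_trees)]

# Drawing twice with the same seed reproduces the same indices.
at_fit_time = subsample_indices(100, 45, 4, seed=123)
regenerated = subsample_indices(100, 45, 4, seed=123)
```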

oob_predict(Xtrain)[source]

Returns the relevant output predictions for each of the training data points, when only trees where that data point was not used are incorporated. This method is not available if the estimator was trained with warm_start=True.

Parameters

Xtrain ((n_training_samples, n_features) matrix) – Must be the same exact X matrix that was passed to the forest at fit time.

Returns

oob_preds – The out-of-bag predictions of the relevant output parameters for each of the training points

Return type

(n_training_samples, n_relevant_outputs) matrix
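The out-of-bag idea can be illustrated with a simplified NumPy sketch: each training point is predicted using only the trees that did not see it. The arrays `tree_preds` and `in_bag` are hypothetical, and the sketch averages per-tree predictions directly, which may differ from how the library combines trees.

```python
import numpy as np

# Hypothetical per-tree predictions: 4 trees x 3 training points.
tree_preds = np.array([
    [1.0, 2.0, 3.0],
    [1.2, 2.2, 3.2],
    [0.8, 1.8, 2.8],
    [1.0, 2.0, 3.0],
])

# in_bag[t, i] is True when training point i was used to build tree t.
in_bag = np.array([
    [True, False, True],
    [False, True, True],
    [True, False, False],
    [False, True, False],
])

# Average only over trees for which the point was out of bag.
oob_mask = ~in_bag
oob_preds = (tree_preds * oob_mask).sum(axis=0) / oob_mask.sum(axis=0)
```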

predict(X, interval=False, alpha=0.05)[source]

Return the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs].

Parameters
• X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

• interval (bool, default=False) – Whether to return a confidence interval too

• alpha (float in (0, 1), default=0.05) – The confidence level of the confidence interval. Returns a symmetric (alpha/2, 1-alpha/2) confidence interval.

Returns

• theta(X)[1, .., n_relevant_outputs] (array-like of shape (n_samples, n_relevant_outputs)) – The estimated relevant parameters for each row of X

• lb(x), ub(x) (array-like of shape (n_samples, n_relevant_outputs)) – The lower and upper end of the confidence interval for each parameter. Return value is omitted if interval=False.

predict_alpha_and_jac(X, slice=None, parallel=True)[source]

Return the value of the conditional jacobian E[J | X=x] and the conditional alpha E[A | X=x] using the forest as kernel weights, i.e.:

alpha(x) = (1/n_trees) sum_{trees} (1/ |leaf(x)|) sum_{val sample i in leaf(x)} w[i] A[i]
jac(x) = (1/n_trees) sum_{trees} (1/ |leaf(x)|) sum_{val sample i in leaf(x)} w[i] J[i]


where w[i] is the sample weight (1.0 if sample_weight is None).

Parameters
• X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

• slice (list of int or None, default=None) – If not None, then only the trees with index in slice will be used to calculate the mean and the variance.

• parallel (bool, default=True) – Whether the averaging should happen using parallelism or not. Parallelism adds some overhead but makes it faster with many trees.

Returns

• alpha (array-like of shape (n_samples, n_outputs)) – The estimated conditional A, alpha(x) for each sample x in X

• jac (array-like of shape (n_samples, n_outputs, n_outputs)) – The estimated conditional J, jac(x) for each sample x in X
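The two kernel-weighted averages returned by predict_alpha_and_jac can be spelled out for a single query point x. This is a direct transcription of the formulas into NumPy, not the library's code; the helper `forest_alpha_and_jac`, the `leaves` structure (one index array per tree, listing the val samples that share a leaf with x), and the data are hypothetical.

```python
import numpy as np

def forest_alpha_and_jac(leaves, A, J, w=None):
    """Average the per-leaf weighted means of A and J across trees."""
    if w is None:
        w = np.ones(len(A))  # sample weight 1.0 when none is given
    alpha = np.zeros(A.shape[1])
    jac = np.zeros(J.shape[1:])
    for idx in leaves:
        w_leaf = w[idx]
        alpha += (w_leaf[:, None] * A[idx]).sum(axis=0) / len(idx)
        jac += (w_leaf[:, None, None] * J[idx]).sum(axis=0) / len(idx)
    return alpha / len(leaves), jac / len(leaves)

# Hypothetical data: 3 val samples, 1 output, 2 trees.
A = np.array([[1.0], [3.0], [5.0]])
J = np.array([[[1.0]], [[2.0]], [[3.0]]])
leaves = [np.array([0, 1]), np.array([2])]

alpha, jac = forest_alpha_and_jac(leaves, A, J)
```

Tree 0 contributes the leaf means 2.0 and 1.5, tree 1 contributes 5.0 and 3.0, so alpha is 3.5 and jac is 2.25.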

predict_and_var(X)[source]

Return the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs] and their covariance matrix.

Parameters

X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

• theta(x)[1, .., n_relevant_outputs] (array-like of shape (n_samples, n_relevant_outputs)) – The estimated relevant parameters for each row of X

• var(theta(x)) (array-like of shape (n_samples, n_relevant_outputs, n_relevant_outputs)) – The covariance of theta(x)[1, .., n_relevant_outputs]

predict_full(X, interval=False, alpha=0.05)[source]

Return the fitted local parameters for each x in X, i.e. theta(x).

Parameters
• X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

• interval (bool, default=False) – Whether to return a confidence interval too

• alpha (float in (0, 1), default=0.05) – The confidence level of the confidence interval. Returns a symmetric (alpha/2, 1-alpha/2) confidence interval.

Returns

• theta(x) (array-like of shape (n_samples, n_outputs)) – The estimated parameters for each row x of X

• lb(x), ub(x) (array-like of shape (n_samples, n_outputs)) – The lower and upper end of the confidence interval for each parameter. Return value is omitted if interval=False.

predict_interval(X, alpha=0.05)[source]

Return the confidence interval for the relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs].

Parameters
• X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

• alpha (float in (0, 1), default=0.05) – The confidence level of the confidence interval. Returns a symmetric (alpha/2, 1-alpha/2) confidence interval.

Returns

lb(x), ub(x) – The lower and upper end of the confidence interval for each parameter.

Return type

array-like of shape (n_samples, n_relevant_outputs)
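A symmetric (alpha/2, 1-alpha/2) interval can be sketched from a point estimate and its standard error. The sketch assumes a normal approximation, which this page does not state explicitly, and uses only the standard library; the function `normal_interval` and the numbers are hypothetical.

```python
from statistics import NormalDist

def normal_interval(point, stderr, alpha=0.05):
    """Symmetric (alpha/2, 1 - alpha/2) interval under a normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return point - z * stderr, point + z * stderr

lb, ub = normal_interval(point=1.0, stderr=0.5, alpha=0.05)
```

A smaller alpha gives a larger quantile z and therefore a wider interval.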

predict_moment_and_var(X, parameter, slice=None, parallel=True)[source]

Return the value of the conditional expected moment vector at each sample and for the given parameter estimate for each sample:

M(x; theta(x)) := E[J | X=x] theta(x) - E[A | X=x]


where conditional expectations are estimated based on the forest weights, i.e.:

M_tree(x; theta(x)) := (1/ |leaf(x)|) sum_{val sample i in leaf(x)} w[i] (J[i] theta(x) - A[i])
M(x; theta(x)) = (1/n_trees) sum_{trees} M_tree(x; theta(x))


where w[i] is the sample weight (1.0 if sample_weight is None), as well as the variance of the local moment vector across trees:

Var(M_tree(x; theta(x))) = (1/n_trees) sum_{trees} M_tree(x; theta(x)) @ M_tree(x; theta(x)).T

Parameters
• X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

• parameter (array-like of shape (n_samples, n_outputs)) – An estimate of the parameter theta(x) for each sample x in X

• slice (list of int or None, default=None) – If not None, then only the trees with index in slice will be used to calculate the mean and the variance.

• parallel (bool, default=True) – Whether the averaging should happen using parallelism or not. Parallelism adds some overhead but makes it faster with many trees.

Returns

• moment (array-like of shape (n_samples, n_outputs)) – The estimated conditional moment M(x; theta(x)) for each sample x in X

• moment_var (array-like of shape (n_samples, n_outputs)) – The variance of the conditional moment Var(M_tree(x; theta(x))) across trees for each sample x
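The moment and its across-tree second moment can be written out for a single query point. The sketch follows the formulas for M_tree, M, and Var(M_tree) as stated, taking the per-tree leaf averages of J and A as given inputs; the function `moment_and_var` and the arrays are hypothetical, and the result is an outer-product matrix as in the formula.

```python
import numpy as np

def moment_and_var(tree_J, tree_A, theta):
    """tree_J: (n_trees, d, d) and tree_A: (n_trees, d) leaf averages at one x."""
    M_tree = tree_J @ theta - tree_A  # per-tree moment, shape (n_trees, d)
    moment = M_tree.mean(axis=0)      # M(x; theta(x))
    # (1/n_trees) sum over trees of M_tree @ M_tree.T
    moment_var = np.einsum("ti,tj->ij", M_tree, M_tree) / len(M_tree)
    return moment, moment_var

# Hypothetical leaf averages for 2 trees with d = 1.
tree_J = np.array([[[1.0]], [[3.0]]])
tree_A = np.array([[1.0], [7.0]])
theta = np.array([2.0])

moment, moment_var = moment_and_var(tree_J, tree_A, theta)
```

The per-tree moments here are 1 and -1, so the averaged moment is 0 and the second moment is 1.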

predict_projection(X, projector)[source]

Return the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projector vector projector(x), i.e.:

mu(x) := <theta(x)[1..n_relevant_outputs], projector(x)>

Parameters
• X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

• projector (array-like of shape (n_samples, n_relevant_outputs)) – The projector vector for each sample x in X

Returns

mu(x) – The estimated inner product of the relevant parameters with the projector for each row x of X

Return type

array-like of shape (n_samples, 1)

predict_projection_and_var(X, projector)[source]

Return the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projector vector projector(x), i.e.:

mu(x) := <theta(x)[1..n_relevant_outputs], projector(x)>


as well as the variance of mu(x).

Parameters
• X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

• projector (array-like of shape (n_samples, n_relevant_outputs)) – The projector vector for each sample x in X

Returns

• mu(x) (array-like of shape (n_samples, 1)) – The estimated inner product of the relevant parameters with the projector for each row x of X

• var(mu(x)) (array-like of shape (n_samples, 1)) – The variance of the estimated inner product
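The projection and its variance can be sketched in NumPy given point estimates and their covariance. The variance uses the standard quadratic form projector' Cov projector for a fixed projector; whether the library computes it exactly this way is an assumption. All arrays below are hypothetical.

```python
import numpy as np

# Hypothetical estimates: 2 samples, 2 relevant outputs.
theta = np.array([[1.0, 2.0], [0.5, -1.0]])
cov = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.5], [0.5, 1.0]],
])
projector = np.array([[1.0, 1.0], [2.0, 0.0]])

# mu(x) = <theta(x), projector(x)>, one inner product per row.
mu = np.sum(theta * projector, axis=1, keepdims=True)

# Var(mu(x)) = projector(x)' Cov(theta(x)) projector(x), one value per row.
var_mu = np.einsum("ni,nij,nj->n", projector, cov, projector)[:, None]
```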

predict_projection_var(X, projector)[source]

Return the variance of the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projector vector projector(x), i.e.:

Var(mu(x)) for mu(x) := <theta(x)[1..n_relevant_outputs], projector(x)>

Parameters
• X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

• projector (array-like of shape (n_samples, n_relevant_outputs)) – The projector vector for each sample x in X

Returns

var(mu(x)) – The variance of the estimated inner product

Return type

array-like of shape (n_samples, 1)

predict_tree_average(X)[source]

Return the prefix of relevant fitted local parameters for each X, i.e. theta(X)[1..n_relevant_outputs]. This method simply returns the average of the parameters estimated by each tree. predict should be preferred over predict_tree_average, as it performs a more stable averaging across trees.

Parameters

X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

theta(X)[1, .., n_relevant_outputs] – The estimated relevant parameters for each row of X

Return type

array-like of shape (n_samples, n_relevant_outputs)
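One way to see why averaging per-tree parameters can be less stable is to compare it with solving a single system built from averaged quantities. The sketch below assumes that predict solves the moment equation with J and A averaged across trees, which this page does not spell out; the per-tree values are hypothetical.

```python
import numpy as np

# Hypothetical per-tree leaf averages at one x (d = 1); tree 1 has a tiny J.
tree_J = np.array([[[1.0]], [[0.01]]])
tree_A = np.array([[1.0], [0.05]])

# Tree average: solve each tree's system, then average the solutions.
theta_tree_avg = np.mean(
    [np.linalg.solve(Jt, At) for Jt, At in zip(tree_J, tree_A)], axis=0
)

# Pooled solve: average J and A first, then solve once.
theta_pooled = np.linalg.solve(tree_J.mean(axis=0), tree_A.mean(axis=0))
```

The tree with a nearly singular J dominates the tree average (3.0), while the pooled solve stays close to the well-conditioned tree (about 1.04).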

predict_tree_average_full(X)[source]

Return the fitted local parameters for each X, i.e. theta(X). This method simply returns the average of the parameters estimated by each tree. predict_full should be preferred over predict_tree_average_full, as it performs a more stable averaging across trees.

Parameters

X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

theta(X) – The estimated parameters for each row of X

Return type

array-like of shape (n_samples, n_outputs)

predict_var(X)[source]

Return the covariance matrix of the prefix of relevant fitted local parameters for each x in X.

Parameters

X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

var(theta(x)) – The covariance of theta(x)[1, .., n_relevant_outputs]

Return type

array-like of shape (n_samples, n_relevant_outputs, n_relevant_outputs)

prediction_stderr(X)[source]

Return the standard deviation of each coordinate of the prefix of relevant fitted local parameters for each x in X.

Parameters

X (array-like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

std(theta(x)) – The standard deviation of each theta(x)[i] for i in {1, .., n_relevant_outputs}

Return type

array-like of shape (n_samples, n_relevant_outputs)
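The relation between the covariance matrices described for predict_var and the per-coordinate standard deviations can be sketched as the square root of each diagonal. The covariance values below are hypothetical, and the sketch does not call the library.

```python
import numpy as np

# Hypothetical covariance for 1 sample with 2 relevant outputs,
# shape (n_samples, n_relevant_outputs, n_relevant_outputs).
cov = np.array([[[4.0, 1.0], [1.0, 9.0]]])

# Standard deviation of each coordinate: square root of the diagonal entries.
stderr = np.sqrt(np.diagonal(cov, axis1=1, axis2=2))
```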

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance