econml.grf._base_grf.BaseGRF

class econml.grf._base_grf.BaseGRF(n_estimators=100, *, criterion='mse', max_depth=None, min_samples_split=10, min_samples_leaf=5, min_weight_fraction_leaf=0.0, min_var_fraction_leaf=None, min_var_leaf_on_val=False, max_features='auto', min_impurity_decrease=0.0, max_samples=0.45, min_balancedness_tol=0.45, honest=True, inference=True, fit_intercept=True, subforest_size=4, n_jobs=-1, random_state=None, verbose=0, warm_start=False)

Bases: econml._ensemble._ensemble.BaseEnsemble

Base class for Generalized Random Forests for solving linear moment equations of the form:

E[J * theta(x) - A | X = x] = 0

where J is a (d, d) random matrix, A is a (d, 1) random vector, and theta(x) is a local parameter to be estimated, which might contain both relevant and nuisance parameters.
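As a concrete instance of this moment equation, ordinary least squares of y on a treatment T with an intercept corresponds to J = Z Z^T and A = Z y with Z = [T, 1]. The sketch below is a minimal NumPy illustration on synthetic data, not the library's implementation; the variable names and the data-generating process are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
T = rng.normal(size=n)
y = 2.0 * T + 1.0 + 0.1 * rng.normal(size=n)   # synthetic data, true theta = (2, 1)

# Per-sample moment ingredients with d = 2: J[i] is (d, d), A[i] is (d,).
Z = np.column_stack([T, np.ones(n)])           # regressors [T, 1]
J = Z[:, :, None] * Z[:, None, :]              # J[i] = Z[i] Z[i]^T
A = Z * y[:, None]                             # A[i] = Z[i] * y[i]

# Solve E[J] theta - E[A] = 0, with sample averages in place of expectations.
theta = np.linalg.solve(J.mean(axis=0), A.mean(axis=0))
```

A generalized random forest solves the same kind of equation locally, replacing the plain sample averages with forest-derived weights around each x.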

Warning: This class should not be used directly. Use derived classes instead.

__init__(n_estimators=100, *, criterion='mse', max_depth=None, min_samples_split=10, min_samples_leaf=5, min_weight_fraction_leaf=0.0, min_var_fraction_leaf=None, min_var_leaf_on_val=False, max_features='auto', min_impurity_decrease=0.0, max_samples=0.45, min_balancedness_tol=0.45, honest=True, inference=True, fit_intercept=True, subforest_size=4, n_jobs=-1, random_state=None, verbose=0, warm_start=False)

Methods

`__init__`([n_estimators, criterion, ...])
`apply`(X)	Apply trees in the forest to X, return leaf indices.
`decision_path`(X)	Return the decision path in the forest.
`feature_importances`([max_depth, ...])	The feature importances based on the amount of parameter heterogeneity they create. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total heterogeneity that the feature creates.
`fit`(X, T, y, *[, sample_weight])	Build a forest of trees from the training set (X, T, y) and any other auxiliary variables.
`get_params`([deep])	Get parameters for this estimator.
`get_subsample_inds`()	Re-generate the exact same sample indices as those used at fit time, using the same pseudo-randomness.
`oob_predict`(Xtrain)	Returns the relevant output predictions for each of the training data points, when only trees where that data point was not used are incorporated.
`predict`(X[, interval, alpha])	Return the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs].
`predict_alpha_and_jac`(X[, slice, parallel])	Return the value of the conditional jacobian E[J | X=x] and the conditional alpha E[A | X=x] using the forest as kernel weights.
`predict_and_var`(X)	Return the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs] and their covariance matrix.
`predict_full`(X[, interval, alpha])	Return the fitted local parameters for each x in X, i.e. theta(x).
`predict_interval`(X[, alpha])	Return the confidence interval for the relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs].
`predict_moment_and_var`(X, parameter[, ...])	Return the value of the conditional expected moment vector at each sample and for the given parameter estimate for each sample.
`predict_projection`(X, projector)	Return the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projector vector projector(x).
`predict_projection_and_var`(X, projector)	Return the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projector vector projector(x), as well as its variance.
`predict_projection_var`(X, projector)	Return the variance of the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projector vector projector(x).
`predict_tree_average`(X)	Return the prefix of relevant fitted local parameters for each X, i.e. theta(X)[1..n_relevant_outputs].
`predict_tree_average_full`(X)	Return the fitted local parameters for each X, i.e. theta(X).
`predict_var`(X)	Return the covariance matrix of the prefix of relevant fitted local parameters for each x in X.
`prediction_stderr`(X)	Return the standard deviation of each coordinate of the prefix of relevant fitted local parameters for each x in X.
`set_params`(**params)	Set the parameters of this estimator.

Attributes

feature_importances_

apply(X)

Apply trees in the forest to X, return leaf indices.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: X_leaves – For each datapoint x in X and for each tree in the forest, return the index of the leaf x ends up in.
Return type: ndarray of shape (n_samples, n_estimators)

decision_path(X)

Return the decision path in the forest.

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

indicator (sparse matrix of shape (n_samples, n_nodes)) – A node indicator matrix in which nonzero elements indicate that the sample goes through the corresponding node. The matrix is in CSR format.
n_nodes_ptr (ndarray of shape (n_estimators + 1,)) – The columns from indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] give the indicator value for the i-th estimator.
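The column layout described by n_nodes_ptr can be illustrated with a small dense stand-in. The matrix, the pointer values, and the helper tree_columns below are made up for illustration; a fitted forest returns a sparse CSR matrix instead of nested lists.

```python
# Hypothetical forest with 2 trees: tree 0 has 3 nodes, tree 1 has 5 nodes.
n_nodes_ptr = [0, 3, 8]

# Dense stand-in for the indicator matrix: 2 samples x 8 nodes.
indicator = [
    [1, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 0, 1],
]

def tree_columns(indicator, n_nodes_ptr, i):
    """Return the indicator columns that belong to the i-th tree."""
    start, stop = n_nodes_ptr[i], n_nodes_ptr[i + 1]
    return [row[start:stop] for row in indicator]

tree0 = tree_columns(indicator, n_nodes_ptr, 0)   # 2 samples x 3 nodes
tree1 = tree_columns(indicator, n_nodes_ptr, 1)   # 2 samples x 5 nodes
```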

feature_importances(max_depth=4, depth_decay_exponent=2.0)

The feature importances, based on the amount of parameter heterogeneity each feature creates. The higher the value, the more important the feature. The importance of a feature is computed as the (normalized) total heterogeneity that the feature creates. Each split on which the feature was chosen, in each tree, adds:

parent_weight * (left_weight * right_weight)
    * mean((value_left[k] - value_right[k])**2) / parent_weight**2

to the importance of the feature. Each such quantity is also weighted by the depth of the split. These importances are normalized at the tree level and then averaged across trees.

Parameters

max_depth (int, default 4) – Splits of depth larger than max_depth are not used in this calculation.
depth_decay_exponent (double, default 2.0) – The contribution of each split to the total score is re-weighted by 1 / (1 + depth)**depth_decay_exponent, i.e. 1 / (1 + depth)**2.0 by default.

Returns

feature_importances_ – Normalized importance of each feature, measured by the total parameter heterogeneity it induces.

Return type

ndarray of shape (n_features,)
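The per-split quantity above can be written out as a short function. This is a sketch of the stated formula only, assuming the depth weight takes the form 1 / (1 + depth)**depth_decay_exponent; the function name and the numbers are hypothetical, and the per-tree normalization and averaging across trees are not shown.

```python
def split_contribution(parent_weight, left_weight, right_weight,
                       value_left, value_right, depth,
                       depth_decay_exponent=2.0):
    """Heterogeneity contributed by one split, down-weighted by its depth."""
    # mean over k of (value_left[k] - value_right[k])**2
    mean_sq_diff = sum((l - r) ** 2 for l, r in zip(value_left, value_right)) / len(value_left)
    raw = parent_weight * (left_weight * right_weight) * mean_sq_diff / parent_weight ** 2
    return raw / (1 + depth) ** depth_decay_exponent

# Hypothetical root split (depth 0): total weight 100 split 60/40.
root = split_contribution(100.0, 60.0, 40.0, [1.0, 0.0], [0.0, 0.0], depth=0)
# The same split one level deeper is scaled by 1 / (1 + 1)**2.0 = 1/4.
deeper = split_contribution(100.0, 60.0, 40.0, [1.0, 0.0], [0.0, 0.0], depth=1)
```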

fit(X, T, y, *, sample_weight=None, **kwargs)

Build a forest of trees from the training set (X, T, y) and any other auxiliary variables.

Parameters

X (array_like of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float64.
T (array_like of shape (n_samples, n_treatments)) – The treatment vector for each sample
y (array_like of shape (n_samples,) or (n_samples, n_outcomes)) – The outcome values for each sample.
sample_weight (array_like of shape (n_samples,), default None) – Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
**kwargs (dictionary of array_like items of shape (n_samples, d_var)) – Auxiliary random variables that go into the moment function (e.g. instrument, censoring, etc.). Any of these variables will be passed on as is to the get_pointJ and get_alpha methods of the child classes.

Returns

self

Return type

object

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

get_subsample_inds()[source]: Re-generate the example same sample indices as those at fit time using same pseudo-randomness.

oob_predict(Xtrain)

Return the relevant output predictions for each of the training data points, using only the trees in which that data point was not used. This method is not available if the estimator was trained with warm_start=True.

Parameters: Xtrain ((n_training_samples, n_features) matrix) – Must be the same exact X matrix that was passed to the forest at fit time.
Returns: oob_preds – The out-of-bag predictions of the relevant output parameters for each of the training points
Return type: (n_training_samples, n_relevant_outputs) matrix
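The out-of-bag idea can be sketched for scalar predictions as follows. This is a simplified illustration that averages hypothetical per-tree predictions; the function name and data are assumptions, and the forest itself works with its moment quantities rather than with a plain average.

```python
def oob_average(per_tree_preds, subsample_inds):
    """For each training point, average the trees that did not train on it."""
    n_points = len(per_tree_preds[0])
    out = []
    for i in range(n_points):
        vals = [preds[i]
                for preds, inds in zip(per_tree_preds, subsample_inds)
                if i not in inds]
        # A point used by every tree has no out-of-bag prediction.
        out.append(sum(vals) / len(vals) if vals else float("nan"))
    return out

# Hypothetical per-tree predictions for 3 training points and 3 trees.
per_tree_preds = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0]]
subsample_inds = [{0, 1}, {1, 2}, {0, 2}]   # indices each tree was trained on
oob = oob_average(per_tree_preds, subsample_inds)
```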

predict(X, interval=False, alpha=0.05)

Return the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs].

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
interval (bool, default False) – Whether to return a confidence interval too
alpha (float in (0, 1), default 0.05) – The significance level of the confidence interval. Returns a symmetric (alpha/2, 1-alpha/2) confidence interval, i.e. one with nominal coverage 1 - alpha.

Returns

theta(X)[1, .., n_relevant_outputs] (array_like of shape (n_samples, n_relevant_outputs)) – The estimated relevant parameters for each row of X
lb(x), ub(x) (array_like of shape (n_samples, n_relevant_outputs)) – The lower and upper end of the confidence interval for each parameter. Return value is omitted if interval=False.

predict_alpha_and_jac(X, slice=None, parallel=True)

Return the value of the conditional jacobian E[J | X=x] and the conditional alpha E[A | X=x] using the forest as kernel weights, i.e.:

alpha(x) = (1/n_trees) sum_{trees} (1/ |leaf(x)|) sum_{val sample i in leaf(x)} w[i] A[i]
jac(x) = (1/n_trees) sum_{trees} (1/ |leaf(x)|) sum_{val sample i in leaf(x)} w[i] J[i]

where w[i] is the sample weight (1.0 if sample_weight is None).

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
slice (list of int or None, default None) – If not None, then only the trees with index in slice will be used to calculate the mean and the variance.
parallel (bool, default True) – Whether the averaging should happen using parallelism or not. Parallelism adds some overhead but makes it faster with many trees.

Returns

alpha (array_like of shape (n_samples, n_outputs)) – The estimated conditional A, alpha(x) for each sample x in X
jac (array_like of shape (n_samples, n_outputs, n_outputs)) – The estimated conditional J, jac(x) for each sample x in X
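The two kernel-weighted averages above can be sketched in NumPy for a single query point x. The inputs are hypothetical: leaf_members stands for the validation samples that share a leaf with x in each tree, and the final solve of jac(x) theta = alpha(x) is shown only as one natural use of the two outputs, not as a statement about the library's internals.

```python
import numpy as np

def forest_alpha_and_jac(leaf_members, A, J, w):
    """Average the per-leaf weighted means of A and J across trees.

    leaf_members: one index array per tree (samples in leaf(x) for that tree).
    A: (n, d) array, J: (n, d, d) array, w: (n,) sample weights.
    """
    alphas, jacs = [], []
    for idx in leaf_members:
        alphas.append((w[idx, None] * A[idx]).sum(axis=0) / len(idx))
        jacs.append((w[idx, None, None] * J[idx]).sum(axis=0) / len(idx))
    return np.mean(alphas, axis=0), np.mean(jacs, axis=0)

# Hypothetical data: 4 validation samples, d = 2, unit weights, 2 trees.
A = np.arange(8.0).reshape(4, 2)
J = np.stack([(i + 1) * np.eye(2) for i in range(4)])
w = np.ones(4)
leaf_members = [np.array([0, 1]), np.array([1, 2, 3])]

alpha, jac = forest_alpha_and_jac(leaf_members, A, J, w)
theta = np.linalg.solve(jac, alpha)   # local parameter implied by the moment equation
```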

predict_and_var(X)

Return the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs] and their covariance matrix.

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

theta(x)[1, .., n_relevant_outputs] (array_like of shape (n_samples, n_relevant_outputs)) – The estimated relevant parameters for each row of X
var(theta(x)) (array_like of shape (n_samples, n_relevant_outputs, n_relevant_outputs)) – The covariance of theta(x)[1, .., n_relevant_outputs]

predict_full(X, interval=False, alpha=0.05)

Return the fitted local parameters for each x in X, i.e. theta(x).

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
interval (bool, default False) – Whether to return a confidence interval too
alpha (float in (0, 1), default 0.05) – The significance level of the confidence interval. Returns a symmetric (alpha/2, 1-alpha/2) confidence interval, i.e. one with nominal coverage 1 - alpha.

Returns

theta(x) (array_like of shape (n_samples, n_outputs)) – The estimated parameters (relevant and nuisance) for each row x of X
lb(x), ub(x) (array_like of shape (n_samples, n_outputs)) – The lower and upper end of the confidence interval for each parameter. Return value is omitted if interval=False.

predict_interval(X, alpha=0.05)

Return the confidence interval for the relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs].

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
alpha (float in (0, 1), default 0.05) – The significance level of the confidence interval. Returns a symmetric (alpha/2, 1-alpha/2) confidence interval, i.e. one with nominal coverage 1 - alpha.

Returns

lb(x), ub(x) – The lower and upper end of the confidence interval for each parameter.

Return type

array_like of shape (n_samples, n_relevant_outputs)
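The meaning of a symmetric (alpha/2, 1-alpha/2) interval can be sketched from a point estimate and its standard error. The sketch assumes a normal approximation, which is not stated on this page; the function name is hypothetical.

```python
from statistics import NormalDist

def normal_interval(point, stderr, alpha=0.05):
    """Symmetric (alpha/2, 1 - alpha/2) interval under a normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return point - z * stderr, point + z * stderr

lb, ub = normal_interval(point=1.0, stderr=0.5, alpha=0.05)
```

With alpha=0.05 the interval has nominal coverage of 95%; a smaller alpha gives a wider interval.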

predict_moment_and_var(X, parameter, slice=None, parallel=True)

Return the value of the conditional expected moment vector at each sample and for the given parameter estimate for each sample:

M(x; theta(x)) := E[J | X=x] theta(x) - E[A | X=x]

where conditional expectations are estimated based on the forest weights, i.e.:

M_tree(x; theta(x)) := (1/ |leaf(x)|) sum_{val sample i in leaf(x)} w[i] (J[i] theta(x) - A[i])
M(x; theta(x)) = (1/n_trees) sum_{trees} M_tree(x; theta(x))

where w[i] is the sample weight (1.0 if sample_weight is None), as well as the variance of the local moment vector across trees:

Var(M_tree(x; theta(x))) = (1/n_trees) sum_{trees} M_tree(x; theta(x)) @ M_tree(x; theta(x)).T

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
parameter (array_like of shape (n_samples, n_outputs)) – An estimate of the parameter theta(x) for each sample x in X
slice (list of int or None, default None) – If not None, then only the trees with index in slice will be used to calculate the mean and the variance.
parallel (bool, default True) – Whether the averaging should happen using parallelism or not. Parallelism adds some overhead but makes it faster with many trees.

Returns

moment (array_like of shape (n_samples, n_outputs)) – The estimated conditional moment M(x; theta(x)) for each sample x in X
moment_var (array_like of shape (n_samples, n_outputs)) – The variance of the conditional moment Var(M_tree(x; theta(x))) across trees for each sample x
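The formulas for M(x; theta(x)) and its across-tree second moment can be sketched for a single point x. The inputs J_trees and A_trees are hypothetical per-tree leaf averages of J and A; this follows the formulas as written above and is not the library's code.

```python
import numpy as np

def moment_and_var(J_trees, A_trees, theta):
    """J_trees: (n_trees, d, d), A_trees: (n_trees, d), theta: (d,)."""
    M_trees = J_trees @ theta - A_trees                  # (n_trees, d), one moment per tree
    moment = M_trees.mean(axis=0)                        # M(x; theta(x))
    # (1/n_trees) * sum over trees of M_tree @ M_tree.T
    second_moment = np.einsum("ti,tj->ij", M_trees, M_trees) / len(M_trees)
    return moment, second_moment

# Hypothetical values for 2 trees and d = 2.
J_trees = np.stack([np.eye(2), 2.0 * np.eye(2)])
A_trees = np.array([[1.0, 0.0], [0.0, 2.0]])
theta = np.array([1.0, 1.0])

moment, second_moment = moment_and_var(J_trees, A_trees, theta)
```

The diagonal of second_moment gives one value per output coordinate, which matches the (n_samples, n_outputs) shape listed for moment_var if only per-coordinate variances are reported; that reading is an assumption.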

predict_projection(X, projector)

Return the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projector vector projector(x), i.e.:

mu(x) := <theta(x)[1..n_relevant_outputs], projector(x)>

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
projector (array_like of shape (n_samples, n_relevant_outputs)) – The projector vector for each sample x in X

Returns

mu(x) – The estimated inner product of the relevant parameters with the projector for each row x of X

Return type

array_like of shape (n_samples, 1)

predict_projection_and_var(X, projector)

Return the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projector vector projector(x), i.e.:

mu(x) := <theta(x)[1..n_relevant_outputs], projector(x)>

as well as the variance of mu(x).

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
projector (array_like of shape (n_samples, n_relevant_outputs)) – The projector vector for each sample x in X

Returns

mu(x) (array_like of shape (n_samples, 1)) – The estimated inner product of the relevant parameters with the projector for each row x of X
var(mu(x)) (array_like of shape (n_samples, 1)) – The variance of the estimated inner product
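For a single sample x, the projection and its variance can be sketched as below. The sketch assumes that the projector is treated as fixed, so that the variance of the inner product is the usual quadratic form projector^T Cov(theta) projector; the function name and values are hypothetical.

```python
def projection_and_var(theta, cov, projector):
    """theta, projector: length-k lists; cov: k x k nested list (one sample x)."""
    k = len(theta)
    # mu(x) = <theta(x), projector(x)>
    mu = sum(t * p for t, p in zip(theta, projector))
    # Var(mu(x)) = projector^T Cov(theta(x)) projector
    var = sum(projector[i] * cov[i][j] * projector[j]
              for i in range(k) for j in range(k))
    return mu, var

mu, var = projection_and_var(
    theta=[1.0, 2.0],
    cov=[[4.0, 1.0], [1.0, 9.0]],
    projector=[1.0, -1.0],
)
```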

predict_projection_var(X, projector)

Return the variance of the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projector vector projector(x), i.e.:

Var(mu(x)) for mu(x) := <theta(x)[1..n_relevant_outputs], projector(x)>

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
projector (array_like of shape (n_samples, n_relevant_outputs)) – The projector vector for each sample x in X

Returns

var(mu(x)) – The variance of the estimated inner product

Return type

array_like of shape (n_samples, 1)

predict_tree_average(X)

Return the prefix of relevant fitted local parameters for each X, i.e. theta(X)[1..n_relevant_outputs]. This method simply returns the average of the parameters estimated by each tree. predict should be preferred over predict_tree_average, as it performs a more stable averaging across trees.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: theta(X)[1, .., n_relevant_outputs] – The estimated relevant parameters for each row of X
Return type: array_like of shape (n_samples, n_relevant_outputs)

predict_tree_average_full(X)

Return the fitted local parameters for each X, i.e. theta(X). This method simply returns the average of the parameters estimated by each tree. predict_full should be preferred over predict_tree_average_full, as it performs a more stable averaging across trees.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: theta(X) – The estimated parameters (relevant and nuisance) for each row of X
Return type: array_like of shape (n_samples, n_outputs)
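One way to see why averaging per-tree parameters can be less stable is a scalar example (d = 1) in which one tree has a nearly singular J. The sketch assumes that the preferred estimate solves the moment equation once on quantities averaged across trees, as suggested by predict_alpha_and_jac; the numbers are hypothetical.

```python
# Hypothetical per-tree scalar moments at one point x (d = 1).
J_trees = [1.0, 0.01, 1.0]   # the second tree has a nearly singular J
A_trees = [2.0, 0.05, 2.0]

# Tree averaging: solve within each tree, then average the solutions.
tree_average = sum(a / j for a, j in zip(A_trees, J_trees)) / len(J_trees)

# Pooled solve: average J and A across trees first, then solve once.
pooled = (sum(A_trees) / len(A_trees)) / (sum(J_trees) / len(J_trees))
```

The ill-conditioned tree contributes a per-tree solution of about 5 and pulls tree_average up to about 3, while pooled stays close to the value of 2 implied by the two well-conditioned trees.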

predict_var(X)

Return the covariance matrix of the prefix of relevant fitted local parameters for each x in X.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: var(theta(x)) – The covariance of theta(x)[1, .., n_relevant_outputs]
Return type: array_like of shape (n_samples, n_relevant_outputs, n_relevant_outputs)

prediction_stderr(X)

Return the standard deviation of each coordinate of the prefix of relevant fitted local parameters for each x in X.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: std(theta(x)) – The standard deviation of each theta(x)[i] for i in {1, .., n_relevant_outputs}
Return type: array_like of shape (n_samples, n_relevant_outputs)
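The relationship between the covariance matrix from predict_var and a per-coordinate standard deviation can be sketched for one sample x. The sketch assumes the standard deviation of each coordinate is the square root of the corresponding diagonal entry of the covariance; the function name and values are hypothetical.

```python
import math

def stderr_from_cov(cov):
    """Standard deviation of each coordinate: sqrt of the covariance diagonal."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]

# Hypothetical 2 x 2 covariance of theta(x) for one sample.
stderr = stderr_from_cov([[4.0, 1.0], [1.0, 9.0]])
```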

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance