Forest Based Estimators
What is it?
This section describes the estimation methods in the package that use a forest-based methodology to model treatment effect heterogeneity. We collect these methods in a single user guide to better illustrate their similarities and differences. Currently, our package offers three such estimation methods:
- The Orthogonal Random Forest Estimator (see DMLOrthoForest, DROrthoForest)
- The Forest Double Machine Learning Estimator, aka Causal Forest (see CausalForestDML)
- The Forest Doubly Robust Estimator (see ForestDRLearner)
These estimators, like those in the DML and DR sections, require the unconfoundedness assumption, i.e. that all potential variables that could have simultaneously affected the treatment and the outcome are observed.
There are many commonalities among these estimators. In particular, the DMLOrthoForest shares many similarities with the CausalForestDML, and the DROrthoForest shares many similarities with the ForestDRLearner. Specifically, the corresponding classes use the same estimating (moment) equations to identify the heterogeneous treatment effect. However, they differ substantially in how they estimate the first-stage regression/classification (nuisance) models. In particular, the OrthoForest methods fit local nuisance parameters around each target feature vector, whereas the forest DML and forest DR estimators fit a single global nuisance model via cross-fitting.
What are the relevant estimator classes?
This section describes the methodology implemented in the classes DMLOrthoForest, DROrthoForest, CausalForestDML, and ForestDRLearner. Click on each of these links for detailed module documentation and the input parameters of each class.
When should you use it?
These methods estimate very flexible non-linear models of the heterogeneous treatment effect. Moreover, they are data-adaptive and adapt to low-dimensional latent structure in the data generating process. Hence, they can perform well even with many features, even though they perform non-parametric estimation (which typically requires the number of features to be small compared to the number of samples). Finally, these methods use recent ideas from the literature to provide valid confidence intervals, despite being data-adaptive and non-parametric. Thus you should use these methods if you have many features, no strong prior on what the effect heterogeneity looks like, and you want confidence intervals.
Overview of Formal Methodology
Orthogonal Random Forests
Orthogonal Random Forests [Oprescu2019] are a combination of causal forests and double machine learning that allow for controlling for a high-dimensional set of confounders $W$, while at the same time estimating the treatment effect heterogeneity $\theta(X)$ non-parametrically over a lower-dimensional set of features $X$.
For continuous or discrete treatments (see DMLOrthoForest) the method estimates $\theta(x)$ in the partially linear model:

$$Y = \theta(X) \cdot T + g(X, W) + \epsilon, \qquad E[\epsilon \mid X, W] = 0$$
$$T = f(X, W) + \eta, \qquad E[\eta \mid X, W] = 0$$

but makes no further strong assumption on the functions $g$ and $f$, which can be non-parametric and depend on the high-dimensional confounders $W$.

Equivalently, if we let $q(X, W) = E[Y \mid X, W]$, then $\theta(x)$ is identified by the conditional moment equation:

$$E\big[\big(Y - q(X, W) - \theta(x) \cdot (T - f(X, W))\big) \cdot \big(T - f(X, W)\big) \,\big|\, X = x\big] = 0$$

This is a local version of the DML loss, since the above is equivalent to minimizing the residual-on-residual square loss locally at $X = x$:

$$\theta(x) = \arg\min_{\theta}\; E\big[\big(\tilde{Y} - \theta \cdot \tilde{T}\big)^2 \,\big|\, X = x\big]$$

with residual outcome $\tilde{Y} = Y - q(X, W)$ and residual treatment $\tilde{T} = T - f(X, W)$.

When taking this identification approach to estimation, we replace the local moment equations with a locally weighted empirical average and replace the functions $q$ and $f$ with first-stage estimates $\hat{q}$ and $\hat{f}$, i.e. we solve:

$$\sum_i K_x(X_i)\, \big(Y_i - \hat{q}(X_i, W_i) - \hat{\theta}(x) \cdot (T_i - \hat{f}(X_i, W_i))\big) \cdot \big(T_i - \hat{f}(X_i, W_i)\big) = 0$$

or equivalently minimize the local square loss (i.e. run a local linear regression):

$$\hat{\theta}(x) = \arg\min_{\theta}\; \sum_i K_x(X_i)\, \big(Y_i - \hat{q}(X_i, W_i) - \theta \cdot (T_i - \hat{f}(X_i, W_i))\big)^2$$
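To make this final estimation step concrete, here is a minimal numpy sketch of the locally weighted residual-on-residual regression for a single continuous treatment. The arrays Y_res, T_res and K are hypothetical stand-ins for the first-stage residuals and the forest kernel weights, which the package computes internally:

import numpy as np

def local_effect(Y_res, T_res, K):
    # Y_res = Y - q_hat(X, W) and T_res = T - f_hat(X, W): first-stage residuals
    # K: kernel weights K_x(X_i) induced by the forest at the target point x
    # Closed-form solution of argmin_theta sum_i K_i * (Y_res_i - theta * T_res_i)**2
    return np.sum(K * T_res * Y_res) / np.sum(K * T_res ** 2)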
In fact, in our package we also implement the local-linear correction proposed in [Friedberg2018]: instead of fitting a constant $\theta$ in the neighborhood of the target point, we fit a local linear function of the features, which reduces bias when the effect function is smooth.
The kernel $K_x(X_i)$ is induced by a forest: each tree contributes weight to the training samples that fall in the same leaf as the target point $x$, i.e.

$$K_x(X_i) = \frac{1}{B} \sum_{b=1}^{B} \frac{1\{X_i \in L_b(x)\}}{|L_b(x)|}$$

where $L_b(x)$ is the leaf of tree $b$ that contains $x$ and $B$ is the number of trees.
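As an illustration, these weights can be computed from leaf memberships. Below is a minimal sketch; leaf_ids and target_leaves are hypothetical arrays of per-tree leaf indices (the package computes the weights internally):

import numpy as np

def forest_kernel(leaf_ids, target_leaves):
    # leaf_ids: (n_trees, n_samples) leaf index of each training sample per tree
    # target_leaves: (n_trees,) leaf index of the target point x in each tree
    n_trees, n_samples = leaf_ids.shape
    K = np.zeros(n_samples)
    for b in range(n_trees):
        in_leaf = leaf_ids[b] == target_leaves[b]
        if in_leaf.any():
            K += in_leaf / in_leaf.sum()  # each tree spreads weight 1 over its leaf
    return K / n_trees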
Moreover, for every target point $x$, the nuisance functions are re-estimated locally using the same forest-based weights, so that the first-stage fits are tailored to the neighborhood of $x$.
In order to handle a high-dimensional set of confounders $W$, the local nuisance estimation is performed with penalized regression, e.g. a kernel-weighted Lasso in which each sample's squared-error term is multiplied by its kernel weight $K_x(X_i)$, so that sparsity in the nuisance functions can be exploited even when $W$ has many dimensions.
Algorithmically, the nuisance estimation part of the method is implemented in a flexible manner: it is not restricted to any particular model, since the WeightedModelWrapper can wrap any class that supports fit and predict and adds sample-weight functionality. Moreover, we provide some extensions to the scikit-learn library that natively support sample weights, such as the WeightedLasso.
>>> from econml.orf import DMLOrthoForest
>>> from econml.sklearn_extensions.linear_model import WeightedLasso
>>> est = DMLOrthoForest(model_Y=WeightedLasso(), model_T=WeightedLasso())
In the case of discrete treatments (see DROrthoForest) the method estimates, for every treatment level $t$, the CATE

$$\theta_t(x) = E[Y(t) - Y(0) \mid X = x]$$

which is identified by the doubly robust moment

$$\theta_t(x) = E\big[Y_t^{DR} - Y_0^{DR} \,\big|\, X = x\big]$$

where $Y_t^{DR}$ is the doubly robust pseudo-outcome

$$Y_t^{DR} = g(t, X, W) + \frac{Y - g(t, X, W)}{p_t(X, W)} \cdot 1\{T = t\}$$

with $g(t, X, W) = E[Y \mid T = t, X, W]$ the outcome regression and $p_t(X, W) = \Pr[T = t \mid X, W]$ the propensity of treatment $t$.
Equivalently, we can express this as minimizing a local square loss:

$$\theta_t(x) = \arg\min_{\theta}\; E\big[\big(Y_t^{DR} - Y_0^{DR} - \theta\big)^2 \,\big|\, X = x\big]$$

Similar to the continuous treatment case, we transfer this identification strategy to estimation by minimizing a locally weighted square loss, with a local linear correction:

$$\hat{\theta}_t(x) = \arg\min_{\theta}\; \sum_i K_x(X_i)\, \big(\hat{Y}_{i,t}^{DR} - \hat{Y}_{i,0}^{DR} - \theta\big)^2$$

where the pseudo-outcomes $\hat{Y}_{i,t}^{DR}$ are built from first-stage local estimates: a locally weighted regression model for $g$ and a locally weighted classifier for the propensities $p_t$, obtained via the classifier's predict_proba.
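For intuition, here is a minimal numpy sketch of the pseudo-outcome construction and the kernel-weighted final stage. The arrays g_hat, p_hat and K are hypothetical stand-ins for first-stage predictions and forest weights, which the estimator computes internally (with cross-fitting):

import numpy as np

def dr_pseudo_outcome(Y, T, g_hat, p_hat, t):
    # g_hat: (n, n_treatments) predictions of E[Y | T=k, X, W] for each level k
    # p_hat: (n, n_treatments) propensities Pr[T=k | X, W], e.g. from predict_proba
    indicator = (T == t).astype(float)
    return g_hat[:, t] + (Y - g_hat[:, t]) / p_hat[:, t] * indicator

def local_dr_effect(Y, T, g_hat, p_hat, K, t):
    # Kernel-weighted estimate of theta_t(x) = E[Y(t) - Y(0) | X = x]
    psi = (dr_pseudo_outcome(Y, T, g_hat, p_hat, t)
           - dr_pseudo_outcome(Y, T, g_hat, p_hat, 0))
    return np.sum(K * psi) / np.sum(K)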
For more details on the input parameters of the orthogonal forest classes and on how to customize the estimator, check out the two modules: DMLOrthoForest and DROrthoForest.
CausalForest (aka Forest Double Machine Learning)
In this package we implement the double machine learning version of Causal Forests / Generalized Random Forests (see [Wager2018], [Athey2019]), as described for instance in Section 6.1.1 of [Athey2019]. This version follows a similar structure to the DMLOrthoForest approach, in that the estimation is based on solving a local residual-on-residual moment condition:

$$\sum_i K_x(X_i)\, \big(Y_i - \hat{q}(X_i, W_i) - \hat{\theta}(x) \cdot (T_i - \hat{f}(X_i, W_i))\big) \cdot \big(T_i - \hat{f}(X_i, W_i)\big) = 0$$

The similarity metric $K_x(X_i)$ is again induced by a forest, whose splits are chosen in a data-adaptive manner so as to maximize the heterogeneity of the resulting effect estimates (see [Athey2019]). The Causal Forest has two main differences from the OrthoForest: first, the nuisance estimates $\hat{q}$ and $\hat{f}$ are fitted globally via cross-fitting, rather than re-fitted locally around each target point; second, the splitting criterion used when growing the forest differs (see the criterion discussion below).
Our implementation of the Causal Forest allows for any number of continuous treatments or a multi-valued discrete treatment. The forest itself is implemented in CausalForest, a high-performance Cython implementation that follows the scikit-learn predictor API.
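As a quick illustration, a typical invocation looks like the following minimal sketch on synthetic data (the data generating process and parameter choices are illustrative):

>>> import numpy as np
>>> from econml.dml import CausalForestDML
>>> np.random.seed(123)
>>> X = np.random.uniform(-1, 1, size=(500, 1))
>>> T = np.random.normal(size=(500,))
>>> Y = X[:, 0] * T + np.random.normal(size=(500,))
>>> est = CausalForestDML(n_estimators=100)
>>> _ = est.fit(Y, T, X=X)  # default (auto) first-stage nuisance models
>>> treatment_effects = est.effect(np.array([[0.5]]))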
Apart from the criterion proposed in [Athey2019], we also implement an MSE criterion that penalizes splits with low variance in the treatment. The difference can potentially lead to small finite-sample differences. In particular, suppose that we want to decide how to split a node into two subsets of samples $S_1$ and $S_2$, with corresponding local effect estimates $\hat{\theta}_1$ and $\hat{\theta}_2$. The criterion of [Athey2019] rewards splits whose child estimates are heterogeneous, while the MSE criterion additionally weights each child by the variance of the (residualized) treatment within it, thereby penalizing splits that leave little treatment variation in a child node.
For more details on Double Machine Learning and how the CausalForestDML
fits into our overall
set of DML based CATE estimators, check out the Double Machine Learning User Guide.
Forest Doubly Robust Learner
The Forest Doubly Robust Learner is a variant of the Generalized Random Forest and the Orthogonal Random Forest (see [Wager2018], [Athey2019], [Oprescu2019]) that uses the doubly robust moments for estimation, as opposed to the double machine learning moments (see the Doubly Robust Learning User Guide). The method applies only to categorical treatments.
Essentially, it is an analogue of the DROrthoForest that, instead of local nuisance estimation, conducts global nuisance estimation, and that does not couple the implicit similarity metric used for the nuisance estimates with the final-stage similarity metric.
More concretely, the method estimates the CATE associated with treatment $t$,

$$\theta_t(x) = E[Y(t) - Y(0) \mid X = x]$$

as a kernel-weighted average of doubly robust pseudo-outcomes:

$$\hat{\theta}_t(x) = \sum_i K_x(X_i)\, \big(\hat{Y}_{i,t}^{DR} - \hat{Y}_{i,0}^{DR}\big)$$

where:

$$\hat{Y}_{i,t}^{DR} = \hat{g}(t, X_i, W_i) + \frac{Y_i - \hat{g}(t, X_i, W_i)}{\hat{p}_t(X_i, W_i)} \cdot 1\{T_i = t\}$$

and $\hat{g}$ is a global estimate of the outcome regression $E[Y \mid T, X, W]$ and $\hat{p}_t$ a global estimate of the propensity $\Pr[T = t \mid X, W]$, both fitted in a cross-fitting manner (see _OrthoLearner for more details on cross fitting). The similarity metric $K_x(X_i)$ is induced by an honest, subsampled regression forest fitted on the pseudo-outcomes (see RegressionForest).
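As a quick illustration, a typical invocation looks like the following minimal sketch on synthetic data (the data generating process and first-stage model choices are illustrative):

>>> import numpy as np
>>> from econml.dr import ForestDRLearner
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import LogisticRegression
>>> np.random.seed(123)
>>> X = np.random.uniform(-1, 1, size=(500, 1))
>>> T = np.random.binomial(1, 0.5, size=(500,))
>>> Y = (X[:, 0] + 1) * T + np.random.normal(size=(500,))
>>> est = ForestDRLearner(model_regression=RandomForestRegressor(),
...                       model_propensity=LogisticRegression())
>>> _ = est.fit(Y, T, X=X)
>>> treatment_effects = est.effect(np.array([[0.0]]))  # effect of T=1 vs T=0 at x=0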
Class Hierarchy Structure
[Inheritance diagram of the forest-based CATE estimator classes]
Usage Examples
Here is a simple example of how to call DMLOrthoForest
and what the returned values correspond to in a simple data generating process.
For more examples, check out our OrthoForest Jupyter notebook and the ForestLearners Jupyter notebook.
>>> import numpy as np
>>> import sklearn
>>> from econml.orf import DMLOrthoForest, DROrthoForest
>>> np.random.seed(123)
>>> T = np.array([0, 1]*60)
>>> W = np.array([0, 1, 1, 0]*30).reshape(-1, 1)
>>> Y = (.2 * W[:, 0] + 1) * T + .5
>>> est = DMLOrthoForest(n_trees=1, max_depth=1, subsample_ratio=1,
...                      model_T=sklearn.linear_model.LinearRegression(),
...                      model_Y=sklearn.linear_model.LinearRegression())
>>> est.fit(Y, T, X=W, W=W)
<econml.orf._ortho_forest.DMLOrthoForest object at 0x...>
>>> print(est.effect(W[:2]))
[1.00... 1.19...]
Similarly, we can call DROrthoForest
:
>>> T = np.array([0, 1]*60)
>>> W = np.array([0, 1, 1, 0]*30).reshape(-1, 1)
>>> Y = (.2 * W[:, 0] + 1) * T + .5
>>> est = DROrthoForest(n_trees=1, max_depth=1, subsample_ratio=1,
... propensity_model=sklearn.linear_model.LogisticRegression(),
... model_Y=sklearn.linear_model.LinearRegression())
>>> est.fit(Y, T, X=W, W=W)
<econml.orf._ortho_forest.DROrthoForest object at 0x...>
>>> print(est.effect(W[:2]))
[0.99... 1.35...]
Let's now look at a more involved example with a high-dimensional set of confounders, where we need sparse first-stage estimators. Since the first stages are fitted with local sample weights, we use the WeightedLasso (a sample-weight-aware analogue of scikit-learn's Lasso) rather than, e.g., LassoCV, for both the treatment and the outcome regressions, in the case of continuous treatments.
>>> import numpy as np
>>> from econml.orf import DMLOrthoForest
>>> from econml.sklearn_extensions.linear_model import WeightedLasso
>>> import matplotlib.pyplot as plt
>>> np.random.seed(123)
>>> X = np.random.uniform(-1, 1, size=(4000, 1))
>>> W = np.random.normal(size=(4000, 50))
>>> support = np.random.choice(50, 4, replace=False)
>>> T = np.dot(W[:, support], np.random.normal(size=4)) + np.random.normal(size=4000)
>>> Y = np.exp(2*X[:, 0]) * T + np.dot(W[:, support], np.random.normal(size=4)) + .5
>>> est = DMLOrthoForest(n_trees=100,
... max_depth=5,
... model_Y=WeightedLasso(alpha=0.01),
... model_T=WeightedLasso(alpha=0.01))
>>> est.fit(Y, T, X=X, W=W)
<econml.orf._ortho_forest.DMLOrthoForest object at 0x...>
>>> X_test = np.linspace(-1, 1, 30).reshape(-1, 1)
>>> treatment_effects = est.effect(X_test)
>>> plt.plot(X_test[:, 0], treatment_effects, label='ORF estimate')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.plot(X_test[:, 0], np.exp(2*X_test[:, 0]), 'b--', label='True effect')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.legend()
<matplotlib.legend.Legend object at 0x...>
>>> plt.show(block=False)

Figure: Synthetic data estimation with high-dimensional controls