econml._ortho_learner
Orthogonal Machine Learning is a general approach to estimating causal models by formulating them as minimizers of some loss function that depends on auxiliary regression models that also need to be estimated from data. The class in this module implements the general logic in a very versatile way so that various child classes can simply instantiate the appropriate models and save a lot of code repetition.
References
 Dylan Foster, Vasilis Syrgkanis (2019). Orthogonal Statistical Learning.
ACM Conference on Learning Theory. https://arxiv.org/abs/1901.09036
 Xinkun Nie, Stefan Wager (2017). Quasi-Oracle Estimation of Heterogeneous Treatment Effects.
 Chernozhukov et al. (2017). Double/debiased machine learning for treatment and structural parameters.
The Econometrics Journal. https://arxiv.org/abs/1608.00060
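Functions
_crossfit(model, folds, *args, **kwargs) – General crossfit based calculation of nuisance parameters.
Classes
CachedValues – alias of econml._ortho_learner._CachedValues
_OrthoLearner – Base class for all orthogonal learners.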

econml._ortho_learner.CachedValues
alias of econml._ortho_learner._CachedValues

class econml._ortho_learner._OrthoLearner(*, discrete_treatment, discrete_instrument, categories, cv, random_state, mc_iters=None, mc_agg='mean')
Bases: econml._cate_estimator.TreatmentExpansionMixin, econml._cate_estimator.LinearCateEstimator
Base class for all orthogonal learners. This class is a parent class to any method that has the following architecture:
The CATE \(\theta(X)\) is the minimizer of some expected loss function
\[\mathbb{E}[\ell(V; \theta(X), h(V))]\]
where \(V\) are all the random variables and \(h\) is a vector of nuisance functions. Alternatively, the class would also work if \(\theta(X)\) is the solution to a set of moment equations that also depend on nuisance functions \(h\).
To estimate \(\theta(X)\) we first fit the \(h\) functions and calculate \(h(V_i)\) for each sample \(i\) in a cross-fit manner:
Let (F1_train, F1_test), …, (Fk_train, Fk_test) be any K-fold split of the data, where Ft_train and Ft_test are subsets of the indices of the input samples such that Ft_train is disjoint from Ft_test for every t. The sets F1_test, …, Fk_test form a possibly incomplete partition of the input indices: they are pairwise disjoint, and their union may be a strict subset of all input indices. For instance, in a time series split F1_train could be a prefix of the data and F1_test the suffix. Typically, these folds will be created by a KFold split, i.e. if S1, …, Sk is any partition of the data, then Ft_train is the set of all indices except St and Ft_test = St. If the union of the Ft_test sets is not all the data, then only the samples in that union will be used in the final stage.
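The fold structure described above can be sketched in plain Python. This is only an illustration: the helper names `kfold_indices`, `time_series_fold` and `covered_indices` are hypothetical and are not part of econml. It builds a K-fold partition and a single prefix/suffix split, then reports which indices would receive a nuisance value.

```python
def kfold_indices(n, k):
    """Return k (train, test) pairs; the test sets partition range(n)."""
    # Contiguous blocks S_1..S_k whose sizes differ by at most one.
    sizes = [n // k + (1 if t < n % k else 0) for t in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        folds.append((train, test))
        start += size
    return folds


def time_series_fold(n, n_train):
    """Single split: train on a prefix, test on the remaining suffix."""
    return [(list(range(n_train)), list(range(n_train, n)))]


def covered_indices(folds):
    """Union of all test sets, i.e. the samples used in the final stage."""
    return sorted({i for _, test in folds for i in test})


kfold = kfold_indices(10, 3)
prefix_suffix = time_series_fold(10, 6)

print(covered_indices(kfold))          # all ten indices
print(covered_indices(prefix_suffix))  # only the suffix 6..9
```

With the prefix/suffix split, only four of the ten samples have a nuisance value, so only those four would enter the final stage.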
Then for each t in [1, …, k]
Estimate a model \(\hat{h}_t\) for \(h\) using Ft_train
Evaluate the learned \(\hat{h}_t\) model on the data in Ft_test and use that value as the nuisance value/vector \(\hat{U}_i=\hat{h}_t(V_i)\) for the indices i in Ft_test
Estimate the model for \(\theta(X)\) by minimizing the empirical (regularized) plug-in loss on the subset of indices for which we have a nuisance value, i.e. the union of {F1_test, …, Fk_test}:
\[\mathbb{E}_n[\ell(V; \theta(X), \hat{h}(V))] = \frac{1}{n} \sum_{i=1}^n \ell(V_i; \theta(X_i), \hat{U}_i)\]
The method is a bit more general in that the final step does not need to be a loss minimization step. The class takes as input a model for fitting an estimate of the nuisance \(h\) given a set of samples and predicting the value of the learned nuisance model on any other set of samples. It also takes as input a model for the final estimation, which takes the data and their associated estimated nuisance values from the first stage and fits a model for the CATE \(\theta(X)\). Then at predict time, the final model, given any set of samples of the X variable, returns the estimated \(\theta(X)\).
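As a rough sketch of the two stages described above, the following pure-Python example cross-fits two linear nuisance regressions and then minimizes a squared plug-in loss. The data-generating process, the helper `fit_line` and the choice of a squared residual-on-residual loss are assumptions made for illustration; they are not taken from the library.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b * x with a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


# Simulated data (assumption): Y = theta * T + W + noise, T = 0.5 * W + noise.
rng = random.Random(0)
n, true_theta = 400, 2.0
W = [rng.gauss(0, 1) for _ in range(n)]
T = [0.5 * w + rng.gauss(0, 1) for w in W]
Y = [true_theta * t + w + rng.gauss(0, 0.5) for t, w in zip(T, W)]

# Two folds: even and odd indices take turns as the test set.
parts = [list(range(0, n, 2)), list(range(1, n, 2))]
folds = [(parts[1], parts[0]), (parts[0], parts[1])]

# First stage: fit the nuisance models on Ft_train, evaluate them on Ft_test.
Y_res, T_res = [None] * n, [None] * n
for train, test in folds:
    a_y, b_y = fit_line([W[i] for i in train], [Y[i] for i in train])
    a_t, b_t = fit_line([W[i] for i in train], [T[i] for i in train])
    for i in test:
        Y_res[i] = Y[i] - (a_y + b_y * W[i])
        T_res[i] = T[i] - (a_t + b_t * W[i])

# Final stage: minimize (1/n) * sum_i (Y_res_i - theta * T_res_i)^2,
# which has the closed-form solution below.
theta_hat = sum(t * y for t, y in zip(T_res, Y_res)) / sum(t * t for t in T_res)
print(round(theta_hat, 2))
```

Every sample's residuals come from a model that never saw that sample, which is the point of the cross-fitting step.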
The method essentially implements all the cross-fitting and plug-in logic, so that any child classes need to only implement the appropriate model_nuisance and model_final and essentially nothing more. It also implements the basic preprocessing logic behind the expansion of discrete treatments into one-hot encodings.
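The one-hot expansion of a discrete treatment can be pictured roughly as follows. This sketch assumes that the first (control) category gets no column of its own, which is what the shape of T in the discrete-treatment example on this page suggests; `one_hot_treatment` is a hypothetical helper, not an econml function.

```python
def one_hot_treatment(T, categories="auto"):
    """Encode discrete treatments, dropping the first (control) category."""
    # 'auto' means: use the sorted unique values found in T.
    cats = sorted(set(T)) if categories == "auto" else list(categories)
    non_control = cats[1:]
    encoded = [[1 if t == c else 0 for c in non_control] for t in T]
    return cats, encoded


cats, T_onehot = one_hot_treatment(["b", "a", "c", "a"])
print(cats)      # ['a', 'b', 'c']
print(T_onehot)  # [[1, 0], [0, 0], [0, 1], [0, 0]]
```

Rows of all zeros correspond to the control treatment `'a'`.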
 Parameters
discrete_treatment (bool) – Whether the treatment values should be treated as categorical, rather than continuous, quantities
discrete_instrument (bool) – Whether the instrument values should be treated as categorical, rather than continuous, quantities
categories (‘auto’ or list) – The categories to use when encoding discrete treatments (or ‘auto’ to use the unique sorted values). The first category will be treated as the control treatment.
cv (int, cross-validation generator or an iterable) – Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 3-fold cross-validation,
integer, to specify the number of folds.
An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, if the treatment is discrete StratifiedKFold is used, else KFold is used (with a random shuffle in either case).
Unless an iterable is used, we call split(concat[Z, W, X], T) to generate the splits. If all Z, W, X are None, then we call split(ones((T.shape[0], 1)), T).
random_state (int, RandomState instance or None) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
mc_iters (int, optional (default=None)) – The number of times to rerun the first stage models to reduce the variance of the nuisances.
mc_agg ({‘mean’, ‘median’}, optional (default=’mean’)) – How to aggregate the nuisance value for each sample across the mc_iters Monte Carlo iterations of cross-fitting.
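To make the role of mc_iters and mc_agg concrete, here is a small sketch of how per-sample nuisance values from repeated cross-fitting runs could be combined. The function `aggregate_nuisances` and the numbers are hypothetical; the library's internal implementation may differ.

```python
import statistics


def aggregate_nuisances(per_iteration, mc_agg="mean"):
    """Combine nuisance values from repeated cross-fitting runs.

    per_iteration[r][i] is the nuisance value of sample i in run r.
    """
    if mc_agg == "mean":
        agg = statistics.mean
    elif mc_agg == "median":
        agg = statistics.median
    else:
        raise ValueError("mc_agg must be 'mean' or 'median'")
    # zip(*...) regroups the values sample by sample.
    return [agg(values) for values in zip(*per_iteration)]


runs = [
    [1.0, 2.0, 3.0],  # nuisances from run 1
    [1.5, 2.0, 9.0],  # run 2
    [2.0, 2.0, 3.0],  # run 3
]
print(aggregate_nuisances(runs, "mean"))    # [1.5, 2.0, 5.0]
print(aggregate_nuisances(runs, "median"))  # [1.5, 2.0, 3.0]
```

The third sample shows why the median can be attractive: a single outlying run (9.0) moves the mean but not the median.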
Examples
The example code below implements a very simple version of the double machine learning method on top of the _OrthoLearner class, for expository purposes. For a more elaborate implementation of a Double Machine Learning child class of the class _OrthoLearner check out DML and its child classes:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from econml._ortho_learner import _OrthoLearner

    class ModelNuisance:
        def __init__(self, model_t, model_y):
            self._model_t = model_t
            self._model_y = model_y
        def fit(self, Y, T, W=None):
            self._model_t.fit(W, T)
            self._model_y.fit(W, Y)
            return self
        def predict(self, Y, T, W=None):
            # Nuisances are the outcome and treatment residuals
            return Y - self._model_y.predict(W), T - self._model_t.predict(W)

    class ModelFinal:
        def __init__(self):
            return
        def fit(self, Y, T, W=None, nuisances=None):
            Y_res, T_res = nuisances
            self.model = LinearRegression(fit_intercept=False).fit(T_res.reshape(-1, 1), Y_res)
            return self
        def predict(self, X=None):
            return self.model.coef_[0]
        def score(self, Y, T, W=None, nuisances=None):
            Y_res, T_res = nuisances
            return np.mean((Y_res - self.model.predict(T_res.reshape(-1, 1)))**2)

    class OrthoLearner(_OrthoLearner):
        def _gen_ortho_learner_model_nuisance(self):
            return ModelNuisance(LinearRegression(), LinearRegression())
        def _gen_ortho_learner_model_final(self):
            return ModelFinal()

    np.random.seed(123)
    X = np.random.normal(size=(100, 3))
    y = X[:, 0] + X[:, 1] + np.random.normal(0, 0.1, size=(100,))
    est = OrthoLearner(cv=2, discrete_treatment=False, discrete_instrument=False,
                       categories='auto', random_state=None)
    est.fit(y, X[:, 0], W=X[:, 1:])

    >>> est.score_
    0.00756830...
    >>> est.const_marginal_effect()
    1.02364992...
    >>> est.effect()
    array([1.023649...])
    >>> est.effect(T0=0, T1=10)
    array([10.236499...])
    >>> est.score(y, X[:, 0], W=X[:, 1:])
    0.00727995...
    >>> est.ortho_learner_model_final_.model
    LinearRegression(fit_intercept=False)
    >>> est.ortho_learner_model_final_.model.coef_
    array([1.023649...])
The following example shows how to do double machine learning with discrete treatments, using the _OrthoLearner class:

    import numpy as np
    import scipy.special
    from sklearn.linear_model import LinearRegression, LogisticRegression
    from econml._ortho_learner import _OrthoLearner

    class ModelNuisance:
        def __init__(self, model_t, model_y):
            self._model_t = model_t
            self._model_y = model_y
        def fit(self, Y, T, W=None):
            # T arrives one-hot encoded; map it back to integer class labels
            self._model_t.fit(W, np.matmul(T, np.arange(1, T.shape[1]+1)))
            self._model_y.fit(W, Y)
            return self
        def predict(self, Y, T, W=None):
            return Y - self._model_y.predict(W), T - self._model_t.predict_proba(W)[:, 1:]

    class ModelFinal:
        def __init__(self):
            return
        def fit(self, Y, T, W=None, nuisances=None):
            Y_res, T_res = nuisances
            self.model = LinearRegression(fit_intercept=False).fit(T_res.reshape(-1, 1), Y_res)
            return self
        def predict(self):
            # theta needs to be of dimension (1, d_t) if T is (n, d_t)
            return np.array([[self.model.coef_[0]]])
        def score(self, Y, T, W=None, nuisances=None):
            Y_res, T_res = nuisances
            return np.mean((Y_res - self.model.predict(T_res.reshape(-1, 1)))**2)

    class OrthoLearner(_OrthoLearner):
        def _gen_ortho_learner_model_nuisance(self):
            return ModelNuisance(LogisticRegression(solver='lbfgs'), LinearRegression())
        def _gen_ortho_learner_model_final(self):
            return ModelFinal()

    np.random.seed(123)
    W = np.random.normal(size=(100, 3))
    T = np.random.binomial(1, scipy.special.expit(W[:, 0]))
    y = T + W[:, 0] + np.random.normal(0, 0.01, size=(100,))
    est = OrthoLearner(cv=2, discrete_treatment=True, discrete_instrument=False,
                       categories='auto', random_state=None)
    est.fit(y, T, W=W)

    >>> est.score_
    0.00673015...
    >>> est.const_marginal_effect()
    array([[1.008401...]])
    >>> est.effect()
    array([1.008401...])
    >>> est.score(y, T, W=W)
    0.00310431...
    >>> est.ortho_learner_model_final_.model.coef_[0]
    1.00840170...

models_nuisance_
A nested list of instances of the model_nuisance object. The number of sublists equals the number of Monte Carlo iterations. Each element of a sublist corresponds to a cross-fitting fold and is the model instance that was fitted for that training fold.
 Type
nested list of objects of type(model_nuisance)

ortho_learner_model_final_
An instance of the model_final object that was fitted after calling fit.
 Type
object of type(model_final)

score_
If the model_final has a score method, then score_ contains the outcome of the final model score when evaluated on the fitted nuisances from the first stage. It represents the goodness of fit of the final CATE model.
 Type
float or array of floats

nuisance_scores_
The out-of-sample scores from training each nuisance model
 Type
tuple of nested lists of floats or None

ate(X=None, *, T0=0, T1=1)
Calculate the average treatment effect \(E_X[\tau(X, T0, T1)]\).
The effect is calculated between the two treatment points and is averaged over the population of X variables.
 Parameters
T0 ((m, d_t) matrix or vector of length m) – Base treatments for each sample
T1 ((m, d_t) matrix or vector of length m) – Target treatments for each sample
X (optional (m, d_x) matrix) – Features for each sample
 Returns
τ – Average treatment effects on each outcome. Note that when Y is a vector rather than a 2-dimensional array, the result will be a scalar.
 Return type
float or (d_y,) array

ate_inference(X=None, *, T0=0, T1=1)
Inference results for the quantity \(E_X[\tau(X, T0, T1)]\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
X (optional (m, d_x) matrix) – Features for each sample
T0 (optional (m, d_t) matrix or vector of length m (Default=0)) – Base treatments for each sample
T1 (optional (m, d_t) matrix or vector of length m (Default=1)) – Target treatments for each sample
 Returns
PopulationSummaryResults – The inference results instance contains prediction and prediction standard error and can on demand calculate confidence interval, z statistic and p value. It can also output a dataframe summary of these inference results.
 Return type

ate_interval(X=None, *, T0=0, T1=1, alpha=0.1)
Confidence intervals for the quantity \(E_X[\tau(X, T0, T1)]\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
X (optional (m, d_x) matrix) – Features for each sample
T0 (optional (m, d_t) matrix or vector of length m (Default=0)) – Base treatments for each sample
T1 (optional (m, d_t) matrix or vector of length m (Default=1)) – Target treatments for each sample
alpha (optional float in [0, 1] (Default=0.1)) – The overall level of confidence of the reported interval. The (alpha/2, 1-alpha/2) confidence interval is reported.
 Returns
lower, upper – The lower and the upper bounds of the confidence interval for each quantity.
 Return type
tuple(type of ate(X, T0, T1), type of ate(X, T0, T1))
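To illustrate how a point estimate and its standard error relate to the z statistic, p value and the (alpha/2, 1-alpha/2) interval mentioned above, here is a sketch under a normal approximation. That approximation is an assumption of this example; the bootstrap inference supported by this estimator may compute its intervals differently. The function `summarize` is hypothetical.

```python
from statistics import NormalDist


def summarize(point, stderr, alpha=0.1):
    """z statistic, two-sided p-value and (alpha/2, 1-alpha/2) interval."""
    std_normal = NormalDist()
    z = point / stderr
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Quantile that leaves alpha/2 probability in each tail.
    q = std_normal.inv_cdf(1 - alpha / 2)
    return z, p_value, (point - q * stderr, point + q * stderr)


z, p_value, (lower, upper) = summarize(point=1.0, stderr=0.5, alpha=0.1)
print(round(z, 2), round(p_value, 4), round(lower, 3), round(upper, 3))
```

With alpha=0.1 the interval has 90% nominal coverage, so a smaller alpha yields a wider interval.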

cate_feature_names(feature_names=None)
Public interface for getting feature names.
To be overridden by estimators that apply transformations to the input features.
 Parameters
feature_names (list of strings of length X.shape[1] or None) – The names of the input features. If None and X is a dataframe, it defaults to the column names from the dataframe.
 Returns
out_feature_names – Returns feature names.
 Return type
list of strings or None

cate_output_names(output_names=None)
Public interface for getting output names.
To be overridden by estimators that apply transformations to the outputs.
 Parameters
output_names (list of strings of length Y.shape[1] or None) – The names of the outcomes. If None and the Y passed to fit was a dataframe, it defaults to the column names from the dataframe.
 Returns
output_names – Returns output names.
 Return type
list of strings

cate_treatment_names(treatment_names=None)
Get treatment names.
If the treatment is discrete, it will return expanded treatment names.
 Parameters
treatment_names (list of strings of length T.shape[1] or None) – The names of the treatments. If None and the T passed to fit was a dataframe, it defaults to the column names from the dataframe.
 Returns
out_treatment_names – Returns (possibly expanded) treatment names.
 Return type
list of strings

const_marginal_ate(X=None)
Calculate the average constant marginal CATE \(E_X[\theta(X)]\).
 Parameters
X (optional (m, d_x) matrix or None (Default=None)) – Features for each sample.
 Returns
theta – Average constant marginal CATE of each treatment on each outcome. Note that when Y or T is a vector rather than a 2-dimensional array, the corresponding singleton dimensions in the output will be collapsed (e.g. if both are vectors, then the output of this method will be a scalar).
 Return type
(d_y, d_t) matrix

const_marginal_ate_inference(X=None)
Inference results for the quantities \(E_X[\theta(X)]\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
X (optional (m, d_x) matrix or None (Default=None)) – Features for each sample
 Returns
PopulationSummaryResults – The inference results instance contains prediction and prediction standard error and can on demand calculate confidence interval, z statistic and p value. It can also output a dataframe summary of these inference results.
 Return type

const_marginal_ate_interval(X=None, *, alpha=0.1)
Confidence intervals for the quantities \(E_X[\theta(X)]\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
X (optional (m, d_x) matrix or None (Default=None)) – Features for each sample
alpha (optional float in [0, 1] (Default=0.1)) – The overall level of confidence of the reported interval. The (alpha/2, 1-alpha/2) confidence interval is reported.
 Returns
lower, upper – The lower and the upper bounds of the confidence interval for each quantity.
 Return type
tuple(type of const_marginal_ate(X), type of const_marginal_ate(X))

const_marginal_effect(X=None)
Calculate the constant marginal CATE \(\theta(\cdot)\).
The marginal effect is conditional on a vector of features on a set of m test samples X[i].
 Parameters
X (optional (m, d_x) matrix or None (Default=None)) – Features for each sample.
 Returns
theta – Constant marginal CATE of each treatment on each outcome for each sample X[i]. Note that when Y or T is a vector rather than a 2-dimensional array, the corresponding singleton dimensions in the output will be collapsed (e.g. if both are vectors, then the output of this method will also be a vector).
 Return type
(m, d_y, d_t) matrix or (d_y, d_t) matrix if X is None

const_marginal_effect_inference(X=None)
Inference results for the quantities \(\theta(X)\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
X (optional (m, d_x) matrix or None (Default=None)) – Features for each sample
 Returns
InferenceResults – The inference results instance contains prediction and prediction standard error and can on demand calculate confidence interval, z statistic and p value. It can also output a dataframe summary of these inference results.
 Return type

const_marginal_effect_interval(X=None, *, alpha=0.1)
Confidence intervals for the quantities \(\theta(X)\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
X (optional (m, d_x) matrix or None (Default=None)) – Features for each sample
alpha (optional float in [0, 1] (Default=0.1)) – The overall level of confidence of the reported interval. The (alpha/2, 1-alpha/2) confidence interval is reported.
 Returns
lower, upper – The lower and the upper bounds of the confidence interval for each quantity.
 Return type
tuple(type of const_marginal_effect(X), type of const_marginal_effect(X))

effect(X=None, *, T0=0, T1=1)
Calculate the heterogeneous treatment effect \(\tau(X, T0, T1)\).
The effect is calculated between the two treatment points conditional on a vector of features on a set of m test samples \(\{T0_i, T1_i, X_i\}\).
 Parameters
T0 ((m, d_t) matrix or vector of length m) – Base treatments for each sample
T1 ((m, d_t) matrix or vector of length m) – Target treatments for each sample
X (optional (m, d_x) matrix) – Features for each sample
 Returns
τ – Heterogeneous treatment effects on each outcome for each sample. Note that when Y is a vector rather than a 2-dimensional array, the corresponding singleton dimension will be collapsed (so this method will return a vector).
 Return type
(m, d_y) matrix
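Because this class assumes a model that is linear in the treatment, the effect between two treatment points can be pictured as the constant marginal effect scaled by the treatment difference, which matches the first example on this page, where effect(T0=0, T1=10) is ten times const_marginal_effect(). The sketch below covers a scalar treatment and a scalar outcome only; the functions and the values in `theta_hat` are hypothetical.

```python
def effect(theta, T0=0.0, T1=1.0):
    """tau(X_i, T0, T1) = theta(X_i) * (T1 - T0) for a scalar treatment."""
    return [th * (T1 - T0) for th in theta]


def ate(theta, T0=0.0, T1=1.0):
    """Average the per-sample effects over the population of X."""
    effects = effect(theta, T0, T1)
    return sum(effects) / len(effects)


theta_hat = [1.0, 2.0, 3.0]      # hypothetical const_marginal_effect(X)
print(effect(theta_hat))         # [1.0, 2.0, 3.0]
print(effect(theta_hat, 0, 10))  # [10.0, 20.0, 30.0]
print(ate(theta_hat, 0, 10))     # 20.0
```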

effect_inference(X=None, *, T0=0, T1=1)
Inference results for the quantities \(\tau(X, T0, T1)\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
X (optional (m, d_x) matrix) – Features for each sample
T0 (optional (m, d_t) matrix or vector of length m (Default=0)) – Base treatments for each sample
T1 (optional (m, d_t) matrix or vector of length m (Default=1)) – Target treatments for each sample
 Returns
InferenceResults – The inference results instance contains prediction and prediction standard error and can on demand calculate confidence interval, z statistic and p value. It can also output a dataframe summary of these inference results.
 Return type

effect_interval(X=None, *, T0=0, T1=1, alpha=0.1)
Confidence intervals for the quantities \(\tau(X, T0, T1)\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
X (optional (m, d_x) matrix) – Features for each sample
T0 (optional (m, d_t) matrix or vector of length m (Default=0)) – Base treatments for each sample
T1 (optional (m, d_t) matrix or vector of length m (Default=1)) – Target treatments for each sample
alpha (optional float in [0, 1] (Default=0.1)) – The overall level of confidence of the reported interval. The (alpha/2, 1-alpha/2) confidence interval is reported.
 Returns
lower, upper – The lower and the upper bounds of the confidence interval for each quantity.
 Return type
tuple(type of effect(X, T0, T1), type of effect(X, T0, T1))

fit(Y, T, X=None, W=None, Z=None, *, sample_weight=None, freq_weight=None, sample_var=None, groups=None, cache_values=False, inference=None, only_final=False, check_input=True)
Estimate the counterfactual model from data, i.e. estimate the function \(\theta(\cdot)\).
 Parameters
Y ((n, d_y) matrix or vector of length n) – Outcomes for each sample
T ((n, d_t) matrix or vector of length n) – Treatments for each sample
X (optional (n, d_x) matrix or None (Default=None)) – Features for each sample
W (optional (n, d_w) matrix or None (Default=None)) – Controls for each sample
Z (optional (n, d_z) matrix or None (Default=None)) – Instruments for each sample
sample_weight ((n,) array like, default None) – Individual weights for each sample. If None, it assumes equal weight.
freq_weight ((n,) array like of integers, default None) – Weight for the observation. Observation i is treated as the mean outcome of freq_weight[i] independent observations. When sample_var is not None, this should be provided.
sample_var ({(n,), (n, d_y)} nd array like, default None) – Variance of the outcome(s) of the original freq_weight[i] observations that were used to compute the mean outcome represented by observation i.
groups ((n,) vector, optional) – All rows corresponding to the same group will be kept together during splitting. If groups is not None, the cv argument passed to this class’s initializer must support a ‘groups’ argument to its split method.
cache_values (bool, default False) – Whether to cache the inputs and computed nuisances, which will allow refitting a different final model
inference (string, Inference instance, or None) – Method for performing inference. This estimator supports ‘bootstrap’ (or an instance of BootstrapInference).
only_final (bool, default False) – Whether to fit the nuisance models or use the existing cached values. Note: this parameter is only used internally by the refit_final method and should not be exposed publicly by overrides of the fit method in public classes.
check_input (bool, default True) – Whether to check if the input is valid. Note: this parameter is only used internally by the refit_final method and should not be exposed publicly by overrides of the fit method in public classes.
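The relationship between raw observations and the freq_weight and sample_var parameters above can be sketched as follows. The helper `summarize_by_cell` is hypothetical, and the use of the population variance (rather than the sample variance) is an assumption of this example.

```python
from collections import defaultdict
from statistics import mean, pvariance


def summarize_by_cell(Y, T):
    """Collapse raw rows that share a treatment value into one row each."""
    cells = defaultdict(list)
    for y, t in zip(Y, T):
        cells[t].append(y)
    rows = []
    for t, ys in sorted(cells.items()):
        rows.append({
            "T": t,
            "Y": mean(ys),                # mean outcome of the cell
            "freq_weight": len(ys),       # number of raw observations
            "sample_var": pvariance(ys),  # variance of those observations
        })
    return rows


rows = summarize_by_cell(Y=[1.0, 3.0, 2.0, 4.0, 6.0], T=[0, 0, 1, 1, 1])
for row in rows:
    print(row)
```

Each summarized row then stands in for freq_weight raw observations, which is why the variance of those observations has to be supplied alongside the mean.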
 Returns
self
 Return type

marginal_ate(T, X=None)
Calculate the average marginal effect \(E_{T, X}[\partial\tau(T, X)]\).
The marginal effect is calculated around a base treatment point and averaged over the population of X.
 Parameters
T ((m, d_t) matrix) – Base treatments for each sample
X (optional (m, d_x) matrix) – Features for each sample
 Returns
grad_tau – Average marginal effects on each outcome. Note that when Y or T is a vector rather than a 2-dimensional array, the corresponding singleton dimensions in the output will be collapsed (e.g. if both are vectors, then the output of this method will be a scalar).
 Return type
(d_y, d_t) array

marginal_ate_inference(T, X=None)
Inference results for the quantities \(E_{T,X}[\partial \tau(T, X)]\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
T ((m, d_t) matrix) – Base treatments for each sample
X (optional (m, d_x) matrix or None (Default=None)) – Features for each sample
 Returns
PopulationSummaryResults – The inference results instance contains prediction and prediction standard error and can on demand calculate confidence interval, z statistic and p value. It can also output a dataframe summary of these inference results.
 Return type

marginal_ate_interval(T, X=None, *, alpha=0.1)
Confidence intervals for the quantities \(E_{T,X}[\partial \tau(T, X)]\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
T ((m, d_t) matrix) – Base treatments for each sample
X (optional (m, d_x) matrix or None (Default=None)) – Features for each sample
alpha (optional float in [0, 1] (Default=0.1)) – The overall level of confidence of the reported interval. The (alpha/2, 1-alpha/2) confidence interval is reported.
 Returns
lower, upper – The lower and the upper bounds of the confidence interval for each quantity.
 Return type
tuple(type of marginal_ate(T, X), type of marginal_ate(T, X))

marginal_effect(T, X=None)
Calculate the heterogeneous marginal effect \(\partial\tau(T, X)\).
The marginal effect is calculated around a base treatment point conditional on a vector of features on a set of m test samples \(\{T_i, X_i\}\). Since this class assumes a linear model, the base treatment is ignored in this calculation.
 Parameters
T ((m, d_t) matrix) – Base treatments for each sample
X (optional (m, d_x) matrix) – Features for each sample
 Returns
grad_tau – Heterogeneous marginal effects on each outcome for each sample. Note that when Y or T is a vector rather than a 2-dimensional array, the corresponding singleton dimensions in the output will be collapsed (e.g. if both are vectors, then the output of this method will also be a vector).
 Return type
(m, d_y, d_t) array

marginal_effect_inference(T, X=None)
Inference results for the quantities \(\partial \tau(T, X)\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
T ((m, d_t) matrix) – Base treatments for each sample
X (optional (m, d_x) matrix or None (Default=None)) – Features for each sample
 Returns
InferenceResults – The inference results instance contains prediction and prediction standard error and can on demand calculate confidence interval, z statistic and p value. It can also output a dataframe summary of these inference results.
 Return type

marginal_effect_interval(T, X=None, *, alpha=0.1)
Confidence intervals for the quantities \(\partial \tau(T, X)\) produced by the model. Available only when inference is not None when calling the fit method.
 Parameters
T ((m, d_t) matrix) – Base treatments for each sample
X (optional (m, d_x) matrix or None (Default=None)) – Features for each sample
alpha (optional float in [0, 1] (Default=0.1)) – The overall level of confidence of the reported interval. The (alpha/2, 1-alpha/2) confidence interval is reported.
 Returns
lower, upper – The lower and the upper bounds of the confidence interval for each quantity.
 Return type
tuple(type of marginal_effect(T, X), type of marginal_effect(T, X))

refit_final(inference=None)
Estimate the counterfactual model using a new final model specification but with cached first stage results.
In order for this to succeed, fit must have been called with cache_values=True. This call will only refit the final model, using the current setting of any parameters that change the final stage estimation. Changing parameters that affect how the first stage nuisance estimates are computed has no effect here; you need to call fit again to change the first stage estimation results.
 Parameters
inference (inference method, optional) – The string or object that represents the inference method
 Returns
self – This instance
 Return type

score(Y, T, X=None, W=None, Z=None, sample_weight=None)
Score the fitted CATE model on a new data set. Generates nuisance parameters for the new data set based on the fitted nuisance models created at fit time. It uses the mean prediction of the models fitted by the different cross-fit folds under different iterations. It then calls the score function of the model_final and returns the calculated score. The model_final model must have a score method.
If model_final does not have a score method, then it raises an AttributeError.
 Parameters
Y ((n, d_y) matrix or vector of length n) – Outcomes for each sample
T ((n, d_t) matrix or vector of length n) – Treatments for each sample
X (optional (n, d_x) matrix or None (Default=None)) – Features for each sample
W (optional (n, d_w) matrix or None (Default=None)) – Controls for each sample
Z (optional (n, d_z) matrix or None (Default=None)) – Instruments for each sample
sample_weight (optional (n,) vector or None (Default=None)) – Weights for each sample
 Returns
score – The score of the final CATE model on the new data. Same type as the return type of the model_final.score method.
 Return type
float or (array of float)
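The averaging step described for score can be sketched as follows. `ConstantModel` is a hypothetical stand-in for a fitted nuisance model, and the mean squared residual used in `score` is only a placeholder for whatever the score method of model_final computes.

```python
class ConstantModel:
    """Hypothetical fitted nuisance model that predicts a constant."""

    def __init__(self, value):
        self.value = value

    def predict(self, W):
        return [self.value for _ in W]


def mean_nuisance_prediction(models_nuisance, W):
    """Average predictions over every fold model of every Monte Carlo run."""
    flat = [m for run in models_nuisance for m in run]
    preds = [m.predict(W) for m in flat]
    return [sum(col) / len(col) for col in zip(*preds)]


def score(models_nuisance, Y, W):
    """Placeholder score: mean squared residual around the averaged prediction."""
    y_hat = mean_nuisance_prediction(models_nuisance, W)
    return sum((y - p) ** 2 for y, p in zip(Y, y_hat)) / len(Y)


# Two Monte Carlo iterations with two fold models each (a nested list).
models_nuisance = [[ConstantModel(1.0), ConstantModel(3.0)],
                   [ConstantModel(2.0), ConstantModel(2.0)]]
print(mean_nuisance_prediction(models_nuisance, W=[0, 0, 0]))  # [2.0, 2.0, 2.0]
print(score(models_nuisance, Y=[2.0, 3.0, 1.0], W=[0, 0, 0]))
```

The nested list mirrors the layout of models_nuisance_: one sublist per Monte Carlo iteration, one model per fold.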

shap_values(X, *, feature_names=None, treatment_names=None, output_names=None, background_samples=100)
Shap value for the final stage models (const_marginal_effect)
 Parameters
X ((m, d_x) matrix) – Features for each sample. Should have the same shape as the X fitted in the final stage.
feature_names (optional None or list of strings of length X.shape[1] (Default=None)) – The names of input features.
treatment_names (optional None or list (Default=None)) – The name of treatment. In the discrete treatment scenario, the name should not include the name of the baseline treatment (i.e. the control treatment, which by default is the alphabetically smallest).
output_names (optional None or list (Default=None)) – The name of the outcome.
background_samples (int or None, (Default=100)) – How many samples to use to compute the baseline effect. If None then all samples are used.
 Returns
shap_outs – A nested dictionary that uses each output name (e.g. ‘Y0’, ‘Y1’, … when output_names=None) and each treatment name (e.g. ‘T0’, ‘T1’, … when treatment_names=None) as keys and the shap_values explanation object as value. If the input data at fit time also contain metadata (e.g. are pandas DataFrames), then the column metadata for the treatments, outcomes and features are used instead of the above defaults (unless the user overrides them by explicitly passing the corresponding names).
 Return type
nested dictionary of Explanation object

property dowhy
Get an instance of DoWhyWrapper to allow other functionalities from the dowhy package (e.g. causal graph, refutation test, etc.).
 Returns
DoWhyWrapper – An instance of DoWhyWrapper
 Return type
instance

econml._ortho_learner._crossfit(model, folds, *args, **kwargs)
General crossfit based calculation of nuisance parameters.
 Parameters
model (object) – An object that supports fit and predict. Fit must accept all the args and the keyword arguments kwargs. Similarly, predict must accept all the args as arguments and kwargs as keyword arguments. The fit function estimates a model of the nuisance function, based on the input data to fit. Predict evaluates the fitted nuisance function on the input data to predict.
folds (list of tuples or None) – The cross-fitting fold structure. Every entry in the list is a tuple whose first element is the training indices of the args and kwargs data and whose second element is the test indices. If the union of the test indices is not the full set of all indices, then the nuisance parameters for the missing indices have value NaN. If folds is None, then cross-fitting is not performed; all indices are used for both model fitting and prediction.
args (a sequence of (numpy matrices or None)) – Each matrix is a data variable whose first index corresponds to a sample
kwargs (a sequence of key-value args, with values being (numpy matrices or None)) – Each keyword argument is of the form Var=x, with x a numpy array. Each of these arrays is a data variable. The model fit and predict will be called with signature: model.fit(*args, **kwargs) and model.predict(*args, **kwargs). Key-value arguments that have value None are omitted from the two calls. So all the args and the non-None kwargs variables must be part of the model's signature.
 Returns
nuisances (tuple of numpy matrices) – Each entry in the tuple is a nuisance parameter matrix. The i-th row of each matrix corresponds to the value of the nuisance parameter for the i-th input sample.
model_list (list of objects of same type as input model) – The cloned and fitted models for each fold. Can be used for inspection of the variability of the fitted models across folds.
fitted_inds (1-d numpy array) – The indices of the arrays for which the nuisance value was calculated. This corresponds to the union of the indices of the test part of each fold in the input fold list.
scores (tuple of list of float or None) – The out-of-sample model scores for each nuisance model
Examples
    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.linear_model import Lasso
    from econml._ortho_learner import _crossfit

    class Wrapper:
        def __init__(self, model):
            self._model = model
        def fit(self, X, y, W=None):
            self._model.fit(X, y)
            return self
        def predict(self, X, y, W=None):
            return self._model.predict(X)

    np.random.seed(123)
    X = np.random.normal(size=(5000, 3))
    y = X[:, 0] + np.random.normal(size=(5000,))
    folds = list(KFold(2).split(X, y))
    model = Lasso(alpha=0.01)
    nuisance, model_list, fitted_inds, scores = _crossfit(Wrapper(model), folds, X, y, W=y, Z=None)

    >>> nuisance
    (array([1.105728... , 1.537566..., 2.451827... , ..., 1.106287..., 1.829662..., 1.782273...]),)
    >>> model_list
    [<Wrapper object at 0x...>, <Wrapper object at 0x...>]
    >>> fitted_inds
    array([ 0, 1, 2, ..., 4997, 4998, 4999])
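To make the contract of the parameters and return values more tangible, here is a simplified pure-Python stand-in. It is not the library's implementation: `simple_crossfit` and `MeanModel` are hypothetical, it handles a single nuisance stored in a list rather than a tuple of arrays, and it omits the scores. It does show three documented behaviours: keyword arguments that are None are dropped, indices outside every test set keep the value NaN, and folds=None fits and predicts on all the data.

```python
import copy
import math


class MeanModel:
    """Toy nuisance model: predicts the training mean of y for every row."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X, y):
        return [self.mean_ for _ in X]


def simple_crossfit(model, folds, *args, **kwargs):
    """Simplified stand-in for the cross-fitting contract described above."""
    n = len(args[0])
    kwargs = {k: v for k, v in kwargs.items() if v is not None}  # drop None
    if folds is None:
        folds = [(list(range(n)), list(range(n)))]  # no cross-fitting

    def take(data, idx):
        return [data[i] for i in idx]

    nuisance = [math.nan] * n  # indices never tested stay NaN
    model_list, fitted_inds = [], []
    for train, test in folds:
        fold_model = copy.deepcopy(model)  # one fresh copy per fold
        fold_model.fit(*(take(a, train) for a in args),
                       **{k: take(v, train) for k, v in kwargs.items()})
        preds = fold_model.predict(*(take(a, test) for a in args),
                                   **{k: take(v, test) for k, v in kwargs.items()})
        for i, p in zip(test, preds):
            nuisance[i] = p
        model_list.append(fold_model)
        fitted_inds.extend(test)
    return nuisance, model_list, sorted(fitted_inds)


X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
folds = [([0, 1, 2], [3, 4])]  # index 5 is never in a test set
nuisance, model_list, fitted_inds = simple_crossfit(MeanModel(), folds, X, y, W=None)
print(nuisance)     # [nan, nan, nan, 1.0, 1.0, nan]
print(fitted_inds)  # [3, 4]
```

Because `W=None` is dropped before the calls, `MeanModel.fit` and `MeanModel.predict` do not need a `W` parameter in their signatures.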