econml.utilities

Utility methods.

Functions

add_intercept(X)

Adds an intercept feature to an array by prepending a column of ones.

broadcast_unit_treatments(X, d_t)

Generate d_t unit treatments for each row of X.

check_high_dimensional(X, T, *, threshold[, …])

check_input_arrays(*args[, validate_len, …])

Cast input sequences into numpy arrays.

check_inputs(Y, T, X[, W, multi_output_T, …])

Input validation for CATE estimators.

check_models(models, n)

Input validation for metalearner models.

concatenate(XS[, axis])

Join a sequence of arrays along an existing axis.

cross_product(*XS)

Compute the cross product of features.

deprecated(message[, category])

Enables decorating a method or class to provide a warning when it is used.

einsum_sparse(subscripts, *arrs)

Evaluate the Einstein summation convention on the operands.

filter_none_kwargs(**kwargs)

Filters out any keyword arguments that are None.

fit_with_groups(model, X, y[, groups])

Fit a model while correctly handling grouping if necessary.

get_feature_names_or_default(featurizer, …)

get_input_columns(X[, prefix])

Extracts column names from dataframe-like input object.

hstack(XS)

Stack arrays in sequence horizontally (column wise).

inverse_onehot(T)

Given a one-hot encoding of a value, return a vector reversing the encoding to get numeric treatment indices.

iscoo(X)

Determine whether an input is a sparse.COO array.

issparse(X)

Determine whether an input is sparse.

ndim(X)

Return the number of array dimensions.

parse_final_model_params(coef, intercept, …)

reshape(X, shape)

Return a new array that is a reshaped version of an input array.

reshape_Y_T(Y, T)

Reshapes Y and T when Y.ndim = 2 and/or T.ndim = 1.

reshape_arrays_2dim(length, *args)

Reshape the input arrays as two dimensional.

reshape_treatmentwise_effects(A, d_t, d_y)

Given an effects matrix ordered first by treatment, transform it to be ordered by outcome.

shape(X)

Return a tuple of array dimensions.

size(X)

Return the number of elements in the array.

stack(XS[, axis])

Join a sequence of arrays along a new axis.

tensordot(X1, X2, axes)

Compute tensor dot product along specified axes for arrays >= 1-D.

tocoo(X)

Convert an array to a sparse COO array.

todense(X)

Convert an array to a dense numpy array.

transpose(X[, axes])

Permute the dimensions of an array.

transpose_dictionary(d)

Transpose a dictionary of dictionaries, bringing the keys from the second level to the top and vice versa.

vstack(XS)

Stack arrays in sequence vertically (row wise).

Classes

IdentityFeatures()

Featurizer that just returns the input data.

LassoCVWrapper(**kwargs)

Helper class to wrap either LassoCV or MultiTaskLassoCV depending on the shape of the target.

MissingModule(msg, exn)

Placeholder to stand in for a module that couldn’t be imported, delaying ImportErrors until use.

MultiModelWrapper([model_list])

Helper class for training different models for each treatment.

SeparateModel(*models)

Splits the data based on the last feature and trains a separate model for each subsample.

Summary()

Result summary

WeightedModelWrapper(model_instance[, …])

Helper class for assigning weights to models without this option.

class econml.utilities.IdentityFeatures[source]

Bases: sklearn.base.TransformerMixin

Featurizer that just returns the input data.

fit(X)[source]

Fit method (does nothing, just returns self).

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

transform(X)[source]

Perform the identity transform, which returns the input unmodified.

class econml.utilities.LassoCVWrapper(**kwargs)[source]

Bases: object

Helper class to wrap either LassoCV or MultiTaskLassoCV depending on the shape of the target.

class econml.utilities.MissingModule(msg, exn)[source]

Bases: object

Placeholder to stand in for a module that couldn’t be imported, delaying ImportErrors until use.

Parameters
  • msg (string) – The message to display when an attempt to access a module member is made

  • exn (ImportError) – The original ImportError to pass as the source of the exception
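The deferred-error behavior described above can be sketched roughly as follows. This is an illustration only, not the library's source: the class name `MissingModuleSketch`, the `__getattr__`-based approach, and the package name in the import are all assumptions made for the example.

```python
class MissingModuleSketch:
    """Hypothetical stand-in for a module that failed to import."""

    def __init__(self, msg, exn):
        self.msg = msg
        self.exn = exn

    def __getattr__(self, name):
        # Only invoked for attributes that are not found normally (so not for
        # msg or exn); the ImportError surfaces on first real use of the module.
        raise ImportError(self.msg) from self.exn


try:
    import a_package_that_does_not_exist as optional_dep  # hypothetical name
except ImportError as exn:
    optional_dep = MissingModuleSketch("optional_dep is required for this feature", exn)
```

With this pattern the surrounding module still imports cleanly; the error is raised only when code touches an attribute of `optional_dep`.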

class econml.utilities.MultiModelWrapper(model_list=[])[source]

Bases: object

Helper class for training different models for each treatment.

Parameters

model_list (array-like, shape (n_T, )) – List of models to be trained separately for each treatment group.

fit(Xt, y, sample_weight=None)[source]

Fit underlying list of models with weighted inputs.

Parameters
  • Xt (array-like, shape (n_samples, n_features + n_treatments)) – Training data. The last n_T columns should be a one-hot encoding of the treatment assignment.

  • y (array-like, shape (n_samples, )) – Target values.

Returns

self

Return type

an instance of the class

predict(Xt)[source]

Predict using the underlying list of models.

Parameters

Xt (array-like, shape (n_samples, n_features + n_treatments)) – Samples. The last n_T columns should be a one-hot encoding of the treatment assignment.

Returns

C – Returns predicted values.

Return type

array, shape (n_samples, )

class econml.utilities.SeparateModel(*models)[source]

Bases: object

Splits the data based on the last feature and trains a separate model for each subsample. At predict time, it uses the last feature to choose which model to use to predict.

class econml.utilities.Summary[source]

Bases: object

Result summary

Construction does not take any parameters. Tables and text can be added with the add_ methods.

tables

Contains the list of SimpleTable instances; horizontally concatenated tables are not saved separately.

Type

list of tables

extra_txt

Extra lines that are added to the text output, used for warnings and explanations.

Type

str

add_extra_txt(etext)[source]

Add additional text that will be appended at the end of the text output.

Parameters

etext (list[str]) – Lines to add to the text output.

as_csv()[source]

Return tables as a string.

Returns

csv – concatenated summary tables in comma delimited format

Return type

str

as_html()[source]

Return tables as a string.

Returns

html – concatenated summary tables in HTML format

Return type

str

as_latex()[source]

Return tables as a string.

Returns

latex – summary tables and extra text as a string of LaTeX

Return type

str

Notes

This currently merges tables with different numbers of columns. It is recommended to use as_latex_tabular directly on the individual tables.

as_text()[source]

Return tables as a string.

Returns

txt – summary tables and extra text as one string

Return type

str

class econml.utilities.WeightedModelWrapper(model_instance, sample_type='weighted')[source]

Bases: object

Helper class for assigning weights to models without this option.

Parameters
  • model_instance (estimator) – Model that requires weights.

  • sample_type (string, optional (default=`weighted`)) – Method for adding weights to the model. Use weighted for linear regression models, where the weights can be incorporated in the matrix multiplication, and sampled for other models. The sampled option samples the training set according to the normalized weights and creates a dataset larger than the original.
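For the weighted option, one standard way to fold weights into the matrix multiplication is to scale each row of X and y by the square root of its weight and then run an ordinary least-squares fit. The sketch below checks that identity with plain numpy. It illustrates the general technique under that assumption, is not a copy of the wrapper's code, and ignores intercept handling.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
w = rng.uniform(0.5, 2.0, size=50)

# Weighted least squares via the normal equations: (X' W X) beta = X' W y
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# The same estimate from an unweighted fit on sqrt(w)-scaled data
sqrt_w = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(sqrt_w[:, None] * X, sqrt_w * y, rcond=None)
```

The two coefficient vectors agree up to floating-point error, which is why a model without a sample_weight argument can still be fit with weights when it is linear.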

fit(X, y, sample_weight=None)[source]

Fit underlying model instance with weighted inputs.

Parameters
  • X (array-like, shape (n_samples, n_features)) – Training data.

  • y (array-like, shape (n_samples, n_outcomes)) – Target values.

Returns

self

Return type

an instance of the underlying estimator.

predict(X)[source]

Predict using the underlying model instance.

Parameters

X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.

Returns

C – Returns predicted values.

Return type

array, shape (n_samples, n_outcomes)

econml.utilities.add_intercept(X)[source]

Adds an intercept feature to an array by prepending a column of ones.

Parameters

X (array-like) – Input array. Must be 2D.

Returns

arr – X with a column of ones prepended

Return type

ndarray
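A minimal numpy sketch of the described behavior, for dense input only. The name `add_intercept_sketch` is hypothetical and this is an illustrative re-implementation, not the library's code.

```python
import numpy as np

def add_intercept_sketch(X):
    """Prepend a column of ones to a 2D array (illustrative re-implementation)."""
    X = np.asarray(X)
    return np.hstack([np.ones((X.shape[0], 1)), X])

X = np.array([[2.0, 3.0], [4.0, 5.0]])
X_with_intercept = add_intercept_sketch(X)
```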

econml.utilities.broadcast_unit_treatments(X, d_t)[source]

Generate d_t unit treatments for each row of X.

Parameters
  • d_t (int) – Number of treatments

  • X (array) – Features

Returns

X, T – The updated X array (with each row repeated d_t times), and the generated T array

Return type

(array, array)
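The following sketch shows one way to produce the described output, assuming that "unit treatments" means the rows of a d_t x d_t identity matrix. The function name `broadcast_unit_treatments_sketch` is hypothetical.

```python
import numpy as np

def broadcast_unit_treatments_sketch(X, d_t):
    n = X.shape[0]
    # Each row of X is repeated d_t times, keeping the duplicates consecutive
    X_rep = np.repeat(X, d_t, axis=0)
    # One unit (one-hot) treatment vector per repeated row
    T = np.tile(np.eye(d_t), (n, 1))
    return X_rep, T

X = np.array([[10.0], [20.0]])
X_rep, T = broadcast_unit_treatments_sketch(X, 2)
```

Here each original row appears twice in `X_rep`, paired in `T` first with treatment (1, 0) and then with treatment (0, 1).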

econml.utilities.check_input_arrays(*args, validate_len=True, force_all_finite=True)[source]

Cast input sequences into numpy arrays.

Only inputs that are sequence-like will be converted, all other inputs will be left as is. When validate_len is True, the sequences will be checked for equal length.

Parameters
  • args (scalar or array_like) – Inputs to be checked.

  • validate_len (bool (default=True)) – Whether to check if the input arrays have the same length.

  • force_all_finite (bool (default=True)) – Whether to require all values in the input arrays to be finite. When True, inf and nan are not allowed.

Returns

args – List of inputs where sequence-like objects have been cast to numpy arrays.

Return type

array-like

econml.utilities.check_inputs(Y, T, X, W=None, multi_output_T=True, multi_output_Y=True)[source]

Input validation for CATE estimators.

Checks Y, T, X, W for consistent length and enforces that X and W are 2d. Standard input checks are applied to all inputs, such as checking that an input does not contain np.nan or np.inf. Converts regular Python lists to numpy arrays.

Parameters
  • Y (array_like, shape (n, ) or (n, d_y)) – Outcome for the treatment policy.

  • T (array_like, shape (n, ) or (n, d_t)) – Treatment policy.

  • X (array-like, shape (n, d_x)) – Feature vector that captures heterogeneity.

  • W (array-like, shape (n, d_w) or None (default=None)) – High-dimensional controls.

  • multi_output_T (bool) – Whether to allow more than one treatment.

  • multi_output_Y (bool) – Whether to allow more than one outcome.

Returns

  • Y (array_like, shape (n, ) or (n, d_y)) – Converted and validated Y.

  • T (array_like, shape (n, ) or (n, d_t)) – Converted and validated T.

  • X (array-like, shape (n, d_x)) – Converted and validated X.

  • W (array-like, shape (n, d_w) or None (default=None)) – Converted and validated W.

econml.utilities.check_models(models, n)[source]

Input validation for metalearner models.

Check whether the input models satisfy the criteria below.

Parameters
  • models (estimator or a list/tuple of estimators) – The input models to check

  • n (int) – Number of models needed

Returns

models

Return type

a list/tuple of estimators

econml.utilities.concatenate(XS, axis=0)[source]

Join a sequence of arrays along an existing axis.

Parameters
  • XS (sequence of array_like) – The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).

  • axis (int, optional) – The axis along which the arrays will be joined. Default is 0.

Returns

The concatenated array. It will be sparse if the inputs are.

Return type

ndarray or SparseArray

econml.utilities.cross_product(*XS)[source]

Compute the cross product of features.

Parameters
  • X1 (n x d1 matrix) – First matrix of n samples of d1 features (or an n-element vector, which will be treated as an n x 1 matrix)

  • X2 (n x d2 matrix) – Second matrix of n samples of d2 features (or an n-element vector, which will be treated as an n x 1 matrix)

Returns

A – Matrix of n samples of d1*d2*… cross product features, arranged in form such that each row t of A contains: [X1[t,0]*X2[t,0]*…, …, X1[t,d1-1]*X2[t,0]*…, X1[t,0]*X2[t,1]*…, …, X1[t,d1-1]*X2[t,1]*…, …]

Return type

n x (d1*d2*…) matrix
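The column ordering stated above (the index into X1 varies fastest) can be reproduced for two dense inputs with numpy.einsum. This is a sketch limited to the two-matrix case; the name `cross_product_sketch` is hypothetical.

```python
import numpy as np

def cross_product_sketch(X1, X2):
    """Row-wise products of all feature pairs, with X1's index varying fastest."""
    X1 = np.asarray(X1).reshape(len(X1), -1)  # treat vectors as n x 1
    X2 = np.asarray(X2).reshape(len(X2), -1)
    n = X1.shape[0]
    # out[t, j, i] = X1[t, i] * X2[t, j]; flattening (j, i) puts i fastest
    return np.einsum("ti,tj->tji", X1, X2).reshape(n, -1)

A = cross_product_sketch([[1, 2]], [[3, 4, 5]])
```

For this single row the result is ordered as X1[0]*X2[0], X1[1]*X2[0], X1[0]*X2[1], and so on.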

econml.utilities.deprecated(message, category=<class 'FutureWarning'>)[source]

Enables decorating a method or class to provide a warning when it is used.

Parameters
  • message (string) – The deprecation message to use

  • category (optional type, default FutureWarning) – The warning category to use
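A decorator of this kind can be sketched with the standard warnings module. The version below is a simplified illustration that handles plain functions only; the names `deprecated_sketch`, `old_add`, and `new_add` are hypothetical.

```python
import functools
import warnings

def deprecated_sketch(message, category=FutureWarning):
    """Hypothetical decorator factory; handles plain functions only."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line
            warnings.warn(message, category, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated_sketch("old_add is deprecated; use new_add instead")
def old_add(a, b):
    return a + b
```

Calling `old_add(1, 2)` still returns the sum, and emits a FutureWarning carrying the message.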

econml.utilities.einsum_sparse(subscripts, *arrs)[source]

Evaluate the Einstein summation convention on the operands.

Using the Einstein summation convention, many common multi-dimensional array operations can be represented in a simple fashion. This function provides a way to compute such summations.

Parameters
  • subscripts (str) – Specifies the subscripts for summation. Unlike np.einsum, ellipses are not supported and the output must be explicitly included

  • arrs (list of COO arrays) – These are the sparse arrays for the operation.

Returns

The sparse array calculated based on the Einstein summation convention.

Return type

SparseArray
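To illustrate what an explicitly included output looks like in a subscripts string, the sketch below uses the dense np.einsum as a stand-in. It does not call einsum_sparse and makes no claim about its behavior beyond the description above.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)

# Explicit form: the output indices follow "->". This is the style the
# description above requires.
explicit = np.einsum("ij,jk->ik", A, B)

# Implicit form (no "->") is accepted by np.einsum but, per the description,
# not by einsum_sparse.
implicit = np.einsum("ij,jk", A, B)
```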

econml.utilities.filter_none_kwargs(**kwargs)[source]

Filters out any keyword arguments that are None.

This is useful when specific optional keyword arguments might not be universally supported, so that stripping them out when they are not set enables more uses to succeed.

Parameters

kwargs (dict) – The keyword arguments to filter

Returns

filtered_kwargs – The input dictionary, but with all entries having value None removed

Return type

dict
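The filtering itself amounts to a dictionary comprehension. A minimal sketch, where `filter_none_kwargs_sketch` and the toy `fit` function are hypothetical names:

```python
def filter_none_kwargs_sketch(**kwargs):
    return {key: value for key, value in kwargs.items() if value is not None}

def fit(X, y, **fit_kwargs):
    # Hypothetical fit that just reports which keywords it received
    return sorted(fit_kwargs)

passed = fit([1], [2], **filter_none_kwargs_sketch(sample_weight=None, groups=[0]))
```

Only `groups` reaches `fit`, so a model that does not accept `sample_weight` is never handed that keyword when it is unset. Values such as 0 or an empty list are kept, because only None is filtered.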

econml.utilities.fit_with_groups(model, X, y, groups=None, **kwargs)[source]

Fit a model while correctly handling grouping if necessary.

This enables us to perform an inner-loop cross-validation of a model which handles grouping correctly, which is not easy using typical sklearn models.

For example, GridSearchCV and RandomizedSearchCV both support passing ‘groups’ to fit, but other CV-related estimators (such as those derived from LinearModelCV, including LassoCV) do not support passing groups to fit. This means that GroupKFold cannot be used as the cv instance when using these types, because the required ‘groups’ argument will never be passed to the GroupKFold’s split method. See also https://github.com/scikit-learn/scikit-learn/issues/12052

The (hacky) workaround that is used here is to explicitly set the ‘cv’ attribute (if there is one) to the exact set of rows and not to use GroupKFold even with the sklearn classes that could support it; this should work with classes derived from BaseSearchCV, LinearModelCV, and CalibratedClassifierCV.

Parameters
  • model (estimator) – The model to fit

  • X (array-like) – The features to fit against

  • y (array-like) – The target to fit against

  • groups (array-like, optional) – The set of groupings that should be kept together when splitting rows for cross-validation

  • kwargs (dict) – Any other named arguments to pass to the model’s fit
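The idea of fixing cv to an exact set of rows can be sketched without scikit-learn. The helper below builds explicit (train, test) index pairs that never split a group. It is a simplified stand-in for GroupKFold (groups are assigned to folds round-robin), and `group_folds_sketch` is a hypothetical name, not part of the library.

```python
import numpy as np

def group_folds_sketch(groups, n_splits):
    """Return a list of (train_idx, test_idx) pairs that never split a group."""
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    # Assign whole groups to folds round-robin (a simplification of GroupKFold)
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}
    fold_of_row = np.array([fold_of_group[g] for g in groups])
    rows = np.arange(len(groups))
    return [(rows[fold_of_row != k], rows[fold_of_row == k]) for k in range(n_splits)]

groups = [0, 0, 1, 1, 2, 2, 3, 3]
folds = group_folds_sketch(groups, n_splits=2)
```

A list like `folds` is the kind of explicit row split that can be used as an estimator's cv setting, since scikit-learn accepts an iterable of (train, test) index arrays there and then needs no groups argument at fit time.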

econml.utilities.get_input_columns(X, prefix='X')[source]

Extracts column names from dataframe-like input object.

Currently supports column name extraction from pandas DataFrame and Series objects.

Parameters
  • X (array_like or None) – Input array with column names to be extracted.

  • prefix (string or None) – If input array doesn’t have column names, a default using the naming scheme “{prefix}{column number}” will be returned.

Returns

cols – List of columns corresponding to the dataframe-like object. None if the input array is not in the supported types.

Return type

array-like or None

econml.utilities.hstack(XS)[source]

Stack arrays in sequence horizontally (column wise).

This is equivalent to concatenation along the second axis.

Parameters

XS (sequence of ndarrays) – The arrays must have the same shape along all but the second axis.

Returns

The array formed by stacking the given arrays. It will be sparse if the inputs are.

Return type

ndarray or SparseArray

econml.utilities.inverse_onehot(T)[source]

Given a one-hot encoding of a value, return a vector reversing the encoding to get numeric treatment indices.

Note that we assume that the first column has been removed from the input.

Parameters

T (array (shape (n, d_t-1))) – The one-hot-encoded array

Returns

A – The un-encoded 0-based category indices

Return type

vector of int (shape (n,))
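Because the first column is assumed to have been dropped, an all-zero row stands for category 0 and a one in column j stands for category j + 1. A minimal dense sketch of that decoding, with the hypothetical name `inverse_onehot_sketch`:

```python
import numpy as np

def inverse_onehot_sketch(T):
    T = np.asarray(T)
    # Column j (0-based) of the reduced encoding stands for category j + 1;
    # an all-zero row therefore maps to the dropped category 0.
    return (T @ np.arange(1, T.shape[1] + 1)).astype(int)

T = np.array([[0, 0], [1, 0], [0, 1]])
indices = inverse_onehot_sketch(T)
```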

econml.utilities.iscoo(X)[source]

Determine whether an input is a sparse.COO array.

Parameters

X (array-like) – The input to check

Returns

Whether the input is a COO array

Return type

bool

econml.utilities.issparse(X)[source]

Determine whether an input is sparse.

For the purposes of this function, both scipy.sparse matrices and sparse.SparseArray types are considered sparse.

Parameters

X (array-like) – The input to check

Returns

Whether the input is sparse

Return type

bool

econml.utilities.ndim(X)[source]

Return the number of array dimensions.

econml.utilities.reshape(X, shape)[source]

Return a new array that is a reshaped version of an input array.

The output will be sparse iff the input is.

Parameters
  • X (array_like) – The array to reshape

  • shape (tuple of ints) – The desired shape of the output array

Returns

The reshaped output array

Return type

ndarray or SparseArray

econml.utilities.reshape_Y_T(Y, T)[source]

Reshapes Y and T when Y.ndim = 2 and/or T.ndim = 1.

Parameters
  • Y (array_like, shape (n, ) or (n, 1)) – Outcome for the treatment policy. Must be a vector or single-column matrix.

  • T (array_like, shape (n, ) or (n, d_t)) – Treatment policy.

Returns

  • Y (array_like, shape (n, )) – Flattened outcome for the treatment policy.

  • T (array_like, shape (n, 1) or (n, d_t)) – Reshaped treatment policy.
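The shape changes listed above can be sketched as follows. This is an illustration of the documented input and output shapes only, and omits any validation; `reshape_Y_T_sketch` is a hypothetical name.

```python
import numpy as np

def reshape_Y_T_sketch(Y, T):
    Y, T = np.asarray(Y), np.asarray(T)
    if Y.ndim == 2:
        Y = Y.reshape(-1)      # (n, 1) -> (n,)
    if T.ndim == 1:
        T = T.reshape(-1, 1)   # (n,) -> (n, 1)
    return Y, T

Y, T = reshape_Y_T_sketch(np.zeros((4, 1)), np.ones(4))
```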

econml.utilities.reshape_arrays_2dim(length, *args)[source]

Reshape the input arrays as two dimensional. An input that is None will be reshaped as (n, 0), where n is length.

Parameters
  • length (scalar) – Number of samples

  • args (arrays) – Inputs to be reshaped

Returns

new_args – Output of reshaped arrays

Return type

arrays
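A minimal sketch of the described reshaping for dense numpy inputs. The name `reshape_arrays_2dim_sketch` is hypothetical, and the handling of 1-D inputs as single columns is an assumption consistent with the description.

```python
import numpy as np

def reshape_arrays_2dim_sketch(length, *args):
    new_args = []
    for arg in args:
        if arg is None:
            new_args.append(np.empty((length, 0)))   # placeholder with no columns
        elif arg.ndim == 1:
            new_args.append(arg.reshape(-1, 1))      # vector -> single column
        else:
            new_args.append(arg)
    return new_args

X, W = reshape_arrays_2dim_sketch(3, np.arange(3), None)
```

The (n, 0) placeholder lets later code stack X and W horizontally without special-casing a missing W.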

econml.utilities.reshape_treatmentwise_effects(A, d_t, d_y)[source]

Given an effects matrix ordered first by treatment, transform it to be ordered by outcome.

Parameters
  • A (array) – The array of effects, of size n*d_y*d_t

  • d_t (tuple of int) – Either () if T was a vector, or a 1-tuple of the number of columns of T if it was an array

  • d_y (tuple of int) – Either () if Y was a vector, or a 1-tuple of the number of columns of Y if it was an array

Returns

A – The transformed array. Note that singleton dimensions will be dropped for any inputs which were vectors, as in the specification of BaseCateEstimator.marginal_effect.

Return type

array (shape (m, d_y, d_t))

econml.utilities.shape(X)[source]

Return a tuple of array dimensions.

econml.utilities.size(X)[source]

Return the number of elements in the array.

Parameters

X (array_like) – Input data

Returns

The number of elements of the array

Return type

int

econml.utilities.stack(XS, axis=0)[source]

Join a sequence of arrays along a new axis.

The axis parameter specifies the index of the new axis in the dimensions of the result. For example, if axis=0 it will be the first dimension and if axis=-1 it will be the last dimension.

Parameters
  • XS (sequence of array_like) – Each array must have the same shape

  • axis (int, optional) – The axis in the result array along which the input arrays are stacked

Returns

The stacked array, which has one more dimension than the input arrays. It will be sparse if the inputs are.

Return type

ndarray or SparseArray
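The effect of the axis parameter on the result shape can be seen with numpy.stack, used here as a stand-in for the dense case. Matching behavior for sparse inputs is assumed from the description, not demonstrated.

```python
import numpy as np

arrays = [np.zeros((3, 4)) for _ in range(5)]
first = np.stack(arrays, axis=0)    # new axis is the first dimension
last = np.stack(arrays, axis=-1)    # new axis is the last dimension
```

Five arrays of shape (3, 4) give shape (5, 3, 4) with axis=0 and shape (3, 4, 5) with axis=-1.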

econml.utilities.tensordot(X1, X2, axes)[source]

Compute tensor dot product along specified axes for arrays >= 1-D.

Parameters
  • X1, X2 (array_like, len(shape) >= 1) – Tensors to “dot”

  • axes (int or (2,) array_like) –

    integer_like: If an int N, sum over the last N axes of X1 and the first N axes of X2 in order. The sizes of the corresponding axes must match.

    (2,) array_like: Or, a list of axes to be summed over, with the first sequence applying to X1 and the second to X2. Both sequences must be of the same length.
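The two forms of axes can be compared on dense arrays with numpy.tensordot, used here as a stand-in that follows the same axes convention as described above.

```python
import numpy as np

X1 = np.arange(24.0).reshape(2, 3, 4)
X2 = np.arange(60.0).reshape(3, 4, 5)

# Integer form: contract the last 2 axes of X1 with the first 2 axes of X2
by_count = np.tensordot(X1, X2, axes=2)

# Pair form: name the axes explicitly; this selects the same contraction
by_lists = np.tensordot(X1, X2, axes=([1, 2], [0, 1]))
```

Both calls sum over the axes of sizes 3 and 4, leaving a result of shape (2, 5).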

econml.utilities.tocoo(X)[source]

Convert an array to a sparse COO array.

If the input is already a sparse.COO object, this returns the object directly; otherwise it is converted.

econml.utilities.todense(X)[source]

Convert an array to a dense numpy array.

If the input is already a numpy array, this may create a new copy.

econml.utilities.transpose(X, axes=None)[source]

Permute the dimensions of an array.

Parameters
  • X (array_like) – Input array.

  • axes (list of ints, optional) – By default, reverse the dimensions, otherwise permute the axes according to the values given

Returns

p – X with its axes permuted. This will be sparse if X is.

Return type

ndarray or SparseArray

econml.utilities.transpose_dictionary(d)[source]

Transpose a dictionary of dictionaries, bringing the keys from the second level to the top and vice versa.

Parameters

d (dict) – The dictionary to transpose; the values of this dictionary should all themselves be dictionaries

Returns

output – The output dictionary with first- and second-level keys swapped

Return type

dict
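The key swap can be sketched in a few lines of plain Python. The name `transpose_dictionary_sketch` and the example data are hypothetical.

```python
from collections import defaultdict

def transpose_dictionary_sketch(d):
    output = defaultdict(dict)
    for outer_key, inner in d.items():
        for inner_key, value in inner.items():
            output[inner_key][outer_key] = value
    return dict(output)

scores = {"model_a": {"train": 0.9, "test": 0.8}, "model_b": {"train": 0.7}}
by_split = transpose_dictionary_sketch(scores)
```

Inner dictionaries do not need identical keys; a second-level key that appears under only one first-level key yields a one-entry dictionary in the output.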

econml.utilities.vstack(XS)[source]

Stack arrays in sequence vertically (row wise).

This is equivalent to concatenation along the first axis after 1-D arrays of shape (N,) have been reshaped to (1,N).

Parameters

XS (sequence of ndarrays) – The arrays must have the same shape along all but the first axis. 1-D arrays must have the same length.

Returns

The array formed by stacking the given arrays, will be at least 2-D. It will be sparse if the inputs are.

Return type

ndarray or SparseArray