econml.utilities
Utility methods.
Functions
Add an intercept feature to an array by prepending a column of ones. |
|
|
Generate d_t unit treatments for each row of X. |
|
|
|
Cast input sequences into numpy arrays. |
|
Input validation for CATE estimators. |
|
Input validation for metalearner models. |
|
Join a sequence of arrays along an existing axis. |
|
Compute the cross product of features. |
|
Enable decorating a method or class to provide a warning when it is used. |
|
Evaluate the Einstein summation convention on the operands. |
|
Filter out any keyword arguments that are None. |
|
Extract feature names from sklearn transformers. |
|
Extract column names from dataframe-like input object. |
|
Stack arrays in sequence horizontally (column wise). |
Given a one-hot encoding of a value, return a vector reversing the encoding to get numeric treatment indices. |
|
|
Determine whether an input is a sparse.COO array. |
|
Determine whether an input is sparse. |
|
Convert a featurizer into a wrapper class that includes a function for calculating the jacobian. |
|
Return the number of array dimensions. |
|
Create a OneHotEncoder. |
|
|
|
Return a new array that is a reshaped version of an input array. |
|
Reshapes Y and T when Y.ndim = 2 and/or T.ndim = 1. |
|
Reshape the input arrays as two dimensional. |
|
Given an effects matrix, reshape second dimension to be consistent with d_y[0]. |
|
Given an effects matrix ordered first by treatment, transform it to be ordered by outcome. |
|
Return a tuple of array dimensions. |
|
Return the number of elements in the array. |
|
Join a sequence of arrays along a new axis. |
Combine multiple discrete arrays into a single array for stratification purposes. |
|
|
Compute tensor dot product along specified axes for arrays >= 1-D. |
|
Convert an array to a sparse COO array. |
|
Convert an array to a dense numpy array. |
|
Permute the dimensions of an array. |
Transpose a dictionary of dictionaries, bringing the keys from the second level to the top and vice versa. |
|
|
Stack arrays in sequence vertically (row wise). |
Classes
Featurizer that just returns the input data. |
|
|
Placeholder to stand in for a module that couldn't be imported, delaying ImportErrors until use. |
|
Helper class for training different models for each treatment. |
|
Splits the data based on the last feature and trains a separate model for each subsample. |
|
Result summary. |
|
Helper class for assigning weights to models without this option.
- class econml.utilities.IdentityFeatures[source]
Bases:
TransformerMixin
Featurizer that just returns the input data.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- class econml.utilities.MissingModule(msg, exn)[source]
Bases:
object
Placeholder to stand in for a module that couldn’t be imported, delaying ImportErrors until use.
- Parameters:
msg (str) – The message to display when an attempt to access a module member is made
exn (ImportError) – The original ImportError to pass as the source of the exception
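A minimal sketch of how such a placeholder can delay the error, assuming the error is raised on attribute access. The class name `MissingModuleSketch` and the module name `not_a_real_module_xyz` are hypothetical, and this is not necessarily how EconML implements it:

```python
class MissingModuleSketch:
    """Hypothetical stand-in for a module that failed to import."""

    def __init__(self, msg, exn):
        self.msg = msg
        self.exn = exn

    def __getattr__(self, name):
        # Only called for attributes that are not found normally,
        # i.e. anything other than `msg` and `exn`.
        raise ImportError(self.msg) from self.exn


try:
    import not_a_real_module_xyz as optional_dep  # hypothetical optional dependency
except ImportError as exn:
    optional_dep = MissingModuleSketch(
        "not_a_real_module_xyz is required for this feature", exn
    )


def use_optional_dep():
    # The ImportError surfaces here, at first use, not at import time.
    return optional_dep.some_function()
```

Importing the surrounding code succeeds even without the optional dependency; only calling `use_optional_dep()` raises, and the original exception is chained as the cause.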
- class econml.utilities.MultiModelWrapper(model_list=[])[source]
Bases:
object
Helper class for training different models for each treatment.
- Parameters:
model_list (array_like, shape (n_T, )) – List of models to be trained separately for each treatment group.
- fit(Xt, y, sample_weight=None)[source]
Fit underlying list of models with weighted inputs.
- Parameters:
Xt (array_like, shape (n_samples, n_features + n_treatments)) – Training data. The last n_T columns should be a one-hot encoding of the treatment assignment.
y (array_like, shape (n_samples, )) – Target values.
- Returns:
self
- Return type:
an instance of the class
- class econml.utilities.SeparateModel(*models)[source]
Bases:
object
Splits the data based on the last feature and trains a separate model for each subsample.
At predict time, it uses the last feature to choose which model to use to predict.
- class econml.utilities.Summary[source]
Bases:
object
Result summary.
Construction does not take any parameters. Tables and text can be added with the add_ methods.
- tables
Contains the list of SimpleTable instances; horizontally concatenated tables are not saved separately.
- Type:
list of table
- extra_txt
extra lines that are added to the text output, used for warnings and explanations.
- Type:
- add_extra_txt(etext)[source]
Add additional text that will be added at the end in text format.
- Parameters:
etext (list[str]) – string with lines that are added to the text output.
- as_csv()[source]
Return tables as string.
- Returns:
csv – concatenated summary tables in comma delimited format
- Return type:
str
- as_html()[source]
Return tables as string.
- Returns:
html – concatenated summary tables in HTML format
- Return type:
str
- class econml.utilities.WeightedModelWrapper(model_instance, sample_type='weighted')[source]
Bases:
object
Helper class for assigning weights to models without this option.
- Parameters:
model_instance (estimator) – Model that requires weights.
sample_type (str, default ‘weighted’) – Method for adding weights to the model. Use ‘weighted’ for linear regression models, where the weights can be incorporated in the matrix multiplication, and ‘sampled’ for other models. The ‘sampled’ method samples the training set according to the normalized weights and creates a dataset larger than the original.
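The idea behind the ‘weighted’ option can be sketched with plain NumPy: for a linear regression, scaling each row of the design matrix and target by the square root of its weight gives the weighted least-squares solution. This is an illustration of the general technique on synthetic data, not a claim about the wrapper's exact internals:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
w = rng.uniform(0.5, 2.0, size=50)  # illustrative sample weights

# Weighted least squares via the normal equations: (X' W X) beta = X' W y
beta_weighted = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# Same solution from ordinary least squares on sqrt(w)-scaled rows
sqrt_w = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(sqrt_w[:, None] * X, sqrt_w * y, rcond=None)
```

Both coefficient vectors agree up to floating-point error, which is why an unweighted linear solver can be reused on rescaled data.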
- econml.utilities.add_intercept(X)[source]
Add an intercept feature to an array by prepending a column of ones.
- Parameters:
X (array_like) – Input array. Must be 2D.
- Returns:
arr – X with a column of ones prepended
- Return type:
ndarray
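A minimal NumPy sketch of the described behavior, assuming a dense 2D input (the name `add_intercept_sketch` is hypothetical):

```python
import numpy as np


def add_intercept_sketch(X):
    """Prepend a column of ones to a dense 2D array."""
    X = np.asarray(X)
    ones = np.ones((X.shape[0], 1), dtype=X.dtype)
    return np.hstack([ones, X])


X = np.array([[2.0, 3.0], [4.0, 5.0]])
X_with_intercept = add_intercept_sketch(X)
# X_with_intercept -> [[1., 2., 3.], [1., 4., 5.]]
```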
- econml.utilities.broadcast_unit_treatments(X, d_t)[source]
Generate d_t unit treatments for each row of X.
- Parameters:
d_t (int) – Number of treatments
X (array) – Features
- Returns:
X, T – The updated X array (with each row repeated d_t times), and the generated T array
- Return type:
(array, array)
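One way to picture the result, as a sketch for dense inputs: each row of X is repeated d_t times and paired with each unit treatment vector in turn. The helper name is hypothetical and the row ordering shown is an assumption:

```python
import numpy as np


def broadcast_unit_treatments_sketch(X, d_t):
    """Pair every row of X with each of the d_t unit treatment vectors."""
    X = np.asarray(X)
    n = X.shape[0]
    X_rep = np.repeat(X, d_t, axis=0)   # each row repeated d_t times, kept consecutive
    T = np.tile(np.eye(d_t), (n, 1))    # one unit vector per repeated row
    return X_rep, T


X = np.array([[10.0], [20.0]])
X_rep, T = broadcast_unit_treatments_sketch(X, d_t=2)
# X_rep -> [[10.], [10.], [20.], [20.]]
# T     -> [[1., 0.], [0., 1.], [1., 0.], [0., 1.]]
```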
- econml.utilities.check_input_arrays(*args, validate_len=True, force_all_finite=True, dtype=None)[source]
Cast input sequences into numpy arrays.
Only inputs that are sequence-like will be converted; all other inputs will be left as is. When validate_len is True, the sequences will be checked for equal length.
- Parameters:
args (scalar or array_like) – Inputs to be checked.
validate_len (bool, default True) – Whether to check if the input arrays have the same length.
force_all_finite (bool or ‘allow-nan’, default True) – Whether to raise an error on inf and nan in input arrays. True: all values must be finite. ‘allow-nan’: accepts only np.nan and pd.NA values in array; values cannot be infinite.
dtype (‘numeric’, type, list of type, optional) – Argument passed to sklearn.utils.check_array. Specifies data type of result. If None, the dtype of the input is preserved. If “numeric”, dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.
- Returns:
args – List of inputs where sequence-like objects have been cast to numpy arrays.
- Return type:
array_like
- econml.utilities.check_inputs(Y, T, X, W=None, multi_output_T=True, multi_output_Y=True, force_all_finite_X=True, force_all_finite_W=True)[source]
Input validation for CATE estimators.
Checks Y, T, X, W for consistent length and enforces that X and W are 2d. Standard input checks, such as checking that an input does not contain np.nan or np.inf values, are applied to all inputs. Converts regular Python lists to numpy arrays.
- Parameters:
Y (array_like, shape (n, ) or (n, d_y)) – Outcome for the treatment policy.
T (array_like, shape (n, ) or (n, d_t)) – Treatment policy.
X (array_like, shape (n, d_x)) – Feature vector that captures heterogeneity.
W (array_like, shape (n, d_w), optional) – High-dimensional controls.
multi_output_T (bool) – Whether to allow more than one treatment.
multi_output_Y (bool) – Whether to allow more than one outcome.
force_all_finite_X (bool or ‘allow-nan’, default True) – Whether to raise an error on inf and nan in X. True: all values must be finite. ‘allow-nan’: accepts only np.nan and pd.NA values in array; values cannot be infinite.
force_all_finite_W (bool or ‘allow-nan’, default True) – Whether to raise an error on inf and nan in W. True: all values must be finite. ‘allow-nan’: accepts only np.nan and pd.NA values in array; values cannot be infinite.
- Returns:
Y (array_like, shape (n, ) or (n, d_y)) – Converted and validated Y.
T (array_like, shape (n, ) or (n, d_t)) – Converted and validated T.
X (array_like, shape (n, d_x)) – Converted and validated X.
W (array_like, shape (n, d_w), optional) – Converted and validated W.
- econml.utilities.check_models(models, n)[source]
Input validation for metalearner models.
Check whether the input models satisfy the criteria below.
- econml.utilities.concatenate(XS, axis=0)[source]
Join a sequence of arrays along an existing axis.
- Parameters:
XS (sequence of array_like) – The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
axis (int, optional) – The axis along which the arrays will be joined. Default is 0.
- Returns:
The concatenated array. It will be sparse if the inputs are.
- Return type:
ndarray or SparseArray
- econml.utilities.cross_product(*XS)[source]
Compute the cross product of features.
- Parameters:
X1 (n x d1 matrix) – First matrix of n samples of d1 features (or an n-element vector, which will be treated as an n x 1 matrix)
X2 (n x d2 matrix) – Second matrix of n samples of d2 features (or an n-element vector, which will be treated as an n x 1 matrix)
- Returns:
A – Matrix of n samples of d1*d2*… cross product features, arranged in form such that each row t of A contains: [X1[t,0]*X2[t,0]*…, …, X1[t,d1-1]*X2[t,0]*…, X1[t,0]*X2[t,1]*…, …, X1[t,d1-1]*X2[t,1]*…, …]
- Return type:
n x (d1*d2*…) matrix
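The column ordering described above can be sketched for the two-matrix case with `np.einsum`. The helper name is hypothetical, and the sketch only covers dense inputs and exactly two arguments:

```python
import numpy as np


def cross_product_sketch(X1, X2):
    """Row-wise products of all feature pairs, with the X1 index varying fastest."""
    X1 = np.asarray(X1)
    X2 = np.asarray(X2)
    # Treat n-element vectors as n x 1 matrices.
    if X1.ndim == 1:
        X1 = X1.reshape(-1, 1)
    if X2.ndim == 1:
        X2 = X2.reshape(-1, 1)
    n = X1.shape[0]
    # out[t, j, i] = X1[t, i] * X2[t, j]; flattening makes the X1 index vary fastest.
    return np.einsum("ti,tj->tji", X1, X2).reshape(n, -1)


X1 = np.array([[1, 2]])
X2 = np.array([[10, 100, 1000]])
A = cross_product_sketch(X1, X2)
# A -> [[10, 20, 100, 200, 1000, 2000]]
```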
- econml.utilities.deprecated(message, category=<class 'FutureWarning'>)[source]
Enable decorating a method or class to provide a warning when it is used.
- Parameters:
message (str) – The deprecation message to use
category (type, default FutureWarning) – The warning category to use
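A minimal sketch of a decorator with this signature, built on the standard `warnings` module. The name `deprecated_sketch` is hypothetical, and this simplified version only wraps functions and methods, not classes:

```python
import functools
import warnings


def deprecated_sketch(message, category=FutureWarning):
    """Return a decorator that warns each time the wrapped callable is used."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(message, category, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated_sketch("old_add is deprecated; use the + operator instead")
def old_add(a, b):
    return a + b
```

Calling `old_add(1, 2)` still returns the sum, but emits a `FutureWarning` carrying the message.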
- econml.utilities.einsum_sparse(subscripts, *arrs)[source]
Evaluate the Einstein summation convention on the operands.
Using the Einstein summation convention, many common multi-dimensional array operations can be represented in a simple fashion. This function provides a way to compute such summations.
- Parameters:
subscripts (str) – Specifies the subscripts for summation. Unlike np.einsum, ellipses are not supported and the output must be explicitly included
arrs (list of sparse.COO) – These are the sparse arrays for the operation.
- Returns:
The sparse array calculated based on the Einstein summation convention.
- Return type:
SparseArray
- econml.utilities.filter_none_kwargs(**kwargs)[source]
Filter out any keyword arguments that are None.
This is useful when specific optional keyword arguments might not be universally supported, so that stripping them out when they are not set enables more uses to succeed.
- Parameters:
kwargs (dict) – The keyword arguments to filter
- Returns:
filtered_kwargs – The input dictionary, but with all entries having value None removed
- Return type:
dict
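The filtering described above amounts to a one-line dictionary comprehension. A sketch under a hypothetical name:

```python
def filter_none_kwargs_sketch(**kwargs):
    """Drop every keyword argument whose value is None."""
    return {key: value for key, value in kwargs.items() if value is not None}


filtered = filter_none_kwargs_sketch(sample_weight=None, groups=[0, 0, 1], alpha=0)
# filtered -> {"groups": [0, 0, 1], "alpha": 0}
```

Only `None` is removed; other falsy values such as `0` are kept, so the result can be forwarded safely with `**filtered` to a callee that may not accept `sample_weight`.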
- econml.utilities.get_feature_names_or_default(featurizer, feature_names, prefix='feat(X)')[source]
Extract feature names from sklearn transformers. Otherwise attempts to assign default feature names.
Designed to be compatible with old and new sklearn versions.
- econml.utilities.get_input_columns(X, prefix='X')[source]
Extract column names from dataframe-like input object.
Currently supports column name extraction from pandas DataFrame and Series objects.
- Parameters:
X (array_like or None) – Input array with column names to be extracted.
prefix (str, default “X”) – If input array doesn’t have column names, a default using the naming scheme “{prefix}{column number}” will be returned.
- Returns:
cols – List of columns corresponding to the dataframe-like object. None if the input array is not in the supported types.
- Return type:
array_like or None
- econml.utilities.hstack(XS)[source]
Stack arrays in sequence horizontally (column wise).
This is equivalent to concatenation along the second axis.
- Parameters:
XS (sequence of ndarray) – The arrays must have the same shape along all but the second axis.
- Returns:
The array formed by stacking the given arrays. It will be sparse if the inputs are.
- Return type:
ndarray or SparseArray
- econml.utilities.inverse_onehot(T)[source]
Given a one-hot encoding of a value, return a vector reversing the encoding to get numeric treatment indices.
Note that we assume that the first column has been removed from the input.
- Parameters:
T (array (shape (n, d_t-1))) – The one-hot-encoded array
- Returns:
A – The un-encoded 0-based category indices
- Return type:
vector of int (shape (n,))
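Because the first column is assumed to have been dropped, an all-zero row stands for category 0 and a one in column j stands for category j + 1. A NumPy sketch of that decoding, under a hypothetical name:

```python
import numpy as np


def inverse_onehot_sketch(T):
    """Map a one-hot encoding with its first column dropped back to 0-based indices."""
    T = np.asarray(T)
    # Column j (0-based) of the reduced encoding stands for category j + 1;
    # an all-zero row stands for the dropped first category, 0.
    return (T @ np.arange(1, T.shape[1] + 1)).astype(int)


T = np.array([[0, 0], [1, 0], [0, 1]])
indices = inverse_onehot_sketch(T)
# indices -> [0, 1, 2]
```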
- econml.utilities.iscoo(X)[source]
Determine whether an input is a sparse.COO array.
- Parameters:
X (array_like) – The input to check
- Returns:
Whether the input is a COO array
- Return type:
bool
- econml.utilities.issparse(X)[source]
Determine whether an input is sparse.
For the purposes of this function, both scipy.sparse matrices and sparse.SparseArray types are considered sparse.
- Parameters:
X (array_like) – The input to check
- Returns:
Whether the input is sparse
- Return type:
bool
- econml.utilities.jacify_featurizer(featurizer)[source]
Convert a featurizer into a wrapper class that includes a function for calculating the jacobian.
- econml.utilities.one_hot_encoder(sparse=False, **kwargs)[source]
Create a OneHotEncoder.
This handles the breaking name change from sparse to sparse_output between sklearn versions 1.1 and 1.2.
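The name change can be handled by choosing the keyword based on the installed version. The sketch below only builds the keyword arguments and does not import sklearn; the helper name is hypothetical, and it assumes the version is supplied as a plain "major.minor.patch" string:

```python
def one_hot_encoder_kwargs_sketch(sklearn_version, sparse=False, **kwargs):
    """Pick the keyword name that the given sklearn version expects."""
    major, minor = (int(part) for part in sklearn_version.split(".")[:2])
    # From 1.2 onward the parameter is called sparse_output.
    name = "sparse_output" if (major, minor) >= (1, 2) else "sparse"
    return {name: sparse, **kwargs}


old_kwargs = one_hot_encoder_kwargs_sketch("1.1.3", sparse=False, drop="first")
new_kwargs = one_hot_encoder_kwargs_sketch("1.2.0", sparse=False, drop="first")
# old_kwargs -> {"sparse": False, "drop": "first"}
# new_kwargs -> {"sparse_output": False, "drop": "first"}
```

The resulting dictionary could then be passed on as `OneHotEncoder(**kwargs)`.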
- econml.utilities.reshape(X, shape)[source]
Return a new array that is a reshaped version of an input array.
The output will be sparse iff the input is.
- Parameters:
X (array_like) – The array to reshape
shape (tuple of int) – The desired shape of the output array
- Returns:
The reshaped output array
- Return type:
ndarray or SparseArray
- econml.utilities.reshape_Y_T(Y, T)[source]
Reshapes Y and T when Y.ndim = 2 and/or T.ndim = 1.
- Parameters:
Y (array_like, shape (n, ) or (n, 1)) – Outcome for the treatment policy. Must be a vector or single-column matrix.
T (array_like, shape (n, ) or (n, d_t)) – Treatment policy.
- Returns:
Y (array_like, shape (n, )) – Flattened outcome for the treatment policy.
T (array_like, shape (n, 1) or (n, d_t)) – Reshaped treatment policy.
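A sketch of the two reshapes for dense inputs, assuming a two-dimensional Y has exactly one column (the helper name is hypothetical):

```python
import numpy as np


def reshape_Y_T_sketch(Y, T):
    """Flatten a single-column Y and make a 1D T into a column."""
    Y = np.asarray(Y)
    T = np.asarray(T)
    if Y.ndim == 2:
        Y = Y.reshape(-1)      # (n, 1) -> (n,)
    if T.ndim == 1:
        T = T.reshape(-1, 1)   # (n,) -> (n, 1)
    return Y, T


Y = np.array([[1.0], [2.0], [3.0]])
T = np.array([0, 1, 1])
Y_flat, T_col = reshape_Y_T_sketch(Y, T)
# Y_flat.shape -> (3,), T_col.shape -> (3, 1)
```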
- econml.utilities.reshape_arrays_2dim(length, *args)[source]
Reshape the input arrays as two dimensional.
If any entry is None, will be reshaped as (n, 0).
- Parameters:
length (scalar) – Number of samples
args (tuple of array_like) – Inputs to be reshaped
- Returns:
new_args – Output of reshaped arrays
- Return type:
list of array
- econml.utilities.reshape_outcomewise_effects(A, d_y)[source]
Given an effects matrix, reshape second dimension to be consistent with d_y[0].
- Parameters:
A (array) – The effects array to be reshaped. It should have shape (m,) or (m, d_y).
d_y (tuple of int) – Either () if Y was a vector, or a 1-tuple of the number of columns of Y if it was an array.
- Returns:
A –
- The reshaped effects array with shape:
(m, ) if d_y is () and Y is a vector,
(m, d_y) if d_y is a 1-tuple and Y is an array.
- Return type:
array
- econml.utilities.reshape_treatmentwise_effects(A, d_t, d_y)[source]
Given an effects matrix ordered first by treatment, transform it to be ordered by outcome.
- Parameters:
A (array) – The array of effects, of size n*d_y*d_t
d_t (tuple of int) – Either () if T was a vector, or a 1-tuple of the number of columns of T if it was an array
d_y (tuple of int) – Either () if Y was a vector, or a 1-tuple of the number of columns of Y if it was an array
- Returns:
A – The transformed array. Note that singleton dimensions will be dropped for any inputs which were vectors, as in the specification of BaseCateEstimator.marginal_effect.
- Return type:
array (shape (m, d_y, d_t))
- econml.utilities.size(X)[source]
Return the number of elements in the array.
- Parameters:
X (array_like) – Input data
- Returns:
The number of elements of the array
- Return type:
int
- econml.utilities.stack(XS, axis=0)[source]
Join a sequence of arrays along a new axis.
The axis parameter specifies the index of the new axis in the dimensions of the result. For example, if axis=0 it will be the first dimension and if axis=-1 it will be the last dimension.
- Parameters:
XS (sequence of array_like) – Each array must have the same shape
axis (int, optional) – The axis in the result array along which the input arrays are stacked
- Returns:
The stacked array, which has one more dimension than the input arrays. It will be sparse if the inputs are.
- Return type:
ndarray or SparseArray
- econml.utilities.strata_from_discrete_arrays(arrs)[source]
Combine multiple discrete arrays into a single array for stratification purposes.
For example, if arrs is [[0 1 2 0 1 2 0 1 2 0 1 2], [0 1 0 1 0 1 0 1 0 1 0 1], [0 0 0 0 0 0 1 1 1 1 1 1]] then output will be [0 8 4 6 2 10 1 9 5 7 3 11]
Every distinct combination of these discrete arrays will have its own label.
- econml.utilities.tensordot(X1, X2, axes)[source]
Compute tensor dot product along specified axes for arrays >= 1-D.
- Parameters:
X1, X2 (array_like, len(shape) >= 1) – Tensors to “dot”
axes (int or (2,) array_like) –
- integer_like
If an int N, sum over the last N axes of X1 and the first N axes of X2 in order. The sizes of the corresponding axes must match
- (2,) array_like
Or, a list of axes to be summed over, first sequence applying to X1, second to X2. Both elements array_like must be of the same length.
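The two forms of `axes` can be illustrated with NumPy's own `np.tensordot`, assuming (as the parameter description suggests) that this function follows the same convention for dense inputs:

```python
import numpy as np

X1 = np.arange(24).reshape(2, 3, 4)
X2 = np.arange(60).reshape(3, 4, 5)

# Integer form: sum over the last 2 axes of X1 and the first 2 axes of X2.
by_count = np.tensordot(X1, X2, axes=2)

# Pair-of-sequences form: name the axes to sum over explicitly.
by_axes = np.tensordot(X1, X2, axes=([1, 2], [0, 1]))

# Both results have shape (2, 5) and hold the same values.
```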
- econml.utilities.tocoo(X)[source]
Convert an array to a sparse COO array.
If the input is already a sparse.COO object, this returns the object directly; otherwise it is converted.
- econml.utilities.todense(X)[source]
Convert an array to a dense numpy array.
If the input is already a numpy array, this may create a new copy.
- econml.utilities.transpose(X, axes=None)[source]
Permute the dimensions of an array.
- Parameters:
X (array_like) – Input array.
axes (list of int, optional) – By default, reverse the dimensions, otherwise permute the axes according to the values given
- Returns:
p – X with its axes permuted. This will be sparse if X is.
- Return type:
ndarray or SparseArray
- econml.utilities.transpose_dictionary(d)[source]
Transpose a dictionary of dictionaries, bringing the keys from the second level to the top and vice versa.
- Parameters:
d (dict) – The dictionary to transpose; the values of this dictionary should all themselves be dictionaries
- Returns:
output – The output dictionary with first- and second-level keys swapped
- Return type:
dict
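A pure-Python sketch of the key swap, under a hypothetical name; inner dictionaries need not share the same keys:

```python
def transpose_dictionary_sketch(d):
    """Swap the first- and second-level keys of a dict of dicts."""
    output = {}
    for outer_key, inner in d.items():
        for inner_key, value in inner.items():
            output.setdefault(inner_key, {})[outer_key] = value
    return output


scores = {"model_a": {"train": 0.9, "test": 0.8}, "model_b": {"train": 0.7}}
by_split = transpose_dictionary_sketch(scores)
# by_split -> {"train": {"model_a": 0.9, "model_b": 0.7}, "test": {"model_a": 0.8}}
```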
- econml.utilities.vstack(XS)[source]
Stack arrays in sequence vertically (row wise).
This is equivalent to concatenation along the first axis after 1-D arrays of shape (N,) have been reshaped to (1,N).
- Parameters:
XS (sequence of ndarray) – The arrays must have the same shape along all but the first axis. 1-D arrays must have the same length.
- Returns:
The array formed by stacking the given arrays, will be at least 2-D. It will be sparse if the inputs are.
- Return type:
ndarray or SparseArray