# econml.sklearn_extensions.linear_model.SelectiveRegularization

class econml.sklearn_extensions.linear_model.SelectiveRegularization(unpenalized_inds, penalized_model, fit_intercept=True)[source]

Bases: object

Estimator of a linear model where regularization is applied to only a subset of the coefficients.

Assume that our loss is

$$\ell(\beta_1, \beta_2) = \lVert y - X_1 \beta_1 - X_2 \beta_2 \rVert^2 + f(\beta_2)$$

so that we’re regularizing only the coefficients in $\beta_2$.

Then, since $\beta_1$ doesn’t appear in the penalty, the problem of finding $\beta_1$ to minimize the loss once $\beta_2$ is known reduces to an ordinary OLS regression, so that:

$$\beta_1 = (X_1^\top X_1)^{-1}X_1^\top(y - X_2 \beta_2)$$

Plugging this into the loss, we obtain

$$\begin{split} & \lVert y - X_1 (X_1^\top X_1)^{-1}X_1^\top(y - X_2 \beta_2) - X_2 \beta_2 \rVert^2 + f(\beta_2) \\ = {} & \lVert (I - X_1 (X_1^\top X_1)^{-1}X_1^\top)(y - X_2 \beta_2) \rVert^2 + f(\beta_2)\end{split}$$

But, letting $M_{X_1} = I - X_1 (X_1^\top X_1)^{-1}X_1^\top$, we see that this is

$$\lVert (M_{X_1} y) - (M_{X_1} X_2) \beta_2 \rVert^2 + f(\beta_2)$$

so the minimizing $\beta_2$ can be found by regressing $M_{X_1} y$ on $M_{X_1} X_2$ using the penalized regression method incorporating $f$. Note that these are just the residuals of $y$ and $X_2$ after regressing each on $X_1$ using OLS.
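The residualization identity above can be checked numerically. The sketch below takes the unpenalized case ($f = 0$, so both problems are plain OLS) and confirms that the $\beta_2$ from a joint regression on $[X_1, X_2]$ matches the $\beta_2$ from regressing $M_{X_1} y$ on $M_{X_1} X_2$; all names here are illustrative, not part of econml's API.

```python
import numpy as np

# Illustrative check of the derivation with f = 0 (no penalty).
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=(n, 2))   # unpenalized features
X2 = rng.normal(size=(n, 3))   # (would-be) penalized features
y = X1 @ [1.0, -2.0] + X2 @ [0.5, 0.0, 3.0] + rng.normal(scale=0.1, size=n)

# beta_2 from a joint OLS regression on [X1, X2]
beta = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)[0]
beta2_joint = beta[2:]

# beta_2 from regressing the residualized y on the residualized X2,
# i.e. applying the annihilator M_{X1} = I - X1 (X1'X1)^{-1} X1'
P = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
beta2_resid = np.linalg.lstsq(X2 - P @ X2, y - P @ y, rcond=None)[0]

assert np.allclose(beta2_joint, beta2_resid)  # the two solutions coincide
```

With a nonzero penalty $f$, the same residualized design is simply handed to the penalized model instead of OLS.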

Parameters
• unpenalized_inds (list of int, other 1-dimensional indexing expression, or callable) – The indices that should not be penalized when the model is fit; all other indices will be penalized. If this is a callable, it will be called with the arguments to fit and should return a corresponding indexing expression. For example, `lambda X, y: slice(1, -1)` will result in only the first and last indices being penalized.

• penalized_model (regressor) – A penalized linear regression model

• fit_intercept (bool, optional, default True) – Whether to fit an intercept; the intercept will not be penalized if it is fit
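The callable form of `unpenalized_inds` just needs to map `(X, y)` to an indexing expression; the sketch below (with a made-up feature matrix, not econml code) shows which columns `slice(1, -1)` selects as unpenalized, leaving only the first and last penalized.

```python
import numpy as np

# A callable unpenalized_inds receives the X and y passed to fit and
# returns an indexing expression over the feature axis.
unpenalized_inds = lambda X, y: slice(1, -1)

X = np.arange(12).reshape(3, 4)        # hypothetical matrix with 4 features
inds = unpenalized_inds(X, None)
print(np.arange(X.shape[1])[inds])     # columns left unpenalized: [1 2]
```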

coef_

Estimated coefficients for the linear regression problem. If multiple targets are passed during fit (y is 2D), this is a 2D array of shape (n_targets, n_features); if only one target is passed, this is a 1D array of length n_features.

Type

array, shape (n_features,) or (n_targets, n_features)

intercept_

Independent term in the linear model.

Type

float or array of shape (n_targets,)

penalized_model

The penalized linear regression model, cloned from the one passed into the initializer

Type

regressor

__init__(unpenalized_inds, penalized_model, fit_intercept=True)[source]

Methods

__init__(unpenalized_inds, penalized_model[, fit_intercept]) — Initialize the estimator.

fit(X, y[, sample_weight]) — Fit the model.

predict(X) — Make a prediction for each sample.

score(X, y) — Score the predictions for a set of features against the ground truth.

Attributes

known_params

fit(X, y, sample_weight=None)[source]

Fit the model.

Parameters
• X (array-like, shape (n, d_x)) – The features to regress against

• y (array-like, shape (n,) or (n, d_y)) – The regression target

• sample_weight (array-like, shape (n,), optional, default None) – Relative weights for each sample

predict(X)[source]

Make a prediction for each sample.

Parameters
• X (array-like, shape (m, d_x)) – The samples whose targets to predict

Returns

arr (array-like, shape (m,) or (m, d_y)) – The predicted targets

score(X, y)[source]

Score the predictions for a set of features against the ground truth.

Parameters
• X (array-like, shape (m, d_x)) – The samples to predict

• y (array-like, shape (m,) or (m, d_y)) – The ground truth targets

Returns

score (float) – The model’s score
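The fit/predict flow documented above can be sketched end to end in plain NumPy. This is a minimal illustration of the technique, not econml's implementation: it uses a closed-form ridge penalty as $f(\beta_2)$ where the real class delegates to the cloned `penalized_model`, and it ignores intercepts and sample weights.

```python
import numpy as np

def fit_selective(X, y, unpenalized_inds, alpha=1.0):
    """Sketch of selective regularization with a ridge penalty as f."""
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[unpenalized_inds] = True
    X1, X2 = X[:, mask], X[:, ~mask]
    # Residualize y and X2 on X1 (apply M_{X1})
    P = X1 @ np.linalg.pinv(X1.T @ X1) @ X1.T
    Z, r = X2 - P @ X2, y - P @ y
    # Ridge on the residualized problem: beta2 = (Z'Z + alpha I)^{-1} Z'r
    beta2 = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ r)
    # Recover beta1 by OLS on what the penalized part leaves unexplained
    beta1 = np.linalg.lstsq(X1, y - X2 @ beta2, rcond=None)[0]
    coef = np.empty(X.shape[1])
    coef[mask], coef[~mask] = beta1, beta2
    return coef

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ [2.0, -1.0, 0.0, 0.5] + rng.normal(scale=0.1, size=100)

coef = fit_selective(X, y, [0], alpha=1e-6)  # only column 0 unpenalized
pred = X @ coef                              # the predict step
```

With a near-zero penalty this recovers the true coefficients; increasing `alpha` shrinks only the penalized entries, leaving the coefficient on column 0 fit by OLS.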