Validation

Validating causal estimates is inherently challenging, as the true counterfactual outcome for a given treatment is unobservable. However, there are several checks and tools available in EconML to help assess the credibility of causal estimates.

Sensitivity Analysis

For many EconML estimators, unobserved confounding can lead to biased causal estimates. Moreover, it is impossible to prove the absence of unobserved confounders. This is a fundamental problem for observational causal inference.

To mitigate this problem, EconML provides a suite of sensitivity analysis tools, based on [Chernozhukov2022], to assess the robustness of causal estimates to unobserved confounding.

Specifically, select estimators (subclasses of DML and DRLearner) have access to sensitivity_analysis, robustness_value, and sensitivity_summary methods.

sensitivity_analysis provides an updated confidence interval for the ATE based on a specified level of unobserved confounding.

robustness_value computes the minimum level of unobserved confounding required so that confidence intervals around the ATE would begin to include the given point (0 by default).

sensitivity_summary provides a summary of the the two above methods.

DRTester

EconML provides the DRTester class, which implements Best Linear Predictor (BLP), calibration r-squared, and uplift modeling methods for validation.

See an example notebook here.

Scoring

Many EconML estimators implement a .score method to evaluate the goodness-of-fit of the final model. While it may be difficult to make direct sense of results from .score, EconML offers the RScorer class to facilitate model selection based on scoring.

RScorer enables comparison and selection among different causal models.

See an example notebook here.

Confidence Intervals and Inference

Most EconML estimators allow for inference, including standard errors, confidence intervals, and p-values for estimated effects. A common validation approach is to check whether the p-values are below a chosen significance level (e.g., 0.05). If not, the null hypothesis that the causal effect is zero cannot be rejected.

Note: Inference results are only valid if the model specification is correct. For example, if a linear model is used but the true data-generating process is nonlinear, the inference may not be reliable. It is generally not possible to guarantee correct specification, so p-value inspection should be considered a surface-level check.

DoWhy Refutation Tests

The DoWhy library, which complements EconML, includes several refutation tests for validating causal estimates. These tests work by comparing the original causal estimate to estimates obtained from perturbed versions of the data, helping to assess the robustness of causal conclusions.