Library Flow Chart


Did you explicitly randomize treatment across a targeted audience?


Did everyone in the experiment receive the treatment to which they were assigned? For example, if you are interested in the causal effect of joining a loyalty program, a useful experiment might randomly send some customers an email prompting them to join. Only some of the targeted customers will join, so compliance is imperfect. By contrast, in a medical experiment where patients come into an office each week and receive either a novel drug or a placebo, compliance with the assigned group is likely to be perfect.
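One way to answer the compliance question from data is to compare the assigned and received treatment indicators directly. The sketch below is only an illustration: the function name `compliance_rate` and the toy loyalty-program data are hypothetical and not part of the library.

```python
def compliance_rate(assigned, received):
    """Share of units whose received treatment matches their assignment."""
    if len(assigned) != len(received):
        raise ValueError("assigned and received must have the same length")
    if not assigned:
        raise ValueError("need at least one unit")
    matches = sum(1 for a, r in zip(assigned, received) if a == r)
    return matches / len(assigned)

# Hypothetical data: assigned = 1 if the customer was sent the email,
# received = 1 if the customer actually joined the loyalty program.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
received = [1, 0, 0, 1, 0, 0, 0, 0]

rate = compliance_rate(assigned, received)
perfect_compliance = rate == 1.0
print(f"compliance rate: {rate:.2f}, perfect: {perfect_compliance}")
```

A rate below 1 means assignment and treatment are not the same variable, which is the situation the imperfect-compliance branch of the chart is meant for.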


If treatment was not assigned by an experiment (i.e., it was not controlled), there will be multiple reasons why people received the treatment. Suppose you are interested in the causal effect of minutes of exercise per week (the treatment variable) on body fat percentage (the outcome). Some people exercise more minutes than others. Reasons for this (controls) may include having a more flexible work schedule or being more health conscious. Some of these controls may be confounders (i.e., they also affect the outcome directly). In this example, being more health conscious is probably a confounder (it affects nutrition, which affects body fat), whereas having a more flexible work schedule is probably not. The question posed in this bubble is whether all confounding controls are measured in the data.
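The exercise example can be made concrete with a small simulation. Everything in the sketch below is an assumption made for illustration (the coefficients, the sample size, and the helper names `slope` and `demean_within`); it uses only the Python standard library rather than any estimator from this library. It shows that leaving out a measured confounder biases the estimated effect, while controlling for it recovers the effect.

```python
import random

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_within(values, groups):
    """Subtract each group's mean, which controls for a categorical variable."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

rng = random.Random(0)
n = 20000
true_effect = -0.02  # assumed change in body fat (points) per minute of exercise

health_conscious = [rng.random() < 0.5 for _ in range(n)]  # confounder
flexible_schedule = [rng.random() < 0.5 for _ in range(n)]  # affects treatment only
minutes = [60 + 60 * h + 40 * f + rng.gauss(0, 20)
           for h, f in zip(health_conscious, flexible_schedule)]
# Health consciousness also lowers body fat directly (through nutrition).
body_fat = [30 + true_effect * t - 5 * h + rng.gauss(0, 2)
            for t, h in zip(minutes, health_conscious)]

naive = slope(minutes, body_fat)
adjusted = slope(demean_within(minutes, health_conscious),
                 demean_within(body_fat, health_conscious))
print(f"true {true_effect}, naive {naive:.3f}, adjusted {adjusted:.3f}")
```

Note that the flexible-schedule variable never needs to enter the adjustment: it moves the treatment but not the outcome, so omitting it does not bias the estimate.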

Can set W

If you run an experiment, there are no concerns about confounding. Any features included in the control set will improve the efficiency of the estimate in smaller samples, but they are not necessary to identify the causal effect.
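The efficiency point can be illustrated by repeating a small simulated experiment many times and comparing the spread of the estimates with and without a control. This is a sketch under assumed numbers (effect of 2.0, one strongly predictive feature, 200 units per experiment), with hypothetical helper names, and it uses plain regression adjustment rather than any estimator from this library.

```python
import random
import statistics

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, x):
    """Residuals from regressing v on x, with an intercept."""
    b = slope(x, v)
    mx, mv = statistics.fmean(x), statistics.fmean(v)
    return [vi - mv - b * (xi - mx) for vi, xi in zip(v, x)]

def one_experiment(rng, n=200, true_effect=2.0):
    x = [rng.gauss(0, 1) for _ in range(n)]          # pre-treatment feature
    t = [int(rng.random() < 0.5) for _ in range(n)]  # randomized treatment
    y = [true_effect * ti + 5 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]
    unadjusted = slope(t, y)  # equals the difference in group means
    adjusted = slope(residualize(t, x), residualize(y, x))  # controls for x
    return unadjusted, adjusted

rng = random.Random(0)
results = [one_experiment(rng) for _ in range(200)]
unadjusted_estimates = [r[0] for r in results]
adjusted_estimates = [r[1] for r in results]

mean_unadjusted = statistics.fmean(unadjusted_estimates)
mean_adjusted = statistics.fmean(adjusted_estimates)
sd_unadjusted = statistics.stdev(unadjusted_estimates)
sd_adjusted = statistics.stdev(adjusted_estimates)
print(f"unadjusted: mean {mean_unadjusted:.2f}, sd {sd_unadjusted:.2f}")
print(f"adjusted:   mean {mean_adjusted:.2f}, sd {sd_adjusted:.2f}")
```

Both estimators center on the true effect because treatment is randomized; the adjusted one simply varies less from experiment to experiment.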


Bias from confounding is eliminated when all confounders are measured and included as controls. See the Orthogonal/Double Machine Learning section for guidance on when this assumption holds.

Set Z to intended treatment

With imperfect compliance, the indicator of the assigned treatment category is not equivalent to the indicator for actually receiving the treatment. If you are interested in the causal effect of the treatment, you should use assignment as an instrument for treatment, rather than simply treating assignment as the treatment itself.
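A minimal sketch of using assignment as an instrument is the ratio of two differences in means: the effect of assignment on the outcome, divided by the effect of assignment on take-up. The simulation below is illustrative only. The data-generating numbers, the unobserved `enthusiasm` variable, and the function name `wald_estimate` are assumptions, and the code does not use this library's IV estimators.

```python
import random

def mean(values):
    return sum(values) / len(values)

def wald_estimate(assigned, treated, outcome):
    """Return (effect of assignment on outcome, on take-up, and their ratio)."""
    y1 = mean([y for z, y in zip(assigned, outcome) if z == 1])
    y0 = mean([y for z, y in zip(assigned, outcome) if z == 0])
    t1 = mean([t for z, t in zip(assigned, treated) if z == 1])
    t0 = mean([t for z, t in zip(assigned, treated) if z == 0])
    assignment_effect = y1 - y0
    take_up = t1 - t0
    return assignment_effect, take_up, assignment_effect / take_up

rng = random.Random(1)
n = 40000
true_effect = 5.0  # assumed effect of joining the program on spending

assigned = [int(rng.random() < 0.5) for _ in range(n)]  # randomized email
enthusiasm = [rng.random() for _ in range(n)]            # unobserved
# A customer joins if enthusiasm exceeds a threshold; the email lowers it.
treated = [int(u > (0.5 if z else 0.9)) for z, u in zip(assigned, enthusiasm)]
outcome = [10 + true_effect * t + 8 * u + rng.gauss(0, 1)
           for t, u in zip(treated, enthusiasm)]

assignment_effect, take_up, wald = wald_estimate(assigned, treated, outcome)
print(f"effect of assignment: {assignment_effect:.2f}")
print(f"difference in take-up: {take_up:.2f}")
print(f"instrumented effect of treatment: {wald:.2f}")
```

Treating assignment as the treatment would report the first number, which is diluted by everyone whose behavior the email did not change. Dividing by the take-up difference rescales it to the effect of the treatment itself.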


Many estimators only behave well with a small set of specified features X that affect the size of a user’s response to the treatment. If you do not already know which few features might reasonably affect the user’s response, use one of our sparse estimators that can handle large feature sets and penalize them to discover the features that are most correlated with treatment effect heterogeneity.


Some estimators identify the causal effect of a treatment by considering only a subset of the variation in treatment intensity that is conditionally random given other data features. This subset of the variation is driven by an instrument, which is often some kind of randomization (e.g., an earlier experiment or a lottery). See the Instrumental Variable Regression section for more information on picking a good instrument.
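To make "a subset of the variation" concrete, the sketch below simulates a continuous treatment intensity that is partly driven by a lottery and partly by an unobserved factor that also affects the outcome. With a single instrument, the textbook estimate is cov(Z, Y) / cov(Z, T), which uses only the lottery-driven variation. All numbers and variable names here are assumptions for illustration, and the code does not call this library.

```python
import random

def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(2)
n = 40000
true_effect = 1.5  # assumed effect of one unit of treatment intensity

lottery = [int(rng.random() < 0.5) for _ in range(n)]  # instrument Z
unobserved = [rng.gauss(0, 1) for _ in range(n)]        # unmeasured confounder
intensity = [2 * z + 3 * u + rng.gauss(0, 1)
             for z, u in zip(lottery, unobserved)]      # treatment T
outcome = [true_effect * t + 4 * u + rng.gauss(0, 1)
           for t, u in zip(intensity, unobserved)]      # outcome Y

# Uses all variation in T, including the confounded part.
ols = covariance(intensity, outcome) / covariance(intensity, intensity)
# Uses only the variation in T that the lottery induces.
iv = covariance(lottery, outcome) / covariance(lottery, intensity)
print(f"true {true_effect}, OLS {ols:.2f}, IV {iv:.2f}")
```

The IV estimate is noisier than OLS because it discards most of the variation in the treatment, but it is centered on the true effect, whereas OLS is not.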


Some estimators impose the assumption that the outcome is a linear function of the treatment. These estimators can also estimate a non-linear relationship between a treatment and the outcome if the structure of the relationship is known and additively separable (for example, the linear function could include both treatment and treatment-squared for continuous treatments). These linear functions can also include specified interactions between treatments. However, these estimators cannot estimate a fully flexible non-parametric relationship between treatments and the outcome (for example, the relationship cannot be modeled by a forest).
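The functional form described above can be written out as a sketch. The symbols are notational assumptions rather than part of the text: \theta_1(X) and \theta_2(X) are effect coefficients that may vary with features X, g(X, W) is an unrestricted function of features and controls, and \varepsilon is noise.

```latex
Y = \theta_1(X)\, T + \theta_2(X)\, T^2 + g(X, W) + \varepsilon
```

The model is linear in the known treatment features (T, T^2) even though it is non-linear in T itself. What it rules out is replacing the first two terms with an unknown function h(T, X) that would have to be learned non-parametrically.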


The CATE function determines how the size of a user’s response to the treatment varies by user features. Some estimators impose the assumption that effect size is a linear function of user features.
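As a notational sketch (the symbols \theta, \beta, and d are assumptions introduced here), a linear CATE over d user features takes the form:

```latex
\theta(X) = \beta_0 + \beta_1 X_1 + \dots + \beta_d X_d
```

Under this assumption, \beta_0 is the effect for a user whose features are all zero and each \beta_j is the change in effect size per unit change in feature X_j. Estimators that do not impose it leave \theta(X) as an unrestricted function of X.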


The MetaLearner and DRLearner estimators offer the choice of any ML estimation model in all stages and allow for model selection via cross-validation. This enhances flexibility, but because the sample data is used to choose among models, it is impossible to calculate honest analytic confidence intervals. Moreover, most ML estimation approaches introduce bias for regularization purposes, so as to optimally balance bias and variance. Hence, confidence intervals based on such biased estimates will be invalid. For these models it is still possible to construct bootstrap confidence intervals, but this process is slow and may not be accurate in small samples, and these intervals capture only the variance, not the bias, of the model.
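The bootstrap procedure mentioned above can be sketched generically: resample the data with replacement, re-run the estimator on each resample, and take percentiles of the resulting estimates. The code below is a minimal standard-library illustration with a simple difference-in-means estimator standing in for an ML-based one. The function names and simulated data are assumptions, and this is not the library's own bootstrap implementation.

```python
import random

def difference_in_means(rows):
    """rows is a list of (treated, outcome) pairs."""
    treated = [y for t, y in rows if t == 1]
    control = [y for t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def bootstrap_ci(estimator, rows, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval: re-estimate on n_boot resamples."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))  # one full re-fit per resample
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated data with an assumed true effect of 2.0.
data_rng = random.Random(3)
rows = []
for _ in range(400):
    t = int(data_rng.random() < 0.5)
    rows.append((t, 2.0 * t + data_rng.gauss(0, 1)))

point = difference_in_means(rows)
lower, upper = bootstrap_ci(difference_in_means, rows)
print(f"estimate {point:.2f}, 95% bootstrap interval [{lower:.2f}, {upper:.2f}]")
```

The loop makes the cost visible: the estimator is re-fit `n_boot` times, which is why the approach is slow for ML models. The interval reflects only how much the estimate moves across resamples, so any regularization bias shared by every re-fit is not captured.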