Bruno Gonçalves

Causal Inference

Front-door Criterion

This is the twelfth post in the series in which we work our way through “Causal Inference In Statistics”, a nice Primer co-authored by Judea Pearl himself.

You can find the previous post here and all the relevant Python code in the companion GitHub Repository:

While I will do my best to introduce the content in a clear and accessible way, I highly recommend that you get the book yourself and follow along. So, without further ado, let’s get started!

3.4 — Front-Door Criterion

The Front-Door Criterion is a complementary approach to identifying sets of variables we can use in order to estimate causal effects from non-experimental data. It is particularly useful when we are unable to identify any sets of variables that obey the Backdoor Criterion discussed previously.

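Pearl motivates the Front-Door criterion by going back to the smoking-cancer problem, using this DAG, in which Smoking (X) affects Cancer (Y) only through Tar deposits (Z) while the unmeasured Genotype (U) influences both X and Y: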

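Here our goal is to estimate the causal effect of Smoking (X) on Cancer (Y), while being unable to directly measure the Genotype (U). From the DAG we can see that no measured variable satisfies the back-door criterion for this effect, as U is unmeasured. The effect of X on Z, however, is identifiable because the only back-door path from X to Z is blocked by the collider at Y, so we can immediately write:

P(Z = z | do(X = x)) = P(Z = z | X = x)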

We can also directly identify the effect of Tar on Cancer by using the back-door criterion, conditioning on X to block the back-door path that runs through it:

P(Y = y | do(Z = z)) = Σ_x P(Y = y | Z = z, X = x) P(X = x)

Now we can chain the two expressions together to obtain the causal effect of X on Y:

P(Y = y | do(X = x)) = Σ_z P(Y = y | do(Z = z)) P(Z = z | do(X = x))

The motivation for this expression is clear if we consider a two-stage intervention. If we set the value of X, we can determine the resulting distribution of Z, and we can then intervene again to fix Z at each of its values. By doing this for every value of Z, and weighting each one by how likely it is given X, we are able to determine the effect of X on Y! The general expression, known as the front-door formula, is:

P(y | do(x)) = Σ_z P(z | x) Σ_x' P(y | x', z) P(x')

To complete this example, let us consider the values given by this contingency table:

From there we can easily compute P(Cancer | Tar, Smoker):

Conditional Probability of developing cancer given Tar and Smoking status

implying that Non-Smokers are a lot more likely to develop cancer! This counter-intuitive effect is due to limitations of the data we collected, where most non-smokers had cancer and most smokers didn’t.
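As a minimal sketch of this step, the conditional probabilities can be computed directly from the cell counts. The counts below are an assumption: they are taken to match Table 3.1 of the Primer (in thousands), since the table itself is not reproduced here, and the names `counts`, `p_cancer_given` and `p_cancer_given_smoker` are purely illustrative.

```python
# Assumed counts (in thousands), taken to match Table 3.1 of the Primer.
# Keys are (tar, smoker); values are (cancer, no_cancer).
counts = {
    (True, True): (57, 323),
    (True, False): (19, 1),
    (False, True): (2, 18),
    (False, False): (342, 38),
}


def p_cancer_given(tar, smoker):
    """P(Cancer | Tar, Smoker) estimated from the raw counts."""
    cancer, no_cancer = counts[(tar, smoker)]
    return cancer / (cancer + no_cancer)


def p_cancer_given_smoker(smoker):
    """P(Cancer | Smoker), pooling both tar groups."""
    cancer = sum(counts[(tar, smoker)][0] for tar in (True, False))
    total = sum(sum(counts[(tar, smoker)]) for tar in (True, False))
    return cancer / total


for tar in (True, False):
    for smoker in (True, False):
        p = p_cancer_given(tar, smoker)
        print(f"P(Cancer | Tar={tar}, Smoker={smoker}) = {p:.4f}")

for smoker in (True, False):
    print(f"P(Cancer | Smoker={smoker}) = {p_cancer_given_smoker(smoker):.4f}")
```

Under these assumed counts, non-smokers show the higher cancer rate both within each tar group and overall, which is the counter-intuitive pattern described above.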

However, by applying the front-door formula above we do recover the correct effect (see notebook for the detailed computation):
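The notebook has the author's detailed computation. As a rough, independent sketch, the front-door formula P(y | do(x)) = Σ_z P(z | x) Σ_x' P(y | x', z) P(x') can be evaluated from cell counts alone. The counts are again an assumption (taken to match Table 3.1 of the Primer, in thousands), and the helpers `n` and `front_door` are hypothetical names, not part of the companion code.

```python
# Assumed counts (in thousands), taken to match Table 3.1 of the Primer.
# Keys are (smoker, tar, cancer).
counts = {
    (True, True, True): 57, (True, True, False): 323,
    (False, True, True): 19, (False, True, False): 1,
    (True, False, True): 2, (True, False, False): 18,
    (False, False, True): 342, (False, False, False): 38,
}
total = sum(counts.values())


def n(x=None, z=None, y=None):
    """Sum the counts of all cells matching the given smoker/tar/cancer values."""
    return sum(
        c for (cx, cz, cy), c in counts.items()
        if (x is None or cx == x)
        and (z is None or cz == z)
        and (y is None or cy == y)
    )


def front_door(x, y=True):
    """P(Y=y | do(X=x)) = sum_z P(z | x) * sum_x' P(y | x', z) * P(x')."""
    effect = 0.0
    for z in (True, False):
        p_z_given_x = n(x=x, z=z) / n(x=x)
        inner = sum(
            n(x=xp, z=z, y=y) / n(x=xp, z=z) * n(x=xp) / total
            for xp in (True, False)
        )
        effect += p_z_given_x * inner
    return effect


print(f"P(Cancer | do(Smoker))     = {front_door(True):.4f}")
print(f"P(Cancer | do(Non-Smoker)) = {front_door(False):.4f}")
```

With these assumed counts the two values come out at roughly 0.5475 and 0.5025, so smoking raises the probability of cancer by about 4.5 percentage points, reversing the ordering seen in the raw conditional probabilities.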

The Front-Door criterion is simply the rule that allows us to determine which variables (like Tar in the example above) allow for this kind of computation. The book defines it as:

Front-Door Criterion: A set of variables Z is said to satisfy the front-door criterion relative to an ordered pair of variables (X, Y), if:

1. Z intercepts all directed paths from X to Y

2. There is no unblocked backdoor path from X to Z

3. All backdoor paths from Z to Y are blocked by X

This essentially means that, by controlling Z, we are able to control all the causal paths between X and Y, and that there are no unblocked backdoor paths that could lead to spurious correlations between X, Y and Z. If we can identify a set of variables that obeys the Front-Door Criterion, then we can directly obtain the Front-Door Formula using:

Front-Door Adjustment: If Z satisfies the front-door criterion relative to (X, Y) and if P(x, z) > 0, then the causal effect of X on Y is identifiable and is given by:

P(y | do(x)) = Σ_z P(z | x) Σ_x' P(y | x', z) P(x')
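One way to convince ourselves that the adjustment works is to check it on a toy model where the truth is known. The sketch below builds a small binary structural model with the front-door structure (U → X, U → Y, X → Z → Y), marginalizes out U to get the observed distribution, and compares the front-door estimate with the exact interventional probability. All the numerical parameters and function names are made up for illustration and do not come from the book.

```python
from itertools import product


def bern(p, v):
    """Probability that a Bernoulli(p) variable takes the value v (0 or 1)."""
    return p if v == 1 else 1.0 - p


# Hypothetical mechanisms: U is hidden, X depends on U, Z on X, Y on Z and U.
def joint(u, x, z, y):
    return (
        bern(0.5, u)
        * bern(0.2 + 0.6 * u, x)
        * bern(0.1 + 0.8 * x, z)
        * bern(0.1 + 0.5 * z + 0.3 * u, y)
    )


# Observed distribution P(x, z, y), with the unmeasured U summed out.
p_obs = {
    (x, z, y): sum(joint(u, x, z, y) for u in (0, 1))
    for x, z, y in product((0, 1), repeat=3)
}


def p(x=None, z=None, y=None):
    """Marginal probability of the given values under the observed distribution."""
    return sum(
        v for (cx, cz, cy), v in p_obs.items()
        if x in (None, cx) and z in (None, cz) and y in (None, cy)
    )


def front_door(x, y=1):
    """Front-door estimate of P(Y=y | do(X=x)) using observed quantities only."""
    return sum(
        p(x=x, z=z) / p(x=x)
        * sum(p(x=xp, z=z, y=y) / p(x=xp, z=z) * p(x=xp) for xp in (0, 1))
        for z in (0, 1)
    )


def truth(x, y=1):
    """Exact P(Y=y | do(X=x)): cut the U -> X arrow and average over U and Z."""
    return sum(
        bern(0.5, u) * bern(0.1 + 0.8 * x, z) * bern(0.1 + 0.5 * z + 0.3 * u, y)
        for u in (0, 1)
        for z in (0, 1)
    )


for x in (0, 1):
    naive = p(x=x, y=1) / p(x=x)
    print(f"x={x}: naive={naive:.4f} front-door={front_door(x):.4f} truth={truth(x):.4f}")
```

In this toy model the naive conditional P(Y = 1 | X = x) is biased by the hidden U, while the front-door estimate agrees with the interventional value even though U never appears in the computation.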

The Intervention operations we’ve explored so far are just direct and simple applications of a much more general machinery known as the do-calculus, which is able to identify every causal effect that is identifiable from a given graph. The details of this more general approach are beyond the scope of the Primer book but are covered extensively in the Causality textbook and elsewhere.

To further familiarize ourselves with this concept, let us consider the DAG from Fig 3.8, analyzed previously:

From this figure we quickly see that W satisfies the Front-door criterion for the causal effect of X on Y:

  1. It intercepts the only directed path from X to Y
  2. There are no unblocked backdoor paths from X to W (as they must all pass through the collider at Y).
  3. All backdoor paths from W to Y are blocked by X

All the paths mentioned above are visualized in the Jupyter notebook.
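The three conditions can also be checked mechanically. The sketch below is a small pure-Python checker, not the code from the notebook. It assumes the edge list of Fig 3.8 is B → A, A → X, B → Z, C → Z, C → D, D → Y, Z → X, Z → Y, X → W and W → Y (reconstructed from the book, so treat it as an assumption), and all function names are illustrative. Condition 3 is tested with X as the only conditioning variable, which is the simple reading that suffices for a single-variable Z.

```python
# Assumed edge list for Fig 3.8 of the Primer, as (parent, child) pairs.
edges = {
    ("B", "A"), ("A", "X"), ("B", "Z"), ("C", "Z"), ("C", "D"),
    ("D", "Y"), ("Z", "X"), ("Z", "Y"), ("X", "W"), ("W", "Y"),
}


def neighbors(node):
    """Nodes adjacent to `node`, ignoring edge direction."""
    return {b for a, b in edges if a == node} | {a for a, b in edges if b == node}


def simple_paths(src, dst, path=None):
    """Yield every simple path from src to dst in the undirected skeleton."""
    path = path or [src]
    if path[-1] == dst:
        yield list(path)
        return
    for nxt in sorted(neighbors(path[-1])):
        if nxt not in path:
            yield from simple_paths(src, dst, path + [nxt])


def descendants(node):
    """All nodes reachable from `node` by following arrows."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if a == cur and b not in found:
                found.add(b)
                stack.append(b)
    return found


def is_directed(path):
    """True if every edge on the path points forward."""
    return all((a, b) in edges for a, b in zip(path, path[1:]))


def is_backdoor(path):
    """True if the path starts with an arrow pointing into its first node."""
    return (path[1], path[0]) in edges


def is_blocked(path, given):
    """d-separation test for a single path, conditioning on the set `given`."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider:
            if not ({node} | descendants(node)) & given:
                return True  # unconditioned collider blocks the path
        elif node in given:
            return True  # conditioned chain or fork blocks the path
    return False


def satisfies_front_door(zs, x, y):
    """Check the three front-door conditions for the set zs relative to (x, y)."""
    cond1 = all(set(p) & zs for p in simple_paths(x, y) if is_directed(p))
    cond2 = all(
        is_blocked(p, set())
        for z in zs for p in simple_paths(x, z) if is_backdoor(p)
    )
    cond3 = all(
        is_blocked(p, {x})
        for z in zs for p in simple_paths(z, y) if is_backdoor(p)
    )
    return cond1 and cond2 and cond3


print("W satisfies the front-door criterion:", satisfies_front_door({"W"}, "X", "Y"))
print("Z satisfies the front-door criterion:", satisfies_front_door({"Z"}, "X", "Y"))
```

Enumerating every simple path is exponential in general, which is fine for a graph of this size but would need a proper d-separation algorithm for larger DAGs.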

As we had previously seen, estimating the causal effect of X on Y using the back-door criterion requires conditioning on at least 2 variables (Z and B, for example) while the front-door approach requires only W.

Congratulations on making it through another post on Causal Inference. The next post in the series is already out:

As always, you can find all the notebooks of this series in the GitHub repository:

And if you would like to be notified when the next post comes out, you can subscribe to The Sunday Briefing newsletter:
