This article teaches how to generate visually rich conditional probability tables in just one line of Python using the pgmpy library.
Abstract
The article focuses on using the pgmpy library to generate intuitive and comprehensive conditional probability tables for visualizing and understanding causal inference models. The author provides a quick refresher on causal inference models and explains how to build a causal model using the pgmpy library. The article also highlights the limitations of the native functionality in the pgmpy library for displaying CPTs and introduces the cpt_tools library, which resolves these issues and provides a much better solution for visualizing CPTs.
Opinions
The author believes that the native functionality in the pgmpy library for displaying CPTs is unsatisfactory and difficult to understand.
The author has developed the cpt_tools library to resolve the issues with the pgmpy library's native functionality for displaying CPTs.
The author believes that visualizing directed acyclic graphs and conditional probability tables is essential for using causal inference to solve business problems.
The author encourages readers to consider buying them a coffee if they decide to download and use the cpt_tools or dag_tools code.
How to Visualise Causal Inference Models with Intuitive Conditional Probability Tables
How to generate intuitive and comprehensive Conditional Probability Tables to visualise and understand causal inference models in 1 line of Python code
Causal inference is a hot topic at the moment, but the various libraries that exist can be complicated, with inconsistent documentation and examples. Most of the available articles and posts focus on one particular aspect of causal inference without covering everything a data scientist needs to know.
This led me to write a series of articles. This latest one dives into “Conditional Probability Tables” (CPTs) and how to generate them easily in a format that is intuitive and meaningful.
What You Will Learn
By the end of this article you will be able to generate visually rich conditional probability tables in just one line of Python and you will have full access to the source code and documentation!
The data I have selected relates to the impact of having a graduate degree on salary and has been obtained from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/census+income) which is free to use with an acknowledgement (see References section).
Image by Author
Building a Causal Model
I have chosen to build a causal model using the pgmpy library (https://pgmpy.org/) as follows ...
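For intuition about what fitting such a model produces, the sketch below estimates each CPT as normalised counts using plain pandas. It is not the pgmpy code used for this article: the toy data, the `estimate_cpt` helper and the assumed structure (age influencing both hasGraduateDegree and greaterThan50k, with hasGraduateDegree also influencing greaterThan50k) are assumptions made purely for illustration.

```python
import pandas as pd

# Toy stand-in for the census data; these rows are invented for illustration.
df = pd.DataFrame({
    "age":               [25, 25, 25, 40, 40, 40, 40, 60],
    "hasGraduateDegree": [0, 0, 1, 0, 1, 1, 1, 0],
    "greaterThan50k":    [0, 0, 1, 0, 1, 1, 0, 1],
})


def estimate_cpt(data, child, parents=()):
    """Hypothetical helper: P(child | parents) as normalised counts."""
    parents = list(parents)
    if not parents:
        # Root node: a single row holding the marginal distribution.
        probs = data[child].value_counts(normalize=True).sort_index()
        return probs.to_frame(name="probability").T
    # One row per observed combination of parent states,
    # one column per state of the child variable.
    probs = data.groupby(parents)[child].value_counts(normalize=True)
    return probs.unstack(fill_value=0.0).sort_index(axis=1)


# Assumed structure: age -> hasGraduateDegree, and both -> greaterThan50k.
cpt_age = estimate_cpt(df, "age")
cpt_degree = estimate_cpt(df, "hasGraduateDegree", ["age"])
cpt_salary = estimate_cpt(df, "greaterThan50k", ["age", "hasGraduateDegree"])

print(cpt_degree)
```

Each row of a conditional table should sum to 1, which is a quick sanity check worth running on any CPT, however it was produced.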
The CPT representing the probabilities for age is spread out vertically (because age has many states, one for each age between 17 and 90).
The CPT for the probabilities of hasGraduateDegree is even worse. Because this table is spread out horizontally, pgmpy has truncated all of the columns for ages 17 to 87 and left only ages 88 and 90 in the display. This may have made the table fit in the cell, but the truncation makes it impossible to understand what is going on.
The CPT for greaterThan50k has the same problems as hasGraduateDegree.
The last problem with the pgmpy output for CPTs is that they are "upside-down". If you are a reader of Judea Pearl, who has published many seminal works on causality (including “The Book of Why”), you will have seen examples where Pearl lays out his CPTs with the probabilities across the columns and the "given" conditions down the rows ...
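As a minimal sketch of that re-orientation, the snippet below takes an array laid out the way the pgmpy tables above are printed (one row per state of the variable, one column per parent state) and transposes it into a pandas DataFrame with the "given" conditions down the rows. The `to_pearl_style` helper and the probabilities are hypothetical and are not part of pgmpy or cpt_tools.

```python
import numpy as np
import pandas as pd


def to_pearl_style(values, variable, states, parent, parent_states):
    """Hypothetical helper: put the 'given' conditions down the rows and
    the probabilities of each state of `variable` across the columns."""
    values = np.asarray(values, dtype=float)
    expected_shape = (len(states), len(parent_states))
    if values.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {values.shape}")
    return pd.DataFrame(
        values.T,  # transpose: parent states become the rows
        index=pd.Index(parent_states, name=parent),
        columns=[f"P({variable}={state})" for state in states],
    )


# Invented probabilities for three ages only, to keep the example small.
values = [
    [0.9, 0.7, 0.8],  # P(hasGraduateDegree=0 | age)
    [0.1, 0.3, 0.2],  # P(hasGraduateDegree=1 | age)
]
table = to_pearl_style(values, "hasGraduateDegree", [0, 1], "age", [17, 40, 90])
print(table)
```

With the conditions as rows, a variable with many parent states grows downwards rather than sideways, so the table can be scrolled or truncated by row instead of losing columns.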
All of these issues make it very difficult to visualise what is going on in a causal model and that leads to a lack of understanding which in turn leads to an inability to use these models to solve real-world problems for customers.
The unintuitive output of pgmpy led me to develop my own cpt_tools library to resolve all of these issues (a link to the full source code is provided below).
Let’s take a look at the output generated using cpt_tools ...
Image by Author
This is looking much nicer in just 1 line of Python code from the cpt_tools library!
The tables are returned as pandas DataFrames and the truncation has taken place against the Y-axis (rows) to give the best compromise between readability and space-utilisation.
If you would like to see the whole CPT without the row truncation, simply change the pandas display.max_rows parameter and then use cpt_tools.display_cpt as follows -
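The pandas side of this can be sketched on its own. The example below builds a placeholder DataFrame with one row per age from 17 to 90 and uses `pd.option_context` to lift the row limit temporarily. The table contents are invented, and cpt_tools.display_cpt is deliberately not called here because its signature is not shown in this article.

```python
import pandas as pd

# Placeholder CPT: one row per age from 17 to 90 (74 rows), invented values.
cpt = pd.DataFrame(
    {"P(hasGraduateDegree=0)": 0.8, "P(hasGraduateDegree=1)": 0.2},
    index=pd.Index(range(17, 91), name="age"),
)

# With a small row limit, pandas elides the middle of the table.
with pd.option_context("display.max_rows", 10):
    truncated = repr(cpt)

# Setting display.max_rows to None removes the limit inside the block only.
with pd.option_context("display.max_rows", None):
    full = repr(cpt)

print(full)
```

Using `pd.option_context` rather than `pd.set_option` keeps the change local, so other DataFrames in the same notebook are still displayed in truncated form.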
Causal inference is a great tool to have in your data science toolkit, but to use it to solve a business problem you need to be able to visualise both the directed acyclic graphs and the conditional probability tables.
The pgmpy library is comprehensive and easy to use but the functionality for visualising the models can benefit from being extended and improved.
This article has shown how to visualise the Conditional Probability Tables in a way that is visually powerful, intuitive and easy to understand in just one line of Python code.
References
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.