Visualizing the Customer Journey with Python’s Sankey Diagram: A Plotly Example
Learn How to Create a Stunning Customer Journey Visualization with this Comprehensive Tutorial

Sankey diagrams are an excellent tool for understanding customer behavior. By visualizing the connections between customers, products, and transactions, we can gain insights into the patterns and relationships that drive that behavior. In this article, I will show you how to use Python to build a Sankey diagram that visualizes the customer journey for your products across multiple dimensions of customer behavior data.
This article includes the following:
- An introduction to Sankey diagrams
- Why use Sankey diagrams in customer journey analysis
- Examples of using a Sankey diagram to generate a customer journey diagram with Python and Plotly
What is a Sankey Diagram?
Sankey diagrams are a type of flow diagram in which the width of each arrow is proportional to the flow rate. They emphasize the major transfers or flows within a system, which helps locate the most important contributions. They often show conserved quantities within defined system boundaries. The things being connected are called nodes, and the connections are called links. Sankey diagrams are best used to show a many-to-many mapping between two domains (e.g., universities and majors) or multiple paths through a set of stages (for instance, a customer journey that follows buyers' movements across all touchpoints of your brand).
Why use a Sankey Diagram for the customer journey?
Sankey diagrams of the customer journey help a business look at its products from its customers' viewpoint. They help to identify the following:
- The areas with the most significant opportunities. What's the happy path for customers to complete an order?
- The areas that need more improvements. What happens before customers abandon their shopping carts?
How can Python Plotly visualize the customer journey in a Sankey diagram?
Let's imagine we are a small business owner selling products on an e-commerce shopping website, and we would like to understand our customers' journey from discovering a product to buying it. In a customer journey diagram, the nodes convey the events and their chronological order, and the width of each link shows the proportion of users who moved from one specific event to another.
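As a rough sketch of where those link widths come from, the snippet below counts event-to-event transitions in a tiny made-up event log. The table `events`, its rows, and the `next_event` and `count` column names are assumptions for illustration only; this is one possible way to derive the counts, not necessarily the approach used later in the article.

```python
import pandas as pd

# Tiny made-up event log (hypothetical rows, for illustration only)
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "event_name": ["Home", "Product", "Cart",
                   "Home", "Product",
                   "Home", "Product", "Cart"],
    "time": pd.to_datetime([
        "2023-01-01 10:00", "2023-01-01 10:05", "2023-01-01 10:09",
        "2023-01-01 11:00", "2023-01-01 11:02",
        "2023-01-02 09:00", "2023-01-02 09:03", "2023-01-02 09:10",
    ]),
})

# Order each user's events chronologically, then pair each event with the next one
events = events.sort_values(["user_id", "time"])
events["next_event"] = events.groupby("user_id")["event_name"].shift(-1)

# Count how often each (event, next event) pair occurs; a user's last event has no successor
transitions = (
    events.dropna(subset=["next_event"])
    .groupby(["event_name", "next_event"])
    .size()
    .reset_index(name="count")
)
print(transitions)
```

Each row of `transitions` corresponds to one link in the diagram, with `count` as the link value.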
Starting Point — A made-up Dataset on customer behavior
We start with a made-up DataFrame in pandas named df, with the following columns:
- user_id: distinct user ID
- event_name: event name ['Home', 'Cart', 'Product', 'Cancel', 'Purchase', etc.]
- platform: platform associated with the event ['Android', 'iOS', 'PC']
- time: timestamp at which the event took place
The code to generate the DataFrame: