Darío Weitz

Summary

The website content provides a comprehensive guide on creating waterfall charts using Plotly, an open-source graphing library, for both categorical and time-based data analysis, emphasizing their utility in financial and business contexts for illustrating cumulative effects of sequential positive and negative values.

Abstract

Waterfall charts, also known as flying bricks or Mario charts, are a data visualization tool used to represent the cumulative impact of sequentially added positive or negative values over time or across categories. The guide explains the significance of waterfall charts in financial analysis and business reporting, detailing how they can clarify the contributions of individual items or periods to a net change. It offers a step-by-step tutorial on using Plotly to create these charts, including how to set up data, use specific traces like go.Waterfall(), and customize the appearance for better storytelling. The article also demonstrates the creation of a time-based waterfall chart using a hypothetical dataset of monthly visitors to a location, showcasing the flexibility of Plotly in handling different types of data.

Opinions

  • Waterfall charts are highly valued for their ability to provide contextual information and clearly identify periods or items with the greatest gains or losses.
  • The author suggests that waterfall charts offer more insight than other similar charts, making them particularly useful for analysts.
  • The use of Plotly is recommended for its ease of use and customization options, which enhance the narrative quality of the charts.
  • The article promotes the idea that waterfall charts are simple to implement with the right tools, encouraging their adoption in financial and business analysis.
  • The author emphasizes the importance of visualizing data effectively to communicate complex information clearly, positioning waterfall charts as a powerful tool in data storytelling.

Waterfall Charts with Plotly

Why & How

Image by Muhamad Rizal Firmansyah from Unsplash

Waterfall Charts

AKA: Flying Bricks Charts, Floating Bricks Charts, Mario Charts

Why: a waterfall chart is a 2D plot that represents the cumulative effect of sequentially added positive or negative values, either over time or across multiple categorical steps. Time-based waterfall charts show additions and subtractions over a time period. Category-based waterfall charts show additions and subtractions across items such as revenues and expenses, or any other variable made up of a sequence of positive and negative values.

How: waterfall charts (WCs) are made up of a series of vertical bars (columns). The initial and final values are represented by full columns (usually starting at a zero baseline), while intermediate values are shown as floating columns representing the additions and subtractions. The last vertical bar shows the outcome of those additions and subtractions. Additions are usually shown in green and subtractions in red, and it is customary to give the initial and final columns a third color. Linking the columns with horizontal connector lines helps convey the idea of a cumulative effect.

Fig.1: schematic representation of a waterfall chart. Made by the author with Plotly.

It should be clear now why they are known as Flying Bricks or Floating Bricks charts. Someone named them Mario Charts because of a certain resemblance to the popular video game.

Storytelling: WCs are commonly used in finance and business for data that swings between positive and negative values. Time-based WCs show the total change over a month or a year while also showing the profits or losses within that period. Category-based WCs show the cumulative effect of sequentially added positive or negative values for a given variable. Positive values may be revenues, gains, stock added to warehouses, positive changes, or incoming streams. Negative values may be expenses, losses, stock taken from warehouses, negative changes, or outgoing streams. Always keep in mind that the chart is read sequentially from left to right.

A WC is a valuable data visualization technique because it allows the analyst to clearly determine which periods or items showed the greatest gains, when the greatest losses were observed, and what the net change was over the time period evaluated. It provides more contextual information than other similar charts.
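As a rough illustration of those three questions, the sketch below picks out the largest gain, the largest loss, and the net change from a list of changes. The labels and amounts are hypothetical, invented only for this example.

```python
# Hypothetical changes for illustration only (not data from this article).
changes = {"Sales": 500, "Refunds": -120, "Services": 300, "Salaries": -450}

largest_gain = max(changes, key=changes.get)   # item with the biggest addition
largest_loss = min(changes, key=changes.get)   # item with the biggest subtraction
net_change = sum(changes.values())             # what the final column reports

print(largest_gain, largest_loss, net_change)
```

A waterfall chart answers the same questions visually: the tallest green and red bricks, and the height of the last column.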

Waterfall Charts with Plotly

We used Plotly, an open-source graphing library that provides a group of classes called graph objects for constructing figures. Figure is the primary class, with a data attribute and a layout attribute. The data attribute holds the traces; each trace is a particular type of chart with its corresponding parameters. The layout attribute specifies the title, axes, legends, and other properties of the figure.

For the waterfall chart in this article, the Plotly trace is go.Waterfall() and the corresponding parameters are: x= to set the x coordinates (usually strings or datetime objects); y= to set the y coordinates (usually a list of numerical values, which may include None); base= to set the numerical baseline.

The most important parameter is measure=, an array in which each element takes one of the following values: relative, absolute, or total. relative, the default value, indicates an addition or a subtraction. absolute sets the initial value, while total computes the algebraic sum of the preceding values.
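To make the three measure values concrete, here is a plain-Python sketch of how they appear to combine into cumulative levels. The function running_values is hypothetical, written only for this illustration; it is not part of Plotly and simply mimics the rules described above.

```python
def running_values(measure, y):
    """Return the cumulative level reached after each bar.

    Hypothetical helper that mimics the measure rules described above;
    it is not a Plotly function.
    """
    level = 0
    levels = []
    for kind, value in zip(measure, y):
        if kind == "absolute":
            level = value             # sets the starting value
        elif kind == "relative":
            level += value            # adds to or subtracts from the running level
        elif kind == "total":
            pass                      # reports the running level; value is ignored
        else:
            raise ValueError(f"unknown measure: {kind!r}")
        levels.append(level)
    return levels


measure = ["absolute", "relative", "relative", "total"]
y = [100, 50, -30, None]
print(running_values(measure, y))
```

Because a total bar is computed from what came before it, its y entry can be left as None.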

So this is the code for the waterfall chart in Figure 1:

import plotly.graph_objects as go

fig1 = go.Figure()

# Category labels (x axis) and values (y axis); the total bar needs no value
hrz = ["Initial", "Addition 1", "Addition 2",
       "Subtraction 1", "Subtraction 2", "Final"]
vrt = [100, 600, 700, -400, -300, None]

fig1.add_trace(go.Waterfall(
    x=hrz, y=vrt,
    base=0,
    # One measure per bar: starting value, four changes, computed total
    measure=["absolute", "relative", "relative",
             "relative", "relative", "total"]
))
fig1.show()

We updated the chart to improve the storytelling: text sets the annotation for each bar; textposition places that text inside or outside the bars; update_layout sets the title text and the title font.

This is the code for the waterfall chart in Figure 2:

import plotly.graph_objects as go
fig2  = go.Figure()
hrz = [ "Initial",  "Addition 1", "Addition 2",
       "Subtraction 1","Subtraction 2","Final"]
vrt  = [100, 600, 700, -400, -300, None]
text = ['100', '+600', '+700', '-400', '-300', '700']
fig2.add_trace(go.Waterfall(
               x = hrz, y = vrt,
               base = 0,
               text = text, textposition = 'inside',   
               measure = ["absolute",  "relative", "relative",
                          "relative","relative","total"]  
               ))              
fig2.update_layout(
                   title_text = "Category-Based Waterfall Chart",
                   title_font=dict(size=25,family='Verdana', 
                                   color='darkred')
                   )
fig2.show()

Fig.2: schematic representation of a waterfall chart. Made by the author with Plotly.
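The text list for Figure 2 was typed by hand. As a sketch of an alternative, the labels could be derived from the values so that the two cannot drift apart. The helper make_labels is hypothetical and assumes the first bar is absolute, the last is the total, and everything in between is relative, as in the example above.

```python
def make_labels(values):
    """Build bar labels: plain first value, signed changes, computed total.

    Hypothetical helper; assumes values[0] is the initial value,
    values[-1] is the (ignored) total placeholder, and the rest are changes.
    """
    initial, *changes, _ = values
    labels = [str(initial)]
    labels += [f"{c:+d}" for c in changes]      # '+600', '-400', ...
    labels.append(str(initial + sum(changes)))  # label for the total bar
    return labels


vrt = [100, 600, 700, -400, -300, None]
print(make_labels(vrt))
```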

In our second example, we use a time-based waterfall chart to represent the cumulative effect of successive increases and decreases in the number of monthly visitors to a fictional place.

First, we create a dataframe with the data we supposedly collected about the increases and decreases in the number of visitors. We need to import the libraries NumPy and pandas as np and pd respectively.

import numpy  as np
import pandas as pd
months =   ['Initial', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
            'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', 'Final']
visitors = [15000, +18000, +12000, -4000,  +8000, 
           -10000, -5000,  +20000, +15000, +18000,
           -16000, -18000, +10000, 63000]
df = pd.DataFrame({'months' : months, 'visitors' : visitors,
                   'text' : visitors})
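The last entry, 63000, is entered by hand, so it is worth checking that it matches the initial value plus the twelve monthly changes. A small check in plain Python, repeating the list above:

```python
visitors = [15000, +18000, +12000, -4000, +8000,
            -10000, -5000, +20000, +15000, +18000,
            -16000, -18000, +10000, 63000]

# Split the list into the starting value, the monthly changes, and the stated total.
initial, *monthly_changes, final = visitors
expected_final = initial + sum(monthly_changes)

print(expected_final == final)
```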

We need to create a column in the dataframe that holds the values associated with measure. Remember that this parameter can take any of the following three values: absolute, relative, or total. To fill the column named measure we used the NumPy function np.select(), which returns an array whose elements are drawn from choicelist according to the first condition in the condition list that holds.

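# Conditions are checked in order; each one maps to the choice in the same position
conditionlist = [(df['months'] == 'Initial'),
                 (df['months'] == 'Final'),
                 (df['months'] != 'Initial') & (df['months'] != 'Final')]
choicelist = ['absolute', 'total', 'relative']
# The three conditions cover every row, so default is never actually used
df['measure'] = np.select(conditionlist, choicelist,
                          default='absolute')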
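np.select() evaluates the conditions in order and, where none of them holds, falls back to default. A shorter variant of the same idea, offered only as a sketch, lists the two special rows and lets default supply 'relative' for the rest. The array labels is a shortened stand-in for the months column.

```python
import numpy as np

labels = np.array(["Initial", "Jan", "Feb", "Final"])  # stand-in for df['months']

conditions = [labels == "Initial", labels == "Final"]
choices = ["absolute", "total"]

# Rows matching no condition receive the default value.
measure = np.select(conditions, choices, default="relative")
print(measure.tolist())
```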

The screenshot below shows the fourteen records of the dataset:

Now we are ready to draw the WC.

Plotly lets you customize the colors of the bars with the increasing, decreasing, and totals parameters.

fig3  = go.Figure()
fig3.add_trace(go.Waterfall(
               x = df['months'], y = df['visitors'],
               measure = df['measure'],
               base = 0,
               text = df['visitors'],
               textposition = 'outside',
               decreasing = {"marker":{"color":"crimson",                 
                  "line":{"color":"lightsalmon","width":2}}},
               increasing = {"marker":{"color":"forestgreen",
                  "line":{"color":"lightgreen", "width":2}}},
               totals     = {"marker":{"color":"mediumblue"}}
               ))

We placed the annotations outside the bars to avoid clutter. Finally, we set the title and updated the axes:

fig3.update_layout(
                   title_text = "Time-Based Waterfall Chart",
                   title_font = dict(size=25,family='Verdana',
                                     color='darkred'))
fig3.update_yaxes(title = 'Visitors' , range = [0, 100000])
fig3.update_xaxes(title = 'Year 2020')
fig3.show()

Fig.3: a time-based waterfall chart. Made by the author with Plotly.
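The y-axis range of [0, 100000] has to clear the highest point the running total reaches during the year, not just the final value. One way to find that peak, sketched here with the standard library and the changes listed earlier:

```python
from itertools import accumulate

# Initial value followed by the twelve monthly changes (the final total is omitted).
steps = [15000, 18000, 12000, -4000, 8000, -10000, -5000,
         20000, 15000, 18000, -16000, -18000, 10000]

running_total = list(accumulate(steps))   # level reached after each bar
peak = max(running_total)

print(peak)
```

An upper limit somewhat above the peak also leaves room for the annotations placed outside the bars.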

To sum up: the purpose of a waterfall chart is to communicate how positive and negative changes accumulate across a time period or across a list of related items. Waterfall charts are very simple to implement, particularly with Plotly, and they are widely used in financial analysis and business environments.

If you find this article of interest, please read my previous articles (https://medium.com/@dar.wtz):

Diverging Bars, Why & How, Storytelling with Divergences

Slope Charts, Why & How, Storytelling with Slopes
