Thiago Carvalho

Summary

The website content provides a comprehensive guide on creating Gantt charts using Python, Pandas, and Matplotlib, detailing the process from basic to more advanced visualizations for project management.

Abstract

The article "Gantt charts with Python’s Matplotlib" offers a step-by-step tutorial on visualizing project schedules using Python programming. It begins by acknowledging the historical significance of Gantt charts in project management and their evolution to include various encodings such as task completeness, dependencies, and deadlines. The author then demonstrates how to generate a Gantt chart with dummy data, starting with a simple bar chart representation of tasks over time and progressively enhancing the chart with color-coding for departments, proper x-axis date labeling, and encoding task completion percentages. The guide emphasizes the flexibility of Gantt charts in displaying detailed project information and discusses their utility in both traditional and agile project management contexts, although it notes the potential complexity and maintenance challenges. The article concludes by providing links to additional Python data visualization tutorials and acknowledging the insightfulness of Gantt charts for analyzing completed projects.

Opinions

  • The author believes that Gantt charts, despite their age, remain highly relevant for project management due to their ability to visually track project schedules and productivity.
  • The author suggests that while Gantt charts are useful, they may not align perfectly with the most current agile project management approaches due to the high level of detail and maintenance they require.
  • It is implied that Gantt charts are more insightful than other project visualization methods like flowcharts, tables, or Kanban/Scrum boards when it comes to scrutinizing a single process or project.
  • The article conveys that Gantt charts can become complex and difficult to maintain, which might be counterproductive in agile environments where plans change frequently.
  • The author endorses the use of Gantt charts for retrospective analysis of completed projects, highlighting their effectiveness in this context.

Gantt charts with Python’s Matplotlib

A guide to visualizing project schedules with Python

Image by the author

With more than 100 years of history, this visualization continues to be very useful for project management.

Henry Gantt initially created the graph for analyzing completed projects. More specifically, he designed this visualization to measure productivity and identify underperforming employees. Through the years, it became a tool for planning and tracking, often discarded once the project is over.

It’s undeniable that Gantt charts have changed a lot since their first design. Analysts introduced many encodings to display distinctions between departments, task completeness, dependencies, deadlines, and much more.

This article will explore how to create Gantt charts using Python, Pandas, and Matplotlib.

Hands-on

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

For this example, we’ll need some dummy data; the dataset we’ll use has columns for the task’s name, department, a start and end date, and completion.

df = pd.read_excel('../data/plan.xlsx')
df
Image by the author
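
The spreadsheet itself is not included here, so as a stand-in, the sketch below builds a small DataFrame with the same five columns. The task names, dates, and completion values are invented for illustration; only the column names (Task, Department, Start, End, Completion) come from the code in this article, and Completion is assumed to be a fraction between 0 and 1.

```python
import pandas as pd

# Hypothetical stand-in for '../data/plan.xlsx'; every value is made up.
df = pd.DataFrame({
    "Task": ["TSK A", "TSK B", "TSK C", "TSK D"],
    "Department": ["MKT", "FIN", "ENG", "IT"],
    "Start": pd.to_datetime(["2022-03-01", "2022-03-03", "2022-03-06", "2022-03-10"]),
    "End": pd.to_datetime(["2022-03-04", "2022-03-08", "2022-03-12", "2022-03-15"]),
    # fraction of the task that is done, between 0 and 1
    "Completion": [1.0, 0.5, 0.25, 0.0],
})
```

Passing the dates through pd.to_datetime matters: the date arithmetic in the following steps relies on Start and End being datetime columns rather than strings.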

To make our plotting easier, we’ll need to derive some measures.

We’ll start with a variable for the project’s start date.

Then, we’ll add a column with the number of days from the start of the project to the beginning of each task; this will help position the bars on the x-axis.

We’ll do the same for each task’s end. This makes it easy to calculate the total days needed to complete the task (the bar’s length) and helps position the text labels later on.

# project start date
proj_start = df.Start.min()
# number of days from project start to task start
df['start_num'] = (df.Start-proj_start).dt.days
# number of days from project start to end of tasks
df['end_num'] = (df.End-proj_start).dt.days
# days between start and end of each task
df['days_start_to_end'] = df.end_num - df.start_num
Image by the author
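
To make the arithmetic concrete, here is a minimal sketch that runs the same three calculations on two made-up tasks. The last column is an assumption that is not part of the article's code: if End is meant to be the last working day, inclusive, each bar would need one extra day.

```python
import pandas as pd

# Two made-up tasks; column names follow the article's code.
df = pd.DataFrame({
    "Task": ["TSK A", "TSK B"],
    "Start": pd.to_datetime(["2022-03-01", "2022-03-03"]),
    "End": pd.to_datetime(["2022-03-04", "2022-03-08"]),
})

proj_start = df.Start.min()                            # 2022-03-01
df["start_num"] = (df.Start - proj_start).dt.days      # 0, 2
df["end_num"] = (df.End - proj_start).dt.days          # 3, 7
df["days_start_to_end"] = df.end_num - df.start_num    # 3, 5

# Assumption: End is inclusive, so a one-day task should have width 1, not 0.
df["days_inclusive"] = df.days_start_to_end + 1        # 4, 6
```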

Now we can plot a bar chart. The y-axis holds the task names, the width of each bar is the number of days between the task’s start and end, and the left edge is the number of days from the project start to the task start.

fig, ax = plt.subplots(1, figsize=(16,6))
ax.barh(df.Task, df.days_start_to_end, left=df.start_num)
plt.show()
Image by the author

Cool, we got the simplest of the Gantt charts.

There are lots of details we can add to make our chart more insightful. We’ll start with the most essential, a proper x-axis with dates and colors to distinguish the departments.

# create a column with the color for each department
def color(row):
    c_dict = {'MKT':'#E64646', 'FIN':'#E69646', 'ENG':'#34D05C', 'PROD':'#34D0C3', 'IT':'#3475D0'}
    return c_dict[row['Department']]
df['color'] = df.apply(color, axis=1)

For the x-axis, we’ll add a label every three days, and we’ll also add minor ticks for each day.

from matplotlib.patches import Patch
fig, ax = plt.subplots(1, figsize=(16,6))
ax.barh(df.Task, df.days_start_to_end, left=df.start_num, color=df.color)
##### LEGENDS #####
c_dict = {'MKT':'#E64646', 'FIN':'#E69646', 'ENG':'#34D05C',
          'PROD':'#34D0C3', 'IT':'#3475D0'}
legend_elements = [Patch(facecolor=c_dict[i], label=i)  for i in c_dict]
plt.legend(handles=legend_elements)
##### TICKS #####
xticks = np.arange(0, df.end_num.max()+1, 3)
xticks_labels = pd.date_range(proj_start, end=df.End.max()).strftime("%m/%d")
xticks_minor = np.arange(0, df.end_num.max()+1, 1)
ax.set_xticks(xticks)
ax.set_xticks(xticks_minor, minor=True)
ax.set_xticklabels(xticks_labels[::3])
plt.show()
Image by the author
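
As an alternative to building the tick labels by hand, Matplotlib's date locators and formatters can place them. The sketch below is not the approach used in this article: it assumes the bars are positioned with Matplotlib date numbers (via mdates.date2num) instead of days since the project start, and the task data is made up. The major ticks fall every three days but are not guaranteed to start exactly on the project start date.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove to open a window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up tasks; column names follow the article's code.
df = pd.DataFrame({
    "Task": ["TSK A", "TSK B", "TSK C"],
    "Start": pd.to_datetime(["2022-03-01", "2022-03-03", "2022-03-06"]),
    "End": pd.to_datetime(["2022-03-04", "2022-03-08", "2022-03-12"]),
})

# Convert dates to Matplotlib's float day numbers, so widths are plain day counts.
start_num = mdates.date2num(df.Start.to_numpy())
end_num = mdates.date2num(df.End.to_numpy())

fig, ax = plt.subplots(1, figsize=(16, 6))
ax.barh(df.Task, end_num - start_num, left=start_num)

# Let locators place the ticks and a formatter turn day numbers into labels.
ax.xaxis.set_major_locator(mdates.DayLocator(interval=3))    # a label every 3 days
ax.xaxis.set_minor_locator(mdates.DayLocator())              # a minor tick each day
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m/%d"))
```

Call plt.show() or fig.savefig(...) afterwards to see the result. The trade-off is that later steps that position text by day offsets would need to use date numbers as well.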

Great! This graph is way more insightful than our previous version.

Now let’s encode each task’s completion in our visualization.

# days between start and current progression of each task
df['current_num'] = (df.days_start_to_end * df.Completion)

We’ll add another bar to our plot and use the measure we just created as the width.

To make the reading more precise, we’ll write each task’s completion percentage at the end of its bar. And to distinguish the completed part from the remainder, we can play with the alpha parameter of the bars.
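
Before the full plot, the arithmetic behind the progress bar and its label can be checked on a few made-up rows. This is only a sketch: the numbers are invented, Completion is assumed to be a fraction between 0 and 1, and the label column is a hypothetical helper that the plotting code does not use.

```python
import pandas as pd

# Made-up task lengths (in days) and completion fractions.
df = pd.DataFrame({
    "Task": ["TSK A", "TSK B", "TSK C"],
    "days_start_to_end": [4, 10, 7],
    "Completion": [1.0, 0.5, 0.0],
})

# Width of the "done" part of each bar, in days.
df["current_num"] = df.days_start_to_end * df.Completion    # 4.0, 5.0, 0.0

# Hypothetical helper column: the text written next to each bar.
df["label"] = [f"{c:.0%}" for c in df.Completion]           # '100%', '50%', '0%'
```

Drawing the full-length bar with a lower alpha and the current_num bar on top in the same color is what produces the "filled" look.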

from matplotlib.patches import Patch
fig, ax = plt.subplots(1, figsize=(16,6))
# bars
ax.barh(df.Task, df.current_num, left=df.start_num, color=df.color)
ax.barh(df.Task, df.days_start_to_end, left=df.start_num, color=df.color, alpha=0.5)
# texts
for idx, row in df.iterrows():
    ax.text(row.end_num+0.1, idx,
            # ':.0%' rounds; int(x*100) can truncate on float error
            f"{row.Completion:.0%}",
            va='center', alpha=0.8)
##### LEGENDS #####
c_dict = {'MKT':'#E64646', 'FIN':'#E69646', 'ENG':'#34D05C', 'PROD':'#34D0C3', 'IT':'#3475D0'}
legend_elements = [Patch(facecolor=c_dict[i], label=i)  for i in c_dict]
plt.legend(handles=legend_elements)
##### TICKS #####
xticks = np.arange(0, df.end_num.max()+1, 3)
xticks_labels = pd.date_range(proj_start, end=df.End.max()).strftime("%m/%d")
xticks_minor = np.arange(0, df.end_num.max()+1, 1)
ax.set_xticks(xticks)
ax.set_xticks(xticks_minor, minor=True)
ax.set_xticklabels(xticks_labels[::3])
plt.show()
Image by the author

And that’s it!

We can improve this visualization, make it more appealing, add more information with another axis, draw gridlines, add a title, and so much more.
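
A few of those improvements can be sketched as follows. This is one possible take, not the code behind the images in this article: the bar positions, the title text, and the weekly top axis are all made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove to open a window
import matplotlib.pyplot as plt
import numpy as np

# Made-up bar positions, in days since the project start.
tasks = ["TSK A", "TSK B", "TSK C"]
start_num = np.array([0, 2, 5])
days_start_to_end = np.array([3, 5, 9])
end_max = int((start_num + days_start_to_end).max())    # 14

fig, ax = plt.subplots(1, figsize=(16, 6))
ax.barh(tasks, days_start_to_end, left=start_num)

# Title and light vertical gridlines drawn behind the bars.
ax.set_title("Project schedule (hypothetical)", loc="left")
ax.set_axisbelow(True)
ax.xaxis.grid(True, which="major", color="gray", linestyle="dashed", alpha=0.3)

# Put the first task at the top, as most Gantt charts do.
ax.invert_yaxis()

# Second x-axis on top that counts weeks instead of days.
ax_top = ax.twiny()
ax_top.set_xlim(ax.get_xlim())
week_ticks = np.arange(0, end_max + 1, 7)               # 0, 7, 14
ax_top.set_xticks(week_ticks)
ax_top.set_xticklabels([f"Week {i + 1}" for i in range(len(week_ticks))])
```

Copying the limits from the bottom axis to the top one keeps the week markers aligned with the day scale underneath.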


Conclusions

Overall this is an excellent way of visualizing projects, even though it might not fit with the most current project management approaches.

Gantt charts are flexible in the sense that they can serve many functions.

You can break down tasks, track performance measures, dependencies, milestones, deadlines, and much more. Adding more information to Gantt charts is easily achieved with more encodings, tooltips, drill-downs, and texts.

All that information can just as easily make our chart hard to understand and even tougher to maintain.

With agile approaches, plans are constantly changing. Spending that much time collecting and maintaining this information to follow up on a project requires too many resources and often becomes counterproductive.

All that said, they excel in visualizing completed projects and can be way more insightful than flowcharts, tables, or Kanban/Scrum boards, especially for scrutinizing a single process or project.

Thanks for reading my article! — Here you can find more Python dataviz tutorials.
