Gantt charts with Python’s Matplotlib
A guide to visualizing project schedules with Python

With more than 100 years of history, this visualization continues to be very useful for project management.
Henry Gantt initially created the graph for analyzing completed projects. More specifically, he designed this visualization to measure productivity and identify underperforming employees. Through the years, it became a tool for planning and tracking, often discarded once the project is over.
It’s undeniable that Gantt charts have changed a lot since their first design. Analysts introduced many encodings to display distinctions between departments, task completeness, dependencies, deadlines, and much more.
This article will explore how to create Gantt charts using Python, Pandas, and Matplotlib.
Hands-on
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
For this example, we’ll need some dummy data; the dataset we’ll use has columns for the task’s name, department, a start and end date, and completion.
df = pd.read_excel('../data/plan.xlsx')
df
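If you don’t have the spreadsheet at hand, a frame with the same columns can be built directly in code. This is only a sketch: the column names (Task, Department, Start, End, Completion) are the ones used in the rest of the article, while the task names, dates, and completion values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: every value here is made up for illustration;
# only the column names follow the article's dataset.
df = pd.DataFrame({
    "Task": ["Market research", "Budget approval", "Prototype",
             "Launch plan", "Server setup"],
    "Department": ["MKT", "FIN", "ENG", "PROD", "IT"],
    "Start": pd.to_datetime(["2022-03-01", "2022-03-03", "2022-03-07",
                             "2022-03-12", "2022-03-14"]),
    "End": pd.to_datetime(["2022-03-06", "2022-03-08", "2022-03-15",
                           "2022-03-18", "2022-03-20"]),
    # Completion is stored as a fraction between 0 and 1
    "Completion": [1.0, 0.8, 0.4, 0.0, 0.1],
})

print(df.dtypes)
```

Keeping Start and End as datetime columns matters, because the steps that follow subtract dates from each other.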
To make our plotting easier, we’ll need to derive some measures.
We’ll start with a variable for the project’s start date.
Then, we’ll add a column with the number of days from the start of the project to the beginning of each task; this will help position the bars on the x-axis.
The same goes for the task’s end; this makes it easy to calculate the total days needed to complete the task (the bar’s length) and helps position the text labels later on.
# project start date
proj_start = df.Start.min()
# number of days from project start to task start
df['start_num'] = (df.Start-proj_start).dt.days
# number of days from project start to end of tasks
df['end_num'] = (df.End-proj_start).dt.days
# days between start and end of each task
df['days_start_to_end'] = df.end_num - df.start_num
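To make these derived columns concrete, here is a small sketch on a hypothetical two-task schedule (the names `demo` and `demo_start` and all dates are invented for illustration). It applies the same three calculations and prints the result.

```python
import pandas as pd

# Hypothetical two-task schedule, invented for illustration.
demo = pd.DataFrame({
    "Task": ["A", "B"],
    "Start": pd.to_datetime(["2022-03-01", "2022-03-04"]),
    "End": pd.to_datetime(["2022-03-06", "2022-03-10"]),
})

demo_start = demo["Start"].min()

# Offsets in whole days from the project start
demo["start_num"] = (demo["Start"] - demo_start).dt.days
demo["end_num"] = (demo["End"] - demo_start).dt.days

# Bar length: days between the start and end of each task
demo["days_start_to_end"] = demo["end_num"] - demo["start_num"]

print(demo[["Task", "start_num", "end_num", "days_start_to_end"]])
```

Task A starts on day 0 and runs 5 days; task B starts on day 3 and runs 6 days, so its bar begins three units to the right.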
Now we can plot a bar chart. The y values are the task names, the width is the number of days between the start and end of each task, and left is the number of days from the project start to the task start.
fig, ax = plt.subplots(1, figsize=(16,6))
ax.barh(df.Task, df.days_start_to_end, left=df.start_num)
plt.show()
Cool, we got the simplest of the Gantt charts.
There are lots of details we can add to make our chart more insightful. We’ll start with the most essential, a proper x-axis with dates and colors to distinguish the departments.
# create a column with the color for each department
def color(row):
    c_dict = {'MKT':'#E64646', 'FIN':'#E69646', 'ENG':'#34D05C', 'PROD':'#34D0C3', 'IT':'#3475D0'}
    return c_dict[row['Department']]

df['color'] = df.apply(color, axis=1)
For the x-axis, we’ll add a label every three days, and we’ll also add minor ticks for each day.
from matplotlib.patches import Patch

fig, ax = plt.subplots(1, figsize=(16,6))
ax.barh(df.Task, df.days_start_to_end, left=df.start_num, color=df.color)

##### LEGENDS #####
c_dict = {'MKT':'#E64646', 'FIN':'#E69646', 'ENG':'#34D05C',
          'PROD':'#34D0C3', 'IT':'#3475D0'}
legend_elements = [Patch(facecolor=c_dict[i], label=i) for i in c_dict]
plt.legend(handles=legend_elements)

##### TICKS #####
xticks = np.arange(0, df.end_num.max()+1, 3)
xticks_labels = pd.date_range(proj_start, end=df.End.max()).strftime("%m/%d")
xticks_minor = np.arange(0, df.end_num.max()+1, 1)
ax.set_xticks(xticks)
ax.set_xticks(xticks_minor, minor=True)
ax.set_xticklabels(xticks_labels[::3])
plt.show()
Great! This graph is way more insightful than our previous version.
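The tick labels line up with the tick positions because both are sampled every third day from the same range. The sketch below shows this on a hypothetical 11-day window (the names `demo_start`, `demo_end`, `major`, and `labels` are invented for illustration, and the dates are assumed to carry no time-of-day component).

```python
import numpy as np
import pandas as pd

# Hypothetical project window, invented for illustration.
demo_start = pd.Timestamp("2022-03-01")
demo_end = pd.Timestamp("2022-03-11")
last_day = (demo_end - demo_start).days  # day offset of the last date

# Major tick positions: every third day, counted from the project start
major = np.arange(0, last_day + 1, 3)

# One formatted date per day, then keep every third one
all_days = pd.date_range(demo_start, end=demo_end).strftime("%m/%d")
labels = all_days[::3]

for pos, label in zip(major, labels):
    print(pos, label)
```

Position 0 is paired with 03/01, position 3 with 03/04, and so on; since both sequences use the same step, they always have the same length.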
Now let’s encode the completion of each task in our visualization.
# days between start and current progression of each task
df['current_num'] = (df.days_start_to_end * df.Completion)
We’ll add another bar to our plot and use the measure we just created as the width.
To increase the precision, we’ll write the percentage of completeness at the end of the bars. And to distinguish the completed from uncompleted, we can play with the alpha parameter of the bars.
from matplotlib.patches import Patch

fig, ax = plt.subplots(1, figsize=(16,6))

# bars
ax.barh(df.Task, df.current_num, left=df.start_num, color=df.color)
ax.barh(df.Task, df.days_start_to_end, left=df.start_num, color=df.color, alpha=0.5)

# texts (idx is used as the y position, so this assumes the default 0..n-1 index)
for idx, row in df.iterrows():
    ax.text(row.end_num+0.1, idx,
            f"{row.Completion:.0%}",  # rounds instead of truncating the percentage
            va='center', alpha=0.8)

##### LEGENDS #####
c_dict = {'MKT':'#E64646', 'FIN':'#E69646', 'ENG':'#34D05C', 'PROD':'#34D0C3', 'IT':'#3475D0'}
legend_elements = [Patch(facecolor=c_dict[i], label=i) for i in c_dict]
plt.legend(handles=legend_elements)

##### TICKS #####
xticks = np.arange(0, df.end_num.max()+1, 3)
xticks_labels = pd.date_range(proj_start, end=df.End.max()).strftime("%m/%d")
xticks_minor = np.arange(0, df.end_num.max()+1, 1)
ax.set_xticks(xticks)
ax.set_xticks(xticks_minor, minor=True)
ax.set_xticklabels(xticks_labels[::3])
plt.show()
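As a quick check of the progress measure and its label, here is a sketch on two hypothetical tasks (the `progress_demo` frame and its values are invented for illustration). It also shows why formatting the fraction with `.0%` is a safer choice than `int(x * 100)`: the latter truncates, and floating-point arithmetic can leave a value like 0.29 just below 29.

```python
import pandas as pd

# Hypothetical tasks, invented for illustration.
progress_demo = pd.DataFrame({
    "Task": ["A", "B"],
    "days_start_to_end": [10, 7],
    "Completion": [0.4, 0.29],
})

# Width of the "done" bar: total days scaled by the completed fraction
progress_demo["current_num"] = (
    progress_demo["days_start_to_end"] * progress_demo["Completion"]
)

# Percentage label, rounded to the nearest whole percent
progress_demo["label"] = progress_demo["Completion"].map("{:.0%}".format)

print(progress_demo)
print(int(0.29 * 100))  # truncation gives 28, not 29
```

A 10-day task at 40% gets a solid bar 4 days wide, drawn on top of the lighter full-length bar.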
And that’s it!
We can improve this visualization, make it more appealing, add more information with another axis, draw gridlines, add a title, and so much more.
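As one possible starting point for those refinements, the sketch below adds a title, vertical gridlines, and a top-to-bottom task order to a basic chart. The helper name `plot_gantt_with_extras`, the title text, the axis label, and the sample data are all assumptions made for illustration, not part of the article’s code.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_gantt_with_extras(frame, title):
    """Draw a basic Gantt chart with a title, gridlines, and the first task on top."""
    start = frame["Start"].min()
    left = (frame["Start"] - start).dt.days
    width = (frame["End"] - frame["Start"]).dt.days

    fig, ax = plt.subplots(1, figsize=(16, 6))
    ax.barh(frame["Task"], width, left=left)

    ax.set_title(title)
    ax.set_xlabel("Days since project start")
    ax.xaxis.grid(True, alpha=0.5)  # vertical gridlines at the major ticks
    ax.set_axisbelow(True)          # draw gridlines behind the bars
    ax.invert_yaxis()               # first row of the data at the top
    return fig, ax


# Hypothetical data, invented for illustration.
extras_demo = pd.DataFrame({
    "Task": ["Research", "Design", "Build"],
    "Start": pd.to_datetime(["2022-03-01", "2022-03-04", "2022-03-08"]),
    "End": pd.to_datetime(["2022-03-05", "2022-03-09", "2022-03-15"]),
})

fig, ax = plot_gantt_with_extras(extras_demo, "Project schedule")
```

Call `plt.show()` afterwards to display the figure. Inverting the y-axis is a design choice: it makes the chart read in the same order as the table.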


Conclusions
Overall, this is an excellent way of visualizing projects, even though it might not fit with the most current project management approaches.
Gantt charts are flexible: they can serve many purposes.
You can break down tasks, track performance measures, dependencies, milestones, deadlines, and much more. Adding more information to Gantt charts is easily achieved with more encodings, tooltips, drill-downs, and texts.
All that information can just as easily make our chart hard to understand and even tougher to maintain.
With agile approaches, plans are constantly changing. Spending that much time collecting and maintaining this information to follow up on a project requires too many resources and often becomes counterproductive.
All that said, they excel in visualizing completed projects and can be way more insightful than flowcharts, tables, or Kanban/Scrum boards, especially for scrutinizing a single process or project.
Thanks for reading my article!





