Hands-on Tutorial
Python Runtime Profiling using SnakeViz: How to Inspect Code Performance
Determine which parts of your Python code take the most time to run
After working through this tutorial step by step, you will know how to profile your own Python script or function and how to determine which parts of it take the most time to run.
Introduction to Python script profiling
In production, bugs are not the only concern with a Python script; execution time matters too. It becomes a performance issue as our data volume grows, and the production script must then be restructured and optimized.
What if the scripts are too complex to check line by line? Is there a tool that summarizes script performance so we do not have to read every line of every file? Yes: this is what Python script profiling is for, and Python offers several modules to do it.
In this tutorial, we will use cProfile to create profiling.
Introduction to cProfile
cProfile is a built-in module provided by Python for profiling scripts, and it is widely used by Python programmers. Unlike profile, which is written in pure Python, cProfile is implemented in C, so it adds less overhead. Why do developers favor cProfile over other Python profiling tools? Probably because of the functionality it bundles:
- A standard built-in module with a simple command for profiling
- It generates many statistics for checking script performance. Besides profiling a function itself, cProfile also measures the functions that it calls
- It is flexible: it can profile a whole script or a specific function in the script
How to use the cProfile
To follow the tutorial, we need the modules used for Python script profiling. The two main ones are cProfile and snakeviz. cProfile is part of the Python standard library, so it is available in any Python installation, including Anaconda. snakeviz, however, must be installed separately (for example with pip install snakeviz).
- cProfile: used for deterministic profiling of Python scripts
- pstats: used for analyzing the Python script profiling data
- time: used for calculating the runtime of Python scripts
- io: used for dealing with various types of I/O
- snakeviz: used for visualizing Python script profiling
Once snakeviz is installed (for example from the Anaconda Prompt), import the modules into a Jupyter Notebook. Conveniently, Jupyter Notebook can also be run online without any local installation.
# Module for deterministic profiling of Python scripts/programs
import cProfile
import pstats
# Module for timing
import time
# Module for dealing with various types of I/O
import io
# Module for visualizing Python profiling output
import snakeviz
Create Python script profiling using run()
Our first example of Python script profiling is simple: a single function foo() that sleeps for 1 second and then prints the string foo. To profile it, cProfile provides the run() function, which takes the Python code to profile as a string.
# One function
def foo():
    time.sleep(1)
    print('foo')
# Run the profiling
cProfile.run('foo()')
In the resulting stats table, each row is a unique function that was called, and each column holds information about that function. Read the detailed information at https://jiffyclub.github.io/snakeviz/.
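To make the column meanings concrete, the sketch below profiles two small helper functions, inner() and outer(). These names are hypothetical and introduced only for this illustration. It relies on the Profile class and pstats, which are covered later in this tutorial, and it assumes Python 3.9 or newer, where pstats.Stats offers get_stats_profile(). The point is the difference between tottime (time spent in a function itself) and cumtime (time including the functions it calls).

```python
import cProfile
import pstats
import time


def inner():
    # Hypothetical helper: almost all of the time is spent here
    time.sleep(0.01)


def outer():
    # Hypothetical helper: does little work itself, but calls inner() three times
    for _ in range(3):
        inner()


# Collect profiling data for one call of outer()
profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

# get_stats_profile() (Python 3.9+) exposes the table columns as attributes
func_profiles = pstats.Stats(profiler).get_stats_profile().func_profiles
inner_row = func_profiles['inner']
outer_row = func_profiles['outer']

# ncalls: how many times the function was called
print('inner ncalls :', inner_row.ncalls)
# tottime: time spent in the function itself, excluding sub-calls
print('outer tottime:', outer_row.tottime)
# cumtime: time spent in the function including everything it calls
print('outer cumtime:', outer_row.cumtime)
```

Because outer() only loops and delegates, its tottime should be close to zero while its cumtime covers the three sleeps inside inner().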
Moving on from the simple function above, we will create a more complex example that consists of a looping operation and a print operation. Two functions, loopingSomething() and printSomething(), are created and called from the main function main().
# More than one function
# 1 Function for looping
def loopingSomething():
    index = []
    for i in range(100):
        index.append(i)
# 2 Function for printing
def printSomething():
    print('Print something here!')
# 3 Bundle above functions
def main():
    # First function
    loopingSomething()
    # Second function
    printSomething()
# Run the profiling
cProfile.run('main()')
Modify the profiling output using Profile class
Sometimes we want to order the stats table by the number of calls or by the time taken per call of a given function. By default, run() orders the table by filename:lineno(function), so for finer control over the output we turn to the Profile class.
The Profile class is useful for modifying the output generated by profiling.
The enable() method makes the profiler start collecting profiling information; it is followed by a call to the main function. The disable() method makes the profiler stop collecting profiling information.
The data collected by cProfile is then sorted with the pstats method sort_stats(), where we choose which column to sort by.
Finally, the output is printed by print_stats().
# Create an object from the cProfile.Profile class
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()
# Sort the profiling based on 'ncalls'
stats = pstats.Stats(profiler).sort_stats('ncalls')
# Print the profiling
stats.print_stats()
The output includes directory names in the file paths. They can easily be removed with the strip_dirs() method if they do not fit our report for advanced analysis.
# Remove directory names
stats = pstats.Stats(profiler).strip_dirs().sort_stats('ncalls')
stats.print_stats()
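A few more pstats options are often handy when shaping the output. The sketch below is one possible variation, not the only way to do it: it uses Profile as a context manager (available since Python 3.8), sorts with the pstats.SortKey enum (Python 3.7+), limits the printed table to the top rows, and lists the functions that main() calls with print_callees(). The helper functions are repeated from the example above so that the block runs on its own; the buffer and report names are chosen for illustration.

```python
import cProfile
import io
import pstats


def loopingSomething():
    index = []
    for i in range(100):
        index.append(i)


def printSomething():
    print('Print something here!')


def main():
    loopingSomething()
    printSomething()


# The with-block enables the profiler on entry and disables it on exit
with cProfile.Profile() as profiler:
    main()

# Send the report to an in-memory buffer so it can be printed or saved later
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).strip_dirs()

# Sort by cumulative time and print only the first 5 rows of the table
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)

# List the functions called by main(); the argument is a regular expression
# matched against the filename:lineno(function) column
stats.print_callees(r'\(main\)')

report = buffer.getvalue()
print(report)
```

Restricting the table to a handful of rows keeps the report readable once a script calls hundreds of functions.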
Save the cProfile output into different formats
Having printed and ordered the output of Python script profiling, we now look at how to save it in different file formats: raw, txt, or CSV.
To save in raw format, we use the dump_stats() method, which takes the path of the output file (directory and filename). In this tutorial, we save the cProfile output in the folder data under the filename cProfileExport.
# Export the profiler output into file
stats = pstats.Stats(profiler)
stats.dump_stats('../data/cProfileExport')
The output can also be saved in txt format. This is done by creating an io.StringIO object and passing it as the stream argument.
result = io.StringIO()
stats = pstats.Stats(profiler, stream = result).sort_stats('ncalls')
stats.print_stats()
# Save it into disk
with open('../data/cProfileExport.txt', 'w+') as f:
    f.write(result.getvalue())
Saving the output in CSV format takes more effort: the text must be transformed so that it can be parsed as comma-separated values.
result = io.StringIO()
stats = pstats.Stats(profiler, stream = result).sort_stats('ncalls')
stats.print_stats()
result = result.getvalue()
# Chop the string into a csv-like buffer
result = 'ncalls' + result.split('ncalls')[-1]
result = '\n'.join([','.join(line.rstrip().split(None, 6)) for line in result.split('\n')])
# Save it into disk
with open('../data/cProfileExport.csv', 'w+') as f:
    f.write(result)
Although we saved the cProfile output in CSV format, some data manipulation is still needed before it reads properly.
# Import module for data manipulation
import pandas as pd
# Load the data
df_profiling = pd.read_csv('../data/cProfileExport.csv', sep = ',')
# Print the dimension
print('Dimension data: {} rows and {} columns'.format(len(df_profiling), len(df_profiling.columns)))
df_profiling.head()
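The manipulation is simple: it only involves concatenating two columns and reordering the columns. It is needed because function names that contain spaces, such as built-in methods, are split across two columns, and the ncalls values end up in the index. The steps are listed in order below.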
# Fill missing value
df_profiling.fillna(value = '', inplace = True)
# Column manipulation to make the data clean
df_profiling['filename:lineno(function)'] = df_profiling['percall.1'] + ' ' + df_profiling['filename:lineno(function)']
# Rename columns
del df_profiling['percall.1']
cols = ['tottime', 'percall', 'cumtime', 'percall_2', 'filename:lineno(function)']
df_profiling.columns = cols
# Reset index
df_profiling['ncalls'] = df_profiling.index
df_profiling.reset_index(drop = True, inplace = True)
# Reorder columns
cols = ['ncalls', 'tottime', 'percall', 'cumtime', 'percall_2', 'filename:lineno(function)']
df_profiling = df_profiling[cols]
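As an alternative to parsing the printed text, the rows can also be read from the profiler as structured data and written with the csv module. This is a sketch of a different technique from the string chopping shown above, assuming Python 3.9 or newer for get_stats_profile(). The function work(), the column labels, and the in-memory buffer are placeholders chosen for illustration. Note that func_profiles is keyed by function name, so functions that share a name in different files would overwrite each other.

```python
import cProfile
import csv
import io
import pstats


def work():
    # Placeholder workload: build a list of squares
    return [i * i for i in range(1000)]


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Structured view of the stats table (Python 3.9+)
stats_profile = pstats.Stats(profiler).get_stats_profile()

columns = ['ncalls', 'tottime', 'percall_tottime', 'cumtime',
           'percall_cumtime', 'file_name', 'line_number', 'function']

# Write to an in-memory buffer; to save the table to disk, pass a real file
# opened with newline='' to csv.writer instead
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
for name, row in stats_profile.func_profiles.items():
    writer.writerow([row.ncalls, row.tottime, row.percall_tottime,
                     row.cumtime, row.percall_cumtime,
                     row.file_name, row.line_number, name])

csv_text = buffer.getvalue()
print(csv_text)
```

Since the csv module quotes fields where necessary, function names with spaces stay in a single column and no column repair is required afterwards.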
How to visualize the cProfile object using SnakeViz
In practice, our Python scripts and functions are complex, which makes the cProfile output harder to read than we might expect. To analyze function performance in a script, a visualization (diagram or graph) is easier to work with than reading the cProfile output row by row.
SnakeViz is a third-party module that automatically builds a browser-based visualization from the profile file generated by cProfile. It has two visualization styles: icicle and sunburst charts.
- Icicle chart: the time a function takes to run is shown by the width of a rectangle. The root function is at the top of the viz and has the largest width. It runs by calling the sub-functions below it, and so on. The narrower a rectangle, the less time the function takes
- Sunburst chart: the time a function takes to run is shown by the angular extent of an arc. The root function is the circle in the middle of the viz. It runs by calling the sub-functions in the rings around it, and so on
Create profiling viz for the machine learning function
Let’s create a machine-learning pipeline for the iris data set. Its goal is to produce a classification model that predicts the species class from the given characteristics: petal length, petal width, sepal length, and sepal width.
The model uses logistic regression as the benchmark, and the metric is accuracy. The function irisDataClassification() prints the final accuracy.
This is a common real-world task that usually needs runtime optimization as the data grows or the algorithm becomes more complex. Python script profiling helps us assess which parts of the code or which functions must be optimized.
def irisDataClassification():
    # Import modules
    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    # Import some data to play with
    iris = datasets.load_iris()
    X, y = iris.data, iris.target
    # Data splitting
    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y,
        test_size = 0.2,
        random_state = 1,
        stratify = y)
    # Create logistic regression object
    model = LogisticRegression()
    # Data modelling with logistic regression
    model.fit(X_train, y_train)
    # Create prediction using testing data
    y_pred = model.predict(X_test)
    # Print out the accuracy
    accuracy = accuracy_score(y_test, y_pred)
    print(accuracy)
The snakeviz module gives us two ways to generate the visualization or dashboard: from a Jupyter Notebook or from the terminal. When the scripts become more complex, I suggest using the terminal because it opens the visualization in a new browser window.
Create a visualization using Jupyter Notebook
It’s easy to display the visualization by calling irisDataClassification() with the %snakeviz command.
# Python script profiling viz with SnakeViz
%load_ext snakeviz
%snakeviz irisDataClassification()

Create a visualization using the Terminal
First, the function irisDataClassification must be saved in a .py file that calls it, and that file must be profiled so that a profile file is written, for example with python -m cProfile -o iris_classification.prof followed by the name of the .py file. Next, open the terminal and change into the folder where the profile file is located. Then run the snakeviz command followed by the filename of the profile.
A new tab will automatically open for the viz.
# Adjust the filename to your needs
snakeviz iris_classification.prof
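The snakeviz command reads a profile file written by cProfile rather than the .py source itself. One way to produce such a file from Python is sketched below. The workload() function is a hypothetical stand-in for irisDataClassification(), and writing to the system temporary directory is only an assumption that keeps the example self-contained.

```python
import cProfile
import os
import pstats
import tempfile


def workload():
    # Hypothetical stand-in for irisDataClassification()
    return sum(i * i for i in range(10000))


# Where to write the profile; adjust the path to your needs
prof_path = os.path.join(tempfile.gettempdir(), 'iris_classification.prof')

# Profile the call and dump the raw stats to disk
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()
profiler.dump_stats(prof_path)

# The file can be loaded back with pstats, or opened from the terminal
# by running snakeviz followed by the path printed below
loaded = pstats.Stats(prof_path)
print('Profile written to', prof_path)
print('Total calls recorded:', loaded.total_calls)
```

Loading the file back with pstats is a quick check that the dump succeeded before opening it in the browser.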
The default visualization style is the icicle chart, but we can switch to the sunburst chart with the style dropdown.

The stats table is displayed at the bottom of the viz.







