Yash Prakash

Summary

The web content provides insights into enhancing productivity and efficiency in data science projects using Jupyter Notebooks through various tips, utilities, and best practices.

Abstract

The article titled "7 Awesome Jupyter Utilities That You Should Be Aware Of" offers valuable advice for data scientists who frequently use Jupyter Notebooks. It emphasizes the importance of Jupyter Notebooks as a versatile platform for interactive coding, documentation, and visualization in data science. The author shares personal experience with utilities such as module auto-reloading, notebook merging with nbmerge, trusting notebooks for better exports, exporting interactive Plotly graphs, converting notebooks to PDF and HTML with improved formatting, timing code execution, and customizing themes for increased productivity. These utilities are presented as practical solutions to common issues faced by data scientists, aiming to streamline the workflow and enhance the overall experience of using Jupyter Notebooks.

Opinions

  • The author highly recommends the auto-reloading feature for modules, considering it an essential tool for seamless code updates.
  • nbmerge is praised for its simplicity in combining multiple notebooks, which is seen as a significant advantage in organizing projects.
  • The issue of notebook trustworthiness is highlighted as a common hindrance when exporting notebooks, particularly to PDF format.
  • Exporting interactive Plotly graphs is acknowledged as a challenge, and the provided solution is presented as a reliable fix for preserving interactivity in shared notebooks.
  • The author expresses dissatisfaction with the default PDF export functionality in Jupyter Notebooks, advocating for the use of notebook-as-pdf for better results.
  • The %time magic command is regarded as a handy feature for monitoring the performance of code execution, especially for time-consuming operations.
  • Personal productivity is linked to the aesthetic appeal of the coding environment, with the author endorsing the use of custom themes, particularly the oceans16 theme.
  • The author encourages following their Medium profile for more insights and articles on data science, suggesting that shared knowledge can make solo learning more enjoyable.

Data Science

7 Awesome Jupyter Utilities That You Should Be Aware Of

Some useful tips and hacks that I make use of for pretty much all my data science projects involving Jupyter Notebooks

Photo by Hybrid on Unsplash

Jupyter Notebooks are considered to be the backbone of many data science experiments, and for good reason. They allow for a style of interactive, literate programming that few other platforms match.

Writing runnable code, providing meaningful documentation, developing interactive plots and even refactoring and debugging your code are all much easier to do in Jupyter Notebooks first, before you move on to writing production code as an import-ready module.

In this quick article, I’m sharing a few tips and tricks that I frequently make use of while coding data science projects in notebooks.

Let’s get started! 💥

Module auto-reloading

Quite often, we need to change a module that we import in our notebook midway through a run, and restarting the kernel is not a viable option.

When run in the first cell of your notebook, this little snippet lets you edit and save any imported module outside the notebook and have it reloaded automatically before your next cell runs, with no kernel restart needed!

%load_ext autoreload 
%autoreload 2

This is such a handy feature, I can’t stress enough how important it’s been for me!
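
If you’re curious what reloading looks like in plain Python, here is a minimal sketch using the standard library’s importlib.reload, which is what you would otherwise call by hand after every edit. The module name demo_helpers and its greet function are made up for illustration, and this is not meant to describe how the autoreload extension works internally.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Demo-only setting: skip writing .pyc files so the quick edit below
# can never be masked by cached bytecode.
sys.dont_write_bytecode = True

# Create a throwaway module on disk (hypothetical name: demo_helpers).
module_dir = Path(tempfile.mkdtemp())
module_path = module_dir / "demo_helpers.py"
module_path.write_text("def greet():\n    return 'hello'\n")

# Make the temporary directory importable and import the module.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import demo_helpers

before = demo_helpers.greet()

# Simulate editing and saving the module in an external editor.
module_path.write_text("def greet():\n    return 'hello again'\n")

# Without autoreload, the change is only picked up after a manual reload.
importlib.reload(demo_helpers)
after = demo_helpers.greet()

print(before, "->", after)
```

With %autoreload 2 enabled, that manual reload step is handled for you before each cell executes.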

Merging multiple notebooks

There’s a lightweight library that makes it possible for you to merge two or more notebooks into one.

It’s called nbmerge.

!pip install nbmerge 
!nbmerge file_1.ipynb file_2.ipynb file_3.ipynb > merged.ipynb

Run it in a cell and see the magic yourself!
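
To get a feel for what merging means, here is a rough sketch that uses only the json module. It relies on the fact that an .ipynb file is a JSON document with a top-level "cells" list. The helpers make_notebook and merge_notebooks are hypothetical names written for this example, and this is not a claim about how nbmerge itself is implemented.

```python
import json
import tempfile
from pathlib import Path


def make_notebook(sources):
    """Build a minimal nbformat 4 notebook dict with one code cell per source string."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": src,
            }
            for src in sources
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def merge_notebooks(paths):
    """Append the cells of every later notebook to a copy of the first one."""
    merged = None
    for path in paths:
        nb = json.loads(Path(path).read_text(encoding="utf-8"))
        if merged is None:
            merged = nb  # keep the first notebook's metadata
        else:
            merged["cells"].extend(nb["cells"])
    return merged


# Write two tiny notebooks to a temporary folder, then merge them.
work_dir = Path(tempfile.mkdtemp())
first = work_dir / "file_1.ipynb"
second = work_dir / "file_2.ipynb"
first.write_text(json.dumps(make_notebook(["x = 1"])), encoding="utf-8")
second.write_text(json.dumps(make_notebook(["y = 2", "print(x + y)"])), encoding="utf-8")

merged = merge_notebooks([first, second])
(work_dir / "merged.ipynb").write_text(json.dumps(merged, indent=1), encoding="utf-8")
print(len(merged["cells"]), "cells in merged notebook")
```

For real projects, nbmerge is still the more convenient option. The sketch only shows that a merge is essentially a concatenation of cells.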

Trusting a notebook

Tell me you’ve seen this output in your terminal while a notebook is running — ‘abc_code.ipynb notebook is not trusted.’

It comes up all the time. Sometimes, this has been the reason I wasn’t able to export my notebooks properly as a PDF file. Well, there is a quick snippet of code to make the message go away and remove all problems associated with the untrustworthiness of the notebook.

!jupyter trust file1.ipynb
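
Roughly speaking, Jupyter decides whether a notebook is trusted by computing a signature from the notebook’s contents plus a secret key that belongs to you, and remembering that signature. The sketch below illustrates only the idea. The functions sign_notebook and is_trusted, the in-memory set of signatures, and the JSON serialization are all simplifying assumptions made for this example, not Jupyter’s actual code.

```python
import hashlib
import hmac
import json
import secrets


def sign_notebook(notebook, secret_key):
    """Return an HMAC-SHA256 signature of the notebook's JSON content."""
    payload = json.dumps(notebook, sort_keys=True).encode("utf-8")
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


# A per-user secret; assumed here to be generated on the fly for the demo.
secret_key = secrets.token_bytes(32)

notebook = {"cells": [{"cell_type": "code", "source": "print('hi')"}]}

# "Trusting" a notebook amounts to remembering its signature.
trusted_signatures = {sign_notebook(notebook, secret_key)}


def is_trusted(nb):
    """A notebook counts as trusted only if its current signature is known."""
    return sign_notebook(nb, secret_key) in trusted_signatures


# Any change to the content produces a different signature.
tampered = {"cells": [{"cell_type": "code", "source": "print('changed')"}]}

print(is_trusted(notebook), is_trusted(tampered))
```

This also hints at why the warning shows up for notebooks you received from someone else: their content was never signed with your key.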

Exporting interactive plotly graphs

One of the best and most useful features of plotly is that when you share a notebook with Plotly plots, the user is able to interact with the plots embedded in the notebook.

Well, the problem is that quite a few times when I tried to share my notebooks as a PDF or HTML file, the graphs just wouldn’t show. After quite a bit of research, I found two lines of code that make this annoying problem go away, so you can share your plots just the way you want them.

import plotly 
plotly.offline.init_notebook_mode()

Run this in the last cell of your notebook, right before you convert the notebook to a PDF or HTML file for sharing.

Exporting to PDF and HTML

I know, I know. There’s a very simple way to do this. Go to the File menu and simply click an option.

But hear me out for a second.

Exporting to PDF the normal way is a risky bet. Line breaks get messed up, margins are ill-formed, and some of the code and outputs get cut off about 90% of the time. The usual, easy way has its perks, but lately I’ve found that its disadvantages outweigh its advantages.

What I found, however, is a fresh little library that renders our notebook into a nicely formatted PDF via HTML, without needing a LaTeX installation.

It’s called notebook-as-pdf.

!pip install notebook-as-pdf
!pyppeteer-install

After this, you should be good to go. Just go to the File menu and select Export as → PDF via HTML.

Or you can simply type:

!jupyter-nbconvert --to PDFviaHTML file.ipynb

And that’s it! It works quite nicely. Try it out.

For an HTML output, you don’t need an additional library. It simply goes like this:

!jupyter nbconvert my_notebook.ipynb --to html --output my_notebook.html

Measure time taken in running commands

This one is a handy magic command (it literally is called that, yes :P ), which enables you to measure the amount of time it took to either:

  • execute a single line of code like:
%time my_list = [x for x in all_items]

or,

  • execute an entire cell like:
%%time
my_list = [x for x in all_items]
with open('a.txt') as f:
    '''do something'''

It’s a neat little feature of IPython that I generally use for long-running operations, such as calculating sentence embeddings for a large block of text, loading data via a DataLoader, etc.
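
Outside of a notebook, where magic commands aren’t available, you can get similar numbers with the standard library. Here is a small sketch under that assumption. The timed context manager and the timings dictionary are names invented for this example, and all_items is just stand-in data.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record wall-clock and CPU time of the enclosed block under `label`."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        results[label] = {
            "wall": time.perf_counter() - wall_start,
            "cpu": time.process_time() - cpu_start,
        }


timings = {}
all_items = range(100_000)  # stand-in data for the demo

with timed("list build", timings):
    my_list = [x for x in all_items]

wall = timings["list build"]["wall"]
cpu = timings["list build"]["cpu"]
print(f"list build: wall {wall:.4f}s, cpu {cpu:.4f}s")
```

Like %time, this measures a single run, so expect the numbers to vary a little from one execution to the next.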

And lastly,

Increase your productivity via a custom theme

Lately, I’ve been using dark themes in Jupyter Notebooks a lot. They’re easy on the eyes and, in general, a bit more aesthetically pleasing too. Trust me, you’ll enjoy working for longer durations a lot more if your code editor looks exactly the way you like it.

Getting started involves simply installing a library and choosing a theme.

pip install jupyterthemes

Then, get the list of themes:

!jt -l

Choose one like this:

!jt -t <theme_name>

Check out the official repo of the library. It offers a detailed explanation of features, if you’re interested.

My current theme of choice is oceans16. It’s good for my productivity. Find your own favourite! :)

Thank you for reading! 😄

I regularly write about what I learn in Data Science. Solo learning can be tough, so follow me here and let’s make it enjoyable too, one fun article at a time.

Here is the codebase of all my Data Science stories. Happy learning!
