Roman Orac

Transform Jupyter Notebook to an Ebook

A few tips for transforming your Jupyter Notebook into a beautifully formatted Ebook in PDF, EPUB and AZW3. Don’t spend hours researching as I did!

Photo by Kourosh Qaffari on Unsplash

A month ago, I decided to start working on an online course focusing on the practical aspects of Data Science. Writing an Ebook in a Jupyter Notebook seemed like the way to go because it offers a nice mixture of text, visualizations and code. Writing an Ebook is a challenge in itself, and on top of that I ran into a lot of problems when transforming a Notebook into an Ebook format. These tips are useful for technical writers and also for Data Scientists who wish to review a Notebook on their e-reader. Follow these tips so that you don’t spend hours researching as I did!

Here are a few links that might interest you:

- Labeling and Data Engineering for Conversational AI and Analytics
- Data Science for Business Leaders [Course]
- Intro to Machine Learning with PyTorch [Course]
- Become a Growth Product Manager [Course]
- Deep Learning (Adaptive Computation and ML series) [Ebook]
- Free skill tests for Data Scientists & Machine Learning Engineers

Some of the links above are affiliate links, and if you make a purchase through them, I’ll earn a commission. Keep in mind that I link courses because of their quality and not because of the commission I receive from your purchases.

1. Create a GitHub repository

I suggest you start with a dedicated GitHub repository that will help you track versions of your Ebook. GitHub offers private repositories for free, so there is no reason not to use one. JupyterLab also has a Git extension that helps you track changes between versions.

Make sure you add the following to .gitignore so that you don’t accidentally push unnecessary files to your Git repository:

*.csv
.DS_Store
.ipynb_checkpoints/
*.mov
*.pdf
*.html
*.azw3
*.epub
cpdf
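You may prefer to create the file from the terminal. A heredoc writes the same patterns in one step. This is a minimal sketch that assumes you run it from the root of the repository. Note that ">" replaces an existing .gitignore.

```shell
#!/bin/bash
# Write the ignore patterns listed above into .gitignore.
# ">" overwrites an existing file; use ">>" instead to append.
# The quoted 'EOF' keeps the text literal (no $ or backtick expansion).
cat > .gitignore <<'EOF'
*.csv
.DS_Store
.ipynb_checkpoints/
*.mov
*.pdf
*.html
*.azw3
*.epub
cpdf
EOF
```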

2. Organize each chapter in its own Notebook

Organizing the content into one Notebook per chapter makes it easier to focus on a single topic when writing. It creates fewer distractions, and it also simplifies reviewing differences in version control.

There is another benefit to organizing an Ebook this way, which only becomes apparent at the end. Each chapter in the Ebook should start on a new page, not in the middle of the previous one.

The second chapter starts in the middle of the page.

I spent hours looking for a way to add a page break between chapters when exporting a Jupyter Notebook to HTML or PDF, with no luck.

Then I got an idea! Convert each chapter’s Jupyter Notebook to a PDF, and then merge the PDFs into a final PDF.

3. Don’t use Jupyter’s Export Notebook as PDF

JupyterLab has a neat “Export Notebook as PDF” feature, which seems like a time-saver at first. However, this export goes through LaTeX rather than the HTML rendering, so it doesn’t take CSS into account, and pandas DataFrames get a plain format in the PDF.

A DataFrame with many columns looks great in JupyterLab, but what happens when we export it to PDF?
When we export a Notebook to PDF, a DataFrame with many columns gets split across multiple pages.

I tried to convince myself that the export didn’t look so bad, but I still wasn’t satisfied. After some research, I tried exporting the Notebook to HTML, opening it in the Chrome browser and using Chrome’s “Save as PDF” feature.

Jupyter Notebook exported as HTML and saved to PDF with Chrome browser.

Aha! DataFrame formatting looks much nicer now.

4. Don’t manually save to PDF

With multiple chapters, exporting each one to HTML and then saving it to PDF by hand is a cumbersome process. Can we automate it?

Chrome has a headless mode, so we can save an HTML file to PDF from the command line. This should work in theory, but it didn’t work for me:

chrome --headless --print-to-pdf=Chapter1.pdf --print-to-pdf-no-header Chapter1.html

The PDF exported with Chrome still had headers and footers filled in, despite the --print-to-pdf-no-header flag.

After more research, I found the wkhtmltopdf tool, which renders HTML into PDF and various image formats using the Qt WebKit rendering engine. On macOS, you can install it with Homebrew. The tool lets us customize the CSS and margins.

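Chrome produces a nice PDF, so I took Chrome’s CSS formatting and saved it into a custom.css file in the same folder as the exported HTML files. Jupyter’s exported HTML links to a custom.css in its own folder, so wkhtmltopdf picks up these styles by default: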

div#notebook {
    font-size: 18px;
    line-height: 26px;
}
img {
    max-width: 100% !important;
}
/* Avoid splitting table rows and images across pages */
tr, img {
    page-break-inside: avoid;
}
/* Keep at least 3 lines of text on each side of a page break */
p, h2, h3 {
    orphans: 3;
    widows: 3;
    page-break-inside: avoid;
}
/* Print-style reset: no backgrounds or shadows */
*, *:before, *:after {
    page-break-inside: avoid;
    background: transparent !important;
    box-shadow: none !important;
    text-shadow: none !important;
}

The command for wkhtmltopdf that takes the HTML and outputs a PDF:

wkhtmltopdf --enable-internal-links -L 10mm -R 9.5mm -T 10mm -B 9.5mm Chapter1.html Chapter1.pdf
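With more than one chapter, it helps to wrap this command in a small function that derives the PDF name from the HTML name. The sketch below uses a hypothetical helper, chapter_to_pdf, and keeps the same flags. In wkhtmltopdf, -L, -R, -T and -B are short for --margin-left, --margin-right, --margin-top and --margin-bottom. The helper checks that the PDF was written instead of trusting the exit status. wkhtmltopdf can exit non-zero on network errors (for example, a script referenced by the HTML that can’t be loaded) and still produce the file.

```shell
#!/bin/bash
# Hypothetical helper: render one chapter's HTML to PDF with the margins
# used above, e.g. Chapter1.html -> Chapter1.pdf. Prints the PDF name.
chapter_to_pdf() {
    local html="$1"
    local pdf="${html%.html}.pdf"

    # -L/-R/-T/-B: left/right/top/bottom page margins.
    wkhtmltopdf --enable-internal-links \
        -L 10mm -R 9.5mm -T 10mm -B 9.5mm \
        "$html" "$pdf"

    # wkhtmltopdf may exit non-zero on network errors yet still write the
    # PDF, so judge success by the output file instead of the exit status.
    if [ ! -s "$pdf" ]; then
        echo "failed to create $pdf" >&2
        return 1
    fi
    echo "$pdf"
}
```

For example, for html in Chapter1.html Chapter2.html; do chapter_to_pdf "$html" || exit 1; done converts both chapters and stops if one of them produces no PDF.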

5. Merge PDFs

I use the cpdf tool to merge the chapter PDFs into the final PDF. I downloaded cpdf and put it in the folder with the Jupyter Notebooks. To merge the PDFs, use the following command:

./cpdf Chapter1.pdf Chapter2.pdf -o Ebook.pdf
The final merged PDF.
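The merge command above lists each chapter explicitly. You may want to pick up every chapter PDF automatically instead. Keep in mind that a plain glob such as Chapter*.pdf sorts alphabetically, so Chapter10.pdf would come before Chapter2.pdf. The sketch below sorts by chapter number instead. It uses hypothetical ordered_chapters and merge_chapters functions and assumes the Chapter<N> naming used in this article. It also avoids bash-4-only features, since macOS ships an older bash.

```shell
#!/bin/bash
# Hypothetical helper: print Chapter<N>.<ext> files in numeric chapter
# order, one per line. Assumes names contain no spaces or newlines.
ordered_chapters() {
    local ext="$1" f n
    for f in Chapter*."$ext"; do
        [ -e "$f" ] || continue      # no match: the glob stays unexpanded
        n="${f#Chapter}"             # Chapter10.pdf -> 10.pdf
        n="${n%."$ext"}"             # 10.pdf -> 10
        printf '%s %s\n' "$n" "$f"
    done | sort -n | cut -d' ' -f2-
}

# Merge every chapter PDF in the current folder into Ebook.pdf, in order.
merge_chapters() {
    local f
    set --                           # reuse "$@" as the list of PDFs
    for f in $(ordered_chapters pdf); do
        set -- "$@" "$f"
    done
    if [ "$#" -eq 0 ]; then
        echo "no Chapter*.pdf files found" >&2
        return 1
    fi
    ./cpdf "$@" -o Ebook.pdf
}
```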

6. Convert Jupyter Notebook to EPUB format

We have each chapter in a separate Jupyter Notebook. Let’s merge the Notebooks with the nbmerge tool, which you can install with pip: pip install nbmerge.

nbmerge Chapter1.ipynb Chapter2.ipynb  > Ebook.ipynb

JupyterLab’s “export to HTML” command also exports CSS. That is great for PDF but problematic for Ebooks, because the CSS is too complex. Jupyter comes with the nbconvert tool, which powers exporting to different formats. To export the Notebook to HTML without the CSS:

jupyter nbconvert --to html Ebook.ipynb --template=basic

To convert the HTML to EPUB, we need to install Calibre. If you are an avid Ebook reader, I am sure you have come across Calibre before. Calibre is a cross-platform, open-source suite for Ebook management.

Run the commands below to convert the HTML to EPUB and AZW3 (the commands work on macOS):

/Applications/calibre.app/Contents/MacOS/ebook-convert Ebook.html Ebook.epub
/Applications/calibre.app/Contents/MacOS/ebook-convert Ebook.html Ebook.azw3
Reading EPUB Ebook in Apple iBooks.
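The Calibre path above is specific to macOS. On Linux, Calibre usually puts ebook-convert on your PATH instead. The sketch below prefers ebook-convert from PATH and falls back to the macOS app bundle. It then converts one HTML file into each requested format. The function names find_ebook_convert and html_to_ebooks are hypothetical.

```shell
#!/bin/bash
# Prefer ebook-convert from PATH (typical for Linux installs), otherwise
# fall back to the macOS app-bundle location used above.
find_ebook_convert() {
    if command -v ebook-convert >/dev/null 2>&1; then
        command -v ebook-convert
    else
        echo /Applications/calibre.app/Contents/MacOS/ebook-convert
    fi
}

# Convert one HTML file into each format given, e.g.
#   html_to_ebooks Ebook.html epub azw3  ->  Ebook.epub, Ebook.azw3
# ebook-convert picks the output format from the output file's extension.
html_to_ebooks() {
    local html="$1" converter fmt
    shift
    converter="$(find_ebook_convert)"
    for fmt in "$@"; do
        "$converter" "$html" "${html%.html}.$fmt" || return 1
    done
}
```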

7. Let’s put everything together

As software developers, we automate things. I packed the commands above into a bash script, so whenever I make a change in the Jupyter Notebooks, I can run the compile script to create a new version of the Ebook:

#!/bin/bash
nbmerge Chapter1.ipynb Chapter2.ipynb  > Ebook.ipynb
jupyter nbconvert --to html Ebook.ipynb --template=basic
/Applications/calibre.app/Contents/MacOS/ebook-convert Ebook.html Ebook.epub
/Applications/calibre.app/Contents/MacOS/ebook-convert Ebook.html Ebook.azw3
jupyter nbconvert --to html Chapter1.ipynb
jupyter nbconvert --to html Chapter2.ipynb
wkhtmltopdf --enable-internal-links -L 10mm -R 9.5mm -T 10mm -B 9.5mm Chapter1.html Chapter1.pdf
wkhtmltopdf --enable-internal-links -L 10mm -R 9.5mm -T 10mm -B 9.5mm Chapter2.html Chapter2.pdf
./cpdf Chapter1.pdf Chapter2.pdf -o Ebook.pdf

I also created a Git repository with the commands mentioned above: romanorac/jupyter-notebook-to-ebook (https://github.com/romanorac/jupyter-notebook-to-ebook).
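The compile script hard-codes two chapters and carries on even if a step fails. Below is a hedged generalization, not the script from the repository. A hypothetical compile_ebook function takes the chapter notebooks in reading order, checks that the tools exist, and stops at the first real failure. CPDF and EBOOK_CONVERT are assumed environment overrides whose defaults match the paths used in the script. wkhtmltopdf can exit non-zero on network errors while still writing the PDF, so that step checks for the output file instead.

```shell
#!/bin/bash
# Assumed overrides; the defaults match the paths used in the compile script.
CPDF="${CPDF:-./cpdf}"
EBOOK_CONVERT="${EBOOK_CONVERT:-/Applications/calibre.app/Contents/MacOS/ebook-convert}"

# Report every missing tool and return non-zero if any is missing.
require_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: compile_ebook Chapter1.ipynb Chapter2.ipynb ...  (reading order)
compile_ebook() {
    local count nb pdf
    if [ "$#" -eq 0 ]; then
        echo "usage: compile_ebook Chapter1.ipynb [Chapter2.ipynb ...]" >&2
        return 2
    fi
    require_tools nbmerge jupyter wkhtmltopdf "$CPDF" "$EBOOK_CONVERT" || return 1

    # EPUB and AZW3: merge all chapters, export body-only HTML, convert.
    nbmerge "$@" > Ebook.ipynb || return 1
    jupyter nbconvert --to html Ebook.ipynb --template=basic || return 1
    "$EBOOK_CONVERT" Ebook.html Ebook.epub || return 1
    "$EBOOK_CONVERT" Ebook.html Ebook.azw3 || return 1

    # PDF: one PDF per chapter so every chapter starts on a new page.
    # Each PDF name is appended to "$@"; the notebooks are shifted off below.
    count=$#
    for nb in "$@"; do
        jupyter nbconvert --to html "$nb" || return 1
        pdf="${nb%.ipynb}.pdf"
        # wkhtmltopdf can exit non-zero on network errors yet still write
        # the PDF, so check for the file rather than the exit status.
        wkhtmltopdf --enable-internal-links -L 10mm -R 9.5mm -T 10mm -B 9.5mm \
            "${nb%.ipynb}.html" "$pdf"
        [ -s "$pdf" ] || { echo "failed to create $pdf" >&2; return 1; }
        set -- "$@" "$pdf"
    done
    shift "$count"
    "$CPDF" "$@" -o Ebook.pdf
}
```

To use it, source the file and run compile_ebook Chapter1.ipynb Chapter2.ipynb. Passing the notebooks explicitly keeps the reading order under your control rather than relying on how the shell sorts file names.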

Before you go

Follow me on Twitter, where I regularly tweet about Data Science and Machine Learning.

Photo by Courtney Hedger on Unsplash