Ahmed Hashesh

Summary

The website content provides a comprehensive guide to generating PDFs in Python using various libraries such as FPDF, Reportlab, Pyppeteer, Python-Wkhtmltopdf, and Pdfkit, detailing their features, installation, and use cases.

Abstract

The article serves as a resource for Python developers looking to generate PDFs, emphasizing the versatility of Python in handling data and the importance of PDF as a cross-platform document format. It introduces several Python libraries, including FPDF for basic HTML to PDF conversion with HTMLMixin, Reportlab for complex PDF creation with support for dynamic web PDF generation, Pyppeteer as a Python port of the browser automation tool Puppeteer, Python-Wkhtmltopdf as a wrapper for the command-line tool wkhtmltopdf, and Pdfkit as a user-friendly wrapper for wkhtmltopdf. The article compares these libraries based on their capabilities, ease of use, and specific features such as graph rendering and browser compatibility, providing developers with insights to choose the most suitable tool for their PDF generation needs.

Opinions

  • The author suggests that FPDF is suitable for developers who need to convert basic HTML to PDF, especially when combined with HTMLMixin for better HTML feature support.
  • Reportlab is recommended for advanced PDF creation, offering a wide range of features including vector graphics and table support, though it is noted for its complexity and steep learning curve for beginners.
  • Pyppeteer is presented as a good choice for developers familiar with JavaScript's Puppeteer, offering similar functionality and better rendering capabilities, but with the limitation of requiring specific browsers like Chrome or Chromium.
  • Python-Wkhtmltopdf and Pdfkit are both considered effective for converting HTML URLs to PDF, with Pdfkit being highlighted for its ease of use and support for various features like vector graphics and PDF security options.
  • The article concludes with a recommendation for APITemplate.io as a comprehensive cloud-based solution for PDF generation, which provides predefined templates and compatibility with CSS, JavaScript, and Python.
  • The author's opinion on the best tool for converting HTML to PDF leans towards PDFKit due to its popularity, while Reportlab is favored for rendering PDFs with complex graphical elements like charts and images.

Python/TIPS

A guide to generating PDFs in Python

Get your job done easily with these tools.

Python has become essential for everyday developer tasks; whether or not you work with Python day to day, you will likely need to know how to code in it.

Python is used for automation, testing, web development, and data analysis. HTML, meanwhile, is the primary markup language of web development and web-based applications.

One of Python's superpowers is handling data in almost any format and converting it to another. PDF is a portable format for viewing data across devices and platforms, independent of the device and operating system.

In this article, we will talk about how to generate PDFs using Python. We will introduce multiple libraries, namely FPDF, Reportlab, Pyppeteer, Python-Wkhtmltopdf, and Pdfkit, and the differences between them.

Libraries

There are a lot of Python libraries for dealing with PDFs. We will introduce some of the popular ones that can easily be used to convert HTML files to PDF format.

1. FPDF

FPDF (Free-PDF) is a Python library, ported from PHP, for generating PDFs. It provides various functionalities, like generating PDFs from text files and writing out your own data in a layout you define.

While FPDF supports HTML, it only understands the basic functionalities and doesn’t understand CSS. To convert HTML, you use HTMLMixin, which adds the write_html method to FPDF. You can install FPDF with pip using the following command.

pip install fpdf==1.7

FPDF supports:

  • Page formatting
  • Images, links, colours
  • Automatic line and page breaks

A code example:

from fpdf import FPDF, HTMLMixin
# creating a class inherited from both FPDF and HTMLMixin
class MyFPDF(FPDF, HTMLMixin):
	pass
# instantiating the class
pdf = MyFPDF()
# adding a page
pdf.add_page()
# reading the html file as a string
with open("file.html", "r") as file:
    data = file.read()
# HTMLMixin write_html method
pdf.write_html(data)
# saving the file as a pdf
pdf.output('Python_fpdf.pdf', 'F')

The previous example takes a file named file.html and converts it into a PDF file named Python_fpdf.pdf with the help of HTMLMixin.
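
FPDF can also build a PDF directly from plain text, without going through HTML. The sketch below assumes fpdf 1.7 with its built-in core fonts, which only handle Latin-1 characters, so anything outside that range is replaced before writing. The helper names to_latin1 and text_to_pdf, the font, and the line height are illustrative choices, not part of the library.

```python
def to_latin1(text):
    """Replace characters the core fonts cannot encode with '?'."""
    return text.encode("latin-1", "replace").decode("latin-1")


def text_to_pdf(lines, out_path):
    """Write each line of text to a PDF, wrapping and paginating automatically."""
    # Imported here so to_latin1 stays usable without fpdf installed.
    from fpdf import FPDF

    pdf = FPDF()
    # Start a new page whenever text gets within 15 mm of the bottom edge.
    pdf.set_auto_page_break(auto=True, margin=15)
    pdf.add_page()
    pdf.set_font("Arial", size=12)
    for line in lines:
        # Width 0 means "extend to the right margin"; 8 is the line height in mm.
        pdf.multi_cell(0, 8, to_latin1(line))
    pdf.output(out_path, "F")
```

Calling text_to_pdf(["First line", "Second line"], "text_fpdf.pdf") would write both lines to a single page, relying on FPDF's automatic line and page breaks for longer input.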

You can find more about FPDF here

2. Reportlab

Reportlab is a Python library that helps you create PDFs. It has an open-source version and a commercial version; the difference is that the commercial version supports Report Markup Language (RML). Both provide you with the following features:

  • Supports dynamic web PDF generation
  • Supports converting XML into PDF
  • Supports vector graphics and inclusion of other PDF files
  • Supports the creation of time charts and tables

You can install it using the following command:

pip install reportlab

Reportlab is a very complex tool that gives you a lot of control over the format and style of your PDF. The simplest example looks like the following:

from reportlab.pdfgen import canvas
c = canvas.Canvas("reportlab_pdf.pdf")
c.drawString(100,100,"Hello World")
c.showPage()
c.save()
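
Beyond drawing strings on a canvas, Reportlab can lay out tables through its platypus module. The following is a minimal sketch of that approach; the helper names build_table_data and write_table_pdf, the column headings, and the styling are assumptions made for illustration.

```python
def build_table_data(records):
    """Turn (name, quantity, price) tuples into table rows of strings."""
    rows = [["Item", "Qty", "Price"]]
    for name, qty, price in records:
        rows.append([str(name), str(qty), f"{price:.2f}"])
    return rows


def write_table_pdf(records, out_path):
    """Lay out the records as a bordered table and save it as a PDF."""
    # Imported here so build_table_data works without reportlab installed.
    from reportlab.lib import colors
    from reportlab.lib.pagesizes import A4
    from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

    table = Table(build_table_data(records))
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),  # shade the header row
        ("GRID", (0, 0), (-1, -1), 0.5, colors.black),      # thin border on every cell
    ]))
    # SimpleDocTemplate handles page layout; build() takes a list of flowables.
    SimpleDocTemplate(out_path, pagesize=A4).build([table])
```

For instance, write_table_pdf([("Pen", 2, 1.5)], "reportlab_table.pdf") would produce a one-page PDF with a header row and a single data row.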

You can find more info about reportlab here

3. Pyppeteer

We talked before about Puppeteer in the article Generate a PDF with JavaScript and how it is a tool to automate the browser. Pyppeteer is an unofficial Python port of that browser automation library, which comes from the Chrome team.

Main differences between Puppeteer and Pyppeteer

  • Pyppeteer accepts options both as a dictionary and as keyword arguments
  • Python does not allow $ in identifiers, so Pyppeteer does not use $ in its method names
  • Page.evaluate() and Page.querySelectorEval() may fail and require you to add the force_expr=True option to force input strings to be treated as expressions

Install it using the following command:

pip install pyppeteer

A code example:

import asyncio
from pyppeteer import launch
#defining an async method
async def main():
    # launching browser session
    browser = await launch( )
    # opening a new page
    page = await browser.newPage()
    # go to a specific address or file (placeholder path; use an absolute file:// URL)
    await page.goto('file:///path_to_html_file.html')
    #create a screen shot from the page
    await page.screenshot({'path': 'sample.png'})
    # render the page as a pdf
    await page.pdf({'path': 'pyppeteer_pdf.pdf'})
    #close the browser
    await browser.close()
# invocation of the async main function
asyncio.run(main())
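
Since Pyppeteer accepts keyword arguments as well as a dictionary, the PDF settings can be kept in one place and unpacked into the call. The sketch below renders an HTML string instead of a file; pdf_options and html_to_pdf are hypothetical helper names, and the page format and margins are illustrative values.

```python
def pdf_options(out_path):
    """Collect the PDF settings in one place (values here are illustrative)."""
    return {
        "path": out_path,
        "format": "A4",
        "printBackground": True,  # keep CSS background colours and images
        "margin": {"top": "20mm", "bottom": "20mm"},
    }


async def html_to_pdf(html, out_path):
    """Render an HTML string in headless Chromium and save it as a PDF."""
    # Imported here so pdf_options can be used without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        await page.setContent(html)
        # Options are passed as keyword arguments instead of one dictionary.
        await page.pdf(**pdf_options(out_path))
    finally:
        # Close the browser even if rendering fails.
        await browser.close()
```

It can be run with asyncio.run(html_to_pdf("<h1>Hello</h1>", "hello.pdf")) after importing asyncio. The try/finally block is a design choice so that a failed render does not leave a browser process behind.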

You can read more about Pyppeteer here

4. Python-Wkhtmltopdf

wkhtmltopdf is a widely used command-line tool for generating PDFs from HTML URLs. Python-Wkhtmltopdf is a wrapper that lets you use this command-line tool from Python. You can install it using the following command:

pip install py3-wkhtmltopdf==0.4.1

The usage is simple; you import the library and pass the URL and the path of the output file to the wkhtmltopdf function.

from wkhtmltopdf import wkhtmltopdf
wkhtmltopdf(url='apitemplate.io', output_file='wkhtmltopdf.pdf')
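
Because a wrapper like this ultimately runs the wkhtmltopdf binary, that binary generally needs to be installed separately from the pip package. As a rough sketch of what happens underneath, the same conversion can be done by calling the command-line tool directly with the standard library; build_command and convert are hypothetical helper names, and the page size is an illustrative default.

```python
import shutil
import subprocess


def build_command(source, output_file, page_size="A4"):
    """Return the argument list for one wkhtmltopdf run."""
    return ["wkhtmltopdf", "--page-size", page_size, source, output_file]


def convert(source, output_file):
    """Run wkhtmltopdf, failing early if the binary is not on PATH."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf is not installed or not on PATH")
    # check=True raises CalledProcessError when the tool exits with an error.
    subprocess.run(build_command(source, output_file), check=True)
```

Calling convert("https://apitemplate.io/", "wkhtmltopdf.pdf") would then mirror the wrapper call above.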

You can find more information here

5. Pdfkit

Pdfkit is a wrapper for wkhtmltopdf that makes it very easy to generate PDFs from various sources like files, strings, and URLs. You can install it using the following command:

pip install pdfkit

Pdfkit supports features like:

  • Vector graphics
  • Text features like wrapping, aligning and bullet lists
  • PNG and JPEG Image embedding
  • Annotation features like Highlights and underlines
  • PDF security like encryption

An example of the generation of a PDF from a file is:

#importing pdfkit
import pdfkit
# calling the from file method to convert file to pdf
pdfkit.from_file('file.html', 'file.pdf')

It also supports generating pdfs from links by calling the from_url method.

pdfkit.from_url('https://apitemplate.io/', 'python.pdf')

You can also specify page settings such as size, margins, and encoding, like the following:

options = {
    'page-size': 'A4',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'custom-header': [
        ('Accept-Encoding', 'gzip')
    ],
    'cookie': [
        ('cookie-empty-value', '""'),
        ('cookie-name1', 'cookie-value1'),
        ('cookie-name2', 'cookie-value2'),
    ],
    'no-outline': None
}
pdfkit.from_file('file.html', 'file.pdf', options=options)
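
Files and URLs are covered above; the third input type, strings, is handy when the HTML is generated in memory. The following sketch fills a small template and passes the result to the from_string method. The template, the helper names render_html and html_string_to_pdf, and the sample content are assumptions made for illustration.

```python
from html import escape
from string import Template

PAGE = Template("<html><body><h1>$title</h1><ul>$items</ul></body></html>")


def render_html(title, items):
    """Fill the page template with an escaped title and bullet list."""
    bullets = "".join(f"<li>{escape(item)}</li>" for item in items)
    return PAGE.substitute(title=escape(title), items=bullets)


def html_string_to_pdf(html, out_path):
    """Convert an in-memory HTML string to a PDF file."""
    # Imported here so render_html works without pdfkit installed.
    import pdfkit

    pdfkit.from_string(html, out_path, options={"encoding": "UTF-8"})
```

For example, html_string_to_pdf(render_html("Report", ["first", "second"]), "report.pdf") would write a short bulleted page. Escaping the values before substitution keeps user-supplied text from being interpreted as markup.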

You can learn more about pdfkit from here

Comparison

So we have a lot of options to choose from. The only remaining question is which one suits you best. I would say it depends on your application and what you actually need to do: for example, whether you want to build a PDF from scratch, simply convert HTML into a PDF, or fill a particular template and convert it into a specific format.

If you want to convert HTML into PDF, I believe PDFKit, FPDF, and Wkhtmltopdf are the best options you have, with PDFKit being the most popular of them. On the other hand, if you want to render PDFs, your options are Pyppeteer and Reportlab.

Reportlab's advantage is that it supports a wide variety of graphs, like line plots and bar charts, and can embed images. On the other hand, it doesn’t provide a method for creating footers and footnotes, and it can embed only JPEG images out of the box, although with the right Python extension you can extend this to 30 more formats. Reportlab is also more comprehensive, which makes it more difficult for beginners.

Pyppeteer, by contrast, provides better rendering and is easier to pick up if you are familiar with its JavaScript version, but it only supports specific browsers like Chrome and Chromium, which must be available on your machine for the tool to work.

Conclusion

This article talked about five of the most popular Python libraries for generating PDFs.

We briefly introduced FPDF, Wkhtmltopdf, Pyppeteer, Reportlab, and PDFKit, and compared them on properties like complexity, rendering quality, and features.

Finally, if you want a tool with all the features of these libraries and more, I recommend that you check out APITemplate.io. It helps you generate PDFs quickly through a cloud-based PDF generation API and is compatible with CSS, JavaScript, and Python. It also comes with predefined templates that you can reuse and edit.

This article is part of a series of articles on how to generate PDFs using different programming languages:

You can also find out more about Best PDF Generation Solutions.
