Document Scanner from Scratch with Python

Create your own document scanner using OpenCV. Kick-start your computer vision journey with this mini-project.

OpenCV logo on top of a real scanner. Made by Author. Original photo by Mahrous Houses on Unsplash.

Intro

My motivation for this project is pretty simple: many of us have moved to online work.

With the increased online workload, one often has to share a digitized version of a document via email or other means; in other words, turn any document into a presentable, scan-like image.

In this article, I will describe how anybody can create a document scanner from scratch using Python, and more precisely OpenCV, a library for image and video processing.

In a previous article, I built a foundation for this and future real-life OpenCV applications. If you are interested in learning more about the possibilities OpenCV provides, here is the link:

9 OpenCV Essentials During COVID-19 (medium.datadriveninvestor.com)

Without further ado, let us begin.

Document Scanner

Before jumping into the coding part, we need to understand what we are going to do.

Here is a sequence of questions I asked myself before starting this project.

What are we trying to build here?

— A document scanner.

Good. But what does or should it do?

— To scan documents, obviously.

Right. Then, what should the scanned documents look like?

— Good question, right?

In my view, scanned documents should have two features:

To look like scanned documents, i.e., black and white (B&W);
To be properly rotated (no random angles).

Let us keep things simple and build up the complexity as needed.

Coding Document Scanner

Let us first import all the libraries needed for this project (we might add more later as needed).

I. First Property: Scanned (B&W) View

Let us start with the first property of our scanner — producing scanned images!

In this example, I am using a photo from the book "21 Lessons for the 21st Century" by Yuval Noah Harari.

A photo from the book *“21 Lessons for the 21st Century”* by Yuval Noah Harari. Made by Author.

A side note: it is a great book, as are the other two books in this series ("Sapiens: A Brief History of Humankind" and "Homo Deus: A Brief History of Tomorrow"). Highly recommended!

Coming back to our document scanner, we want to make this image look sharp and crisp by changing its color scheme.

Let us call this operation Scan_view(). To make this a full application project, let us create a class called Scanner with Scan_view() as one of its methods.

A quick explanation of the code:

I want to create a scan object that has an image as its property. Hence, I use self.img = img in the __init__() method;

Also, I would like to have methods responsible for changing this property (i.e., the image/document): changing the color scheme, rotating, cropping, resizing, etc.

So Scan_view() performs an operation on the object's own property (i.e., on self.img).

The core of this method is the threshold_local function from scikit-image (skimage.filters). It computes a threshold image, i.e., an individual threshold value for every pixel based on that pixel's local neighborhood.

This is also known as adaptive thresholding. Each pixel's threshold value is the weighted mean of its local neighborhood minus a constant (the offset).
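To make the formula concrete: with method="mean", threshold_local weights every neighbor equally, so its output should match a plain box filter (SciPy's ndimage.uniform_filter) minus the offset. The default method, "gaussian", uses Gaussian weights instead. The random test image and the parameter values are arbitrary assumptions for illustration.

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_local

# Arbitrary 8-bit grayscale test image.
rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)

block_size, offset = 11, 10

# Library call: unweighted mean over each 11 x 11 neighbourhood, minus offset.
t_lib = threshold_local(gray, block_size, method="mean", offset=offset)

# The same formula spelled out: local (box) mean minus a constant.
t_manual = ndimage.uniform_filter(gray.astype(np.float64), size=block_size, mode="reflect") - offset

# Binary mask: True where a pixel is brighter than its local threshold.
mask = gray > t_lib

print(np.allclose(t_lib, t_manual))
```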

After computing the thresholds, we simply select the foreground pixels as image > threshold: pixels brighter than their local threshold become white, the rest black.

Finally, we save the new crisp image and return it, in case we want to process it further.

To run the code, we can simply create a scan object and give it a document/photo as its property, as follows:

As a result, we get this image:

A scanned version of the above-mentioned document. A photo from the book *“21 Lessons for the 21st Century”* by Yuval Noah Harari. Made by Author.

Now, that looks crisp!

Let us move on to the next part of the project.

II. Second Property: Document Rotation

Let us continue with the second property of our scanner — document rotation!

Suppose we took a photo of a book at a random angle. Would it not be nice to rotate it automatically into a straight, face-on view?

Of course, it would!

The question is: how do we do it?

Initially, I was thinking about using Principal Component Analysis (PCA) to determine the document orientation. However, it seemed a bit too much for this project.

We want something simple but effective.

Something that will automatically determine the rotation angle between the lines of text (or borders) and the horizontal.

Hence, I came up with a simpler method based on the Hough Transform.

In short, the Hough Transform is a technique for detecting simple shapes, such as lines or circles, in an image. In our case, these will be line segments running along the lines of text!

Great idea, right?

But to make this method robust, we need to make sure we detect the correct orientation. Some of the lines could run along the text, but others along the book/document edges, and we do not want those.

So, we want to smooth out those variations. In other words, we take the median of all the line angles, which, unlike the mean, is barely affected by a few outlying lines.
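A quick numeric illustration of why the median is the better choice here. The angle values are made up: five segments follow text lines at about 5 degrees, while two come from page edges.

```python
import numpy as np

# Hypothetical angles (degrees) of detected segments: most follow the
# text lines at about 5 degrees; 90.0 and -1.0 come from page edges.
angles = np.array([4.0, 4.5, 5.0, 5.5, 6.0, 90.0, -1.0])

print(np.mean(angles))    # about 16.3: dragged away by the 90-degree edge line
print(np.median(angles))  # 5.0: stays at the text angle
```

A single vertical edge shifts the mean by more than 11 degrees, which would visibly tilt the "corrected" page, while the median does not move at all.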

So, we define a new method of the scan object, Rotation(), as follows:

As mentioned above, the core of this part of the application is finding the correct Hough lines with OpenCV's HoughLinesP() function, where P stands for Probabilistic (see the references to learn more about this method).

Then, we take the median of the angles of all detected lines and use it to rotate the document with SciPy's ndimage.rotate() function.

We can perform this operation on our crisp B&W image as follows:

As a result, we get a rotated image:

A scanned and rotated version of the above-mentioned document. A photo from the book *“21 Lessons for the 21st Century”* by Yuval Noah Harari. Made by Author.

A wonderfully crisp, scanned, and rotated image!

Full Code

Finally, let us combine everything in one class!

Due to space considerations, I will skip this step.

For interested readers, here is a GitHub repository with more details and documentation for each method.

Summary

In this article, we have learned how to build a working prototype of a document scanner from scratch using OpenCV, a popular Python library for image and video processing (with help from scikit-image and SciPy).

This project is based on my earlier article: 9 OpenCV Essentials During COVID-19.

Future Developments

I would call this application a document scanner (v.1) because there are a few things that could further improve it.

For example, using another (or an improved) algorithm for document rotation. By its nature, the current algorithm may not produce a perfectly straight view in every case. One alternative might be to use Principal Component Analysis (PCA) to determine the orientation of the document more precisely, but this is outside the scope of this article.
Regarding the rotation itself, I do not like those empty black patches around the rotated image. This could also be addressed to improve the app, perhaps by turning them into empty (NaN) values.
Another operation that may need closer inspection is the B&W color transformation. I think that adaptive thresholding is a good option; however, there might be a better way of doing it.
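Regarding the black patches: they come from ndimage.rotate() itself, which fills everything outside the original image with its cval argument (default 0, i.e., black). Below is a sketch of two possible fixes, assuming an 8-bit B&W page: pass cval=255 to pad with white, or convert to floating point first so that the padding can be NaN, as suggested above. The test page, the 10-degree angle, and order=1 (bilinear interpolation instead of the default cubic spline) are assumptions made to keep the example small.

```python
import numpy as np
from scipy import ndimage

# Hypothetical white page with one black "text line".
page = np.full((100, 100), 255, dtype=np.uint8)
page[45:55, 10:90] = 0
angle = 10.0  # degrees; in practice, the median Hough angle

# Default behaviour: the uncovered corners are filled with cval=0 (black).
black_pad = ndimage.rotate(page, angle, order=1)

# Option 1: pad with white, matching the page background.
white_pad = ndimage.rotate(page, angle, order=1, cval=255)

# Option 2: mark the padding as "no data" with NaN (needs a float image).
nan_pad = ndimage.rotate(page.astype(np.float64), angle, order=1, cval=np.nan)
valid = ~np.isnan(nan_pad)  # True only where real page pixels were mapped
```

White padding is the simpler choice for a document meant to be saved or printed, while the NaN variant keeps track of which pixels are real, which is useful if the page is cropped or analyzed further.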

Last but not least, if you have any comments or suggestions, or have found any mistakes, please either leave a comment or contact me via LinkedIn. I will be happy to hear from you.

I hope you have enjoyed my article and learned something new. Thanks for reading until the end.

Want to learn more?

If you are interested in brushing up on your Python knowledge, here is a brief tutorial: from Hello World to Functions.

Additionally, I would like to share some of my earlier projects. Each of them is dedicated to solving a particular problem. It is a good opportunity to challenge oneself and polish one's Python skills:

tracking your weight from scratch in Python;
analyzing COVID-19 scientific papers;
creating a productivity app (Pomodoro) from scratch.

Are you curious about the emerging field of Prompt Engineering? Grab my new e-book! You will learn and master everything from fundamental concepts to practical tips and real-world applications. Additionally, you will receive a bonus of 300 prompts and some free resources to kick-start your AI-driven journey. With all this value packed into one e-book, what is the price? The cost of a cup of coffee! Do not miss out on this opportunity to take your skills to the next level!

Prompt Engineering, 300 Prompts, & Free AI Resources (ruslanbrilenkov.gumroad.com)

Contact

Never miss a story, join my mailing list!

If you enjoyed this article, consider subscribing. You will get access to unlimited content that will teach you how to code, automate, and build wealth through investing and entrepreneurship — for less than the cost of a cup of coffee!

Join Medium with my referral link — Ruslan Brilenkov, PhD(c)

ruslan-brilenkov.medium.com

Lastly, let us connect: here are my LinkedIn and GitHub profiles.

References

Hough Line Transform tutorial, OpenCV documentation

threshold_local, scikit-image documentation (skimage.filters)

P.S.: If you like this uninterrupted reading experience on this beautiful platform, Medium.com, consider supporting the writers of this community by signing up for a membership, HERE. It only costs $5 per month and supports all the writers.