Document Scanner from Scratch with Python.
Create your own document scanner using OpenCV. Kick start your computer vision journey with this mini-project.
OpenCV logo on top of a real scanner. Made by Author. Original photo by Mahrous Houses on Unsplash.
Intro
My motivation for this project is pretty simple. Many of us moved to online work.
With the increased online workload, one often has to present a digitized version of a document via email or other means. In other words, one has to transform a document into a presentable, scan-like image.
In this article, I will describe how anybody can create a document scanner from scratch using Python. More precisely, we will use OpenCV, a library for image and video processing.
In the previous article, I built a foundation for this and future real-life OpenCV applications. In case you are interested in learning more about the possibilities provided by OpenCV, here is a link
Before jumping into the coding part, we need to understand what we are going to do.
Here is a sequence of questions I asked myself before starting this project.
What are we trying to build here?
— A document scanner.
Good. But what does or should it do?
— To scan documents, obviously.
Right. Then, what should the scanned documents look like?
— Good question, right?
In my view, scanned documents should have two features:
They should look scanned, that is, be in black and white (B&W);
They should be properly rotated (no random angles).
Let us keep things simple and build up the complexity as needed.
Coding Document Scanner
Let us first import all the libraries needed for this project (we might add more later).
I. First Property: Scanned (B&W) View
Let us start with the first property of our scanner — producing scanned images!
In this example, I am using a photo from the book "21 Lessons for the 21st Century" by Yuval Noah Harari.
A photo from the book “21 Lessons for the 21st Century” by Yuval Noah Harari. Made by Author.
A side note: it is a great book, along with the other two books from this series ("Sapiens: A Brief History of Humankind" and "Homo Deus: A Brief History of Tomorrow"). Recommended reading!
Coming back to our document scanner, we want to make this image sharp and crisp looking by changing the color scheme.
Let us call this operation a Scan_view(). To make this a full application project, let us create a class called Scanner where Scan_view() will be its method.
A quick explanation of the code:
I want to create a scan object that has an image as its property. Hence, I use self.img = img in the __init__() method;
Also, I would like to have a method responsible for changing this property (i.e., an image/document): changing the color scheme, rotating, cropping, resizing, etc.
So Scan_view() performs an operation on its class property (i.e., on itself or self).
The meat of this method is the threshold_local operation,
which computes a threshold mask based on the local neighborhood of each pixel.
This is also known as adaptive thresholding. The threshold value is the weighted mean of a pixel's local neighborhood minus a constant.
After finding the threshold mask, we simply select the foreground pixels as image > threshold.
Finally, we save the new crisp image and return it in case it is needed for further processing.
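To make the idea of a local threshold concrete, here is a small hand-rolled sketch. It is not the scikit-image implementation: scipy.ndimage.gaussian_filter stands in for the weighted local mean, and the function name, sigma, and offset are assumptions chosen for illustration. The synthetic page gets darker from left to right, which is where a single global threshold breaks down.

```python
import numpy as np
from scipy import ndimage


def adaptive_threshold(gray, sigma=5.0, offset=10.0):
    """Per-pixel threshold: Gaussian-weighted local mean minus a constant."""
    local_mean = ndimage.gaussian_filter(gray.astype(float), sigma=sigma)
    return local_mean - offset


# Synthetic page: brightness falls from 220 (left) to 80 (right),
# with one darker horizontal "text" stroke across it.
h, w = 60, 200
page = np.tile(np.linspace(220, 80, w), (h, 1))
page[28:32, 10:190] -= 60

mask_adaptive = page > adaptive_threshold(page)  # True = white, False = black
mask_global = page > page.mean()                 # one threshold for the whole page

# The adaptive mask keeps the background white and the stroke black everywhere.
# The global mask turns the dark right side black and loses the stroke on the left.
```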
To run the code, we simply create a scan object, give it a document/photo as its property, and call Scan_view() on it.
As a result, we get this image:
A scanned version of the above-mentioned document. A photo from the book “21 Lessons for the 21st Century” by Yuval Noah Harari. Made by Author.
Now, that looks crisp!
Let us move on to the next part of the project.
II. Second Property: Document Rotation
Let us continue with the second property of our scanner — document rotation!
Suppose we took a photo of a book at a random angle. Would it not be nice to automatically rotate it to a top-down, face-on view?
Of course, it would!
The question is how to do it.
Initially, I was thinking about using a Principal Component Analysis (PCA) to determine the document orientation. However, it seemed a bit too much for this project.
We want something simple but effective.
Something that will automatically determine a rotation angle between the lines of text/borders and the horizontal line.
Hence, I came up with a simpler method that basically utilizes the Hough Transform.
In short, the Hough Transform is a technique for detecting shapes that can be described by a few parameters, such as lines and circles. In our case, this will be a set of lines along the lines of text!
Great idea, right?
But in order to make this method robust, we need to make sure to detect the correct orientation. Some of the lines could come along the text but others along the book/document edges — we do not want that.
So, we want to average out those variations. In other words, we take the median of all the line angles, which is less sensitive to a few stray lines than the mean.
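A tiny numeric illustration of why the median is the safer choice here. The angle values are made up for the example: five lines follow the text at roughly 10 degrees, and two follow a page edge.

```python
import numpy as np

# Hypothetical line angles in degrees: text lines near 10, page edges near 90.
angles = np.array([9.5, 10.2, 10.0, 9.8, 10.4, 85.0, 88.0])

mean_angle = angles.mean()        # pulled far away from the text direction
median_angle = np.median(angles)  # stays with the majority of the lines

print(f"mean = {mean_angle:.2f}, median = {median_angle:.2f}")
# prints: mean = 31.84, median = 10.20
```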
So, we define our new method of the scan object (Rotation()) as follows:
As mentioned above, the meat of this part of the application is finding the correct Hough lines (HoughLinesP() method, where P stands for Probabilistic; see the references to learn more about this method).
Then, we take the median of the angles of all lines and use it to rotate the document (ndimage.rotate() method).
We can perform this operation on our crisp B&W image by calling Rotation() on the scan object.
As a result, we get a rotated image:
A scanned and rotated version of the above-mentioned document. A photo from the book “21 Lessons for the 21st Century” by Yuval Noah Harari. Made by Author.
A wonderful scanned, crisp, and rotated image!
Full Code
Finally, let us combine everything in one class!
Due to space considerations, I will skip this step.
For the interested readers, here is a GitHub repository with more details and documentation of each method.
Summary
In this article, we have learned how to build a working prototype of the Document Scanner from scratch using a famous Python library for image/video processing, OpenCV.
I would call this application a document scanner (v.1) because there are a few things that could further improve it.
For example, using another (or improved) algorithm for document rotation. Due to its nature, this algorithm may not provide a perfect top-down view in every case.
One alternative might be to use Principal Component Analysis (PCA) to determine the orientation of the document more precisely. But this is outside the scope of this article.
Regarding the rotation itself, I do not like those empty black patches on the rotated image. This could also be addressed to improve the app, perhaps by filling them with empty (NaN) values.
Another operation that may need a closer inspection is the B&W color transformation. I think that an adaptive threshold is a good option; however, there might be a better way of doing it.
Last but not least, if you have any comments or suggestions, or have found any mistakes, please either leave a comment or contact me via LinkedIn. I will be happy to hear from you.
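One simple way to soften the black patches, offered as a sketch rather than the fix the author has in mind, is to fill the corners with white. ndimage.rotate() accepts a cval argument for the fill value, and order=0 (nearest neighbor) keeps a black and white image strictly two-valued. The uniform white page here is a stand-in for a real scan.

```python
import numpy as np
from scipy import ndimage

page = np.full((50, 50), 255, dtype=np.uint8)  # stand-in for a white B&W scan

# Default behaviour: corners uncovered by the rotation are filled with 0 (black).
black_corners = ndimage.rotate(page, 30)

# Fill the uncovered corners with white instead, without interpolating values.
white_corners = ndimage.rotate(page, 30, cval=255, order=0)
```

Filling with NaN instead would require converting the image to a floating-point array first, since integer arrays cannot hold NaN.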
I hope you have enjoyed my article and learned something new. Thanks for reading until the end.
Want to learn more?
If you are interested in brushing up on your Python knowledge, here is a brief tutorial: from Hello World to Functions.
Additionally, I would like to share some of my earlier projects. Each of them is dedicated to solving a particular problem. It is a good opportunity to challenge oneself and polish some of the Python skills:
Documentation of the threshold_local method (link)