Walid Amamou

Summary

The article provides a guide to automating data extraction from bank statements using custom AI models and pre-trained table extraction APIs, integrated into a workflow on the Kudra platform for efficient financial data processing.

Abstract

The article outlines a method for streamlining accounting processes by automating the extraction of data from bank statements. It emphasizes why this automation matters given growing data volumes and the inefficiency of manual data entry. The process uses custom-trained AI models to extract unstructured data and pre-trained APIs, such as those from Microsoft Azure or AWS, to extract tabular data. The guide highlights UBIAI's tools for model training and Kudra for building custom workflows, which can be adapted to various financial documents. The tutorial concludes with the benefits of such automation for financial institutions and invites readers to schedule a demo or try a recommended AI service.

Opinions

  • The article conveys that manual data entry is becoming increasingly impractical due to the growing volume of data.
  • It suggests that using pre-trained tabular extraction APIs is more efficient for extracting organized data from bank statements.
  • The author believes that UBIAI's Annotation Tool simplifies the AI model training process, requiring as few as five labeled documents.
  • The article posits that Kudra's modular custom workflow enables users to easily combine different data processing modules, including custom NLP models and table extraction, to create tailored solutions.
  • The author recommends trying out the AI service they suggest, which is presented as a cost-effective alternative to ChatGPT Plus (GPT-4).

How to Automate Data Extraction from Bank Statements

Using a custom-trained AI model

Image by Racool_studio on Freepik

In the world of accounting, extracting data from bank statements is an important task that keeps financial records efficient and accurate. This is especially true in an era when data is growing at an unprecedented rate and manual data entry is becoming increasingly inefficient.

In this tutorial, we will learn how to automate data extraction from bank statements using custom-trained AI models and automated table extraction.

Table Extraction

Bank statements are generally organized in a tabular format: the financial transactions are listed in a table, while unstructured text such as the address, bank name, and statement period is located at the beginning of the statement.

Bank statement example

An NLP model can be trained to automatically recognize and extract specific types of information from unstructured documents, such as amounts, dates, and the statement period. However, training it to extract organized tabular data is not an efficient use of time. For that purpose, it is more efficient to use pre-trained table extraction APIs, such as those offered by Microsoft Azure or AWS, since they have been trained on millions of examples.

Below is an example of automated table extraction in UBIAI, based on the Microsoft Azure API:
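To make this division of labour concrete, here is a minimal sketch that assumes the OCR step returns text lines with a vertical position and that the table extractor reports where the transaction table starts. Lines above that point are routed to the NLP model as header text; the rest are left to the table API. The `OcrLine` type and `split_header` function are hypothetical illustrations, not part of UBIAI or Kudra.

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    """One line of OCR output (hypothetical shape): its text and top y-coordinate."""
    text: str
    top: float  # distance from the top of the page; larger means lower on the page


def split_header(lines: list[OcrLine], table_top: float) -> tuple[list[str], list[str]]:
    """Split OCR lines into header text (for the NER model) and table-region text.

    Lines that start above `table_top` are treated as unstructured header text
    (bank name, address, statement period); everything else is left for the
    table extraction API.
    """
    header = [line.text for line in lines if line.top < table_top]
    table_region = [line.text for line in lines if line.top >= table_top]
    return header, table_region


if __name__ == "__main__":
    page = [
        OcrLine("Example Bank", 40.0),
        OcrLine("Statement period: 01/01 - 01/31", 80.0),
        OcrLine("Date Description Amount Balance", 200.0),
        OcrLine("01/05 Coffee -3.50 996.50", 220.0),
    ]
    header, table = split_header(page, table_top=190.0)
    print(header)  # ['Example Bank', 'Statement period: 01/01 - 01/31']
```

Real statements can repeat header text on later pages or put notes below the table, so a production split would likely need per-page table positions rather than a single threshold.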

UBIAI’s table extraction
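If you call a layout API directly instead of going through UBIAI, tables typically come back as a list of cells with row and column indices rather than as ready-made rows. The sketch below assumes a table object shaped like the `DocumentTable` returned by Azure's `azure-ai-formrecognizer` Python SDK (`row_count`, `column_count`, and `cells` whose items have `row_index`, `column_index`, and `content`) and rebuilds a row-major grid. It deliberately ignores merged cells (`row_span`/`column_span`), which real statements sometimes contain.

```python
from types import SimpleNamespace
from typing import Any


def table_to_rows(table: Any) -> list[list[str]]:
    """Rebuild a row-major grid of strings from a cell-based table.

    Assumes `table` exposes `row_count`, `column_count` and `cells`, where each
    cell has `row_index`, `column_index` and `content`. Merged cells are not
    expanded: only the top-left position of a spanning cell is filled, and
    positions with no cell stay as empty strings.
    """
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid


if __name__ == "__main__":
    # Stand-in for an SDK result so the example runs offline.
    demo = SimpleNamespace(
        row_count=2,
        column_count=2,
        cells=[
            SimpleNamespace(row_index=0, column_index=0, content="Date"),
            SimpleNamespace(row_index=0, column_index=1, content="Amount"),
            SimpleNamespace(row_index=1, column_index=0, content="01/05"),
            SimpleNamespace(row_index=1, column_index=1, content="-3.50"),
        ],
    )
    print(table_to_rows(demo))  # [['Date', 'Amount'], ['01/05', '-3.50']]
```

With that SDK, the tables would come from something like `client.begin_analyze_document("prebuilt-layout", document=f).result().tables`; the newer `azure-ai-documentintelligence` package uses a different call signature, so check the version you install.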

AI Model Training

Now that we can reliably extract the tables, we can train our AI model to extract the relevant information at the top of the statement. With the UBIAI Annotation Tool, this can be done easily by labeling just 5 documents to train the AI model.

UBIAI OCR Labeling Interface
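UBIAI converts these labels into training data internally, and the article does not say which format it uses. As a hypothetical illustration of what labeling a statement header amounts to, the sketch below turns character-offset entity spans (labels such as `BANK_NAME` and `ACCOUNT_NUMBER` are made up for the example) into token-level BIO tags, a common input format for named-entity recognition models. Tokenization here is plain whitespace splitting, which is a simplification.

```python
import re


def spans_to_bio(text: str, spans: list[tuple[int, int, str]]) -> list[tuple[str, str]]:
    """Convert (start, end, label) character spans into (token, BIO-tag) pairs.

    Tokens are maximal runs of non-whitespace characters. A token that lies
    entirely inside a span is tagged "B-<label>" if it is the first token of that
    span and "I-<label>" for the following tokens; all other tokens get "O".
    """
    tagged = []
    previous_span = None  # index of the span the previous token belonged to
    for match in re.finditer(r"\S+", text):
        tag, current_span = "O", None
        for index, (start, end, label) in enumerate(spans):
            if start <= match.start() and match.end() <= end:
                prefix = "I-" if previous_span == index else "B-"
                tag, current_span = prefix + label, index
                break
        tagged.append((match.group(), tag))
        previous_span = current_span
    return tagged


if __name__ == "__main__":
    header = "Example Bank Account 12345"
    labels = [(0, 12, "BANK_NAME"), (21, 26, "ACCOUNT_NUMBER")]
    print(spans_to_bio(header, labels))
    # [('Example', 'B-BANK_NAME'), ('Bank', 'I-BANK_NAME'), ('Account', 'O'), ('12345', 'B-ACCOUNT_NUMBER')]
```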

To train the model, click the Models menu in UBIAI, select the project you have labeled, and press “Train”. No coding is required!

Model training dashboard in UBIAI

Custom Workflow Creation

Once the model is trained, we can combine table extraction and our custom-trained model into one workflow that automatically extracts the relevant information from our bank statements.

To do so, we will use Kudra to deploy our model and create custom workflows in just a few clicks. Users can combine modules such as image processing, OCR, custom NLP models, table extraction, and LLMs to build a solution tailored to their specific use case. For more in-depth information, please read this introductory article.

For this tutorial we are going to use the following workflow to achieve our goal:

Workflow building interface
  1. The first part of the workflow is document import: we drag and drop the PDF and Photo modules onto the builder canvas.
  2. Once the data is imported, we add the OCR module and connect the outputs of the data importers to its input so that the text is parsed from the PDFs and images.
  3. Next, we add two modules: Form Recognizer, which loads our custom-trained AI model, and Extract Tables, which reads the tables.
  4. Finally, we send the data to the export module.
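Kudra builds this graph visually, and the article does not describe how it executes it internally. As a rough mental model only, the wiring in the steps above can be sketched as a small dependency graph run in topological order with the standard library's `graphlib`: the importer feeds OCR, OCR fans out to the entity and table extractors, and both results meet at export. The module names and stub lambdas are hypothetical stand-ins, not Kudra APIs, and a single importer stands in for the PDF and Photo modules.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_graph(
    modules: dict[str, Callable[..., Any]],
    edges: dict[str, list[str]],
    source: Any,
) -> dict[str, Any]:
    """Run every module once, in dependency order, and return all outputs by name.

    `edges` maps a module name to the upstream modules whose outputs it receives,
    in argument order. Modules with no upstream entry receive `source` instead.
    A cyclic graph raises graphlib.CycleError.
    """
    outputs: dict[str, Any] = {}
    for name in TopologicalSorter(edges).static_order():
        upstream = edges.get(name, [])
        args = [outputs[u] for u in upstream] if upstream else [source]
        outputs[name] = modules[name](*args)
    return outputs


if __name__ == "__main__":
    # Hypothetical stand-in modules mirroring the workflow steps.
    modules = {
        "import": lambda path: f"raw bytes of {path}",             # PDF/Photo importer
        "ocr": lambda raw: "Example Bank ... Date Amount",          # OCR module
        "entities": lambda text: {"Bank Name": "Example Bank"},     # custom-trained model
        "tables": lambda text: [[["Date", "Amount"]]],              # Extract Tables
        "export": lambda ents, tabs: {"entities": ents, "tables": tabs},
    }
    edges = {
        "ocr": ["import"],
        "entities": ["ocr"],
        "tables": ["ocr"],
        "export": ["entities", "tables"],
    }
    print(run_graph(modules, edges, "statement.pdf")["export"])
```

Keeping each module a plain callable is what makes swapping one step (for example, a different table extractor) a one-line change, which is the same property the drag-and-drop builder offers.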

Kudra's modular custom workflow makes it straightforward to combine our custom-trained AI model with other data processing modules.

Now that the workflow has been created, let’s run it on new bank statements.

Bank Statement Processing

After the documents have been processed, we can review and correct the output before exporting it. Below is Kudra’s review dashboard, where the output of each module can be visualized and reviewed.

Review dashboard

The right panel shows the AI extraction, containing the entities Bank Name, Account Number, Name, and Address, all of which were extracted correctly by our custom AI model.

We can also see the extracted tables:

Table extraction
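Part of this review can be automated. One common sanity check, not a feature the article attributes to Kudra, is to confirm that each transaction's printed balance equals the previous balance plus the transaction's signed amount; rows that fail often point to an OCR misread. The sketch assumes each extracted row has been mapped to a signed `amount` and a `balance` (hypothetical field names) and uses `Decimal` so currency values add up exactly.

```python
from decimal import Decimal, InvalidOperation


def find_balance_mismatches(opening_balance: str, rows: list[dict[str, str]]) -> list[int]:
    """Return the indices of rows whose printed balance does not reconcile.

    Each row should carry a signed "amount" (negative for debits) and the
    "balance" printed after that transaction. Thousands separators are stripped.
    Each row is checked against the previous row's printed balance, so a misread
    balance usually flags that row and the one after it.
    """
    mismatches = []
    previous = Decimal(opening_balance)
    for index, row in enumerate(rows):
        try:
            amount = Decimal(row["amount"].replace(",", ""))
            balance = Decimal(row["balance"].replace(",", ""))
        except (KeyError, InvalidOperation):
            mismatches.append(index)  # missing or unreadable cell: flag for manual review
            previous = None  # the next row cannot be checked against an unknown balance
            continue
        if previous is not None and previous + amount != balance:
            mismatches.append(index)
        previous = balance  # chain from the printed balance, not a running total
    return mismatches


if __name__ == "__main__":
    rows = [
        {"amount": "-3.50", "balance": "96.50"},
        {"amount": "20.00", "balance": "116.60"},  # balance misread by OCR
        {"amount": "-6.50", "balance": "110.00"},
    ]
    print(find_balance_mismatches("100.00", rows))  # [1, 2]
```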

Once the data has been reviewed and corrected, we can export it to a CSV file. Here is the output:

Extracted entities
Extracted tables
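Kudra produces this CSV export itself. If you instead post-process reviewed results in your own code, a minimal sketch with Python's standard `csv` module might write the entities as field/value pairs and a table as plain rows. The function name and the in-memory buffers are illustrative assumptions; the `csv` module quotes values that contain commas, which matters for fields like addresses.

```python
import csv
import io


def write_outputs(entities: dict[str, str], table: list[list[str]]) -> tuple[str, str]:
    """Serialize extracted entities and one extracted table as two CSV strings."""
    entities_buf = io.StringIO()
    writer = csv.writer(entities_buf)
    writer.writerow(["field", "value"])
    writer.writerows(entities.items())  # one (field, value) pair per row

    table_buf = io.StringIO()
    csv.writer(table_buf).writerows(table)  # first row is assumed to be the header
    return entities_buf.getvalue(), table_buf.getvalue()


if __name__ == "__main__":
    ents, tab = write_outputs(
        {"Bank Name": "Example Bank", "Address": "1 Main St, Springfield"},
        [["Date", "Amount"], ["01/05", "-3.50"]],
    )
    print(ents)
    print(tab)
```

To write straight to disk instead, open each file with `newline=""`, as the `csv` module documentation recommends, and pass the file object to `csv.writer`.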

Conclusion

The ability to create custom workflows in Kudra means that the solution can be easily adapted to different types of bank statements and other financial documents. This flexibility makes the solution particularly valuable for financial institutions that handle a variety of financial documents on a regular basis.

If you are looking to automate data extraction from bank statements, please schedule a demo today!
