How to Automate Data Extraction from Bank Statements
Using a Custom-Trained AI Model

In accounting, extracting data from bank statements is an important task, and doing it efficiently and accurately is essential for reliable financial records. This is especially true now that the volume of data keeps growing rapidly and manual data entry is becoming increasingly inefficient.
In this tutorial, we will automate data extraction from bank statements by combining a custom-trained AI model with automated table extraction.
Table Extraction
Bank statements typically list the financial transactions in a table, preceded by unstructured text at the top of the statement, such as the bank name, the address, and the statement period.

An NLP model can be trained to automatically recognize and extract specific types of information from unstructured documents, such as amounts, dates, and statement periods. However, training it to extract data that is already organized in tables is not an efficient use of time. For tables, it is more efficient to use pre-trained table extraction APIs, such as those offered by Microsoft Azure or AWS, since they have been trained on millions of examples.
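Pre-trained table extraction services generally return each detected table as a list of cells, each tagged with a row index, a column index, and its text (in Azure's Python SDK, for instance, each cell exposes `row_index`, `column_index`, and `content`). The minimal sketch below assumes those cells have already been fetched and simplified to `(row, column, text)` tuples; the helper `cells_to_rows` and the sample data are hypothetical. It shows how such cell-level output can be rebuilt into transaction rows.

```python
from typing import Iterable, List, Tuple

# Hypothetical simplified cell shape: (row_index, column_index, text).
Cell = Tuple[int, int, str]


def cells_to_rows(cells: Iterable[Cell]) -> List[List[str]]:
    """Rebuild a table grid from its cells; undetected cells become empty strings.

    Assumption: merged cells (row/column spans) are ignored in this sketch.
    """
    cells = list(cells)
    if not cells:
        return []
    n_rows = max(r for r, _, _ in cells) + 1
    n_cols = max(c for _, c, _ in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for r, c, text in cells:
        grid[r][c] = text.strip()
    return grid


if __name__ == "__main__":
    # Hypothetical cells for a small statement table (not real API output).
    sample = [
        (0, 0, "Date"), (0, 1, "Description"), (0, 2, "Amount"),
        (1, 0, "2023-01-03"), (1, 1, "Card payment"), (1, 2, "-42.50"),
        (2, 0, "2023-01-05"), (2, 2, "1,200.00"),  # description cell missing
    ]
    header, *transactions = cells_to_rows(sample)
    for row in transactions:
        print(dict(zip(header, row)))
```

Treating the first row as the header is itself an assumption; real statements may repeat headers across pages or span a table over several pages, which would need extra handling.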
Below is an example of automated table extraction using UBIAI, based on the Microsoft Azure API:

AI Model Training
Now that we can reliably extract the tables, we can train our AI model to extract the relevant information at the top of the statement. With the UBIAI Annotation Tool, this is quite easy: labeling just 5 documents is enough to train the AI model.

To train the model, simply open the Models menu in UBIAI, select the project you labeled, and click “Train”. No coding is required!

Custom Workflow Creation
Once the model is trained, we can combine table extraction and our custom-trained model into a single workflow that automatically extracts the relevant information from our bank statements.
To do so, we will use Kudra to deploy our model and create a custom workflow in just a few clicks. Users can combine different modules, such as image processing, OCR, custom NLP models, table extraction, and LLMs, to create a solution tailored to their specific use case. For more in-depth information, please read this introductory article.
For this tutorial, we will use the following workflow:

- The first part of the workflow is document import. To do this, we simply drag and drop the PDF and Photo modules onto the builder canvas.
- Once the data is imported, we add the OCR module and connect the outputs of the data importers to the input of the OCR module, so that the text is parsed from the PDFs and images.
- Next, we add two modules: Form Recognizer, to import our custom-trained AI model, and Extract Tables, to read the tables.
- Finally, we send the data to the Export module, and that’s it.
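Conceptually, the workflow above is a small pipeline: import, OCR, then two extraction steps whose results travel together to export. The sketch below models that structure in plain Python. It does not use Kudra or any Kudra API; every function here (`ocr`, `extract_entities`, `extract_tables`, `run_workflow`) is a hypothetical stand-in with placeholder logic, included only to show how each module's output feeds the next.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    name: str
    text: str = ""                                     # filled by the OCR step
    entities: Dict[str, str] = field(default_factory=dict)
    tables: List[List[List[str]]] = field(default_factory=list)


def ocr(doc: Document) -> Document:
    # Hypothetical stand-in: a real OCR module would read the PDF or image.
    doc.text = f"<text of {doc.name}>"
    return doc


def extract_entities(doc: Document) -> Document:
    # Hypothetical stand-in for the custom-trained model (header fields only).
    doc.entities = {"Bank Name": "", "Account Number": "", "Name": "", "Address": ""}
    return doc


def extract_tables(doc: Document) -> Document:
    # Hypothetical stand-in for the table extraction module.
    doc.tables = [[["Date", "Description", "Amount"]]]
    return doc


Step = Callable[[Document], Document]


def run_workflow(docs: List[Document], steps: List[Step]) -> List[Document]:
    """Apply each step, in order, to every imported document."""
    for step in steps:
        docs = [step(d) for d in docs]
    return docs


if __name__ == "__main__":
    imported = [Document("statement.pdf"), Document("statement_photo.jpg")]
    processed = run_workflow(imported, [ocr, extract_entities, extract_tables])
    for d in processed:
        print(d.name, sorted(d.entities), len(d.tables))
```

Here the two extraction steps run one after the other, but because each writes to its own field they do not depend on each other, mirroring the two separate branches that follow the OCR module in the builder.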
Kudra’s modular custom workflows make it easy to combine our custom-trained AI model with other data processing modules.
Now that the workflow has been created, let’s run it on new bank statements.
Bank Statement Processing
After the documents have been processed, we can review and correct the output before exporting the data. Below is Kudra’s review dashboard, where the output of each module can be visualized and reviewed.

The right panel shows the AI extraction: the entities Bank Name, Account Number, Name, and Address were all extracted correctly by our custom AI model.
We can also see the extracted tables:

Once the data has been reviewed and corrected, we can export it to a CSV file. Here is the output:


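Kudra produces the CSV export itself. For readers who post-process extracted results in their own code, the sketch below shows one way to flatten reviewed output into a CSV with Python's standard `csv` module, repeating the statement-level entities on every transaction row. The record layout, the column names, and the `to_csv` helper are assumptions for illustration, not Kudra's export format.

```python
import csv
import io
from typing import Dict, List


def to_csv(entities: Dict[str, str], transactions: List[Dict[str, str]]) -> str:
    """Write one CSV row per transaction, prefixed with the statement-level entities.

    Assumptions: all transactions share the same columns, and those column
    names do not overlap with the entity names.
    """
    entity_cols = list(entities)
    txn_cols = list(transactions[0]) if transactions else []
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=entity_cols + txn_cols, lineterminator="\n")
    writer.writeheader()
    for txn in transactions:
        writer.writerow({**entities, **txn})
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical reviewed values (not real output).
    entities = {"Bank Name": "Example Bank", "Account Number": "000123"}
    transactions = [
        {"Date": "2023-01-03", "Description": "Card payment", "Amount": "-42.50"},
        {"Date": "2023-01-05", "Description": "Salary, January", "Amount": "1200.00"},
    ]
    print(to_csv(entities, transactions), end="")
```

Repeating the header entities on each row keeps the file flat, so it loads directly into a spreadsheet; the `csv` module takes care of quoting values that contain commas.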
Conclusion
The ability to create custom workflows in Kudra means that the solution can be easily adapted to different types of bank statements and other financial documents. This flexibility makes the solution particularly valuable for financial institutions that handle a variety of financial documents on a regular basis.
If you are looking to automate data extraction from bank statements, please schedule a demo today!