Walid Amamou

Summary

The article discusses the use of AI, specifically Named Entity Recognition (NER) models and generative models like ChatGPT, to analyze complex financial documents such as 10-K reports efficiently and at scale.

Abstract

The article outlines a method for analyzing 10-K reports using advanced AI techniques. It emphasizes the inefficiency of manual analysis due to the volume and complexity of these reports. The proposed solution uses custom-trained NER models to identify key information, such as risk factors, and generative AI models to interpret financial tables and provide insights. The workflow for this AI-driven analysis is built on a no-code platform called Kudra.ai, which simplifies the process of building and deploying AI services. The platform allows for the extraction and analysis of critical data from 10-K reports, such as financial statements and risk factors, providing a scalable and cost-effective approach for financial analysis.

Opinions

  • The traditional method of manually reading and analyzing 10-K reports is deemed unscalable and inefficient.
  • AI models, particularly those capable of natural language processing, are considered essential tools for modern financial analysis.
  • Training specialized NER models for specific tasks, such as risk factor identification, is seen as more scalable and cost-effective than relying on generic models.
  • The use of a no-code platform like Kudra.ai is advocated for its ability to democratize access to AI capabilities for document analysis.
  • There is a recognition of the limitations of AI, with a reminder to verify AI-generated data against actual financial statements to avoid inaccuracies.
  • The article suggests that AI should complement human expertise rather than replace it, highlighting the importance of human oversight in financial analysis.

How to Analyze 10-K reports using AI

with NER and generative models

Source: NicoElNino/Shutterstock

10-K reports are complex and critical documents, often laden with structured financial tables and unstructured raw text that shed light on the financial health of the company and its future projects. Two main sections worth examining are the Risk Factors and the Financial Statements. While manually reading and analyzing 10-K reports is common among stock analysts, it quickly becomes unscalable due to the sheer number of reports to analyze from thousands of public companies. Furthermore, a simple Ctrl + F keyword search is not sufficient, since it may miss new risks that were not previously known.
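To make the keyword-search limitation concrete, here is a minimal sketch. The watch list and the two sample sentences are made up for illustration (the second one paraphrases the kind of language found in a risk section); the only point is that a risk described in vocabulary we did not anticipate produces no hit.

```python
import re

# Hypothetical watch list an analyst might maintain by hand.
KEYWORDS = ["pandemic", "regulation", "litigation"]

def keyword_hits(sentence: str, keywords: list[str]) -> list[str]:
    """Return the keywords that occur in the sentence as whole words (case-insensitive)."""
    return [
        kw for kw in keywords
        if re.search(rf"\b{re.escape(kw)}\b", sentence, flags=re.IGNORECASE)
    ]

sentences = [
    "The pandemic has exacerbated port congestion and supplier shutdowns.",
    "Increased demand for personal electronics has created a shortfall of semiconductors.",
]

for sentence in sentences:
    # The second sentence describes a real supply risk but matches nothing on the list.
    print(keyword_hits(sentence, KEYWORDS), "<-", sentence)
```

A model trained to recognize risk mentions by context does not depend on a fixed vocabulary in this way, which is the motivation for the NER approach used in the rest of the article.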

Fortunately, with the emergence of advanced deep learning models and Generative AI models, such as OpenAI’s ChatGPT, we can now scour thousands of tables and free-form texts to search and identify critical information automatically and at scale.

In this article, we will delve into analyzing 10-K reports using custom-trained Named Entity Recognition (NER) models combined with generative AI models like ChatGPT. This approach demystifies the jargon-filled world of these financial documents and extracts relevant information, uncovering hidden insights within them.

AI Workflow Building

10-K reports usually contain dozens of financial tables showcasing the company’s performance over the past years in terms of cash flow, income, revenue, operational costs, interest, etc. They also contain free-form, unstructured text with a wealth of information about current and future risks the company faces. The example below shows both a financial table and free-form text.

Example of financial table and free form text in 10-K reports

Instead of manually reading hundreds of pages, we aim to leverage AI capabilities to do the work for us. Below are the main sections we plan to extract and analyze using AI:

  • Risk factors, such as regulatory and competitive risks, as well as new legal risks.
  • Income Statement, Balance Sheet, and Cash Flow tables to assess the financial health of the company.

To perform these tasks, we need to set up an AI workflow containing multiple AI microservices, each focused on solving a specific task. One way to do this is to use an open-source library such as LangChain to create our workflow, but this requires writing, debugging, and testing scripts, which can take hours or days and might not be scalable in the end.

Fortunately, we have an alternative. We are going to use Kudra.ai, a no-code document AI platform designed to streamline AI workflow building with just a few clicks. Here are the components of the workflow we are going to build using Kudra.ai:

  • Financial statement analyzer: This task requires parsing financial tables from the 10-K PDFs and feeding them to a Large Language Model (LLM), such as GPT, for analysis. To do so, we are going to create a workflow in Kudra.ai’s workflow builder (a no-code, drag-and-drop interface to assemble AI services). First, we process the PDF through an OCR (Optical Character Recognition) engine to parse the text. Next, we feed the parsed text to a microservice called “Extract Tables,” which extracts all the tables from the PDF document. Finally, we feed the extracted tables to the “ChatGPT” microservice for analysis. Below is the ChatGPT prompt used in Kudra.ai to initialize it:
ChatGPT prompt
In the prompt, the variable [[input_extracted_table]] takes the output of our table parser, “Extract Tables,” and concatenates it under the prompt. Note that we specifically ask the LLM to write an analysis of the tables and provide a conclusion, since we want more than a restatement of the tables.
  • Risk Factor Identification: In this task, we are going to analyze Item 1A of the 10-K report, which usually contains all the risk factors that the company is facing. To do so, we are going to use a Named Entity Recognition (NER) model trained on the UBIAI platform to recognize company risk factors such as regulation and laws risks, technology risks, operational risks, macroeconomic risks, financial risks, etc. Although we could use Kudra’s generative AI capability to extract the entities with a zero-shot technique, we decided to train our own smaller, specialized NER model because it is more scalable and cheaper in the long run.
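For readers who want to see what the placeholder substitution amounts to, here is a minimal sketch in plain Python. Only the [[input_extracted_table]] variable name comes from the workflow above; the prompt wording, the `render_prompt` and `tables_to_text` helpers, and the way tables are serialized are assumptions made for illustration, not how the platform works internally.

```python
import re

# Hypothetical prompt text; only the [[input_extracted_table]] placeholder comes from the article.
PROMPT_TEMPLATE = (
    "You are a financial analyst. Write an analysis of the tables below "
    "and finish with a conclusion.\n\n"
    "[[input_extracted_table]]"
)

def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Replace every [[name]] placeholder with its value; fail loudly on unknown names."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for placeholder [[{name}]]")
        return variables[name]
    return re.sub(r"\[\[(\w+)\]\]", substitute, template)

def tables_to_text(tables: list[list[list[str]]]) -> str:
    """Serialize tables (lists of rows of cells) as pipe-separated text blocks."""
    blocks = ["\n".join(" | ".join(row) for row in table) for table in tables]
    return "\n\n".join(blocks)

# Stand-in for the output of a table-extraction step.
extracted = [[
    ["", "2022", "2021"],
    ["Total revenues", "81,462", "53,823"],
]]

prompt = render_prompt(PROMPT_TEMPLATE, {"input_extracted_table": tables_to_text(extracted)})
print(prompt)
```

Raising an error on an unknown placeholder is a deliberate choice in this sketch: a prompt that silently ships with a literal [[...]] in it tends to produce confusing model output that is hard to trace back.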

10-K Analysis

Project creation

We are now ready to run our analysis of 10-K reports using the workflow we have created. To get started, create a new project in kudra.ai and select the custom workflow we have created, called “10-K Analysis”:

Project creation dashboard in kudra.ai

Next, we upload the PDF that contains Item 1A (risk factors) and the financial tables:

Document Upload Interface
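If you start from the full filing rather than a pre-cut PDF, the Item 1A section has to be isolated first. The sketch below shows one heuristic way to do that on plain text, assuming the section headings appear at the start of a line as "Item 1A", "Item 1B", or "Item 2". The `extract_item_1a` helper and the sample filing text are made up for illustration, and real filings vary enough in layout that the result should be spot-checked.

```python
import re

START = re.compile(r"^[ \t]*item[ \t]+1a\b.*$", re.IGNORECASE | re.MULTILINE)
END = re.compile(r"^[ \t]*item[ \t]+(?:1b|2)\b.*$", re.IGNORECASE | re.MULTILINE)

def extract_item_1a(filing_text: str) -> str:
    """Return the longest block of text found between an 'Item 1A' heading
    and the next 'Item 1B' or 'Item 2' heading.

    Taking the longest candidate skips the table of contents, where the
    'Item 1A' entry is followed almost immediately by the next entry.
    """
    best = ""
    for start in START.finditer(filing_text):
        end = END.search(filing_text, start.end())
        stop = end.start() if end else len(filing_text)
        body = filing_text[start.end():stop].strip()
        if len(body) > len(best):
            best = body
    return best

# Made-up miniature filing, used only to exercise the function.
sample = """Table of Contents
Item 1A. Risk Factors 14
Item 1B. Unresolved Staff Comments 28

Item 1. Business
We design and manufacture vehicles.

Item 1A. Risk Factors
We may be impacted by macroeconomic conditions.
We depend on a limited number of suppliers.

Item 1B. Unresolved Staff Comments
None.
"""

print(extract_item_1a(sample))
```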

Item 1A Analysis

Let’s start by analyzing the risk factors in the Item 1A section using our specialized NER model. As shown below, the model was able to extract critical information from the risk factor section, such as macroeconomic risks, regulation and laws risks, financial risks, and more.
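Outside the platform’s interface, the same overview can be produced by grouping the model’s predictions by label. This is a minimal sketch that assumes the NER output is available as a list of dictionaries with "text" and "label" keys; that format, the label names, and the `group_by_label` helper are illustrative assumptions, not the platform’s export schema.

```python
from collections import defaultdict

# Hypothetical NER output: one dict per detected entity. Label names are illustrative.
entities = [
    {"text": "COVID-19 pandemic", "label": "MACROECONOMIC_RISK"},
    {"text": "Government regulations", "label": "REGULATION_RISK"},
    {"text": "cash flow", "label": "FINANCIAL_RISK"},
    {"text": "COVID-19 pandemic", "label": "MACROECONOMIC_RISK"},
    {"text": "Elon Musk", "label": "PERSON"},
]

def group_by_label(ents: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each label to its distinct entity texts, in order of first appearance."""
    grouped = defaultdict(list)
    for ent in ents:
        if ent["text"] not in grouped[ent["label"]]:
            grouped[ent["label"]].append(ent["text"])
    return dict(grouped)

for label, texts in group_by_label(entities).items():
    print(f"{label}: {', '.join(texts)}")
```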

In the example, the COVID-19 pandemic was identified as a macroeconomic risk that we should look at. By clicking on the entity, we are redirected to the paragraph that discusses it. Here is a snippet:

“For example, pandemic-related issues have exacerbated port congestion and intermittent supplier shutdowns and delays, resulting in additional expenses to expedite delivery of critical parts. Similarly, increased demand for personal electronics has created a shortfall of semiconductors, which has caused challenges in our supply chain and production”

Due to the pandemic, intermittent supplier shutdowns and delays caused additional expenses to expedite the delivery of critical parts. In addition, increased demand for personal electronics has created a shortfall of semiconductors.

Let’s look at financial risks such as cash flow:

It mentions that Tesla may not continue to generate cash flow from operations in the future to satisfy its obligations under its existing indebtedness. This is worth a closer look.

There are also regulation and laws risks to consider, such as the temporary shutdown of Tesla’s Gigafactory in Shanghai.

Interestingly, looking at the PERSON entities list, we find a mention of Elon Musk within the following context:

The dependency on Musk’s services is seen as a risk since he does not devote his full time to Tesla.

Now, let’s go one step further and ask ChatGPT which risk factors have the most impact on Tesla’s future performance as a company. This can be done simply by feeding the ChatGPT service the list of risk factors and asking it to analyze them.
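As a rough illustration of what “feeding the list” means, the sketch below turns extracted entities into a single prompt string. The prompt wording and the `build_risk_prompt` helper are assumptions for illustration only; the actual prompt configured in the workflow is not reproduced here.

```python
def build_risk_prompt(company: str, risk_entities: list[str]) -> str:
    """Build a ranking prompt from extracted risk entities (duplicates removed, order kept)."""
    unique = list(dict.fromkeys(risk_entities))
    bullet_list = "\n".join(f"- {risk}" for risk in unique)
    return (
        f"The following risk factors were extracted from Item 1A of {company}'s 10-K report:\n"
        f"{bullet_list}\n\n"
        "Which of these are likely to have the most impact on the company's future "
        "performance? Rank them in order of importance and briefly justify each."
    )

risk_prompt = build_risk_prompt(
    "Tesla",
    ["COVID-19 pandemic", "increased demand", "production delays",
     "Government regulations", "semiconductors", "COVID-19 pandemic"],
)
print(risk_prompt)
```

Deduplicating before building the prompt keeps a frequently mentioned entity from looking more important to the model just because it appears more often in the list.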

Below are the results of the analysis:

1. “COVID-19 pandemic”: The ongoing COVID-19 pandemic has affected global economies and disrupted supply chains. This risk factor can adversely affect Tesla’s production and demand.

2. “increased demand” and “production delays”: These risks pertain to Tesla’s ability to meet the increased demand for its vehicles and potential delays in production. If Tesla fails to scale up its production efficiently or faces significant delays, it may impact its market share and profitability.

3. “Government regulations”: Regulatory changes and compliance requirements imposed by governments can pose challenges to Tesla’s operations. This may include environmental regulations, vehicle safety standards, or import/export restrictions.

4. “semiconductors”: The global semiconductor shortage has affected various industries, including automotive. Tesla’s reliance on semiconductors for its vehicles can lead to production disruptions and potential revenue loss.

The risk factors analyzed by GPT have been ranked in order of importance.

Financial Statement Analysis

For the financial table analysis, we are going to upload the raw 10-K PDF and run the workflow (table extraction + GPT) that we have built. Here are the results (dollar amounts are in millions, as reported in the 10-K tables):

Income Statement: The company experienced significant growth in its revenues in 2022, with total revenues reaching $81,462 compared to $53,823 in 2021 and $31,536 in 2020. This growth can be attributed to increases in automotive sales, automotive leasing, and energy generation and storage. The cost of revenues also increased, but at a lower rate, resulting in a gross profit of $20,853 in 2022 compared to $13,606 in 2021 and $6,630 in 2020. Operating expenses remained relatively stable, leading to an income from operations of $13,656 in 2022, a significant increase from $6,523 in 2021 and $1,994 in 2020. Net income also increased significantly, reaching $12,587 in 2022 compared to $5,644 in 2021 and $862 in 2020.

Balance Sheet: The company’s total assets increased from $62,131 in 2021 to $82,338 in 2022. This increase was mainly driven by an increase in current assets, operating lease vehicles, and property, plant, and equipment. The company also saw an increase in stockholders’ equity, which reached $44,704 in 2022 compared to $30,189 in 2021. This increase can be attributed to additional paid-in capital and retained earnings. Overall, the company’s balance sheet shows positive growth and a strong financial position.

Cash Flow Statement: The company generated a net cash provided by operating activities of $14,724 in 2022, an increase from $11,497 in 2021 and $5,943 in 2020. This increase can be attributed to higher net income and improvements in working capital management. The company also invested in property and equipment, solar energy systems, and digital assets, which resulted in a net cash used in investing activities of $11,973 in 2022. The company’s financing activities included proceeds from issuances of common stock and debt and payments for buy-outs of noncontrolling interests. Overall, the company’s cash flow statement indicates positive cash flow and a strong ability to generate cash from its operations.

Conclusion: Based on the financial statements, the company shows strong financial health and growth. The significant increase in revenues, gross profit, and net income indicates successful business operations. The balance sheet demonstrates positive growth in total assets and stockholders’ equity. Additionally, the cash flow statement shows positive cash flow from operating activities and the ability to invest in assets and pay off liabilities. Overall, the company is in a favorable financial position and has the potential for continued growth.
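Before comparing the quoted figures with the filing, we can run a quick arithmetic check on the income statement numbers in the analysis above, for example to test the claim that the cost of revenues grew at a lower rate than revenues. The sketch below uses only the figures quoted by the model and derives the cost of revenues as revenues minus gross profit.

```python
# Figures quoted in the analysis above, in millions of US dollars.
revenue = {2020: 31_536, 2021: 53_823, 2022: 81_462}
gross_profit = {2020: 6_630, 2021: 13_606, 2022: 20_853}

# Cost of revenues is not quoted directly; derive it as revenue minus gross profit.
cost_of_revenue = {year: revenue[year] - gross_profit[year] for year in revenue}

def yoy_growth(series: dict[int, int], year: int) -> float:
    """Year-over-year growth of a series, as a fraction of the prior year's value."""
    previous = series[year - 1]
    return (series[year] - previous) / previous

for year in (2021, 2022):
    print(
        f"{year}: revenue {yoy_growth(revenue, year):+.1%}, "
        f"cost of revenues {yoy_growth(cost_of_revenue, year):+.1%}, "
        f"gross margin {gross_profit[year] / revenue[year]:.1%}"
    )
```

For 2022 the claim holds, but only narrowly: revenue grew roughly 51.4% while the cost of revenues grew roughly 50.7%. That nuance is easy to miss when reading the generated summary alone.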

Although we have obtained a very detailed analysis of the income, cash flow, and balance sheet tables, we should verify the accuracy of the data used, as ChatGPT is known for occasionally producing inaccurate facts and figures. Below is the actual balance sheet from the 10-K report:

ChatGPT reports: “The company’s total assets increased from $62,131 in 2021 to $82,338 in 2022. This increase was mainly driven by an increase in current assets, operating lease vehicles, and property, plant, and equipment. The company also saw an increase in stockholders’ equity, which reached $44,704 in 2022 compared to $30,189 in 2021.” This turns out to be correct. Let’s check the cash flow:

ChatGPT’s conclusion was: “The company generated a net cash provided by operating activities of $14,724 in 2022, an increase from $11,497 in 2021 and $5,943 in 2020. This increase can be attributed to higher net income and improvements in working capital management.” This is also correct.
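This kind of spot check can be partly automated. The sketch below pulls every dollar amount out of the generated text and flags any that do not appear among the numbers parsed from the extracted table. The `dollar_figures` and `unsupported_figures` helpers are hypothetical, and the set of table values is a hand-typed stand-in for real table-parser output. Note that this only catches invented numbers, not a correct number attached to the wrong line item or year.

```python
import re

def dollar_figures(text: str) -> set[int]:
    """Collect every '$12,345'-style amount in the text as an integer."""
    return {int(m.replace(",", "")) for m in re.findall(r"\$(\d[\d,]*)", text)}

def unsupported_figures(llm_text: str, table_values: set[int]) -> list[int]:
    """Return amounts quoted by the model that are absent from the source table."""
    return sorted(dollar_figures(llm_text) - set(table_values))

# Stand-in for numbers parsed from the extracted balance sheet (millions of US dollars).
balance_sheet_values = {82_338, 62_131, 44_704, 30_189}

llm_text = (
    "The company's total assets increased from $62,131 in 2021 to $82,338 in 2022. "
    "The company also saw an increase in stockholders' equity, which reached "
    "$44,704 in 2022 compared to $30,189 in 2021."
)

print(unsupported_figures(llm_text, balance_sheet_values))
```

An empty list means every quoted amount exists somewhere in the table; anything else is a figure to look up by hand.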

Conclusion

In conclusion, the ability to integrate custom deep learning models with generative AI like ChatGPT in analyzing 10-K reports represents a significant advancement toward automating financial analysis. By utilizing AI-driven workflows, we were able to efficiently parse complex financial documents, extracting and analyzing key information like risk factors and financial statements. This approach not only saves time and resources but can also enhance the accuracy and depth of the analysis.

Moreover, the use of Kudra’s no-code document AI platform further simplifies the process, making it accessible and scalable to thousands of reports. This is particularly beneficial for organizations that need to analyze large volumes of financial data but lack the technical resources to develop complex AI solutions.

As we delve deeper into AI’s capabilities, it’s important to remain cautious about its limitations. For instance, the potential for AI models like ChatGPT to “hallucinate” facts underscores the need for careful verification of AI-generated data against actual financial statements. This cautionary note serves as a reminder that while AI significantly enhances our analytical capabilities, it should be used as a complement to, rather than a replacement for, human expertise and critical evaluation.

If you are looking to automate your document extraction and analysis from financial reports, don’t hesitate to check out kudra.ai and schedule a demo.

Visit us at DataDrivenInvestor.com
