LangChain: Use the Power of GPT to Chat with Earnings Reports of Companies
Compare quarterly reports from companies with an LLM

Do you want to compare many quarterly reports of public companies? Looking through all of them by hand is tedious. An assistant would be very helpful, wouldn’t it? That’s our motivation for building an AI assistant that compares different earnings reports. Stay curious!
We implement the assistant with the frameworks LangChain and Plotly Dash in Python. If you want to learn more about Plotly Dash, we recommend our article “A Comprehensive Guide to Building Enterprise-Level Plotly Dash Apps”. This article shows you how to build a production-ready web app with Plotly Dash and Docker. If you are a LangChain beginner, we recommend our introduction article about LangChain.
That article is a perfect starting point for understanding the basics of LangChain, and it’s essential for following the concepts in this one.
Now let’s start with what you can expect in this article.
🤔 What can you expect?
In this article, we’ll analyse the Q1–2023 earnings reports of Tesla, Mercedes-Benz and BMW. First, we’ll give you a technical overview that shows the individual components we need. Furthermore, we explain how to load the PDF documents (earnings reports), how to prepare the data and how agents in LangChain work.
Do you want to know what the finished app looks like? Here’s a demo of the final web app:

We’ve no time to waste. Let’s jump into the setup!
💻 Setting up the environment
There are some technical requirements before we can start. Make sure that you’ve installed Python, conda and pip. Furthermore, we need a terminal for setting up the environment.
First, we have to create a virtual Python environment. For this, you can use the following commands:
- Create a conda environment (env):
conda create -n langchain python=3.9.12
- Activate the environment:
conda activate langchain
Great, we have a working environment. Now, it’s time to install all required dependencies. For this, you can use a requirements file. Create a file with the name requirements.txt and paste the following content into it:
langchain
openai
pypdf
tiktoken
chromadb
pycryptodome
Now, you can install all dependencies with the following command:
$ pip install -r requirements.txt
Great, all required dependencies are installed. Next, you must set your OpenAI API key. Don’t have a key yet? That’s no problem; you can generate an API key in your OpenAI account. If you don’t have an OpenAI account, you must sign up first. Then, log in to your OpenAI account and create a new API key.

Then, click on “Create new secret key” and give your API key a name. Copy the key! You get free credit for API access. Save your key in a safe place because it won’t be displayed again.
Next, we have to set the API key. You can do this as an environment variable or in your Python script. In the following, you can see how to set the key as an environment variable:
# macOS and linux
$ export OPENAI_API_KEY=[Insert your API Key here.]
# windows
$ set OPENAI_API_KEY=[Insert your API Key here.]
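Alternatively, you can set the key directly in your Python script before creating any LangChain objects. A minimal sketch; the key shown is a placeholder, and you should never commit a real key to version control:

import os

# set the API key for the current process only (placeholder value)
os.environ["OPENAI_API_KEY"] = "sk-..."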
Now the setup process is finished!
Do you want to take your Python and ML skills to the next level? Discover The Complete Python, Machine Learning, AI Mega Bundle*. It is a completely self-paced online learning course.
Now, we can dive into the implementation. Let’s go!
😎 Implementation
The focus of this article is on the LangChain implementation, not on the implementation of the Dash web app. If you’re interested in the code of the Dash web app, please check out the appendix of our previous article; we reuse that code here.
Technical overview
Let’s look at the steps we need to implement. The following figure shows the procedure:

First, we load all relevant earnings reports. You can choose the public companies of your choice; in this article, we use Tesla, Mercedes-Benz and BMW. Then, we split the documents into small, semantically meaningful chunks. Next, we put these chunks into an index so that we can easily retrieve them. For that, we’re going to use embeddings and vector stores/databases. Then, we can answer a human question based on the relevant splits and an LLM. That’s roughly the process. The final answer of the LLM is displayed in the web UI. Let’s jump into the implementation!
Document Loading
In this section, we explain how you can load PDFs in LangChain. First, we download the Q1–2023 earnings reports of Mercedes-Benz, Tesla and BMW. Next, we have to define a list of all PDF files. Look at the following Python code:
# files we want to compare
files = [
    {
        "name": "mercedes-benz-earnings-Q1-2023",
        "path": "files/mercedes-benz-interim-report-q1-2023.pdf"
    },
    {
        "name": "tesla-earnings-Q1-2023",
        "path": "files/TSLA-Q1-2023-Update.pdf"
    },
    {
        "name": "bmw-earnings-Q1-2023",
        "path": "files/BMW_Q1-2023_EN.pdf"
    },
]
For every file, we define a name and a file path. The name is relevant for the agents; we discuss this later in this article. The next step is to load all PDF files. For this, we use the PyPDFLoader class.
from langchain.document_loaders import PyPDFLoader

for file in files:
    # load the pages of the PDF document
    loader = PyPDFLoader(file["path"])
    pages = loader.load()
All the code we discuss up to the section “Agents in LangChain” is in a for loop. After running the above code, we have a list of pages for each report. Great!
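Each element of pages is a LangChain Document object with the page text and some metadata. If you want to check what the loader returned, a quick sanity check like the following helps (the printed values are just illustrative):

# quick sanity check of the loaded report
print(len(pages))                   # number of pages in the PDF
print(pages[0].metadata)            # e.g. {'source': 'files/...', 'page': 0}
print(pages[0].page_content[:200])  # first 200 characters of the first page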
Document Splitting
In this section, we split the pages into semantically relevant chunks. That’s not easy because every chunk should contain meaningful information. Let’s look at the code.
from langchain.text_splitter import RecursiveCharacterTextSplitter

chunk_size = 1000
chunk_overlap = 100

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)
splits = text_splitter.split_documents(pages)
First, we define the chunk_size and the chunk_overlap. You can also try other sizes, but for our case these values are very suitable. Look at the following figure to understand what the parameters chunk_size and chunk_overlap mean:

The parameter chunk_size defines the length of the chunks in characters, and the parameter chunk_overlap specifies the overlap between two consecutive chunks. The code above splits our PDF pages into semantically relevant chunks so that we can use them for retrieval later. To see the effect of the two parameters in isolation, look at the small sketch below.
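The following toy example runs the splitter on a short string with artificially small values; it’s only a demonstration, not the settings we use for the reports:

from langchain.text_splitter import RecursiveCharacterTextSplitter

demo_splitter = RecursiveCharacterTextSplitter(
    chunk_size=20,  # tiny values, only for demonstration
    chunk_overlap=5,
)
chunks = demo_splitter.split_text(
    "LangChain splits long documents into overlapping chunks."
)
for chunk in chunks:
    print(repr(chunk))  # consecutive chunks share a few characters

Now, it’s time to put our chunks into an index.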
Create Vector Database
In this section, we use embeddings and vector databases to store our documents in an easily accessible format. These concepts are very important for building a chatbot over our data.
First, we have to discuss what embeddings are. An embedding takes a piece of text and converts it into a numerical representation. Texts with similar content will have similar embedding vectors. We create our embeddings with the OpenAI class OpenAIEmbeddings and use Chroma as a vector database. Chroma is a lightweight, in-memory vector database. Look at the following code snippet:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

persist_directory = 'files/chroma/'
embedding = OpenAIEmbeddings()

retriever = Chroma.from_documents(
    documents=splits,
    embedding=embedding,
    persist_directory=persist_directory
).as_retriever()
First, we define the location where we store everything for our vector database. Then, we create an embedding object. The function from_documents(...) creates embeddings for all our splits and puts them into the vector database. We create a separate vector database for each PDF file. After that, we create a retriever for each PDF file.
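Before we wire the retrievers into an agent, it’s worth testing one in isolation. A small sketch; the question is only an example, and the returned passages depend on your PDFs:

# ask the vector database for the chunks most similar to a question
docs = retriever.get_relevant_documents(
    "How many vehicles were delivered in Q1 2023?"
)
for doc in docs:
    print(doc.metadata, doc.page_content[:100])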
Next, we define a Prompt Input schema. The user enters a question, so we define it as a string. Look at the following code:
from pydantic import BaseModel
from langchain.agents import Tool
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# create input schema
class PromptInput(BaseModel):
    question: str

llm = ChatOpenAI(
    temperature=0,
    model="gpt-3.5-turbo-0613"
)

# "tools" is initialised as an empty list before the for loop;
# .run passes the question string to the chain and returns the answer string
tools.append(
    Tool(
        args_schema=PromptInput,
        name=file["name"],
        description=f"useful for when you need to answer questions about {file['name']}",
        func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever).run
    )
)
In addition, we initialise our LLM and set the temperature parameter to zero. You can use the temperature parameter to control the level of randomness or creativity: a lower temperature results in more conservative output text, while a higher temperature leads to more unexpected or random output text. We also use the current GPT-3.5 model.
In the next step, we put our LLM and the retriever for each PDF into a tool. A tool is an interface that an agent can use to interact with the world. In addition, we set the file name and give the tool a description. The description tells the model how, when and why to use the tool; it’s similar to a prompt. You can also provide few-shot examples as part of the description, as the sketch below shows. Now, we’ve defined a tool for each PDF.
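For instance, a more detailed description could embed an example question, so the agent knows what kind of input the tool expects. This is only an illustrative variant, not the description we used above:

# hypothetical, more detailed tool description with a few-shot example
description = (
    f"useful for when you need to answer questions about {file['name']}. "
    "Example question: 'How many vehicles were delivered in Q1 2023?'"
)

So it’s time to look at agents in LangChain.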
Agents in LangChain
In this section, we take a look at agents in LangChain and how we can use them in our web app. Agents add a lot of flexibility to application development. An agent has access to different tools and decides which one to use depending on the user’s input. Agents can use multiple tools and feed the output of one tool into the next. If you want to learn more about agents, please check out the LangChain documentation. First, we have to initialise the agent.
from langchain.agents import AgentType, initialize_agent

agent = initialize_agent(
    agent=AgentType.OPENAI_FUNCTIONS,
    tools=tools,
    llm=llm,
    verbose=True,
)
In the initialisation, we assign our tools and our LLM. Then, we set the parameter verbose to True so we can see what the AI is doing. That’s very helpful, especially for debugging. We also use OpenAI functions: a feature that allows some OpenAI models, like gpt-3.5-turbo-0613, to detect when a function should be called. Explaining how OpenAI functions work would need another article; if you want to know more, see the LangChain documentation.
Now, we’ve successfully created an agent. Great! Next, we want to use our agent. For this, we look at the following code snippet:
answer = agent({"input": human_prompt})
result_ai = answer["output"]
First, we pass the prompt to our agent. Then, the agent delivers a response, and we display this answer in our web UI.
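You can also test the agent outside the Dash app, directly from a script or notebook, e.g. with the first question from the next section:

# quick test without the web UI
answer = agent({"input": "Compare the number of cars sold."})
print(answer["output"])

Wow, we’ve done all the magic. It’s time to test our web app!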
Chat with the earnings reports
In this section, we chat with the three earnings reports. For that, we have thought of three questions which we ask the chatbot. In this context, we compare the chatbot’s answer with the information in the earnings reports to verify whether the answers are correct. Let’s start!
Question 1: Compare the number of cars sold.
Chatbot answer: In Q1 2023, the number of cars sold by each company are as follows: — Mercedes-Benz sold a total of 503,500 cars. — Tesla sold a total of 422,875 cars. — BMW sold a total of 517,957 cars. Please note that these numbers represent the total cars sold by each company in Q1 2023.
Is the answer correct? Yes, the answer is correct!
Question 2: What percentage have Tesla’s total deliveries increased on an annual basis?
Chatbot answer: Tesla’s total deliveries have increased by 36% on an annual basis.
Is the answer correct? Wow, that’s right!
Question 3: Compare the number of battery electric vehicles (BEV) sold by the companies Mercedes and BMW.
Chatbot answer: In Q1 2023, Mercedes sold a total of 3,600 battery electric vehicles (BEV), while BMW sold 55,979 battery electric vehicles (BEV). Therefore, BMW sold significantly more BEVs compared to Mercedes.
Is the answer correct? That’s not quite correct! Mercedes sold 51,639 battery electric vehicles; the 3,600 figure refers to all-electric vans. The number of battery electric vehicles sold by BMW is correct.
We have seen that the chatbot works well, but it also makes some mistakes. In addition, we’ve tried to chat with earnings reports in different languages, and that also works well. For instance, we chose the Q1 report of Tesla in English and the Q1 report of Mercedes in German. We also experimented with different values for chunk_size and chunk_overlap. Changing the chunk_overlap has a big effect on the chatbot output. You can experiment with these parameters to find the optimal combination for your use case. We’ve used the OpenAI model gpt-3.5-turbo-0613 for our chatbot. If you want to choose the gpt-4 model, you can change the model in the code, as shown below. You can assume that the performance of gpt-4 is significantly better than that of gpt-3.5. In our case, gpt-3.5 provided partly wrong information. So keep in mind: AI is only a tool, nothing more!
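As mentioned, switching the model is a one-line change. A minimal sketch, assuming your OpenAI account has gpt-4 API access:

llm = ChatOpenAI(
    temperature=0,
    model="gpt-4"  # requires gpt-4 API access on your account
)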
🎬 Conclusion
In this article, we showed you an architecture for creating a chatbot for PDF files. First, we load all PDF files (in our case, earnings reports of the companies Tesla, Mercedes and BMW). Then, we split the pages of the documents into semantically relevant chunks. Next, we create embeddings for each chunk. An embedding takes a piece of text and converts it into a numerical representation. After that, we put the relevant embeddings and the human question into an LLM. The LLM answers the question, and we display the answer in our web app. Finally, we asked our chatbot three questions about the earnings reports of the three companies. Our chatbot answered most questions correctly. However, we’ve also seen that the chatbot sometimes returns wrong information.
👉🏽 Join our free weekly Magic AI newsletter for the latest AI updates!
👉🏽 You can find all our Freebies on our digital products page!
Learn more about us on our About page. Don’t forget to follow us on X. Thanks so much for reading. If you liked this article, feel free to share it. Have a great day!
Sign up for a Medium membership using our link to read unlimited Medium stories.
🔍 References
- LangChain Documentation
- LangChain Chat with Your Data — DeepLearning.AI
* Disclosure: The links are affiliate links, which means we will receive a commission if you purchase through these links. There are no additional costs for you.