Yanli Liu

Summary

The author has developed a local AI-powered finance analysis tool, "My Local AI Finance Insighter," to securely analyze personal financial data and provide tailored insights and recommendations without the need for internet connectivity.

Abstract

The author describes the creation of "My Local AI Finance Insighter," a personal finance tool designed to operate locally on a user's laptop, ensuring data privacy and security. This tool analyzes transaction data to offer personalized savings tips and investment plans, leveraging advanced open-source models like Mistral and LLAVA through the LangChain framework and a user-friendly Streamlit interface. The project was inspired by the need for a secure and personalized approach to financial analysis during the author's Easter break. The tool categorizes transactions, performs quantitative and qualitative financial analyses, and generates visual reports and personalized advice, all without sharing sensitive data with external providers.

Opinions

  • The author values data privacy and prefers a local solution over cloud-based AI services for financial data analysis.
  • There is a preference for open-source models and tools, such as Ollama, LangChain, and Streamlit, for their capabilities and ease of use.
  • The author emphasizes the importance of prompt engineering to guide the outputs of LLMs for accurate transaction categorization.
  • The author believes in the effectiveness of Generative AI for providing personalized financial advice and insights.
  • There is an acknowledgment that LLMs can struggle with consistent categorization, especially for large datasets, necessitating code and prompt adjustments for optimal results.
  • The author suggests that adding a chatbot interface for direct question-answering could be a valuable future feature.
  • The author is open to community feedback and improvements, inviting readers to share better methods for transaction categorization.

How I Analyzed My Finance With A Local AI

A personalized and secure approach to analyzing financial data, providing tailored insights and recommendations

As the spring season rolls in, bringing with it the tradition of house decluttering, I decided to use my Easter break to get a grip on my finances, cut costs, and formulate a better financial plan.

The idea of asking ChatGPT or sharing my confidential data with OpenAI, Google, or any other AI solution provider didn’t sit well with me. No way! So, I embarked on a project over my Easter break to create a tool I named “My Local AI Finance Insighter.”

Photo by Maarten Deckers on Unsplash

True to its name, it runs entirely locally, can operate without an internet connection, and keeps your financial data secure.

My Local AI Finance Insighter first takes in my transaction data, analyzes my financial situation — including income and spending — and then offers personalized savings tips and suggests investment plans tailored to my lifestyle and financial goals. Check out the demo here.

Interested in having this tool running on your laptop? Follow this tutorial, and I’ll guide you step by step through the process of building your own Local AI Finance Insighter.

Disclaimer: The information provided here is solely for educational purposes and is not intended as personal financial, investment, or other advice. All opinions are my own.

Table of Contents:

  1. An Overview of the Application, Its Design, and Development Stack
     1.1. Overview of the Application and Architecture
     1.2. Ollama and Open Source LLMs
     1.3. LangChain
     1.4. Streamlit
  2. Pre-requisite and Dataset Preparation
  3. Building the Data Upload Interface and Transaction Data Processing Modules
  4. Building the Personal Finance Dashboard and Personalized Report Generation
  5. Closing thoughts

An Overview of the Application, Its Design, and Development Stack

Overview of the Application and Architecture

The application is built on Streamlit for its user interface and, at its core, it leverages LangChain to interact with open-source LLMs served locally by Ollama.

Within this application, we’re tapping into some of the most advanced open-source models available, such as Mistral, as well as LLaVA to unlock multimodal features.

We will employ prompt engineering to guide the LLMs’ outputs so that they deliver a “professional finance planner’s” opinion, with savings and investment recommendations.

Overview of the application. Graph by author.

Here’s what we will achieve in this project:

  • Transaction Data Processing: Using an LLM to process and categorize transaction data.
  • Quantitative Analysis and Visuals: We will calculate your total income, expenses, and net savings rate from your transaction details. There will also be graphical visualizations for income and expense trend analysis, spending analysis, and much more.
  • Interpreting Data With Multimodality: Using the multimodality feature to understand the graphs, uncover trends, and highlight key takeaways.
  • Personalized Finance Plan: Using LLMs to generate personalized savings tips and investment recommendations based on your unique lifestyle and financial goals.

But before diving into the technical coding, here is the line-up of tools I used in this project; if you are already familiar with them, feel free to skip ahead.

Ollama

Ollama is, for me, the best and easiest way to get up and running with open-source LLMs. It supports, among others, the most capable LLMs such as Llama 2, Mistral, and LLaVA, and you can find the list of available models on ollama.ai/library. It’s available on macOS, Windows, and Linux.

LangChain

Langchain is an open-source framework built around LLMs. It dramatically simplifies the design and development of GenAI applications, including chatbots, summarization tools, and beyond.

The core idea of the library is to “chain” together different components to simplify complex AI tasks and create more advanced use cases around LLMs. It offers a great integration with Ollama served open source models.

Streamlit

Streamlit is an open-source framework that enables developers to create and share data applications fast. It allows the development of web-based user interfaces with only a few lines of Python code.

The level of simplicity makes Streamlit a perfect match for rapid prototyping and complex data dashboards projects.

Pre-requisite and Dataset Preparation

Installing Ollama

First, go to Ollama download page, pick the version that matches your operating system, download and install it.

With Ollama installed, open your command terminal and enter the following commands. These commands will download the models and run them locally on your machine. In this project, we’ll use Mistral and LLaVA.

ollama serve        # start the Ollama server (keep this running in its own terminal)
ollama pull mistral # download the models
ollama pull llava
ollama run mistral  # optional: test the models interactively
ollama run llava

Preparing Your Dataset

For demo purposes, I’ll replace my personal transaction data with synthetic data. So I prompted ChatGPT to generate 1,000 transactions for me. Here’s the prompt you can use for your own testing if needed.

Generate a realistic dataset of 1,000 financial transactions for a young professional working in finance, residing in Europe, covering the period from January 2022 to December 2023. Ensure a balanced representation of income and expenses across various categories typical for this demographic. Include the following four columns:

Date: Transaction date (format: YYYY-MM-DD)
Name/Description: A unique and detailed description of each transaction (e.g., "Salary deposit," "Monthly rent payment," "Restaurant dinner with friends")
Expense/Income: Clearly indicate whether the transaction is an expense (e.g., "Expense") or income (e.g., "Income")
Amount(EUR): Transaction amount in Euros

The generated dataset should consist of four columns:

  • Date: Date of the transaction
  • Name/Description: A short description of the transaction. This field will be used by the LLM to categorize the transaction based on its primary nature.
  • Expense/Income: Indicates if it’s an income or expense
  • Amount(EUR): Amount in Euros
Example of the dataset. Image by author.
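Before feeding a file to the app, it’s worth sanity-checking that it matches the four-column schema above. A minimal sketch (the sample rows below are invented purely for illustration):

```python
import io
import pandas as pd

# A few synthetic rows in the expected schema (values are illustrative only)
csv_data = """Date,Name/Description,Expense/Income,Amount(EUR)
2022-01-31,Salary deposit,Income,3500.00
2022-02-01,Monthly rent payment,Expense,1200.00
2022-02-03,Restaurant dinner with friends,Expense,45.50
"""

df = pd.read_csv(io.StringIO(csv_data), parse_dates=["Date"])

# Exactly the four expected columns must be present, in order
expected = ["Date", "Name/Description", "Expense/Income", "Amount(EUR)"]
assert list(df.columns) == expected

# The Expense/Income flag should only use the two allowed labels
assert set(df["Expense/Income"]) <= {"Expense", "Income"}
print(df.dtypes)
```

If either assertion fails, the LLM categorization and the dashboard aggregations further down will silently misbehave, so it is cheap insurance.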

Building the Data Upload Interface and Transaction Data Processing Modules

Installing dependencies

Now, we need to install the dependencies related to LangChain and Streamlit.

pip install langchain-community
pip install streamlit

Setting Up Data Upload Module

Let’s create a new Python file, “Upload.py”, and add the following code. Here we’ll:

  • Import the necessary libraries
  • Initialise the LLM model which will be used for transaction categorization
  • Define Categories: Create a list of categories that cover various types of income and expenses. This will help the LLM categorize each transaction accurately.
import streamlit as st
import pandas as pd
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")
categories = [
    "Salary/Wages", "Investment Income", "Freelance Income", "Business Revenue","Rental Income","Housing", "Utilities","Groceries","Transportation","Insurance","Healthcare","Entertainment","Personal Care","Education","Savings/Investments","Loans/Debt","Taxes","Childcare","Gifts/Donations","Dining Out","Travel","Shopping","Subscriptions","Pet Care", "Home Improvement","Clothing","Tech/Gadgets", "Fitness/Sports",
]
categories_string = ",".join(categories)

Building the Transaction Categorization Function

  1. Categorize Transactions: Write a function categorize_transactions that takes in transaction names. We´ll use the prompt engineering technique to guide the output of the LLM. We’ll include the transaction names in the context of a prompt and request the LLM to categorize them according to our predefined list of categories. After receiving the LLM’s output, we’ll then organize and transform this data into a structured pandas DataFrame.
def categorize_transactions(transaction_names, llm):
    prompt = f"""Add an appropriate category to the following expenses.
        Remember: the category should be only one of the following, and choose the single category from the list that is most relevant based on the transaction's primary purpose or nature: {categories_string}.\n
        The output format should always be: transaction name - category. For example: Spotify #2 - Entertainment, Basic Fit Amsterdam Nld #3 - Fitness/Sports \n
        Here are the transactions to be categorized: {transaction_names} \n"""

    print(prompt)
    filtered_response = []
    # retry if the LLM output is not consistent
    while len(filtered_response) < 2:
        response = llm.invoke(prompt).split("\n")
        print(response)
        # Remove items that do not contain "transaction: category" pairs
        filtered_response = [item for item in response if '-' in item]
    print(filtered_response)
    # Put in dataframe
    categories_df = pd.DataFrame({"Transaction vs category": filtered_response})

    size_dif = len(categories_df) - len(transaction_names.split(","))
    if size_dif >= 0:
        categories_df["Transaction"] = transaction_names.split(",") + [None] * size_dif
    else:
        categories_df["Transaction"] = transaction_names.split(",")[:len(categories_df)]
    categories_df["Category"] = categories_df["Transaction vs category"].str.split("-", expand=True)[1]
    return categories_df
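The parsing logic above is easiest to see with a hard-coded model response: no LLM call is involved in this sketch, and the sample lines are invented for illustration.

```python
import pandas as pd

# Simulated LLM output: one "transaction - category" pair per line, plus a
# noise line that the dash filter removes
response = [
    "Here are the categories:",
    "Spotify #2 - Entertainment",
    "Basic Fit Amsterdam Nld #3 - Fitness/Sports",
]
transaction_names = "Spotify #2,Basic Fit Amsterdam Nld #3"

# Keep only lines that look like "transaction - category" pairs
filtered = [item for item in response if "-" in item]
df = pd.DataFrame({"Transaction vs category": filtered})
df["Transaction"] = transaction_names.split(",")
# Everything after the dash is the category (note the leftover whitespace,
# which is why the real pipeline strips values later)
df["Category"] = df["Transaction vs category"].str.split("-", expand=True)[1]
print(df["Category"].str.strip().tolist())
```

This also shows why the retry loop is needed: if the model returns fewer than two dash-separated lines, the batch is considered malformed and the prompt is sent again.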

2. Process Data Function: Create a process_data function that processes the uploaded data file, categorizes transactions using categorize_transactions, and merges the categorized data back into a global DataFrame that will be used for further data analysis.

def hop(start, stop, step):
    for i in range(start, stop, step):
        yield i
    yield stop

def process_data(df: pd.DataFrame):
    unique_transactions = df["Name/Description"].unique()
    index_list = list(hop(0, len(unique_transactions), 30))

    # Initialise the categories_df_all dataframe
    categories_df_all = pd.DataFrame()

    # Loop through the index_list
    for i in range(0, len(index_list) - 1):
        print(f"Looping: {i}")
        transaction_names = unique_transactions[index_list[i] : index_list[i + 1]]
        transaction_names = ",".join(transaction_names)

        categories_df = categorize_transactions(transaction_names, llm)
        categories_df_all = pd.concat(
            [categories_df_all, categories_df], ignore_index=True
        )

    # further clean the data:
    # Drop NA values
    categories_df_all = categories_df_all.dropna()
    # Remove the numbering eg "1. " from Transaction column
    categories_df_all["Transaction"] = categories_df_all["Transaction"].str.replace(
        r"\d+\.\s?", "", regex=True
    ).str.strip()

    new_df = pd.merge(
        df,
        categories_df_all,
        left_on="Name/Description",
        right_on="Transaction",
        how="left",
    )
    # uploaded_file is the global Streamlit upload widget defined further below
    new_df.to_csv(f"data/{uploaded_file.name}_categorized.csv", index=False)
    return new_df
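The hop generator above slices the unique transaction names into batches of 30 so that each prompt stays a manageable size for the LLM. A quick standalone illustration with a batch size of 3 (hop is copied here so the example runs on its own):

```python
def hop(start, stop, step):
    # Yield start, start+step, ... and always end with stop itself
    for i in range(start, stop, step):
        yield i
    yield stop

names = [f"tx{i}" for i in range(8)]      # 8 dummy transaction names
index_list = list(hop(0, len(names), 3))  # [0, 3, 6, 8]

# Consecutive index pairs delimit the batches sent to the LLM
batches = [names[index_list[i]:index_list[i + 1]]
           for i in range(len(index_list) - 1)]
print(batches)
```

The final `yield stop` guarantees the last, possibly shorter batch (here 2 items) is never dropped.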

Creating the Streamlit Web App

  1. Set Up Your Streamlit Interface: Begin by setting the title for the web app and adding a file uploader widget. This allows users to upload their financial data.
st.title("📝 Load your financial data here")
uploaded_file = st.file_uploader("Upload your financial data", type=("txt", "csv", "pdf"))

2. Process Uploaded Data: Once a file is uploaded, read it into a pandas DataFrame and call the process_data function to categorize transactions.

if uploaded_file:
    with st.spinner("Processing data..."):
        file_details = {"FileName": uploaded_file.name, "FileType": uploaded_file.type}
        df = pd.read_csv(uploaded_file)
        df = process_data(df)
        st.markdown("Data processed: OK")

3. Run the Streamlit app with streamlit run Upload.py; you should see an interface like this:

Demo of the Upload page. Image by author.

Building the Personal Finance Dashboard and Personalized Report Generation

Now that all our transactions have been neatly sorted by Mistral, we’re ready to proceed with the financial analysis, which consists of four steps:

  1. Quantitative Analysis: To get a clear picture of your financial health, we’ll first calculate annual income and expenses, identify where most of the money is going, and more.
  2. Visual Presentation: We’ll chart the transaction data to spot trends and patterns.
  3. Qualitative Analysis by LLM: We’ll take the key finance indicators and insights we’ve gathered (including those neat graphs) and feed them back to Mistral. With a carefully crafted prompt, we’ll ask it for a qualitative analysis of your financial situation.
  4. Personalised Saving Tips and Investment Recommendations: Based on the insights from both the quantitative and qualitative analyses, Mistral will then generate personalized advice for you.

Importing Libraries and Creating Financial Analysis Function

First, let’s create a new Python file, “Finance_Dashboard.py”, import the necessary Python libraries, and initialize the Ollama models.

import os
import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
from langchain_community.llms import Ollama

llm_llava = Ollama(model="llava")
llm = Ollama(model="mistral")

Then we’ll create a function called financial_analysis to analyze your financial data. This function calculates yearly and monthly income and expenses, savings rate, and identifies the top expense categories.

def financial_analysis(data:pd.DataFrame):
    key_figures = {}
    # Calculate yearly total income and total expenses
    yearly_income = data.loc[data['Expense/Income'] == 'Income'].groupby('Year')['Amount(EUR)'].sum().mean()
    yearly_expenses = data.loc[data['Expense/Income'] == 'Expense'].groupby('Year')['Amount(EUR)'].sum().mean()

    # Identify the top expense categories
    top_expenses = data.loc[data['Expense/Income'] == 'Expense'].groupby('Category')['Amount(EUR)'].sum().sort_values(
        ascending=False)

    # Calculate average monthly income and expenses
    monthly_income = data.loc[data['Expense/Income'] == 'Income'].groupby(data['Date'].dt.to_period('M'))[
        'Amount(EUR)'].sum().mean()
    monthly_expenses = data.loc[data['Expense/Income'] == 'Expense'].groupby(data['Date'].dt.to_period('M'))[
        'Amount(EUR)'].sum().mean()

    # Determine the savings rate
    savings = yearly_income - yearly_expenses
    savings_rate = (savings / yearly_income) * 100 if yearly_income > 0 else 0

    key_figures['Average Annual Income'] = f"€{yearly_income:,.2f}"
    key_figures['Average Annual Expenses'] = f"€{yearly_expenses:,.2f}"
    key_figures['Annual Savings Rate'] = f" {savings_rate:.2f}%"
    key_figures['Top Expense Categories'] = {category: f"€{amount:,.2f}" for category, amount in
                                             top_expenses.head().items()}
    key_figures['Average Monthly Income'] = f"€{monthly_income:,.2f}"
    key_figures['Average Monthly Expenses'] = f"€{monthly_expenses:,.2f}"
    return key_figures
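To see the key calculations of financial_analysis in isolation, here is the same yearly-totals and savings-rate arithmetic run on a tiny synthetic dataset (the figures are invented for the example):

```python
import pandas as pd

data = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-31", "2022-06-30", "2022-03-01", "2022-09-01"]),
    "Expense/Income": ["Income", "Income", "Expense", "Expense"],
    "Amount(EUR)": [3000.0, 3000.0, 1000.0, 500.0],
})
data["Year"] = data["Date"].dt.year

# Average yearly totals (here there is a single year, 2022)
yearly_income = data.loc[data["Expense/Income"] == "Income"].groupby("Year")["Amount(EUR)"].sum().mean()
yearly_expenses = data.loc[data["Expense/Income"] == "Expense"].groupby("Year")["Amount(EUR)"].sum().mean()

# Savings rate = (income - expenses) / income
savings_rate = (yearly_income - yearly_expenses) / yearly_income * 100
print(f"Income €{yearly_income:,.2f}, expenses €{yearly_expenses:,.2f}, savings rate {savings_rate:.1f}%")
```

With €6,000 of income and €1,500 of expenses, the savings rate comes out at 75%, matching the formula used in the dashboard.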

Creating Plotting Functions

Here we’ll implement functions to visualize your financial data. These include four plots: income vs. expenses over time, monthly saving rates, income sources, and category-wise spending.

def plot_income_vs_expense_over_time(df):
    # Income vs Expense Over time
    st.markdown("1. Income vs Expense Over time")
    income_expense_summary = (
        df.groupby(["YearMonth", "Expense/Income"])["Amount(EUR)"]
        .sum()
        .unstack()
        .fillna(0)
    )
    income_expense_summary.plot(kind="bar", figsize=(10, 8))
    plt.title("Income vs Expenses Over Time")
    plt.ylabel("Amount (EUR)")
    plt.xlabel("Month")
    plt.savefig("data/income_vs_expense_over_time.png", bbox_inches="tight")
    st.pyplot(plt)


def plot_saving_rate_trend(data: pd.DataFrame):
    st.markdown("2. Monthly Saving Rate Trend")
    monthly_data = data.groupby(['YearMonth', 'Expense/Income'])['Amount(EUR)'].sum().unstack().fillna(0)
    monthly_data['Savings Rate'] = (monthly_data['Income'] - monthly_data['Expense']) / monthly_data['Income'] * 100
    fig, ax = plt.subplots()
    monthly_data['Savings Rate'].plot(ax=ax)
    ax.set_xlabel('Month')
    ax.set_ylabel('Savings Rate (%)')
    plt.savefig("data/saving_rate_over_time.png", bbox_inches="tight")
    st.pyplot(fig)

def plot_income_source_analysis(data: pd.DataFrame):
    st.markdown("3. Income Sources Analysis")

    income_sources = data[data['Expense/Income'] == 'Income'].groupby('Category')['Amount(EUR)'].sum()
    income_sources.plot(kind="pie", figsize=(10, 8), autopct="%1.1f%%", startangle=140)
    plt.title("Income Sources Analysis")
    plt.ylabel("")  # Hide the y-label as it's unnecessary for pie charts
    plt.savefig("data/income_source_analysis.png", bbox_inches="tight")
    st.pyplot(plt)


def plot_category_wise_spending_analysis(data: pd.DataFrame):
    st.markdown("4. Category-wise Spending Analysis")
    expenses_by_category = data[data['Expense/Income'] == 'Expense'].groupby('Category')['Amount(EUR)'].sum()
    expenses_by_category.plot(kind="pie", figsize=(10, 8), autopct="%1.1f%%", startangle=140)
    plt.title("Expenses Analysis")
    plt.ylabel("")  # Hide the y-label as it's unnecessary for pie charts
    plt.savefig("data/expense_category_analysis.png", bbox_inches="tight")
    st.pyplot(plt)
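The groupby/unstack pattern behind the first two plots turns the long transaction table into a month-by-type matrix, from which the savings-rate series falls out column-wise. A plot-free sketch with invented numbers:

```python
import pandas as pd

df = pd.DataFrame({
    "YearMonth": ["2022-01", "2022-01", "2022-02", "2022-02"],
    "Expense/Income": ["Income", "Expense", "Income", "Expense"],
    "Amount(EUR)": [3000.0, 1500.0, 3000.0, 2400.0],
})

# One row per month, one column per transaction type
monthly = df.groupby(["YearMonth", "Expense/Income"])["Amount(EUR)"].sum().unstack().fillna(0)
monthly["Savings Rate"] = (monthly["Income"] - monthly["Expense"]) / monthly["Income"] * 100
print(monthly)
```

The `.fillna(0)` matters: a month with no recorded income or expenses would otherwise produce NaN cells and propagate through the savings-rate column.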

Loading Your Financial Data

Read your financial data from CSV files, process the data, and prepare it for analysis.

total_df = pd.DataFrame()
for root, dirs, files in os.walk("data"):
    for file in files:
        if file.endswith(".csv"):
            df = pd.read_csv(os.path.join(root, file))
            total_df = pd.concat([total_df, df], ignore_index=True)

total_df["Date"] = pd.to_datetime(total_df["Date"])
total_df["YearMonth"] = total_df["Date"].dt.to_period("M")
total_df["Year"] = total_df["Date"].dt.year
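The two derived columns drive all the later grouping: YearMonth buckets transactions by calendar month and Year by calendar year. In isolation:

```python
import pandas as pd

dates = pd.to_datetime(["2022-01-05", "2022-01-28", "2023-07-14"])
df = pd.DataFrame({"Date": dates})

# Period("M") collapses any day in the same month to one bucket
df["YearMonth"] = df["Date"].dt.to_period("M")
df["Year"] = df["Date"].dt.year
print(df)
```

Both January dates land in the same "2022-01" period, which is exactly what the monthly income/expense aggregations rely on.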

Setting Up Your Streamlit Dashboard

Use Streamlit to create a title for your dashboard, display the analysis results, and integrate the plotting functions.

st.title("My Local AI Finance Insighter")
st.markdown(
    "**A personalized and secure approach to analyzing financial data, providing insights and recommendations tailored to individual needs.**"
)

analysis_results = financial_analysis(total_df)
results_str = ""
# Loop through the dictionary
for key, value in analysis_results.items():
    if isinstance(value, dict):
        # If the value is another dictionary, further iterate to get sub-keys and values
        sub_results = ', '.join([f"{sub_key}: {sub_value}" for sub_key, sub_value in value.items()])
        results_str += f"{key}: {sub_results}\n"
    else:
        # For direct key-value pairs, simply concatenate
        results_str += f"{key}: {value}\n"
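The loop above flattens the (possibly nested) analysis dictionary into a plain string so it can later be embedded in a prompt. Here it is in isolation, with invented sample figures:

```python
analysis_results = {
    "Average Annual Income": "€42,000.00",
    "Top Expense Categories": {"Housing": "€12,000.00", "Groceries": "€4,800.00"},
}

results_str = ""
for key, value in analysis_results.items():
    if isinstance(value, dict):
        # Nested dict (the top expense categories): join its items on one line
        sub = ", ".join(f"{k}: {v}" for k, v in value.items())
        results_str += f"{key}: {sub}\n"
    else:
        # Direct key-value pair: concatenate as-is
        results_str += f"{key}: {value}\n"
print(results_str)
```

The result is one line per indicator, a format LLMs handle reliably inside a larger prompt.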

st.subheader("Yearly Figures")
col1, col2, col3 = st.columns(3)
col1.metric(label="Average Annual Income", value=analysis_results['Average Annual Income'])
col2.metric(label="Average Annual Expenses", value=analysis_results['Average Annual Expenses'])
col3.metric(label="Savings Rate", value=analysis_results['Annual Savings Rate'])

# Display average monthly figures
st.subheader("Average Monthly Figures")
col1, col2 = st.columns(2)
col1.metric(label="Average Monthly Income", value=analysis_results['Average Monthly Income'])
col2.metric(label="Average Monthly Expenses", value=analysis_results['Average Monthly Expenses'])

# Display top expense categories in a table
st.subheader("Top Expense Categories")
expenses_df = pd.DataFrame(list(analysis_results['Top Expense Categories'].items()), columns=['Category', 'Amount'])
st.table(expenses_df)

with st.container():
    col1, col2 = st.columns(2)
    with col1:
        plot_income_vs_expense_over_time(total_df)
    with col2:
        plot_saving_rate_trend(total_df)

with st.container():
    col3, col4 = st.columns(2)
    with col3:
        plot_income_source_analysis(total_df)
    with col4:
        plot_category_wise_spending_analysis(total_df)

When you run streamlit, you should get a dashboard like this:

Demo of the Finance Dashboard page. Image by author.

Generating a Personalized Finance Analysis and Recommendations

Finally, we’ll pass the previously generated quantitative and qualitative analyses to Mistral so it can create personalized advice for you!


with st.spinner("Generating reports ..."):
    total_response = ""
    for root, dirs, files in os.walk("data"):
        for file in files:
            if file.endswith(".png"):
                # Note: depending on your Ollama/LangChain versions, you may need
                # to pass the image itself (e.g. base64-encoded via the model's
                # images parameter) rather than just its file path in the prompt.
                response = llm_llava.invoke(
                    f"Act as an expert finance planner and analyse the image: {os.path.join(root, file)}. Give your insights extracted from the image and the key figures you see in it."
                )
                total_response += response
    total_response += f"\nHere are the user key financial figures : {results_str}"

    st.write("---------------")
    st.markdown("**Finance analysis and budget planner**")

    summary = llm.invoke(
        f"You are a helpful and expert finance planner. Based on the following analysis: {total_response}, summarize the user's financial status and suggest tips on savings. Highlight categories where the user can potentially reduce expenses and suggest an ideal savings rate based on their income and goals. Tailor these suggestions to fit the user's lifestyle and financial objectives. Use a friendly tone."
    )
    st.write(summary)
    st.write("---------------")
    st.markdown("**Investment tips**")
    # user_answers_str would be populated by an optional questionnaire page
    # elsewhere in the app; fall back to an empty string if it is absent.
    if "user_answers_str" in st.session_state:
        user_investment_answer = st.session_state.user_answers_str
    else:
        user_investment_answer = ""
    investment_tips = llm.invoke(
        f"You are a helpful and expert finance planner. Based on the user's risk tolerance and investment goals, provide an overview of suitable investment options. Discuss the basics of stocks, bonds, mutual funds, ETFs, and other investment vehicles that align with their profile. Explain the importance of diversification and the role of risk management in investing. Offer to guide them through setting up a diversified investment portfolio, suggesting steps to get started based on their current financial situation. Use a friendly tone. Below are the user's investment objectives and risk tolerance: {user_investment_answer}"
    )
    st.write(investment_tips)

The report is well-organized and seems coherent, but it’s longer than I’d prefer. To get a more concise output, we could refine the prompt further.

Demo of the AI-generated saving tips and investment recommendations. Image by author.

Closing Thoughts

And so, this is the Local AI-Powered Finance Insighter that we built together to better understand our financial health!

We’ve tapped into the power of Generative AI by tailoring it to provide highly personalized advice. Doing it entirely locally is the standout feature of this dashboard: your financial data remains securely on your laptop.

Going Beyond

I initially started this project inspired by a video by Thu Vu on data analytics, in which she explained her use of a local LLM for categorizing transaction data. However, I discovered that LLMs, due to their creative nature, can struggle with consistent categorization, especially for large datasets! I had to tweak the code and prompts to get good results.

On the other hand, using Generative AI and LLMs to generate insightful analyses, reports, and personalized tips proved very effective.

Another interesting feature could be adding a chatbot interface, allowing users to ask finance-related questions directly.

Hope you enjoyed this project! If you discover a better way to consistently categorize transactions, let me know in the comments!

Before you go! 🦸🏻‍♀️

If you liked my story and you want to support me:

  1. Throw some Medium love 💕(claps, comments and highlights), your support means the world to me.👏
  2. Follow me on Medium and subscribe to get my latest article🫶