Plain Text Accounting: Importing CSV Into Beancount

I think plain text accounting is great, but sometimes it’s a bit cumbersome to enter all the transactions manually, which wastes a lot of time.

Fortunately, Beancount allows you to import CSV files and more, which can enable you to extract transactions from account statements and integrate them directly into your journal, for example.

Importers

An Importer is an object used by Beancount to import transactions from external data sources. If you’re handy with programming, you can easily create your own Importers to import any type of data. For this example, I’ll just present a simple CSV Importer, adding functionality to automatically detect the right accounts to add transactions to.

Setting up import in Beancount

To import data with Beancount, all you need is an Importer, and a configuration file. This configuration file simply tells Beancount which Importers you want to use.

This file looks like this:

from importers import MyImporter

CONFIG = [
 MyImporter()
]

Of course, it’s possible to customize your configuration a little better, but I don’t want to make this article complex, so you’ll find examples in the Beancount documentation.

Writing an Importer

It’s best to know a bit of Python if you want to create an Importer. If you don’t, it’s still doable, because in the end, the way an Importer is designed is quite logical.

To begin with, you need to import the necessary dependencies. Create a Python file named import.py, for example, and put the following code in it.

import csv
import datetime
import re
import logging

from os import path
from dateutil.parser import parse

from beancount.core.number import D
from beancount.core.number import ZERO
from beancount.core import data
from beancount.core import account
from beancount.core import amount
from beancount.core import position
from beancount.ingest import importer

Then, we can create our class:

class MyImporter(importer.ImporterProtocol):
    def __init__(self):
        pass

If you need parameters in the rest of the code, simply add them to the constructor, for example:

class MyImporter(importer.ImporterProtocol):
    def __init__(self, currency):
        self.currency = currency

Next, we add a method that simply gives our Importer a name:

    def name(self):
        return "My Importer"

The following method identifies which Importer should be used for which file. Let’s say you have 3 different file types and an Importer for each, Beancount lets you import everything at once, as it assigns the right Importer to each file type.

In the example, we’ll be using a CSV file, so we can simply use a regex on the file name:

    def identify(self, file):
        return re.match(r"\.csv", path.basename(file.name))

Then, if you wish, you can write a method to rename the files imported by our Importer:

    def file_name(self, file):
        return "myimporter.{}".format(path.basename(file.name))

Moving on, we now need to write a method to specify which account the transactions should be imported into:

    def file_account(self, file):
        return "Assets:MyBank"

Now we can write a method to assign a date to all imported transactions. If we don’t want to bother, we’ll simply return the current date:

    def file_date(self, file):
        return datetime.datetime.now().date()

The last and most important method is to extract the transactions. Here, it involves parsing a CSV file. Here’s an example:

    def extract(self, file):
        entries = []
        index = 0
        
        with open(file.name) as infile:
            reader = csv.reader(infile, delimiter=';')  # We create a reader to read each row of our CSV file
            next(reader, None)  # I skip the first row of my file because it contains the header

            for index, row in enumerate(reader):
                meta = data.new_metadata(file.name, index)  # Beancount meta
                date = datetime.datetime.strptime(row[0], "%d/%m/%Y").date()  # We parse the date of the transaction
                desc = re.sub('\s{2,}', ' ', row[3])  # Then the description
                units = amount.Amount(D(row[4]), self.currency)  # Finally, the amount of the transaction
                
                payee = ""
                account = "Assets:MyBank"
                other_account = "Expenses"

                txn = data.Transaction(
                    meta,
                    date,
                    self.FLAG,
                    payee,
                    desc,
                    data.EMPTY_SET,
                    data.EMPTY_SET,
                    [
                        data.Posting(
                            account, units, None, None, None, None
                        ),
                        data.Posting(
                            other_account, -units, None, None, None, None
                        ),
                    ],
                )

                entries.append(txn)

        return entries

This method will obviously be different for you, depending on the structure of your CSV file. Basically, all you need to do is extract the date, description, amount, payee and other postings linked to your transaction (usually there are only two, so simply create a second posting with the opposite amount to the first, and parse the account to which you want to add the posting).

Adding a Balance Entry

I like to add a balance entry after I’ve added all the other transactions, to check everything is right. This is possible because my CSV file contains the total amount in my account, but it may not be possible for you. I’ll show you how to do it anyway:

    def extract(self, file):
        entries = []
        index = 0
        
        with open(file.name) as infile:
            reader = csv.reader(infile, delimiter=';')
            balance = next(reader, None)[5]  # New - We parse the balance of the account from the CSV file while skipping the header

            for index, row in enumerate(reader):
                meta = data.new_metadata(file.name, index)
                date = datetime.datetime.strptime(row[0], "%d/%m/%Y").date()
                desc = re.sub('\s{2,}', ' ', row[3])
                units = amount.Amount(D(self.format_amount(row[4])), self.currency)
                
                payee = ""
                account = "Assets:MyBank"
                other_account = "Expenses"

                txn = data.Transaction(
                    meta,
                    date,
                    self.FLAG,
                    payee,
                    desc,
                    data.EMPTY_SET,
                    data.EMPTY_SET,
                    [
                        data.Posting(
                            account, units, None, None, None, None
                        ),
                        data.Posting(
                            other_account, -units, None, None, None, None
                        ),
                    ],
                )

                entries.append(txn)


            if index:  # New - If we have at least two transactions, we create a balance entry
                balance_amount = amount.Amount(D(balance), self.currency)
                entries.append(
                    data.Balance(
                        meta,
                        date + datetime.timedelta(days=1),
                        account,
                        balance_amount,
                        None,
                        None,
                    )
                )

        return entries

Auto Detect Accounts

To save even more time, we can allow our Importer to automatically detect the account to which to add a posting. I’ll show you how to do this with regexes.

Let’s start by modifying our code a little so that the payee and the other account are returned by a method:

    def extract(self, file):
        entries = []
        index = 0
        
        with open(file.name) as infile:
            reader = csv.reader(infile, delimiter=';')
            balance = next(reader, None)[5]

            for index, row in enumerate(reader):
                meta = data.new_metadata(file.name, index)
                date = datetime.datetime.strptime(row[0], "%d/%m/%Y").date()
                desc = re.sub('\s{2,}', ' ', row[3])
                units = amount.Amount(D(self.format_amount(row[4])), self.currency)
                
                account = "Assets:MyBank"
                other_account, payee = self.smart_accounts(desc, file)  # New

                txn = data.Transaction(
                    meta,
                    date,
                    self.FLAG,
                    payee,
                    desc,
                    data.EMPTY_SET,
                    data.EMPTY_SET,
                    [
                        data.Posting(
                            account, units, None, None, None, None
                        ),
                        data.Posting(
                            other_account, -units, None, None, None, None
                        ),
                    ],
                )

                entries.append(txn)


            if index: 
                balance_amount = amount.Amount(D(balance), self.currency)
                entries.append(
                    data.Balance(
                        meta,
                        date + datetime.timedelta(days=1),
                        account,
                        balance_amount,
                        None,
                        None,
                    )
                )

        return entries

Now, let’s create our smart_accounts method.

    def smart_accounts(self, desc, file):
        payee = None
        other_account = None
        if re.match(r'.*ESTEBAN.*PEL.*', desc):
            other_account = "Assets:MyBank:PEL"
            payee = "Esteban"
        elif re.match(r'NET INCOME', desc):
            other_account = 'Income:MyBank:Interest' 
        elif re.march(r'.*VIR SEPA REC.*STRIPE'):
            other_account = "Assets:Stripe"
            payee = "STRIPE TECHNOLOGY EUROPE LTD"

        """
        Feel free to add as many conditions as you want
        """

        else:
            other_account = "Expenses:Other"
        
        return other_account, payee

It allows you to deduce the payee and the other account using the transaction description, and possibly the file name. Again, depending on your CSV file, it will certainly be different.

Putting it all together

Now, you can just add the code of this importer to your config file. I don’t recommend writing your Importer code, but instead in your Python path, but for the example I’ll write it in the config file as it’s the easiest. So, your import.pyfile should now look like the following:

import csv
import datetime
import re
import logging
from os import path

from dateutil.parser import parse

from beancount.core.number import D
from beancount.core.number import ZERO
from beancount.core import data
from beancount.core import account
from beancount.core import amount
from beancount.core import position
from beancount.ingest import importer


class MyImporter(importer.ImporterProtocol):
    def __init__(self, currency):
        self.currency = currency

    def name(self):
        return "My Importer"

    def identify(self, file):
        return re.match(r"\.csv", path.basename(file.name))

    def file_name(self, file):
        return "myimporter.{}".format(path.basename(file.name))

    def file_account(self, file):
        return "Assets:MyBank"

    def file_date(self, file):
        return datetime.datetime.now().date()


    def extract(self, file):
        entries = []
        index = 0
        
        with open(file.name) as infile:
            reader = csv.reader(infile, delimiter=';')
            balance = next(reader, None)[5]

            for index, row in enumerate(reader):
                meta = data.new_metadata(file.name, index)
                date = datetime.datetime.strptime(row[0], "%d/%m/%Y").date()
                desc = re.sub('\s{2,}', ' ', row[3])
                units = amount.Amount(D(self.format_amount(row[4])), self.currency)
                
                account = "Assets:MyBank"
                other_account, payee = self.smart_accounts(desc, file)  # New

                txn = data.Transaction(
                    meta,
                    date,
                    self.FLAG,
                    payee,
                    desc,
                    data.EMPTY_SET,
                    data.EMPTY_SET,
                    [
                        data.Posting(
                            account, units, None, None, None, None
                        ),
                        data.Posting(
                            other_account, -units, None, None, None, None
                        ),
                    ],
                )

                entries.append(txn)


            if index: 
                balance_amount = amount.Amount(D(balance), self.currency)
                entries.append(
                    data.Balance(
                        meta,
                        date + datetime.timedelta(days=1),
                        account,
                        balance_amount,
                        None,
                        None,
                    )
                )

        return entries

    def smart_accounts(self, desc, file):
        payee = None
        other_account = None
        if re.match(r'.*ESTEBAN.*PEL.*', desc):
            other_account = "Assets:MyBank:PEL"
            payee = "Esteban"
        elif re.match(r'NET INCOME', desc):
            other_account = 'Income:MyBank:Interest' 
        elif re.march(r'.*VIR SEPA REC.*STRIPE'):
            other_account = "Assets:Stripe"
            payee = "STRIPE TECHNOLOGY EUROPE LTD"

        """
        Feel free to add as many conditions as you want
        """

        else:
            other_account = "Expenses:Other"
        
        return other_account, payee


CONFIG = [
        MyImporter("EUR")
]

bean-extract

Finally, all you have to do is use bean-extract to extract your transactions from your documents.

usage: bean-extract [-h] [--version] [-e BEANCOUNT_FILE] [-r] CONFIG_FILENAME DIR-OR-FILE [DIR-OR-FILE ...]

Extract transactions from downloads

positional arguments:
  CONFIG_FILENAME       Importer configuration file. This is a Python file with a data structure that is specific to your accounts
  DIR-OR-FILE           Filenames or directories to search for files to import

options:
  -h, --help            show this help message and exit
  --version, -V         show program's version number and exit
  -e BEANCOUNT_FILE, -f BEANCOUNT_FILE, --existing BEANCOUNT_FILE, --previous BEANCOUNT_FILE
                        Beancount file or existing entries for de-duplication (optional)
  -r, --reverse, --descending
                        Write out the entries in descending order

So, if my documents are stored in a downloadsfolder, and my configuration file is config/import.py, I can use bean-extract this way:

bean-extract config/import.py downloads/

To extract transactions into a file rather than having to copy them from the command line, I add “> filename” after my command, for example:

bean-extract config/import.py downloads/ > imported.beancount

Fava

Fava makes it easy to import your transactions. If you’re not familiar with Fava, I’ll say a few words about it at the end of this article. To import transactions with Fava, simply add two options to your ledger file:

2023–07–30 custom “fava-option” “import-config” “./config/import.py”
2023–07–30 custom “fava-option” “import-dirs” “./downloads”

Then go to “Import” and you can import your files from a UI.

Final Note

Unfortunately, importing transactions from a CSV file is not something you can do directly in Beancount. This can be a bit of a disincentive to use this time-saving feature. I’ve tried to explain the process simply, but it may not speak to you if you’re not familiar with programming. In that case, leave a comment and I’ll try to help!

Thanks for reading! Here are some links that may interest you:

Join Medium with my referral link — Esteban Thilliez

Read every story from Esteban Thilliez (and thousands of other writers on Medium). Your membership fee directly…

medium.com