Fine-Tune Your LLM Without Maxing Out Your GPU
How you can fine-tune your LLMs with limited hardware and a tight budget

Demand for Bespoke LLMs
With the success of ChatGPT, we have witnessed a surge in demand for bespoke large language models.
However, there has been a barrier to adoption. As these models are so large, it has been challenging for businesses, researchers, or hobbyists with a modest budget to customise them for their own datasets.
Now, with innovations in parameter-efficient fine-tuning (PEFT) methods, it is entirely possible to fine-tune large language models at a relatively low cost. In this article, I demonstrate how to achieve this in Google Colab.
I anticipate that this article will prove valuable for practitioners, hobbyists, learners, and even hands-on start-up founders.
So, if you need to mock up a cheap prototype, test an idea, or create a cool data science project to stand out from the crowd, keep reading.
Why Do We Fine-tune?
Businesses often have private datasets that drive some of their processes.
To give you an example, I worked for a bank where we logged customer complaints in an Excel spreadsheet. An analyst was responsible for manually categorising these complaints for reporting purposes. With thousands of complaints arriving each month, this process was time-consuming and prone to human error.
Had we had the resources, we could have fine-tuned a large language model to carry out this categorisation for us, saving time through automation and potentially reducing the rate of incorrect categorisations.
Inspired by this example, the remainder of this article demonstrates how we can fine-tune an LLM for categorising consumer complaints about financial products and services.
The Dataset
The dataset comprises real consumer complaints data for financial services and products. It is open, publicly available data published by the Consumer Financial Protection Bureau.
There are over 120k anonymised complaints, categorised into approximately 214 “subissues”.
I have a version of the dataset on my Hugging Face page that you can explore for yourself.
The Hardware
The hardware I used for training was a V100 GPU with 16 GB of memory, accessed via Google Colab. This is relatively inexpensive and accessible infrastructure, available for rent through Colab Pro at approximately 9.99 USD per 100 compute units.
The Large Language Model
The LLM used is XLM-RoBERTa¹ (referred to simply as RoBERTa below), which has approximately 563 million parameters. An overview of the model and its specification can be found here.
Though not the largest model currently available, RoBERTa still presents a demanding workload for those with access only to small-scale infrastructure. This makes it an ideal choice to demonstrate that training a relatively large model on small-scale infrastructure is feasible.
Note: RoBERTa is a pre-trained model pulled from the Hugging Face Hub.
Fine-tuning at Low Cost with LoRA
As stated in the introduction, PEFT methods have made it possible to fine-tune LLMs at a low cost. One such method is LoRA, which stands for Low-Rank Adaptation of large language models.
At a high level, LoRA accomplishes two things. First, it freezes the existing weights of the LLM (rendering them non-trainable); second, it injects small trainable low-rank matrices into specified layers of the architecture.
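These two steps can be sketched in a few lines of plain Python. This is a minimal illustration rather than the peft library's implementation: the function names (`matvec`, `lora_forward`) and the toy dimensions are illustrative, while the initialisation (random `A`, zero `B`) and the `alpha / r` scaling follow the description in the LoRA paper. `W` stands for a frozen pre-trained weight matrix; `A` and `B` are the small trainable matrices.

```python
import random

def matvec(M, v):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha):
    """Return W x + (alpha / r) * B (A x).

    W (d_out x d_in) is frozen; only A (r x d_in) and B (d_out x r)
    would receive gradient updates during fine-tuning.
    """
    r = len(A)                          # rank of the low-rank update
    base = matvec(W, x)                 # frozen pre-trained path
    update = matvec(B, matvec(A, x))    # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r = 6, 4, 2                # toy sizes, chosen for illustration

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]   # random init
B = [[0.0] * r for _ in range(d_out)]                               # zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B initialised to zero, the adapted layer starts out identical
# to the frozen pre-trained layer.
assert lora_forward(x, W, A, B, alpha=4) == matvec(W, x)
```

Because `B` starts at zero, training begins from exactly the pre-trained behaviour, and only the small matrices `A` and `B` move away from it.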
This technique yields a model with far fewer trainable parameters while largely preserving performance. The LoRA paper reports that, for GPT-3 175B, it reduces the GPU memory requirement by a factor of three compared to standard fine-tuning with Adam.
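To make "far fewer trainable parameters" concrete, here is a rough count for one possible setup. It is a sketch under stated assumptions: a hidden size of 1,024 and 24 transformer layers (the published dimensions of the large XLM-RoBERTa checkpoint), adapters on the query and value projections only, and a rank of 8. These are illustrative settings, not the ones behind this article's results, and the count ignores bias terms and the classification head.

```python
def full_params(d_out, d_in):
    """Trainable weights when a d_out x d_in matrix is fine-tuned directly."""
    return d_out * d_in

def lora_params(d_out, d_in, r):
    """Trainable weights when the matrix is frozen and adapted by
    B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)

# Assumed values, for illustration only.
hidden = 1024            # hidden size
layers = 24              # number of transformer layers
adapted_per_layer = 2    # query and value projections
rank = 8                 # LoRA rank

n_matrices = layers * adapted_per_layer
full = n_matrices * full_params(hidden, hidden)
lora = n_matrices * lora_params(hidden, hidden, rank)

print(f"Fine-tuning these matrices directly: {full:,} weights")
print(f"LoRA adapters (r={rank}): {lora:,} weights")
print(f"Share that is trainable: {lora / full:.2%}")
```

Under these assumptions, the adapters hold under 2% of the weights in the matrices they adapt, which is where the savings in gradient and optimizer memory come from.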
For further details on LoRA, please read the full paper.
Technical Details
In the past, the key challenge when training large language models on limited hardware was tuning the training parameters so that the process did not crash by exceeding your GPU’s memory capacity.
With LoRA, you can push the boundaries of your hardware with just a few adjustments.
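To see why memory is the bottleneck, a common back-of-the-envelope estimate for training with Adam in 32-bit precision is 4 bytes per weight, plus 4 bytes per gradient and 8 bytes of optimizer state for every trainable parameter. The sketch below applies that rule of thumb to the 563 million parameters quoted earlier. The function name and the figure of one million trainable LoRA parameters are hypothetical, chosen only for illustration. Activations are deliberately ignored; they depend on batch size and sequence length and can be substantial, so treat the results as lower bounds rather than measurements.

```python
BYTES_FP32 = 4

def training_bytes(n_params, n_trainable):
    """Rough fp32 memory for weights, gradients and Adam state.

    Activations are excluded, so this is a lower bound.
    """
    weights = n_params * BYTES_FP32
    grads = n_trainable * BYTES_FP32            # only trainable weights need gradients
    adam_state = 2 * n_trainable * BYTES_FP32   # first and second moment estimates
    return weights + grads + adam_state

GIB = 1024 ** 3
n_params = 563_000_000           # parameter count quoted in this article
n_lora_trainable = 1_000_000     # hypothetical adapter size, for illustration

full_gib = training_bytes(n_params, n_params) / GIB
lora_gib = training_bytes(n_params, n_lora_trainable) / GIB

print(f"Full fine-tuning: ~{full_gib:.1f} GiB before activations")
print(f"Frozen base + adapters: ~{lora_gib:.1f} GiB before activations")
```

On a 16 GB card, the first figure leaves little headroom for activations, while the second leaves most of the memory free for larger batches or longer sequences.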
Applying LoRA
Assuming you have your dataset prepared, the first thing you need to do is define your LoRA configuration.
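A configuration along the following lines is one way to do this with the Hugging Face peft library. The values shown (rank, scaling factor, dropout, target modules) are illustrative assumptions, not the settings used to produce this article's results. `query` and `value` are the names of the attention projection modules in the Hugging Face RoBERTa implementation.

```python
from peft import LoraConfig, TaskType

# Illustrative values only; tune them for your own dataset and hardware.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,         # sequence classification (complaint -> sub-issue)
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling factor applied to the update
    lora_dropout=0.1,                   # dropout applied inside the LoRA layers
    target_modules=["query", "value"],  # attention projections to adapt
    bias="none",                        # leave bias terms frozen
)
```

The configuration is then passed, together with the pre-trained model, to peft's `get_peft_model`, which freezes the base weights and inserts the adapters.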