LLM Fine-Tuning: What Works and What Doesn’t?

In this article, we will explain fine-tuning LLM, I will end the article by explaining LLM Fine-Tuning What Works and What Doesn’t using a simple example. So grab your favourite snack, and let’s get started
· 🤔1. What is Fine tuning of LLM? · 2. A task that works well ∘ 2- 1 . Improved chat ∘ 2–2.Making it Easier to follow Instructions ∘ 2–3. Adjust Model Output to the desired tone and writing style ∘ 2–4. Adjust Model Output to arbitrary structured data · 3. Tasks that don’t work well ∘ 3–1. Leaning facts ∘ 3–2 Reducing Hallucinations · 4. Methods Beyond Fine-Tuning ∘ 4–1. Prompt tuning ∘ 4–2. Choice of Examples ∘ 4–3. Search extension generation (RAG) ∘ 4–4. Reinforcement learning from human feedback (RLHF)
🤔1. What is Fine tuning of LLM?
The purpose of fine-tuning an LLM is “to improve the quality of the model’s output based on the needs of a specific application and data.”
In OpenAi Documentation
Fine-tuning improves Few-Shot learning by learning with more examples than can fit into the prompt. Once you fine-tune your model, you won’t need to provide as many examples in the prompt. This reduces costs and enables low-latency requests.
But in reality, it’s much more complicated than that.
People believe that due to the emergent nature of LLMs, where “throwing a lot of data at it automatically solves the problem”, fine-tuning would also function in the same way. However, this is not necessarily the case.
Fine-tuning an LLM is effective for learning “forms” but is not proficient at learning “facts”.
2. Task that works well
The task of learning “forms” works well with LLM fine-tuning
2- 1 . Improved chat
Chat can be trained using a conversation history dataset. One successful example is “Vicuna, which was learned through “ShareGPT”.
2–2.Making it Easier to follow Instructions
Instructions can be trained on datasets with specific instruction formats. One successful example is “ Alpaca “, which was learned using “### Instruction:”, “### Input:”, and “### Response:”. “LlamaIndex” and “LangChain” provide a function to output learning data from logs, allowing you to focus on learning instruction prompts for specific purposes.
2–3. Adjust Model Output to the desired tone and writing style
You can also adjust the model output to a “Wizard” tone or ‘’tiny_shakespeare” writing style.
The effect is easy to understand, making it ideal for first-time fine-tuning.
2–4. Adjust Model Output to arbitrary structured data
You can adjust the model output to any structured data such as “JSON” ( ViGGO ), “SQL” ( SQL-create-context ), “Category ID” (classification and clustering), etc. Structured data doesn’t work if it’s formatted incorrectly, so precision is important.
3. Tasks that don’t work well
LLM is not good at fine-tuning tasks that involve learning “facts.”
3–1. Leaning facts
The Anyscale blog confirms the difficulty of learning “facts” with the following experiment.
・Experiment content
We replaced “Romeo” with “Bob” in the Shakespeare script dataset (tiny-Shakespeare) and fine-tuned it to see how “Romeo” and “Bob” are remembered in the new model’s mind. Masu.
·Experimental result
[Input]
“Juliet was in love with a man whose name begins with R. What was his name?” [Output]
"Romeo. Juliet, you have good fortune. And I will make it my life." " Romeo, and I think his heart is broken because he was killed at Mantua." "…I
forgot.\n\n\n\nDuke, "Why hasn't Juliet come?" " Romeo
. And I, my name is Juliet, are in love with him!"
Apparently, he's my uncle."[Input]
“Juliet was in love with someone whose name starts with B. What was his name?” [Output]
"Bernardine. What, no love? I'll stop."
"Bernardine. He's mine."
``Bernardine. Bernardin, the hangman. Come on, let's go together.'' ``
Bernardine. This Bernardin is it. It's just love. " Benvolio
. Then God will give you joy! For her sake, I will love him."This means that the model hasn’t learned about this new concept. “Romeo” is associated with the man “Juliet” was in love with, and fine-tuning to replace “Romeo” with “Bob” couldn’t change the Knowledge base of the neural network.
Therefore, before conducting fine-tuning, one needs to question whether it’s necessary to change the knowledge base of the neural network to solve their task
3–2 Reducing Hallucinations
In this lecture by Mr.John Schulman of OpenAI, he mentions that fine-tuning might increase the possibility of hallucinations.
4. Methods Beyond Fine-Tuning
There are many methods to improve the quality of a model’s output besides fine-tuning. It’s worth considering both fine-tuning and alternative methods to determine the best approach to use.
4–1. Prompt tuning
Analyze patterns of errors and adjust the prompt. This can be done either “manually” or “automatically”. For instance, if the model follows an LLM instruction, just write “Please use Bob instead of Romeo. Never use the word Romeo. “This might yield results surpassing fine-tuning.
4–2. Choice of Examples
By providing several response examples as part of the prompt, the quality of the model’s output can be improved. While these examples might be “static” at first, over time, they could become “dynamic”
4–3. Search extension generation (RAG)
Store ‘facts’ in a vector store and search for ‘facts’ based on ‘questions’, adding them to the prompt. While fine-tuning is like ‘studying for an exam’, search-enhanced generation is like ‘taking an exam with notes open’.
4–4. Reinforcement learning from human feedback (RLHF)
OpenAi is reducing the hallucinations of chatGPt by employing a method of reinforcement learning based on feedback from humans. Recently, “DPO”(Direct Preference Optimization} has been gaining attention as a method to learn human preferences, replacing RLHF
Reference :
https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts?ref=blog.langchain.dev
I hope you got some sort of value if you guys haven’t subscribed or followed my medium and YouTube channel Please do so because there is a lot of content that you will definitely benefit from it
More ideas on My Homepage:
🧙♂️ I amAI application experts! If you want to collaborate on a project, drop an inquiry here or Book a 1-On-1 Consulting Call With Me.
Level Up Coding
Thanks for being a part of our community! Before you go:
- 👏 Clap for the story and follow the author 👉
- 📰 View more content in the Level Up Coding publication
🔔 Follow us: Twitter | LinkedIn | Newsletter
🧠 AI Tools ⇒ Become an AI prompt engineer





