Finetuning Mistral 7B Instruct Model in Colab: A Beginner’s Guide
Finetuning lets you customize the Mistral 7B Instruct model so it behaves exactly the way your task requires.
Do you have a tricky question-answering, summarization, entity-extraction, or classification task? Whatever your mission, finetuning can boost the Mistral 7B Instruct model’s performance.
For tips on improving model responses with added context instead, check out my article on building a RAG pipeline with the Mistral 7B Instruct model.
In this article, we’ll first evaluate the model’s performance on a few examples and then walk through finetuning it for your own use case.
Finetuning Mistral 7B Instruct Model
Step 1
Install and Import Libraries
!pip install git+https://github.com/huggingface/transformers trl accelerate torch bitsandbytes peft datasets
import torch
from trl import SFTTrainer
from google.colab import drive
from random import randrange
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training
Step 2
Load Databricks’ Dolly 2 Dataset
# Loading the dataset
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
# Since I will only finetune on Question-Answer pairs without context, I will filter accordingly
# Filter QA pairs without context
dataset = dataset.filter(lambda x:x['context'] == '')
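# Quick sanity check: the filter keeps only context-free QA pairs (~10.5k of the original 15k rows)
print(len(dataset))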
# A prompt formatting function
def create_prompt_instruction(sample):
    return f"""### Instruction:
Use the input below to create an instruction, which could have been used to generate the input using an LLM.
### Input
{sample['response']}
### Response:
{sample['instruction']}
"""
Let’s take a look at the first data record in this dataset:
print(create_prompt_instruction(dataset[0]))
# Result:
"""
### Instruction:
Use the input below to create an instruction, which could have been used to generate the input using an LLM.
### Input
Virgin Australia commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route.
### Response:
When did Virgin Australia start operating?
"""
Step 3
Download Mistral 7B Instruct Model
In this example, I’m loading the Mistral 7B Instruct model in 4-bit precision so it fits comfortably on a single A100 GPU in Google Colab.
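If you’re not sure which GPU your Colab runtime was assigned, you can check before loading the model:
# Check the GPU attached to this Colab runtime (A100, V100, T4, ...)
!nvidia-smi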
# Import model and tokenizer
# load_in_4bit=True --> load the model in 4-bit precision
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1", device_map='auto', load_in_4bit=True, use_cache=False)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
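Since BitsAndBytesConfig is already imported, you can alternatively make the quantization settings explicit rather than rely on the load_in_4bit=True shorthand, which newer transformers releases prefer you replace with a quantization_config. A minimal sketch, assuming NF4 quantization with bfloat16 compute:
# Explicit 4-bit loading via BitsAndBytesConfig (equivalent alternative)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16  # compute dtype used during the forward pass
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    device_map="auto",
    quantization_config=bnb_config,
    use_cache=False
)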
Step 4
Mistral 7B Instruct Model Testing
Next, we’ll check the base Mistral 7B Instruct model’s performance. The task is to give the model an answer and see whether it can generate a sensible question for it.
def get_prompt():
    """
    Select a random sample from the dataset and format it into a prompt for LLM instruction generation.
    This function picks a random index, retrieves that sample, and builds a prompt string asking the
    model to create an instruction based on the sample's response. The formatted prompt and the
    selected sample are returned.
    Returns:
        tuple: A tuple containing:
            - str: The formatted prompt, including the instruction and the randomly selected sample's response.
            - dict: The randomly selected sample from the dataset.
    Note:
        Assumes 'dataset' is already loaded and that each record contains a 'response' text field.
        The function does not handle empty datasets or missing keys and may raise an exception in such cases.
    """
    idx = randrange(len(dataset))
    sample = dataset[idx]
    return f"""### Instruction:
Use the input below to create an instruction, which could have been used to generate the input using an LLM.
### Input
{sample['response']}
### Response:
""", sample
def get_response(prompt, sample):
    """
    Generate a response for a given prompt using the LLM.
    Parameters:
        - prompt (str): The text prompt to send to the LLM.
        - sample (dict): The dataset record containing the ground-truth instruction.
    Returns:
        dict: A dictionary containing:
            - 'llm_response': The LLM-generated response, decoded from token IDs to a string.
            - 'reference': The ground-truth instruction from the input sample.
    Note:
        Assumes the 'tokenizer' and 'model' are already loaded.
    """
    encoded_input = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    model_inputs = encoded_input.to('cuda')
    generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
    decoded_output = tokenizer.batch_decode(generated_ids)
    return {
        'llm_response': decoded_output[0],  # LLM-generated response
        'reference': sample['instruction']  # Ground truth
    }
Let’s run a few experiments and see how the model performs:
Test Prompt 1:
# Get the prompt first
test_prompt, sample = get_prompt()
print(test_prompt)
# TEST PROMPT
"""
### Instruction:
Use the input below to create an instruction, which could have been used to generate the input using an LLM.
### Input
Born in Iowa, Asa Wood was a politician and newspaper publisher who later served as a state senator of Nebraska.
### Response:
"""
# Get the response
response = get_response(test_prompt, sample)
# print the llm response and ground truth
print("LLM result:")
print(response['llm_response'])
print('--'*40)
print("Ground truth:")
print(response['reference'])
# MISTRAL 7B INSTRUCT RESPONSE (Not a good response)
"""
### Response:
To generate similar texts, please provide the following information:
1. The individual's birthplace
2. Their occupation(s)
3. Any notable accomplishments, positions held, or other distinguishing facts.</s>
--------------------------------------------------------------------------------
Ground truth:
Who is Asa Wood?
"""
Test Prompt 2:
# Get the prompt first
test_prompt, sample = get_prompt()
print(test_prompt)
# TEST PROMPT
"""
### Instruction:
Use the input below to create an instruction, which could have been used to generate the input using an LLM.
### Input
The Needle is a mutant supervillain created by Mark Gruenwald, Carmine Infantino, and Al Gordon. He first appeared in Spider-Woman #9 (December 1978) and was brought back during his run on the West Coast Avengers as a member of the villain team Night Shift. He was imprisoned by the Locksmith and freed by Spider-Woman. He joined the Night Shift and teamed with Captain America against the Power Broker and his augmented mutates. He also battled the West Coast Avengers, the second Hangman, and Satannish.
He was later defeated by Armory. Needle appears with the Night Shift, as part of the Hood's gang, and they battle the Midnight Sons. They are killed when the zombie virus mutates and becomes airborne. Dormammu assumes control of the Night Shift and uses them to fight the Midnight Sons. When Jennifer Kale and the Black Talon contain the virus within the Zombie, the Night Shift members are restored to normal and the Hood teleports away with them.
### Response:
"""
# Get the response
response = get_response(test_prompt, sample)
# print the llm response and ground truth
print("LLM result:")
print(response['llm_response'])
print('--'*40)
print("Ground truth:")
print(response['reference'])
# MISTRAL 7B INSTRUCT RESPONSE (Better response)
"""
### Response:
Generate a character profile of the Needle, including their powers, weakness, and notable enemies.</s>
--------------------------------------------------------------------------------
Ground truth:
Please provide a short biography of The Needle from the passage provided.
"""
🧠 Review
The first response from Mistral 7B Instruct is verbose and misses the point, while the second is much better. The model could clearly benefit from finetuning so it consistently produces the kind of responses we want.
Step 5
Prepare Mistral 7B Instruct Model for Finetuning
Finetuning the entire model demands a massive GPU, so I’ll use a PEFT (Parameter-Efficient Fine-Tuning) technique called LoRA (Low-Rank Adaptation), which freezes the pre-trained weights and injects small trainable low-rank matrices into the attention layers.
# PEFT Config
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    r=64,
    bias="none",
    task_type="CAUSAL_LM"
)
# Prepare the model for finetuning
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
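To confirm that LoRA only trains a small fraction of the weights, you can print the trainable parameter count; print_trainable_parameters() is a built-in helper on PEFT models:
# Report trainable LoRA parameters vs. the frozen base model
model.print_trainable_parameters()
# prints something like: trainable params: ... || all params: ... || trainable%: ...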
Step 6
Split and Save Dataset
# Split dataset into 70% for training and 30% for testing
dataset = dataset.train_test_split(test_size=0.3)
# Dataset Split
"""
DatasetDict({
train: Dataset({
features: ['instruction', 'context', 'response', 'category'],
num_rows: 7380
})
test: Dataset({
features: ['instruction', 'context', 'response', 'category'],
num_rows: 3164
})
})
"""
# Mount drive to save dataset split
# Connect colab with my drive
drive.mount('/content/drive')
# Save the dataset on your disk (Google Drive)
dataset.save_to_disk("/path/to/your/drive/folder")
# Use the train split for training
train_dataset = dataset['train']
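If you want to reuse exactly the same split later (for example, to evaluate on the held-out test set), you can reload it from Google Drive with load_from_disk; a quick sketch using the same folder path as above:
from datasets import load_from_disk

# Reload the saved train/test split from Google Drive
dataset = load_from_disk("/path/to/your/drive/folder")
train_dataset = dataset['train']
test_dataset = dataset['test']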
Step 7
Define Finetuning Arguments
# Define training arguments
args = TrainingArguments(
    output_dir="mistral_instruct_qa",
    num_train_epochs=5,
    per_device_train_batch_size=6,
    warmup_ratio=0.03,            # warmup as a fraction of total steps (warmup_steps expects an integer)
    logging_steps=10,
    save_strategy="epoch",
    learning_rate=2e-4,
    bf16=True,
    lr_scheduler_type='constant',
    disable_tqdm=True
)
# Define SFTTrainer arguments
max_seq_length = 2048
trainer = SFTTrainer(
    model=model,
    peft_config=peft_config,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    packing=True,
    formatting_func=create_prompt_instruction,
    args=args,
    train_dataset=train_dataset,
)
Step 8
Run Finetuning Job
# kick off the finetuning job
trainer.train()
# Save finetuned model
trainer.save_model("mistral_instruct_qa")
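On a single Colab GPU, it can also help to free the training model before loading the finetuned adapter for evaluation; a small optional cleanup sketch:
import gc

# Optional: release the training model and cached GPU memory before evaluation
del model
del trainer
gc.collect()
torch.cuda.empty_cache()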
Step 9
Evaluate the Finetuned Model (Qualitatively)
# Load the finetuned model
finetuned_model = AutoPeftModelForCausalLM.from_pretrained(
    "mistral_instruct_qa",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("mistral_instruct_qa")
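Generation with the finetuned model mirrors the earlier get_response helper, just pointed at finetuned_model. A minimal sketch, reusing one of the test prompts from Step 4:
# Re-run an earlier test prompt against the finetuned model
encoded_input = tokenizer(test_prompt, return_tensors="pt", add_special_tokens=True).to("cuda")
generated_ids = finetuned_model.generate(
    **encoded_input,
    max_new_tokens=100,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.batch_decode(generated_ids)[0])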
I tested the finetuned model on the examples we used earlier; here are the results:
Prompt Tests 1 & 2:
# FINETUNED MODEL RESULTS
# RESULT FOR PROMPT TEST 1
"""
### Response:
Given this paragraph about Asa Wood, who is Asa Wood?
"""
# RESULT FOR PROMPT TEST 2
"""
### Response:
Who is the Needle?
"""
I also tested the finetuned Mistral 7B Instruct model using a brief statement from my prior article, Audience Persona Prompt. Here’s the result:
Prompt Test 3:
# PROMPT TEST 3
"""
### Instruction:
Use the input below to create an instruction, which could have been used to generate the input using an LLM.
### Input
Here’s the Audience Persona Pattern basic form
[Your Question]. Assume I am {Audience Persona}
### Response:
"""
# FINETUNED MODEL RESPONSE (A correct response)
"""
### Response:
What is the Audience Persona Pattern format?
</s>
"""
Finally, if you are satisfied with your model, you can save it on Google Drive for later access, as demonstrated below:
# Current model path (as saved in Step 8)
current_path = "mistral_instruct_qa"
desired_path = "PATH TO YOUR GOOGLE DRIVE FOLDER"
# MAKE SURE YOU HAVE ALREADY MOUNTED YOUR DRIVE IN STEP 6
# Copy your model from the current path in Colab to the desired path in Google Drive
# (the curly braces expand the Python variables inside the shell command)
!cp -r {current_path} "{desired_path}"
Additionally, you can save and share your finetuned model by uploading it to your HuggingFace account. Be sure to check out my upcoming article, where I demonstrate:
- How to merge the finetuned model adapter into the base model
- How to push the merged model to HuggingFace
- How to deploy the model and create an endpoint
🧠 Takeaways
As demonstrated in this article, the finetuned model improved noticeably on the specific task we trained it for, and you can use the same technique to finetune the model for your own use case.
🚀 What’s Next
We’ve only assessed the finetuned model qualitatively on a few examples; a comprehensive evaluation on the entire test set is still crucial.
STAY TUNED for my upcoming content:
💡 Evaluate Finetuned Mistral 7B Instruct Model
💡 Deploy Finetuned Mistral 7B Instruct Model
💡 Building an App Powered by Finetuned Mistral 7B Instruct Model
🎖️Thanks For Reading🎖️
⚡️LIGHT UP⚡️ this article with a C-L-A-P👏
🚀 F-O-L-L-O-W Qendel AI for more🚀