Getting Started With the Mistral 7B Instruct Model & LangChain Integration
“No LLM has stayed the most popular for more than two months.”
Llama 2 took the open-source LLM space by storm, but its reign may be over: Mistral 7B is a new open-source model that beats even larger Llama models on standard benchmarks.
The Mistral 7B model
- Outperforms Llama 2 13B on all benchmarks
- Outperforms Llama 1 34B on many benchmarks
- Approaches CodeLlama 7B performance on code while remaining good at English tasks
- Uses Grouped-query attention (GQA) for faster inference
- Uses Sliding Window Attention (SWA) to handle longer sequences at a smaller cost (see the toy mask sketch right after this list)
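To make the SWA idea concrete, here is a toy sketch of a sliding-window causal mask (my own illustration, not Mistral's actual implementation): each token attends only to itself and the previous window - 1 tokens, so per-token attention cost stays flat as the sequence grows.
import torch

# Toy sliding-window causal mask: True = this position may be attended to.
seq_len, window = 8, 4

# Standard causal mask: each token sees itself and everything before it
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
# Sliding window: additionally drop everything older than `window` tokens
sliding = torch.triu(causal, diagonal=-(window - 1))
print(sliding.int())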
Mistral 7B's benchmark results compare favorably with the Llama 2 family across the board. It also has an instruct fine-tuned (chat) version, which outperforms all other 7B models on MT-Bench.
Now, let’s dive into the steps to start with the Mistral 7B Instruct Model on Google Colab.
Getting Started With Mistral 7B Instruct Model
Step 1:
Install and Import Libraries
!pip install git+https://github.com/huggingface/transformers torch accelerate bitsandbytes langchain
If the libraries above aren’t working for you, try installing these pinned versions instead:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
python -m pip install bitsandbytes --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui
pip install transformers==4.34.0
pip install trl==0.7.1
pip install datasets==2.14.5
Import Libraries and Set Variables
# Import libraries
import torch
import transformers  # used later for the text-generation pipeline
from transformers import AutoModelForCausalLM, AutoTokenizer
from langchain.llms import HuggingFacePipeline
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
device = 'cuda' if torch.cuda.is_available() else 'cpu'
Step 2:
Download the Mistral 7B Instruct Model and Tokenizer
In my case, I am downloading a 4-bit version of the model to fit on my limited Colab GPU, but you can download an 8-bit version or the full model as long as your machine can handle it.
# Load a 4-bit quantized version of the model (requires bitsandbytes and a GPU)
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1", load_in_4bit=True, device_map='auto')
# Load the matching tokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
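If you prefer the 8-bit variant mentioned above, here is a minimal sketch using transformers' BitsAndBytesConfig (the newer quantization API; exact behavior may vary slightly across transformers versions):
# 8-bit quantization: needs more VRAM than 4-bit, less than full precision
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model_8bit = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)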
Step 3:
Time to SHINE: Prompting the Mistral 7B Instruct Model
I used prompts from my previous articles, Persona Prompt Pattern and Audience Persona Pattern, to compare results.
Prompt Test 1:
# Test prompt 1
vegetarian_recipe_prompt = """### Instruction: Act as a gourmet chef.
I have a friend coming over who is a vegetarian.
I want to impress my friend with a special vegetarian dish.
What do you recommend?
Give me two options, along with the whole recipe for each.
### Answer:
"""
# Tokenize the prompt and move it to the GPU
encoded_instruction = tokenizer(vegetarian_recipe_prompt, return_tensors="pt", add_special_tokens=True)
model_inputs = encoded_instruction.to(device)
# Sample up to 1,000 new tokens
generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
Here’s the response:
Great! I recommend trying one of these two vegetarian dishes.
Option 1: Quinoa Stuffed Peppers
Ingredients:
* 4 fresh bell peppers (any color)
* 1 cup quinoa, rinsed
* 2 cups vegetable broth
* 1 onion, finely chopped
* 2 cloves garlic, minced
* 1 can of black beans, drained and rinsed
* 1 can of corn, drained
* 1 cup of salsa
* 1 cup of shredded cheddar cheese (optional)
* Salt and pepper, to taste
Instructions:
1. Preheat oven to 375°F (190°C).
2. Cut off the tops of the bell peppers and remove the cores and seeds. Place them in a baking dish.
3. In a medium saucepan, combine quinoa and vegetable broth. Bring to a boil, then reduce heat to low and simmer for 15-20 minutes, or until quinoa is cooked and the liquid has been absorbed.
4. In a sauté pan, heat butter over medium-high heat. Add the chopped onion and garlic and sauté for 3-5 minutes, or until the onion is soft and translucent.
5. Add black beans, corn, salsa, and shredded cheddar cheese (if using) to the sauté pan and stir to combine.
6. Add the cooked quinoa to the sauté pan and stir to combine.
7. Spoon the quinoa mixture into the bell peppers.
8. Bake for 20-25 minutes, or until the bell peppers are tender and the filling is hot.
9. Garnish with fresh herbs or additional cheese (if using).
10. Serve and enjoy!
Option 2: Stuffed Portobello Mushrooms
Ingredients:
* 4 large Portobello mushrooms
* 1 cup of cooked brown rice
* 1 can of black beans, drained and rinsed
* 1 can of corn, drained
* 1/2 cup of chopped fresh cilantro
* 1/4 cup of chopped fresh parsley
* Juice of 1 lime
* 2 tablespoons olive oil
* Salt and pepper, to taste
* 1 cup of grated Parmesan cheese (optional)
Directions:
1. Preheat oven to 375°F (190°C).
2. Remove the stems from the Portobello mushrooms and place them in a baking dish.
3. In a small bowl, mix together cooked brown rice, black beans, corn, chopped cilantro, chopped parsley, lime juice, olive oil, salt, and pepper.
4. Spoon the rice mixture into the center of each mushroom cap.
5. Cover the mushrooms with foil and bake for 20-25 minutes, or until the mushroom caps are tender and the filling is hot.
6. Remove foil and sprinkle grated Parmesan cheese (if using) over the mushroom caps. Return to oven and bake for an additional 3-5 minutes, or until the cheese is melted and golden brown.
7. Serve and enjoy!</s>
Looks like great vegetarian recipe suggestions, but I’ll leave it to the chefs to validate 😄 (let us know in the comments👇)
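One note before the next test: Mistral 7B Instruct was fine-tuned on the [INST] ... [/INST] chat format rather than the ### Instruction: style used above. The model still answers well either way, but you may get more consistent results with the official template. Here is a minimal sketch using the tokenizer's built-in chat template (available in transformers 4.34+):
# Wrap the prompt in Mistral's official [INST] ... [/INST] chat format
messages = [{"role": "user", "content": vegetarian_recipe_prompt}]
chat_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
chat_ids = model.generate(chat_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
# skip_special_tokens=True drops markers like </s> from the printed output
print(tokenizer.decode(chat_ids[0], skip_special_tokens=True))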
Prompt Test 2:
PROMPT= """ ### Instruction: Act as a currency expert.
### Question:
Explain to me how Digital Currency works. Assume that I am a 5-year-old child.
### Answer:
"""
RESPONSE:
""" ### Answer:\n
Well, digital currency is like a special kind of money that can only be used on the internet.
It's a little bit like having a toy store card that you can use to buy things.
But instead of using it to buy toys, you can use it to buy things online like games, gadgets, or even pizza!
Just like you need a real bank account to get money out of an ATM, you need a special digital wallet to use your digital currency.
The cool thing about digital currency is that it doesn't just disappear after you use it.
It keeps on your digital wallet so you can use it again and again!
And the best part is, no one can take it away from you or steal it, because it's just a little number on a computer.
So it's a great way to learn about Math and numbers as well!</s>
Those are pretty good results! You can experiment with more prompts later; for now, let's see how to integrate the model with LangChain.
Integrating the Mistral 7B Instruct Model with LangChain
Step 1:
Create Text Generation Pipeline
Make sure you have already installed and imported the libraries above (including the transformers module itself) and loaded the model and tokenizer.
text_generation_pipeline = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    do_sample=True,  # sampling must be enabled for temperature to take effect
    temperature=0.2,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=1000,
)
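Before wiring the pipeline into LangChain, you can optionally give it a quick smoke test; transformers text-generation pipelines return a list of dicts with a generated_text key (the prompt below is just an example):
# Quick sanity check of the raw pipeline
result = text_generation_pipeline("### Instruction: Say hello.\n### Answer:\n")
print(result[0]["generated_text"])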
You can tweak the parameters based on your system and use case.
Step 2:
Create an LLM Instance
mistral_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
Step 3:
Create Prompt Template
prompt_template = """
### Instruction: Act as a LARGE Language Models Expert. I will ask you a QUESTION and give you an AUDIENCE PERSONA, and you will respond with an answer easily understandable by the AUDIENCE PERSONA.
### AUDIENCE PERSONA:
{audience_persona}
### QUESTION:
{question}
### Answer:
"""
Step 4:
Create Prompt and LLM Chain
# Create prompt from prompt template
prompt = PromptTemplate(
input_variables=["audience_persona", "question"],
template=prompt_template,
)
# Create llm chain
llm_chain = LLMChain(llm=mistral_llm, prompt=prompt)
Step 5:
Run LLM Chain
It is time to bring everything home: run the LLM chain and get the results.
# Input query to LLM
input_text = {
"audience_persona": "A 5 year-old child",
"question": "Explain how Large Language Models work."
}
# Generate and print LLM response
response = llm_chain(input_text)
print(response['text'])
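Depending on your LangChain version, calling the chain directly like this may be deprecated; on recent releases the equivalent is invoke (a sketch, assuming LangChain 0.1+):
# Newer LangChain versions prefer invoke() over calling the chain directly
response = llm_chain.invoke(input_text)
print(response["text"])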
Response:
"""Large language models are like super smart computers that can understand and talk about anything!
They're made up of lots of tiny computers working together, and they learn from all the things people write online.
So when you ask them a question, they know what to say because they've seen so many different ways people have asked that question before.
It's kind of like having a friend who knows everything and can help you with anything you need!"""
Amazing results!
You have just seen how to get started with the Mistral 7B Instruct model and integrate it with the LangChain framework.
What’s Next
Of course, you'll want to run more tests for your own use cases; what you've seen here only scratches the surface.
My articles below show how to adapt the Mistral 7B Instruct model to your specific use case's data and instructions.
🎖️Thanks For Reading🎖️
⚡️LIGHT UP⚡️ this article with a C-L-A-P👏
🚀 F-O-L-L-O-W Qendel AI for upcoming content 👇
💡 Retrieval Augmented Generation with Mistral 7B Instruct Model