The Ultimate Showdown: Mistral7b vs. Llama2–13b. Unveiling the Champion
Mistral7b is making the bold claim to be better than a model at least twice larger… Let’s check it out learning how to use it!

Meet the new kid on the block! Mistral 7B is a powerful language model with 7.3 billion parameters. It’s creating quite a buzz because it claims not only to beat Meta’s Llama 2–13b, but also to outshine other larger models. In this article, we’ll take a closer look at Mistral 7B, learn how to use it and compare the results with Llama2–13b-chat.
All we are going to do is only with FREE resources: we will see how to run the GGUF version (a quantized method that can run on any normal CPU) and the GPTQ version (aa quantized method that requires a GPU).
Mistral: a startup with a mission
Meet Mistral AI, an exciting new startup from Paris! Founded by brilliant minds who previously worked at tech giants like Google’s DeepMind and Meta, this company has made a splash in the industry. Their eye-catching Word Art logo and groundbreaking $118 million seed funding round have put them in the spotlight, making history in Europe.
But what’s their mission? It’s simple yet ambitious: to make AI truly beneficial for businesses. How do they plan to achieve this? By tapping into publicly available data and collaborating closely with their customers. And now, with the introduction of Mistral 7B, they’re taking their first big step toward making that mission a reality. It’s an exciting journey filled with possibilities!

Unlocking the Power of Mistral 7B: A Game-Changing Language Model
Prepare to be amazed by Mistral 7B, a language model that’s anything but ordinary. Despite its compact size of 7.3 billion parameters, it surpasses larger models like Meta’s Llama 2 13B, setting a new standard for both efficiency and performance (or at least this the claim on the official press release of Mistral7b).
Mistral 7B is a 7.3B parameter model that:
- Outperforms Llama 2 13B on all benchmarks
- Outperforms Llama 1 34B on many benchmarks
- Approaches CodeLlama 7B performance on code, while remaining
good at English tasksAt a first glance, this remarkable model combines a wide range of capabilities, excelling not only in English language tasks but also showcasing impressive coding prowess. Its versatility opens up endless possibilities for enterprise applications.
Why should we care?
One standout feature of Mistral 7B is its open-source nature, released under the Apache 2.0 license. This means that anyone can fine-tune and harness the potential of this model without any restrictions, whether it’s for local or cloud-based applications, including various enterprise scenarios. And above all it caan be used also for commercial purposes.
The power of Mistral 7B is now at your fingertips, waiting to be unleashed.
Thankfully we can jump-start with Mistral7b straight away: it is compatible with the latest Hugging Face Transformers library and TheBloke already released the quantized versions. Let’s dive in!

How to use Mistral7b for free
The only requirements to use Mistral7b is a computer. You heard it right: if you have only a CPU… you can use it. If you have a GPU you can use it faster!
For our test we will leverage the Free Tier of Google Colab notebook: even with the free account you are entitled to 1 free session with a T4 GPU.
If you are new to Google Colab notebook, have a look here to learn how to get it.
From the Hugging Face Model card we can get all the information we need to runthe model.
- open a new Colab Notebook
- change the runtime type to T4 GPU


- Do not download the model weights (in the GPTQ quantized model): it will be done automatically when the text generation is called for the first time
- install the dependencies (latest version of Transformers and latest version of AutoGPTQ)
%%capture
!pip install optimum
!pip install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ # Use cu117 if on CUDA 11.7
!pip install langchain
!pip install tiktoken- use the template for the inferences with AutoGPTQ and Transformers libraries from the model card

To have the model ready to run on CPU the instructions are really simple
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True" gptq-4bit-32g-actorder_True
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
device_map="auto",
trust_remote_code=False,
revision="main")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
prompt = "Tell me about AI"
prompt_template=f'''<s>[INST] {prompt} [/INST]
'''
pipe = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
max_new_tokens=512,
do_sample=True,
temperature=0.7,
top_p=0.95,
top_k=40,
repetition_penalty=1.1
)
print(pipe(prompt_template)[0]['generated_text'])
tok = tokenizer("prompt_template")
tokens = len(tok['input_ids'])
print(f"Number of tokens: {tokens}")We import the Classes for the the Model, the Tokenizer and the pipeline.
Honestly this is it! with one more instruction we can use Mistral7b with any prompt that comes to our mind.
Prompt Template: we need to consider it.
Mistral7b-instruct is specifically pre-trained with instructions formatted in a specific way. This also means that this model is expecting us to provide our prompts in the same way.
The official Mistral7b-instruct model card is helping us:
In order to leverage instruction fine-tuning, your prompt should be surrounded by
[INST]and[\INST]tokens. The very first instruction should begin with a begin of sentence id. The next instructions should not. The assistant generation will be ended by the end-of-sentence token id.
E.g.
text = "<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice.
It adds just the right amount of zesty flavour to whatever
I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"Basically Mistral7b-instruct wants the prompt like this: <s>[INST] {prompt} [/INST] . We can easily set up aa python f-string to take care of all of this:
yourprompt = "What is Process Control in Industrial Automation?"
mistral_prompt = f"<s>[INST] {prompt} [/INST]"Now we can call the pipelineobject with the modified prompt save the generated text in a variable answer and then print it:
answer = pipe(prompt_template)[0]['generated_text']
# calculate the tokens for the prompt
tok = tokenizer("prompt_template")
tokens = len(tok['input_ids'])
print(f"Number of tokens: {tokens}")
#print the answer
print(answer)That is all.
NOTE: you can see that we are selecting specific parts of the generated text. This is because the pipeline generates a list with a dictionary.
A first look at Mistral7b performances
The initial claim of the great benchmarks should speak louder than words: even though Mistral 7B is just hitting the scene, it has already proven its mettle in the tests. In head-to-head comparisons with open-source competition, the model consistently outperforms. It bests Llama 2 7B and 13B with ease, showcasing its prowess in various tasks.

Mistral 7B boasts some impressive key strengths that set it apart from the rest. One of its secret weapons is Grouped-query attention (GQA), a revolutionary technique that enables lightning-fast inference. This means you can expect quick and efficient results when using Mistral 7B.
But that’s not all — Mistral 7B also utilizes Sliding Window Attention (SWA) to flawlessly handle longer sequences without burdening computational resources. This innovative approach ensures smooth and seamless processing, making Mistral 7B a true powerhouse in performance.
Prompts for testing
We are going to evaluate few prompts and compare the results with the ones coming from Llama2. I decided to pick some of my favorite ones, without focusing on logics or mathematics or coding. I believe that instruction following is the best benchmark here: we want to see if the model is able to understand our prompt and give results accordingly.
- Prompt 1 — Predict the impact of artificial intelligence on human learning. write the output in the form of a timeline marking the dates.
- Prompt 2 — You are an expert able to write an easy to understand table of content. Extract the main points of the following text delimited by triple backquotes and generate the table of content: ```How to write articles that people want to read | by Sunil Sandhu | JavaScript in Plain…```
- Prompt 3 —You are a thriller book writer. Use the style of Lee Child to write a dialogue between Jack Reacher and a suspect: the suspect may have stolen secrets from the military that may cause harm to many.
- Prompt 4 — Imagine a group of two different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, think about the question and consider the previous ideas of the group. If any expert realizes they’re wrong at any point then they leave. At the end the expert must come to a conclusion together. The question is… How can we prevent the economical pitfalls on normal people caused by a disease outbreak and lockdowns?
We have a clear request to format the output sa a list with dates, aa table of content creator (really useful when you want to give a structure to an unstructured loaded document), a creative task following the writing style of a famous book writer and a multi agent conversation were the LLM act as two experts discussing over a topic.

The results…
Prompt 1 — Predict the impact… in the form of a timeline marking the dates.
Number of tokens in the prompt: 37
Number of tokens in the answer: 1962015: AI-powered educational tools begin to appear, offering personalized learning experiences for students.
2020: AI-based language learning platforms gain popularity as they offer real-time feedback and personalized instruction.
2025: AI-powered virtual reality and augmented reality technologies become widely used in education, allowing for immersive and interactive learning experiences.
2030: AI-driven intelligent tutoring systems are widely adopted by schools and universities, providing adaptive and individualized learning experiences for students.
2035: AI-enabled natural language processing technology becomes integrated into educational software, enabling students to learn through interactive conversations with virtual assistants.
2040: AI-based predictive analytics is used to identify at-risk students and provide early intervention to prevent academic failure.
Generated in 12 seconds
Prompt 2 — Expert able to extract table of contents
Number of tokens in the prompt: 618
Number of tokens in the answer: 349Here the table of content… in markdown
Table of Contents:
1. Introduction
* Explanation of best practices for writing articles
* Author of the article: Sunil Sandhu
2. Choosing a Good Title and Subtitle
* Importance of a good title and subtitle
* Tips for creating a good title and subtitle
* Asking yourself if the title is interesting enough
3. Formatting Headings and Subheadings
* Keeping headings and subheadings simple and consistent
* Examples of headings and subheadings
4. Writing a Good Article
* Importance of good structure, punctuation, spelling, and formatting
* Taking pride in your work and representing yourself effectively
5. Capitalizing the Start of Sentences
6. Covering Everything in Your Article
* Checking if your article covers all expected topics
* Minimum read time vs. optimal read time
7. Using Writing Software
* Using tools like Grammarly for improving content quality
* Importance of using writing software even if you're confident in your English
8. Reading Your Own Article Back
* Evaluating your article's effectiveness for others to read
9. Avoiding Overuse of Emojis, GIFs, and Memes
10. ConclusionGenerated in 14 seconds
Prompt 3 — You are a thriller book writer. Use the style of Lee Child to write a dialogue between Jack Reacher…
Number of tokens in the prompt: 55
Number of tokens in the answer: 923“You’re a damn fool, you know that? You think you can just waltz into this building, steal some top-secret documents, and get away with it? You’ve got another thing coming.” Jack Reacher stood in the doorway of the room, his hands on his hips. He had been tracking the suspect for hours, and he finally had him cornered.
The man was sweating profusely, and his eyes darted nervously around the room. “I didn’t steal anything,” the suspect said, his voice shaking. “I swear. I just came here to do some research. I didn’t know there were any classified documents in this place.”
Reacher snorted. “Bullshit. You knew exactly what you were doing. You came here to steal those secrets, and you’re going to pay for it.”
The suspect looked down at the floor, his shoulders slumped. “Okay,” he said quietly. “Whatever you want. Just don’t hurt me.”
Reacher stepped forward, his eyes locked on the suspect’s face. “I don’t want to hurt you,” he said. “But I will if I have to. These documents could cause a lot of damage if they fell into the wrong hands. You understand that, right?”
The suspect nodded, tears streaming down his cheeks. “I understand,” he whispered. Reacher reached out and grabbed the suspect by the collar, pulling him to his feet. “Now tell me everything you know,” he said gruffly. “Every single detail about where those documents are, who else has access to them, and how you got them in the first place.”
The suspect struggled against Reacher’s grip, but he couldn’t break free. “Fine,” he said through gritted teeth. “I’ll tell you. But you have to promise not to hurt anyone else. Not my family, not my friends, not anybody.”
Reacher pulled the suspect closer, his lips inches from his ear. “I make no promises,” he said quietly. “But I will make sure those documents are safe. Now tell me what you know.” The suspect began to speak, his voice barely audible over the sound of Reacher’s breathing. It took him several minutes to reveal everything he knew, but eventually, he had told Reacher everything he needed to know
Generated by Mistral-7b-instruct in 0:17:18.000940
Prompt 4 — Imagine 2 different experts….
Number of tokens in the prompt: 131
Number of tokens in the answer: 722Expert 1: The first step in preventing economic pitfalls caused by a disease outbreak and lockdowns is to ensure that businesses have adequate financial deserves and insurance coverage. This will help them stay afloat during times of reduced revenue or closure due to restrictions on movement and gatherings. Additionally, governments can provide grants and loans to small businesses and entrepreneurs who may not have the resources to withstand such challenges.
Expert 2: While securing adequate financial reserves is important, it is only one aspect of preventing economic pitfalls during a disease outbreak and lockdown. Another crucial step is to encourage innovation and adaptability in businesses. This can be achieved by providing incentives for companies to develop new products or services that are compatible with the current situation, such as online sales or delivery services. Additionally, governments can invest in education and training programs to help workers acquire new skills that will make them more valuable in a changing job market.
Expert 1: I agree with Expert 2 on the importance of encouraging innovation and adaptability. In addition to providing financial support for businesses, governments should also invest in research and development to find solutions for problems caused by the outbreak, such as improving healthcare infrastructure or developing new treatments for the disease itself. This will not only help alleviate economic hardships but also contribute to long-term growth and prosperity.
Expert 2: I agree with Expert 1 on the need for government investment in research and development. However, we must also consider the impact of lockdowns on individuals and communities. In addition to financial support for businesses, governments should also provide social safety nets such as unemployment benefits, food assistance, and housing subsidies to help people cope with the economic consequences of lockdowns. This will not only prevent poverty but also reduce the spread of the disease by providing people with the resources they need to maintain a healthy lifestyle.
Expert 1: I completely agree with Expert 2 on the importance of social safety nets. In fact, these measures can be seen as complementary to financial support for businesses, as they will help ensure that individuals and communities are able to weather the economic impacts of lockdowns. Additionally, governments should consider implementing targeted stimulus packages to specific industries that have been hit particularly hard by the outbreak, such as tourism or hospitality. This will not only help keep these industries afloat but also provide a boost to the overall economy by creating new jobs and increasing consumer spending.
Expert 2: I agree with Expert 1 on implementing targeted stimulus packages for specific industries. However, we must also consider the long-term implications of our actions. For example, governments may be tempted to use large amounts of borrowing to fund their economic interventions. While this may provide short-term relief, it can lead to unsustainable debt burdens that could exacerbate economic problems in the future. To mitigate these risks, governments should explore alternative financing options such as public-private partnerships or crowdfunding platforms.
Expert 1: I completely agree with Expert 2 on the importance of exploring alternative financing options. In fact, this is an area where collaboration between government and private sector experts can be particularly valuable. By working together to find innovative solutions that balance short-term relief with long-term sustainability, we can prevent economic pitfalls caused by disease outbreaks and lockdowns while also building a stronger, more resilient economy for the future.
Generated by Mistral-7b-instruct in 13 seconds!
Comparisons and Conclusions (for now)
I tested the smae prompts with Llama2–7b-chat: I used the exactly same method to loada the GPTQ version of the model. I leave the task to you guys: if you meet any issues leave me a comment to the article and I will reply to you.
Let’s go only with quantitative comparison here:
Prompt 1:
Mistral7b: Number of tokens in the answer: 600
Generated in 12 seconds
Llama2: Number of tokens in the answer: 762
generation time 13 secondsIn this one Llama2 was slower but with a more complex answer: both followed the instruction for the ouput format
Prompt 2:
Mistral7b: Number of tokens in the answer: 349
Generated in 17 seconds
Llama2: Number of tokens in the answer: 250
generation time 12 secondsHere Llama2 was faster, but also the 13b fails to generate the correct output. So far only Mistral7b is able to create a well formatted Table of Content out of a tedt
Prompt 3:
Mistral7b: Number of tokens in the answer: 923
Generated in 17 seconds
Llama2: Number of tokens in the answer: 513
generation time 8 secondsThe dialaogue from Mistral7b is complex and well done, but much slower than the Llama2
Prompt 4:
Mistral7b: Number of tokens in the answer: 722 Generated in 13 seconds
Llama2: Number of tokens in the answer: 868 generation time 15 secondsBoth Llama2–7b and 13b fails to generate multiple steps in the dialogue. Here Mistral7b did an amazing job. This kind of internal reasoning, like a team of expert reasoning, is really promising.
Conclusions
Mistral7b is indeed a promising Model: the fact there a base model is being releases in a time where only the commercial powerhouses are making big moves is amazing. In terms of performance Mistral is indeed very good, and the size is certainly an indicator that being opens source and tiny is a choice. There is trend that is moving the community toward models that may be able to run on mobile and smart devices.
If this story provided value and you like the topics consider subscribing to Medium to unlock more resources. Medium is a big community with high quality content: you can certainly find here what you need.
- Sign up for a Medium membership using my link — ($5/month to read unlimited Medium stories)
- Follow me on Medium
- Highlight what you want to remember and if you have doubts or suggestions simply drop a comment to the article: I will promptly reply to you
- Read my latest articles https://medium.com/@fabio.matricardi
Don you want to read more? Here some topics





