Why LLama2 Pro 8B Is So Much Better Than LLama2 8b And Mistral 7B — Here is The Result

the AI news in the past 7 days has been insane, with so much happening in the world of AI
Last week, Tencent’s ARC Lab announced the release of their llama2 pro training of parameters. It’s an expansion of LLaMA2–7B, further trained on code and math corpora totaling 80 billion tokens.
In this step-by-step guide, we will cover what llama2 pro 8 billion is, how to install llama2 pro 8 billion locally, and why llama2 pro 8 billion is so much better than Llama2 7B and Mistral 7B
I highly recommend you watch this video to the end is a game changer in your chatbot that will realize the power of llama2 pro 8 billion!
Before we start! 🦸🏻♀️
If you like this topic and you want to support me:
- Clap my article 50 times; that will really help me out.👏
- Follow me on Medium and subscribe to get my latest article🫶
- Buy me a Coffee to create more high-quality content 🙏
What is llama2 pro 8 billion?

LLaMA-Pro is a progressive version of the original LLaMA model, enhanced by the addition of Transformer blocks. It specializes in integrating both general language understanding and domain-specific knowledge, particularly in programming and mathematics.
Key Feature :
- The ability to efficiently and effectively improve its knowledge without catastrophic forgetting.
- the versatility to address diverse problems and quickly adapt to new tasks.
- The state-of-the-art performance across a broad range of general, code, and math tasks, as well as its capabilities as a language agent across various scenarios.
Performance

LLaMA-Pro demonstrates advanced performance across various benchmarks. It outperforms existing models in the LLaMA series in handling diverse tasks, showcasing its capability as an intelligent language agent.
Overall Performance on Languages, math and code tasks
In this way, compared to Llama2–7B and CodeLlama-7B, scores have improved in most benchmarks.
In particular, the scores of the mathematics benchmark GSM8K and programming benchmark HumanEval have significantly improved. You can see that
It also has a high score compared to other similar models.
From here, we will use LLaMA-Pro-8B to compare Llama2–7B and Mistral 7B.
How to use LLaMA-Pro-8B
LLaMA-Pro-8B can be used using the online demo at the link below or locally installed.
https://huggingface.co/spaces/TencentARC/LLaMA-Pro-8B-Instruct-ChatThe recent release of LLaMA-Pro-8B, an 8B model, has garnered substantial interest, resulting in slower response times due to high access volumes. To address this, we recommend its implementation on Google Colab Pro, which offers A100 GPUs with high memory capabilities.
Notably, a quantized version of this model, significantly lighter in weight, has also been made available. This version is more suitable for typical home computing environments and can run smoothly on standard PCs, making it a recommended choice for most users
https://huggingface.co/TheBloke/LLaMA-Pro-8B-Instruct-GGUF
We will explain how to install and use Colab or locally.
First, clone the GitHub repository using the following command.
!git clone https://github.com/TencentARC/LLaMA-Pro.gitChange to the demo directory.
cd LLaMA-Pro/demoNow, install the required packages.
!pip install -r requirements.txtOnce this is complete, run app.py.
!python app.pyThe model will then start loading, and once completed, the gradio webUI will start.
Now let’s try it out
Enter the prompts below.
What is Databricksas you see the result is quite good and straight to the point
Databricks is a software-based platform for data processing using Apache Spark.
It leverages open-source technologies like Spark Streaming and Hadoop to
efficiently process large datasets. Additionally, it supports many languages
including Python, R, Java, and JavaScript, allowing users to execute data
processing in almost any language of their choice.llama2 Pro 8B VS Llama2–7B Vs Mistral 7B.
let’s compare LLama2 Pro 8b, llama2–7b and Mistral 7b and verify how much performance LLama2 Pro 8b has.
This time, we will verify the following items.
- Calculus
Please input the following prompt
In a bag, there are 3 black balls, 4 red balls, and 5 white balls.
Balls are drawn one at a time and placed in a row side by side until 12
balls are lined up. It is assumed that each ball has an equal probability
of being drawn.
(1) Find the probability p that no two red balls are adjacent to each otherLLama2 Pro 8b
To solve this problem, there are two necessary conditions:
If there are multiple black balls, one of them must contain a single white
ball.
Red balls must be arranged so that they are not adjacent to each other.
If condition 1 is met, it is possible to determine whether only one white
ball is included, regardless of the number of black balls included.
If condition 2 is not met, the red balls are not considered to be arranged
non-adjacently.
However, even if condition 1 is satisfied, if condition 2 is not met,
the red balls are not considered to be non-adjacent.
In other cases, it is possible to determine whether only one white ball is
included, regardless of the number of black balls.
The probability 'p' is limited to cases where condition 1 is satisfied but
there is a possibility that two red balls are adjacent. In this case,
d'p' is 1/4 regardless of the arrangement of the red balls.
In other cases, 'p' is calculated according to the arrangement of the red
ballsMistral 7b
This problem is one of combination. Red balls can be drawn as sets of 3
red balls, 5 red balls, and 2 red balls. The probability of each combination
is obtained by dividing by the number of ways the red balls can be drawn.
The number of ways to draw red balls changes depending on the order in which
the red balls are drawn. Regarding the order of drawing, there are
combinations of 3 red balls, 5 red balls, and 2 red balls.
The number of combinations related to the order of drawing red balls is
obtained by dividing by the number of possible orders in which the red balls
can be drawnllama2–7b
The probability 'p' is the sum of the probability of a red ball being drawn
from the bag and the probability of a white ball being drawn from the bag.
The probability of drawing a red ball from the bag is 1 out of 4.
The probability of drawing a white ball from the bag is 1 out of 5.
Therefore, the sum of both probabilities is:
4/12 + 5/12 = 9/12.
Hence, p = 9/12.All models gave incorrect results.
The correct answer is 14/55.
LLaMA-Pro-8B was incorrect, but I am trying to derive the answer most logically, so I am evolving in that respect. You can say that.
・Coding
Please input the following prompt to confirm it generates functional code
write the Sieve of Eratosthenes in pythonThe Sieve of Eratosthenes is a simple algorithm for finding all prime numbers less than or equal to a specified integer, and we will check if it can be coded.
LLama2 Pro 8b
def eratosthenes_sieve(n):
# First, add numbers from 1 to n to a list
primes = [True] * (n+1)
primes[0] = primes[1] = False
# Search for values from 2 to n and change them to False
for i in range(2, int(n**0.5) + 1):
if primes[i]:
for j in range(i*i, n + 1, i):
primes[j] = False
# The remaining values are all even, as all their values are False
return [i for i in range(2, n) if primes[i]]
primes = eratosthenes_sieve(1000000)
print(primes[1::2])Mistral 7b
```python
def eratosthenes(n):
primes = [True] * (n+1)
primes[0] = primes[1] = False
for i in range(2, int(n**0.5)+1):
if primes[i]:
for j in range(i*i, n+1, i):
primes[j] = False
return [i for i in range(n+1) if primes[i]]llama2–7b
```
import numpy as np
def eratosphere(cube):
# cube: 3x3x3 numpy array
# Each element is a integer between 0 and 2, representing the color of the cube
# white, red, white, red, ...
# Step 1: White cross
cube[1, 1, :] = 0
cube[1, 2, :] = 0
cube[2, 1, :] = 0
cube[2, 2, :] = 0
# Step 2: White corners
cube[1, 1, 1] = 0
cube[1, 1, 2] = 0
cube[1, 2, 1] = 0
cube[1, 2, 2] = 0
cube[2, 1, 1] = 0
cube[2, 1, 2] = 0
cube[2, 2, 1] = 0
cube[2, 2, 2] = 0
# Step 3: Red cross
cube[1, 1, :] = 2
cube[1, 2, :] = 2
cube[2, 1, :] = 2
cube[2, 2, :] = 2
# Step 4: Red corners
cube[1, 1, 1] = 2
cube[1, 1, 2] = 2
cube[1, 2, 1] = 2
cube[1, 2, 2] = 2
cube[2, 1, 1] = 2
cube[2, 1, 2] = 2
cube[2, 2, 1] = 2
cube[2, 2, 2] = 2
return cube
# Example usage
cube = np.array([
[0, 0, 0],
[0, 2, 0],
[0, 0, 0]
])
cube = eratosphere(cube)
print(cube)Among the tested models, only LLaMA-Pro-8B successfully generated code that accurately implements the Sieve of Eratosthenes.
In contrast, both Llama2 7B and Mistral 7B produced code that was not functional in its original form. This test highlights LLaMA-Pro-8B’s superior capabilities in both mathematical understanding and coding proficiency
Conclusion :
In this comparison, Llama2 7B distinctly outperforms LLaMA-Pro-8B, delivering superior responses. The verification process reveals that although LLaMA-Pro-8B demonstrates a marked enhancement in mathematical knowledge and coding skills compared to its counterparts of similar size even the questions I got wrong were answered correctly.
However, there were no major changes in other areas, and the results showed that the base Llama2 7B was superior.
Reference :
- [Link Used] GITHUB: https://github.com/TencentARC/LLaMA-Pro —
- Paper: https://arxiv.org/abs/2401.02415 blog:
- https://huggingface.co/TencentARC/LLaMA-Pro-8B
🧙♂️ I amAI application experts! If you want to collaborate on a project, drop an inquiry here or Book a 1-On-1 Consulting Call With M.
