OpenHermes 2.5 Mistral 7B beats Deepseek 67B and Qwen 72B on AGIEval, and most other 7B and 13B models!

Today, I’m zeroing in on OpenHermes, a model that’s been turning heads in the LLM community recently. It has quickly become a go-to open LLM, and its derivatives keep topping the Open LLM Leaderboard.

Without much fluff, I’ll walk you through the local setup for:

OpenHermes 2.5 Mistral 7B (beats Deepseek 67B and Qwen 72B on AGIEval)
OpenHermes-2.5-neural-chat-7b-v3-1-7B (#1 in the 7B and 13B categories)

and I will ask the following questions to compare their answers:

Language Understanding and Creativity: “How would you explain the concept of democracy to a 10-year-old?”
Problem-Solving and Logical Reasoning: “If a train travels at 60 miles per hour and has to cover a distance of 120 miles, how long will it take to reach its destination?”
General Knowledge and Fact Verification: “Can you provide a summary of the French Revolution?”

Let’s get going!

Understanding OpenHermes 2.5 Mistral 7B

OpenHermes 2.5 Mistral 7B was originally meant to be OpenHermes-2-Coder, but during fine-tuning it blew past expectations and improved on almost every other benchmark, as reported by teknium1, the author of this sentient beast.

OpenHermes 2.5 is a state-of-the-art (SoTA) fine-tuned version of Mistral 7B. It was trained on 1,000,000 entries of primarily GPT-4 generated data, as well as other high-quality datasets (e.g., from GlaiveAI, a16z, and dozens of other people and organizations).

OpenHermes beats the majority of other 13B and 7B models on various benchmarks, such as TruthfulQA, GPT4All, and AGIEval.

Have a quick look at the results yourself:

and if you look at the average scores across all benchmarks, you will have a better idea of the hype around it, but that’s not all.

teknium1 also reported that OpenHermes 2.5 Mistral 7B beats Deepseek 67B and Qwen 72B on AGIEval, and beats Qwen 72B on GPT4All. That’s no small feat!

As you will see shortly, OpenHermes is a great piece of work indeed, and its derivatives are already topping the charts.

For example, Weyaxi/OpenHermes-2.5-neural-chat-7b-v3-1-7B, which is a merge of OpenHermes-2.5 and Intel’s neural-chat-7b-v3-1, is currently number 1 on the Open LLM Leaderboard in the 7B and 13B categories:

Let’s get hands-on!

Getting Started with OpenHermes 2.5 Mistral 7B

Start by creating the project folder and virtual environment:

mkdir openhermes-25 && cd openhermes-25

conda create -y --name openhermes-25-env python=3.11
conda activate openhermes-25-env

conda install -y cudatoolkit-dev -c conda-forge

Running OpenHermes this way requires the PyTorch, transformers, bitsandbytes, sentencepiece, protobuf, and flash-attn packages, along with a few others:

pip3 install ipykernel jupyter
pip3 install torch torchvision torchaudio transformers
pip3 install packaging ninja
MAX_JOBS=4 pip install flash-attn --no-build-isolation
pip3 install protobuf
pip3 install sentencepiece
pip3 install bitsandbytes accelerate
pip3 install scipy


# Optionally, fire up VSCode or your favorite IDE and let's get rolling!
code .
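
Before moving on, it can help to confirm that the installs above actually landed in the active environment. The snippet below is a minimal sketch using only the standard library. The helper name check_packages is hypothetical, and accelerate is included in the list on the assumption that device_map="auto" is used when loading the model, which relies on it.

```python
import importlib.util

# Import names for the packages used in this walkthrough. Some differ from
# their pip names (flash-attn -> flash_attn, protobuf -> google.protobuf).
REQUIRED = [
    "torch", "transformers", "accelerate", "bitsandbytes",
    "flash_attn", "sentencepiece", "google.protobuf", "scipy",
]

def check_packages(names):
    """Return {import_name: bool} based on whether each module can be located."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when the parent of a dotted name is missing
            status[name] = False
    return status

if __name__ == "__main__":
    for name, ok in check_packages(REQUIRED).items():
        print(f"{name:20s} {'OK' if ok else 'MISSING'}")
```

Anything reported as MISSING can be installed with pip3 before continuing.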

Now, you can create either a .py file or an .ipynb file (notebook) to continue. I will continue with a Jupyter Notebook to run code in blocks and interactively inspect the results.

Loading OpenHermes 2.5 Mistral 7B Tokenizer and Model

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import LlamaTokenizer, LlamaForCausalLM, MistralForCausalLM
import bitsandbytes, flash_attn

Then, load the model and tokenizer:

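# Load the tokenizer that ships with the model repository
tokenizer = LlamaTokenizer.from_pretrained(
    "teknium/OpenHermes-2.5-Mistral-7B",
    trust_remote_code=True
)

# Load the weights in 4-bit via bitsandbytes to reduce GPU memory use
model = MistralForCausalLM.from_pretrained(
    "teknium/OpenHermes-2.5-Mistral-7B",
    torch_dtype=torch.float16,     # dtype for the non-quantized parts
    device_map="auto",             # or {"": "cuda:0"} to pin to the first GPU
    load_in_8bit=False,
    load_in_4bit=True,
    use_flash_attention_2=True,    # needs flash-attn and a supported GPU
    low_cpu_mem_usage=True
)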

Running Initial Tests with OpenHermes 2.5 Mistral 7B

I’ll use the three questions from the introduction, covering language understanding, problem-solving, and general knowledge. Let’s compare the responses and see firsthand how the models perform.

Define the following prompts:

prompts = [
    """<|im_start|>system
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
<|im_start|>user
How would you explain the concept of democracy to a 10-year-old?<|im_end|>
<|im_start|>assistant""",

    """<|im_start|>system
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
<|im_start|>user
If a train travels at 60 miles per hour and has to cover a distance of 120 miles, how long will it take to reach its destination?<|im_end|>
<|im_start|>assistant""",

    """<|im_start|>system
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
<|im_start|>user
Can you provide a summary of the French Revolution?<|im_end|>
<|im_start|>assistant""",
    ]

and run the generation using the following snippet:

for chat in prompts:
    print(chat)
    # Tokenize the prompt and move it to the GPU
    input_ids = tokenizer(chat, return_tensors="pt").input_ids.to("cuda")
    generated_ids = model.generate(
        input_ids, max_new_tokens=750, temperature=0.8, repetition_penalty=1.1,
        do_sample=True, eos_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, skipping the prompt
    response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(f"Response: {response}")
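
The three prompts above repeat the same ChatML layout by hand. As a minimal sketch, the same strings can be built with a small helper. The name build_chatml_prompt is hypothetical and not part of transformers; it mirrors the layout used above, including ending on the assistant tag with no trailing newline.

```python
SYSTEM_PROMPT = (
    "You are a sentient, superintelligent artificial general intelligence, "
    "here to teach and assist me."
)

QUESTIONS = [
    "How would you explain the concept of democracy to a 10-year-old?",
    "If a train travels at 60 miles per hour and has to cover a distance "
    "of 120 miles, how long will it take to reach its destination?",
    "Can you provide a summary of the French Revolution?",
]

def build_chatml_prompt(question, system=SYSTEM_PROMPT):
    """Wrap a system message and a user question in ChatML tags."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant"
    )

# Same three prompts as the hand-written list
prompts = [build_chatml_prompt(q) for q in QUESTIONS]
print(prompts[1])
```

If the tokenizer ships a chat template, tokenizer.apply_chat_template in recent transformers releases may serve the same purpose; the helper above just keeps the format explicit.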

Let’s have a look at the answers:

Question: 
How would you explain the concept of democracy to a 10-year-old?

Response: 
Democracy is a type of government where people have the power to make 
decisions. Imagine a group of friends playing together in a playground: 
they need to decide which game to play next. In a democratic society, 
everyone gets the chance to say what game they want. The best way to 
figure out what most people want is by counting how many times each 
game was mentioned. If soccer got 8 mentions, and basketball got 5 mentions, 
then most friends wanted to play soccer. 
That's basically how democracy works! 
Everyone has a say and we count those 
opinions to make decisions that work for everyone.

Question:
If a train travels at 60 miles per hour and has to cover a 
distance of 120 miles, how long will it take to reach its 
destination?

Response: 
It would take the train 2 hours to travel 120 miles at a speed of 60 miles 
per hour.

Time = Distance / Speed = 120 miles / 60 mph = 2 hours.

Question:
Can you provide a summary of the French Revolution?

Response: 
The French Revolution (1789-1799) was a period of radical social 
and political change in France. It led to the collapse of the 
Bourbon monarchy and the rise of radical political factions which 
eventually gave way to Napoleon Bonaparte's rise to power. 
The revolution began with the Estates General's rejection of 
royal tax proposals and the abolition of feudalism. It progressed 
with the fall of the Bastille on July 14, 1789, the Reign of Terror 
from 1793-1794 where thousands were executed, the end of the monarchy 
in 1792, and finally, Napoleon's rise to power in 1799. Its long-term 
effects included widespread changes such as the abolition of feudalism, 
secularization of society, the spread of revolutionary ideals to other 
parts of Europe, and the eventual decline of royal and aristocratic 
influence in politics.

For the explanation of democracy to a 10-year-old, the answer is correct and uses a simple, relatable playground analogy that is age-appropriate and effective for conveying democratic decision-making to a young audience. Well done!

For the train travel problem, the calculation is correct: covering 120 miles at 60 miles per hour does take 2 hours. The response is straightforward and uses basic arithmetic, which suits the question.

For the French Revolution, the summary captures key events and impacts, such as the fall of the Bastille, the Reign of Terror, the end of the monarchy, and the rise of Napoleon Bonaparte, although it mentions the Reign of Terror (1793-1794) before the end of the monarchy (1792). The response is concise and informative, suitable for someone seeking a brief overview.

Overall, OpenHermes 2.5 Mistral 7B’s responses are well-tailored to the questions, providing accurate information in an appropriate and accessible manner.
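
The train answer is easy to verify independently. Here is a quick sketch of the same formula, Time = Distance / Speed, using a hypothetical helper name:

```python
def travel_time_hours(distance_miles, speed_mph):
    """Time = Distance / Speed, in hours."""
    if speed_mph <= 0:
        raise ValueError("speed must be positive")
    return distance_miles / speed_mph

hours = travel_time_hours(120, 60)
print(f"{hours} hours ({hours * 60:.0f} minutes)")  # 2.0 hours (120 minutes)
```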

Running Initial Tests with OpenHermes-2.5-neural-chat-7b-v3-1-7B

I will perform the same test with OpenHermes-2.5-neural-chat-7b-v3-1-7B, using the same code to load the model and tokenizer but with the model name “Weyaxi/OpenHermes-2.5-neural-chat-7b-v3-1-7B”.
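
Since only the model name changes, the loading step can be wrapped in a function. This is a sketch under assumptions rather than the exact code used earlier: load_model is a hypothetical helper, it uses AutoTokenizer and AutoModelForCausalLM instead of the Llama and Mistral classes, and it passes the 4-bit setting through BitsAndBytesConfig, which newer transformers releases prefer over the bare load_in_4bit argument. Flash attention is left out for brevity.

```python
MODEL_IDS = {
    "openhermes": "teknium/OpenHermes-2.5-Mistral-7B",
    "neural-chat-merge": "Weyaxi/OpenHermes-2.5-neural-chat-7b-v3-1-7B",
}

def load_model(model_id):
    """Load a tokenizer and a 4-bit quantized model for the given repo id."""
    # Imported lazily so this file can be read or imported without a GPU stack
    import torch
    from transformers import (
        AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",  # requires the accelerate package
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
        ),
    )
    return tokenizer, model

# Usage (downloads several GB of weights on first run):
# tokenizer, model = load_model(MODEL_IDS["neural-chat-merge"])
```

The generation loop from the previous section can then be reused unchanged with the returned tokenizer and model.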

Let’s have a look at the answers:

Question: 
How would you explain the concept of democracy to a 10-year-old?

Response: 
Democracy is like a big game where everyone has a say in making decisions 
that affect them. In this game, people choose their own leaders 
(like a captain) who will guide them and make sure those decisions 
are fair for everyone. Everyone gets an equal chance to participate 
and share their opinions. This helps create a group where everyone's 
voice counts, and they can all work together happily. It's like a 
team where every member matters and has a role to play. That's what 
we call a democratic society.


Question:
If a train travels at 60 miles per hour and has to cover a distance of 120 miles, how long will it take to reach its destination?

Response: 
To find the time it takes for the train to travel this distance, 
we can use the formula: Time = Distance / Speed.

In this case, the distance is 120 miles, and the speed is 60 miles per hour. 
Plug in these values into the formula:

Time = 120 miles / 60 mph = (60 minutes/mile) * (2 hours/60 minutes) = 2 hours

It will take 2 hours for the train to reach its destination 
traveling at 60 miles per hour.


Question: 
Can you provide a summary of the French Revolution?

Response: 
The French Revolution was a pivotal historical event that took place from 1789 to 1799. As one of the most significant periods in world history, it brought about profound changes in political, social, and economic systems. The Revolution began as an uprising against the absolute monarchy of King Louis XVI and the social inequality prevalent in pre-revolutionary France.

Key events in the Revolution include:

1. Declaration of the Estates-General (June 1789): This meeting led to forming a new National Constituent Assembly by representatives of the three estates - clergy, nobility, and commoners.
2. Storming of the Bastille (July 14, 1789): A symbolic act marking the beginning of the Revolution and the end of the old feudal order, leading ultimately to the fall of the absolute monarchy.
3. The Great Fear (September 1789): A wave of panic spread across rural France as people feared the aristocracy would retaliate for their loss of power.
4. Constitutional Monarchy (September 1791): The National Constituent Assembly adopted a constitution that transformed France into a constitutional monarchy under the rule of King Louis XVI and his queen, Marie Antoinette.
5. Reign of Terror (September 1793 – July 1794): A period marked by violent government repression and widespread executions, primarily targeting the former elite and other perceived enemies of the Revolution. Led by Robespierre and the Committee of Public Safety, this phase sought to purge counterrevolutionaries and consolidate revolutionary ideals.
6. Execution of King Louis XVI (January 21, 1793): Louis XVI was executed, ending the reign of the monarchy and confirming France as a republic.
7. Thermidorian Reaction (November 1794): After the fall of Robespierre, a series of moderate leaders emerged, and the Reign of Terror came to an end.
8. Convocation of the Constitutional Assembly (May 1793 – June 1794): A second constituent assembly aimed at drafting a permanent constitution for the newly established Republic.
9. Ratification of the Constitution (October 5, 1795): Approval of a new constitution that established a five-yearly elected executive directorate, called the Directory, as the head of the state.

As a result of the Revolution, France transformed from an absolute monarchy to a society based on Enlightenment values, individual liberties, and democratic principles. However, the Revolution also led to political instability, foreign wars, and a massive loss of life, making it both an empowering period of change and a dark time filled with chaos and bloodshed. The effects of the French Revolution continue to influence modern politics, societal structures, and civil rights worldwide.

Looks good, and here are some of the differences:

For the explanation of democracy, both answers are suitable for a 10-year-old. OpenHermes 2.5 Mistral 7B focuses more on the decision-making aspect, while OpenHermes-2.5-neural-chat-7b-v3-1-7B presents a broader view of democratic participation and leadership.

For the train travel problem, both final answers are correct. OpenHermes-2.5-neural-chat-7b-v3-1-7B provides a more detailed explanation, which could be more educational, but its intermediate unit-conversion step, (60 minutes/mile) * (2 hours/60 minutes), does not follow from the formula and could confuse some readers.

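For the summary of the French Revolution, OpenHermes-2.5-neural-chat-7b-v3-1-7B’s response is more detailed and might suit someone seeking more depth, but its timeline needs checking: the events are not listed in chronological order, and some dates are off (the Great Fear took place in July and August 1789, and the Thermidorian Reaction began in July 1794). OpenHermes 2.5 Mistral 7B’s summary is more succinct, suitable for a quick overview.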

This gives us a little clue as to what to expect. Let’s have a look at the quantization options.

Quantization Options for OpenHermes 2.5 Mistral 7B

TheBloke has already released quantized versions of OpenHermes that you can use to find the sweet spot between performance and latency:

TheBloke/OpenHermes-2.5-Mistral-7B-GGUF (GGUF is a new format introduced by the llama.cpp team)
TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ
TheBloke/OpenHermes-2.5-Mistral-7B-AWQ

AWQ is an efficient, accurate, and fast low-bit weight quantization method that currently supports 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with quality equivalent to or better than the most commonly used GPTQ settings.

bartowski/OpenHermes-2.5-Mistral-7B-exl2 (ExLlamaV2 quantizations of OpenHermes-2.5-Mistral-7B, made with turboderp’s ExLlamaV2 v0.0.7)
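
To get a feel for why quantization matters, here is a back-of-the-envelope estimate of weight storage at different bit widths. It assumes roughly 7 billion parameters (the nominal "7B" in the name, not an exact count) and ignores activations, the KV cache, and the scaling metadata that real quantized files carry, so actual downloads and memory use will be somewhat larger.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

NUM_PARAMS = 7e9  # assumption: nominal "7B" parameter count

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the estimate from about 14 GB to about 3.5 GB, which is the main reason the quantized variants fit on smaller GPUs.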

Resources

If you want to find out more about these models, refer to the Hugging Face model card and community pages:

I hope this walk-through was helpful. Let me know in the comments if you have any questions or remarks.