
Starling LM 7B Alpha Surpasses Claude 2, Nears Parity with GPT-4 Turbo

I recently wrote about OpenChat 3.5, the first 7B model to achieve results comparable to ChatGPT!


Starling-7B is a fine-tuned version of OpenChat 3.5, released by researchers from Berkeley EECS. It’s been trained using Reinforcement Learning from AI Feedback (RLAIF) on the latest GPT-4 labeled ranking dataset, berkeley-nest/Nectar (183K chat prompts and 3.8M pairwise comparisons).

The team also leveraged a new reward training and policy tuning pipeline, which is where it sets itself apart from the rest.

Starling-7B was tested across several benchmarks, including MT-Bench, AlpacaEval, and MMLU, to give a broad view of its capabilities.

MT-Bench and AlpacaEval assess a chatbot's helpfulness. Starling-7B scores 8.09 on MT-Bench and 91.99 on AlpacaEval, outshining almost every other model to date except GPT-4 and GPT-4 Turbo.

Behind this performance boost is training with K-wise loss, which addresses the scarcity of open-source reward models.

I think it’s important to be upfront about the challenges too. Starling-7B, like its predecessors, isn’t perfect. It still struggles in areas like reasoning and mathematics, and that’s where we will scratch the surface today.

In this article, I’ll walk you through:

Local setup for Starling LM 7B alpha
Running initial test with single-turn conversation
Evaluating Starling LM 7B alpha for Fact Verification, Reasoning, and Creativity
Multi-turn and coding conversations
Resources in case you want to dive deeper

Let’s get building!

Getting Started with Starling LM 7B alpha

Let’s start by creating the project folder and virtual environment:

mkdir Starling-LM-7B-alpha && cd Starling-LM-7B-alpha
python3 -m venv Starling-LM-7B-alpha-env
source Starling-LM-7B-alpha-env/bin/activate

pip3 install torch transformers accelerate optimum
pip3 install ipykernel jupyter

# Optionally, fire up VSCode or your favorite IDE and let's get rolling!
code .

For the rest of the walk-through, you can create either a .py file or an .ipynb file (notebook). I will continue with a Jupyter notebook so I can run code in blocks and inspect the results interactively.

Running Initial Test with Starling LM 7B alpha

Start by importing required libraries, setting device, and loading model and tokenizer:

import torch
import transformers

if torch.cuda.is_available():
    torch.set_default_device("cuda")
else:
    torch.set_default_device("cpu")

tokenizer = transformers.AutoTokenizer.from_pretrained("berkeley-nest/Starling-LM-7B-alpha")
model = transformers.AutoModelForCausalLM.from_pretrained("berkeley-nest/Starling-LM-7B-alpha")

Next, add a utility function to generate responses:

def generate_response(prompt):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(
        input_ids,
        max_length=256,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    response_ids = outputs[0]
    response_text = tokenizer.decode(response_ids, skip_special_tokens=True)
    return response_text
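
Because generate_response decodes the whole generated sequence, the returned text repeats the prompt before the answer. If you only want the model's reply, a small post-processing helper can help. The following is a minimal sketch, assuming the decoded text no longer contains the special tokens and that the reply follows the last "GPT4 Correct Assistant:" marker; the helper name extract_reply and the sample string are hypothetical and not part of the original walk-through.

```python
ASSISTANT_MARKER = "GPT4 Correct Assistant:"


def extract_reply(decoded_text: str, marker: str = ASSISTANT_MARKER) -> str:
    """Return the text after the last assistant marker.

    If the marker is missing, the stripped input is returned unchanged.
    """
    _, found, tail = decoded_text.rpartition(marker)
    return tail.strip() if found else decoded_text.strip()


# Hypothetical decoded output, used only to illustrate the helper.
decoded = "GPT4 Correct User: Hello, how are you? GPT4 Correct Assistant: I'm doing well, thanks!"
print(extract_reply(decoded))
```

You could then call extract_reply(generate_response(single_turn_prompt)) wherever only the answer is needed.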

and here’s how you can start a single-turn conversation:

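prompt = "How can you determine if a restaurant is popular among locals or mainly attracts tourists, and why might this information be useful?"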
single_turn_prompt = f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(single_turn_prompt)
print("Response:", response_text)

Output:

Response: GPT4 Correct User: How can you determine if a restaurant is 
popular among locals or mainly attracts tourists, and why might this 
information be useful? GPT4 Correct Assistant: To determine if a restaurant 
is popular among locals or mainly attracts tourists, you can consider the 
following factors:

1. Reviews and ratings: Check online review platforms like Yelp, 
Google Reviews, or TripAdvisor to see if the restaurant has a high 
number of reviews from locals or tourists. Look for reviews that mention 
the restaurant's popularity among locals or tourists specifically.

2. Location: The restaurant's location can provide clues about its 
popularity among locals. If it's situated in a residential area or near 
a local landmark, it's more likely to be frequented by locals. On the 
other hand, if it's near a popular tourist attraction or in a touristy 
neighborhood, it might be more popular among tourists.

3. Menu offerings: A menu that features traditional local dishes or 
ingredients is more likely to attract locals. If the menu is focused
 on international cuisine or caters to a wide range of dietary preferences, 
it might be more popular among tourists.

Good start!

Before moving on with other questions, I want to explain the tricks that researchers employed in layman’s terms:

Nectar: This dataset comprises diverse chat prompts, responses from various models, and accurate ranking labels. The team built it from diverse sources, including lmsys-chat-1M, ShareGPT, Anthropic/hh-rlhf, UltraFeedback, Evol-Instruct, and Flan.
Generating Quality Responses: The team used a variety of models, namely GPT-4, GPT-3.5-turbo, GPT-3.5-turbo-instruct, Llama-2-7B-chat, and Mistral-7B-Instruct, alongside other existing datasets and models.
Reward Model Training: They trained a reward model using the K-wise maximum likelihood estimator under the Plackett-Luce model.
Policy Finetuning: The team experimented with several methods, including APA, PPO, and P3O, and eventually selected APA for its strong results.
Importance of Quality in Datasets and Reward Models: They stress that the quality of the preference dataset and reward model is crucial, even more than the policy tuning method itself.
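
To make the reward-model step a little more concrete, here is a minimal sketch of the negative log-likelihood of a single K-wise ranking under the standard Plackett-Luce model. It is an illustration of the textbook formula, not the team's training code: the function name plackett_luce_nll and the example scores are assumptions, and a real reward model would produce the scores with a neural network and minimize this loss over many rankings.

```python
import math


def plackett_luce_nll(rewards_in_rank_order):
    """Negative log-likelihood of one ranking under the Plackett-Luce model.

    rewards_in_rank_order: reward scores listed from best-ranked to worst-ranked.
    At each position, the chosen item competes against every item not yet placed.
    """
    nll = 0.0
    for k in range(len(rewards_in_rank_order)):
        remaining = rewards_in_rank_order[k:]
        # Log-sum-exp over the remaining items, shifted by the max for stability.
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(r - m) for r in remaining))
        nll -= rewards_in_rank_order[k] - log_norm
    return nll


# Hypothetical scores: a ranking that agrees with the scores has a lower loss
# than the reversed ranking.
print(plackett_luce_nll([2.0, 1.0, 0.0]))
print(plackett_luce_nll([0.0, 1.0, 2.0]))
```

With K = 2 this reduces to the familiar pairwise (Bradley-Terry style) loss; the K-wise form uses the full ranking of all responses to a prompt at once.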

OK, let’s see how it responds to questions from different categories.

Evaluating Starling LM 7B alpha for Fact Verification, Reasoning, and Creativity

Let’s ask the following questions to see how Starling LM 7B alpha responds:

Language Understanding and Creativity: “How would you explain the concept of democracy to a 10-year-old?”
Problem-Solving and Logical Reasoning: “If a train travels at 60 miles per hour and has to cover a distance of 120 miles, how long will it take to reach its destination?”
General Knowledge and Fact Verification: “Can you provide a summary of the French Revolution?”

Define the prompts:

prompts = [
    "How would you explain the concept of democracy to a 10-year-old?",
    "If a train travels at 60 miles per hour and has to cover a distance of 120 miles, how long will it take to reach its destination?",
    "Can you provide a summary of the French Revolution?"
]

and run the generation using the following snippet:

for prompt in prompts:
    print(prompt)
    
    single_turn_prompt = f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:"
    response_text = generate_response(single_turn_prompt)

    print("Response:", response_text)

Let’s have a look at the answers:

Question:
How would you explain the concept of democracy to a 10-year-old?

Answer:
Democracy is a way of making decisions where everyone gets a say. 
Imagine you and your friends want to choose a game to play. 

In a democracy, each of you would get a chance to share your opinion and 
everyone's voice would be heard. Then, the group would decide together 
which game to play, based on what everyone wants. This is similar to how 
democracy works in a country, where people vote for the leaders they want 
and the leaders make decisions that affect everyone.

In a democracy, people have the freedom to express their opinions and 
beliefs, and everyone's opinion is valued. This helps to make sure that 
decisions are fair and that everyone's needs are considered.

Here are some key points to remember about democracy:

1. Everyone gets a say: In a democracy, all people have the right to share 
their opinions and have their voices heard. This is called "voting."

2. Fairness: Democracy is based on the idea that everyone's opinion is 
important, so decisions are made in a way that treats everyone fairly.


Question:
If a train travels at 60 miles per hour and has to cover a distance of 
120 miles, how long will it take to reach its destination?

Answer: 
To find the time it takes for the train to reach its destination, 
we can use the formula:

Time = Distance / Speed

In this case, the distance is 120 miles and the speed is 60 miles per hour. 
Plugging these values into the formula, we get:

Time = 120 miles / 60 miles per hour = 2 hours

Therefore, the answer is 2 hours.


Question:
Can you provide a summary of the French Revolution?

Answer:
The French Revolution, which took place from 1789 to 1799, was a period of 
radical social and political change in France. It began as a response to the 
financial crisis and the absolute monarchy of King Louis XVI, and culminated 
in the rise of Napoleon Bonaparte. The revolution led to the abolition of 
the monarchy, the establishment of a republic, and the creation of the 
Declaration of the Rights of Man and of the Citizen, which laid the 
foundation for modern democratic principles. The revolution was marked 
by a series of violent uprisings, including the Storming of the Bastille, 
the Reign of Terror, and the rise of radical factions like the Jacobins. 
The revolution ultimately led to the rise of Napoleon Bonaparte, who seized 
power in a coup in 1799 and established himself as the First Consul, later 
becoming Emperor of the French.

For explaining democracy to a 10-year-old, it used a relatable analogy (choosing a game to play with friends) to simplify the concept.

For the train travel problem, it correctly applied the formula for time (Time = Distance / Speed) and calculated the right answer. It also showed the reasoning behind the calculation.

For the French Revolution, it provided a concise yet comprehensive summary. It touched on key events and figures, like the Storming of the Bastille, the Reign of Terror, and Napoleon Bonaparte, which shows a good grasp of historical facts.

You can compare these answers with those from other models, such as:

Orca 2 (rivals GPT4 with 13B parameters)
OpenHermes 2.5 Mistral 7B (beats Deepseek 67B and Qwen 72B on AGIEval)
OpenHermes-2.5-neural-chat-7b-v3-1-7B (#1 in 7B and 13B category)

I recently asked them the same questions; you can take a quick look here:

OpenHermes 2.5 Mistral 7B beats Deepseek 67B and Qwen 72B on AGIEVal, and other 13B and 7B models! (https://readmedium.com/openhermes-2-5-mistral-7b-beats-deepseek-67b-and-qwen-72b-on-agieval-and-other-13b-and-7b-models-3c89ccccaa91)
Orca 2 vs GPT-4: The Smaller, Smarter LLM is Redefining Reasoning and Strategy (https://readmedium.com/orca-2-vs-gpt-4-the-smaller-smarter-llm-is-redefining-reasoning-and-strategy-9dae1d2085fd)

In addition to single-turn conversations, you can also have multi-turn and coding conversations. Here’s how:

Multi-turn conversation

## Multi-turn conversation
prompt = "Hello"
follow_up_question =  "How are you today?"
# `response` should hold the assistant's reply to the first turn; it is left empty here.
response = ""
multi_turn_prompt = f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: {response}<|end_of_turn|>GPT4 Correct User: {follow_up_question}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(multi_turn_prompt)
print("Multi-turn conversation response:", response_text)
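
Writing the multi-turn template by hand gets error-prone as a conversation grows. The sketch below builds the same prompt format from a list of turns. The helper name build_chat_prompt and the example history are assumptions for illustration; only the role labels and the <|end_of_turn|> separator come from the prompts used in this article.

```python
END_OF_TURN = "<|end_of_turn|>"


def build_chat_prompt(turns,
                      user_role="GPT4 Correct User",
                      assistant_role="GPT4 Correct Assistant"):
    """Build a prompt from (speaker, text) pairs.

    speaker must be "user" or "assistant". The prompt ends with the assistant
    role label so that the model continues with its next reply.
    """
    roles = {"user": user_role, "assistant": assistant_role}
    parts = [f"{roles[speaker]}: {text}{END_OF_TURN}" for speaker, text in turns]
    return "".join(parts) + f"{assistant_role}:"


# Hypothetical conversation history.
history = [
    ("user", "Hello"),
    ("assistant", "Hi"),
    ("user", "How are you today?"),
]
print(build_chat_prompt(history))
```

Passing user_role="Code User" and assistant_role="Code Assistant" would produce the coding-mode format in the same way.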

Coding conversation

### Coding conversation
prompt = "Implement quicksort using C++"
coding_prompt = f"Code User: {prompt}<|end_of_turn|>Code Assistant:"
response = generate_response(coding_prompt)
print("Coding conversation response:", response)

Output:

Coding conversation response: 
Code User: Implement quicksort using C++ 
Code Assistant: Here's an example of how you can implement quicksort in C++:

```cpp
#include <iostream>
using namespace std;

void quickSort(int arr[], int left, int right) {
    int i = left, j = right;
    int tmp;
    int pivot = arr[(left + right) / 2];

    /* partition */
    while (i <= j) {
        while (arr[i] < pivot)
            i++;
        while (arr[j] > pivot)
            j--;
        if (i <= j) {
            tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
            i++;
            j--;
        }
    };

    /* recursion */
    if (left < j)
        quickSort(arr, left, j);
    if (i < right)
        quickSort(arr, i, right);
}
```
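
One way to sanity-check generated code like this without a C++ toolchain is to port the same partition scheme to Python and compare it against the built-in sort on random inputs. The port below is my own sketch, not model output; it mirrors the middle-pivot partition and the two recursive calls of the routine above.

```python
import random


def quick_sort(arr, left, right):
    """In-place quicksort using the same middle-pivot partition as the C++ version."""
    i, j = left, right
    pivot = arr[(left + right) // 2]

    # Partition: move i right and j left until they cross, swapping as needed.
    while i <= j:
        while arr[i] < pivot:
            i += 1
        while arr[j] > pivot:
            j -= 1
        if i <= j:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
            j -= 1

    # Recurse into the two partitions.
    if left < j:
        quick_sort(arr, left, j)
    if i < right:
        quick_sort(arr, i, right)


# Compare against sorted() on random non-empty lists.
random.seed(0)
for _ in range(200):
    data = [random.randint(-50, 50) for _ in range(random.randint(1, 30))]
    expected = sorted(data)
    quick_sort(data, 0, len(data) - 1)
    assert data == expected
print("all random cases sorted correctly")
```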

If you want to find out more about Starling-7B, refer to the blog post (https://starling.cs.berkeley.edu/) and the Hugging Face model page (https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha).

Hope this walk-through was helpful.

Let me know what you think in the comments, and happy building!