Free AI web copilot to create summaries, insights and extended knowledge, download it at here

4491

Abstract

:fit:800/0*CqKAd6GnvWmFRzzx.png"><figcaption>Meng Li usage screenshots</figcaption></figure>I have already applied, so there’s a sentence in the middle of my screen saying: “You have been granted access to this model”.Next, we will register and install HuggingFace, in order to authorize the use of open-source libraries.First Step: Register on the official website to obtain an API Token.<figure id="9dde"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/0*WUFZQH277lWv3VQ3.png"><figcaption>Meng Li usage screenshots</figcaption></figure>Second Step: Install the HuggingFace Library using pip install transformers.Third Step: Run huggingface-cli login in the command line to set up your API Token.<figure id="7c4e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/0*IQWBloNJJHkSYF05.png"><figcaption>Meng Li usage screenshots</figcaption></figure>Llama2 Example Code<div id="5ae4"><pre>```python # Import necessary libraries from transformers import AutoTokenizer, AutoModelForCausalLM

# Import HuggingFace API Token import os os.environ['HUGGINGFACEHUB_API_TOKEN'] = 'API Token'

# Load the tokenizer for the pre-trained model tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Load the pre-trained model # Use the device_map parameter to automatically load the model onto available hardware devices, such as GPU model = AutoModelForCausalLM.from_pretrained( "meta-llama/Llama-2-7b-chat-hf", device_map = 'auto')

# Define a prompt for the model to generate a story based on prompt = "Can it be possible that transferring Thomas Song from Class A to Class B raises the average IQ of both classes?"

# Use the tokenizer to convert the prompt into a format the model can understand, and move it to the GPU inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Use the model to generate text, setting the maximum number of tokens to generate to 2000 outputs = model.generate(inputs["input_ids"], max_new_tokens=2000)

# Decode the generated tokens into text, skipping any special tokens like [CLS], [SEP], etc. response = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Print the generated response print(response)

# Options
p><p id="24cc">Finally, use the model’s .generate() method to produce a response.</p><p id="2c45"><b>max_new_tokens=2000 limits the length of the generated text.</b></p><p id="d915">Use the tokenizer’s .decode() method to convert the numerical output back into text, skipping any special tokens.</p><p id="747b"><b>Since it’s a local inference, it’s quite time-consuming.</b></p><p id="a137">On my machine, it takes about 30 seconds to 2 minutes to generate results.</p><p id="b1ce"><b>HuggingFace Spaces offers trial environments.</b></p><div id="5088" class="link-block">
          <a href="https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat">
            <div>
              <div>
                <h2>Llama 2 7B Chat - a Hugging Face Space by huggingface-projects</h2>
                <div><h3>Discover amazing ML apps made by the community</h3></div>
                <div><p>huggingface.co</p></div>
              </div>
              <div>
                <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*Uy-fvUUETGf1ocWS)"></div>
              </div>
            </div>
          </a>
        </div><h1 id="8aba">Using Code Llama</h1><p id="9f35">The model application process has been completed in the aforementioned steps.</p><p id="a2ab">Code Llama Example Code</p><div id="adc6"><pre><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> AutoTokenizer
<span class="hljs-keyword">import</span> transformers
<span class="hljs-keyword">import</span> torch

model = <span class="hljs-string">"codellama/CodeLlama-7b-hf"</span>

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    <span class="hljs-string">"text-generation"</span>,
    model=model,
    torch_dtype=torch.float16,
    device_map=<span class="hljs-string">"auto"</span>,
)

sequences = pipeline(
    <span class="hljs-string">'import socket\n\ndef ping_exponential_backoff(host: str):'</span>,
    do_sample=<span class="hljs-literal">True</span>,
    top_k=<span class="hljs-number">10</span>,
    temperature=<span class="hljs-number">0.1</span>,
    top_p=<span class="hljs-number">0.95</span>,
    num_return_sequences=<span class="hljs-number">1</span>,
    eos_token_id=tokenizer.eos_token_id,
    max_length=<span class="hljs-number">200</span>,
)
<span class="hljs-keyword">for</span> seq <span class="hljs-keyword">in</span> sequences:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">f"Result: <span class="hljs-subst">{seq[<span class="hljs-string">'generated_text'</span>]}</span>"</span>)</pre></div><p id="7115"><b>HuggingFace Spaces offers a trial environment.</b></p><div id="7c9e" class="link-block">
          <a href="https://huggingface.co/spaces/codellama/codellama-playground">
            <div>
              <div>
                <h2>Code Llama - Playground - a Hugging Face Space by codellama</h2>
                <div><h3>Discover amazing ML apps made by the community</h3></div>
                <div><p>huggingface.co</p></div>
              </div>
              <div>
                <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*mShOMMg9feNWdEUV)"></div>
              </div>
            </div>
          </a>
        </div><h1 id="7333">Conclusion</h1><p id="6938"><b>In this era of information overload, we deal with massive amounts of data every day.</b></p><p id="0414"><b>Large language models, such as Llama2 and Code Llama, are powerful tools that help us process, understand, and generate textual data.</b></p><p id="13ac"><b>Through learning and practice, we can better master these technologies and apply them in our work and daily lives.</b></p><p id="71c3"><b>The Llama2 series has contributed to the current prosperity of open-source large models, and it’s worth giving them a try.</b></p><p id="94d1">Feel free to leave a comment and join the discussion.</p><p id="69c0">If you enjoyed this story, feel free<a href="https://medium.com/@mengyoupanshan"> to subscribe</a> to Medium, and you will get notifications when my new articles will be published, as well as full access to thousands of stories from other authors.</p><p id="9353">I am Li Meng, an independent open-source software developer, and author of SolidUI, highly interested in new technologies, and focused on the AI and data fields. If you find my content interesting, please follow, like, and share. Thank you!</p></article></body>

HuggingFace: The Best Gateway to the Llama2 Model?

Hey friends, today I’m going to introduce you to something new — the Llama2 model, which was launched by Meta (yes, the company behind Facebook).

You can directly download this model from Llama’s official website and then follow the guide on their GitHub to call it.

However, personally, I would recommend downloading and importing the model from HuggingFace.

Why, you might ask?

Because the world of models changes so rapidly; what’s popular with Llama today might be replaced by a new trending model tomorrow.

But HuggingFace is always there, supporting a variety of open-source models.

When it comes to learning, we definitely want to choose knowledge that can be reused and remains effective over time, right?

I’ll also talk about the Code Llama series, which are also open-source models.

Code Llama is a large AI model for code generation based on Llama 2.

Let’s take a look at the features of the Llama2 model and how to apply for its use.

Model Features

Llama 2 is a series of large-scale pre-trained and fine-tuned language models, with parameter sizes ranging from 7B to 70B — quite massive, right?

Compared to Llama 1, it has significant improvements.

Llama 2 supports a 4096 context window, and the 70B parameter version utilizes Grouped Query Attention (GQA) to enhance inference performance.

What’s even more exciting is the introduction of Code Llama, a dedicated series of language models for coding.

It’s a top contender among publicly available models!

Code Llama is not only capable of rapidly generating code but also supports large input contexts and zero-shot programming tasks.

Moreover, Code Llama models come in various parameter sizes, from 7B to 34B, to accommodate different application needs.

Now, Llama 2 and Code Llama have been released under a permissive license, available for both research and commercial uses.

Using Llama2

For first-time users, applications are made at https://llama.meta.com/llama-downloads by filling in the information, where you can apply for all three types of large models.

I applied for all the large models and received three emails.

The URLs in the emails allow you to download the model weights, and they are valid for 24 hours.

Of course, our share today does not follow the official GitHub usage method as provided on the website.

The above method is for first-time users to leave their email information with Meta;

otherwise, the application through HuggingFace for llama2 will not be approved.

In HuggingFace’s Models, you can find https://huggingface.co/meta-llama/Llama-2-7b.

Note that there are a plethora of Llama2 model versions; the one we are using here is the smallest 7B version.

Additionally, there are 13B, 70B, chat versions, as well as various unofficial Meta versions.

After selecting the meta-llama/Llama-2–7b model, you’ll be able to see its basic information.

Click to apply, and be aware that if you don’t leave an email on the official website, from personal experience, it will continuously get stuck.

After the above steps, the application is usually approved within minutes.

I have already applied, so there’s a sentence in the middle of my screen saying: “You have been granted access to this model”.

Next, we will register and install HuggingFace, in order to authorize the use of open-source libraries.

First Step: Register on the official website to obtain an API Token.

Second Step: Install the HuggingFace Library using pip install transformers.

Third Step: Run huggingface-cli login in the command line to set up your API Token.

Llama2 Example Code

```python
# Import necessary libraries
from transformers import AutoTokenizer, AutoModelForCausalLM

# Import HuggingFace API Token
import os
os.environ['HUGGINGFACEHUB_API_TOKEN'] = 'API Token'

# Load the tokenizer for the pre-trained model
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Load the pre-trained model
# Use the device_map parameter to automatically load the model onto available hardware devices, such as GPU
model = AutoModelForCausalLM.from_pretrained(
          "meta-llama/Llama-2-7b-chat-hf",
          device_map = 'auto')

# Define a prompt for the model to generate a story based on
prompt = "Can it be possible that transferring Thomas Song from Class A to Class B raises the average IQ of both classes?"

# Use the tokenizer to convert the prompt into a format the model can understand, and move it to the GPU
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Use the model to generate text, setting the maximum number of tokens to generate to 2000
outputs = model.generate(inputs["input_ids"], max_new_tokens=2000)

# Decode the generated tokens into text, skipping any special tokens like [CLS], [SEP], etc.
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Print the generated response
print(response)
```

This script is a typical use case for the HuggingFace Transformers library, which offers a wide array of pre-trained models and related tools.

Import AutoTokenizer: This tool is for automatically loading the tokenizer associated with a pre-trained model.

The tokenizer is responsible for converting text into a numerical format that the model can understand.

Import AutoModelForCausalLM: This tool is for loading a causal language model (used for text generation).

Use the from_pretrained method to load the pre-trained tokenizer and model.

Here, device_map = ‘auto’ is used to automatically load the model onto an available device, such as a GPU.

Then, provide a prompt: “Can it be possible that transferring Thomas Song from Class A to Class B raises the average IQ of both classes?” and use the tokenizer to convert this prompt into a format that the model can accept, return_tensors=”pt” indicates it returns PyTorch tensors.

The statement .to(“cuda”) is for device format conversion to GPU because I run the script on a GPU, and it will throw an error without it. If you are using a CPU, you might try removing it.

Finally, use the model’s .generate() method to produce a response.

max_new_tokens=2000 limits the length of the generated text.

Use the tokenizer’s .decode() method to convert the numerical output back into text, skipping any special tokens.

Since it’s a local inference, it’s quite time-consuming.

On my machine, it takes about 30 seconds to 2 minutes to generate results.

HuggingFace Spaces offers trial environments.

Llama 2 7B Chat - a Hugging Face Space by huggingface-projects

Discover amazing ML apps made by the community

huggingface.co

Using Code Llama

The model application process has been completed in the aforementioned steps.

Code Llama Example Code

from transformers import AutoTokenizer
import transformers
import torch

model = "codellama/CodeLlama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'import socket\n\ndef ping_exponential_backoff(host: str):',
    do_sample=True,
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

HuggingFace Spaces offers a trial environment.

Code Llama - Playground - a Hugging Face Space by codellama

Discover amazing ML apps made by the community

huggingface.co

Conclusion

In this era of information overload, we deal with massive amounts of data every day.

Large language models, such as Llama2 and Code Llama, are powerful tools that help us process, understand, and generate textual data.

Through learning and practice, we can better master these technologies and apply them in our work and daily lives.

The Llama2 series has contributed to the current prosperity of open-source large models, and it’s worth giving them a try.

Feel free to leave a comment and join the discussion.

If you enjoyed this story, feel free to subscribe to Medium, and you will get notifications when my new articles will be published, as well as full access to thousands of stories from other authors.

I am Li Meng, an independent open-source software developer, and author of SolidUI, highly interested in new technologies, and focused on the AI and data fields. If you find my content interesting, please follow, like, and share. Thank you!