Summary

The web content provides a comprehensive guide on how to access and use Meta's Llama 2 language model via HuggingFace, including step-by-step instructions for setup and usage on a local machine.

Abstract

Meta's Llama 2, a large language model with 70 billion parameters, was released for public access on July 19, 2023. This guide offers a detailed walkthrough for users to gain access to Llama 2 through HuggingFace, emphasizing that it is open for commercial use and free to access, unlike OpenAI's GPT-3 and GPT-4 models. The process includes downloading the model from the Llama 2 website, creating a HuggingFace account, requesting access to the model repository, and installing necessary packages in a virtual environment. The article also addresses common issues and provides solutions, such as using the nightly version of Pytorch to avoid errors. It concludes with a demonstration of how to interact with Llama 2 using a Python script, showcasing the model's capabilities in generating text and explaining complex concepts.

Opinions

The author found existing tutorials incomplete and decided to create a more comprehensive guide.
The author recommends using the nightly version of Pytorch due to encountered errors with the stable release.
HuggingFace granted access to all Llama 2 repositories even though the author only applied for one, which was unexpected but beneficial.
The author suggests using a virtual environment and specifically recommends using virtualenv over Python's built-in venv module due to encountered errors.
The author endorses using the AI service ZAI.chat as a cost-effective alternative to ChatGPT Plus (GPT-4), offering a special discount for users.

How to get access to Llama 2 via HuggingFace

A step-by-step guide to using Llama 2 on HuggingFace

Meta’s Llama 2 was released on 19 Jul 2023.

Meta’s large language model (LLM) Llama 2 (Large Language Model Meta AI) was recently released on 19 Jul 2023. Llama 2 has 70B parameters and uses 2 Trillion pretraining tokens. For comparison, GPT-3 has 175B parameters, and GPT-4 has 1.7 trillion parameters (though unverified).

Unlike Llama 1, Llama 2 is open for commercial use, which means it is more easily accessible to the public. Also, unlike OpenAI’s GPT-3 and GPT-4 models, this is free!

I could not find any complete and concise tutorials on setting up access to Llama2 in the local machine (not playground) and decided to write my own in this article.

The most helpful article I found was this, but I found it incomplete as a few steps were missing (the list of packages to install was incomplete for a start).

Llama 2 is here - get it on Hugging Face

We're on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

The first thing you need to do is to head over to the Llama 2 website and click ‘Download the Model’, followed by filling up a form:

Llama 2 - Meta AI

Llama 2 - The next generation of our open source large language model, available for free for research and commercial…

ai.meta.com

Access is granted almost immediately, and you will receive an email looking like the one below at the address which you signed up with.

Ignore this. We will use the models from HuggingFace instead.

Next, if you have not already, you will need to create an account on HuggingFace. Sign up below. Make sure to use the same email you have used for the Llama 2 registration above.

Hugging Face - The AI community building the future.

We're on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

Next, you need to request access to one of the models in the official Meta Llama 2 repository. Use the link below, and click on ‘meta-llama/Llama-2–7b-chat-hf’. There will be another form to fill up. Again, use the same email address as you have used in the above.

meta-llama (Meta Llama 2)

Org profile for Meta Llama 2 on Hugging Face, the AI community building the future.

huggingface.co

Wait around 30 minutes, and you will receive the ‘Access Granted’ emails. For some reason, HuggingFace gave me access to all repos even though I only applied for one:

Now we have to start installing the packages. This is best done in a virtual environment. If you have not installed virtualenv already, do so below. I have also tried via the Python native venv module, but I encountered tricky errors so I do not recommend it. In your CLI (command line interface), type the below and press enter:

pip3 install virtualenv

Once installation completes, create a new directory ‘llm’, and set up virtualenv:

mkdir llm
cd llm
virtualenv myvirtualenv
source myvirtualenv/bin/activate

Great! Now you are in a virtual environment and ready to start installing your packages:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
pip3 install transformers
pip3 install xformers
pip3 install accelerate

The reason I installed Pytorch with the nightly version above was that I actually tried the normal versions of Pytorch first, but I got the below error:

After some googling around, I found this link that recommended using the nightly version instead, hence the above.

MPS cumsum issue - RuntimeError: MPS does not support cumsum op with int64 input. Support has been…

🐛 Describe the bug I'm on a Macbook Pro M1 Pro and I've upgraded to 13.3 Beta 3 - I am running into the cumsum issue…

github.com

If all the packages are installed successfully above, you should now log in to HuggingFace via CLI:

huggingface-cli login

After which you will see the following:

As stated above, you will need to get a token via https://huggingface.co/settings/tokens. Under ‘Name’, you can input anything you want e.g. ‘my test token’. Under ‘Role’, leave it as ‘read’. Click ‘Generate a token’.

Copy this token by clicking the double square button on the right:

Paste it into the terminal and click enter. You will now see these messages if successful:

Next, create a script as follows (courtesy of https://huggingface.co/blog/llama2), save it in the ‘llm’ folder you created earlier as ‘llama2_test.py’:

from transformers import AutoTokenizer
import transformers
import torch

model = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'Tell me a story about a person writing a tutorial for installing llama 2 where every letter starts with s\n',
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Now back in your CLI, type the following:

python llama2_test.py

Voila! Output generated:

You can ask Llama 2 anything by changing the 16th line in the ‘llama2_test.py’ script. Here’s another one:

Pretty cool huh? Llama birds!?

Also, you can use it to explain difficult concepts in your studies:

Note that Latex equations are generated in the above. To see the actual equation, well, you can copy and paste it into Bard 😉 or use this.

When you are done, to deactivate the virtual environment, type this at the command prompt:

deactivate

Hope this helps and you have fun!