How to Install and Run Llama2 Locally on Windows for Free
Are you interested in running Llama2, the powerful language model, locally on your Windows machine? Llama2 is known for its impressive language generation capabilities, and running it on your own system can be a great way to harness its potential. In this guide, we’ll walk you through the step-by-step process of installing and running Llama2 on your Windows computer for free.
Table of Contents
- Introduction
- Downloading the CUDA Toolkit
- Installing CMake
- Installing Python 3.10
- Downloading the Llama2 GGUF File
- Checking System Requirements
- Installing the Rust Compiler
- Creating the install_llama.ps1 File
- Setting Up Your Project Directory
- Running the install_llama.ps1 File
- Creating a Python Script to Run Llama2
- Conclusion
- FAQs
Introduction
Llama2 is a remarkable language model developed by Meta, and the quantized GGUF builds used in this guide are distributed through Hugging Face. It can be incredibly useful for various natural language processing tasks. To get started, you’ll need to follow these steps carefully.
Downloading the CUDA Toolkit
Before you can run Llama2 on your Windows machine, you need to ensure that you have the necessary tools installed. Llama2 leverages the power of GPUs, and the CUDA Toolkit is essential for this purpose. Here’s how to download it:
- Visit the NVIDIA CUDA Toolkit download page.
- Choose your Windows version.
- Select the “exe(local)” option for download.
- Download the CUDA Toolkit, which is approximately 3.1 GB in size.
- During the installation, don’t change any settings; simply hit “Next” for each step.
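Once the installer finishes, you can sanity-check the install. Here is a minimal sketch (my addition, not part of the original guide) that assumes the installer added nvcc, the compiler that ships with the CUDA Toolkit, to your PATH:

```python
import shutil

def cuda_toolkit_on_path():
    # nvcc ships with the CUDA Toolkit; if Windows can find it on PATH,
    # the Toolkit installed correctly.
    return shutil.which("nvcc") is not None

print("CUDA Toolkit found:", cuda_toolkit_on_path())
```

If this prints False, re-run the installer or add the CUDA bin directory to your PATH manually before continuing.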
Installing CMake
llama-cpp-python is compiled from source during installation, and CMake drives that build. Download the latest Windows x64 installer from cmake.org, run it, and choose the option that adds CMake to the system PATH.
Installing Python 3.10
This guide uses Python 3.10. Download the Windows installer from python.org, run it, and tick “Add Python to PATH” on the first screen of the installer.
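The install script further below creates its virtual environment from whichever python is first on PATH, so it’s worth confirming that is the interpreter you just installed. A quick sketch (my addition, not in the original guide):

```python
import sys

# Report the exact interpreter version; the guide was written
# against Python 3.10, so warn if something else is on PATH.
version = sys.version_info
print(f"Python {version.major}.{version.minor}.{version.micro}")
if (version.major, version.minor) != (3, 10):
    print("Warning: this guide assumes Python 3.10")
```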
Downloading Llama2 GGUF File
Next, you’ll need to download the Llama2 GGUF file, which contains the quantized model weights you’ll be running. Two versions are available: the 7 billion parameter version and the 13 billion parameter version. Make sure your system meets the hardware requirements for the version you choose.
- For the 7 billion parameter version, download llama-2-7b-chat.Q4_0.gguf from TheBloke/Llama-2-7B-chat-GGUF on Hugging Face.
- For the 13 billion parameter version, download llama-2-13b-chat.Q4_0.gguf from TheBloke/Llama-2-13B-chat-GGUF on Hugging Face.
Please note that the 7 billion parameter version requires at least an NVIDIA GeForce GTX 1650 Ti, while the 13 billion parameter version requires at least an NVIDIA GeForce RTX 3050. Ensure your laptop meets these requirements for optimal performance.
Checking System Requirements
It’s crucial to verify that your laptop meets the hardware requirements mentioned above. Llama2’s performance depends on having the right GPU, so ensure your system is up to the task.
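One way to check which NVIDIA GPU you have is to ask nvidia-smi, the command-line tool that ships with NVIDIA’s drivers. This is a hedged sketch (my addition); the query flags used are standard nvidia-smi options:

```python
import subprocess

def detect_nvidia_gpu():
    # --query-gpu asks nvidia-smi for specific fields, and
    # --format=csv,noheader keeps the output to a single clean line.
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # nvidia-smi missing: no NVIDIA driver installed

gpu = detect_nvidia_gpu()
print(gpu if gpu else "No NVIDIA GPU detected")
```

Compare the reported GPU name and memory against the minimums listed above before choosing a model size.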
Installing the Rust Compiler
Some of the Python dependencies installed below (such as the tokenizers package pulled in by transformers) are built from Rust source when no prebuilt wheel is available, so install the Rust toolchain via rustup and go for the 64-bit setup.
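With CMake and Rust in place, here is a quick sketch (my addition, not part of the original guide) to confirm both build tools are discoverable on PATH before running the install script:

```python
import shutil

# The package builds below need both of these tools on PATH;
# shutil.which returns the full path to each, or None if missing.
for tool in ("cmake", "rustc"):
    path = shutil.which(tool)
    print(f"{tool}: {path if path else 'NOT FOUND'}")
```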
Creating the install_llama.ps1 File
Now, let’s create a PowerShell script to install the necessary Python packages and configure your environment for Llama2. Create a file named install_llama.ps1 in your project directory and paste the following code into it:

```powershell
python -m venv ./venv
./venv/scripts/activate.ps1
$env:FORCE_CMAKE=1
$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
pip install llama-index
pip install transformers
pip install torch
```
This script sets up a virtual environment, activates it, and installs the required packages for running Llama2.
Setting Up Your Project Directory
At this point, your project folder should look like this:
- install_llama.ps1
- Your virtual environment (venv)
- Other project files
Running the install_llama.ps1 File
Open your terminal and navigate to your project directory. Run the install_llama.ps1 file by executing the following command:

```powershell
./install_llama.ps1
```

If PowerShell refuses to run the script, your execution policy may be blocking local scripts; allowing them for the current user (for example with Set-ExecutionPolicy -Scope CurrentUser RemoteSigned) resolves this.
This will install the necessary packages and configure your environment for running Llama2.
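To confirm the script did its job, you can check from inside the activated venv that each package it installs is now importable. A small sketch (my addition, not part of the original guide):

```python
import importlib.util

# These are the four packages install_llama.ps1 pip-installs; find_spec
# reports whether each could be imported, without actually importing it.
for package in ("llama_cpp", "llama_index", "transformers", "torch"):
    found = importlib.util.find_spec(package) is not None
    print(f"{package}: {'installed' if found else 'missing'}")
```

If any package reports missing, re-run the install script and watch its output for build errors.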
Creating a Python Script to Run Llama2
Now that your environment is set up, you can create a Python script to interact with Llama2. Here’s a sample script to get you started:
```python
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

# model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"
# model_url = "https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF/resolve/main/llama-2-7b-chat.Q4_0.gguf"

path = "./llama-2-13b-chat.Q4_0.gguf"

llm = LlamaCPP(
    # You can pass in the URL to a GGUF model to download it automatically
    model_url=None,
    # Or set the path to a pre-downloaded model instead of model_url
    model_path=path,
    temperature=0.1,
    max_new_tokens=256,
    # Llama2 has a context window of 4096 tokens, but we set it lower
    # to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__(); set n_gpu_layers to at least 1
    # to use the GPU (28, 29, or 30 layers worked best on my setup)
    model_kwargs={"n_gpu_layers": 10},
    # transform inputs into the Llama2 chat prompt format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

response_iter = llm.stream_complete("Can you write me a poem about fast cars?")
for response in response_iter:
    print(response.delta, end="", flush=True)
```
This script demonstrates how to initialize Llama2 and use it to generate text based on a prompt. Running it streams the output token by token; here is a sample of what it produced:

Fast cars, sleek and bold, Roaring tales of speed untold, On the open road, they dance and soar, In their thunderous engine's roar.
Conclusion
Congratulations! You’ve successfully installed and run Llama2 locally on your Windows machine. Now you can harness the power of this language model for various text generation tasks. Happy Generation!
FAQs
1. What are the hardware requirements for running Llama2 on Windows?
- For the 7 billion parameter version, you need at least NVIDIA GeForce GTX 1650 Ti.
- For the 13 billion parameter version, a minimum of NVIDIA GeForce RTX 3050 is required.
2. Can I run Llama2 on other operating systems?
- Llama2 can be run on various operating systems, but this guide focuses on Windows installation.
3. How can I optimize Llama2’s performance on my Windows machine?
- Ensure that you have a compatible GPU and follow the installation steps carefully to optimize performance.
4. Can I fine-tune Llama2 for specific tasks?
- Yes, you can fine-tune Llama2 for specific tasks using Hugging Face’s Transformers library.
5. Where can I find more resources and support for Llama2?
- You can follow @AyushmanPranav on LinkedIn for updates and tips related to Llama2.
- If this article helped you, you can support me by buying me a coffee.
For further assistance, feel free to reach out.