Summary

This article provides instructions on how to install and run the Vicuna model on a local computer using either a GPU or CPU.

Abstract

The article "Vicuna on Your CPU & GPU: Best Free Chatbot According to GPT-4" explains how to install and run the Vicuna model on a local computer. The model can run on either a GPU or a CPU: the GPU route uses the GPTQ-quantized version of the model, and the CPU route uses the GGML-quantized version. The article provides step-by-step instructions covering the installation of Conda, the creation of a virtual environment, the installation of the web interface, and the download of the quantized model. It also links to a YouTube video for additional guidance and invites readers to follow the author for more AI-related content.

Vicuna on Your CPU & GPU: Best Free Chatbot According to GPT-4

In this article I will show you how to run the Vicuna model on your local computer using either your GPU or just your CPU.

At the moment, this article contains only the commands used to install the Vicuna model. I will add more details soon. If you can’t wait for that, feel free to watch my YouTube video in the meantime (linked at the end of this article).

Foundation: Install Conda

This step is recommended whether you run the Vicuna model on your GPU or on your CPU. Using virtual environments helps to avoid version mismatches when working on multiple projects.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sha256sum Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
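
Note that sha256sum only prints the installer's hash; it still has to be compared with the value published on the Miniconda download page. As a rough sketch, that comparison could be scripted in Python. The function names and the placeholder hash in the usage comment are illustrative assumptions, not part of the original commands.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large installers are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_matches(path, expected_hex):
    """Compare a file's digest with a published hash (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Hypothetical usage: paste the hash published for your installer version.
# ok = sha256_matches("Miniconda3-latest-Linux-x86_64.sh", "<published hash>")
```

If the function returns False, the download is incomplete or corrupted and the installer should not be run.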

GPU Installation (GPTQ Quantized)

First, let’s create a virtual environment:

conda create -n vicuna python=3.9
conda activate vicuna

Next, we will install the web interface that will allow us to interact with the Vicuna model in a visually appealing way:

git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
pip install -r requirements.txt

To use the GPTQ-quantized version of the model (which reduces VRAM requirements from 28 GB to 10 GB), we need to install the following repository in a subfolder:

mkdir repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install
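
The VRAM saving from quantization comes mostly from the number of bits stored per weight. Here is a back-of-the-envelope sketch, assuming roughly 13 billion parameters (taken from the "13b" in the model name), 16-bit weights for the unquantized model, and counting the weights only. Activations, the attention cache, and quantization metadata are ignored, which is why the real-world figures of 28 GB and 10 GB quoted in this article are higher.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_13B = 13e9  # assumption: "13b" means about 13 billion parameters

fp16_gb = weight_memory_gb(PARAMS_13B, 16)  # assumed 16-bit unquantized weights
int4_gb = weight_memory_gb(PARAMS_13B, 4)   # 4-bit GPTQ weights

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
```

Under these assumptions the weights shrink from about 26 GB to about 6.5 GB, which is in line with the reduction reported above.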

Now it’s time to download the quantized model:

cd ../..
python download-model.py anon8231489123/vicuna-13b-GPTQ-4bit-128g

Finally, everything is set up and we can launch the web interface and interact with the Vicuna model:

python server.py --chat --model anon8231489123_vicuna-13b-GPTQ-4bit-128g --model_type LLaMA --wbits 4 --groupsize 128 --share
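
Two details of this launch command are easy to miss: the model name is the Hugging Face repository ID with the slash replaced by an underscore, and the --wbits and --groupsize values match the "4bit-128g" suffix of the model name. The following sketch rebuilds the command from those parts. The helper names are hypothetical, and the naming rule is simply what can be observed in the commands of this article.

```python
import shlex


def local_model_name(repo_id):
    """Folder name as used in this article: '/' in the repo ID becomes '_'."""
    return repo_id.replace("/", "_")


def build_server_command(repo_id, wbits=4, groupsize=128, share=True):
    """Assemble the server.py call used above as an argument list."""
    cmd = [
        "python", "server.py", "--chat",
        "--model", local_model_name(repo_id),
        "--model_type", "LLaMA",
        "--wbits", str(wbits),          # bits per weight of the quantized model
        "--groupsize", str(groupsize),  # quantization group size
    ]
    if share:
        cmd.append("--share")
    return cmd


command = build_server_command("anon8231489123/vicuna-13b-GPTQ-4bit-128g")
print(shlex.join(command))
```

Printing the joined list reproduces the command shown above. Passing share=False leaves out the --share flag if you only want to use the interface on your own machine.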

CPU Installation (GGML Quantized)

Again, let’s first create a virtual environment:

conda create -n vicuna_cpu python=3.9
conda activate vicuna_cpu

Next, we will clone and install the llama.cpp repository:

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make

To download the quantized model binaries, I used the Python module huggingface-hub, which I installed first:

pip install huggingface-hub

Then I downloaded the binaries of the quantized model by running the following code in the Python interpreter:

python
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="eachadea/ggml-vicuna-13b-4bit", filename="ggml-vicuna-13b-4bit.bin", local_dir="./models")
>>> exit()
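
If you prefer a script over the interactive interpreter, the same download can be wrapped in a small function that skips the download when the file is already present. This is only a sketch: the function name ensure_model and its download parameter are my own illustrative additions, while hf_hub_download and its arguments are the ones used above.

```python
from pathlib import Path


def ensure_model(repo_id, filename, local_dir="./models", download=None):
    """Return the local model path, downloading the file only if it is missing."""
    target = Path(local_dir) / filename
    if target.exists():
        return target  # already downloaded, nothing to do
    if download is None:
        # Imported lazily so the function can be defined without the package installed.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download
    download(repo_id=repo_id, filename=filename, local_dir=local_dir)
    return target


# Usage with the model from this article (triggers a download of several GB):
# ensure_model("eachadea/ggml-vicuna-13b-4bit", "ggml-vicuna-13b-4bit.bin")
```

The optional download argument exists only so that the function can be tried out with a stand-in downloader; in normal use it can be left out.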

Perfect, now we can interact with the Vicuna model by using the following command:

./main -m ./models/ggml-vicuna-13b-4bit.bin --color -f ./prompts/alpaca.txt -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
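
Since this command packs many flags into one line, here is a sketch that assembles it from labeled parts. The flag descriptions reflect my understanding of the llama.cpp options at the time of writing, and the rule of leaving one CPU core free when no thread count is given is an assumption of mine, not something llama.cpp requires. The helper name is hypothetical.

```python
import os
import shlex


def build_llama_command(model_path, threads=None):
    """Assemble the ./main call used above as an argument list."""
    if threads is None:
        # Assumption: use all cores but one, so the system stays responsive.
        threads = max(1, (os.cpu_count() or 2) - 1)
    return [
        "./main",
        "-m", model_path,              # path to the GGML model file
        "--color",                     # colorize the output
        "-f", "./prompts/alpaca.txt",  # file containing the initial prompt
        "-ins",                        # instruction mode
        "-b", "256",                   # batch size for prompt processing
        "--top_k", "10000",            # top-k sampling cutoff
        "--temp", "0.2",               # sampling temperature
        "--repeat_penalty", "1",       # 1 means no repetition penalty
        "-t", str(threads),            # number of CPU threads
    ]


command = build_llama_command("./models/ggml-vicuna-13b-4bit.bin", threads=7)
print(shlex.join(command))
```

With threads=7 the printed line matches the command above. Adjust the thread count to the number of cores available on your machine.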

That’s it! I hope this early version of the article was already helpful.

Final Thoughts

I hope you enjoyed this article. I will publish more articles about how to use AI models and how they work in the future. Follow me if that sounds interesting to you. :-)

Isn’t collaboration great? I’m always happy to answer questions or discuss ideas proposed in my articles. So don’t hesitate to reach out to me! 🙌 Also, make sure to subscribe or follow so you don’t miss new articles.

YouTube: https://bit.ly/3LqA1Os

LinkedIn: http://bit.ly/3i5Sc1g