The article provides a tutorial on running state-of-the-art large language models, LLaMA and Alpaca, on a local computer using the dalai library, offering a cost-effective and space-efficient alternative to GPT models.
Abstract
In this article, the author demonstrates how to utilize the dalai library to run the LLaMA and Alpaca language models on a personal computer. The LLaMA model, known for outperforming GPT-3 while being significantly smaller, can be installed and served locally with ease. The Alpaca model, a fine-tuned version of LLaMA, exhibits ChatGPT-like behavior and was developed at a fraction of the cost of GPT-3. The article also touches on the leaked nature of the LLaMA weights, the non-commercial license restrictions, and the impressive feat of running such models on minimal hardware like a Raspberry Pi. The dalai library simplifies the process of installing and running these models, and it also provides an API for integration into custom applications, although commercial use is restricted.
Opinions
The author expresses amazement at the performance and size efficiency of the LLaMA and Alpaca models, particularly their ability to outperform GPT-3.
There is an appreciation for the cost-effectiveness of the Alpaca model's development, highlighting the $600 cost versus the $5 million spent on GPT-3.
The author notes the irony in OpenAI's text-davinci-003 model inadvertently aiding the development of a cheaper alternative to ChatGPT.
The article conveys excitement about the accessibility of running advanced language models on local machines, which was previously unfeasible for most users.
There is a sense of anticipation for future developments in deep learning, with the author expressing intent to cover more related topics.
The author encourages collaboration and engagement with the content, inviting readers to reach out with questions or ideas.
LLaMA & Alpaca: “ChatGPT” On Your Local Computer 🤯 | Tutorial
In this article I will show you how you can run state-of-the-art large language models on your local computer. Yes, you've read that right.
For this we will use the dalai library, which allows us to run the foundational language model LLaMA as well as the instruction-following Alpaca model. While the LLaMA model is a foundational (or broad) language model that predicts the next token (word) based on a given input sequence (sentence), the Alpaca model is a fine-tuned version of LLaMA capable of following instructions (which you can think of as ChatGPT-like behaviour). What's even more impressive, both models achieve comparable results to, or even outperform, their GPT counterparts while still being small enough to run on your local computer. In this article I will show you that it only takes a few steps (thanks to the dalai library) to run "ChatGPT" on your local computer.
If you like videos more, feel free to check out my YouTube video to this article:
LLaMA Model
The LLaMA model is a foundational language model. While language models are, strictly speaking, probability distributions over sequences of words or tokens, it is easier to think of them as next-token predictors. Based on a given sequence of words, a language model predicts the most plausible next word. I'm sure you've seen this behaviour before, where you start a sentence and ChatGPT, for example, continues it.
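To make the "next-token predictor" idea concrete, here is a deliberately tiny sketch. It is nothing like a real LLM (which models long contexts with billions of parameters); it just counts which word follows which in a toy corpus and predicts the most frequent successor:

```javascript
// Build a bigram model: for each word, count how often each other word follows it.
function buildBigramModel(corpus) {
  const counts = {};
  const words = corpus.toLowerCase().split(/\s+/);
  for (let i = 0; i < words.length - 1; i++) {
    const cur = words[i];
    const next = words[i + 1];
    counts[cur] = counts[cur] || {};
    counts[cur][next] = (counts[cur][next] || 0) + 1;
  }
  return counts;
}

// Predict the most plausible next word: the successor seen most often.
function predictNext(model, word) {
  const successors = model[word.toLowerCase()];
  if (!successors) return null;
  return Object.entries(successors).sort((a, b) => b[1] - a[1])[0][0];
}

const model = buildBigramModel(
  "the cat sat on the mat the cat ate the fish"
);
console.log(predictNext(model, "the")); // "cat" (seen twice after "the")
```

Real language models do the same thing in spirit, except the "counting" is replaced by a neural network conditioned on the entire preceding sequence.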
What makes the LLaMA model special? Well, despite being 13x smaller than GPT-3, the LLaMA model still outperforms GPT-3 on most benchmarks. And we all know how good the GPT-3 and ChatGPT models are. This is truly impressive, and it is also the reason why we can run a ChatGPT-like model on our local computer. One person even managed to run the LLaMA model on a Raspberry Pi, which is insane.
Originally, the LLaMA model was intended to be used for research purposes only, and model checkpoints were to be requested from Meta. But the weights of the model have been leaked, and now anyone can access them. However, the model is still subject to a non-commercial license.
Install The LLaMA Model
npx dalai llama install 7B
This will install the model on your local computer. I know, it's almost too easy to be true. Be aware that the LLaMA-7B model takes up around 31 GB on your computer, so make sure you have enough space left. Also, I noticed that the dalai library is under heavy development, so the installation command might have changed by the time you read this (it happened to me three times within one day). You can find the repository here: https://github.com/cocktailpeanut/dalai
Run The LLaMA Model
npx dalai serve
And there you go. This should start a local web UI, which you can then open at http://localhost:3000 in your browser. It has become this easy to run a large language model on your local computer. I'm really amazed.
Alpaca Model
The Alpaca model is a fine-tuned version of the LLaMA model. More precisely, it is an instruction-following model, which can be thought of as exhibiting "ChatGPT behaviour". What's really impressive (I know I've used this word a bunch of times now) about the Alpaca model is that the fine-tuning process cost less than $600 in total. For comparison, training the GPT-3 model in 2020 cost about $5,000,000. I know the LLaMA model did a lot of the heavy lifting here, but fine-tuning it into a ChatGPT-like model for less than $600 is still mind-blowing.
How were Taori et al. able to accomplish this? Ironically, they got some help from OpenAI (though not intentionally). Initially, they had only 175 self-instruct seed tasks. Using OpenAI's text-davinci-003 model, they expanded those seed tasks until they ended up with 52,000 instruction-following examples that they could use for supervised fine-tuning. Instead of relying on human feedback, this allowed them to fine-tune a model that is almost as good as ChatGPT while being significantly cheaper to produce.
For better understanding, here is an example of how a seed task can be modified:
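Sketched in the instruction/input/output format that the Alpaca training data uses, the idea looks roughly like this. Note that the concrete wording of both tasks below is hypothetical, made up for illustration rather than taken from the actual seed set:

```javascript
// A hand-written seed task (hypothetical wording, Alpaca-style format):
const seedTask = {
  instruction: "Classify the sentiment of the given sentence.",
  input: "I loved this movie!",
  output: "positive",
};

// text-davinci-003 is prompted with seed tasks like the one above and asked
// to generate new, varied examples in the same format. A generated example
// might look like this:
const generatedExample = {
  instruction: "Rewrite the given sentence in a more formal tone.",
  input: "hey, can u send me the report?",
  output: "Hello, could you please send me the report?",
};

console.log(Object.keys(generatedExample)); // [ 'instruction', 'input', 'output' ]
```

Repeating this expansion step is how 175 seed tasks turned into 52,000 training examples for supervised fine-tuning.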
Install The Alpaca Model
npx dalai alpaca install 7B
The Alpaca model is already available in a quantized version, so it only needs about 4 GB on your computer. As a result, installing the Alpaca model is faster than installing the LLaMA model.
Run The Alpaca Model
npx dalai serve
Bonus
The dalai library also provides an API that allows you to integrate both models into your own applications. I'm sure I don't need to say much more; many of you will get creative right away. Just remember that both models are released under a non-commercial license.
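As a rough sketch of what that integration could look like: the snippet below builds a request payload as a plain object (so it can be inspected without dalai installed) and shows, in comments, how it would be passed to dalai. This is based on my reading of the dalai README at the time of writing; given how fast the project is moving, treat the exact method names and parameters (e.g. `n_predict`, the `"alpaca.7B"` model id) as assumptions and check the repository before relying on them:

```javascript
// Build a dalai-style request payload. Parameter names here are assumptions
// mirroring llama.cpp options; verify them against the current dalai README.
function buildRequest(prompt, model = "alpaca.7B") {
  return {
    prompt,          // the text to complete / instruction to follow
    model,           // which locally installed model to use
    n_predict: 128,  // max number of tokens to generate
  };
}

// With dalai installed, tokens would be streamed back roughly like this
// (uncomment to try against a local install):
//
// const Dalai = require("dalai");
// new Dalai().request(buildRequest("What is a llama?"), (token) => {
//   process.stdout.write(token);
// });

console.log(buildRequest("What is a llama?").model); // "alpaca.7B"
```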
Final Thoughts
I hope you enjoyed this article. I will publish more articles about Deep Learning related topics in the future. I also write about topics in the field of Data Science and Data Engineering.
Isn’t collaboration great? I’m always happy to answer questions or discuss ideas proposed in my articles. So don’t hesitate to reach out to me! 🙌 Also, make sure to subscribe or follow to not miss out on new articles.