Mehran Ghamaty

Summary

The article outlines how to use Ollama for self-hosting large language models (LLMs) like Llama 2 to generate content, such as poems, for a website with a moderate number of users.

Abstract

The article provides a guide on using Ollama to self-host LLMs, specifically for generating poems to enhance content on a website with limited user activity. The author emphasizes how easy it is to set up Ollama using the provided installation script and to start the server with a single command. The article details the process of pulling a specific model, Llama 2, and demonstrates how to interface with the model programmatically using HTTP POST requests. This approach avoids relying on external services and lets tasks run locally, although it may be too resource-intensive for some hosting options such as a droplet. The author appreciates being able to switch between models and adjust hyper-parameters with minimal code changes, despite the potential drawback of managing a separate micro-service for prompt generation.

Opinions

  • The author finds value in self-hosting LLMs to create content, anticipating a "snowball effect" that could be beneficial for their website.
  • There is a positive sentiment towards the simplicity of installing and running Ollama, as well as the ability to pull and use models like Llama 2 with straightforward commands.
  • The author acknowledges that running Ollama as a micro-service might not be ideal for everyone but personally appreciates the ease of experimenting with different models and hyper-parameters without significant code alterations.

Using Ollama to self-host LLMs

Self-hosting your own LLM solution has never been easier. In this article I will be using Ollama to generate poems for my website. I don't have too many active users, so generating some content to start a snowball effect may be useful.

After installing Ollama with the installation script, I can start the Ollama server with: ollama serve. To download the Llama 2 model, we run: ollama pull llama2
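Before calling the model from code, it can help to confirm that the server is up and that the pull succeeded. The following is a minimal sketch, not part of the original setup. It assumes the server listens on Ollama's default address (http://localhost:11434) and that GET /api/tags returns a JSON object with a "models" list whose entries carry a "name" such as "llama2:latest". The helper names fetch_tags and model_is_available are hypothetical, and the sketch uses the standard library's urllib instead of requests so that it needs no extra dependency.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default address of `ollama serve`


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """GET /api/tags, which lists the models pulled onto this machine."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def model_is_available(tags: dict, model: str) -> bool:
    """Return True if `model` appears in an /api/tags response.

    Pulled models are usually listed with a tag suffix (e.g. "llama2:latest"),
    so a bare name like "llama2" also matches "llama2:<tag>".
    """
    for entry in tags.get("models", []):
        name = entry.get("name", "")
        if name == model or name.startswith(model + ":"):
            return True
    return False
```

With the server running, model_is_available(fetch_tags(), "llama2") should return True once the pull has finished. If it returns False, the pull is the first thing to re-run.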

To interface with the model from within a program, we can do something like:

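import json
import requests

# Request body for Ollama's generate endpoint. By default the server streams
# the answer back as newline-delimited JSON, one object per chunk of text.
data = {
    "model": "llama2",
    "prompt": "hello how are you",
}

response = requests.post("http://localhost:11434/api/generate", json=data)
response.raise_for_status()  # fail loudly if the server returned an error

# Each non-empty line is one JSON chunk; concatenate their "response" fields.
full_response = ""
for line in response.content.decode("utf-8").splitlines():
    if not line.strip():
        continue
    parsed = json.loads(line)
    full_response += parsed.get("response", "")

print(full_response)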
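The loop above waits for the whole body before parsing it. Because the generate endpoint streams one JSON object per line by default, the chunks can also be consumed as they arrive, which is handy for longer poems. This is a minimal sketch using only the standard library. It assumes each chunk carries a "response" text fragment and that the final chunk has "done" set to true. The function names iter_text and stream_generate are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the text fragment from each newline-delimited JSON chunk."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break  # the final chunk signals the end of generation


def stream_generate(prompt: str, model: str = "llama2",
                    url: str = "http://localhost:11434/api/generate") -> Iterator[str]:
    """POST a prompt and yield text fragments as the server sends them."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        # Iterating over the HTTP response yields it line by line, as bytes.
        yield from iter_text(resp)
```

For example, looping over stream_generate("Write a short poem about autumn") and printing each piece with print(piece, end="", flush=True) shows the poem as it is being generated. Keeping the parsing in iter_text separate from the network call also makes it easy to test without a running server.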

Really easy, and now we don't have to deal with external requests. Running the model seems to be too resource-intensive for a droplet, but the tasks I have can be run on my local machine and the results uploaded.
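One way to follow that workflow is to generate a batch of poems on the local machine, save them to a file, and upload the file to the droplet. This is a minimal sketch under several assumptions. It sends "stream": false, so Ollama answers with a single JSON object whose "response" field holds the full text. The topics, the prompt wording, the poems.json file name, and all helper names are hypothetical. The upload step itself (scp, rsync, or similar) is left out.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable, Dict, Iterable

OLLAMA_GENERATE = "http://localhost:11434/api/generate"  # assumed local server


def poem_prompt(topic: str) -> str:
    """Hypothetical prompt template for one poem."""
    return f"Write a short poem about {topic}."


def generate_once(prompt: str, model: str = "llama2") -> str:
    """Non-streaming call: with "stream": False the server returns one JSON object."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    req = urllib.request.Request(OLLAMA_GENERATE, data=body.encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]


def build_batch(topics: Iterable[str],
                generate: Callable[[str], str] = generate_once) -> Dict[str, str]:
    """Generate one poem per topic; `generate` can be swapped out for testing."""
    return {topic: generate(poem_prompt(topic)) for topic in topics}


def save_poems(poems: Dict[str, str], path: Path) -> None:
    """Write {topic: poem} to a JSON file that can be uploaded later."""
    path.write_text(json.dumps(poems, indent=2, ensure_ascii=False), encoding="utf-8")
```

Calling save_poems(build_batch(["autumn", "the sea"]), Path("poems.json")) on the local machine would produce a file that the website can serve, so the droplet never has to run the model.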

Some people may not like having to run what amounts to a separate micro-service in order to make these prompts, but being able to try different models with almost no code changes makes hyper-parameter selection really painless.
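Because only the request body changes, trying another model or another sampling setting amounts to editing a dictionary. This minimal sketch builds one request payload per combination. It assumes the generate endpoint accepts an "options" object with sampling parameters such as "temperature" and "seed". The model list ("llama2", "mistral") and the values are illustrative only, and every model must be pulled first. The function name sweep_payloads is hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence


def sweep_payloads(prompt: str, models: Sequence[str],
                   temperatures: Sequence[float], seed: int = 42) -> List[Dict]:
    """Return one /api/generate request body per (model, temperature) pair.

    A fixed seed helps keep runs comparable, so differences are more likely
    to come from the model or the temperature than from sampling noise.
    """
    return [
        {
            "model": model,
            "prompt": prompt,
            "stream": False,  # one JSON object per request, easier to compare
            "options": {"temperature": temperature, "seed": seed},
        }
        for model, temperature in product(models, temperatures)
    ]
```

Each payload can be posted the same way as the data dictionary in the earlier snippet, and with "stream" set to False the generated text is in the "response" field of the single JSON reply. Comparing those outputs side by side is then a matter of looping over the list.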
