Using Ollama to self-host LLMs

Self-hosting your own LLM solution has never been easier. In this article I will be using Ollama to generate poems for my website. I don’t have too many active users, and generating some content to start a snowball effect may be useful.
After installing Ollama with the install script, I can start the Ollama server with ollama serve. To download the model we want to use, we run: ollama pull llama2
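Before sending prompts, it can be handy to confirm that the server is up and the pull actually finished. Here is a minimal sketch of that check, assuming the server listens on the default local port and that its /api/tags endpoint replies with a JSON object holding a "models" list whose entries have a "name" field. The helper names (model_names, has_model, fetch_tags) are my own for illustration, and I use only the standard library here so the snippet has no extra dependencies.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
TAGS_URL = "http://localhost:11434/api/tags"


def model_names(tags_reply):
    """Extract model names from the parsed JSON body of /api/tags."""
    return [entry["name"] for entry in tags_reply.get("models", [])]


def has_model(tags_reply, model):
    """True if `model` is listed, with or without a tag such as ':latest'."""
    return any(
        name == model or name.split(":")[0] == model
        for name in model_names(tags_reply)
    )


def fetch_tags(url=TAGS_URL, timeout=5):
    """Ask a running Ollama server which models it has stored locally."""
    with urllib.request.urlopen(url, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the server running, has_model(fetch_tags(), "llama2") should come back true once the pull has completed. If fetch_tags raises a connection error, the server is probably not started yet.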
And to interface with it from within a program, we can do something like:
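import json
import requests

# Request body: which model to use and the prompt to send
data = {
    "model": "llama2",
    "prompt": "hello how are you"
}
response = requests.post("http://localhost:11434/api/generate", json=data)
# The reply is streamed as one JSON object per line
responses = response.content.decode("utf-8").strip().split("\n")
full_response = ""
for line in responses:
    parsed = json.loads(line)
    # Each object carries the next piece of the generated text
    full_response += parsed["response"]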
print(full_response)
Really easy, and now we don’t have to deal with external requests. It does seem to be too intensive for a droplet, but the tasks I have can be run on my local machine and the results uploaded.
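For batch jobs like mine, where nothing needs to appear token by token, the line-by-line parsing can be skipped by asking for a single reply. A minimal sketch, assuming the same local endpoint as above and the "stream" field of the generate API set to false, so the server answers with one JSON object that has the whole text under "response". The helper names build_payload and generate are hypothetical, and I swapped requests for the standard library's urllib so the snippet needs nothing installed.

```python
import json
import urllib.request

# Same local endpoint as in the example above.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="llama2"):
    """Build a request body that asks for one complete reply, not a stream."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, model="llama2", url=OLLAMA_URL, timeout=300):
    """Send the prompt to a running Ollama server and return the generated text."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        # With streaming off, the body is a single JSON object.
        return json.loads(reply.read().decode("utf-8"))["response"]
```

Calling generate("hello how are you") should then return the full answer as one string. The generous timeout is a guess on my part, since generation on a modest machine can take a while.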
Some people may not like having to run what amounts to a separate micro-service in order to make these prompts, but getting to try different models with almost no code changes makes it really awesome for doing some hyper-parameter selection.
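To make that concrete, here is a rough sketch of how a small sweep could be set up: it only builds the request bodies for every combination of model and temperature, and each one can then be posted to the same /api/generate endpoint used earlier. The build_grid helper is hypothetical, "mistral" is just an example of a second model that would need to be pulled first, and I am assuming the API's "options" field with a "temperature" entry.

```python
import itertools


def build_grid(prompt, models, temperatures):
    """Return one request body per (model, temperature) combination."""
    return [
        {
            "model": model,
            "prompt": prompt,
            "stream": False,  # ask for a single JSON reply per request
            "options": {"temperature": temperature},
        }
        for model, temperature in itertools.product(models, temperatures)
    ]


# Example sweep: two models, two temperatures, four requests in total.
grid = build_grid("write a short poem", ["llama2", "mistral"], [0.2, 0.8])
```

Only the "model" and "options" values change between runs, which is what makes this kind of comparison cheap: the calling code stays exactly the same.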


