Installing text-generation-webui to host LLMs for chat or for training [part 3 of a series]
In the previous post in this series, we looked at how to create an EC2 instance on AWS cloud with a GPU card.
The open source project text-generation-webui [https://github.com/oobabooga/text-generation-webui] provides an easy-to-use interface for downloading, hosting, and using LLMs for inference (text prediction/generation). It can also be used for training, although that is not its main function.
A basic user interface to chat with an LLM-powered AI. The interface is provided by the open source project text-generation-webui and includes interfaces for loading models and training.
ref = https://repost.aws/articles/ARQ0Tz9eorSL6EAus7XPMG-Q/how-to-install-textgen-webui-on-aws
Installing text-generation-webui, and running it at launch via a custom Ubuntu service.
First, you need to connect to the EC2 instance via SSH.
By default, an EC2 instance does not have a static IP address. You can of course assign one, but that incurs a running cost.
If you only occasionally need to connect to the EC2 instance, see this post that provides a workaround.
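A typical connect command looks like the following sketch — the key path and IP address here are hypothetical placeholders; substitute your own .pem file and the instance's current public IP from the EC2 console:

```shell
# Hypothetical values - replace with your own key file and the
# instance's current public IP (shown in the EC2 console):
KEY_PATH="path-to-my.pem"
EC2_IP="203.0.113.10"

# The default user on Ubuntu AMIs is "ubuntu":
CONNECT_CMD="ssh -i $KEY_PATH ubuntu@$EC2_IP"
echo "$CONNECT_CMD"
```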
Now that you are connected to the EC2 instance, you can run a script to install the text-generation-webui.
note: if something goes wrong with the install, you need to delete the installer_files folder — otherwise, subsequent re-installations will not work.
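The cleanup amounts to removing that one folder before retrying. A sketch — the demo below uses a temporary directory so it is safe to run anywhere; in practice, point INSTALL_DIR at the folder where text-generation-webui was installed:

```shell
# Demo with a temporary directory standing in for the install folder
# (in practice, use the real install location, e.g. ~/text-generation-webui):
INSTALL_DIR="$(mktemp -d)"
mkdir -p "$INSTALL_DIR/installer_files"

# A failed install leaves this folder behind; remove it before retrying:
rm -rf "$INSTALL_DIR/installer_files"

[ ! -d "$INSTALL_DIR/installer_files" ] && echo "clean - safe to re-run the installer"
```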
One-time setup scripts:
- One-step simple install of text-generation-webui with GPU support. Also downloads a small LLM (Mistral 7B).
curl https://gist.githubusercontent.com/mrseanryan/70b09d405e77d0881d01ca288d2476cc/raw | bash
The source code can be seen here:
https://gist.github.com/mrseanryan/70b09d405e77d0881d01ca288d2476cc
Alternative approach:
- Step-by-step manual Install of text-generation-webui with GPU support
https://gist.github.com/mrseanryan/1f6ee66fde867ac37e9c57da2d7a3f09
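To have the server start automatically at boot, as the section title mentions, the startup script can be wrapped in a systemd unit. This is a minimal sketch under assumptions — the unit name, user, and paths are hypothetical and assume the one-click install lives at /home/ubuntu/text-generation-webui with its start_linux.sh launcher:

```ini
# /etc/systemd/system/text-generation-webui.service
# (hypothetical unit name and paths - adjust User, WorkingDirectory
#  and ExecStart to match your install)
[Unit]
Description=text-generation-webui server
After=network.target

[Service]
Type=simple
User=ubuntu
WorkingDirectory=/home/ubuntu/text-generation-webui
ExecStart=/home/ubuntu/text-generation-webui/start_linux.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After saving the file, register and start it with: sudo systemctl daemon-reload && sudo systemctl enable --now text-generation-webui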
In the SSH console output, you should see the text-generation-webui HTTP server start up.
It will output a local link, like http://127.0.0.1:7860 and also a public share URL, like: Running on public URL: https://91234123123123.gradio.live
Hold the CTRL key and left-click on the public URL.
The text-generation-webui web page should load in your default browser.
tip: use an SSH tunnel for a simple localhost URL
To use a simple localhost URL in your browser, like http://localhost:7860, you can create an SSH tunnel by running this command:
ssh -i path-to-my.pem -N -L 7860:localhost:7860 ubuntu@<public IP address of EC2 instance>
Now, the local link should work from your browser: http://localhost:7860/
tip: check the Ubuntu firewall
If you have difficulty connecting, check whether the Ubuntu firewall is active:
sudo ufw status
If the firewall is active, you can configure it to open the necessary ports:
sudo ufw allow 5000/tcp
sudo ufw allow 7860/tcp
For more details about the Ubuntu firewall, see https://www.cyberciti.biz/faq/how-to-configure-firewall-with-ufw-on-ubuntu-20-04-lts/#Open_ports_with_ufw
tip: checking disk space
The LLMs and the Python libraries can consume a lot of disk space.
To check disk space from the Ubuntu terminal:
df -h
To see what folders are taking up the most space:
sudo du -cha --max-depth=1 / | grep -E "M|G"
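Most of that space usually sits in the downloaded models. The same du pattern, sorted largest-first, shows which model folders to consider deleting — the demo below uses a temporary directory with dummy files so it is safe to run; in practice, point it at the models folder of your install (e.g. ~/text-generation-webui/models):

```shell
# Demo on a temporary directory standing in for the models folder
# (in practice, use the real path, e.g. ~/text-generation-webui/models):
DEMO_DIR="$(mktemp -d)"
mkdir -p "$DEMO_DIR/model-a" "$DEMO_DIR/model-b"
dd if=/dev/zero of="$DEMO_DIR/model-a/weights.bin" bs=1024 count=2048 2>/dev/null
dd if=/dev/zero of="$DEMO_DIR/model-b/weights.bin" bs=1024 count=512 2>/dev/null

# Largest folders first - the same pattern works on the real models directory:
du -sh "$DEMO_DIR"/* | sort -rh
```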