I bought a new desktop PC this year with the goal of getting back into GPU-related programming and learning about AI. I wasn’t comfortable with all the sign-up requirements for products like ChatGPT and AI image generation sites, and thought running open source equivalents locally would be better. Up until recently I’d been playing with Stable Diffusion via various UIs locally, but still needed to try running a text generation LLM. I’d previously tried running Alpaca-electron, but experienced a lot of instability issues.
After a few minutes of searching, this blog led me to text-generation-webui, which proclaims a goal of being the Automatic1111 of text generation. This sounded like just what I wanted, and it was.
Requirements / Environment
- I’m running Ubuntu, but the same process probably works on Windoze. To follow along exactly you need a terminal and `git`.
- An Nvidia GPU, although the app apparently works CPU-only too (probably very slow/limited).
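If you want to confirm the prerequisites up front, a quick check like this works (a sketch; `nvidia-smi` will simply be absent on non-Nvidia machines, which is fine for CPU-only):

```shell
# Quick check that the tools this walkthrough relies on are available.
# (nvidia-smi is Nvidia's driver utility; missing it just means CPU-only.)
missing=""
for cmd in git nvidia-smi; do
  command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
done
echo "missing tools:${missing:- none}"
```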
Setting up text-generation-webui
text-generation-webui is a webapp you can run locally that supports a whole bunch of LLM stuff that I don’t understand. But it seems to use a convention-over-configuration approach, meaning most of the defaults just work, and you can configure things once you know what you’re doing.
The steps to get up and running are simple:

- `cd` to the location you want to install it: `cd /path/to/install`
- Clone the Git repo and enter the created directory: `git clone git@github.com:oobabooga/text-generation-webui.git && cd text-generation-webui`
- Run the script `./start_linux.sh`, which automatically installs dependencies and then runs the app.

When prompted, I chose `NVIDIA` for GPU, and `N` to go with the latest CUDA versions. Once everything is downloaded, you will be up and running.
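The steps above, collected into one shell function for reference (the install path is a placeholder, and I’ve used the https clone URL here since it doesn’t require GitHub SSH keys to be set up):

```shell
# The setup steps from above as a single function.
# /path/to/install is a placeholder; replace with your own location.
setup_webui() {
  cd /path/to/install || return 1
  # https URL avoids needing GitHub SSH keys configured
  git clone https://github.com/oobabooga/text-generation-webui.git || return 1
  cd text-generation-webui || return 1
  ./start_linux.sh   # installs dependencies on first run, then starts the app
}
```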
Installing a model
The app is capable of a lot, but it can’t do anything without an LLM model to work with.
Once the app has started, navigate to http://127.0.0.1:7860/ (where the app is listening) and switch to the Model tab at the top. You now need to pick a model. I really don’t know much about these, but HuggingFace is the “GitHub of models”. You’ll find models of all different types, differing by quantization method and other stuff I don’t really understand (YET!).
I read about a recently released model that had been getting a lot of praise, OpenHermes2.5. So I looked on HuggingFace and found TheBloke/OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ, which is a quantized version of Yağız Çalık’s merged model. The notes explain that different quantization parameters are available on different branches. I chose the 4bit-32g version solely because it had “highest inference quality”.
The model variant you choose will depend on the conditions of your computing environment (eg. the VRAM of your graphics card).
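As a rough rule of thumb (weights only, ignoring context cache and other overhead), a quantized model needs about parameters × bits ÷ 8 bytes of VRAM:

```shell
# Back-of-envelope VRAM estimate for model weights:
#   bytes ≈ parameter_count * bits_per_weight / 8
# Real usage is higher: context cache, activations, overhead.
params=7000000000   # a 7B-parameter model
bits=4              # 4-bit quantization
bytes=$(( params * bits / 8 ))
echo "approx weight size: $(( bytes / 1024 / 1024 / 1024 )) GiB"
```

So a 4-bit 7B model fits comfortably on an 8 GB card, while the same model at 8 bits would be tighter.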
Whatever model you choose, enter the identifier in the Download text field in the format `username/model:branch`. So in our case it’s:

`TheBloke/OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ:gptq-4bit-32g-actorder_True`
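Just to make the format concrete, the identifier splits apart like this (plain shell parameter expansion, nothing app-specific):

```shell
# Split a username/model:branch identifier into its parts.
id="TheBloke/OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ:gptq-4bit-32g-actorder_True"
user="${id%%/*}"      # everything before the first slash
rest="${id#*/}"       # everything after the first slash
model="${rest%%:*}"   # everything before the colon
branch="${rest#*:}"   # everything after the colon
echo "user=$user model=$model branch=$branch"
```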
Clicking `Get file list` will confirm that it’s working. Then click `Download` to start pulling it to your machine. You can watch the progress in the terminal.
Once the model has downloaded, you’re ready to start chatting away. Select the `Chat` tab in the top navigation bar.
Remember that each time you restart the app, you’ll need to “Load” (not Download) the Model using the list at the top left under the Model tab.
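If you’d rather not click Load every time, the project’s README documents a `--model` flag that loads a model at startup. The argument is the directory name the download step created under `models/`; the name below is my guess at the pattern, so verify it with `ls models/` first:

```shell
# Launch with a model preloaded instead of using the Load dropdown.
# MODEL_DIR must match the directory under models/; the name below is a
# guess at the naming pattern, verify with: ls models/
MODEL_DIR="TheBloke_OpenHermes-2.5-neural-chat-v3-3-Slerp-GPTQ_gptq-4bit-32g-actorder_True"
echo "./start_linux.sh --model $MODEL_DIR"
# (echoed rather than executed here so you can sanity-check the command first)
```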
I actually tried a Llama-2 model before OpenHermes2.5, but the difference in quality and speed when I switched to OpenHermes was so insane that I skipped mentioning it.
Launcher
Now that you’re able to chat, you might find it convenient to create an Ubuntu launcher so that you don’t have to run the script from the terminal each time you want to start it up.
See my blog post on how to do that: naddr1qq…zvaf