How to run local LLMs (Local LLM inference)
This post demonstrates how to set up LLM inference on your personal computer or laptop by running LLMs locally. There are many reasons you might want to do this. To name a few:
- Privacy: This may well be the biggest benefit of running a local LLM. If you want your prompts and conversations to remain private, a local LLM guarantees they never leave your machine. You can even disconnect from the internet and it will still work.
- Ability to customize: Power users may want to tweak the LLM's configuration, e.g. temperature, context length, or a system prompt. All of this and more is possible with a locally deployed LLM (a sketch follows this list).
- Cost: A local LLM runs free of cost; there are no subscriptions or per-token charges involved.
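As a taste of that customization, here is a minimal sketch of baking a custom temperature, context length, and system prompt into a model via an Ollama Modelfile (Ollama is the tool set up later in this post). The model name my-llama is just an example, and the commands assume you are inside the container shell opened in Step 1:

# Sketch: write a Modelfile that overrides temperature, context length and system prompt
cat > Modelfile <<'EOF'
FROM llama3.2
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM "You are a concise, helpful assistant."
EOF

# Build and run a custom model from it ("my-llama" is an example name)
ollama create my-llama -f Modelfile
ollama run my-llama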
Prerequisites
- Docker engine: Have your Docker engine up and running before you go ahead; I recommend installing Docker Desktop. A quick sanity check follows this list.
- System requirements: at least 16 GB of RAM; which models you can run is limited by the memory available on your system. Any modern chipset (think Intel i7, Apple silicon) will do a decent job with the smaller models.
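Before moving on, it is worth confirming that the Docker CLI can actually reach the engine; a minimal check:

# Print the installed Docker version
docker --version

# Query the engine; this fails with an error if the Docker daemon is not running
docker info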
Step 1: Run and set up the ollama container
Execute the commands below to pull and spin up the ollama docker container.
# Pull and run the ollama docker container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Access the container shell
docker exec -it ollama /bin/bash
# Pull and run a model to start an interactive session
ollama run llama3.2
# Inside the model's interactive session, list the available commands
/?
# Et voila! You can start prompting here.
If everything went well, you should see something like this:

And just like that, you have a fully functional LLM running in your local environment and responding to your prompts. Everything from this point on is essentially optional; you can simply continue prompting from the terminal.
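If you would rather script against the model than type into the terminal, the container also exposes Ollama's HTTP API on the published port 11434. A minimal sketch run from the host machine (the prompt text is just an example):

# Send a one-off, non-streaming prompt to the Ollama API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'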
Step 2: Run the open-webui container
# Pull and run the open-webui docker container
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Access http://localhost:3000 in your web browser, go through the admin setup on first launch, and you should see a familiar chat interface:

This will allow you to select an LLM, start/resume/delete conversations, set parameters, and a lot more. Look up the available models in the Ollama library and start experimenting; you can also pull them from the command line, as sketched below.
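A short sketch of pulling a couple of extra models straight into the running ollama container (the model names are examples from the Ollama library):

# Pull additional models into the running ollama container
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull phi3

# List the models available locally
docker exec -it ollama ollama list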
In this post, I have used Ollama and Open WebUI with Docker to set up local LLM inference, but there are many other options out there that achieve the same result.
It is also worth mentioning that responses can be relatively slow depending on your machine's configuration and the resources allocated to the Docker engine.
If you think your system has enough resources but things are still sluggish, review your Docker resource settings; a quick check is sketched below.
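As a starting point for that review, you can watch the live resource usage of the two containers (a minimal sketch; the container names match the ones used above):

# Show live CPU and memory usage for the ollama and open-webui containers
docker stats ollama open-webui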