How to Self-Host Any LLM – Step by Step Guide
Updated on Apr 22, 2026 · 9 mins read

Self-hosting LLMs is no longer just for infra teams. With tools like Ollama and Open WebUI, you can run capable models on your own machine, keep conversations private, and avoid unpredictable API bills. For developers, founders, and small teams, this setup gives you more control without adding much operational complexity.
In this guide, you will build a local AI stack using Ollama + Open WebUI on Docker. By the end, you will have a ChatGPT-style interface running on your system, with an optional secure way to share it outside your local network.
Summary
Install Docker
- Get Docker Desktop from docker.com
- Ensure Docker is running before proceeding (green whale icon in system tray)
- Verify installation:
docker --version
Run Ollama in Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
- What this does: Creates persistent storage and exposes the API on port 11434
- Verify: Check with docker ps to see the “ollama” container running
Download an LLM Model
# Pull a model into the Ollama container
docker exec -it ollama ollama pull llama3.2:3b
# Or try a more capable model
docker exec -it ollama ollama pull llama3.1:8b
- Model sizes: 3B (~2GB), 8B (~4.7GB), 70B (~40GB)
Get a Chat Interface Using Open WebUI
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Access Your Self-Hosted LLM
- Open http://localhost:3000 in your browser
- First time: Create an admin account (username/password of your choice)
- Model selection: Choose from the dropdown in the top-right corner
Why Self-Host Your LLMs?
Complete Control and Privacy
When you self-host a language model, prompts and responses stay under your control. This is valuable for teams working with private code, internal docs, customer data, or compliance-sensitive workflows. Instead of depending on third-party defaults, you decide how data is stored, who can access it, and how long it is retained.
Cost Effectiveness at Scale
Cloud API pricing is convenient early on, but it can become expensive once usage scales across teammates, automations, and production workloads. Self-hosting replaces per-token billing with fixed infrastructure costs, which makes budgeting much easier. If your usage is steady and high, local hosting can quickly become the cheaper long-term option.
Customization and Flexibility
A self-hosted stack is also easier to customize. You can test multiple models, switch based on task type (coding, summarization, extraction), and integrate directly with internal tools and pipelines. This flexibility is hard to match when you are limited to one external API provider and its roadmap.
Prerequisites and System Requirements
Hardware Considerations
Hardware needs depend heavily on model size. Smaller models such as Llama 3.2 3B work on modest machines, while larger models demand far more RAM/VRAM and storage bandwidth. The practical approach is to start with a lightweight model, validate your workflow, then scale to larger models only when you need better quality.
A dedicated GPU improves response speed significantly, but it is optional for getting started. Ollama can run on CPU-only systems and will use available acceleration when present. If you are new to self-hosting, begin with what you already have and benchmark real performance before investing in upgrades.
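If you do have an NVIDIA GPU and the NVIDIA Container Toolkit installed on the host, you can pass it through to the Ollama container with Docker's --gpus flag. A minimal sketch, matching the container setup used in Step 2 of this guide:
# Run Ollama with GPU acceleration (requires the NVIDIA Container Toolkit on the host)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama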
Software Prerequisites
You only need Docker and a terminal. We will run both Ollama and Open WebUI as containers, so your host system stays clean and dependencies remain isolated. This also makes updates, restarts, and troubleshooting much easier later.
Step 1: Setting Up Docker
Installing Docker Desktop
Docker Desktop provides the easiest way to run containerized applications on your local machine. Visit docker.com and download Docker Desktop for your operating system. The installation file is typically 400-500MB and requires administrator privileges to install.
System Requirements:
- Windows: Windows 10 64-bit Pro, Enterprise, or Education (Build 19041 or higher), or Windows 11
- macOS: macOS 10.15 or newer (Intel or Apple Silicon)
- Linux: 64-bit kernel and KVM virtualization support
The installation process is straightforward – just follow the installer prompts and restart your computer if prompted. On Windows, you may need to enable WSL 2 (Windows Subsystem for Linux) if it’s not already enabled. The installer will guide you through this process.
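On recent Windows builds, enabling WSL 2 from an elevated PowerShell typically looks like the sketch below; exact steps vary by Windows version, and a reboot may still be required:
# Install WSL (run in an administrator PowerShell)
wsl --install
# Make WSL 2 the default version for new distributions
wsl --set-default-version 2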
After installation, launch Docker Desktop and wait for it to fully start up. The initial startup can take 2-3 minutes as Docker initializes its services. You’ll know it’s ready when the Docker icon in your system tray shows a green status with “Docker Desktop is running” tooltip.
Configuring Docker Resources
Before proceeding, allocate enough resources to Docker. Open Docker Desktop settings and go to the Resources section. For LLM workloads, a good starting point is:
- Memory: At least 8GB (12GB+ for larger models)
- CPU: 4+ cores for better performance
- Disk Space: 50GB+ for models and container images
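After adjusting these settings, you can confirm what the Docker engine actually sees; a quick check using docker info's Go-template output:
# Show the CPUs and total memory available to the Docker engine
docker info --format 'CPUs: {{.NCPU}}  Memory: {{.MemTotal}} bytes'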
Confirming Docker is Running
Before proceeding, verify that Docker is working correctly by opening a terminal and running docker --version. You should see version information like “Docker version 24.0.x”. Also run docker info to confirm the Docker daemon is running properly. If Docker Desktop is running but you’re getting “command not found” errors, try restarting your terminal or logging out and back in to refresh your environment variables.
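For convenience, here are those checks as commands you can paste straight into a terminal:
# Print the Docker client version
docker --version
# Confirm the Docker daemon is running and show engine details
docker info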
Step 2: Running Ollama in Docker
Setting Up the Ollama Container
Instead of installing Ollama directly on your system, we’ll run it inside a Docker container. This approach keeps everything isolated and makes management much easier. The official Ollama Docker image is approximately 1.5GB and handles all the complexity of running language models.
Run the following command to start the Ollama container:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Command Breakdown:
- -d: Runs the container in detached mode (background)
- -v ollama:/root/.ollama: Creates a named volume “ollama” for persistent model storage
- -p 11434:11434: Maps host port 11434 to container port 11434 (Ollama API)
- --name ollama: Assigns the container name “ollama” for easy reference
- ollama/ollama: Official Ollama Docker image from Docker Hub
What happens during startup:
- Docker downloads the Ollama image (first time only, ~1.5GB)
- Container starts and initializes Ollama service (~30-60 seconds)
- API becomes available at http://localhost:11434
- Volume is created for persistent model storage
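A quick way to confirm the API is reachable once the container has started; this assumes the default port mapping from the command above:
# Ollama's root endpoint responds with a short status message when the server is up
curl http://localhost:11434
# Expected response: Ollama is running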

Verifying the Ollama Container
You can check that the Ollama container is running properly by using docker ps to see active containers. You should see output similar to:
CONTAINER ID   IMAGE           COMMAND               STATUS         PORTS                      NAMES
abc123def456   ollama/ollama   "/bin/ollama serve"   Up 2 minutes   0.0.0.0:11434->11434/tcp   ollama

Additional verification commands:
- docker logs ollama: View container startup logs
- curl http://localhost:11434/api/tags: Test API connectivity (should return an empty model list initially)
- docker exec -it ollama ollama --version: Check the Ollama version inside the container
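For reference, before any models have been pulled the tags endpoint returns an empty list; the exact JSON formatting may vary slightly between Ollama versions:
curl http://localhost:11434/api/tags
# Typical response before pulling a model:
# {"models":[]}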
Step 3: Downloading Your First LLM Model
Choosing the Right Model
Ollama supports dozens of popular open-source models, each with different capabilities and resource requirements. Here’s a breakdown of popular models and their specifications:
Recommended Starting Models:
- Llama 3.2 3B (~2GB): Great for beginners, handles basic tasks, runs on 8GB RAM systems
- Llama 3.1 8B (~4.7GB): Better performance, good for most tasks, needs 12GB+ RAM
- Mistral 7B (~4.1GB): Excellent reasoning, multilingual support
- CodeLlama 7B (~3.8GB): Specialized for programming tasks
Downloading Models into the Container
Because Ollama is running in a container, you will interact with it using docker exec. This runs Ollama commands directly inside the live container.
Note: You can explore the full list of available models here: Ollama Model Library
Download your first model:
docker exec -it ollama ollama pull llama3.2:3b

If you want a lighter model for quick local testing, you can also try:
docker exec -it ollama ollama pull gemma3:270m
What happens during download:
- Ollama contacts the model registry and starts fetching model layers
- Downloads model files in chunks (shows progress bar)
- Verifies file integrity and extracts model
- Model becomes available immediately after download
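Once the pull completes, you can confirm the model is registered and try a quick prompt directly from the container, before the web UI is set up; a short sketch using the model pulled above:
# List the models stored in the Ollama volume
docker exec -it ollama ollama list
# Start an interactive chat with the downloaded model (type /bye to exit)
docker exec -it ollama ollama run llama3.2:3b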
Step 4: Setting Up Open WebUI with Docker
Understanding Open WebUI
Open WebUI provides a modern, ChatGPT-like interface for interacting with your containerized models. Instead of using command-line interactions, you get a clean web interface with features like conversation history, model switching, document uploads, and even image generation capabilities with compatible models. It’s designed to work seamlessly with Ollama and provides a much more user-friendly experience.
Since both Ollama and Open WebUI run in separate Docker containers, they communicate through Docker’s internal networking. This containerized approach keeps everything organized and makes it easy to manage, update, or troubleshoot each component independently.
Running the Open WebUI Container
Setting up Open WebUI requires just one Docker command, but it’s a bit lengthy due to all the configuration options. The Open WebUI image is approximately 1.2GB and includes a complete web application with chat interface, user management, and model integration.
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Detailed Command Breakdown:
- -d: Run the container in detached mode (background)
- -p 3000:8080: Map host port 3000 to container port 8080 (web interface)
- --add-host=host.docker.internal:host-gateway: Enable the container to reach host services (Ollama)
- -v open-webui:/app/backend/data: Persistent storage for user data, chat history, and settings
- --name open-webui: Container name for easy reference
- --restart always: Auto-restart the container on system reboot
- ghcr.io/open-webui/open-webui:main: Official Open WebUI image from the GitHub Container Registry
Container startup process:
- Docker downloads Open WebUI image (first time only, ~1.2GB)
- Container initializes web server and database (~30-45 seconds)
- Web interface becomes available at http://localhost:3000
- Automatic connection attempt to the Ollama API on port 11434
Verification steps:
# Check container status
docker ps
# View startup logs
docker logs open-webui
# Test web interface accessibility
curl http://localhost:3000

Accessing Your Web Interface
Once the Docker container is running, open your web browser and navigate to http://localhost:3000. You should see the Open WebUI login screen with a clean, modern interface.
First-time setup process:
- Create Admin Account: Enter your desired username and password (this stays local)
- Email (Optional): You can skip this field for local installations
- Account Creation: Click “Create Account” - you’ll be the admin user
- Automatic Login: You’ll be redirected to the main chat interface
Initial configuration checklist:
- Model Detection: Open WebUI automatically scans for Ollama models on localhost:11434
- Model Selection: Available models appear in the dropdown (top-right corner)
- Connection Status: Green indicator shows successful Ollama connection
- Interface Language: Default is English, but supports multiple languages
If models don’t appear:
- Wait 30-60 seconds for the initial connection
- Check that both containers are running: docker ps
- Verify the Ollama API: curl http://localhost:11434/api/tags
- Restart the Open WebUI container: docker restart open-webui (or recreate it with an explicit endpoint, as sketched below)
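If the connection still fails, recreating the Open WebUI container with the Ollama endpoint set explicitly often resolves it. Open WebUI reads the OLLAMA_BASE_URL environment variable for this; the sketch below assumes Ollama is reachable from the container via host.docker.internal on port 11434, as set up earlier:
# Remove the existing container (the open-webui volume keeps your chats and settings)
docker rm -f open-webui
# Recreate it with the Ollama endpoint set explicitly
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main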
Sharing Your Setup Outside Your Network
Sometimes you need to demo your setup, collaborate with a remote teammate, or access your local UI while traveling. Pinggy is a quick way to expose Open WebUI securely without touching router/NAT settings.
Run this command to share your Open WebUI interface:
ssh -p 443 -R0:localhost:3000 free.pinggy.io
Pinggy will generate a public HTTPS URL like https://abc123.pinggy.link that you can share with others. The connection is encrypted and secure, and you can terminate the tunnel anytime by stopping the command.

Conclusion
You now have a practical self-hosted AI setup: Ollama for local inference and Open WebUI for a clean chat interface. It is private, repeatable, and cost-predictable, which makes it a strong base for both personal and team workflows.
From here, you can improve quality by trying larger models, improve speed with GPU tuning, and integrate Ollama endpoints into internal tools or apps. The biggest advantage is ownership: you control your models, your data path, and your operating costs as your usage grows.
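As one example of that kind of integration, the Ollama container exposes a simple HTTP generation endpoint; a minimal sketch using the model pulled in this guide (swap in whichever model you have downloaded):
# Send a single prompt to the local Ollama API and return a non-streaming response
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Summarize the benefits of self-hosting an LLM in one sentence.",
  "stream": false
}'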