Beginner

Getting Started with Ollama

From zero to running a local LLM in about five minutes. No cloud account, no API key, no complicated setup.

Step 1 — Install

macOS

Download the .dmg from ollama.com — drag to Applications and open. Ollama appears in your menu bar.

Linux

curl -fsSL https://ollama.com/install.sh | sh — installs and starts the Ollama service automatically.

Windows

Download the .exe installer from ollama.com. Runs as a background service after install. Native ARM64 build available for Copilot+ PCs.

After install, Ollama runs as a background service listening on http://localhost:11434. You don't need to start it manually — it's always ready.

Step 2 — Pull a Model

Open a terminal. Pull your first model with ollama pull:

# Great first model — fast, capable, small

ollama pull llama3.2

# Reasoning-focused (good for logic and code)

ollama pull deepseek-r1:7b

# Multimodal — accepts images

ollama pull llama3.2-vision

# Lightweight — runs on 4 GB RAM

ollama pull phi4-mini

The first pull downloads the model weights (2–8 GB typical for small models). Subsequent runs are instant — the model is cached locally.

Step 3 — Run Interactively

ollama run llama3.2

>>> Why is the sky blue?

The sky appears blue because of Rayleigh scattering...

>>> /bye

Type /bye or press Ctrl+D to exit. Type /help inside the chat for commands like /clear (reset context) and /show info (model details).

Key CLI Commands

Command	What it does
ollama pull <model>	Download a model from the library
ollama run <model>	Start an interactive chat session
ollama list	Show all locally downloaded models
ollama show <model>	Show model details (size, context, Modelfile)
ollama ps	Show models currently loaded in memory
ollama rm <model>	Delete a model from disk
ollama serve	Start the Ollama API server manually (runs automatically on install)
ollama run <model> <prompt>	One-shot prompt without entering interactive mode

Hardware Requirements

Ollama will run on almost any machine, but GPU matters for speed. Here's a practical guide:

Hardware	What you can run	Speed
CPU only (16 GB RAM)	3B–7B models (Q4)	5–15 tokens/sec — usable but slow
8 GB VRAM (e.g. RTX 3060)	7B–8B models comfortably	40–80 tokens/sec
16 GB VRAM (e.g. RTX 4080)	13B–14B models	60–120 tokens/sec
24 GB VRAM (e.g. RTX 4090)	32B models, or 70B partially offloaded	100–200 tokens/sec on 32B
Apple M2/M3 (16 GB unified)	13B–27B models using shared memory	60–100 tokens/sec — excellent value
Apple M3 Max / M4 Max (128 GB)	70B+ models fully in memory	50–80 tokens/sec on 70B

Modelfile — Customizing a Model

A Modelfile lets you customize any base model with a system prompt, parameter settings, and example conversations. It works like a Dockerfile — you define a base and layer on modifications, then create a named custom model.

# Modelfile — save as ./Modelfile

FROM llama3.2

# Set a persistent system prompt

SYSTEM """

You are a concise technical assistant. Answer in bullet points.

Never pad responses with unnecessary pleasantries.

"""

# Tune parameters

PARAMETER temperature 0.3

PARAMETER num_ctx 8192

# Create the custom model

$ ollama create my-assistant -f ./Modelfile

$ ollama run my-assistant

Key Modelfile instructions:

FROM — base model to build on
SYSTEM — persistent system prompt (baked into the model config)
PARAMETER temperature — creativity (0 = deterministic, 1 = creative)
PARAMETER num_ctx — context window size in tokens
MESSAGE — pre-load conversation history (few-shot examples)
ADAPTER — attach a LoRA adapter to the base model

Next Steps

Browse available models → Models in Ollama
Use Ollama from your own app → Using the Ollama API
Add a chat UI → install Open WebUI (docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main)

Checklist: Do You Understand This?

Can you install Ollama and pull a model from memory?
Do you know the difference between ollama pull and ollama run?
Can you create a Modelfile with a custom system prompt and temperature?
Do you know what hardware you'd need to run a 13B model at reasonable speed?