Beginner

Getting Started with Ollama

From zero to running a local LLM in about five minutes. No cloud account, no API key, no complicated setup.

Step 1 — Install

1. macOS

Download the .dmg from ollama.com, drag Ollama to Applications, and open it. Ollama appears in your menu bar.

2. Linux

Run the install script, which installs and starts the Ollama service automatically:

curl -fsSL https://ollama.com/install.sh | sh

3. Windows

Download the .exe installer from ollama.com. Ollama runs as a background service after installation. A native ARM64 build is available for Copilot+ PCs.

After installation, Ollama runs as a background service listening on http://localhost:11434, so you don't normally need to start it manually.
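To confirm the service is actually listening, a quick check along these lines can help. This is a minimal sketch: `OLLAMA_URL` and `ollama_is_up` are names made up for this example, and it assumes `curl` is installed and the default address from the text above is in use.

```shell
# Base URL from the text above; OLLAMA_URL is a variable name chosen for this sketch.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Returns 0 if the server answers an HTTP request within 2 seconds, non-zero otherwise.
ollama_is_up() {
  curl -fsS --max-time 2 "$OLLAMA_URL/" >/dev/null 2>&1
}

if ollama_is_up; then
  echo "Ollama is reachable at $OLLAMA_URL"
else
  echo "Ollama is not reachable at $OLLAMA_URL"
fi
```

If the check fails, starting the server manually with ollama serve (or reopening the desktop app) is the usual next step.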

Step 2 — Pull a Model

Open a terminal. Pull your first model with ollama pull:

# Great first model — fast, capable, small
ollama pull llama3.2
# Reasoning-focused (good for logic and code)
ollama pull deepseek-r1:7b
# Multimodal — accepts images
ollama pull llama3.2-vision
# Lightweight — runs on 4 GB RAM
ollama pull phi4-mini

The first pull downloads the model weights (typically 2–8 GB for small models). Later runs skip the download because the weights are cached locally, so the model only needs to load from disk.
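Because pulled models stay on disk, a script can check the local list before pulling again. The sketch below assumes that ollama list prints a header row followed by one model per line, with the name (including its tag, such as :latest) in the first column. `model_in_list` is a helper name invented for this example.

```shell
# Reads `ollama list` output on stdin; succeeds if the named model is present.
# A bare name such as "llama3.2" is treated as "llama3.2:latest".
model_in_list() (
  want="$1"
  case "$want" in
    *:*) ;;                      # tag already given
    *) want="$want:latest" ;;    # assume the default tag
  esac
  # Skip the header row, compare the first column, exit 0 only on a match.
  awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
)

# Pull only when the model is missing (skipped if the CLI is not installed).
if command -v ollama >/dev/null 2>&1; then
  if ollama list | model_in_list llama3.2; then
    echo "llama3.2 is already cached"
  else
    ollama pull llama3.2
  fi
fi
```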

Step 3 — Run Interactively

ollama run llama3.2
>>> Why is the sky blue?
The sky appears blue because of Rayleigh scattering...
>>> /bye

Type /bye or press Ctrl+D to exit. Type /help inside the chat for commands like /clear (reset context) and /show info (model details).

Key CLI Commands

| Command | What it does |
| --- | --- |
| ollama pull <model> | Download a model from the library |
| ollama run <model> | Start an interactive chat session |
| ollama list | Show all locally downloaded models |
| ollama show <model> | Show model details (size, context, Modelfile) |
| ollama ps | Show models currently loaded in memory |
| ollama rm <model> | Delete a model from disk |
| ollama serve | Start the Ollama API server manually (runs automatically on install) |
| ollama run <model> <prompt> | One-shot prompt without entering interactive mode |
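The one-shot form of ollama run is the one that fits into scripts. As an illustration, the sketch below sends each non-empty line of its input as a separate prompt. `run_prompts` and the `OLLAMA_BIN` override are hypothetical names for this example; the override exists only so the command can be swapped out when testing.

```shell
# Usage: run_prompts <model> < prompts.txt
# Sends each non-empty input line to the model as a one-shot prompt.
run_prompts() (
  model="$1"
  while IFS= read -r prompt; do
    [ -n "$prompt" ] || continue          # skip blank lines
    printf '### %s\n' "$prompt"           # label each answer with its prompt
    # Redirect stdin so the command does not consume the remaining prompts.
    "${OLLAMA_BIN:-ollama}" run "$model" "$prompt" </dev/null
  done
)

# Example (assumes a file named prompts.txt with one prompt per line):
#   run_prompts llama3.2 < prompts.txt > answers.txt
```

The stdin redirect matters: without it, the first call inside the loop can read the rest of the prompt file itself, and the loop ends after one iteration.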

Hardware Requirements

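Ollama runs on almost any machine, but a GPU makes a large difference to speed. Treat the figures below as rough guides: actual throughput depends on the model, its quantization, and the context length.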

| Hardware | What you can run | Speed |
| --- | --- | --- |
| CPU only (16 GB RAM) | 3B–7B models (Q4) | 5–15 tokens/sec (usable but slow) |
| 8 GB VRAM (e.g. RTX 3060) | 7B–8B models comfortably | 40–80 tokens/sec |
| 16 GB VRAM (e.g. RTX 4080) | 13B–14B models | 60–120 tokens/sec |
| 24 GB VRAM (e.g. RTX 4090) | 32B models, or 70B partially offloaded | 100–200 tokens/sec on 32B |
| Apple M2/M3 (16 GB unified) | 13B–27B models using shared memory | 60–100 tokens/sec (excellent value) |
| Apple M3 Max / M4 Max (128 GB) | 70B+ models fully in memory | 50–80 tokens/sec on 70B |
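A back-of-the-envelope way to judge whether a model fits is parameters × bits per weight ÷ 8, plus some headroom for the context cache and runtime. The sketch below uses that rule of thumb. The 1.2 overhead factor is an assumption made for this example, not a figure from Ollama, and real quantized files often use slightly more than their nominal bits per weight.

```shell
# Rough memory estimate in GB:
#   weights = parameters (billions) * bits per weight / 8
#   total   = weights * 1.2   (1.2 is an assumed overhead factor)
estimate_model_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 * 1.2 }'
}

# Usage: fits_in_memory <params_billions> <bits> <available_gb>
# Succeeds if the estimate is within the available memory.
fits_in_memory() (
  need="$(estimate_model_gb "$1" "$2")"
  awk -v need="$need" -v have="$3" 'BEGIN { exit !(need <= have) }'
)

estimate_model_gb 7 4     # prints 4.2
estimate_model_gb 13 4    # prints 7.8
estimate_model_gb 70 4    # prints 42.0

if fits_in_memory 70 4 24; then
  echo "70B at 4-bit should fit in 24 GB"
else
  echo "70B at 4-bit needs partial offload on 24 GB"
fi
```

Under these assumptions a 4-bit 70B model needs roughly 42 GB, which is why a 24 GB card can only run it partially offloaded.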

Modelfile — Customizing a Model

A Modelfile lets you customize any base model with a system prompt, parameter settings, and example conversations. It works like a Dockerfile — you define a base and layer on modifications, then create a named custom model.

# Save as ./Modelfile
FROM llama3.2
# Set a persistent system prompt
SYSTEM """
You are a concise technical assistant. Answer in bullet points.
Never pad responses with unnecessary pleasantries.
"""
# Tune parameters
PARAMETER temperature 0.3
PARAMETER num_ctx 8192

Then, from a terminal in the same directory, create and run the custom model:

ollama create my-assistant -f ./Modelfile
ollama run my-assistant

Key Modelfile instructions:

  • FROM — base model to build on
  • SYSTEM — persistent system prompt (baked into the model config)
  • PARAMETER temperature — creativity (0 = deterministic, 1 = creative)
  • PARAMETER num_ctx — context window size in tokens
  • MESSAGE — pre-load conversation history (few-shot examples)
  • ADAPTER — attach a LoRA adapter to the base model
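The MESSAGE instruction is the one the earlier Modelfile does not show. As a rough sketch, the script below writes a Modelfile that seeds one example exchange, using a temporary directory so it does not overwrite an existing file. The model name my-assistant-fewshot and the example exchange are made up for illustration.

```shell
# Write the Modelfile into a scratch directory.
workdir="$(mktemp -d)"

cat > "$workdir/Modelfile" <<'EOF'
FROM llama3.2
SYSTEM You are a concise technical assistant. Answer in bullet points.
PARAMETER temperature 0.3
# One example exchange that shows the model the expected answer style
MESSAGE user What does ollama ps show?
MESSAGE assistant Models currently loaded in memory.
EOF

echo "Wrote $workdir/Modelfile"
# To build and try it (requires Ollama to be installed and running):
#   ollama create my-assistant-fewshot -f "$workdir/Modelfile"
#   ollama run my-assistant-fewshot
```

Each MESSAGE line takes a role (user or assistant here) followed by the message text, so example exchanges are written as alternating pairs.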

Next Steps

  • Browse available models → Models in Ollama
  • Use Ollama from your own app → Using the Ollama API
  • Add a chat UI → install Open WebUI (docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main)

Checklist: Do You Understand This?

  • Can you install Ollama and pull a model without looking up the commands?
  • Do you know the difference between ollama pull and ollama run?
  • Can you create a Modelfile with a custom system prompt and temperature?
  • Do you know what hardware you'd need to run a 13B model at reasonable speed?

Page built: 01 Jun 2026