🧠 All Things AI
Advanced

Hosted Fine-Tuning Services

Not every team needs to manage GPU infrastructure to fine-tune a model. Hosted fine-tuning services handle the hardware and training pipeline — you provide the data and get back a fine-tuned model endpoint. This page covers the major options and when each makes sense.

OpenAI Fine-Tuning

OpenAI supports fine-tuning for GPT-4o mini and GPT-4o. This is the simplest path if you're already using OpenAI and want to customise model behaviour.

Data Format

OpenAI fine-tuning uses JSONL format: one JSON object per line, each holding a full conversation as a list of turns:

{"messages": [{"role": "system", "content": "You are a helpful product assistant."}, {"role": "user", "content": "What is your return policy?"}, {"role": "assistant", "content": "Our return policy allows returns within 30 days..."}]}
{"messages": [...]}
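
Malformed training files are a common cause of rejected jobs, so it is worth validating locally before uploading. A minimal stdlib-only sketch (the file name and exact checks are illustrative; OpenAI's validator enforces more rules than this):

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_jsonl(path):
    """Check each line parses as JSON and has well-formed chat messages."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append(f"line {lineno}: invalid JSON ({exc})")
                continue
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                errors.append(f"line {lineno}: missing 'messages' list")
                continue
            for msg in messages:
                if msg.get("role") not in VALID_ROLES:
                    errors.append(f"line {lineno}: bad role {msg.get('role')!r}")
                if not isinstance(msg.get("content"), str):
                    errors.append(f"line {lineno}: missing string 'content'")
    return errors

# Quick demo on a one-line sample file.
sample = '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}\n'
with open("sample_train.jsonl", "w", encoding="utf-8") as f:
    f.write(sample)
print(validate_jsonl("sample_train.jsonl"))  # [] means the file is clean
```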

Process

  1. Upload training JSONL via Files API
  2. Create a fine-tuning job specifying base model, training file, epochs, learning rate
  3. Monitor job status via API or dashboard
  4. Fine-tuned model is assigned a model ID and accessible via standard Chat Completions
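
The four steps above, sketched with the official openai Python SDK (the base-model snapshot name, epoch count, and file path are illustrative; this assumes OPENAI_API_KEY is set in the environment):

```python
def run_fine_tune(train_path="train.jsonl"):
    """Sketch of the upload -> train -> poll -> use cycle. Not run here:
    it needs the `openai` package, a valid API key, and billing enabled."""
    import time
    from openai import OpenAI

    client = OpenAI()

    # 1. Upload the training JSONL via the Files API.
    training_file = client.files.create(
        file=open(train_path, "rb"), purpose="fine-tune"
    )

    # 2. Create the fine-tuning job (epoch count here is illustrative).
    job = client.fine_tuning.jobs.create(
        model="gpt-4o-mini-2024-07-18",
        training_file=training_file.id,
        hyperparameters={"n_epochs": 3},
    )

    # 3. Poll job status until it reaches a terminal state.
    while True:
        job = client.fine_tuning.jobs.retrieve(job.id)
        if job.status in ("succeeded", "failed", "cancelled"):
            break
        time.sleep(60)

    # 4. The resulting model ID works with standard Chat Completions.
    if job.status == "succeeded":
        reply = client.chat.completions.create(
            model=job.fine_tuned_model,
            messages=[{"role": "user", "content": "What is your return policy?"}],
        )
        print(reply.choices[0].message.content)
    return job.status
```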

Pricing

  • Training: $0.025 per 1,000 training tokens for GPT-4o mini
  • Inference: roughly 2–4× the base model's per-token rate (fine-tuned endpoint pricing)
  • 1 epoch over 10,000 examples of 500 tokens each = 5M tokens = ~$125
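
The back-of-envelope arithmetic above is easy to script before launching a job, e.g. to compare epoch counts (the $0.025/1K figure is the rate quoted above; check current pricing):

```python
def training_cost(n_examples, avg_tokens_per_example, epochs, price_per_1k_tokens):
    """Estimated fine-tuning cost: billed tokens = dataset tokens x epochs."""
    billed_tokens = n_examples * avg_tokens_per_example * epochs
    return billed_tokens / 1000 * price_per_1k_tokens

# 1 epoch over 10,000 examples of ~500 tokens each at $0.025/1K tokens:
print(training_cost(10_000, 500, 1, 0.025))  # 125.0
```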

Limitations

  • You cannot see or export the fine-tuned weights — they stay on OpenAI's infrastructure
  • Fine-tuned models are not automatically updated when base models update
  • Data sent to OpenAI; privacy implications apply

Google Vertex AI Supervised Tuning

Vertex AI supports supervised fine-tuning for Gemini models:

  • Dataset stored in Google Cloud Storage (GCS)
  • Supports Gemini Flash and Pro models
  • Managed training pipeline — no GPU provisioning needed
  • Fine-tuned models deployed to Vertex AI endpoints
  • Training data stays within your GCP project/region

This is particularly useful for teams already on GCP that want to fine-tune Gemini for specific enterprise tasks with data-residency requirements.

Together.ai Fine-Tuning

Together.ai offers fine-tuning for open-weight models (Llama, Mistral, Qwen, etc.) at competitive prices:

  • Models available: Llama 3.1 (8B, 70B), Mistral 7B, Qwen 2.5
  • Training cost: ~$0.002–0.01 per 1K training tokens (significantly cheaper than OpenAI)
  • Fine-tuned model deployed as a private Together.ai endpoint
  • Can also download the fine-tuned weights (unlike OpenAI)
  • JSONL format similar to OpenAI

Best choice when: you want to fine-tune an open-weight model without managing GPUs, at lower cost than OpenAI, and with the option to export weights for self-hosting.

Hugging Face AutoTrain

AutoTrain provides a no-code/low-code interface for fine-tuning on Hugging Face infrastructure:

  • Supports text classification, summarisation, question answering, causal LM
  • Upload dataset via UI; select model; configure hyperparameters visually
  • Fine-tuned model saved directly to your Hugging Face repository
  • Can be deployed immediately via HF Inference Endpoints
  • Pay per compute hour (A10G, A100 GPU options)

Axolotl: Self-Managed on Cloud GPUs

For teams wanting more control without full infrastructure ownership, running Axolotl on a rented cloud GPU (Lambda Labs, Vast.ai, RunPod) is a practical middle ground:

  • Rent a single A100 80GB at ~$1–3/hour
  • Run QLoRA fine-tuning via Axolotl YAML config
  • Full control over model, data, and training parameters
  • Save fine-tuned weights to your own storage
  • Cost for a typical fine-tuning run: $10–50 total
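
Axolotl is driven by a single YAML config. A minimal QLoRA sketch follows; the field names follow Axolotl's config schema, but treat the model, dataset path, and every value as illustrative starting points, not recommendations:

```yaml
# Illustrative Axolotl QLoRA config (config.yml); tune values for your data.
base_model: meta-llama/Meta-Llama-3.1-8B
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
datasets:
  - path: data/train.jsonl
    type: chat_template
sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 3
learning_rate: 0.0002
output_dir: ./outputs/my-finetune
```

Training is then launched on the rented GPU with `accelerate launch -m axolotl.cli.train config.yml`, and the adapter weights land in `output_dir`.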

Data Preparation: Common Pitfalls

Fine-tuning fails because of bad data more often than for any other reason. Watch for:

  • Duplicates: Deduplication is essential; duplicated examples overweight certain patterns
  • Inconsistent quality: Mix of good and bad examples confuses the model
  • Wrong format: Missing system prompts, wrong role names, malformed JSON
  • Too narrow: Examples only cover a slice of your actual production inputs
  • Label leakage: Training examples containing information that won't be available in production (e.g., the question text already includes the answer)
  • Insufficient test split: If you use all data for training, you can't evaluate properly
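
The first and last pitfalls are cheap to address mechanically. A stdlib-only sketch of exact deduplication plus a held-out split for a JSONL dataset (the 90/10 ratio and file names are illustrative; this does not catch near-duplicates):

```python
import json
import random

def dedupe_and_split(path, test_fraction=0.1, seed=0):
    """Drop exact-duplicate examples, then shuffle into train/test splits."""
    seen, examples = set(), []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            # Canonical serialisation so key order can't hide duplicates.
            key = json.dumps(json.loads(line), sort_keys=True)
            if key not in seen:
                seen.add(key)
                examples.append(key)
    random.Random(seed).shuffle(examples)
    n_test = max(1, int(len(examples) * test_fraction))
    return examples[n_test:], examples[:n_test]  # (train, test)

# Demo: four raw examples, one exact duplicate.
with open("raw.jsonl", "w", encoding="utf-8") as f:
    for text in ["a", "b", "a", "c"]:
        f.write(json.dumps({"messages": [{"role": "user", "content": text}]}) + "\n")

train, test = dedupe_and_split("raw.jsonl")
print(len(train), len(test))  # 3 unique examples: 2 train, 1 test
```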

Service Comparison

Service              | Base models          | Export weights?          | Relative cost | Best for
---------------------|----------------------|--------------------------|---------------|---------------------------------------------
OpenAI               | GPT-4o mini, GPT-4o  | No                       | Highest       | Existing OpenAI users; easiest integration
Google Vertex AI     | Gemini models        | No                       | High          | GCP-native teams; Gemini customisation
Together.ai          | Llama, Mistral, Qwen | Yes                      | Medium        | Open-weight fine-tuning; cost-conscious
HF AutoTrain         | Any HF model         | Yes                      | Medium        | HF ecosystem users; broadest model selection
Axolotl + rented GPU | Any open-weight      | Yes (you own everything) | Lowest        | Full control; sensitive data; advanced users

Deployment After Fine-Tuning

Your options after fine-tuning depend on whether you can export the weights:

  • No weight export (OpenAI, Vertex): Use the provider's endpoint directly; model stays in their infrastructure
  • Weights exported: Upload to Hugging Face private repo → deploy via HF Inference Endpoints, or self-host via vLLM/Ollama after GGUF conversion
  • Convert to GGUF for Ollama: Use llama.cpp convert scripts to quantise exported weights for local serving
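
Once the exported weights are converted and quantised to GGUF via llama.cpp's scripts, serving through Ollama needs only a short Modelfile. A sketch, with the GGUF filename, parameter, and system prompt all illustrative:

```
FROM ./my-finetune-q4_K_M.gguf
PARAMETER temperature 0.7
SYSTEM """You are a helpful product assistant."""
```

`ollama create my-finetune -f Modelfile` registers the model locally, after which `ollama run my-finetune` serves it.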

Checklist: Do You Understand This?

  • What JSONL format does OpenAI fine-tuning expect?
  • What is the key advantage of Together.ai over OpenAI for fine-tuning?
  • Name three common data preparation mistakes that cause fine-tuning to fail.
  • What is the "Axolotl + rented GPU" approach and when does it make sense?
  • After fine-tuning with weight export, how would you serve the model locally via Ollama?