🧠 All Things AI
Intermediate

Build vs Buy vs Fine-tune

Every AI initiative reaches this decision: do we buy an off-the-shelf solution, call a model API and build on top of it, fine-tune a foundation model, or train something custom? Getting this wrong in either direction is expensive — buying when you should build leaves competitive capability on the table; building when you should buy burns engineering time on commoditised work. This page gives you a structured framework for making the call.

The Decision Spectrum

Build vs buy is not binary. There is a spectrum of five positions, each with a different cost, time-to-value, control level, and differentiation potential:

1. Buy (SaaS with no customisation)

Use a vendor product as-is. Fastest time-to-value. Zero differentiation. Vendor lock-in risk. Examples: off-the-shelf AI meeting notes (Otter, Fireflies), basic AI customer service bots, resume screening tools.

2. Configure (SaaS with domain customisation)

Use a vendor product but configure it — upload documents, set personas, define workflows. Low engineering effort, some domain-specific adaptation. Examples: Microsoft 365 Copilot configured for your tenant, Salesforce Einstein configured for your CRM data.

3. Prompt + RAG (API with retrieval)

Call a foundation model API with carefully crafted prompts and your own data via RAG. Medium engineering effort. Significant customisation without training costs. Full control over data and prompts. The most common production pattern in 2025.
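To make the pattern concrete, here is a minimal sketch of the prompt + RAG shape: retrieve the most relevant documents for a query, then assemble them into a prompt for a model API. The word-overlap retrieval and the sample documents are stand-in assumptions for illustration; a production system would use embedding-based vector search, and the model API call itself is left out.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query; return the top k.

    Stand-in for real embedding-based vector search.
    """
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble retrieved context and the user question into one prompt."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


# Illustrative documents (invented for this sketch).
docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Refund requests must include the original order number.",
]
question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, docs))
# The prompt now contains your own data; send it to any foundation model API.
```

The key property of this position on the spectrum is visible in the sketch: your data and your prompt stay fully under your control, and no training is involved.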

4. Fine-tune (adapt a foundation model)

Train a pre-existing foundation model on your data to adapt its style, format, or domain knowledge. Requires labelled training data and ML expertise. Used when prompting + RAG cannot reach required performance. Examples: fine-tuning on internal technical documentation, legal clause generation in house style.

5. Build (custom model or architecture)

Train a model from scratch or build a custom architecture on proprietary data. Maximum differentiation, maximum cost, maximum time. Rarely justified for language tasks; more applicable to specialised perception models (medical imaging, industrial quality control).

When to Buy

Commoditised capability

If the AI capability is not a source of competitive advantage and a vendor already solves it adequately, buying is almost always right. Writing assistance, meeting summarisation, and basic code completion are now commoditised.

Speed-to-value is the constraint

If you need AI value within weeks (not months), buying compresses the timeline dramatically. Build cycles for even moderate custom AI are measured in quarters.

Vendor brings compliance coverage

Enterprise-grade vendors (Microsoft, Salesforce, ServiceNow) invest heavily in regulatory compliance, data residency, BAAs, and security certifications. In regulated industries, buying can be a faster path to a compliant deployment than building one yourself.

No internal ML talent

Building and maintaining custom AI requires ML engineers, data engineers, and MLOps capability. If you do not have this team, buying is the realistic option — hiring a team first and then building is a far longer path.

When to Fine-tune or Build Custom

Proprietary data is the competitive moat

If your differentiation is a proprietary dataset that no vendor has access to — clinical trial data, decades of financial transactions, unique sensor readings — fine-tuning on that data creates a capability a vendor cannot replicate.

Data cannot leave your environment

Privacy law (HIPAA, GDPR), contractual obligations, or national security classification may prevent sending data to a vendor API. Self-hosted or fine-tuned models are then the only compliant path.

Vendor performance is insufficient

Run evaluations first. If a general-purpose model after careful prompting and RAG still fails your performance bar on your specific task, fine-tuning on task-specific data is the next step before considering custom builds.

Cost at scale makes APIs prohibitive

API costs are per-token and scale linearly with usage. At high volumes (millions of calls per day), the economics of self-hosted open-weight models can undercut API pricing by 5–10× — but only if you have the MLOps capability to operate them.
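The break-even point is simple arithmetic. The sketch below works it through with invented numbers — the token counts, the per-million-token price, and the self-hosting cost are illustrative assumptions, not real vendor pricing:

```python
def monthly_api_cost(calls_per_day: int, tokens_per_call: int,
                     price_per_million_tokens: float) -> float:
    """Per-token API cost scales linearly with usage over a 30-day month."""
    tokens_per_month = calls_per_day * 30 * tokens_per_call
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def breakeven_calls_per_day(tokens_per_call: int,
                            price_per_million_tokens: float,
                            selfhost_monthly_cost: float) -> float:
    """Daily call volume above which self-hosting is cheaper than the API."""
    cost_per_call = tokens_per_call / 1_000_000 * price_per_million_tokens
    return selfhost_monthly_cost / (cost_per_call * 30)


# Assumed: 2,000 tokens per call, $5 per million tokens, and $20,000/month
# to run GPUs plus the MLOps team for a self-hosted open-weight model.
api = monthly_api_cost(1_000_000, 2_000, 5.0)            # $300,000/month
breakeven = breakeven_calls_per_day(2_000, 5.0, 20_000.0)  # ~66,700 calls/day
```

Under these assumed numbers, a million calls a day costs $300k/month on the API but a fixed $20k/month self-hosted — the 5–10× gap the text describes. The same formula also shows the flip side: below the break-even volume, the fixed self-hosting cost makes the API the cheaper option.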

Decision Questions Checklist

Run through these questions for any AI capability decision. The answers will position you clearly on the spectrum.

  • Is this capability a source of competitive differentiation, or is it table stakes?
  • Does a vendor solution already exist that meets our requirements at 80% or better?
  • Can our data leave our environment, or does regulation or contract prevent it?
  • Do we have the internal ML/MLOps talent to build, fine-tune, and maintain this?
  • What is our time budget? Can we wait 3–6 months for custom development?
  • What is the usage volume? At what scale does API cost exceed self-hosting cost?
  • Have we run evals? Has vendor performance been tested against our specific task and data?
  • What is the vendor lock-in risk if we buy? What does switching cost?
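The checklist answers can be roughly mechanised as a first-pass pointer on the spectrum. This is a hypothetical sketch — the branch order and suggested positions are invented for illustration, and a real decision needs judgment, not a lookup:

```python
def suggest_position(differentiating: bool, vendor_meets_80pct: bool,
                     data_can_leave: bool, has_ml_talent: bool,
                     high_volume: bool) -> str:
    """Map checklist answers to a rough starting position on the spectrum.

    Branch order is an illustrative assumption, not a validated model.
    """
    if not data_can_leave:
        # Regulation or contract forces the data to stay in-house.
        return "fine-tune or self-host"
    if not differentiating and vendor_meets_80pct:
        # Commoditised capability with an adequate vendor solution.
        return "buy or configure"
    if not has_ml_talent:
        # No team to build or maintain custom models.
        return "buy, configure, or prompt + RAG"
    if high_volume or differentiating:
        return "prompt + RAG first; fine-tune only if evals demand it"
    return "prompt + RAG"
```

Note that the compliance question dominates: when data cannot leave, the other answers stop mattering, which matches the "Data cannot leave your environment" section above.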

Most Enterprises Blend

The 2025 enterprise pattern is not pure build or pure buy — it is a layered blend. A typical production stack looks like:

  • Productivity layer (buy): Microsoft 365 Copilot, Salesforce Einstein for standard employee and CRM AI
  • Application layer (prompt + RAG): Internal tools built on OpenAI, Anthropic, or Google APIs with proprietary data retrieved via RAG
  • Domain-specialised (fine-tune): Task-specific models fine-tuned on proprietary datasets for highest-value, performance-sensitive applications
  • Edge or air-gapped (self-host): Open-weight models (Llama, Mistral) self-hosted where data cannot leave the environment

The winning teams test first

The teams that make this decision well are the ones who prototype fast and run real evaluations before committing. Build a quick prototype with a vendor API; run evals on your actual task; only invest in fine-tuning or custom builds when eval results prove they are necessary.
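A minimal eval harness for that workflow can be sketched as follows: score a candidate system against task-specific test cases, then compare the pass rate to a performance bar before investing further. The stub system and the 0.9 threshold are assumptions for illustration; in practice `system` would wrap a vendor API call and the cases would come from your real task data.

```python
def run_evals(system, cases: list[tuple[str, str]]) -> float:
    """Fraction of cases where the expected answer appears in the output."""
    passed = sum(1 for question, expected in cases if expected in system(question))
    return passed / len(cases)


def stub_system(question: str) -> str:
    """Stand-in for a prompted vendor model (assumption for this sketch)."""
    canned = {"capital of France?": "Paris is the capital of France."}
    return canned.get(question, "I don't know.")


# Illustrative eval set: (question, expected substring) pairs.
cases = [
    ("capital of France?", "Paris"),
    ("capital of Spain?", "Madrid"),
]
pass_rate = run_evals(stub_system, cases)   # 0.5
threshold = 0.9                             # assumed performance bar
consider_finetuning = pass_rate < threshold
```

The point of the harness is the ordering: only when the vendor-API prototype measurably fails the bar does the decision move rightward on the spectrum toward fine-tuning or custom builds.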

Checklist: Do You Understand This?

  • Describe the five positions on the build-vs-buy spectrum and when each is appropriate.
  • When does data residency or privacy law force a build (or fine-tune) decision?
  • At what point in the evaluation process should you consider fine-tuning over prompting + RAG?
  • What is the typical layered blend that most enterprises use in 2025?
  • Why is evaluation (running real evals on your task and data) essential before committing to any option?
  • When does API cost at scale make self-hosting an open-weight model economically sensible?