AI Cost Measurement Dashboards
A cost dashboard is the foundation of AI FinOps. Without visibility into where spend is going, you cannot optimise it. The right dashboard structure differs by audience: finance and leadership need budget vs actual; engineering needs per-request cost and efficiency trends; product teams need cost per user and cost per feature. Build all three views from the same underlying data pipeline.
What to Track
| Metric | Why it matters | Dimension to slice by |
|---|---|---|
| Total spend | Budget vs actual; trend over time | Model, use case, team, day/week/month |
| Cost per request | Efficiency of individual calls; P50/P95 distribution | Use case, model version, prompt variant |
| Input vs output token split | Identifies whether input or output is driving cost growth | Use case, model |
| Cache hit rate (and saved cost) | Direct measurement of caching ROI | Cache type, use case |
| Cost per successful task | Unit economics; frames AI spend in business terms | Use case (e.g., cost per resolved support ticket) |
| Token efficiency ratio | Output quality per token; declining ratio = worsening efficiency | Use case, prompt version |
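To make the last rows of the table concrete, here is a minimal sketch of deriving cost per successful task from raw call events. The event fields mirror the `emit_event` schema in the data pipeline below; the sample values are invented for illustration.

```python
# Sketch: deriving "cost per successful task" from raw call events.
def cost_per_successful_task(events: list[dict]) -> float:
    """Total spend divided by the number of tasks that actually succeeded."""
    total_cost = sum(e["cost_usd"] for e in events)
    successes = sum(1 for e in events if e["task_success"])
    if successes == 0:
        return float("inf")  # spend with zero successes: investigate, don't divide
    return total_cost / successes

events = [
    {"cost_usd": 0.02, "task_success": True},
    {"cost_usd": 0.05, "task_success": True},
    {"cost_usd": 0.03, "task_success": False},  # failed calls still cost money
]
rate = cost_per_successful_task(events)  # $0.10 spend over 2 successes
```

Note that failed calls raise the metric even though they produce nothing — which is exactly why this number, not cost per request, belongs in front of business stakeholders.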
Dashboard Structure by Audience
Executive view
- Total AI spend this month vs budget
- Month-over-month spend trend
- Spend by product/team (pie or bar chart)
- Projected month-end spend vs budget
- Top 3 cost drivers
- Cost per unit of business output (e.g., per ticket resolved)
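The projected month-end tile can start as a simple linear run-rate. A sketch (hypothetical helper, not from the source; a naive projection that ignores weekday seasonality):

```python
# Sketch: naive linear run-rate projection for the "projected month-end
# spend vs budget" tile. Replace with a seasonally adjusted forecast
# if spend is bursty (e.g., weekday-heavy traffic).
def project_month_end(mtd_spend: float, day_of_month: int, days_in_month: int) -> float:
    return mtd_spend / day_of_month * days_in_month

projected = project_month_end(mtd_spend=1500.0, day_of_month=10, days_in_month=30)
over_budget = projected > 4000.0  # compare against the monthly budget
```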
Engineering view
- Cost per request (P50/P95) by use case
- Input token P95 vs baseline (growing?)
- Output token P95 vs max_tokens limit (hitting cap?)
- Cache hit rate by use case
- Model routing distribution (which tier is handling what)
- Cost anomalies in last 24 hours
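The P50/P95 tiles above need nothing beyond stdlib percentiles. A sketch over per-request costs (sample data invented):

```python
# Sketch: P50/P95 per-request cost using only the stdlib.
# statistics.quantiles with n=100 returns 99 cut points;
# index 49 is the 50th percentile, index 94 the 95th.
import statistics

def cost_percentiles(costs_usd: list[float]) -> tuple[float, float]:
    cuts = statistics.quantiles(costs_usd, n=100, method="inclusive")
    return cuts[49], cuts[94]

costs = [0.01] * 90 + [0.20] * 10  # a long tail of expensive requests
p50, p95 = cost_percentiles(costs)
```

This is why the dashboard plots P50 and P95 rather than the mean: the mean here (0.029) would hide a 20× tail that P95 exposes immediately.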
Product view
- Cost per active user per day
- Cost per feature (which features are expensive?)
- High-cost user cohort (P99 users)
- Free tier AI cost exposure (what does a "free" user cost?)
- AI margin: gross margin impact of AI COGS
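Cost per active user and the high-cost cohort fall out of the same per-user rollup. A sketch (invented events):

```python
# Sketch: per-user cost rollup feeding "cost per active user per day"
# and the high-cost user cohort tile.
from collections import defaultdict

def per_user_costs(events: list[dict]) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for e in events:
        totals[e["user_id"]] += e["cost_usd"]
    return dict(totals)

events = [
    {"user_id": "u1", "cost_usd": 0.01},
    {"user_id": "u2", "cost_usd": 0.50},  # one heavy user
    {"user_id": "u1", "cost_usd": 0.02},
]
totals = per_user_costs(events)
cost_per_active_user = sum(totals.values()) / len(totals)
heaviest = max(totals, key=totals.get)  # P99-cohort candidate
```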
Data Pipeline
```python
# Every LLM call must emit cost metadata to the pipeline
def log_llm_call(response, context: CallContext):
    emit_event({
        "timestamp": now(),
        "use_case": context.use_case,   # "support_chat", "doc_analysis"
        "team": context.team,           # "product", "data-science"
        "user_id": context.user_id,
        "model": response.model,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "cache_read_tokens": response.usage.cache_read_input_tokens,
        "cost_usd": calculate_cost(response.usage, response.model),
        "task_success": context.task_success,  # set after result evaluated
    })
```
Tooling options
- Langfuse: built-in cost dashboard; open source; per-trace cost tracking
- Custom metrics to Datadog: emit custom metrics from LLM middleware; use Datadog dashboards for org-wide visibility
- Data warehouse: stream events to BigQuery/Snowflake; build dashboards in Looker/Metabase for cross-team reporting
- LiteLLM dashboard: built-in spend tracking by virtual key, team, and model
Pipeline requirements
- Every LLM call must be tagged before it fires — retrofitting tags is painful
- Calculate cost at emission time using the price sheet in effect then, but also store raw token counts so cost can be re-derived on query if prices are later corrected
- Include task_success signal — needed for cost per successful task metric
- Retention: keep 13 months minimum for year-over-year comparison
- Pipeline latency: near-realtime (under 5 minutes), otherwise anomaly alerts arrive too late to act on
Unit Economics
Unit economics frames AI cost in terms of business outcomes — the language of product and finance, not engineering. Without unit economics, you cannot make a business case for AI investment or assess whether AI is cost-effective compared to alternatives.
| Use case | Unit metric | Business frame |
|---|---|---|
| Customer support chatbot | Cost per resolved ticket | Compare to cost per human-resolved ticket; break-even analysis |
| Document processing | Cost per document processed | Compare to manual processing cost; volume threshold for ROI |
| Code assistant | Cost per developer per day | Assess against productivity lift; licence vs build decision |
| B2B SaaS AI feature | AI COGS per customer per month | Ensure AI COGS is below the margin you need; inform feature pricing |
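A worked break-even example for the first row, with invented numbers: if the bot resolves only a fraction of tickets and failures escalate to a human, the comparison must use the blended cost, not the raw per-attempt cost.

```python
# Worked example (invented numbers): support chatbot break-even.
ai_cost_per_attempt = 0.40    # model spend per ticket the bot attempts
human_cost_per_ticket = 6.00  # fully loaded human handling cost
bot_resolution_rate = 0.70    # 30% of attempts escalate to a human

# Every ticket incurs the AI attempt; failures also incur the human cost.
blended_cost = ai_cost_per_attempt + (1 - bot_resolution_rate) * human_cost_per_ticket
savings_per_ticket = human_cost_per_ticket - blended_cost
```

With these numbers the bot still saves money per ticket, but note how sensitive the result is to the resolution rate — which is why `task_success` must be in the event schema.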
Anomaly Alerts
- Daily spend > 2× rolling 7-day average → alert immediately; investigate before spend continues
- Single use case consuming > 50% of total daily budget → ring-fence with use-case budget cap
- Per-request cost P95 growing > 30% week-over-week → prompt or context length drift
- Cache hit rate dropping > 20% from baseline → system prompt or prefix changed; verify intentional
- Zero spend on a use case that is normally active → possible silent failure or routing error
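The first rule above is trivial to implement once daily spend is queryable. A sketch:

```python
# Sketch: the "daily spend > 2× rolling 7-day average" rule.
def spend_anomaly(daily_spend: list[float]) -> bool:
    """daily_spend is ordered oldest -> newest; the last element is today."""
    if len(daily_spend) < 8:
        return False  # not enough history to establish a baseline
    baseline = sum(daily_spend[-8:-1]) / 7  # previous 7 full days
    return daily_spend[-1] > 2 * baseline

history = [100, 110, 95, 105, 100, 90, 100]  # baseline averages 100/day
spike_alert = spend_anomaly(history + [260])  # today is more than 2x baseline
quiet = spend_anomaly(history + [150])        # elevated, but under the threshold
```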
Chargeback Model
Allocating AI costs to the teams that generate them creates accountability and motivates efficient use. Teams that see their AI bill are more likely to optimise prompts, use appropriate model tiers, and question whether a use case justifies its cost.
- Tag every LLM call with a team identifier at call time — not retroactively
- Report monthly chargeback in the same cycle as infrastructure chargebacks
- Provide teams with access to their own engineering view dashboard — they should be able to investigate their own costs
- Do not surprise teams with large bills — set soft-limit alerts at 80% of team budget so they can act before month-end
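Given call-time team tags, the monthly rollup and the 80% soft-limit check reduce to one group-by. A sketch with invented events and budgets:

```python
# Sketch: monthly chargeback rollup plus the 80% soft-limit alert.
from collections import defaultdict

def chargeback(events: list[dict], budgets: dict[str, float]):
    spend: dict[str, float] = defaultdict(float)
    for e in events:
        spend[e["team"]] += e["cost_usd"]  # team tag was set at call time
    # Warn teams past 80% of budget before month-end, not after.
    warn = [t for t, s in spend.items() if s >= 0.8 * budgets.get(t, float("inf"))]
    return dict(spend), warn

events = [
    {"team": "product", "cost_usd": 850.0},
    {"team": "data-science", "cost_usd": 300.0},
]
spend, warn = chargeback(events, budgets={"product": 1000.0, "data-science": 1000.0})
```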
Checklist: Do You Understand This?
- What are the three dashboard views an AI cost measurement system should provide, and who is the audience for each?
- Why must cost metadata be emitted at call time rather than retroactively attributed?
- What is cost per successful task, and why is it more useful than cost per request for business stakeholders?
- At what anomaly threshold should a daily spend alert fire, and what is the investigation workflow?
- Define unit economics for an AI-powered recruiting tool that screens CVs.
- Why does a chargeback model create cost efficiency incentives — what behaviour does it change?