🧠 All Things AI
Advanced

AI Cost Measurement Dashboards

A cost dashboard is the foundation of AI FinOps. Without visibility into where spend is going, you cannot optimise it. The right dashboard structure differs by audience: finance and leadership need budget vs actual; engineering needs per-request cost and efficiency trends; product teams need cost per user and cost per feature. Build all three views from the same underlying data pipeline.

What to Track

| Metric | Why it matters | Dimensions to slice by |
|---|---|---|
| Total spend | Budget vs actual; trend over time | Model, use case, team, day/week/month |
| Cost per request | Efficiency of individual calls; P50/P95 distribution | Use case, model version, prompt variant |
| Input vs output token split | Identifies whether input or output is driving cost growth | Use case, model |
| Cache hit rate (and saved cost) | Direct measurement of caching ROI | Cache type, use case |
| Cost per successful task | Unit economics; frames AI spend in business terms | Use case (e.g., cost per resolved support ticket) |
| Token efficiency ratio | Output quality per token; declining ratio = worsening efficiency | Use case, prompt version |
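Several of these metrics can be derived directly from the call-level events the data pipeline below emits. A minimal sketch, assuming events arrive as dicts with the `cost_usd` and `task_success` fields from that schema:

```python
def cost_metrics(events):
    """Aggregate total spend, P50/P95 cost per request, and cost per
    successful task from a list of call-level event dicts."""
    costs = sorted(e["cost_usd"] for e in events)
    n = len(costs)
    total = sum(costs)
    successes = sum(1 for e in events if e["task_success"])
    return {
        "total_spend": total,
        "p50_cost": costs[n // 2],
        "p95_cost": costs[min(n - 1, int(n * 0.95))],
        "cost_per_successful_task": total / successes if successes else None,
    }
```

In production you would compute these in your warehouse or metrics backend rather than in application code, but the aggregation logic is the same.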

Dashboard Structure by Audience

Executive view

  • Total AI spend this month vs budget
  • Month-over-month spend trend
  • Spend by product/team (pie or bar chart)
  • Projected month-end spend vs budget
  • Top 3 cost drivers
  • Cost per unit of business output (e.g., per ticket resolved)

Engineering view

  • Cost per request (P50/P95) by use case
  • Input token P95 vs baseline (growing?)
  • Output token P95 vs max_tokens limit (hitting cap?)
  • Cache hit rate by use case
  • Model routing distribution (which tier is handling what)
  • Cost anomalies in last 24 hours

Product view

  • Cost per active user per day
  • Cost per feature (which features are expensive?)
  • High-cost user cohort (P99 users)
  • Free tier AI cost exposure (what does a "free" user cost?)
  • AI margin: gross margin impact of AI COGS

Data Pipeline

```python
# Every LLM call must emit cost metadata to the pipeline
def log_llm_call(response, context: CallContext):
    emit_event({
        "timestamp": now(),
        "use_case": context.use_case,    # "support_chat", "doc_analysis"
        "team": context.team,            # "product", "data-science"
        "user_id": context.user_id,
        "model": response.model,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "cache_read_tokens": response.usage.cache_read_input_tokens,
        "cost_usd": calculate_cost(response.usage, response.model),
        "task_success": context.task_success,  # set after result is evaluated
    })
```
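The `calculate_cost` helper referenced above multiplies token counts by per-token prices. A hedged sketch: the price figures are illustrative, not current rates, and it assumes cache-read tokens are reported separately from regular input tokens — check your provider's usage schema and billing docs, and load prices from config rather than hard-coding them:

```python
from dataclasses import dataclass

@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    cache_read_input_tokens: int = 0

# Illustrative per-million-token prices only; real rates vary by model
# and change over time.
PRICES_PER_MTOK = {
    "example-model": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
}

def calculate_cost(usage: Usage, model: str) -> float:
    """Convert a usage record into USD using per-million-token prices.

    Assumes cache-read tokens are billed at a discounted rate and are
    NOT already included in input_tokens; verify this for your provider.
    """
    p = PRICES_PER_MTOK[model]
    return (
        usage.input_tokens * p["input"]
        + usage.output_tokens * p["output"]
        + usage.cache_read_input_tokens * p["cache_read"]
    ) / 1_000_000
```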

Tooling options

  • Langfuse: built-in cost dashboard; open source; per-trace cost tracking
  • Custom metrics to Datadog: emit custom metrics from LLM middleware; use Datadog dashboards for org-wide visibility
  • Data warehouse: stream events to BigQuery/Snowflake; build dashboards in Looker/Metabase for cross-team reporting
  • LiteLLM dashboard: built-in spend tracking by virtual key, team, and model

Pipeline requirements

  • Every LLM call must be tagged before it fires — retrofitting tags is painful
  • Calculate cost at emission time, but also store raw token counts — raw tokens let you recompute spend when model prices change or a pricing bug is found
  • Include task_success signal — needed for cost per successful task metric
  • Retention: keep 13 months minimum for year-over-year comparison
  • Pipeline latency: near-realtime (under 5 minutes), or anomaly alerting will not be actionable

Unit Economics

Unit economics frames AI cost in terms of business outcomes — the language of product and finance, not engineering. Without unit economics, you cannot make a business case for AI investment or assess whether AI is cost-effective compared to alternatives.

| Use case | Unit metric | Business frame |
|---|---|---|
| Customer support chatbot | Cost per resolved ticket | Compare to cost per human-resolved ticket; break-even analysis |
| Document processing | Cost per document processed | Compare to manual processing cost; volume threshold for ROI |
| Code assistant | Cost per developer per day | Assess against productivity lift; licence vs build decision |
| B2B SaaS AI feature | AI COGS per customer per month | Ensure AI COGS is below the margin you need; inform feature pricing |
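The break-even analysis in the support-chatbot row can be made concrete with a small calculation. All figures and the linear cost model here are hypothetical:

```python
def break_even_volume(ai_cost_per_ticket: float,
                      human_cost_per_ticket: float,
                      fixed_monthly_cost: float):
    """Monthly ticket volume at which per-ticket savings from AI handling
    cover the fixed monthly cost of running the system (engineering time,
    evals, infrastructure). Returns None if AI is never cheaper per ticket."""
    saving_per_ticket = human_cost_per_ticket - ai_cost_per_ticket
    if saving_per_ticket <= 0:
        return None
    return fixed_monthly_cost / saving_per_ticket
```

For example, at a hypothetical $0.50 AI cost and $8.00 human cost per resolved ticket, a $3,000/month fixed cost breaks even at 400 tickets per month.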

Anomaly Alerts

  • Daily spend > 2× rolling 7-day average → alert immediately; investigate before spend continues
  • Single use case consuming > 50% of total daily budget → ring-fence with use-case budget cap
  • Per-request cost P95 growing > 30% week-over-week → prompt or context length drift
  • Cache hit rate dropping > 20% from baseline → system prompt or prefix changed; verify intentional
  • Zero spend on a use case that is normally active → possible silent failure or routing error
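The first rule above (daily spend more than 2× the rolling 7-day average) is straightforward to implement once daily spend totals are aggregated; a minimal sketch:

```python
def spend_anomaly(daily_spend: list[float],
                  multiplier: float = 2.0,
                  window: int = 7) -> bool:
    """Flag the most recent day if its spend exceeds `multiplier` times the
    rolling average of the preceding `window` days.

    `daily_spend` is a chronological list of daily USD totals. Returns
    False when there is not enough history to establish a baseline.
    """
    if len(daily_spend) <= window:
        return False
    baseline = sum(daily_spend[-window - 1:-1]) / window
    return daily_spend[-1] > multiplier * baseline
```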

Chargeback Model

Allocating AI costs to the teams that generate them creates accountability and motivates efficient use. Teams that see their AI bill are more likely to optimise prompts, use appropriate model tiers, and question whether a use case justifies its cost.

  • Tag every LLM call with a team identifier at call time — not retroactively
  • Report monthly chargeback in the same cycle as infrastructure chargebacks
  • Provide teams with access to their own engineering view dashboard — they should be able to investigate their own costs
  • Do not surprise teams with large bills — set soft-limit alerts at 80% of team budget so they can act before month-end
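The soft-limit alert can run as a simple batch job over the monthly chargeback aggregates. A sketch, assuming per-team month-to-date spend and monthly budgets are available as dicts:

```python
def budget_alerts(team_spend: dict[str, float],
                  team_budgets: dict[str, float],
                  soft_limit: float = 0.8) -> list[str]:
    """Return an alert message for each team whose month-to-date spend has
    crossed the soft limit (default 80%) of its monthly budget."""
    alerts = []
    for team, budget in team_budgets.items():
        spent = team_spend.get(team, 0.0)
        if budget > 0 and spent >= soft_limit * budget:
            alerts.append(
                f"{team}: ${spent:.2f} of ${budget:.2f} ({spent / budget:.0%})"
            )
    return alerts
```

Routing these messages to each team's own channel, rather than a central FinOps channel, keeps the accountability loop with the people who can act on it.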

Checklist: Do You Understand This?

  • What are the three dashboard views an AI cost measurement system should provide, and who is the audience for each?
  • Why must cost metadata be emitted at call time rather than retroactively attributed?
  • What is cost per successful task, and why is it more useful than cost per request for business stakeholders?
  • At what anomaly threshold should a daily spend alert fire, and what is the investigation workflow?
  • Define unit economics for an AI-powered recruiting tool that screens CVs.
  • Why does a chargeback model create cost efficiency incentives — what behaviour does it change?