Monthly burn rate
$41,280
annualised: $495,360
↑ 18% MoM
Marginal inference cost
$0.0043
per API call blended avg
↓ 12% vs last month
Committed-use utilisation
84%
of $30K committed spend
↓ 4% underrun risk
Variance vs budget
+$6,280
12.4% over Q2 AI budget
↑ Algo-Research primary driver
Total API spend · 30d
$41,280
9.6M calls · 14 providers
Tokens consumed
2.4B
1.6B input · 0.8B output
Routing savings available
$11,340
27.5% of current spend
Avg cost per call
$0.0043
↑ 18% vs prior 30d
Spend by team · attribution
30-day window
Spend by model
cost share
Team-level cost attribution · chargeback view
per-call unit economics
| Team | Cost centre | Calls | Tokens in | Tokens out | 30d spend | % of total | $/call | WoW trend | Status |
|---|---|---|---|---|---|---|---|---|---|
Weekly spend trend · anomaly overlay
spike detected
Routing decision engine · weighted multi-factor scoring
live decisions
Routing decisions are computed using a weighted multi-factor scoring model across four operational dimensions.
Each inbound request is scored per candidate model; the highest-scoring eligible model is selected.
Rejected alternatives and confidence scores are logged for auditability.
35%
Reasoning complexity
25%
Context size
20%
Latency sensitivity
20%
Cost sensitivity
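The scoring described above can be sketched roughly as follows. Only the four weights (35/25/20/20) come from this panel; the `Candidate` type, the per-dimension fit values in [0, 1], the eligibility flag, and the definition of confidence as the margin over the runner-up are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Weights as shown on the dashboard. Everything else below is an assumption.
WEIGHTS = {
    "reasoning_complexity": 0.35,
    "context_size": 0.25,
    "latency_sensitivity": 0.20,
    "cost_sensitivity": 0.20,
}


@dataclass
class Candidate:
    """Hypothetical candidate model with a fit score in [0, 1] per dimension."""
    name: str
    eligible: bool
    fit: Dict[str, float]


def score(candidate: Candidate) -> float:
    """Weighted sum of the candidate's per-dimension fit scores."""
    return sum(WEIGHTS[dim] * candidate.fit[dim] for dim in WEIGHTS)


def route(candidates: List[Candidate]) -> dict:
    """Pick the highest-scoring eligible model and record the alternatives."""
    eligible = [c for c in candidates if c.eligible]
    if not eligible:
        raise ValueError("no eligible model for this request")
    ranked = sorted(eligible, key=score, reverse=True)
    winner = ranked[0]
    # Assumed confidence measure: score margin over the runner-up.
    runner_up = score(ranked[1]) if len(ranked) > 1 else 0.0
    return {
        "selected": winner.name,
        "score": round(score(winner), 4),
        "confidence": round(score(winner) - runner_up, 4),
        # Rejected eligible alternatives, kept for the audit log.
        "rejected": {c.name: round(score(c), 4) for c in ranked[1:]},
    }


if __name__ == "__main__":
    request_candidates = [
        Candidate("model-a", False, dict.fromkeys(WEIGHTS, 1.0)),
        Candidate("model-b", True, {"reasoning_complexity": 0.9, "context_size": 0.8,
                                    "latency_sensitivity": 0.5, "cost_sensitivity": 0.4}),
        Candidate("model-c", True, {"reasoning_complexity": 0.4, "context_size": 0.5,
                                    "latency_sensitivity": 0.9, "cost_sensitivity": 1.0}),
    ]
    print(route(request_candidates))
```

Ineligible models are filtered out before ranking, so they never appear among the rejected alternatives. A production engine might log them separately along with the reason for exclusion.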
Routing opportunity matrix · by use-case
$11,340 recoverable
| Use case | Current model | Calls/month | Avg tokens in | Avg tokens out | Current spend | Recommended | Saving | Confidence |
|---|---|---|---|---|---|---|---|---|
Wasted tokens · 30d
340M
14% of total consumption
Waste cost · 30d
$8,740
annualised: $104,880
Recoverable via caching
$4,200
prompt cache at 90% discount
Avg context retention
28K
target: 10–12K tokens
Waste pattern analysis · detected inefficiencies
5 patterns active
Token efficiency by team
output/input ratio
Context window utilisation · distribution
38% of calls over-provisioned
Vendor comparison matrix · full economics
real pricing · May 2026
| Model | Provider | Input ($/1M tok) | Output ($/1M tok) | Context window | Cached input | Avg latency | P99 latency | Proj. monthly burn | Best for |
|---|---|---|---|---|---|---|---|---|---|
Cost per 1M tokens · input vs output
pricing asymmetry view
Latency vs cost efficiency
throughput trade-off
Governance alerts · live feed
3 critical
Model proliferation · governance risk
8 models active
8 distinct models are in active production use across 6 teams. High model proliferation increases
governance overhead, complicates SLA tracking, and widens the vendor-management surface.
Target: consolidate to 3 primary models with approved exceptions.
| Model | Teams | Monthly calls | Spend | Status |
|---|---|---|---|---|
API key audit · access governance
2 stale keys
| Key ID | Owner | Team | Last used | 30d calls | 30d spend | Status | Action |
|---|---|---|---|---|---|---|---|
AI governance briefing · executive report
powered by Claude Sonnet
Generates a structured governance briefing from live attribution data, formatted for CTO / Head of AI review.
Findings are numbered, quantified, and include specific recommendations with dollar impact.
Incident simulation · failover routing & business impact
simulation mode
Total recoverable · annual
$186,480
across all recommendations
Routing optimisation
$136,080
model routing changes
Prompt efficiency
$50,400
caching + context trim
Implementation effort
6–8 wks
phased rollout
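As a quick reconciliation, the annual figures in these tiles appear to be the monthly figures shown elsewhere on the dashboard multiplied by twelve. The sketch below assumes flat monthly savings with no ramp-up during the phased rollout. The variable names are illustrative, and whether context trimming contributes anything beyond the $4,200 caching figure is not stated in the source.

```python
MONTHS_PER_YEAR = 12

# Monthly figures as displayed on the dashboard tiles.
monthly_routing_savings = 11_340   # "Routing savings available"
monthly_caching_savings = 4_200    # "Recoverable via caching"


def annualise(monthly_amount: int) -> int:
    """Simple run-rate annualisation; assumes the monthly figure stays flat."""
    return monthly_amount * MONTHS_PER_YEAR


routing_annual = annualise(monthly_routing_savings)
prompt_annual = annualise(monthly_caching_savings)
total_recoverable = routing_annual + prompt_annual

print(f"Routing optimisation: ${routing_annual:,}")    # $136,080
print(f"Prompt efficiency:    ${prompt_annual:,}")     # $50,400
print(f"Total recoverable:    ${total_recoverable:,}") # $186,480
```

Because this is a straight run-rate projection, savings realised in the first year would be lower if the recommendations only take effect partway through the 6–8 week rollout.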
Optimisation recommendations · prioritised by annual impact
8 recommendations
Projected spend trajectory · with vs without optimisation
12-month forecast