Finance mode
Monthly burn rate
$41,280
annualised: $495,360
↑ 18% MoM
Marginal inference cost
$0.0043
blended average per API call
↓ 12% vs last month
Committed-use utilisation
84%
of $30K committed spend
↓ 4% underrun risk
Variance vs budget
+$6,280
12.4% over Q2 AI budget
↑ Algo-Research primary driver
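The finance tiles above reduce to simple ratios. This is a minimal sketch that assumes every input covers the same billing period. In practice the commitment and the Q2 budget would first need normalising to a monthly basis. `finance_kpis` and its parameter names are hypothetical, not part of the dashboard.

```python
def finance_kpis(monthly_spend: float, prior_monthly_spend: float,
                 committed: float, budget: float) -> dict:
    """Derive headline finance-mode KPIs from raw monthly figures (sketch)."""
    variance = monthly_spend - budget
    return {
        # Run rate: current monthly burn extrapolated over 12 months.
        "annualised": monthly_spend * 12,
        # Month-over-month change as a fraction (0.18 means +18%).
        "mom_change": (monthly_spend - prior_monthly_spend) / prior_monthly_spend,
        # Share of the commitment drawn down; spend above the commitment
        # is overage, so utilisation is capped at 100%.
        "commit_utilisation": min(monthly_spend, committed) / committed,
        # Positive variance means spend is over budget.
        "budget_variance": variance,
        "budget_variance_pct": variance / budget,
    }
```

For example, a $41,280 monthly burn annualises to 41,280 × 12 = $495,360, the figure shown on the burn-rate tile.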
Total API spend · 30d
$41,280
9,600 calls · 14 providers
Tokens consumed
2.4B
1.6B input · 0.8B output
Routing savings available
$11,340
27.5% of current spend
Avg cost per call
$0.0043
↑ 18% vs prior 30d
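The summary tiles are derived from four raw counters. This is a minimal sketch assuming the counters are totals for the same 30-day window. The function name is hypothetical.

```python
def spend_summary(total_spend: float, calls: int, input_tokens: int,
                  output_tokens: int, routing_savings: float) -> dict:
    """Summary-tile arithmetic for a 30-day window (sketch)."""
    return {
        # Average (not marginal) cost: total spend spread evenly over calls.
        "avg_cost_per_call": total_spend / calls,
        # Tokens consumed is the sum of billed input and output tokens.
        "total_tokens": input_tokens + output_tokens,
        # Routing opportunity as a share of current spend.
        "savings_share": routing_savings / total_spend,
    }
```

With the tile values, 1.6B input plus 0.8B output gives the 2.4B total, and $11,340 / $41,280 ≈ 0.275 gives the 27.5% routing share.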
Spend by team · attribution
30-day window
Spend by model
cost share
Team-level cost attribution · chargeback view
per-call unit economics
Team | Cost centre | Calls | Tokens in | Tokens out | 30d spend | % of total | $/call | WoW trend | Status
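A chargeback view like this one can be built by rolling per-call records up by team. This is a minimal sketch assuming each call record carries a team tag and its billed cost. The `CallRecord` type and `attribute_by_team` are hypothetical. Cost centre would come from a separate team-to-cost-centre mapping, and WoW trend and status are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    team: str
    tokens_in: int
    tokens_out: int
    cost: float  # billed cost of this call in USD


def attribute_by_team(records: list) -> dict:
    """Roll per-call records up into one chargeback row per team."""
    rows = defaultdict(lambda: {"calls": 0, "tokens_in": 0, "tokens_out": 0, "spend": 0.0})
    for r in records:
        row = rows[r.team]
        row["calls"] += 1
        row["tokens_in"] += r.tokens_in
        row["tokens_out"] += r.tokens_out
        row["spend"] += r.cost
    total = sum(row["spend"] for row in rows.values())
    for row in rows.values():
        # Share of total spend and per-call unit economics for the table.
        row["pct_of_total"] = row["spend"] / total if total else 0.0
        row["cost_per_call"] = row["spend"] / row["calls"]
    return dict(rows)
```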
Weekly spend trend · anomaly overlay
spike detected
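The source does not say how spikes are detected. One common approach is a z-score of each week against a trailing baseline. The sketch below assumes a 4-week window and a threshold of 2 standard deviations, both of which are hypothetical defaults.

```python
from statistics import mean, stdev


def detect_spikes(weekly_spend: list, window: int = 4,
                  z_threshold: float = 2.0) -> list:
    """Return indices of weeks whose spend sits well above the trailing baseline."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a spread")
    spikes = []
    for i in range(window, len(weekly_spend)):
        baseline = weekly_spend[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            # Flat baseline: any increase at all counts as a spike.
            if weekly_spend[i] > mu:
                spikes.append(i)
            continue
        # z-score of this week against the preceding `window` weeks.
        if (weekly_spend[i] - mu) / sigma > z_threshold:
            spikes.append(i)
    return spikes
```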
Routing decision engine · weighted multi-factor scoring
live decisions
Routing decisions are computed with a weighted multi-factor scoring model across four operational dimensions. Each inbound request is scored against every candidate model, and the highest-scoring eligible model is selected. Rejected alternatives and confidence scores are logged for auditability.
Reasoning complexity · 35%
Context size · 25%
Latency sensitivity · 20%
Cost sensitivity · 20%
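The weighted scoring described above can be sketched as follows. This sketch makes three assumptions. First, each candidate carries per-dimension fit scores in [0, 1] for the current request, and the source does not say how those are derived. Second, "eligible" means the request fits in the model's context window. Third, confidence is taken as the margin over the runner-up. `Candidate`, `route`, and the fit values are hypothetical.

```python
from dataclasses import dataclass

# Dimension weights as listed above; they sum to 1.0.
WEIGHTS = {"reasoning": 0.35, "context": 0.25, "latency": 0.20, "cost": 0.20}


@dataclass
class Candidate:
    name: str
    context_window: int  # maximum tokens the model accepts
    fit: dict            # hypothetical per-dimension fit scores in [0, 1]


def route(candidates: list, request_tokens: int, weights: dict = WEIGHTS) -> dict:
    # Eligibility gate: the request must fit in the model's context window.
    eligible = [c for c in candidates if c.context_window >= request_tokens]
    if not eligible:
        raise ValueError("no eligible model for this request")
    # Weighted sum of fit scores, best first.
    scored = sorted(
        ((sum(w * c.fit[d] for d, w in weights.items()), c.name) for c in eligible),
        reverse=True,
    )
    best_score, best_name = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    return {
        "selected": best_name,
        "score": best_score,
        # Assumed confidence measure: margin over the runner-up.
        "confidence": best_score - runner_up,
        # Eligible losers, kept for the audit log.
        "rejected": scored[1:],
    }
```

Keeping the eligibility gate separate from the score means a hard constraint such as context size can never be outvoted by the other weights.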
Routing opportunity matrix · by use-case
$11,340 recoverable
Use case | Current model | Calls/month | Avg tokens in | Avg tokens out | Current spend | Recommended model | Saving | Confidence
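Each row's saving follows from the columns: calls per month times the per-call cost on each model. This is a minimal sketch assuming the token profile stays the same after the switch and prices are quoted in $ per 1M tokens. The functions and example prices are hypothetical, and quality risk (the Confidence column) is not modelled.

```python
def call_cost(tokens_in: float, tokens_out: float,
              price_in: float, price_out: float) -> float:
    """Cost of one call given list prices in $ per 1M tokens."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def routing_saving(calls_per_month: int, avg_in: float, avg_out: float,
                   current: tuple, recommended: tuple) -> dict:
    """Monthly saving from moving a use case to the recommended model.

    `current` and `recommended` are (input $/1M, output $/1M) price pairs.
    """
    now = calls_per_month * call_cost(avg_in, avg_out, *current)
    after = calls_per_month * call_cost(avg_in, avg_out, *recommended)
    return {"current_spend": now, "projected_spend": after, "saving": now - after}
```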
Wasted tokens · 30d
340M
14% of total consumption
Waste cost · 30d
$8,740
annualised: $104,880
Recoverable via caching
$4,200
prompt cache at 90% discount
Avg context retention
28K
target: 10–12K tokens
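The two recoverable-waste levers above can be estimated separately. This is a minimal sketch under two assumptions. Cached input tokens bill at (1 − 0.90) of the list input price, and any cache-write premium a provider may charge is ignored. Trimming saves the input tokens between the current average context and the target. Function names and prices are hypothetical.

```python
def caching_saving(cached_input_tokens: float, input_price: float,
                   discount: float = 0.90) -> float:
    """Saving when cached input tokens bill at (1 - discount) of list price.

    Ignores any cache-write premium a provider may charge.
    """
    return cached_input_tokens * input_price * discount / 1_000_000


def context_trim_saving(calls: int, avg_context: float, target_context: float,
                        input_price: float) -> float:
    """Saving from trimming average context down to the target size."""
    trimmed = max(avg_context - target_context, 0)
    return calls * trimmed * input_price / 1_000_000
```

For example, trimming a 28K average context to the 12K upper target removes 16K input tokens per call.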
Waste pattern analysis · detected inefficiencies
5 patterns active
Token efficiency by team
output/input ratio
Context window utilisation · distribution
38% of calls over-provisioned
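The source does not define "over-provisioned". This sketch assumes it means a context larger than the 12K upper target from the context-retention tile. It also bins calls into upper-inclusive buckets for the distribution. The bucket edges are hypothetical.

```python
from bisect import bisect_left

# Upper-inclusive bucket edges in tokens: (0, 4K], (4K, 8K], ..., (128K, inf).
EDGES = (4_000, 8_000, 12_000, 32_000, 128_000)


def context_histogram(context_sizes: list, edges: tuple = EDGES) -> list:
    """Count calls per context-size bucket."""
    counts = [0] * (len(edges) + 1)
    for n in context_sizes:
        counts[bisect_left(edges, n)] += 1
    return counts


def over_provisioned_share(context_sizes: list, target: int = 12_000) -> float:
    """Fraction of calls whose context exceeds the target size."""
    if not context_sizes:
        return 0.0
    return sum(1 for n in context_sizes if n > target) / len(context_sizes)
```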
Vendor comparison matrix · full economics
real pricing · May 2026
Model | Provider | Input ($/1M tok) | Output ($/1M tok) | Context window | Cached input | Avg latency | P99 latency | Proj. monthly burn | Best for
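The projected-monthly-burn column combines the three price columns with a workload's token volumes. This is a minimal sketch assuming a single cache-hit rate applied to input tokens and cached input priced in $ per 1M tokens. `Pricing`, the model name, and the prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    model: str
    input_per_m: float         # $ per 1M uncached input tokens
    output_per_m: float        # $ per 1M output tokens
    cached_input_per_m: float  # $ per 1M cached input tokens


def projected_monthly_burn(p: Pricing, input_tokens: float, output_tokens: float,
                           cache_hit_rate: float = 0.0) -> float:
    """Monthly cost of a workload on one model, splitting cached vs uncached input."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    return (uncached * p.input_per_m
            + cached * p.cached_input_per_m
            + output_tokens * p.output_per_m) / 1_000_000
```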
Cost per 1M tokens · input vs output
pricing asymmetry view
Latency vs cost efficiency
throughput trade-off
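One way to read the latency-versus-cost view is through its Pareto frontier, which contains the models that no other model beats on both axes at once. This is a sketch of that standard technique, not necessarily how the chart is computed. The model tuples are hypothetical.

```python
def pareto_frontier(models: list) -> list:
    """Names of models not dominated on (avg latency, cost); lower is better.

    `models` holds (name, avg_latency_ms, cost) tuples. A model is dominated
    if another is at least as good on both axes and strictly better on one.
    """
    frontier = []
    for name, lat, cost in models:
        dominated = any(
            l2 <= lat and c2 <= cost and (l2 < lat or c2 < cost)
            for _, l2, c2 in models
        )
        if not dominated:
            frontier.append(name)
    return frontier
```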
Governance alerts · live feed
3 critical
Model proliferation · governance risk
8 models active
8 distinct models are in active production use across 6 teams. High model proliferation increases governance overhead, complicates SLA tracking, and enlarges the vendor-management surface. Target: consolidate to 3 primary models, with approved exceptions.
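The consolidation target can be framed as a split into primaries, approved exceptions, and retirement candidates. This sketch assumes primaries are chosen by spend; calls or capability coverage would be equally valid criteria. The function and model names are hypothetical.

```python
def consolidation_plan(spend_by_model: dict, keep: int = 3,
                       approved_exceptions: frozenset = frozenset()) -> dict:
    """Split models into primaries, approved exceptions and retirement candidates."""
    # Assumption: primaries are the highest-spend models.
    ranked = sorted(spend_by_model, key=spend_by_model.get, reverse=True)
    primary, rest = ranked[:keep], ranked[keep:]
    exceptions = [m for m in rest if m in approved_exceptions]
    retire = [m for m in rest if m not in approved_exceptions]
    return {
        "primary": primary,
        "exceptions": exceptions,
        "retire": retire,
        # Spend that would have to migrate onto the primary models.
        "spend_to_migrate": sum(spend_by_model[m] for m in retire),
    }
```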
Model | Teams | Monthly calls | Spend | Status
API key audit · access governance
2 stale keys
Key ID | Owner | Team | Last used | 30d calls | 30d spend | Status | Action
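The source does not define "stale". This sketch treats a key as stale if it has never been used or has been idle longer than a threshold, with a hypothetical default of 30 days. It assumes `last_used` values are timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_keys(keys: list, now: Optional[datetime] = None,
               max_idle_days: int = 30) -> list:
    """IDs of keys never used, or idle for longer than `max_idle_days`.

    Each key is a dict with "key_id" and a tz-aware "last_used" (or None).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_idle_days)
    return [
        k["key_id"] for k in keys
        if k["last_used"] is None or k["last_used"] < cutoff
    ]
```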
AI governance briefing · executive report
powered by Claude Sonnet
Generates a structured governance briefing from live attribution data, formatted for review by the CTO or Head of AI. Findings are numbered and quantified, and each includes a specific recommendation with its dollar impact.
Incident simulation · failover routing & business impact
simulation mode
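A failover simulation can be reduced to two questions: which model takes over when providers go down, and what the cost difference is. This is a minimal sketch assuming a fixed routing-preference order and a flat cost per call for each model. All names and costs are hypothetical, and latency and quality impact are not modelled.

```python
def simulate_failover(ranked_models: list, down: set, calls: int,
                      cost_per_call: dict) -> dict:
    """Reroute a primary's traffic to the next healthy model and price the change."""
    primary = ranked_models[0]
    # First model in routing-preference order that is still available.
    fallback = next((m for m in ranked_models if m not in down), None)
    if fallback is None:
        # Total outage: every call fails and there is nothing to bill.
        return {"fallback": None, "failed_calls": calls, "cost_delta": 0.0}
    return {
        "fallback": fallback,
        "failed_calls": 0,
        "cost_delta": calls * (cost_per_call[fallback] - cost_per_call[primary]),
    }
```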
Total recoverable · annual
$186,480
across all recommendations
Routing optimisation
$136,080
model routing changes
Prompt efficiency
$50,400
caching + context trim
Implementation effort
6–8 wks
phased rollout
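These annual figures are the monthly figures shown earlier multiplied by 12. The $11,340 routing opportunity becomes $136,080, and the $4,200 caching figure becomes $50,400, for a combined $186,480. The sketch below shows that arithmetic and a simple sort by annual impact for the prioritised list. Function names are hypothetical.

```python
def annual_recoverable(monthly_savings: dict) -> dict:
    """Annualise each monthly saving and add a grand total."""
    annual = {k: v * 12 for k, v in monthly_savings.items()}
    # The right-hand side is evaluated before "total" is inserted.
    annual["total"] = sum(annual.values())
    return annual


def prioritise(recommendations: list) -> list:
    """Order (name, annual_impact) pairs by impact, largest first."""
    return sorted(recommendations, key=lambda r: r[1], reverse=True)
```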
Optimisation recommendations · prioritised by annual impact
8 recommendations
Projected spend trajectory · with vs without optimisation
12-month forecast
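The source does not state the forecasting method. This is a minimal sketch assuming spend compounds at a constant month-over-month rate and savings ramp linearly over a rollout period. Inputs such as the 18% MoM growth or a two-month ramp (roughly the 6–8 week rollout) would be plugged in as parameters. The function is hypothetical.

```python
def forecast(current_monthly: float, mom_growth: float, months: int = 12,
             savings_rate: float = 0.0, ramp_months: int = 0):
    """Project monthly spend with and without optimisation.

    Spend compounds at `mom_growth` per month; savings ramp linearly to
    `savings_rate` over `ramp_months` (0 means fully realised from month 1).
    Returns (baseline, optimised) lists of monthly spend.
    """
    baseline, optimised = [], []
    spend = current_monthly
    for m in range(1, months + 1):
        spend *= 1 + mom_growth
        realised = savings_rate * min(m / ramp_months, 1.0) if ramp_months else savings_rate
        baseline.append(spend)
        optimised.append(spend * (1 - realised))
    return baseline, optimised
```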