LLM Observatory — AI Cost Attribution & Routing Intelligence

Monthly burn rate

$41,280

annualised: $495,360

↑ 18% MoM

Marginal inference cost

$0.0043

per API call blended avg

↓ 12% vs last month

Committed-use utilisation

84%

of $30K committed spend

↓ 4% underrun risk

Variance vs budget

+$6,280

12.4% over Q2 AI budget

↑ Algo-Research primary driver

Total API spend · 30d

$41,280

9.6M calls · 14 providers

Tokens consumed

2.4B

1.6B input · 0.8B output

Routing savings available

$11,340

27.5% of current spend

Avg cost per call

$0.0043

↑ 18% vs prior 30d

Spend by team · attribution

30-day window

Spend by model

cost share

Team-level cost attribution · chargeback view

per-call unit economics

Team	Cost centre	Calls	Tokens in	Tokens out	30d spend	% of total	$/call	WoW trend	Status

Weekly spend trend · anomaly overlay

spike detected

Routing decision engine · weighted multi-factor scoring

live decisions

Routing decisions are computed using a weighted multi-factor scoring model across four operational dimensions. Each inbound request is scored per candidate model; the highest-scoring eligible model is selected. Rejected alternatives and confidence scores are logged for auditability.

35%

Reasoning complexity

25%

Context size

20%

Latency sensitivity

20%

Cost sensitivity
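
The scoring described above can be sketched as follows. This is a minimal illustration, not the engine's actual implementation: only the four weights come from the panel. The `Candidate` type, the per-dimension fit scores in [0, 1], the model labels, and the definition of confidence as the margin over the runner-up are all assumptions made for the example.

```python
from dataclasses import dataclass

# Weights as shown in the panel; they sum to 1.0.
WEIGHTS = {
    "reasoning_complexity": 0.35,
    "context_size": 0.25,
    "latency_sensitivity": 0.20,
    "cost_sensitivity": 0.20,
}


@dataclass
class Candidate:
    name: str              # hypothetical model label
    fit: dict              # assumed per-dimension fit scores in [0, 1]
    eligible: bool = True  # assumed flag, e.g. fails a policy or context check


def score(candidate):
    """Weighted sum of the candidate's per-dimension fit scores."""
    return sum(WEIGHTS[dim] * candidate.fit[dim] for dim in WEIGHTS)


def route(candidates):
    """Select the highest-scoring eligible candidate and build an audit record."""
    ranked = sorted((c for c in candidates if c.eligible), key=score, reverse=True)
    if not ranked:
        raise ValueError("no eligible model for this request")
    best = ranked[0]
    runner_up = score(ranked[1]) if len(ranked) > 1 else 0.0
    return {
        "selected": best.name,
        "score": round(score(best), 4),
        # Assumed confidence definition: margin over the runner-up.
        "confidence": round(score(best) - runner_up, 4),
        # Rejected alternatives are kept so the decision can be audited.
        "rejected": [(c.name, round(score(c), 4)) for c in ranked[1:]],
    }


candidates = [
    Candidate("model-a", {"reasoning_complexity": 0.9, "context_size": 0.8,
                          "latency_sensitivity": 0.4, "cost_sensitivity": 0.3}),
    Candidate("model-b", {"reasoning_complexity": 0.5, "context_size": 0.6,
                          "latency_sensitivity": 0.9, "cost_sensitivity": 0.9}),
    Candidate("model-c", {"reasoning_complexity": 1.0, "context_size": 1.0,
                          "latency_sensitivity": 1.0, "cost_sensitivity": 1.0},
              eligible=False),
]
decision = route(candidates)
print(decision)
```

In this example `model-c` is excluded before scoring because it is marked ineligible, so the choice falls between the two remaining candidates. A narrow margin between them would show up as a low confidence value in the audit record.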

Routing opportunity matrix · by use-case

$11,340 recoverable

Use case	Current model	Calls/month	Avg tokens in	Avg tokens out	Current spend	Recommended	Saving	Confidence

Wasted tokens · 30d

340M

14% of total consumption

Waste cost · 30d

$8,740

annualised: $104,880

Recoverable via caching

$4,200

prompt cache at 90% discount

Avg context retention

28K

target: 10–12K tokens

Waste pattern analysis · detected inefficiencies

5 patterns active

Token efficiency by team

output/input ratio

Context window utilisation · distribution

38% of calls over-provisioned

Vendor comparison matrix · full economics

real pricing · May 2026

Model	Provider	Input ($/1M tok)	Output ($/1M tok)	Context window	Cached input	Avg latency	P99 latency	Proj. monthly burn	Best for

Cost per 1M tokens · input vs output

pricing asymmetry view

Latency vs cost efficiency

throughput trade-off

Governance alerts · live feed

3 critical

Model proliferation · governance risk

8 models active

Eight distinct models are in active production use across 6 teams. High model proliferation increases governance overhead, complicates SLA tracking, and widens the vendor-management surface. Target: consolidate to 3 primary models, with approved exceptions.

Model	Teams	Monthly calls	Spend	Status

API key audit · access governance

2 stale keys

Key ID	Owner	Team	Last used	30d calls	30d spend	Status	Action

AI governance briefing · executive report

powered by Claude Sonnet

Generates a structured governance briefing from live attribution data, formatted for CTO / Head of AI review. Findings are numbered and quantified, and each includes a specific recommendation with its dollar impact.

Incident simulation · failover routing & business impact

simulation mode

Total recoverable · annual

$186,480

across all recommendations

Routing optimisation

$136,080

model routing changes

Prompt efficiency

$50,400

caching + context trim

Implementation effort

6–8 wks

phased rollout
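
The annual figures in these tiles reconcile with the monthly figures reported elsewhere in the dashboard if a simple 12-month annualisation is assumed. The dashboard does not state its method, so the multiplier and the variable names below are assumptions for illustration. Note that the $50,400 prompt-efficiency figure matches the monthly caching figure alone multiplied by 12.

```python
MONTHS = 12  # assumed: simple x12 annualisation, no growth adjustment

monthly_routing_savings = 11_340  # "Routing savings available" tile
monthly_cache_savings = 4_200     # "Recoverable via caching" tile

annual_routing = monthly_routing_savings * MONTHS
annual_prompt = monthly_cache_savings * MONTHS
total_recoverable = annual_routing + annual_prompt

print(f"Routing optimisation: ${annual_routing:,}")
print(f"Prompt efficiency:    ${annual_prompt:,}")
print(f"Total recoverable:    ${total_recoverable:,}")
```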

Optimisation recommendations · prioritised by annual impact

8 recommendations

Projected spend trajectory · with vs without optimisation

12-month forecast