Spend & cost metering
Anyray is fully self-hosted, so there is no vendor billing and nothing egresses: no usage aggregate is sent anywhere, no license check phones home, and there is no SaaS component. "Billing" here means the opposite of a vendor invoice: it is the content-free spend ledger Anyray keeps for you, so your org can see and attribute its own AI-inference spend.
What's metered
The gateway records one content-free spend row per request into its local spend store
(gateway/src/middlewares/log/spend.ts), never any prompt or response content:
- who / which team: from the x-anyray-metadata header (user, team, session)
- model and provider
- token counts (prompt / completion)
- cost (costUsd, priced from token usage via the Pricing table) and latency / status
- savings attribution: baselineCostUsd, estimatedTokensSaved, and savingsUsd, the per-request estimate of what the optimizer saved versus the unoptimized baseline
This is the same store the console's Spend view reads. It stays entirely in your environment — there is no signed aggregate, and nothing to export.
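As a rough illustration of what a content-free row could look like, here is a minimal TypeScript sketch. Only costUsd, baselineCostUsd, estimatedTokensSaved, savingsUsd, and the user / team / session attribution come from the description above; the other field names (promptTokens, completionTokens, latencyMs, status) and the spendByTeam helper are hypothetical and are not taken from spend.ts.

```typescript
// Hypothetical shape of one content-free spend row. The cost and savings
// field names come from the text above; everything else is an assumption.
interface SpendRow {
  user?: string;            // attribution from x-anyray-metadata (assumed optional)
  team?: string;
  session?: string;
  model: string;
  provider: string;
  promptTokens: number;     // assumed field name
  completionTokens: number; // assumed field name
  costUsd: number;
  latencyMs: number;        // assumed field name
  status: number;           // assumed to be the HTTP status code
  baselineCostUsd: number;
  estimatedTokensSaved: number;
  savingsUsd: number;
}

// Sum costUsd per team; rows with no team are grouped under "unattributed".
function spendByTeam(rows: SpendRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    const key = row.team ?? "unattributed";
    totals.set(key, (totals.get(key) ?? 0) + row.costUsd);
  }
  return totals;
}
```

Note that the row has no field for prompt or response text, which is what makes a roll-up like this possible without touching content.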
Where you see it
- Console → Spend — spend-by-user / by-team / by-model, over the content-free ledger.
- Admin API: GET /admin/spend (gated by ANYRAY_ADMIN_TOKEN) returns the same summary as JSON.
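A call to the admin endpoint might look like the sketch below. Two things here are assumptions rather than documented behavior: that the token is sent as an Authorization: Bearer header, and the gateway base URL you pass in. The response shape is not described on this page, so it is left as unknown. The FetchLike type is a hypothetical stand-in so the sketch does not depend on a particular runtime.

```typescript
// Minimal fetch-compatible signature, so the sketch stays runtime-agnostic.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface AdminRequest {
  url: string;
  headers: Record<string, string>;
}

// ASSUMPTION: the admin token travels as a Bearer token. Check your
// deployment for the header scheme the gateway actually expects.
function buildSpendRequest(baseUrl: string, adminToken: string): AdminRequest {
  // Strip trailing slashes so the path is not doubled.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/admin/spend`,
    headers: {
      Authorization: `Bearer ${adminToken}`,
      Accept: "application/json",
    },
  };
}

async function fetchSpendSummary(
  fetchImpl: FetchLike,
  baseUrl: string,
  adminToken: string,
): Promise<unknown> {
  const { url, headers } = buildSpendRequest(baseUrl, adminToken);
  const res = await fetchImpl(url, { method: "GET", headers });
  if (!res.ok) {
    throw new Error(`GET /admin/spend failed with HTTP ${res.status}`);
  }
  // The summary's JSON shape is not specified here, so it stays unknown.
  return res.json();
}
```

On Node 18 or later, the global fetch can be passed as fetchImpl, with the token read from the ANYRAY_ADMIN_TOKEN environment variable.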
What egresses
Nothing. Anyray is self-hosted by construction — the gateway, the optimizer, the
console, the trace backend, and every datastore run inside your own environment from one
docker compose up. Prompt and response content is encrypted at
rest and never shown to humans; the spend ledger above is
content-free and never leaves the box either. See The data
boundary.
Optimization savings
Per-strategy savings attribution is live in the console. Each spend row carries
baselineCostUsd, estimatedTokensSaved, and savingsUsd alongside costUsd, and the
Dashboard Savings panel rolls these up into tokens-saved
per day and top contributors by optimizer strategy — so you can see which lever
(param_tuning / prompt_compression / semantic_cache / …) is earning its keep this
period.
These are the optimizer's per-decision estimates; they show you the realized savings
trend today. Treat them as estimates rather than a billing-grade reconciliation — tighter
calibration of the savings model (and tying it to cost per correct answer) is still
being refined. See What saves the most.
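The two roll-ups the Savings panel is described as showing can be sketched over the per-row estimates as follows. This is an illustration only, not the console's implementation: the timestamp and strategy field names are hypothetical, and ranking contributors by savingsUsd (rather than by tokens saved) is an assumption.

```typescript
// Hypothetical projection of a spend row with just the savings fields.
interface SavingsRow {
  timestamp: string;            // ISO-8601 UTC string; assumed field
  strategy: string;             // e.g. "semantic_cache"; assumed field name
  estimatedTokensSaved: number;
  savingsUsd: number;
}

// Tokens saved per calendar day, keyed by "YYYY-MM-DD".
function tokensSavedPerDay(rows: SavingsRow[]): Map<string, number> {
  const perDay = new Map<string, number>();
  for (const row of rows) {
    const day = row.timestamp.slice(0, 10);
    perDay.set(day, (perDay.get(day) ?? 0) + row.estimatedTokensSaved);
  }
  return perDay;
}

// Strategies ranked by total estimated savings, largest first.
function topStrategies(rows: SavingsRow[], limit = 3): Array<[string, number]> {
  const perStrategy = new Map<string, number>();
  for (const row of rows) {
    const total = (perStrategy.get(row.strategy) ?? 0) + row.savingsUsd;
    perStrategy.set(row.strategy, total);
  }
  return Array.from(perStrategy.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}
```

Because the inputs are estimates, the outputs are trend indicators too, not reconciled figures.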
Spend governance: token caps
Spend visibility pairs with spend governance: Anyray enforces monthly per-user token
caps on the inference endpoints, returning HTTP 429 when a capped user exceeds their
budget (and failing open for uncapped/unattributed users). Caps are set and monitored on
the console Users page or via /admin/user-caps, and are content-free — they count
tokens, never content. See Operate → spend
governance.
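The enforcement rule described above (429 for a capped user over budget, fail open otherwise) can be sketched as a small decision function. This is an assumed model, not the gateway's code: the checkTokenCap name, the map-based inputs, and the choice of a strict "greater than" comparison are all hypothetical.

```typescript
type CapDecision =
  | { allowed: true }
  | { allowed: false; status: 429 };

// usedThisMonth and caps are keyed by user id and count tokens only.
function checkTokenCap(
  user: string | undefined,
  usedThisMonth: Map<string, number>,
  caps: Map<string, number>,
): CapDecision {
  // Unattributed request: no user to charge, so fail open.
  if (user === undefined) return { allowed: true };

  // Uncapped user: no limit configured, so fail open.
  const cap = caps.get(user);
  if (cap === undefined) return { allowed: true };

  // ASSUMPTION: "exceeds" is modeled as strictly greater than the cap.
  const used = usedThisMonth.get(user) ?? 0;
  return used > cap ? { allowed: false, status: 429 } : { allowed: true };
}
```

The function never sees prompt or response text, only token counts, which matches the content-free property of the caps.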
Compliance note
Because Anyray runs entirely in your environment and no customer data ever reaches a vendor, it stays largely out of GDPR / SOC 2 data-processor scope: there is no processor. This is not legal advice; confirm with your own counsel. See Security.