Spend & cost metering

Anyray is fully self-hosted, so there is no vendor billing and nothing egresses: no usage aggregate is sent anywhere, no license phones home, and there is no SaaS component. "Billing" here means the opposite of a vendor invoice: it is the content-free spend ledger Anyray keeps for you, so your org can see and attribute its own AI-inference spend.

What's metered

The gateway records one content-free spend row per request into its local spend store (gateway/src/middlewares/log/spend.ts) — never any prompt or response content:

  • who / which team — from the x-anyray-metadata header (user, team, session)
  • model and provider
  • token counts (prompt / completion)
  • cost (costUsd, priced from token usage via the Pricing table) and latency / status
  • savings attribution: baselineCostUsd, estimatedTokensSaved, and savingsUsd (the per-request estimate of what the optimizer saved versus the unoptimized baseline)
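
As a rough illustration of the fields listed above, one spend row could be modeled as below. This is a sketch, not the gateway's actual schema: only costUsd, baselineCostUsd, estimatedTokensSaved, and savingsUsd are names taken from this page. Every other field name, the per-million-token Price shape, and the "savings = baseline minus actual cost" rule are assumptions made for the example.

```typescript
// Hypothetical shape of one content-free spend row.
// Note what is absent: no prompt text, no response text.
interface SpendRow {
  user?: string;            // from x-anyray-metadata (field name assumed)
  team?: string;            // from x-anyray-metadata (field name assumed)
  session?: string;         // from x-anyray-metadata (field name assumed)
  model: string;
  provider: string;
  promptTokens: number;     // assumed name
  completionTokens: number; // assumed name
  costUsd: number;
  latencyMs: number;        // assumed name
  status: number;           // assumed to be the HTTP status
  baselineCostUsd: number;
  estimatedTokensSaved: number;
  savingsUsd: number;
}

// Assumed pricing-table entry: USD per one million tokens.
interface Price {
  promptUsdPerMTok: number;
  completionUsdPerMTok: number;
}

// Price a request from its token usage.
function priceTokens(price: Price, promptTokens: number, completionTokens: number): number {
  const microUsd =
    promptTokens * price.promptUsdPerMTok +
    completionTokens * price.completionUsdPerMTok;
  return microUsd / 1_000_000;
}

// Assumption: savings is the baseline cost minus the actual cost, floored at zero.
function estimateSavingsUsd(baselineCostUsd: number, costUsd: number): number {
  return Math.max(0, baselineCostUsd - costUsd);
}
```

With a hypothetical price of 3 USD per million prompt tokens and 15 USD per million completion tokens, a request with 1,000 prompt tokens and 500 completion tokens would be priced at (3,000 + 7,500) / 1,000,000 = 0.0105 USD.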

This is the same store the console's Spend view reads. It stays entirely in your environment — there is no signed aggregate, and nothing to export.
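
The user, team, and session attribution in each row comes from the x-anyray-metadata header on the incoming request. A client might attach it as sketched below. The header name is from this page, but the JSON encoding of its value is an assumption, so check your gateway's configuration before relying on it. The AnyrayMetadata type and metadataHeader helper are hypothetical names.

```typescript
// Hypothetical attribution payload; the three keys mirror the
// (user, team, session) fields named on this page.
interface AnyrayMetadata {
  user?: string;
  team?: string;
  session?: string;
}

// Build the attribution header for an inference request.
// Assumption: the header value is a JSON-encoded object.
function metadataHeader(meta: AnyrayMetadata): { "x-anyray-metadata": string } {
  return { "x-anyray-metadata": JSON.stringify(meta) };
}

// Example: merge into the headers of whatever HTTP client you use.
const exampleHeaders = {
  "content-type": "application/json",
  ...metadataHeader({ user: "alice", team: "search", session: "s-123" }),
};
```

A request sent without this header would presumably be recorded as unattributed.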

Where you see it

  • Console → Spend — spend-by-user / by-team / by-model, over the content-free ledger.
  • Admin API: GET /admin/spend (gated by ANYRAY_ADMIN_TOKEN) returns the same summary as JSON.
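
A minimal sketch of calling the admin endpoint from a script follows. The path /admin/spend and the ANYRAY_ADMIN_TOKEN gate are from this page. The Bearer authorization scheme, the base URL, and the buildSpendRequest helper are assumptions, and the shape of the JSON response is not specified here.

```typescript
// Plain description of the HTTP call, kept separate from the actual fetch
// so it can be inspected or reused with any HTTP client.
interface AdminRequest {
  url: string;
  init: { method: string; headers: Record<string, string> };
}

// Assumption: the admin token is presented as a Bearer token.
function buildSpendRequest(baseUrl: string, adminToken: string): AdminRequest {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: trimmed + "/admin/spend",
    init: {
      method: "GET",
      headers: {
        Authorization: "Bearer " + adminToken,
        Accept: "application/json",
      },
    },
  };
}

// Usage, with a hypothetical gateway address:
//   const { url, init } = buildSpendRequest("http://localhost:8787", token);
//   const summary = await (await fetch(url, init)).json();
```

Because the gateway is self-hosted, this call stays inside your own network.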

What egresses

Nothing. Anyray is self-hosted by construction — the gateway, the optimizer, the console, the trace backend, and every datastore run inside your own environment from one docker compose up. Prompt and response content is encrypted at rest and never shown to humans; the spend ledger above is content-free and never leaves the box either. See The data boundary.

Optimization savings

Per-strategy savings attribution is live in the console. Each spend row carries baselineCostUsd, estimatedTokensSaved, and savingsUsd alongside costUsd, and the Dashboard Savings panel rolls these up into tokens-saved per day and top contributors by optimizer strategy — so you can see which lever (param_tuning / prompt_compression / semantic_cache / …) is earning its keep this period.
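
The roll-up described above can be approximated over exported rows as follows. This is a sketch of the aggregation idea, not the console's implementation: the day and strategy field names are assumptions, while estimatedTokensSaved and savingsUsd are the fields named on this page.

```typescript
// Minimal projection of a spend row for the savings roll-up.
interface SavingsRow {
  day: string;       // assumed: calendar day such as "2024-05-01"
  strategy: string;  // assumed: e.g. "param_tuning", "semantic_cache"
  estimatedTokensSaved: number;
  savingsUsd: number;
}

// Sum estimated tokens saved per day.
function tokensSavedPerDay(rows: SavingsRow[]): Map<string, number> {
  const byDay = new Map<string, number>();
  for (const row of rows) {
    byDay.set(row.day, (byDay.get(row.day) ?? 0) + row.estimatedTokensSaved);
  }
  return byDay;
}

// Rank optimizer strategies by total estimated USD saved, largest first.
function topStrategies(
  rows: SavingsRow[],
  limit: number,
): Array<{ strategy: string; savingsUsd: number }> {
  const byStrategy = new Map<string, number>();
  for (const row of rows) {
    byStrategy.set(row.strategy, (byStrategy.get(row.strategy) ?? 0) + row.savingsUsd);
  }
  return Array.from(byStrategy.entries())
    .map(([strategy, savingsUsd]) => ({ strategy, savingsUsd }))
    .sort((a, b) => b.savingsUsd - a.savingsUsd)
    .slice(0, limit);
}
```

Both functions read only counters and labels, so the roll-up stays content-free like the ledger it is built on.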

These are the optimizer's per-decision estimates, so what the panel shows today is an estimated savings trend. Treat the numbers as estimates rather than a billing-grade reconciliation: calibration of the savings model (including tying it to cost per correct answer) is still being tightened. See What saves the most.

Spend governance: token caps

Spend visibility pairs with spend governance: Anyray enforces monthly per-user token caps on the inference endpoints, returning HTTP 429 when a capped user exceeds their budget (and failing open for uncapped/unattributed users). Caps are set and monitored on the console Users page or via /admin/user-caps, and are content-free — they count tokens, never content. See Operate → spend governance.
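
The cap behavior described above (429 for a capped user over budget, fail-open otherwise) can be sketched as a small decision function. This illustrates the policy only and is not the gateway's code. The function and type names are hypothetical, usage is assumed to be tracked per calendar month in UTC, and treating "at the cap" as already blocked is an assumption about the boundary.

```typescript
type CapDecision = { allowed: true } | { allowed: false; status: 429 };

// Key for the current billing month, e.g. "2024-05" (assumed: UTC months).
function monthKey(date: Date): string {
  return date.toISOString().slice(0, 7);
}

// Decide whether a request may proceed.
// caps:          user -> monthly token cap (absent means uncapped)
// usedThisMonth: user -> tokens already counted this month
function checkTokenCap(
  user: string | undefined,
  caps: Map<string, number>,
  usedThisMonth: Map<string, number>,
): CapDecision {
  // Unattributed request: fail open.
  if (user === undefined) return { allowed: true };

  const cap = caps.get(user);
  // Uncapped user: fail open.
  if (cap === undefined) return { allowed: true };

  const used = usedThisMonth.get(user) ?? 0;
  // Assumption: a user at or over the cap is rejected with HTTP 429.
  return used >= cap ? { allowed: false, status: 429 } : { allowed: true };
}
```

Only token counters are consulted, which matches the content-free property of caps described above.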

Compliance note

Because Anyray runs entirely in your environment and no customer data ever reaches a vendor, it stays largely out of GDPR / SOC2 data-processor scope — there is no processor. This is not legal advice — confirm with your own counsel. See Security.