Operate
Day-to-day operation is about three things: confirming Anyray is saving money,
confirming it's healthy and failing safe, and rolling out new logic safely. Your
main surface is the console on :3000.
:::info Status
The implemented surfaces are the console (Dashboard, Spend, Users, Traces, Sessions) and the content-free spend store behind it, plus monthly per-user token caps (spend governance, below). Shadow Mode, holdback, and adaptive review are roadmap and called out below.
:::
Watch spend and cost
Spend comes from the gateway's content-free spend store: every request records
the user and team (from x-anyray-metadata), model, provider, tokens, cost, latency,
and status, but never content. An admin-gated summary is available at GET /admin/spend,
and it powers the Spend page in the console (broken down by user and team).
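If you want the same summary outside the console, a minimal sketch of calling the endpoint with curl follows. Only the path GET /admin/spend comes from this page. The gateway base URL (ANYRAY_GATEWAY_URL and its default below), the use of ANYRAY_ADMIN_TOKEN for this endpoint, and the Bearer authorization scheme are assumptions for illustration, so check them against your deployment.

```shell
# Hypothetical base URL: replace with wherever your gateway actually listens.
ANYRAY_GATEWAY_URL="${ANYRAY_GATEWAY_URL:-http://localhost:8080}"

# Fetch the admin-gated spend summary and print the raw response body.
# Assumption: the admin token is sent as a Bearer token.
spend_summary() {
  if [ -z "${ANYRAY_ADMIN_TOKEN:-}" ]; then
    echo "ANYRAY_ADMIN_TOKEN is not set" >&2
    return 1
  fi
  curl --fail --silent --show-error \
    -H "Authorization: Bearer ${ANYRAY_ADMIN_TOKEN}" \
    "${ANYRAY_GATEWAY_URL}/admin/spend"
}
```

With the token exported, running spend_summary prints whatever the endpoint returns. The response shape is not described here, so the sketch does not try to parse it.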
You watch all of this in the console. Use it to answer:
- How much are we spending, and by which users and teams?
- Which models and providers drive the cost?
- What share of traffic is being passed through unchanged vs. optimized?
- Is a strategy worth its overhead and quality risk for this use-case?
Track it as cost per correct answer, not just raw dollars — a cheaper answer that's wrong isn't a saving.
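As a worked example of that metric, the sketch below divides total spend by the number of answers judged correct. The figures are invented for illustration, and how you judge correctness (an eval set, spot checks) is up to you.

```shell
# cost_per_correct TOTAL_COST CORRECT_ANSWERS
# Prints dollars spent per correct answer, or "undefined" if nothing was correct.
cost_per_correct() {
  awk -v cost="$1" -v correct="$2" 'BEGIN {
    if (correct <= 0) { print "undefined"; exit }
    printf "%.6f\n", cost / correct
  }'
}

# Illustrative numbers only, for 1000 requests each:
cost_per_correct 10.00 950   # unoptimized: prints 0.010526
cost_per_correct 8.00 700    # optimized:   prints 0.011429
```

In this made-up case the optimized run costs 20% less in raw dollars but more per correct answer, so it is not a saving.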
Spend governance: monthly token caps
Beyond watching spend, you can govern it with monthly per-user token caps — part of Anyray's spend-governance invariant. A capped user who exceeds their monthly token budget gets an HTTP 429 instead of another paid request.
- Set and monitor caps on the console Users page, or via GET/PUT /admin/user-caps (gated by ANYRAY_ADMIN_TOKEN). The Users page shows each user's cap and current month-to-date usage.
- Enforced on /v1/chat/completions, /v1/completions, /v1/messages, and /v1/embeddings. User identity comes from the x-anyray-metadata header (user/userId).
- Fails open. Unattributed or uncapped users pass through: caps never block traffic for someone without a cap, consistent with the gateway's fail-open posture.
- Resets on the calendar-month boundary — counters roll over at the start of each month.
Caps are content-free like everything else: they count tokens, never inspect prompt or response content. See Billing.
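The cap rules above can be summarized as a small decision function. This is a sketch of the documented behavior, not the gateway's code: the function name cap_decision is hypothetical, and treating usage equal to the cap as already over budget is an assumption about the exact boundary.

```shell
# cap_decision USER CAP USED
#   USER = user id from x-anyray-metadata (empty if unattributed)
#   CAP  = monthly token cap for that user (empty if no cap is set)
#   USED = tokens used so far in the current calendar month
# Prints "pass" or "429".
cap_decision() {
  user=$1 cap=$2 used=$3
  if [ -z "$user" ] || [ -z "$cap" ]; then
    echo "pass"   # fail open: unattributed or uncapped users are never blocked
  elif [ "$used" -ge "$cap" ]; then
    echo "429"    # assumption: the budget is spent once usage reaches the cap
  else
    echo "pass"
  fi
}

cap_decision ""    1000 5000   # prints pass (no user identity)
cap_decision alice ""   5000   # prints pass (no cap configured)
cap_decision alice 1000 999    # prints pass (under budget)
cap_decision alice 1000 1000   # prints 429
```

Only token counts enter the decision, which matches the content-free design: nothing about the prompt or response is needed.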
Confirm it's failing safe
The health you care about is "requests keep flowing even if the optimizer is down." The optimizer sits off the forwarding path: if it's unreachable or slow, the gateway's hard timeout (800 ms) trips and the request passes through unchanged, so inference keeps working at full price rather than breaking. Watch for:
- a spike in passthroughs (optimizer may be unreachable or over-conservative)
- gateway/optimizer container health in docker compose ps / logs
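The fail-open behavior described in this section amounts to a timeout with a fallback. The sketch below imitates it in shell as a mental model only: optimize_or_passthrough is a hypothetical name, the "optimizer" is any command you pass in, and it assumes GNU coreutils timeout, which accepts fractional seconds.

```shell
# optimize_or_passthrough ORIGINAL OPTIMIZER_CMD [ARGS...]
# Feeds ORIGINAL to the optimizer command on stdin with a hard 0.8 s limit.
# If the command times out, fails, or prints nothing, ORIGINAL is emitted unchanged.
optimize_or_passthrough() {
  original=$1
  shift
  if optimized=$(printf '%s' "$original" | timeout 0.8 "$@") && [ -n "$optimized" ]; then
    printf '%s\n' "$optimized"
  else
    printf '%s\n' "$original"   # passthrough: unchanged, full price, still served
  fi
}

optimize_or_passthrough 'hello' tr a-z A-Z   # fast "optimizer": prints HELLO
optimize_or_passthrough 'hello' sleep 2      # slow "optimizer": prints hello after about 0.8 s
```

The second call is what a passthrough looks like from the outside: the caller still gets an answer, just not an optimized one. A sustained rise in such calls is the spike to watch for.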
Inspect traces and sessions
The console's Traces page lists requests (metadata-only by default), and Sessions groups them. For rare drill-down, a trace deep-links into the internal Langfuse trace-detail view. Use traces to understand a specific optimization decision or a cache hit.
Review what the learner changed (roadmap)
If you opt into adaptive optimization, Anyray would propose and self-apply pipeline changes from your own traffic, and operating it would mean reviewing what changed and whether it helped — which adaptations were promoted, which were auto-rolled-back at the quality gate, and the savings delta on the canary slice. (Roadmap — not yet implemented.)
Roll out new logic
New gateway/optimizer/console logic ships as updated images. Roll it out with compose:
docker compose pull # fetch the new images
docker compose up -d # recreate changed services
Roll out conservatively: update, send representative traffic, confirm spend and the passthrough rate look right, then widen. Optimizer strategy changes don't even need this — they're runtime-mutable from the console Settings page.
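One way to make "confirm the passthrough rate looks right" concrete is to compare the rate before and after the update against a tolerance you choose. In this sketch the request counts are assumed to be read off the console by hand, and both function names and the 5-point tolerance are hypothetical.

```shell
# passthrough_rate PASSTHROUGH_REQUESTS TOTAL_REQUESTS
# Prints the passthrough share as a percentage with one decimal place.
passthrough_rate() {
  awk -v p="$1" -v t="$2" 'BEGIN {
    if (t == 0) { print "0.0"; exit }
    printf "%.1f\n", 100 * p / t
  }'
}

# rollout_ok RATE_BEFORE RATE_AFTER MAX_INCREASE_POINTS
# Succeeds (exit 0) if the rate rose by no more than the allowed percentage points.
rollout_ok() {
  awk -v b="$1" -v a="$2" -v m="$3" 'BEGIN { exit !((a - b) <= m) }'
}

before=$(passthrough_rate 120 1000)   # 12.0
after=$(passthrough_rate 310 1000)    # 31.0
if rollout_ok "$before" "$after" 5; then
  echo "passthrough rate steady: widen the rollout"
else
  echo "passthrough rate jumped from ${before}% to ${after}%: hold and investigate"
fi
```

With these illustrative counts the check takes the second branch. A jump like that usually points at an unreachable or over-conservative optimizer rather than at the gateway itself.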
The same evidence-first path is meant to apply to adopting Anyray in the first place: start in observe-only Shadow Mode, keep an unoptimized holdback as ground truth, gate on a quality SLO, and keep the kill switch in reach. (Shadow Mode, holdback, and the quality SLO are roadmap.) See Proof, not promises.