Monitoring & the Anyray console

The Anyray console is the admin's window into the system. It's a first-party React/Vite app (observability/ui/) served on :3000 behind a single admin-key gate (ANYRAY_ADMIN_TOKEN) — one key, no per-page login. It's where you see, on your own traffic, what you're spending, who's spending it, and how requests flow — driven by evidence you can look at, not a promise.

:::info Status
The console and the content-free spend/trace stores behind it are implemented. The Shadow-Mode would-be-savings views (projected savings, would-be decision mix, holdback comparison) are roadmap and marked below.
:::

What it shows

The console pages, all behind the single admin key:

  • Dashboard (home) — at-a-glance KPI stat cards for Traces, Cost, Tokens, and Savings, each with a 30-day sparkline, plus two roll-up panels (see below). Every number is computed over the content-free stores — metadata only.
  • Traces — a metadata-only list of requests. For rare drill-down, a trace deep-links into the internal Langfuse trace-detail view. Content is shown only when the effective content mode is plaintext.
  • Sessions — requests grouped into sessions for context.
  • Users — per-user spend / attribution plus each user's monthly token-cap usage (see Operate → spend governance).
  • Optimizer — the optimization strategy settings (toggle, reorder, parametrize) and per-user / per-team / per-endpoint / per-model targeting rules, runtime-mutable and audit-logged.
  • Providers — manage server-held provider API keys at runtime; audit-logged by provider slug, never key values.
  • Routing — the gateway's default routing strategy (single / loadbalance / fallback / conditional) with per-target retry/fallback.
  • Connect — on-ramp instructions for the anyray-connect CLI, which points local coding tools at the gateway.
  • Pricing — the admin-editable model pricing table (USD per token) that drives cost attribution in the spend store.
  • Privacy — the org-wide content mode (encrypted / off / plaintext), togglable at runtime and audit-logged.

Dashboard

The home page is the at-a-glance roll-up over the content-free stores:

  • KPI cards — Traces, Cost, Tokens, and Savings, each with a 30-day sparkline so you see the trend, not just the current number.
  • Savings panel — tokens-saved-per-day bars and top contributors by optimizer strategy, so you can see which lever is earning its keep (see Billing → optimization savings).
  • Performance panel — P50 / P95 latency trend and the slowest endpoints.
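As a rough illustration of what a P50 / P95 figure summarizes, here is a minimal sketch that computes percentiles from per-request latency metadata using the nearest-rank method. The source does not say which percentile method the console uses, so the method, the `percentile` helper, and the sample latencies are all assumptions made for illustration.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    samples at or below it. Hypothetical helper, not the console's code."""
    if not values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-request latencies in milliseconds (metadata only).
latencies_ms = [100, 120, 130, 150, 200, 220, 250, 300, 400, 900]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"P50={p50} ms, P95={p95} ms")
```

With these sample values the single 900 ms outlier does not move P50 but sets P95, which is why the panel tracks both.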

All aggregations are metadata only — never prompt or response content.

Spend & users

  • Spend over time, broken down by team and user, and by model and provider, sourced from the content-free spend store (GET /admin/spend): who/team, model, provider, tokens, cost, latency, status — never content. The Users page adds each user's monthly token-cap usage.
  • Track spend as cost per correct answer (total cost divided by the number of answers judged correct), not just raw dollars.
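The breakdowns above can be reproduced outside the console from the same endpoint. The following is a minimal sketch, not the documented API contract: it assumes GET /admin/spend accepts the admin token as a Bearer header and returns a JSON array of per-request records with `team`, `user`, `model`, `provider`, `tokens`, and `cost` fields. The header scheme, the response shape, the field names, and the `fetch_spend` / `roll_up` helpers are all hypothetical.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_spend(base_url, admin_token):
    """Fetch spend records from the gateway.

    Assumption: Bearer auth and a JSON-array response body.
    """
    req = urllib.request.Request(
        f"{base_url}/admin/spend",
        headers={"Authorization": f"Bearer {admin_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def roll_up(records, key):
    """Sum requests, tokens, and cost per distinct value of `key`."""
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0, "cost": 0.0})
    for rec in records:
        bucket = totals[rec[key]]
        bucket["requests"] += 1
        bucket["tokens"] += rec["tokens"]
        bucket["cost"] += rec["cost"]
    return dict(totals)


# Hypothetical records in the assumed shape; no prompt or response content.
sample = [
    {"team": "platform", "user": "alice", "model": "model-a",
     "provider": "provider-x", "tokens": 1200, "cost": 0.25},
    {"team": "platform", "user": "bob", "model": "model-b",
     "provider": "provider-x", "tokens": 800, "cost": 0.5},
    {"team": "research", "user": "carol", "model": "model-a",
     "provider": "provider-y", "tokens": 400, "cost": 0.125},
]

by_team = roll_up(sample, "team")
by_model = roll_up(sample, "model")
print(by_team)
```

Against a live gateway, `roll_up(fetch_spend(base_url, token), "team")` would give the same per-team view, provided the response matches the assumed shape.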

Would-be decisions (Shadow Mode — roadmap)

  • The would-be decision mix (what share of requests would downgrade, hit the cache, or pass through) and each strategy's projected contribution, so you can see which lever earns what before enabling it.

:::warning Status: roadmap
Shadow Mode and its would-be-savings / would-be-diff views (projected savings vs. realized savings, the would-be decision mix, and holdback comparison) are planned, not yet built. Today the console shows realized spend and traces on live traffic.
:::

Privacy by design

The console is built so monitoring doesn't become surveillance — consistent with how Anyray treats your data:

  • Runs entirely on-prem. The console and all its data live in your environment; Anyray is fully self-hosted and nothing leaves it.
  • Metadata-first. Spend and traces need no raw content, so the default views, traces included, are metadata-only.
  • Content encrypted at rest. When content is stored (mode encrypted, the default), it's AES-256-GCM encrypted via ANYRAY_CONTENT_KEY; humans see ciphertext. Decrypting is an offline, authorized-audit action — never exposed in a UI.
  • Aggregated for cost attribution. Roll-ups by team and user cover spend and token-cap metadata only, never content. It's a cost tool, not a way to watch individuals.

See Security and The data boundary.

How it's built (data model)

The console's monitoring views are read-only views over stores the gateway already populates:

  • the content-free spend store — who/team, model, provider, tokens, cost, latency, status per request (powers Spend); exposed at GET /admin/spend.
  • the trace backend — a self-hosted Langfuse (trace store only); the gateway exports per-request traces to it internally (ANYRAY_OBSERVABILITY_BASEURL / ANYRAY_OBSERVABILITY_PUBLIC_KEY / ANYRAY_OBSERVABILITY_SECRET_KEY). Traces are metadata-only by default; the Anyray console is the real UI, with deep-links into Langfuse trace detail for rare drill-down.
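Since trace export depends on the three `ANYRAY_OBSERVABILITY_*` variables named above, a small preflight check can catch a missing one before the gateway starts. This is a sketch under the assumption that all three must be non-empty for export to work. The source only names the variables, and the `missing_observability_vars` helper is hypothetical.

```python
import os

# Variable names taken from the text above; treating all three as
# required is an assumption.
REQUIRED_VARS = (
    "ANYRAY_OBSERVABILITY_BASEURL",
    "ANYRAY_OBSERVABILITY_PUBLIC_KEY",
    "ANYRAY_OBSERVABILITY_SECRET_KEY",
)


def missing_observability_vars(env=None):
    """Return the names of trace-export variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_observability_vars()
    if missing:
        print("Trace export not fully configured; missing:", ", ".join(missing))
    else:
        print("Trace export variables are set.")
```

The check reports names only and never prints the key values.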

Everything is served on-prem behind the single admin key. The monitoring views are read-only and metadata-only by default.

Bring your own observability (lighter note / roadmap). Because the gateway exports traces internally, plugging Anyray into a different self-hosted observability stack you already run is a natural extension — traces and cost flowing into your existing tooling rather than forcing a new one. Everything stays on-prem; nothing is sent to a third-party SaaS. (First-class OpenTelemetry export to arbitrary backends is roadmap.)

How you use it

  1. Open http://<host>:3000 and sign in with your ANYRAY_ADMIN_TOKEN.
  2. Scan the Dashboard — KPI sparklines, the Savings panel, and the Performance panel — for the at-a-glance picture.
  3. Watch Spend and Users by team and user — where is the money going, and is anyone nearing a token cap?
  4. Use Traces and Sessions to understand specific requests and optimization decisions.
  5. Adjust the pipeline and targeting rules on the Optimizer page, server-held keys on Providers, routing on Routing, model prices on Pricing, and the content mode on Privacy. All of these are runtime-mutable and audit-logged.
  6. Keep watching cost-per-correct-answer in production.
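To make the cost-per-correct-answer metric in the last step concrete, here is a minimal worked sketch. It assumes you already have a count of answers judged correct from your own evaluation process, which the source does not describe. The `cost_per_correct_answer` helper and all figures are hypothetical.

```python
def cost_per_correct_answer(total_cost_usd, correct_answers):
    """Total spend divided by answers judged correct.

    Returns None when there are no correct answers, since the ratio
    is undefined in that case.
    """
    if correct_answers <= 0:
        return None
    return total_cost_usd / correct_answers


# Hypothetical figures for two periods of 1,000 requests each.
before = cost_per_correct_answer(120.0, 960)  # higher spend, 960 correct
after = cost_per_correct_answer(90.0, 900)    # lower spend, 900 correct

print(f"before: ${before:.3f} per correct answer")
print(f"after:  ${after:.3f} per correct answer")
```

In this made-up example raw spend drops by 25% while cost per correct answer drops by 20%, because some of the saving is offset by fewer correct answers. That gap is what raw dollars alone would hide.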

(The roadmap flow — run in Shadow Mode, review would-be savings, then compare the optimized cohort against an always-on holdback — is described in Proof, not promises.)

See also