Adaptive optimization

Beyond improvements that arrive when you update your deployment, Anyray can learn from your org's own traffic and improve its strategy pipeline on the fly — reordering and retuning strategies, adapting the router, and tuning the cache to your workload. It is opt-in and off by default, and every self-applied change is quality-gated with automatic rollback.

:::warning Status: roadmap (not yet implemented)
The closed-loop learner described here is planned, not shipping. Some substrate already exists (the content-free spend store, per-request traces, and the runtime-mutable optimizer.config.json), but the learning loop, shadow/replay eval, and the org-admin opt-in switch are not built yet. This page documents the intended design so it can be reviewed; nothing here is live.
:::

Two kinds of improvement

  ① New capability arrives when you update your self-hosted deployment
       new Anyray image ──(docker compose pull && up -d)──▶  your optimizer

  ② Your optimizer improves ITSELF from your traffic (this page — roadmap, opt-in)
       your own requests + measured savings/quality  ──▶  local learner

Because Anyray is self-hosted, there's no vendor pushing changes to you and no CLI: new strategies ship in a new image you pull on your own schedule. Adaptive optimization (the roadmap loop below) decides how to use them best for your org, on-prem.

The closed loop

Every stage of the loop runs inside your environment; nothing leaves.

A proposed change is never trusted blindly: it runs on a small canary slice, is measured against a quality gate, and is rolled back automatically if it regresses. Only changes that demonstrably help on your traffic get promoted.
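As an illustration only (the learner is not built, so none of these names exist in Anyray), the promote-or-rollback decision described above might be sketched as follows. `SliceMetrics`, `decide`, and the `quality`/`cost` fields are hypothetical, and the sketch assumes quality can be summarized as a single score where higher is better.

```python
from dataclasses import dataclass


@dataclass
class SliceMetrics:
    """Hypothetical aggregate metrics for one traffic slice."""
    quality: float  # assumed quality score, higher is better
    cost: float     # assumed average cost per request, lower is better


def decide(baseline: SliceMetrics, canary: SliceMetrics,
           max_quality_drop: float = 0.0) -> str:
    """Return "promote" or "rollback" for a candidate change.

    The quality gate is checked first: any regression beyond the allowed
    tolerance rolls the change back, regardless of savings. A change that
    passes the gate is promoted only if it actually reduced cost.
    """
    if canary.quality < baseline.quality - max_quality_drop:
        return "rollback"  # quality regressed: gate fails
    if canary.cost < baseline.cost:
        return "promote"   # quality held and the change saved money
    return "rollback"      # no demonstrable benefit, so do not keep it


baseline = SliceMetrics(quality=0.90, cost=1.00)
print(decide(baseline, SliceMetrics(quality=0.90, cost=0.80)))  # cheaper, same quality
print(decide(baseline, SliceMetrics(quality=0.85, cost=0.50)))  # cheaper, worse quality
```

Checking the gate before looking at savings is what keeps a large cost reduction from masking a quality regression.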

What it learns

Mechanism	What adapts
Auto-tune order & params	Which strategies run, their order, and their parameters (e.g. confidence threshold, `MAX_TOKENS_CAP`) — driven by your measured savings and quality.
Shadow-eval & promote	Candidate pipelines are tried in shadow/replay against real traffic; winners are promoted via canary, and losers never touch a live answer.
Adapt the router/classifier	The model-routing classifier tunes to your org's own request distribution (your prompt mix, your models).
Tune the cache	Semantic-cache similarity thresholds and TTLs adjust to your observed hit-rate and staleness.
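To make the cache row concrete, here is a minimal sketch of one way a similarity threshold could be nudged from observed staleness. It is an assumption for illustration, not the planned algorithm: the function name, the step size, the bounds, and the `max_stale_rate` target are all hypothetical.

```python
def next_similarity_threshold(current: float, stale_rate: float,
                              max_stale_rate: float = 0.01,
                              step: float = 0.01,
                              lo: float = 0.80, hi: float = 0.99) -> float:
    """Propose the next semantic-cache similarity threshold.

    If too many cache hits turned out stale, tighten matching (raise the
    threshold). Otherwise loosen it slightly to allow more hits. The result
    is always clamped to [lo, hi] so one step can never run away.
    """
    if stale_rate > max_stale_rate:
        proposed = current + step  # stricter: fewer, safer hits
    else:
        proposed = current - step  # looser: more hits
    return round(min(hi, max(lo, proposed)), 4)


print(next_similarity_threshold(0.90, stale_rate=0.05))  # staleness too high
print(next_similarity_threshold(0.90, stale_rate=0.0))   # staleness within target
```

In the design described on this page, a proposal like this would still have to pass the canary and quality gate before taking effect.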

Each adaptation respects the strategy's declared quality risk and the global fail-safe — see Safety.

Org-admin control

Adaptive optimization is opt-in, default off. The org admin enables it and sets the guardrails:

adaptive:
  enabled: true        # admin opt-in (default false)
  quality_gate: true   # block / auto-rollback any change that regresses
  canary_pct: 5        # try each change on 5% of traffic before promoting

enabled — master switch; off until the admin turns it on.
quality_gate — a self-applied change that fails the gate is discarded and rolled back automatically.
canary_pct — how much traffic a candidate change sees before it can be promoted.
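One common way to apply a percentage like `canary_pct` is deterministic hash bucketing, so that a given request always lands on the same side of the split. The sketch below assumes each request has a stable string ID; `in_canary` and `request_id` are hypothetical names, and this is not a description of how Anyray will assign traffic.

```python
import hashlib


def in_canary(request_id: str, canary_pct: float) -> bool:
    """Decide whether a request falls in the canary slice.

    The request ID is hashed into one of 10,000 buckets; the lowest
    canary_pct percent of buckets form the canary slice. The same ID
    always maps to the same bucket, so assignment is repeatable.
    """
    if not 0 <= canary_pct <= 100:
        raise ValueError("canary_pct must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < canary_pct * 100


# Roughly 5% of a large batch of IDs should land in the canary slice.
hits = sum(in_canary(f"req-{i}", 5) for i in range(10_000))
print(f"{hits / 100:.2f}% of requests in canary")
```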

See Org Admin → Configure for where this sits, and Operate for reviewing what the learner changed.

Safety: the fail-safe still holds

Learning cannot weaken the core guarantee — the quality gate here is the enforcement arm of Anyray's behavior-preserving principle. Self-applied changes are bounded by the same invariant as everything else: anything uncertain routes to the frontier model, strategies fail open, and a change that regresses quality is rolled back. The worst case of a bad adaptation is "we briefly saved less on a canary slice," never "answers got worse and stayed worse."
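The invariant above can be sketched in a few lines. This is an illustrative assumption rather than Anyray code: `choose_model`, `FRONTIER_MODEL`, the classifier signature, and the default threshold of 0.8 are all hypothetical.

```python
from typing import Callable, Tuple

FRONTIER_MODEL = "frontier"  # hypothetical identifier for the frontier model


def choose_model(classify: Callable[[str], Tuple[str, float]],
                 prompt: str,
                 confidence_threshold: float = 0.8) -> str:
    """Pick a model for a prompt without ever weakening the fail-safe.

    classify returns (model_name, confidence). If the classifier raises,
    the strategy fails open and the frontier model is used. If it is not
    confident enough, the request also goes to the frontier model.
    """
    try:
        model, confidence = classify(prompt)
    except Exception:
        return FRONTIER_MODEL  # fail open: a broken strategy changes nothing
    if confidence < confidence_threshold:
        return FRONTIER_MODEL  # uncertain: route to the frontier model
    return model


def confident(prompt: str) -> Tuple[str, float]:
    return ("small-model", 0.95)


def unsure(prompt: str) -> Tuple[str, float]:
    return ("small-model", 0.40)


print(choose_model(confident, "summarize this"))
print(choose_model(unsure, "summarize this"))
```

Under this shape, a learner can only move the threshold or swap the classifier; neither change removes the frontier fallback.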

Data boundary: it learns on your data, on-prem

The learner reads only local signals (the content-free spend store and traces that already live in your environment) and writes only to your local optimizer config. Because Anyray is fully self-hosted, no prompts, responses, traces, or learned parameters leave your environment. This is what makes "Anyray gets better at your costs" compatible with "your data never leaves your environment."
