Adaptive optimization
Beyond improvements that arrive when you update your deployment, Anyray can learn from your org's own traffic and improve its strategy pipeline on the fly — reordering and retuning strategies, adapting the router, and tuning the cache to your workload. It is opt-in and off by default, and every self-applied change is quality-gated with automatic rollback.
:::warning Status: roadmap (not yet implemented)
The closed-loop learner described here is planned, not shipping. Some substrate
already exists — the content-free spend store, per-request traces, and the
runtime-mutable optimizer.config.json — but the learning loop, shadow/replay eval, and
the org-admin opt-in switch are not built yet. This page documents the intended design so
it can be reviewed; nothing here is live.
:::
Two kinds of improvement
① New capability arrives when you update your self-hosted deployment
new Anyray image ──(docker compose pull && docker compose up -d)──▶ your optimizer
② Your optimizer improves ITSELF from your traffic (this page — roadmap, opt-in)
your own requests + measured savings/quality ──▶ local learner
Because Anyray is self-hosted, there's no vendor pushing changes to you and no CLI: new strategies ship in a new image you pull on your own schedule. Adaptive optimization (the roadmap loop below) decides how to use them best for your org, on-prem.
The closed loop
Every stage of the loop runs inside your environment; nothing leaves.
A proposed change is never trusted blindly: it runs on a small canary slice, is measured against a quality gate, and is rolled back automatically if it regresses. Only changes that demonstrably help on your traffic get promoted.
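The promote-or-rollback decision described above can be sketched in a few lines. This is an illustration only, not Anyray's implementation (the learner is not built yet): `SliceMetrics`, `gate_candidate`, and `max_quality_drop` are hypothetical names, and the way quality and cost are scored is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    ROLLBACK = "rollback"


@dataclass(frozen=True)
class SliceMetrics:
    """Aggregates measured on one traffic slice (hypothetical shape)."""
    quality: float  # assumed: mean quality score, higher is better
    cost: float     # assumed: mean spend per request, lower is better


def gate_candidate(baseline: SliceMetrics, canary: SliceMetrics,
                   max_quality_drop: float = 0.0) -> Decision:
    """Promote only if the canary holds quality AND actually saves money."""
    regressed = canary.quality < baseline.quality - max_quality_drop
    saves = canary.cost < baseline.cost
    if regressed or not saves:
        # Anything that is not a demonstrable win is discarded.
        return Decision.ROLLBACK
    return Decision.PROMOTE
```

Defaulting `max_quality_drop` to zero mirrors the stated intent that only changes that demonstrably help get promoted: a cheaper candidate with any measured quality loss is still rolled back.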
What it learns
| Mechanism | What adapts |
|---|---|
| Auto-tune order & params | Which strategies run, their order, and their parameters (e.g. confidence threshold, MAX_TOKENS_CAP) — driven by your measured savings and quality. |
| Shadow-eval & promote | Candidate pipelines are tried in shadow/replay against real traffic; winners are promoted via canary, and losers never touch a live answer. |
| Adapt the router/classifier | The model-routing classifier tunes to your org's own request distribution (your prompt mix, your models). |
| Tune the cache | Semantic-cache similarity thresholds and TTLs adjust to your observed hit-rate and staleness. |
Each adaptation respects the strategy's declared quality risk and the global fail-safe — see Safety.
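As one illustration of the "Tune the cache" row, a learner could nudge the semantic-cache similarity threshold from observed hit-rate and staleness. The rule below is an assumed, deliberately conservative heuristic, not the planned algorithm; the function name, the target rates, the step size, and the bounds are all hypothetical.

```python
def propose_similarity_threshold(current: float, hit_rate: float,
                                 stale_rate: float, *,
                                 max_stale_rate: float = 0.01,
                                 min_hit_rate: float = 0.20,
                                 step: float = 0.01,
                                 floor: float = 0.80,
                                 ceiling: float = 0.99) -> float:
    """Propose the next threshold; every default here is an assumption.

    Staleness is checked first so that quality always wins over savings.
    """
    if stale_rate > max_stale_rate:
        proposed = current + step   # tighten: fewer but safer cache hits
    elif hit_rate < min_hit_rate:
        proposed = current - step   # loosen: allow more cache hits
    else:
        proposed = current          # within targets: leave it alone
    # Clamp to hard bounds and round away float noise.
    return round(min(ceiling, max(floor, proposed)), 4)
```

The return value is only a proposal. Under the design on this page it would still have to pass shadow evaluation and the canary quality gate before touching live traffic.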
Org-admin control
Adaptive optimization is opt-in, default off. The org admin enables it and sets the guardrails:
adaptive:
  enabled: true        # admin opt-in (default false)
  quality_gate: true   # block / auto-rollback any change that regresses
  canary_pct: 5        # try each change on 5% of traffic before promoting
- enabled: master switch; off until the admin turns it on.
- quality_gate: a self-applied change that fails the gate is discarded and rolled back automatically.
- canary_pct: how much traffic a candidate change sees before it can be promoted.
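A minimal sketch of how these three settings might be read and applied, assuming the config has already been parsed into a plain dict. `AdaptiveConfig`, `load_adaptive`, and `in_canary` are hypothetical names. Only the `enabled` default (false) comes from this page; the defaults for `quality_gate` and `canary_pct`, and the hash-based bucketing, are assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class AdaptiveConfig:
    enabled: bool = False      # from this page: opt-in, default off
    quality_gate: bool = True  # assumption: gate on unless disabled
    canary_pct: float = 5.0    # assumption: mirrors the example value


def load_adaptive(raw: Mapping[str, Any]) -> AdaptiveConfig:
    """Build the config from a parsed mapping; a missing section means off."""
    section = raw.get("adaptive") or {}
    cfg = AdaptiveConfig(
        enabled=section.get("enabled", False),
        quality_gate=section.get("quality_gate", True),
        canary_pct=float(section.get("canary_pct", 5.0)),
    )
    if not isinstance(cfg.enabled, bool) or not isinstance(cfg.quality_gate, bool):
        raise ValueError("enabled and quality_gate must be booleans")
    if not 0 <= cfg.canary_pct <= 100:
        raise ValueError("canary_pct must be between 0 and 100")
    return cfg


def in_canary(request_id: str, canary_pct: float) -> bool:
    """Deterministically place about canary_pct percent of requests in the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < canary_pct * 100
```

Hashing the request ID rather than sampling randomly keeps slice membership stable across retries, which makes canary measurements easier to reproduce.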
See Org Admin → Configure for where this sits, and Operate for reviewing what the learner changed.
Safety: the fail-safe still holds
Learning cannot weaken the core guarantee — the quality gate here is the enforcement arm of Anyray's behavior-preserving principle. Self-applied changes are bounded by the same invariant as everything else: anything uncertain routes to the frontier model, strategies fail open, and a change that regresses quality is rolled back. The worst case of a bad adaptation is "we briefly saved less on a canary slice," never "answers got worse and stayed worse."
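The two invariants named above, "anything uncertain routes to the frontier model" and "strategies fail open", can be illustrated as follows. This is a sketch under assumptions: `route`, `apply_fail_open`, the confidence score, and the dict-shaped request are hypothetical and are not Anyray's actual interfaces.

```python
from typing import Callable, Dict

Request = Dict[str, object]


def route(confidence: float, threshold: float,
          cheap_model: str, frontier_model: str) -> str:
    """Use the cheaper model only when the classifier is confident enough.

    A NaN confidence fails the comparison, so it also falls through
    to the frontier model.
    """
    return cheap_model if confidence >= threshold else frontier_model


def apply_fail_open(strategy: Callable[[Request], Request],
                    request: Request) -> Request:
    """Run one strategy; on any error, pass the request through unchanged."""
    try:
        return strategy(request)
    except Exception:
        # Fail open: a broken strategy costs savings, never the answer.
        return request
```

A learned change can move `threshold` or swap `strategy`, but it cannot alter these fallbacks, which is why the worst case stays limited to lost savings.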
Data boundary: it learns on your data, on-prem
The learner reads only local signals (the content-free spend store and traces that already live in your environment) and writes only to your local optimizer config. Because Anyray is fully self-hosted, no prompts, responses, traces, or learned parameters leave your environment. This is what makes "Anyray gets better at your costs" compatible with "your data never leaves your environment."