
Adaptive optimization

Beyond improvements that arrive when you update your deployment, Anyray can learn from your org's own traffic and improve its strategy pipeline on the fly — reordering and retuning strategies, adapting the router, and tuning the cache to your workload. It is opt-in and off by default, and every self-applied change is quality-gated with automatic rollback.

:::warning Status: roadmap (not yet implemented)
The closed-loop learner described here is planned, not shipping. Some substrate already exists (the content-free spend store, per-request traces, and the runtime-mutable optimizer.config.json), but the learning loop, shadow/replay eval, and the org-admin opt-in switch are not built yet. This page documents the intended design so it can be reviewed; nothing here is live.
:::

Two kinds of improvement

① New capability arrives when you update your self-hosted deployment
new Anyray image ──(docker compose pull && docker compose up -d)──▶ your optimizer

② Your optimizer improves ITSELF from your traffic (this page — roadmap, opt-in)
your own requests + measured savings/quality ──▶ local learner

Because Anyray is self-hosted, there's no vendor pushing changes to you and no CLI: new strategies ship in a new image you pull on your own schedule. Adaptive optimization (the roadmap loop below) decides how to use them best for your org, on-prem.

The closed loop

Everything above runs inside your environment — nothing leaves.

A proposed change is never trusted blindly: it runs on a small canary slice, is measured against a quality gate, and is rolled back automatically if it regresses. Only changes that demonstrably help on your traffic get promoted.
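The promote-or-roll-back decision described above could be sketched roughly as follows. This is an illustration only: the learner is not implemented, and `SliceMetrics`, `decide`, the quality score, and every threshold here are hypothetical names and values, not part of Anyray.

```python
from dataclasses import dataclass


@dataclass
class SliceMetrics:
    """Aggregate measurements for one traffic slice (hypothetical shape)."""
    requests: int
    quality_score: float  # mean quality in [0, 1], higher is better
    cost_usd: float       # total spend for the slice


def decide(baseline: SliceMetrics, canary: SliceMetrics,
           max_quality_drop: float = 0.0, min_requests: int = 100) -> str:
    """Return 'promote', 'rollback', or 'wait' for a candidate change."""
    if baseline.requests < min_requests or canary.requests < min_requests:
        return "wait"  # not enough evidence on one of the slices yet
    if canary.quality_score < baseline.quality_score - max_quality_drop:
        return "rollback"  # quality gate failed
    base_cost = baseline.cost_usd / baseline.requests
    canary_cost = canary.cost_usd / canary.requests
    if canary_cost < base_cost:
        return "promote"  # cheaper per request and no quality regression
    return "rollback"  # no demonstrable benefit, so keep the current config


baseline = SliceMetrics(requests=1000, quality_score=0.90, cost_usd=10.0)
print(decide(baseline, SliceMetrics(requests=200, quality_score=0.90, cost_usd=1.5)))
```

In this sketch a change that neither regresses nor saves money is also rolled back, which matches the idea that only changes that demonstrably help get promoted.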

What it learns

| Mechanism | What adapts |
| --- | --- |
| Auto-tune order & params | Which strategies run, their order, and their parameters (e.g. confidence threshold, MAX_TOKENS_CAP), driven by your measured savings and quality. |
| Shadow-eval & promote | Candidate pipelines are tried in shadow/replay against real traffic; winners are promoted via canary, and losers never touch a live answer. |
| Adapt the router/classifier | The model-routing classifier tunes to your org's own request distribution (your prompt mix, your models). |
| Tune the cache | Semantic-cache similarity thresholds and TTLs adjust to your observed hit-rate and staleness. |

Each adaptation respects the strategy's declared quality risk and the global fail-safe — see Safety.
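As one concrete illustration of the cache-tuning row, a bounded step rule for the similarity threshold might look like the sketch below. The function name, the signals it takes, and all defaults are assumptions made for this example; the actual tuning policy is not specified on this page.

```python
def next_similarity_threshold(current: float, hit_rate: float, stale_rate: float,
                              target_hit_rate: float = 0.30,
                              max_stale_rate: float = 0.01,
                              step: float = 0.01,
                              lo: float = 0.80, hi: float = 0.99) -> float:
    """Propose the next semantic-cache similarity threshold (hypothetical rule).

    Staleness takes priority: if too many hits were stale, match more strictly.
    Only when staleness is acceptable and hits are scarce do we loosen matching.
    """
    if stale_rate > max_stale_rate:
        proposed = current + step   # stricter matching
    elif hit_rate < target_hit_rate:
        proposed = current - step   # looser matching to find more hits
    else:
        proposed = current          # both signals are within target
    # Clamp to a fixed band so the learner can never drift to an extreme.
    return round(min(hi, max(lo, proposed)), 4)
```

A proposal produced this way would still be a candidate change like any other, subject to the canary and quality gate before it is promoted.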

Org-admin control

Adaptive optimization is opt-in, default off. The org admin enables it and sets the guardrails:

adaptive:
  enabled: true        # admin opt-in (default false)
  quality_gate: true   # block / auto-rollback any change that regresses
  canary_pct: 5        # try each change on 5% of traffic before promoting
  • enabled — master switch; off until the admin turns it on.
  • quality_gate — a self-applied change that fails the gate is discarded and rolled back automatically.
  • canary_pct — how much traffic a candidate change sees before it can be promoted.
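One way `enabled` and `canary_pct` could translate into a per-request decision is a deterministic hash bucket, sketched below. The function name and the use of a request ID as the hashing key are assumptions for illustration; the page does not specify how the canary slice is chosen.

```python
import hashlib


def uses_candidate(request_id: str, adaptive_cfg: dict) -> bool:
    """Decide whether a request falls in the canary slice (hypothetical helper)."""
    if not adaptive_cfg.get("enabled", False):
        return False  # master switch is off by default
    canary_pct = adaptive_cfg.get("canary_pct", 0)
    if not 0 <= canary_pct <= 100:
        raise ValueError("canary_pct must be between 0 and 100")
    # Hash the ID so the same request always lands in the same bucket.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < canary_pct * 100


cfg = {"enabled": True, "quality_gate": True, "canary_pct": 5}
share = sum(uses_candidate(f"req-{i}", cfg) for i in range(10_000)) / 10_000
print(f"canary share: {share:.3f}")
```

Hashing rather than random sampling keeps the assignment reproducible, which makes it easier to review afterwards which requests saw a candidate change.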

See Org Admin → Configure for where this sits, and Operate for reviewing what the learner changed.

Safety: the fail-safe still holds

Learning cannot weaken the core guarantee — the quality gate here is the enforcement arm of Anyray's behavior-preserving principle. Self-applied changes are bounded by the same invariant as everything else: anything uncertain routes to the frontier model, strategies fail open, and a change that regresses quality is rolled back. The worst case of a bad adaptation is "we briefly saved less on a canary slice," never "answers got worse and stayed worse."
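The two invariants named above, "strategies fail open" and "anything uncertain routes to the frontier model", can be illustrated with a minimal sketch. The request shape, function names, and model labels are hypothetical and do not describe Anyray's internal API.

```python
from typing import Callable, Optional

Request = dict  # hypothetical request shape, for illustration only


def fail_open(strategy: Callable[[Request], Request]) -> Callable[[Request], Request]:
    """Wrap a strategy so that any error leaves the request untouched."""
    def wrapped(request: Request) -> Request:
        try:
            return strategy(request)
        except Exception:
            return request  # fail open: pass the original request through
    return wrapped


def choose_model(confidence: Optional[float], threshold: float,
                 cheap_model: str, frontier_model: str) -> str:
    """Route to the cheaper model only when the classifier is confident."""
    if confidence is None or confidence < threshold:
        return frontier_model  # uncertain or unknown: use the frontier model
    return cheap_model


def broken_strategy(request: Request) -> Request:
    raise RuntimeError("strategy crashed")


original = {"prompt_tokens": 120}
assert fail_open(broken_strategy)(original) is original
print(choose_model(None, 0.8, "small-model", "frontier-model"))
```

Under this shape, a learned parameter such as the confidence threshold can change how often the cheaper path is taken, but it cannot remove the fallback itself.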

Data boundary: it learns on your data, on-prem

The learner reads only local signals (the content-free spend store and traces that already live in your environment) and writes only to your local optimizer config. Because Anyray is fully self-hosted, no prompts, responses, traces, or learned parameters leave your environment. This is what makes "Anyray gets better at your costs" compatible with "your data never leaves your environment."