Design principles

Every optimization Anyray applies has to clear three bars. They're what make it safe to put Anyray on the path of every request your org makes.

   ① EFFECTIVE              ② BEHAVIOR-PRESERVING        ③ COMPOUNDING
   ranked by measured       saves money WITHOUT          gets better on YOUR
   real-world impact        changing your answers        data, over time

1. Effective — ranked by measured impact

We don't add optimizations for novelty. The strategy menu is ordered by measured real-world savings; see "What saves the most" for the data and sources. You enable the levers that move the needle for your workload, and each one's contribution is attributed separately so you can see exactly what each earned.

We measure effectiveness as cost per correct answer, not raw spend — a cheaper answer that's wrong is not a saving. This single quality-normalized number is what ties "effective" to the behavior-preserving guarantee below: an optimization only counts if it lowers cost without raising the error rate.
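To make the metric concrete, here is a minimal sketch of how cost per correct answer could be computed and used as an acceptance check. The names `EvalRun`, `cost_per_correct_answer`, and `counts_as_saving`, as well as the `max_error_increase` tolerance, are hypothetical illustrations and not part of Anyray's API; the numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    """Aggregate result of running one configuration over an eval set (hypothetical)."""
    total_cost_usd: float
    correct: int
    total: int


def cost_per_correct_answer(run: EvalRun) -> float:
    # A run with no correct answers has no useful output, so its cost is unbounded.
    if run.correct == 0:
        return float("inf")
    return run.total_cost_usd / run.correct


def error_rate(run: EvalRun) -> float:
    return (run.total - run.correct) / run.total


def counts_as_saving(baseline: EvalRun, candidate: EvalRun,
                     max_error_increase: float = 0.0) -> bool:
    """An optimization counts only if it is cheaper per correct answer
    AND does not raise the error rate beyond the allowed tolerance."""
    cheaper = cost_per_correct_answer(candidate) < cost_per_correct_answer(baseline)
    no_worse = error_rate(candidate) <= error_rate(baseline) + max_error_increase
    return cheaper and no_worse


# Illustrative numbers only.
baseline = EvalRun(total_cost_usd=10.0, correct=95, total=100)
same_quality = EvalRun(total_cost_usd=6.0, correct=95, total=100)
cheap_but_wrong = EvalRun(total_cost_usd=4.0, correct=60, total=100)

accepted = counts_as_saving(baseline, same_quality)
rejected = counts_as_saving(baseline, cheap_but_wrong)
```

In this sketch the third run is cheaper per correct answer than the baseline, yet it is still rejected because its error rate rose. That is why the check has two conditions rather than comparing the single ratio alone.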

2. Behavior-preserving — the hard invariant

Saving money on LLM calls is easy. Saving money without changing the answers your apps expect is the hard part — and it's the bar we hold ourselves to. An optimization is applied only if it leaves answer quality within a measured bound.

This is enforced by layered guardrails, not hope:

Guardrail	What it does	Status
Quality-risk grading	Every strategy declares a risk (`none` · `low` · `medium` · `high`); the pipeline weighs it.	✅ in the contract
Fail-safe to frontier	Any uncertain request (reasoning/ambiguous/low-confidence) routes to the frontier model, unchanged.	✅ core invariant
Fail-open pipeline	A strategy that errors or times out is skipped, never fatal.	✅ implemented
Shadow / replay eval	A candidate change is measured against real traffic before it can affect a live answer.	⏳ roadmap
Auto-rollback	A self-applied change that regresses quality is reverted automatically.	⏳ roadmap (adaptive)

The worst case of any optimization is bounded to "you briefly paid more than you could have," never "your answer got worse and stayed worse."
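The three guardrails marked as in place (risk grading, fail-safe to frontier, fail-open) can be sketched roughly as follows. This is an illustration under stated assumptions, not Anyray's implementation: `Request`, `Strategy`, `run_pipeline`, the `confidence` field, and the `min_confidence` and `max_risk` thresholds are all hypothetical, and timeouts are omitted for brevity. Only the four risk labels come from the table above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

RISK_LEVELS = ("none", "low", "medium", "high")  # labels from the guardrail table


@dataclass
class Request:
    prompt: str
    model: str
    confidence: float  # hypothetical: how sure we are the request is safe to optimize


@dataclass
class Strategy:
    name: str
    risk: str  # one of RISK_LEVELS
    apply: Callable[[Request], Request]


def run_pipeline(request: Request, strategies: List[Strategy],
                 frontier_model: str = "frontier",
                 min_confidence: float = 0.8,
                 max_risk: str = "low") -> Tuple[Request, List[str]]:
    # Fail-safe to frontier: an uncertain request skips every strategy
    # and goes to the frontier model with its prompt unchanged.
    if request.confidence < min_confidence:
        return Request(request.prompt, frontier_model, request.confidence), []

    current, applied = request, []
    for strategy in strategies:
        # Risk grading: skip strategies above the allowed risk level.
        if RISK_LEVELS.index(strategy.risk) > RISK_LEVELS.index(max_risk):
            continue
        try:
            current = strategy.apply(current)
            applied.append(strategy.name)
        except Exception:
            # Fail-open: a strategy that errors is skipped, never fatal.
            continue
    return current, applied


def trim(r: Request) -> Request:
    return Request(r.prompt.strip(), r.model, r.confidence)


def broken(r: Request) -> Request:
    raise RuntimeError("strategy failed")


def route_small(r: Request) -> Request:
    return Request(r.prompt, "small", r.confidence)


strategies = [
    Strategy("trim", "none", trim),
    Strategy("broken", "low", broken),
    Strategy("route-small", "low", route_small),
    Strategy("aggressive", "high", route_small),
]

confident_out, confident_applied = run_pipeline(Request("  hi  ", "frontier", 0.95), strategies)
uncertain_out, uncertain_applied = run_pipeline(Request("  hard  ", "small", 0.30), strategies)
```

In the sketch, a failing strategy costs only the saving it would have produced, and an uncertain request costs the frontier price. Neither path can alter an answer, which matches the bounded worst case described above.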

3. Compounding — it gets better on your data

Anyray isn't meant to be a static rule set. With adaptive optimization (opt-in, roadmap), the optimizer will learn from your org's own traffic, entirely on-prem, and tune its pipeline to fit your workload. The longer it runs, the better it fits your prompts, your models, and your cost profile. Improvement compounds with use, and because Anyray is fully self-hosted, none of your data leaves your infrastructure to make it happen.

Why these three, together

Each principle protects the others:

Effective without behavior-preserving is reckless — cheap answers nobody can trust.
Behavior-preserving without effective is pointless — safe but no savings.
Compounding is what turns a good static optimizer into one that keeps pulling ahead on your specific traffic.

Together they're the standard every strategy on the menu is held to.
