Design principles
Every optimization Anyray applies has to clear three bars. They're what make it safe to put Anyray on the path of every request your org makes.
| ① Effective | ② Behavior-preserving | ③ Compounding |
|---|---|---|
| Ranked by measured real-world impact | Saves money WITHOUT changing your answers | Gets better on YOUR data, over time |
1. Effective — ranked by measured impact
We don't add optimizations for novelty. The strategy menu is ordered by measured real-world savings; see What saves the most for the data and sources. You enable the levers that move the needle for your workload, and each lever's contribution is attributed separately, so you can see exactly what each one saved.
We measure effectiveness as cost per correct answer, not raw spend — a cheaper answer that's wrong is not a saving. This single quality-normalized number is what ties "effective" to the behavior-preserving guarantee below: an optimization only counts if it lowers cost without raising the error rate.
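Here is a worked example of that rule, using illustrative numbers rather than measured ones. The `EvalRun` record and the `counts_as_saving` check are a hypothetical sketch of how cost per correct answer and the error-rate condition combine. They are not Anyray's API.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    total_cost_usd: float  # spend over the evaluated requests
    requests: int          # requests evaluated
    correct: int           # answers judged correct

    @property
    def cost_per_correct(self) -> float:
        # Quality-normalized cost: spend divided by correct answers only.
        return self.total_cost_usd / self.correct if self.correct else float("inf")

    @property
    def error_rate(self) -> float:
        return (self.requests - self.correct) / self.requests


def counts_as_saving(baseline: EvalRun, candidate: EvalRun) -> bool:
    """A candidate counts only if it is cheaper per correct answer AND no less accurate."""
    return (candidate.cost_per_correct < baseline.cost_per_correct
            and candidate.error_rate <= baseline.error_rate)


# Illustrative numbers only.
baseline = EvalRun(total_cost_usd=10.00, requests=1000, correct=900)
cheaper_same_quality = EvalRun(total_cost_usd=7.00, requests=1000, correct=900)
cheaper_but_worse = EvalRun(total_cost_usd=6.00, requests=1000, correct=860)
```

The `cheaper_but_worse` run has the lowest cost per correct answer of the three (about $0.0070, vs. $0.0111 for the baseline). It is still rejected, because its error rate rises from 10% to 14%.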
2. Behavior-preserving — the hard invariant
Saving money on LLM calls is easy. Saving money without changing the answers your apps expect is the hard part — and it's the bar we hold ourselves to. An optimization is applied only if it leaves answer quality within a measured bound.
This is enforced by layered guardrails, not hope:
| Guardrail | What it does | Status |
|---|---|---|
| Quality-risk grading | Every strategy declares a risk (none · low · medium · high); the pipeline weighs it. | ✅ in the contract |
| Fail-safe to frontier | Any uncertain request (reasoning/ambiguous/low-confidence) routes to the frontier model, unchanged. | ✅ core invariant |
| Fail-open pipeline | A strategy that errors or times out is skipped, never fatal. | ✅ implemented |
| Shadow / replay eval | A candidate change is measured against real traffic before it can affect a live answer. | ⏳ roadmap |
| Auto-rollback | A self-applied change that regresses quality is reverted automatically. | ⏳ roadmap (adaptive) |
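The quality-risk grading and fail-open rows can be sketched as below. This is a minimal illustration under stated assumptions, not Anyray's pipeline. The `Strategy` type, the `max_risk` threshold, the timeout value, and the example strategies are all hypothetical. The threshold is only one simple reading of "the pipeline weighs it."

```python
import concurrent.futures
import copy
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical ordering of the declared risk levels.
RISK_ORDER = {"none": 0, "low": 1, "medium": 2, "high": 3}


@dataclass
class Strategy:
    name: str
    risk: str                      # "none" | "low" | "medium" | "high"
    apply: Callable[[dict], dict]  # returns a rewritten request


def run_with_timeout(fn: Callable[[dict], dict], request: dict, timeout_s: float) -> dict:
    """Run one strategy in a worker thread.

    Re-raises the strategy's own exception, or raises
    concurrent.futures.TimeoutError if it overruns.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, request).result(timeout=timeout_s)
    finally:
        # Don't block the request on a strategy that overran; its result is discarded.
        pool.shutdown(wait=False)


def run_pipeline(request: dict, strategies: list[Strategy], max_risk: str = "low",
                 timeout_s: float = 0.1) -> tuple[dict, list[str], list[str]]:
    applied, skipped = [], []
    current = request
    for s in strategies:
        if RISK_ORDER[s.risk] > RISK_ORDER[max_risk]:
            skipped.append(s.name)  # risk above policy: never run
            continue
        try:
            # Each strategy gets a deep copy, so a failure can't leave a half-edited request.
            current = run_with_timeout(s.apply, copy.deepcopy(current), timeout_s)
            applied.append(s.name)
        except Exception:  # errors and timeouts alike: skip, never fatal
            skipped.append(s.name)
    return current, applied, skipped


# Hypothetical example strategies.
def trim_whitespace(req: dict) -> dict:
    return {**req, "prompt": req["prompt"].strip()}


def always_errors(req: dict) -> dict:
    raise RuntimeError("strategy bug")


def too_slow(req: dict) -> dict:
    time.sleep(0.5)
    return req


def aggressive_truncate(req: dict) -> dict:
    return {**req, "prompt": req["prompt"][:5]}


out, applied, skipped = run_pipeline(
    {"prompt": "  What is 2+2?  "},
    [
        Strategy("trim_whitespace", "none", trim_whitespace),
        Strategy("always_errors", "low", always_errors),
        Strategy("too_slow", "low", too_slow),
        Strategy("aggressive_truncate", "high", aggressive_truncate),
    ],
)
```

A skipped strategy simply leaves the previous request in place. In this sketch, the worst a buggy or slow strategy can do is forgo its own saving.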
The worst case of any optimization is bounded to "you briefly paid more than you could have," never "your answer got worse and stayed worse."
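The fail-safe-to-frontier row is what keeps that worst case bounded. Below is a minimal routing sketch. It assumes a hypothetical `assess` function that reports whether a request needs reasoning, whether it looks ambiguous, and how confident the router is. The model names and the 0.8 confidence floor are placeholders. The sketch also treats a failed assessment as uncertain, which is an assumption here, not documented behavior.

```python
from dataclasses import dataclass
from typing import Callable

FRONTIER_MODEL = "frontier-model"  # placeholder name
CHEAPER_MODEL = "cheaper-model"    # placeholder name


@dataclass
class Assessment:
    needs_reasoning: bool
    ambiguous: bool
    confidence: float  # router's confidence in [0, 1]


def route(request: dict, assess: Callable[[dict], Assessment],
          min_confidence: float = 0.8) -> tuple[str, dict]:
    """Return (model, request). Uncertain requests go to the frontier model, untouched."""
    try:
        a = assess(request)
    except Exception:
        return FRONTIER_MODEL, request  # couldn't assess: treat as uncertain
    if a.needs_reasoning or a.ambiguous or a.confidence < min_confidence:
        return FRONTIER_MODEL, request  # uncertain: pay more, never answer worse
    return CHEAPER_MODEL, request


def broken_assessor(request: dict) -> Assessment:
    raise ValueError("classifier unavailable")


req = {"prompt": "Summarize this support ticket."}
routes = {
    "confident": route(req, lambda r: Assessment(False, False, 0.95))[0],
    "reasoning": route(req, lambda r: Assessment(True, False, 0.99))[0],
    "low_confidence": route(req, lambda r: Assessment(False, False, 0.40))[0],
    "assess_failed": route(req, broken_assessor)[0],
}
```

Every misjudgment in this sketch errs toward the frontier model. Being wrong therefore costs a higher bill on that request, not a worse answer.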
3. Compounding — it gets better on your data
Anyray isn't meant to be a static rule set. With adaptive optimization (opt-in, on the roadmap), the optimizer will learn from your org's own traffic, entirely on-prem, and tune its pipeline to fit your workload. The longer it runs, the better it fits your prompts, your models, and your cost profile. Improvement compounds with use, and because Anyray is fully self-hosted, none of your data has to leave your infrastructure to make it happen.
Why these three, together
Each principle protects the others:
- Effective without behavior-preserving is reckless — cheap answers nobody can trust.
- Behavior-preserving without effective is pointless — safe but no savings.
- Compounding is what turns a good static optimizer into one that keeps pulling ahead on your specific traffic.
Together they're the standard every strategy on the menu is held to.