The optimizer
The optimizer is Anyray's optimization core: the optimizer/ service (optimizer:8088,
internal only). It's a gateway-neutral HTTP hook backend that, given an LLM request,
decides how to serve it as cheaply as possible by running a configurable pipeline of
optimization strategies. It sits off the request's forwarding path:
the gateway calls it from its hook points through a thin adapter
(gateway/src/services/optimizer.ts) and always forwards traffic itself.
:::info Status
The optimizer (optimizer/, TypeScript) is implemented and running over
Optimizer Protocol v1. The registry ships param_tuning,
prompt_compression, context_compression, window_budget, tool_pruning (all
default-on) and semantic_cache (default-off) — see
the strategy menu.
Model routing is the
gateway's job, not the optimizer's; the automated, classifier-driven version is on the roadmap.
:::
It's a pipeline, not a single router
A common misconception is that Anyray is "a model router." Model routing isn't even an optimizer strategy; it's a capability of the gateway. The optimizer orchestrates an ordered pipeline of non-routing strategies that each org enables and arranges to fit its use cases: parameter tuning, prompt compression, context compression, context-window budgeting, tool pruning, and semantic caching. See Optimization strategies.
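As a rough illustration of how an org might enable and arrange its pipeline, the sketch below resolves an ordered strategy list from a config object. The strategy names are the registry names listed earlier on this page; `OrgPipelineConfig`, `resolvePipeline`, the `order` and `enabled` fields, and the default order are assumptions made for this example, not the optimizer's actual configuration schema.

```typescript
// Strategy names come from the registry; every other name here is hypothetical.
type StrategyName =
  | "param_tuning"
  | "prompt_compression"
  | "context_compression"
  | "window_budget"
  | "tool_pruning"
  | "semantic_cache";

// Assumed default order: the page only states which strategies are default-on.
const DEFAULT_ORDER: StrategyName[] = [
  "param_tuning",
  "prompt_compression",
  "context_compression",
  "window_budget",
  "tool_pruning",
  "semantic_cache",
];

// semantic_cache is the only default-off strategy.
const DEFAULT_OFF = new Set<StrategyName>(["semantic_cache"]);

type OrgPipelineConfig = {
  order?: StrategyName[]; // org-specific arrangement; unlisted strategies are left out
  enabled?: Partial<Record<StrategyName, boolean>>; // org-specific on/off overrides
};

// Returns the strategies to run for an org, in execution order.
function resolvePipeline(config: OrgPipelineConfig = {}): StrategyName[] {
  const order = config.order ?? DEFAULT_ORDER;
  const seen = new Set<StrategyName>();
  return order.filter((name) => {
    if (seen.has(name)) return false; // ignore duplicate entries
    seen.add(name);
    return config.enabled?.[name] ?? !DEFAULT_OFF.has(name);
  });
}
```

With no overrides, `resolvePipeline()` yields the five default-on strategies and leaves `semantic_cache` out; in this sketch an org would opt in with `enabled: { semantic_cache: true }`.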
What the optimizer decides
For each request, the optimizer's POST /v1/optimize hook runs the configured pipeline
and returns a (possibly) transformed request plus a list of content-free decisions:
| Outcome | Meaning |
|---|---|
| request unchanged | No strategy changed the request — the gateway forwards it as-is. |
| transformed request | One or more strategies rewrote the request (messages, tools, params). The gateway forwards the returned request. |
| cacheHit | When the caller can short-circuit (canShortCircuit), semantic_cache may return cacheHit: true plus a cachedResponse; the gateway serves it and skips the provider. |
A second hook, POST /v1/optimize-response, can transform the response after the
call (e.g. for output-stage strategies). There is no meter() call: spend metering
happens in the gateway's content-free spend store, not through an optimizer endpoint.
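The three outcomes of POST /v1/optimize can be sketched as a small decision function on the gateway side. Only `cacheHit`, `cachedResponse`, and `canShortCircuit` appear in the text above; the `LlmRequest`, `Decision`, `OptimizeResult`, and `GatewayAction` shapes and the `nextAction` helper are hypothetical stand-ins, and Optimizer Protocol v1 remains the source of truth for the real fields.

```typescript
// Hypothetical shapes for illustration only.
type LlmRequest = { model: string; messages: { role: string; content: string }[] };

// Content-free: a decision names what happened, never the prompt text.
type Decision = { strategy: string; applied: boolean };

type OptimizeResult = {
  request: LlmRequest; // unchanged or transformed
  decisions: Decision[];
  cacheHit?: boolean;
  cachedResponse?: unknown;
};

type GatewayAction =
  | { kind: "serve_cached"; response: unknown }
  | { kind: "forward"; request: LlmRequest };

// Decide what the gateway does with the optimizer's answer.
function nextAction(result: OptimizeResult, canShortCircuit: boolean): GatewayAction {
  // A cache hit is honored only when the caller is able to short-circuit.
  if (canShortCircuit && result.cacheHit === true && result.cachedResponse !== undefined) {
    return { kind: "serve_cached", response: result.cachedResponse };
  }
  // Unchanged and transformed requests are handled identically: forward what came back.
  return { kind: "forward", request: result.request };
}
```

Modeling the action as a discriminated union keeps the "skip the provider" branch explicit, so a caller cannot serve a cached response without first checking `kind`.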
It's a black box behind a fixed contract
The optimizer is deliberately opaque. Its internal make-up (which strategies exist, how they're ordered, the cache) can change daily, and it may be one service today and several tomorrow. The gateway and its adapters never see inside and never need to change, because they depend only on the one fixed thing: the Optimizer Protocol v1 contract. New strategies ship inside the optimizer without touching anything downstream.
╔════════════════════════════════════════════════════════════╗
║  ANYRAY OPTIMIZER                       ▓▓▓ BLACK BOX ▓▓▓  ║
║  a configurable pipeline of strategies:                    ║
║  param tuning · prompt compression · context compression · ║
║  window budget · tool pruning · semantic cache             ║
║  reordered / swapped without touching any adapter          ║
╚════════════════════════════════════════════════════════════╝
▲ POST /v1/optimize ▲ POST /v1/optimize-response
└────── the ONLY fixed contract (Protocol v1) ──────┘
Fail-open is a hard invariant
The single most important rule: the optimizer can never break a request. The gateway calls it with a hard 800 ms timeout and treats any non-2xx status, timeout, or malformed body as "no optimization," forwarding the original request. Within the pipeline, a strategy that throws is skipped, not fatal. Cost optimization is never allowed to degrade an answer: the downside of a failure is bounded to "full price," never "wrong result." (Fail-safe-to-frontier model routing is a gateway behavior; the automated version is on the roadmap.)
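A minimal sketch of both halves of the fail-open rule follows: strategy errors are skipped inside the pipeline, and the gateway falls back to the original request on a timeout, a non-2xx status, or a malformed body. The 800 ms value comes from the text above; the `LlmRequest` and `Strategy` shapes, the `{ request }` response body, and the `runPipeline`, `requestToForward`, `withTimeout`, and `optimizeFailOpen` helpers are assumptions for illustration, not the actual adapter in gateway/src/services/optimizer.ts.

```typescript
// Hypothetical shapes; Optimizer Protocol v1 defines the real ones.
type LlmRequest = { model: string; messages: { role: string; content: string }[] };
type Strategy = { name: string; apply: (req: LlmRequest) => LlmRequest };
type HookResponse = { status: number; body: unknown };

const OPTIMIZER_TIMEOUT_MS = 800; // the hard timeout stated above

// Optimizer side: a strategy that throws is skipped, not fatal.
function runPipeline(req: LlmRequest, strategies: Strategy[]): { request: LlmRequest; skipped: string[] } {
  let current = req;
  const skipped: string[] = [];
  for (const strategy of strategies) {
    try {
      current = strategy.apply(current);
    } catch {
      skipped.push(strategy.name); // keep the last good request and move on
    }
  }
  return { request: current, skipped };
}

// Minimal structural check; a real adapter would validate the full contract.
function isLlmRequest(value: unknown): value is LlmRequest {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { model?: unknown; messages?: unknown };
  return typeof v.model === "string" && Array.isArray(v.messages);
}

// Gateway side: any non-2xx status or malformed body means "no optimization".
function requestToForward(original: LlmRequest, status: number, body: unknown): LlmRequest {
  if (status < 200 || status >= 300) return original;
  if (typeof body !== "object" || body === null) return original;
  const candidate = (body as { request?: unknown }).request;
  return isLlmRequest(candidate) ? candidate : original;
}

// Resolves to null if the promise has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T | null> {
  return new Promise<T | null>((resolve, reject) => {
    const timer = setTimeout(() => resolve(null), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Gateway side: a timeout or a failed call also falls back to the original request.
async function optimizeFailOpen(
  original: LlmRequest,
  callHook: (req: LlmRequest) => Promise<HookResponse>,
  timeoutMs: number = OPTIMIZER_TIMEOUT_MS,
): Promise<LlmRequest> {
  try {
    const result = await withTimeout(callHook(original), timeoutMs);
    return result === null ? original : requestToForward(original, result.status, result.body);
  } catch {
    return original;
  }
}
```

Every failure path returns the untouched `original`, which is what bounds the cost of an optimizer failure to "full price." A production adapter would likely also cancel the in-flight HTTP call when the timeout fires; that is omitted here to keep the sketch transport-agnostic.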
What is not the optimizer's job
These stay the gateway's responsibility, never the optimizer's:
- talking to providers, wire-format translation, streaming
- model/provider routing, fallbacks, load-balancing
- provider-key management and request signing
- spend attribution (the content-free spend store) and content privacy
The optimizer only decides; the gateway transports, routes, and meters. See How requests are optimized.