The optimizer

The optimizer is Anyray's optimization core, implemented as the optimizer/ service (optimizer:8088, internal only). It is a gateway-neutral HTTP hook backend: given an LLM request, it decides how to serve that request as cheaply as possible by running a configurable pipeline of optimization strategies. It sits off the request's forwarding path. The gateway calls it from its hook points through a thin adapter (gateway/src/services/optimizer.ts) and always forwards traffic itself.

:::info Status
The optimizer (optimizer/, TypeScript) is implemented and running over Optimizer Protocol v1. The registry ships param_tuning, prompt_compression, context_compression, window_budget, tool_pruning (all default-on) and semantic_cache (default-off); see the strategy menu. Model routing is the gateway's job, not the optimizer's; the automated classifier-driven version is on the roadmap.
:::

It's a pipeline, not a single router

A common misconception is that Anyray is "a model router." Model routing isn't even an optimizer strategy — it's a capability of the gateway. The optimizer orchestrates an ordered pipeline of non-routing strategies that each org enables and arranges to fit its use-cases — parameter tuning, prompt compression, context compression, context-window budgeting, tool pruning, and semantic caching. See Optimization strategies.
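
The ordered, per-org pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the optimizer's actual code: the strategy names come from the registry list on this page, while `runPipeline`, `toyRegistry`, the `ChatRequest` and `Decision` shapes, and the toy strategy bodies are all hypothetical. The sketch also assumes that a strategy that throws is skipped, in line with the fail-open rule this page describes.

```typescript
// Strategy names are taken from the text; every type and function below is a hypothetical sketch.
type StrategyName =
  | "param_tuning"
  | "prompt_compression"
  | "context_compression"
  | "window_budget"
  | "tool_pruning"
  | "semantic_cache";

interface ChatRequest {
  messages: string[];
  tools: string[];
  params: { max_tokens: number };
}

// A content-free decision: it records what happened, never any prompt text.
interface Decision {
  strategy: StrategyName;
  applied: boolean;
}

type Strategy = (req: ChatRequest) => ChatRequest;

// Runs the enabled strategies in the org's chosen order.
// Names in `order` with no registered strategy are treated as disabled.
function runPipeline(
  req: ChatRequest,
  order: StrategyName[],
  registry: Partial<Record<StrategyName, Strategy>>,
): { request: ChatRequest; decisions: Decision[] } {
  let current = req;
  const decisions: Decision[] = [];
  for (const name of order) {
    const strategy = registry[name];
    if (!strategy) continue;
    try {
      const next = strategy(current);
      // Toy change detection: a strategy signals "no change" by returning the same object.
      decisions.push({ strategy: name, applied: next !== current });
      current = next;
    } catch {
      // A throwing strategy is skipped; the request continues unchanged.
      decisions.push({ strategy: name, applied: false });
    }
  }
  return { request: current, decisions };
}

// Toy stand-ins, not the real strategy logic.
const toyRegistry: Partial<Record<StrategyName, Strategy>> = {
  param_tuning: (req) => ({
    ...req,
    params: { max_tokens: Math.min(req.params.max_tokens, 1024) },
  }),
  prompt_compression: () => {
    throw new Error("simulated strategy failure");
  },
  tool_pruning: (req) => ({ ...req, tools: req.tools.slice(0, 2) }),
};
```

Because the order is just a list of names, an org can rearrange or drop strategies without any change to the caller, which is the property the pipeline design relies on.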

What the optimizer decides

For each request, the optimizer's POST /v1/optimize hook runs the configured pipeline and returns a (possibly) transformed request plus a list of content-free decisions:

| Outcome | Meaning |
| --- | --- |
| request unchanged | No strategy changed the request; the gateway forwards it as-is. |
| transformed request | One or more strategies rewrote the request (messages, tools, params). The gateway forwards the returned `request`. |
| cacheHit | When the caller `canShortCircuit`, `semantic_cache` may return `cacheHit:true` and `cachedResponse`; the gateway serves it and skips the provider. |
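
How a caller might act on these three outcomes can be sketched as below. Only `request`, `cacheHit`, `cachedResponse`, and `canShortCircuit` are names from this page; `resolveOutcome`, `GatewayAction`, and the request shape are hypothetical. Ignoring a cache hit when the caller cannot short-circuit is a defensive assumption of this sketch, not documented behaviour.

```typescript
// Hypothetical shapes for illustration; field names beyond `request`, `cacheHit`,
// and `cachedResponse` are assumptions.
interface LlmRequest {
  model: string;
  messages: { role: string; content: string }[];
}

interface OptimizeResult {
  request: LlmRequest;
  cacheHit?: boolean;
  cachedResponse?: unknown;
}

type GatewayAction =
  | { kind: "serve_cached"; response: unknown }
  | { kind: "forward"; request: LlmRequest };

// Maps the optimizer's answer to what the gateway does next.
function resolveOutcome(result: OptimizeResult, canShortCircuit: boolean): GatewayAction {
  // Serve from cache only when the caller allowed it and a cached response is actually present.
  if (canShortCircuit && result.cacheHit === true && result.cachedResponse !== undefined) {
    return { kind: "serve_cached", response: result.cachedResponse };
  }
  // Unchanged and transformed requests take the same path: forward whatever came back.
  return { kind: "forward", request: result.request };
}
```

The unchanged and transformed cases collapse into one branch because the gateway forwards the returned `request` either way.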

A second hook, POST /v1/optimize-response, can transform the response after the provider call (e.g. for output-stage strategies). There is no meter() call: spend metering belongs to the gateway's content-free spend store and is not an optimizer endpoint.

It's a black box behind a fixed contract

The optimizer is deliberately opaque. Its internal make-up — which strategies exist, how they're ordered, the cache — can change daily, and it may be one service today and several tomorrow. The gateway and its adapters never see inside and never change, because the only fixed thing is the Optimizer Protocol v1 contract. New strategies ship inside the optimizer without touching anything downstream.

 ╔════════════════════════════════════════════════════════════╗
 ║  ANYRAY OPTIMIZER    ▓▓▓  BLACK BOX  ▓▓▓                   ║
 ║  a configurable pipeline of strategies:                    ║
 ║  param tuning · prompt compression · context compression · ║
 ║  window budget · tool pruning · semantic cache             ║
 ║  reordered / swapped without touching any adapter          ║
 ╚════════════════════════════════════════════════════════════╝
   ▲ POST /v1/optimize        ▲ POST /v1/optimize-response
   └────── the ONLY fixed contract (Protocol v1) ──────┘

Fail-open is a hard invariant

The single most important rule: the optimizer can never break a request. The gateway calls it with a hard 800ms timeout and treats any non-2xx, timeout, or malformed body as "no optimization" — forwarding the original request. Within the pipeline a strategy that throws is skipped, not fatal. Cost optimization is never allowed to degrade an answer — the downside of a failure is bounded to "full price," never "wrong result." (Fail-safe-to-frontier model routing is a gateway behavior; the automated version is roadmap.)
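
The gateway side of this rule can be sketched as follows. It is a minimal illustration under stated assumptions: `optimizeFailOpen`, `Transport`, `HttpResponseLike`, and the request and result shapes are hypothetical, and the transport is injected so the sketch does not depend on a particular HTTP client. Only the 800ms timeout and the fall-back-to-original behaviour come from the text above.

```typescript
// Hypothetical shapes; only `request`, `cacheHit`, and `cachedResponse` are named on this page.
interface LlmRequest {
  model: string;
  messages: { role: string; content: string }[];
}
interface OptimizeResult {
  request: LlmRequest;
  cacheHit?: boolean;
  cachedResponse?: unknown;
}
// Minimal response surface, so any HTTP client (or a fake) can be plugged in.
interface HttpResponseLike {
  ok: boolean;
  json(): Promise<unknown>;
}
type Transport = (body: LlmRequest, signal: AbortSignal) => Promise<HttpResponseLike>;

const OPTIMIZER_TIMEOUT_MS = 800; // the hard timeout stated in the text

// Structural check: anything without a usable `request` counts as a malformed body.
function isOptimizeResult(value: unknown): value is OptimizeResult {
  if (typeof value !== "object" || value === null) return false;
  const req = (value as { request?: unknown }).request;
  if (typeof req !== "object" || req === null) return false;
  return Array.isArray((req as { messages?: unknown }).messages);
}

// Never throws: any failure resolves to "no optimization", i.e. the original request.
async function optimizeFailOpen(
  original: LlmRequest,
  transport: Transport,
  timeoutMs: number = OPTIMIZER_TIMEOUT_MS,
): Promise<OptimizeResult> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Rejects at the deadline and aborts the in-flight call.
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error("optimizer timeout"));
    }, timeoutMs);
  });
  const call = async (): Promise<OptimizeResult> => {
    const res = await transport(original, controller.signal);
    if (!res.ok) throw new Error("optimizer returned non-2xx");
    const body: unknown = await res.json();
    if (!isOptimizeResult(body)) throw new Error("malformed optimizer body");
    return body;
  };
  try {
    return await Promise.race([call(), deadline]);
  } catch {
    // Timeout, network error, non-2xx, or malformed body: forward the original at full price.
    return { request: original };
  } finally {
    clearTimeout(timer);
  }
}
```

Racing the call against a deadline, rather than relying only on the abort signal, keeps the bound at the configured timeout even if a transport ignores the signal. In a real adapter the transport would wrap an HTTP POST to /v1/optimize.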

What is not the optimizer's job

These stay the gateway's responsibility, never the optimizer's:

- talking to providers, wire-format translation, streaming
- model/provider routing, fallbacks, load-balancing
- provider-key management and request signing
- spend attribution (the content-free spend store) and content privacy

The optimizer only decides; the gateway transports, routes, and meters. See How requests are optimized.
