
The optimizer

The optimizer is Anyray's optimization core: the optimizer/ service (optimizer:8088, internal only). It is a gateway-neutral HTTP hook backend that, given an LLM request, decides how to serve it as cheaply as possible by running a configurable pipeline of optimization strategies. It sits off the request's forwarding path: the gateway calls it from its hook points through a thin adapter (gateway/src/services/optimizer.ts) and always forwards traffic itself.

:::info Status
The optimizer (optimizer/, TypeScript) is implemented and running over Optimizer Protocol v1. The registry ships param_tuning, prompt_compression, context_compression, window_budget, tool_pruning (all default-on) and semantic_cache (default-off); see the strategy menu. Model routing is the gateway's job, not the optimizer's; the automated classifier-driven version is roadmap.
:::

It's a pipeline, not a single router

A common misconception is that Anyray is "a model router." Model routing isn't even an optimizer strategy — it's a capability of the gateway. The optimizer orchestrates an ordered pipeline of non-routing strategies that each org enables and arranges to fit its use-cases — parameter tuning, prompt compression, context compression, context-window budgeting, tool pruning, and semantic caching. See Optimization strategies.
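To make the "ordered pipeline" idea concrete, here is a minimal sketch of how such a runner might look. It is not the optimizer's actual code: the `LlmRequest`, `Decision`, `Strategy`, and `runPipeline` names are hypothetical, and the two demo strategies are toy stand-ins that only borrow names from the registry. The sketch assumes a strategy signals "no change" by returning the same object it received.

```typescript
// Hypothetical shapes; the real Optimizer Protocol v1 types are not shown in this document.
interface LlmRequest {
  messages: { role: string; content: string }[];
  tools?: { name: string }[];
  params?: Record<string, unknown>;
}

// A content-free decision: records what happened, never the prompt text.
interface Decision {
  strategy: string;
  applied: boolean;
}

interface Strategy {
  name: string;
  // Assumed convention: return the same object when nothing changed.
  apply(request: LlmRequest): LlmRequest;
}

// Runs the strategies an org has enabled, in the order the org chose.
function runPipeline(
  request: LlmRequest,
  registry: Map<string, Strategy>,
  orgOrder: string[],
): { request: LlmRequest; decisions: Decision[] } {
  let current = request;
  const decisions: Decision[] = [];
  for (const name of orgOrder) {
    const strategy = registry.get(name);
    if (!strategy) continue; // not registered (or not enabled): ignore it
    try {
      const next = strategy.apply(current);
      decisions.push({ strategy: name, applied: next !== current });
      current = next;
    } catch {
      // A strategy that throws is skipped; the pipeline carries on.
      decisions.push({ strategy: name, applied: false });
    }
  }
  return { request: current, decisions };
}

// Toy strategies for illustration only.
const demoRegistry = new Map<string, Strategy>([
  ["window_budget", {
    name: "window_budget",
    apply: (r) => (r.messages.length > 2 ? { ...r, messages: r.messages.slice(-2) } : r),
  }],
  ["prompt_compression", {
    name: "prompt_compression",
    apply: () => { throw new Error("simulated strategy failure"); },
  }],
]);

const demoRequest: LlmRequest = {
  messages: [
    { role: "user", content: "a" },
    { role: "assistant", content: "b" },
    { role: "user", content: "c" },
  ],
};

const demoResult = runPipeline(demoRequest, demoRegistry, [
  "prompt_compression",
  "window_budget",
  "semantic_cache",
]);
console.log(demoResult.decisions);
```

In this run the failing strategy is recorded as not applied, the toy window budget trims the request to its last two messages, and the unregistered `semantic_cache` entry is ignored.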

What the optimizer decides

For each request, the optimizer's POST /v1/optimize hook runs the configured pipeline and returns a (possibly) transformed request plus a list of content-free decisions:

| Outcome | Meaning |
| --- | --- |
| request unchanged | No strategy changed the request; the gateway forwards it as-is. |
| transformed request | One or more strategies rewrote the request (messages, tools, params). The gateway forwards the returned request. |
| cacheHit | When the caller canShortCircuit, semantic_cache may return cacheHit:true + cachedResponse; the gateway serves it and skips the provider. |
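The three outcomes can be read as a small decision on the gateway side. The sketch below is an illustration under assumptions, not the protocol definition: only `cacheHit`, `cachedResponse`, and `canShortCircuit` appear in the text above, while the `OptimizeResponse` envelope, the `decisions` element shape, and the `nextStep` helper are hypothetical.

```typescript
// Hypothetical response envelope for POST /v1/optimize.
interface OptimizeResponse<Req, Res> {
  request: Req; // unchanged or transformed request
  decisions: { strategy: string; applied: boolean }[]; // assumed shape, content-free
  cacheHit?: boolean;
  cachedResponse?: Res;
}

type NextStep<Req, Res> =
  | { kind: "serve_cached"; response: Res }
  | { kind: "forward"; request: Req };

// Decides what the gateway does with an optimizer result.
function nextStep<Req, Res>(
  result: OptimizeResponse<Req, Res>,
  canShortCircuit: boolean,
): NextStep<Req, Res> {
  // Serve from cache only if the caller allows short-circuiting
  // and the optimizer actually supplied a cached response.
  if (canShortCircuit && result.cacheHit === true && result.cachedResponse !== undefined) {
    return { kind: "serve_cached", response: result.cachedResponse };
  }
  // "Unchanged" and "transformed" are handled identically:
  // forward whatever request the optimizer returned.
  return { kind: "forward", request: result.request };
}
```

Treating "unchanged" and "transformed" as the same branch keeps the gateway simple: it never needs to know which strategies ran, only which request to forward.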

A second hook, POST /v1/optimize-response, can transform the response after the provider call (e.g. for output-stage strategies). There is no meter() call: spend metering belongs to the gateway's content-free spend store, not to an optimizer endpoint.

It's a black box behind a fixed contract

The optimizer is deliberately opaque. Its internal make-up (which strategies exist, how they're ordered, the cache) can change daily, and it may be one service today and several tomorrow. The gateway and its adapters never see inside and never need to change, because the Optimizer Protocol v1 contract is the only fixed surface. New strategies ship inside the optimizer without touching anything downstream.

╔════════════════════════════════════════════════════════════╗
║ ANYRAY OPTIMIZER   ▓▓▓ BLACK BOX ▓▓▓                       ║
║ a configurable pipeline of strategies:                     ║
║ param tuning · prompt compression · context compression ·  ║
║ window budget · tool pruning · semantic cache              ║
║ reordered / swapped without touching any adapter           ║
╚════════════════════════════════════════════════════════════╝
▲ POST /v1/optimize                                 ▲ POST /v1/optimize-response
└────── the ONLY fixed contract (Protocol v1) ──────┘

Fail-open is a hard invariant

The single most important rule: the optimizer can never break a request. The gateway calls it with a hard 800ms timeout and treats any non-2xx, timeout, or malformed body as "no optimization" — forwarding the original request. Within the pipeline a strategy that throws is skipped, not fatal. Cost optimization is never allowed to degrade an answer — the downside of a failure is bounded to "full price," never "wrong result." (Fail-safe-to-frontier model routing is a gateway behavior; the automated version is roadmap.)
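The gateway-side half of this invariant can be sketched as a wrapper that returns the original request whenever anything goes wrong. This is an assumption-laden illustration, not the code in gateway/src/services/optimizer.ts: `HookResponse`, `rejectAfter`, `hasRequest`, and `optimizeOrPassThrough` are hypothetical names, and the optimizer call is injected as a function so the sketch runs without a network. A real adapter would presumably also abort the in-flight HTTP request on timeout.

```typescript
// Minimal structural view of an HTTP response (compatible with fetch's Response).
interface HookResponse {
  status: number;
  json(): Promise<unknown>;
}

const OPTIMIZER_TIMEOUT_MS = 800; // the hard timeout named in the text

// A promise that rejects after `ms`, plus a way to cancel its timer.
function rejectAfter(ms: number): { promise: Promise<never>; cancel: () => void } {
  let cancel = () => {};
  const promise = new Promise<never>((_, reject) => {
    const timer = setTimeout(() => reject(new Error("optimizer timeout")), ms);
    cancel = () => clearTimeout(timer);
  });
  return { promise, cancel };
}

// Assumed minimal validity check: the body must be an object with a `request` field.
function hasRequest(body: unknown): body is { request: unknown } {
  return typeof body === "object" && body !== null && "request" in body;
}

// Returns the optimized request, or the original on any failure (fail-open).
async function optimizeOrPassThrough<Req>(
  original: Req,
  callOptimizer: (request: Req) => Promise<HookResponse>,
  timeoutMs: number = OPTIMIZER_TIMEOUT_MS,
): Promise<Req> {
  const timeout = rejectAfter(timeoutMs);
  try {
    const attempt = (async () => {
      const res = await callOptimizer(original);
      if (res.status < 200 || res.status >= 300) {
        throw new Error(`optimizer returned ${res.status}`);
      }
      const body = await res.json();
      if (!hasRequest(body)) throw new Error("malformed optimizer body");
      return body.request as Req;
    })();
    // Whichever settles first wins: the optimizer result or the timeout.
    return await Promise.race([attempt, timeout.promise]);
  } catch {
    return original; // non-2xx, timeout, malformed body, or thrown error
  } finally {
    timeout.cancel();
  }
}
```

Because every failure path collapses into `return original`, the worst case for the caller is an unoptimized request at full price, which matches the bound described above.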

What is not the optimizer's job

These stay the gateway's responsibility, never the optimizer's:

  • talking to providers, wire-format translation, streaming
  • model/provider routing, fallbacks, load-balancing
  • provider-key management and request signing
  • spend attribution (the content-free spend store) and content privacy

The optimizer only decides; the gateway transports, routes, and meters. See How requests are optimized.