LiteLLM

LiteLLM is Anyray's reference host adapter. A real reference adapter exists in optimizer/adapters/, showing how a host gateway maps its hooks onto Optimizer Protocol v1. (Anyray's own gateway is the implemented default; LiteLLM is the roadmap path for orgs that already run LiteLLM.)

:::tip Status: reference adapter (packaging roadmap) The adapter exists as a reference in optimizer/adapters/: it calls /v1/optimize pre-call and /v1/optimize-response / /v1/cache post-call. Full packaging into a published, drop-in plugin is roadmap. :::

How it plugs in

Capability	Mechanism
Optimize (rewrite request)	`async_pre_call_hook` calls `POST {OPTIMIZER}/v1/optimize` with `canShortCircuit: false` and applies the request transform.
Cache hit	Delegated to LiteLLM's built-in cache (`cache: true`) — the pre-call hook can't return a cached success.
Optimize response	`async_post_call_success_hook` calls `POST {OPTIMIZER}/v1/optimize-response` (and `POST {OPTIMIZER}/v1/cache` for write-back).

  LiteLLM async_pre_call_hook        ──POST /v1/optimize──────────▶ OPTIMIZER  (request transform; canShortCircuit:false)
  LiteLLM async_post_call_success    ──POST /v1/optimize-response─▶ OPTIMIZER  (response transform)
                                     ──POST /v1/cache─────────────▶ OPTIMIZER  (semantic_cache write-back)
  LiteLLM built-in cache             ── serves cache hits (cache: true)

:::note Why the post-call hook, not a logger callback The post-call work runs in async_post_call_success_hook because it fires on both the OpenAI /chat/completions and the Anthropic /v1/messages routes (the one Claude Code uses), streaming and non-streaming, and reliably carries the request metadata. The logging callback (async_log_success_event) does not carry that metadata on the /v1/messages path. Verified against LiteLLM 1.87 across messages/completions × stream/non-stream. :::

:::warning Cache-hit behavior (canShortCircuit: false) A str return from LiteLLM's async_pre_call_hook is treated as a rejection (HTTP 400), not a cached success — so the LiteLLM adapter reports canShortCircuit: false. It applies request transforms only, and cache-hit serving is delegated to LiteLLM's built-in cache. Inline hosts (Anyray's own gateway, Kong, Envoy) can serve cache hits directly. :::

Run the reference adapter

cd optimizer/adapters/litellm
# follow the adapter README to point a LiteLLM proxy's hooks at the optimizer

Then point an SDK at the LiteLLM proxy:

export OPENAI_BASE_URL=http://localhost:4000

Send a chat completion and watch the optimizer transform the request and the post-call hook record the response. The adapter callback lives in anyray_callback.py.

Fail-open

If the optimizer is unreachable or returns an error, the hook passes the request through unchanged — LiteLLM keeps serving inference rather than failing. This is mandatory for every adapter; see the protocol.

How it plugs in​

Run the reference adapter​

Fail-open​

See also​

How it plugs in

Run the reference adapter

Fail-open

See also