Skip to main content

LiteLLM

LiteLLM is Anyray's reference host adapter. A real reference adapter exists in optimizer/adapters/, showing how a host gateway maps its hooks onto Optimizer Protocol v1. (Anyray's own gateway is the implemented default; LiteLLM is the roadmap path for orgs that already run LiteLLM.)

:::tip Status: reference adapter (packaging roadmap) The adapter exists as a reference in optimizer/adapters/: it calls /v1/optimize pre-call and /v1/optimize-response / /v1/cache post-call. Full packaging into a published, drop-in plugin is roadmap. :::

How it plugs in

CapabilityMechanism
Optimize (rewrite request)async_pre_call_hook calls POST {OPTIMIZER}/v1/optimize with canShortCircuit: false and applies the request transform.
Cache hitDelegated to LiteLLM's built-in cache (cache: true) — the pre-call hook can't return a cached success.
Optimize responseasync_post_call_success_hook calls POST {OPTIMIZER}/v1/optimize-response (and POST {OPTIMIZER}/v1/cache for write-back).
LiteLLM async_pre_call_hook ──POST /v1/optimize──────────▶ OPTIMIZER (request transform; canShortCircuit:false)
LiteLLM async_post_call_success ──POST /v1/optimize-response─▶ OPTIMIZER (response transform)
──POST /v1/cache─────────────▶ OPTIMIZER (semantic_cache write-back)
LiteLLM built-in cache ── serves cache hits (cache: true)

:::note Why the post-call hook, not a logger callback The post-call work runs in async_post_call_success_hook because it fires on both the OpenAI /chat/completions and the Anthropic /v1/messages routes (the one Claude Code uses), streaming and non-streaming, and reliably carries the request metadata. The logging callback (async_log_success_event) does not carry that metadata on the /v1/messages path. Verified against LiteLLM 1.87 across messages/completions × stream/non-stream. :::

:::warning Cache-hit behavior (canShortCircuit: false) A str return from LiteLLM's async_pre_call_hook is treated as a rejection (HTTP 400), not a cached success — so the LiteLLM adapter reports canShortCircuit: false. It applies request transforms only, and cache-hit serving is delegated to LiteLLM's built-in cache. Inline hosts (Anyray's own gateway, Kong, Envoy) can serve cache hits directly. :::

Run the reference adapter

cd optimizer/adapters/litellm
# follow the adapter README to point a LiteLLM proxy's hooks at the optimizer

Then point an SDK at the LiteLLM proxy:

export OPENAI_BASE_URL=http://localhost:4000

Send a chat completion and watch the optimizer transform the request and the post-call hook record the response. The adapter callback lives in anyray_callback.py.

Fail-open

If the optimizer is unreachable or returns an error, the hook passes the request through unchanged — LiteLLM keeps serving inference rather than failing. This is mandatory for every adapter; see the protocol.

See also