LiteLLM
LiteLLM is Anyray's reference host adapter. A real reference adapter exists in
optimizer/adapters/,
showing how a host gateway maps its hooks onto Optimizer Protocol v1. (Anyray's own
gateway is the implemented default; LiteLLM is the roadmap path for orgs that already run
LiteLLM.)
:::tip Status: reference adapter (packaging roadmap)
The adapter exists as a reference in
optimizer/adapters/:
it calls /v1/optimize pre-call and /v1/optimize-response / /v1/cache post-call.
Full packaging into a published, drop-in plugin is roadmap.
:::
How it plugs in
| Capability | Mechanism |
|---|---|
| Optimize (rewrite request) | async_pre_call_hook calls POST {OPTIMIZER}/v1/optimize with canShortCircuit: false and applies the request transform. |
| Cache hit | Delegated to LiteLLM's built-in cache (cache: true) — the pre-call hook can't return a cached success. |
| Optimize response | async_post_call_success_hook calls POST {OPTIMIZER}/v1/optimize-response (and POST {OPTIMIZER}/v1/cache for write-back). |
LiteLLM async_pre_call_hook ──POST /v1/optimize──────────▶ OPTIMIZER (request transform; canShortCircuit:false)
LiteLLM async_post_call_success ──POST /v1/optimize-response─▶ OPTIMIZER (response transform)
──POST /v1/cache─────────────▶ OPTIMIZER (semantic_cache write-back)
LiteLLM built-in cache ── serves cache hits (cache: true)
:::note Why the post-call hook, not a logger callback
The post-call work runs in async_post_call_success_hook because it fires on both the
OpenAI /chat/completions and the Anthropic /v1/messages routes (the one Claude Code
uses), streaming and non-streaming, and reliably carries the request metadata. The logging
callback (async_log_success_event) does not carry that metadata on the /v1/messages
path. Verified against LiteLLM 1.87 across messages/completions × stream/non-stream.
:::
:::warning Cache-hit behavior (canShortCircuit: false)
A str return from LiteLLM's async_pre_call_hook is treated as a rejection (HTTP 400),
not a cached success — so the LiteLLM adapter reports canShortCircuit: false. It applies
request transforms only, and cache-hit serving is delegated to LiteLLM's built-in
cache. Inline hosts (Anyray's own gateway, Kong, Envoy) can serve cache hits directly.
:::
Run the reference adapter
cd optimizer/adapters/litellm
# follow the adapter README to point a LiteLLM proxy's hooks at the optimizer
Then point an SDK at the LiteLLM proxy:
export OPENAI_BASE_URL=http://localhost:4000
Send a chat completion and watch the optimizer transform the request and the post-call hook
record the response. The adapter callback lives in anyray_callback.py.
Fail-open
If the optimizer is unreachable or returns an error, the hook passes the request through unchanged — LiteLLM keeps serving inference rather than failing. This is mandatory for every adapter; see the protocol.
See also
- Cloud Providers → Vertex + Claude Code — a LiteLLM + Vertex stack for Claude Code (Anyray's own gateway also speaks Vertex now).
- Developers → Using SDKs for the client side.
- Cloud Providers for running on GKE → Vertex, Bedrock, etc.