Anyray's own gateway
Anyray ships its own gateway (TypeScript + Hono), and it's the implemented default —
the one the rest of the system runs against today. It's a fork of Portkey's open-source
gateway, so it is multi-provider out of the box: openai, anthropic, bedrock,
google-vertex-ai, azure-openai, and more. It is OpenAI-compatible and listens on
:8787.
:::info Status: implemented (default)
The gateway (gateway/) is the running default. It speaks Anthropic, Vertex, and
Bedrock natively (no separate adapter needed), and it calls the optimizer over
Optimizer Protocol v1 via the in-repo adapter
gateway/src/services/optimizer.ts.
:::
When to choose it
- You have no gateway and don't want to adopt one just for Anyray — this is the default.
- You want a single vendor for transport + optimization.
- You call any of the supported providers — OpenAI, Anthropic, Bedrock, Vertex, Azure OpenAI — the gateway translates and routes natively.
If you already run LiteLLM, Portkey, Kong, Cloudflare, or Envoy and want to keep it, an adapter is the path (LiteLLM has a reference adapter; the rest are roadmap stubs).
What it does
- OpenAI-compatible proxy/router your SDKs point at. As a Portkey fork it speaks many
providers natively — Anthropic
/v1/messages, Vertex (Claude), and Bedrock included. - Owns provider transport, model/provider routing + fallbacks + load-balancing, provider keys, content-free spend attribution, and content privacy.
- Because the gateway owns the response path, it can serve cache hits directly — its
optimizer adapter sets
canShortCircuit: true, so acacheHitfrom/v1/optimizeis returned to the caller without touching the provider (unlike LiteLLM, where cache hits are delegated to the host's built-in cache).
How it reaches the optimizer
It calls the optimizer (optimizer:8088, internal) over Optimizer
Protocol v1 via the in-repo adapter
gateway/src/services/optimizer.ts:
POST /v1/optimize(pre-call) — transform the request; may returncacheHit+cachedResponse(served directly, sincecanShortCircuit: true).POST /v1/optimize-response(post-call) — transform the response.POST /v1/cache(write-back) forsemantic_cache.
The hook is fail-open with a hard 800 ms timeout: on any error or timeout the
request is forwarded unchanged. Spend is recorded by the gateway's own content-free spend
store — there is no meter() call to the optimizer.
SDK ──▶ Anyray gateway (Hono, :8787)
│ Optimizer Protocol v1 (gateway/src/services/optimizer.ts)
│ POST /v1/optimize → request transform / cacheHit
▼
OPTIMIZER (:8088, internal — fail-open, 800ms)
│
▼ provider call (openai · anthropic · bedrock · google-vertex-ai · azure-openai · …)
then POST /v1/optimize-response (response transform)
Run it
The whole system (gateway + optimizer + console + datastores) comes up from the root compose:
cp .env.example .env
docker compose up -d # gateway on :8787, console on :3000
Or run just the gateway on the host:
cd gateway && npm run build && node build/start-server.js
See gateway/README.md
for the implemented feature list.
Deploy
See Cloud Providers for environment-specific notes. The gateway speaks Bedrock and Vertex natively, so no provider-specific adapter is required.