Anyray's own gateway

Anyray ships its own gateway (TypeScript + Hono), and it's the implemented default — the one the rest of the system runs against today. It's a fork of Portkey's open-source gateway, so it is multi-provider out of the box: openai, anthropic, bedrock, google-vertex-ai, azure-openai, and more. It is OpenAI-compatible and listens on :8787.

:::info Status: implemented (default)
The gateway (gateway/) is the running default. It speaks Anthropic, Vertex, and Bedrock natively (no separate adapter needed), and it calls the optimizer over Optimizer Protocol v1 via the in-repo adapter gateway/src/services/optimizer.ts.
:::
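Because the gateway is OpenAI-compatible and listens on :8787, a client only needs to change its base URL. The sketch below builds such a request without sending it. `buildChatRequest` is a hypothetical helper, and the host, the `/v1/chat/completions` path (OpenAI's standard chat route), the model name, and the bearer token are placeholders; how the gateway authenticates callers and selects a provider is not covered here, so check gateway/README.md before relying on any of it.

```typescript
// Hypothetical helper: builds an OpenAI-style chat request aimed at the gateway.
// Base URL, path, model name, and token are illustrative assumptions.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GatewayRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): GatewayRequest {
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${apiKey}`, // placeholder credential
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

const req = buildChatRequest("http://localhost:8787/", "sk-placeholder", "gpt-4o-mini", [
  { role: "user", content: "ping" },
]);
// To send it: const res = await fetch(req.url, req.init);
```

An existing OpenAI SDK client can usually achieve the same thing by overriding its base URL option instead of hand-building requests.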

When to choose it

  • You have no gateway and don't want to adopt one just for Anyray — this is the default.
  • You want a single vendor for transport + optimization.
  • You call any of the supported providers — OpenAI, Anthropic, Bedrock, Vertex, Azure OpenAI — the gateway translates and routes natively.

If you already run LiteLLM, Portkey, Kong, Cloudflare, or Envoy and want to keep it, an adapter is the path (LiteLLM has a reference adapter; the rest are roadmap stubs).

What it does

  • OpenAI-compatible proxy/router your SDKs point at. As a Portkey fork it speaks many providers natively — Anthropic /v1/messages, Vertex (Claude), and Bedrock included.
  • Owns provider transport, model/provider routing + fallbacks + load-balancing, provider keys, content-free spend attribution, and content privacy.
  • Because the gateway owns the response path, it can serve cache hits directly — its optimizer adapter sets canShortCircuit: true, so a cacheHit from /v1/optimize is returned to the caller without touching the provider (unlike LiteLLM, where cache hits are delegated to the host's built-in cache).
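The short-circuit rule in the last bullet can be sketched as a small decision function. Only `canShortCircuit`, `cacheHit`, and `cachedResponse` come from the text above; the type names, the `request` field, and `resolvePreCall` are hypothetical and are not taken from gateway/src/services/optimizer.ts.

```typescript
// Hypothetical shapes: only cacheHit, cachedResponse, and canShortCircuit
// are named in the docs; the rest is illustrative.
interface OptimizeResult<Req, Res> {
  request: Req; // the (possibly transformed) request; assumed field name
  cacheHit?: boolean;
  cachedResponse?: Res;
}

interface AdapterCapabilities {
  canShortCircuit: boolean;
}

type Resolution<Req, Res> =
  | { kind: "cached"; response: Res } // answer the caller directly
  | { kind: "forward"; request: Req }; // continue to the provider

function resolvePreCall<Req, Res>(
  caps: AdapterCapabilities,
  result: OptimizeResult<Req, Res>,
): Resolution<Req, Res> {
  // Serve from cache only when the host owns the response path and the
  // optimizer actually supplied a cached body.
  if (caps.canShortCircuit && result.cacheHit && result.cachedResponse !== undefined) {
    return { kind: "cached", response: result.cachedResponse };
  }
  return { kind: "forward", request: result.request };
}

const outcome = resolvePreCall(
  { canShortCircuit: true },
  { request: { model: "example-model" }, cacheHit: true, cachedResponse: { id: "cached-1" } },
);
// outcome.kind === "cached"
```

With `canShortCircuit: false`, the same cache hit falls through to the "forward" branch, which mirrors a host that leaves cache hits to its own built-in cache.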

How it reaches the optimizer

It calls the optimizer (optimizer:8088, internal) over Optimizer Protocol v1 via the in-repo adapter gateway/src/services/optimizer.ts:

  • POST /v1/optimize (pre-call) — transform the request; may return cacheHit + cachedResponse (served directly, since canShortCircuit: true).
  • POST /v1/optimize-response (post-call) — transform the response.
  • POST /v1/cache (write-back) for semantic_cache.
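The order of the three calls around a provider request can be sketched as follows. This is an assumption-laden outline, not the adapter's code: `OptimizerClient`, `handleRequest`, and `makeFakeOptimizer` are hypothetical names, the payload shapes are invented, error handling is omitted, and the write-back of the transformed response after a cache miss is a guess about when POST /v1/cache is issued.

```typescript
// Hypothetical client interface; each method stands for one protocol call.
interface PreCallResult<Req, Res> {
  request: Req;
  cacheHit?: boolean;
  cachedResponse?: Res;
}

interface OptimizerClient<Req, Res> {
  optimize(req: Req): Promise<PreCallResult<Req, Res>>; // POST /v1/optimize
  optimizeResponse(req: Req, res: Res): Promise<Res>; // POST /v1/optimize-response
  writeCache(req: Req, res: Res): Promise<void>; // POST /v1/cache
}

async function handleRequest<Req, Res>(
  req: Req,
  optimizer: OptimizerClient<Req, Res>,
  callProvider: (req: Req) => Promise<Res>,
): Promise<Res> {
  const pre = await optimizer.optimize(req);
  if (pre.cacheHit && pre.cachedResponse !== undefined) {
    return pre.cachedResponse; // served directly, provider untouched
  }
  const providerResponse = await callProvider(pre.request);
  const finalResponse = await optimizer.optimizeResponse(pre.request, providerResponse);
  // Assumption: the write-back happens after a miss and stores the final response.
  await optimizer.writeCache(pre.request, finalResponse);
  return finalResponse;
}

// In-memory stand-in that records which calls were made, for experimentation.
function makeFakeOptimizer(log: string[], cached?: string): OptimizerClient<string, string> {
  return {
    async optimize(req) {
      log.push("optimize");
      return cached === undefined
        ? { request: req }
        : { request: req, cacheHit: true, cachedResponse: cached };
    },
    async optimizeResponse(_req, res) {
      log.push("optimize-response");
      return res;
    },
    async writeCache() {
      log.push("cache");
    },
  };
}
```

Passing the optimizer and the provider call in as parameters keeps the ordering testable without a running optimizer or real provider keys.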

The hook is fail-open with a hard 800 ms timeout: on any error or timeout the request is forwarded unchanged. Spend is recorded by the gateway's own content-free spend store — there is no meter() call to the optimizer.
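A fail-open hook with a hard deadline can be sketched with `Promise.race` and an `AbortController`. The 800 ms default mirrors the figure above; `failOpen` and `optimizeViaHttp` are hypothetical names, and the JSON body sent to /v1/optimize is an assumed shape rather than the protocol's actual schema.

```typescript
// Runs `transform` against a deadline; on error or timeout, returns the
// original value unchanged (fail-open).
async function failOpen<T>(
  original: T,
  transform: (input: T, signal: AbortSignal) => Promise<T>,
  timeoutMs = 800,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<T>((resolve) => {
    timer = setTimeout(() => {
      controller.abort(); // cancel the in-flight call
      resolve(original); // fall back to the untouched value
    }, timeoutMs);
  });
  try {
    return await Promise.race([transform(original, controller.signal), deadline]);
  } catch {
    return original; // any error: forward unchanged
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical transform: posts the request to the optimizer's pre-call endpoint.
// The host name and body shape are assumptions.
async function optimizeViaHttp<T>(input: T, signal: AbortSignal): Promise<T> {
  const res = await fetch("http://optimizer:8088/v1/optimize", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(input),
    signal,
  });
  if (!res.ok) throw new Error(`optimizer returned ${res.status}`);
  return (await res.json()) as T;
}

// Usage: const toForward = await failOpen(incomingRequest, optimizeViaHttp);
```

Racing against a timer rather than relying on the abort signal alone means the caller gets the original request back at the deadline even if the transform ignores the signal.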

SDK ──▶ Anyray gateway (Hono, :8787)
          │  Optimizer Protocol v1 (gateway/src/services/optimizer.ts)
          │  POST /v1/optimize → request transform / cacheHit
          ▼
        OPTIMIZER (:8088, internal, fail-open, 800 ms)
          │
          ▼  provider call (openai · anthropic · bedrock · google-vertex-ai · azure-openai · …)
             then POST /v1/optimize-response (response transform)

Run it

The whole system (gateway + optimizer + console + datastores) comes up from the root compose:

cp .env.example .env
docker compose up -d # gateway on :8787, console on :3000

Or run just the gateway on the host:

cd gateway && npm run build && node build/start-server.js

See gateway/README.md for the implemented feature list.

Deploy

See Cloud Providers for environment-specific notes. The gateway speaks Bedrock and Vertex natively, so no provider-specific adapter is required.