
GCP / GKE

GKE is a first-class target. Anyray runs in your GKE cluster / GCP project; prompts, the optimizer, traces, and cache all stay there. The Anyray gateway speaks Vertex (Claude) natively, so you can point Claude Code straight at it; a LiteLLM + Vertex stack remains a valid alternative (below).

:::tip Just want Claude Code on Vertex working fast?
If your team uses Claude Code on Vertex and you don't run a gateway yet, start with the Vertex + Claude Code quickstart. It deploys the shared stack on a GCE VM; this page covers running that same stack in a GKE cluster.
:::

Endpoint redirect (config-based)

Integration is config-based: workloads set their SDK base-URL env to point at the in-cluster Anyray endpoint — OPENAI_BASE_URL for OpenAI SDKs, ANTHROPIC_BASE_URL for Anthropic SDKs (Claude Code) — via their Deployment/pod spec or your config management.
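As a minimal sketch of what that redirect amounts to, the snippet below builds the two env entries a Deployment's container spec would carry. The Service name `anyray-gateway`, the namespace `anyray`, plain in-cluster HTTP, and the `/v1` suffix on the OpenAI base URL are all assumptions made for illustration; only port 8787 and the two variable names come from this page. Substitute whatever your manifests actually define.

```python
import json


def gateway_base_url(service="anyray-gateway", namespace="anyray", port=8787):
    """Build the in-cluster URL from standard Kubernetes Service DNS.

    The service and namespace defaults are hypothetical; the scheme (http)
    is also an assumption about how the gateway is exposed in-cluster.
    """
    return f"http://{service}.{namespace}.svc.cluster.local:{port}"


def sdk_env(base_url, openai_path="/v1"):
    """Return env entries in the shape of a pod spec's `env` list.

    The "/v1" suffix for OpenAI SDKs is an assumption; check which path
    your gateway actually serves before relying on it.
    """
    return [
        {"name": "OPENAI_BASE_URL", "value": base_url + openai_path},
        {"name": "ANTHROPIC_BASE_URL", "value": base_url},
    ]


if __name__ == "__main__":
    # Print the entries as JSON, ready to paste into config management.
    print(json.dumps(sdk_env(gateway_base_url()), indent=2))
```

Because the redirect is just two env values, it shows up in a normal manifest diff and can be reviewed like any other config change.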

No org CA, no TLS-MITM, no HTTPS_PROXY. (An earlier zero-touch admission-webhook injection was dropped in favor of this explicit, auditable config-based redirect.)

Anyray gateway → Vertex (the native path)

The Anyray gateway speaks Vertex natively, so the simplest in-cluster shape is:

GKE workloads ──▶ Anyray gateway (:8787, in-cluster) ──▶ Vertex (Claude)
                          │  Optimizer Protocol v1 (/v1/optimize …)
                          ▼
                  ANYRAY OPTIMIZER (:8088, internal, credential-free)

  • Bring up the stack from the repo root (docker compose up -d) or your in-cluster manifests; point workloads' base-URL env at the gateway's in-cluster Service.
  • The optimizer is credential-free — it decides request transforms; the gateway holds Vertex credentials (via workload identity) and makes the signed call.
  • No interception layer is needed beyond the endpoint-override env.
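The custody split described in the list above can be illustrated with a toy model. This is not Anyray's code or the Optimizer Protocol: the function names, the `Outbound` shape, and the whitespace-collapsing transform are all hypothetical stand-ins. The only points it is meant to show are that the deciding step receives the request body and nothing else, and that the bearer token is attached afterwards by the component that makes the upstream call.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Outbound:
    """Illustrative shape of what the gateway would send upstream."""
    body: dict
    headers: dict = field(default_factory=dict)


def optimize(body: dict) -> dict:
    """Hypothetical optimizer step: receives the request body only.

    The transform (collapsing runs of whitespace in a 'prompt' field)
    is a placeholder, not an actual Anyray transform.
    """
    decided = dict(body)
    if isinstance(decided.get("prompt"), str):
        decided["prompt"] = " ".join(decided["prompt"].split())
    return decided


def gateway_forward(body: dict, access_token: str) -> Outbound:
    """Hypothetical gateway step: ask the optimizer, then attach credentials."""
    decided = optimize(body)  # no token or auth header crosses this call
    headers = {"Authorization": f"Bearer {access_token}"}
    return Outbound(body=decided, headers=headers)


if __name__ == "__main__":
    out = gateway_forward({"prompt": "hello   world"}, access_token="example-token")
    print(out.body)
    print(sorted(out.headers))
```

Keeping the token out of the `optimize` signature entirely means the optimizer can be deployed without any secret mounted, which is the property the "credential-free" label refers to.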

The LiteLLM + Claude-on-Vertex pattern (alternative)

If you prefer LiteLLM as the front door (for example, because you already run it), LiteLLM on GKE in front of Claude on Vertex remains a valid pattern:

GKE workloads ──▶ LiteLLM (in-cluster, :4000) ──▶ Vertex (Claude)
                     │  /v1/optimize · /v1/optimize-response
                     ▼
             ANYRAY OPTIMIZER (credential-free)

  • Deploy the LiteLLM adapter + the optimizer in-cluster.
  • LiteLLM holds the Vertex credentials and makes the signed call; the optimizer only decides.

Credentials

Provider credentials (including Vertex auth via workload identity) are held and used by the gateway (Anyray's own gateway, or LiteLLM in the alternative pattern), not the optimizer. The optimizer never takes custody of provider keys.

See also