Org Admin overview
This section is for the person who runs Anyray for an organization — deploying it, pointing workloads at it, configuring optimization, operating it day-to-day, and owning its security and billing posture.
The job to be done: get your organization's AI-inference spend under control, so you can support more AI, not less. Your developers' AI assistants (Claude Code and others), internal agents, and SDK apps generate LLM cost across laptops and clusters. Anyray lets you see and attribute that cost (by team, app, model, and provider) and cut it. It does this without asking anyone to change how they work, and it uses only cost metadata, never prompt or response content. Cost governance here doesn't mean watching your people.
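Attribution of this kind needs nothing beyond per-request cost metadata. The following is a minimal sketch that assumes a hypothetical `SpendRecord` shape; the field names are illustrative, not Anyray's actual schema. Each record carries team, app, model, provider, token counts, and cost, with no prompt or response text. Rolling spend up by any one dimension is then a group-and-sum.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

DIMENSIONS = {"team", "app", "model", "provider"}


@dataclass(frozen=True)
class SpendRecord:
    # Hypothetical content-free record: cost metadata only.
    # There is deliberately no field for prompt or response text.
    team: str
    app: str
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


def attribute(records: Iterable[SpendRecord], dimension: str) -> Dict[str, float]:
    """Sum cost_usd per value of one metadata dimension (e.g. "team", "model")."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, dimension)] += record.cost_usd
    return dict(totals)


# Illustrative records; team, app, and model names are made up.
records = [
    SpendRecord("search", "ranker-agent", "model-a", "anthropic", 1200, 300, 0.50),
    SpendRecord("search", "ranker-agent", "model-b", "openai", 800, 200, 0.25),
    SpendRecord("billing", "invoice-bot", "model-a", "anthropic", 400, 100, 0.25),
]
print(attribute(records, "team"))      # {'search': 0.75, 'billing': 0.25}
print(attribute(records, "provider"))  # {'anthropic': 0.75, 'openai': 0.25}
```

Because the record type has no content fields, a store built on records like these can answer "who spent what on which model" without ever holding a prompt.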
:::tip You adopt Anyray at zero risk: you don't commit blind (roadmap)
The common worry, "will this hurt our performance?", is meant to be removed by how you adopt, not by a promise. The planned path is to start in observe-only Shadow Mode, where Anyray changes nothing and shows you, on your own traffic, what it would save and the measured quality delta. You then keep a permanent unoptimized holdback as ground truth, gate on a quality SLO, and keep a one-flag kill switch. (Shadow Mode, holdback, and the quality SLO are roadmap.) Today the equivalent discipline is to begin with the conservative, provably-lossless strategies and watch real spend in the console before widening. See Proof, not promises.
:::
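Shadow Mode, the holdback, and the quality SLO are roadmap, so the following is only a sketch of the general technique, not Anyray's implementation. It shows three pieces under stated assumptions. First, a stable key is assigned deterministically to a permanent unoptimized holdback by hashing; using a session ID as that key is an assumption. Second, relative savings are measured against that holdback. Third, a quality gate decides whether to widen. The function names and the 10% holdback fraction are hypothetical.

```python
import hashlib
from statistics import fmean
from typing import Sequence


def in_holdback(key: str, holdback_fraction: float) -> bool:
    """Deterministically place a stable key (e.g. a session ID) in the holdback.

    Hashing instead of random sampling keeps each key in the same arm on every
    request, so the unoptimized control group stays stable over time.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # 48 bits divided by 2**48 is exact as a float and always < 1.0.
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < holdback_fraction


def relative_savings(control_costs: Sequence[float], treated_costs: Sequence[float]) -> float:
    """1 - mean(optimized cost) / mean(holdback cost); 0.3 means 30% cheaper."""
    return 1.0 - fmean(treated_costs) / fmean(control_costs)


def passes_quality_slo(
    control_scores: Sequence[float], treated_scores: Sequence[float], max_drop: float
) -> bool:
    """Allow widening only if mean quality fell by at most max_drop."""
    return fmean(control_scores) - fmean(treated_scores) <= max_drop


# Hypothetical session ID and 10% holdback.
arm = "holdback" if in_holdback("session-1234", 0.10) else "optimized"
print(arm)
print(relative_savings([1.0, 1.0], [0.5, 0.5]))  # 0.5
```

Keying on a session rather than on individual requests keeps a multi-turn conversation in one arm, so its cost and quality are compared as a whole; that choice is also an assumption.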
:::info One monorepo — docker compose up
There is no separate install bundle. The whole system (gateway, console, optimizer, trace backend, and datastores) lives in one repo, anyrayHQ/monorepo, and comes up from a single root `docker-compose.yml`. Everything runs in your environment; nothing is closed-source, and nothing phones home. These pages are the conceptual guide; that repo is the runnable source of truth.
:::
Your job, end to end
- Install: clone the monorepo and run `docker compose up -d` to stand up the gateway, console, optimizer, trace backend, and datastores in your environment.
- Configure: choose and order the optimization strategies for your use cases and set their parameters, from the console Settings page or `optimizer.config.json`. Set your content-privacy mode here too.
- Point traffic at it: redirect each workload's provider endpoint to the gateway base URL (`OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` → the gateway on `:8787`). No app changes, and no custom CA or TLS interception (MITM). (Zero-touch admission-webhook injection is roadmap.)
- Operate: watch spend, traces, and sessions in the console; confirm it fails safe; roll out new logic with `docker compose pull && docker compose up -d`.
- Secure: everything is self-hosted; content is encrypted at rest and nothing leaves your environment.
- Bill: understand the content-free spend store and what it records.
(Observe-only Shadow Mode with a control holdback — proving savings and quality on your own traffic before enabling anything — is roadmap. See Proof, not promises.)
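Pointing a workload at the gateway is a matter of environment, not code. The following minimal Python sketch assumes the gateway is reachable at `http://localhost:8787`; the host is hypothetical, and only the port comes from this page. It also assumes the workload uses SDKs or tools that read `OPENAI_BASE_URL` and `ANTHROPIC_BASE_URL`, as the official OpenAI and Anthropic Python SDKs do. Whether the OpenAI-style URL needs a path suffix such as `/v1` depends on the gateway's routes, so check the monorepo before relying on the bare URL.

```python
import os
import subprocess
from typing import Dict, Mapping, Sequence

# Hypothetical gateway address: only the :8787 port comes from the docs.
GATEWAY_URL = "http://localhost:8787"


def gateway_env(base_env: Mapping[str, str], gateway_url: str = GATEWAY_URL) -> Dict[str, str]:
    """Copy base_env and point both provider SDK base URLs at the gateway."""
    env = dict(base_env)
    env["OPENAI_BASE_URL"] = gateway_url
    env["ANTHROPIC_BASE_URL"] = gateway_url
    return env


def run_through_gateway(command: Sequence[str]) -> int:
    """Launch an unmodified workload with its provider traffic redirected."""
    return subprocess.run(list(command), env=gateway_env(os.environ), check=False).returncode


if __name__ == "__main__":
    env = gateway_env(os.environ)
    print(env["OPENAI_BASE_URL"], env["ANTHROPIC_BASE_URL"])
```

In practice you would usually set the two variables in the workload's own environment (shell profile, container spec) rather than wrap the process. The wrapper only makes concrete that the application itself is not modified.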
Decide your topology first
| Situation | What you deploy | Start at |
|---|---|---|
| Default — you want a gateway | The Anyray gateway (multi-provider) + optimizer + console | Anyray's own gateway |
| You already run another AI gateway | The optimizer + console, plus an adapter for your gateway (roadmap) | Gateways |
| You run on GKE / Vertex, Bedrock, etc. | Same, plus provider-specific wiring | Cloud Providers |
The default, implemented path is Anyray's own gateway — a multi-provider proxy/router that already speaks Anthropic, OpenAI, Vertex, Bedrock, and more. Adapters that put the optimizer in front of a different host gateway (LiteLLM, Kong, Envoy, Cloudflare, Portkey) are roadmap.
The mental model in one line
You `docker compose up` the stack in your environment, redirect traffic to the Anyray gateway, and the optimizer makes each request cheaper while failing open, with your data never leaving.
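Failing open means an optimizer fault can cost you savings but never blocks or breaks a request. The following is a minimal sketch of that idea. It assumes strategies are plain functions applied in the configured order; the `Strategy` type, the request shape, and both example strategies are hypothetical, not Anyray's API. Each strategy works on a copy, and if it raises, its change is discarded and the request continues with the last good version.

```python
import copy
import logging
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Strategy = Callable[[Request], Request]

log = logging.getLogger("optimizer-sketch")


def optimize_fail_open(request: Request, strategies: List[Strategy]) -> Request:
    """Apply strategies in order; a failing strategy is skipped, never fatal."""
    current = request
    for strategy in strategies:
        try:
            # Deep copy so a strategy that mutates and then raises cannot
            # leave a half-modified request behind.
            current = strategy(copy.deepcopy(current))
        except Exception:
            log.warning("strategy %s failed; continuing without it",
                        getattr(strategy, "__name__", repr(strategy)))
    return current


def cap_max_tokens(request: Request) -> Request:
    # Hypothetical strategy: clamp the output-token budget.
    request["max_tokens"] = min(request.get("max_tokens", 4096), 1024)
    return request


def broken_strategy(request: Request) -> Request:
    # Hypothetical faulty strategy: mutates its copy, then crashes.
    request["max_tokens"] = 1
    raise RuntimeError("simulated optimizer bug")


original = {"model": "model-a", "max_tokens": 4096, "messages": []}
result = optimize_fail_open(original, [broken_strategy, cap_max_tokens])
print(result["max_tokens"])    # 1024
print(original["max_tokens"])  # 4096
```

Skipping only the failing step keeps whatever savings the remaining strategies produce. A stricter variant would return the original request untouched on any error. Which policy a real optimizer uses is not specified here.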