Skip to main content

Org Admin overview

This section is for the person who runs Anyray for an organization — deploying it, pointing workloads at it, configuring optimization, operating it day-to-day, and owning its security and billing posture.

The job to be done: get your organization's AI-inference spend under control — so you can support more AI, not less. Your developers (Claude Code and other AI assistants), internal agents, and SDK apps generate LLM cost across laptops and clusters. Anyray lets you see and attribute it (by team, app, model, and provider) and cut it, without asking anyone to change how they work — and using only cost metadata, never prompt or response content. Cost governance here doesn't mean watching your people.

:::tip You adopt Anyray at zero risk — you don't commit blind (roadmap) The common worry — "will this hurt our performance?" — is meant to be removed by how you adopt, not by a promise. The planned path: start in observe-only Shadow Mode, where Anyray changes nothing and shows you, on your own traffic, what it would save and the measured quality delta; keep a permanent unoptimized holdback as ground truth, gate on a quality SLO, and keep a one-flag kill switch. (Shadow Mode, holdback, and the quality SLO are roadmap.) Today the equivalent discipline is to begin with the conservative, provably-lossless strategies and watch real spend in the console before widening. See Proof, not promises. :::

:::info One monorepo — docker compose up There is no separate install bundle. The whole system — gateway, console, optimizer, trace backend, and datastores — lives in one repo, anyrayHQ/monorepo, and comes up from a single root docker-compose.yml. Everything runs in your environment; nothing is closed or phoned home. These pages are the conceptual guide; that repo is the runnable source of truth. :::

Your job, end to end

  1. Install — clone the monorepo and docker compose up -d to stand up the gateway, console, optimizer, trace backend, and datastores in your environment.
  2. Configure — choose and order the optimization strategies for your use-cases and set their parameters, from the console Settings page or optimizer.config.json. Set your content-privacy mode here too.
  3. Point traffic at it — redirect worker endpoints to the gateway base URL (OPENAI_BASE_URL / ANTHROPIC_BASE_URL → the gateway on :8787). No app changes, no CA/TLS-MITM. (Zero-touch admission-webhook injection is roadmap.)
  4. Operate — watch spend, traces, and sessions in the console; confirm it fails safe; roll out new logic with docker compose pull && up -d.
  5. Secure — everything is self-hosted; content is encrypted at rest and nothing leaves your environment.
  6. Bill — understand the content-free spend store and what it records.

(Observe-only Shadow Mode with a control holdback — proving savings and quality on your own traffic before enabling anything — is roadmap. See Proof, not promises.)

Decide your topology first

SituationWhat you deployStart at
Default — you want a gatewayThe Anyray gateway (multi-provider) + optimizer + consoleAnyray's own gateway
You already run another AI gatewayThe optimizer + console, plus an adapter for your gateway (roadmap)Gateways
You run on GKE / Vertex, Bedrock, etc.Same, plus provider-specific wiringCloud Providers

The default, implemented path is Anyray's own gateway — a multi-provider proxy/router that already speaks Anthropic, OpenAI, Vertex, Bedrock, and more. Adapters that put the optimizer in front of a different host gateway (LiteLLM, Kong, Envoy, Cloudflare, Portkey) are roadmap.

The mental model in one line

You docker compose up the stack in your environment, redirect traffic to the Anyray gateway, and the optimizer makes each request cheaper while failing open — with your data never leaving.