Configure

Configuring Anyray is mostly choosing which optimization strategies to run, in what order, with what parameters — tuned to your use-case. It is not just setting one model-router threshold.

Optimization is configured in optimizer.config.json: an ordered list of strategies, each enabled or disabled and carrying its own params. The config is runtime-mutable — you edit it live from the console Settings page or via the admin API, and every change is audit-logged. You do not restart the stack to change strategies.

Pick a pipeline for your use-case

The config is an ordered strategies[] array. Array order is execution order, and each entry is { kind, enabled, params }:

{
  "strategies": [
    { "kind": "semantic_cache",      "enabled": false, "params": {} },
    { "kind": "vision_ocr",          "enabled": false, "params": {} },
    { "kind": "prompt_compression",  "enabled": true,  "params": {} },
    { "kind": "context_compression", "enabled": true,  "params": {} },
    { "kind": "code_skeleton",       "enabled": true,  "params": {} },
    { "kind": "code_graph",          "enabled": false, "params": {} },
    { "kind": "relevance_filter",    "enabled": false, "params": {} },
    { "kind": "window_budget",       "enabled": true,  "params": { "maxTokens": 24000 } },
    { "kind": "tool_pruning",        "enabled": true,  "params": {} },
    { "kind": "param_tuning",        "enabled": true,  "params": { "maxTokensCap": 4096 } }
  ],
  "overrides": {
    "byEndpoint": {
      "/v1/chat/completions": { "strategies": [ /* per-endpoint overrides */ ] }
    },
    "rules": [ /* ordered targeting rules — see below */ ]
  }
}

To change the pipeline you reorder, enable/disable, or re-parametrize entries — there are no separate "enabled list" and "order list" to keep in sync; the array is both.
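Because one array carries both order and on/off state, a pipeline edit is a small list manipulation. The following is a minimal sketch in Python, assuming the config has already been loaded into a dict; the helper names (`set_enabled`, `move_before`, `active_order`) are hypothetical and not part of Anyray.

```python
import copy

def set_enabled(config, kind, enabled):
    """Return a copy of config with the given strategy toggled."""
    new = copy.deepcopy(config)
    for entry in new["strategies"]:
        if entry["kind"] == kind:
            entry["enabled"] = enabled
            return new
    raise KeyError(f"unknown strategy kind: {kind}")

def move_before(config, kind, before_kind):
    """Return a copy with `kind` moved to run immediately before `before_kind`."""
    new = copy.deepcopy(config)
    strategies = new["strategies"]
    kinds = [s["kind"] for s in strategies]
    entry = strategies.pop(kinds.index(kind))
    kinds.remove(kind)
    strategies.insert(kinds.index(before_kind), entry)
    return new

def active_order(config):
    """Kinds that will actually run, in execution order."""
    return [s["kind"] for s in config["strategies"] if s["enabled"]]

# A trimmed-down config for illustration (not the full default pipeline).
config = {
    "strategies": [
        {"kind": "prompt_compression", "enabled": True, "params": {}},
        {"kind": "code_skeleton", "enabled": True, "params": {}},
        {"kind": "code_graph", "enabled": False, "params": {}},
        {"kind": "window_budget", "enabled": True, "params": {"maxTokens": 24000}},
    ]
}

# Swap code_skeleton for code_graph, e.g. for a multi-file workload.
updated = set_enabled(set_enabled(config, "code_skeleton", False), "code_graph", True)
print(active_order(updated))
# ['prompt_compression', 'code_graph', 'window_budget']
```

The helpers return copies rather than mutating in place, so the previous config stays available for comparison or rollback before you write the new one.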

The registered strategy kinds (the whole registry) are:

| `kind` | What it does | Default |
| --- | --- | --- |
| `prompt_compression` | Shorten / restructure the prompt and system message. | on |
| `context_compression` | Shrink bulky tool/function output the agent reads back (minify JSON, collapse whitespace, cap oversized blobs). Reversible: the original is stashed and retrievable via `POST /v1/retrieve`. | on |
| `code_skeleton` | Outline a source file the agent read back to its signatures/structure, eliding statement bodies. Reversible via `POST /v1/retrieve`. | on |
| `code_graph` | Graph-aware multi-file skeletonization: keep the bodies of the symbols the live question actually touches (plus their graph neighbors), elide the rest. Enable this or `code_skeleton` for multi-file workloads, not both. Reversible via `POST /v1/retrieve`. | off |
| `relevance_filter` | Keep only the lines of a large tool output relevant to the live question (BM25, lexical). Reversible via `POST /v1/retrieve`. | off |
| `window_budget` | Fit the conversation into a context-window token budget by cropping the oldest middle messages, pinning the system prompt and most recent turns. Reversible: cropped originals are stashed and retrievable via `POST /v1/retrieve`. | on |
| `tool_pruning` | Drop tools unlikely to be invoked for this request. | on |
| `param_tuning` | Clamp generation params (e.g. `max_tokens`, `temperature`). | on |
| `vision_ocr` | Swap a pasted text-only screenshot for its OCR'd text (local Tesseract, confidence-gated). Reversible via `POST /v1/retrieve`. | off |
| `semantic_cache` | Short-circuit near-duplicate requests from cache. | off |

See the strategy menu and the per-use-case pipelines table.

Targeting rules: different pipelines per team / user / model / endpoint

Beyond the legacy overrides.byEndpoint map, the config supports overrides.rules — an ordered list that lets one org run different strategy pipelines for different teams, users, models, or endpoints. Each rule has a when matcher and an action:

{
  "overrides": {
    "rules": [
      {
        "when": { "teams": ["platform"], "models": ["gpt-4o*"] },
        "disable": ["prompt_compression"]
      },
      {
        "when": { "endpoints": ["/v1/embeddings"] },
        "enable": ["semantic_cache"],
        "params": { "semantic_cache": { "threshold": 0.95 } }
      }
    ]
  }
}

- `when` matches on `endpoints`, `models`, `users`, `teams`, and `metadata`, each a list or glob; identity comes from the `x-anyray-metadata` header. A rule with multiple matchers applies only when all of them match.
- `disable` / `enable` turn strategies off/on for matching requests, and `params` overrides their params. `disable` wins over `enable` when both touch the same strategy.
- Rules are evaluated in order, layered on top of the base `strategies[]` pipeline.
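The rule semantics above can be sketched as a small resolver. This is an illustrative Python sketch of one reading of those semantics, not Anyray's implementation. It assumes a hypothetical, already-parsed request dict with `endpoint`, `model`, `user`, and `team` fields, treats every matcher as a list of glob patterns, omits the `metadata` matcher, and assumes `disable` wins across all matching rules rather than only within a single rule.

```python
from fnmatch import fnmatchcase

# Matcher key in `when` -> field on a hypothetical, already-parsed request dict.
MATCHER_FIELDS = {"endpoints": "endpoint", "models": "model",
                  "users": "user", "teams": "team"}

def rule_matches(when, request):
    """True only if every matcher listed in `when` matches the request."""
    for key, patterns in when.items():
        field = MATCHER_FIELDS.get(key)
        value = request.get(field) if field else None
        # Unknown matcher keys and missing request fields count as "no match".
        if value is None or not any(fnmatchcase(value, p) for p in patterns):
            return False
    return True

def resolve_pipeline(config, request):
    """Layer matching rules, in order, over the base strategies[] array."""
    enable, disable, params = set(), set(), {}
    for rule in config.get("overrides", {}).get("rules", []):
        if not rule_matches(rule.get("when", {}), request):
            continue
        enable.update(rule.get("enable", []))
        disable.update(rule.get("disable", []))
        for kind, override in rule.get("params", {}).items():
            params.setdefault(kind, {}).update(override)  # later rules win
    resolved = []
    for entry in config["strategies"]:
        kind = entry["kind"]
        # Assumption: disable beats enable regardless of which rule set it.
        enabled = (entry["enabled"] or kind in enable) and kind not in disable
        resolved.append({"kind": kind, "enabled": enabled,
                         "params": {**entry["params"], **params.get(kind, {})}})
    return resolved

config = {
    "strategies": [
        {"kind": "semantic_cache", "enabled": False, "params": {}},
        {"kind": "prompt_compression", "enabled": True, "params": {}},
    ],
    "overrides": {"rules": [
        {"when": {"teams": ["platform"], "models": ["gpt-4o*"]},
         "disable": ["prompt_compression"]},
        {"when": {"endpoints": ["/v1/embeddings"]},
         "enable": ["semantic_cache"],
         "params": {"semantic_cache": {"threshold": 0.95}}},
    ]},
}

def enabled_kinds(pipeline):
    return [s["kind"] for s in pipeline if s["enabled"]]

chat = {"endpoint": "/v1/chat/completions", "model": "gpt-4o-mini", "team": "platform"}
print(enabled_kinds(resolve_pipeline(config, chat)))  # []
```

Note that the first rule applies to the `platform` team only for models matching `gpt-4o*`; a `platform` request for any other model keeps `prompt_compression` on, because both matchers must match.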

Configure them from the console Optimizer page or via PUT /admin/optimizer/settings — runtime-mutable and audit-logged, like the rest of the optimizer config.

Editing the config

There are two equivalent ways to change optimizer.config.json, both gated by your single ANYRAY_ADMIN_TOKEN:

- Console Settings page: toggle, reorder, and parametrize strategies in the UI.
- Admin API: `GET /admin/optimizer/settings` returns the current config plus capabilities (the registry kinds and their default params); `PUT /admin/optimizer/settings` writes a new config.

The config file path is set by ANYRAY_OPTIMIZER_CONFIG (a baked default seed ships with the image). Every change, whether from the console or the API, is audit-logged.
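As a rough illustration of the admin API path, the sketch below builds and sends the two requests with Python's standard library. Only the endpoint path and the `ANYRAY_ADMIN_TOKEN` name come from this page. The base URL, the `ANYRAY_BASE_URL` variable, and the `Authorization: Bearer` scheme are assumptions, so check your deployment for the actual host and auth header.

```python
import json
import os
import urllib.request

BASE_URL = os.environ.get("ANYRAY_BASE_URL", "http://localhost:8080")  # hypothetical
ADMIN_TOKEN = os.environ.get("ANYRAY_ADMIN_TOKEN", "")
SETTINGS_PATH = "/admin/optimizer/settings"

def build_request(method, body=None):
    """Build (but do not send) an admin request for the optimizer settings."""
    headers = {"Authorization": f"Bearer {ADMIN_TOKEN}"}  # assumed auth scheme
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + SETTINGS_PATH, data=data,
                                  headers=headers, method=method)

def send(request, timeout_s=10):
    """Send a request and return (HTTP status, decoded body text)."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status, response.read().decode("utf-8")

def get_settings():
    """Fetch the current settings document as parsed JSON."""
    _, text = send(build_request("GET"))
    return json.loads(text)

def put_settings(config):
    """Write a new config; returns the HTTP status code."""
    status, _ = send(build_request("PUT", config))
    return status
```

A typical flow would be to call `get_settings()`, edit the returned config, and pass the edited config to `put_settings()`. The exact shape of the GET response (where the config sits relative to the capabilities) is not specified here, so inspect it before writing anything back.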

Content-privacy mode

How much request/response content Anyray stores is an org-wide setting. Environment variables supply the default mode, and an admin can switch it at runtime from the console (audit-logged):

| Mode (`ANYRAY_CONTENT_MODE`) | Behavior |
| --- | --- |
| `encrypted` (default) | Content stored encrypted at rest (AES-256-GCM via `ANYRAY_CONTENT_KEY`); humans only ever see ciphertext. |
| `off` | Store no content at all, metadata only. |
| `plaintext` | Store content in the clear. Deploy-gated: only available when `ANYRAY_ALLOW_PLAINTEXT=true`. |

The fail-safe never escalates to plaintext: unless ANYRAY_ALLOW_PLAINTEXT=true is set at deploy time, plaintext is unavailable regardless of runtime toggles. See The data boundary and Security.
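The deploy gate can be pictured as a small resolution function. This Python sketch is an illustration only: the function name is hypothetical, and falling back to `encrypted` when plaintext is requested but not allowed (or when the mode is unrecognized) is an assumption, since this page only states that plaintext is unavailable in that case.

```python
VALID_MODES = {"encrypted", "off", "plaintext"}

def effective_content_mode(requested, allow_plaintext):
    """Resolve the mode actually used for storage.

    requested:        mode from ANYRAY_CONTENT_MODE or a runtime toggle
    allow_plaintext:  True only when ANYRAY_ALLOW_PLAINTEXT=true at deploy time
    """
    if requested not in VALID_MODES:
        return "encrypted"  # assumption: unknown values fall back to the default
    if requested == "plaintext" and not allow_plaintext:
        return "encrypted"  # assumption: the gate falls back, never escalates
    return requested

print(effective_content_mode("plaintext", allow_plaintext=False))  # encrypted
print(effective_content_mode("off", allow_plaintext=False))        # off
```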

Gateway admin configuration

Some configuration belongs to the gateway, not the optimizer — provider credentials, routing, and model pricing. These are also runtime-mutable behind the single ANYRAY_ADMIN_TOKEN, each with its own console page and /admin/* endpoint.

Providers (server-held provider keys)

Provider API keys can be set at runtime from the console Providers page (or GET/PUT /admin/provider-keys) instead of, or in addition to, the ANYRAY_PROVIDER_KEY_* environment variables. The runtime admin path takes precedence over the env vars. Keys are stored server-side and never exposed to clients (callers send only a placeholder); every change is audit-logged by provider slug — never the key value. See Security → server-held provider keys.

Routing (gateway routing config)

Choosing which model/provider a request goes to — routing, fallbacks, and load-balancing — is the gateway's job, not an optimizer strategy. Set the default routing strategy on the console Routing page (or GET/PUT /admin/routing-config):

| Strategy | Behavior |
| --- | --- |
| `single` | Send every request to one target. |
| `loadbalance` | Spread requests across multiple targets. |
| `fallback` | Try targets in order; advance on failure. |
| `conditional` | Pick a target from request conditions. |

Each target carries its own retry / fallback settings. (You can also configure routing on the Anyray gateway directly.)

:::info Roadmap: automated model routing
An automated classifier that downgrades routine requests to a cheaper model and fail-safes uncertain ones to a frontier model, gated by a confidence threshold, is on the roadmap. Today the gateway gives you explicit routing, fallbacks, and load-balancing.
:::

Pricing (model pricing table)

The console Pricing page (or GET/PUT /admin/pricing) holds the admin-editable model pricing table — USD per token per model. These prices drive cost attribution in the content-free spend store: token usage is multiplied by the configured price to produce the costUsd on each spend row. Unpriced models still pass through — the request is served normally, just with no cost attributed. Keep the table current as you adopt new models so spend stays accurate. See Billing.
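The cost attribution described above amounts to one multiplication per spend row. The Python sketch below assumes a single per-token price per model, as stated here; the function name, the pricing dict shape, and the model name are hypothetical. `Decimal` is used so that tiny per-token prices do not pick up floating-point error.

```python
from decimal import Decimal

def cost_usd(model, total_tokens, pricing):
    """Return token usage times the configured price, or None if unpriced."""
    price = pricing.get(model)
    if price is None:
        return None  # unpriced model: request is still served, no cost attributed
    return price * total_tokens

# Hypothetical pricing table: USD per token, keyed by model name.
pricing = {"example-model": Decimal("0.0000025")}

print(cost_usd("example-model", 1200, pricing))   # 0.0030000
print(cost_usd("unpriced-model", 1200, pricing))  # None
```

Returning `None` rather than zero keeps "no price configured" distinguishable from "cost was genuinely zero" when you audit spend rows.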

Adaptive optimization (roadmap)

Instead of hand-tuning the pipeline, an admin could opt into adaptive optimization — Anyray learning from your org's own traffic and improving its strategies on the fly, with a quality gate and automatic rollback.

:::warning Status: roadmap
The closed-loop learner and its opt-in switch are planned, not yet implemented. See Adaptive optimization for the full design and its safety/data-boundary guarantees.
:::

The fail-safe is not configurable away

However you compose the pipeline, strategies fail open: any optimizer error or timeout (the gateway uses an 800 ms hard timeout) lets the request pass through unchanged to the provider. You can make Anyray more conservative; you cannot make it block a request on optimizer failure. See The optimizer.
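The fail-open behavior can be sketched as a wrapper that runs the optimizer under a deadline and returns the original request on any error or timeout. This is an illustrative Python sketch, not the gateway's code: the function names and the thread-pool approach are assumptions, and only the 800 ms figure comes from this page.

```python
import time
from concurrent.futures import ThreadPoolExecutor

OPTIMIZER_TIMEOUT_S = 0.8  # the 800 ms hard timeout mentioned above
_executor = ThreadPoolExecutor(max_workers=4)

def optimize_fail_open(request, optimize, timeout_s=OPTIMIZER_TIMEOUT_S):
    """Run `optimize(request)`; on any error or timeout, return `request` unchanged."""
    future = _executor.submit(optimize, request)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers both optimizer exceptions and the timeout. cancel() only
        # prevents a not-yet-started task; a running one finishes in the
        # background and its result is discarded.
        future.cancel()
        return request

def add_marker(request):
    return {**request, "optimized": True}

def broken(request):
    raise RuntimeError("optimizer bug")

def slow(request):
    time.sleep(0.3)
    return {**request, "optimized": True}

original = {"model": "example-model"}
print(optimize_fail_open(original, add_marker))            # optimized copy
print(optimize_fail_open(original, broken))                # original, unchanged
print(optimize_fail_open(original, slow, timeout_s=0.05))  # original, unchanged
```

The wrapper has no branch that rejects the request, which mirrors the guarantee above: configuration can make the pipeline more conservative, but an optimizer failure only ever results in a pass-through.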

Validate before rollout

Change the config from the console (or PUT /admin/optimizer/settings), then send representative requests for the use-case and check the per-strategy savings before widening traffic. Because the config is runtime-mutable, you can adjust and re-test without redeploying.
