Configure
Configuring Anyray is mostly choosing which optimization strategies to run, in what order, with what parameters — tuned to your use-case. It is not just setting one model-router threshold.
Optimization is configured in optimizer.config.json: an ordered list of strategies,
each enabled or disabled and carrying its own params. The config is runtime-mutable —
you edit it live from the console Settings page or via the admin API, and every change
is audit-logged. You do not restart the stack to change strategies.
Pick a pipeline for your use-case
The config is an ordered strategies[] array. Array order is execution order, and each
entry is { kind, enabled, params }:
{
"strategies": [
{ "kind": "semantic_cache", "enabled": false, "params": {} },
{ "kind": "vision_ocr", "enabled": false, "params": {} },
{ "kind": "prompt_compression", "enabled": true, "params": {} },
{ "kind": "context_compression", "enabled": true, "params": {} },
{ "kind": "code_skeleton", "enabled": true, "params": {} },
{ "kind": "code_graph", "enabled": false, "params": {} },
{ "kind": "relevance_filter", "enabled": false, "params": {} },
{ "kind": "window_budget", "enabled": true, "params": { "maxTokens": 24000 } },
{ "kind": "tool_pruning", "enabled": true, "params": {} },
{ "kind": "param_tuning", "enabled": true, "params": { "maxTokensCap": 4096 } }
],
"overrides": {
"byEndpoint": {
"/v1/chat/completions": { "strategies": [ /* per-endpoint overrides */ ] }
},
"rules": [ /* ordered targeting rules — see below */ ]
}
}
To change the pipeline you reorder, enable/disable, or re-parametrize entries — there are no separate "enabled list" and "order list" to keep in sync; the array is both.
The registered strategy kinds (the whole registry) are:
kind | What it does | Default |
|---|---|---|
prompt_compression | Shorten / restructure the prompt and system message. | on |
context_compression | Shrink bulky tool/function output the agent reads back (minify JSON, collapse whitespace, cap oversized blobs). Reversible: the original is stashed and retrievable via POST /v1/retrieve. | on |
code_skeleton | Outline a source file the agent read back to its signatures/structure, eliding statement bodies. Reversible via POST /v1/retrieve. | on |
code_graph | Graph-aware multi-file skeletonization: keep the bodies of the symbols the live question actually touches (plus their graph neighbors), elide the rest. Enable this or code_skeleton for multi-file workloads, not both. Reversible via POST /v1/retrieve. | off |
relevance_filter | Keep only the lines of a large tool output relevant to the live question (BM25, lexical). Reversible via POST /v1/retrieve. | off |
window_budget | Fit the conversation into a context-window token budget by cropping the oldest middle messages, pinning the system prompt and most recent turns. Reversible: cropped originals are stashed and retrievable via POST /v1/retrieve. | on |
tool_pruning | Drop tools unlikely to be invoked for this request. | on |
param_tuning | Clamp generation params (e.g. max_tokens, temperature). | on |
vision_ocr | Swap a pasted text-only screenshot for its OCR'd text (local Tesseract, confidence-gated). Reversible via POST /v1/retrieve. | off |
semantic_cache | Short-circuit near-duplicate requests from cache. | off |
See the strategy menu and the per-use-case pipelines table.
Targeting rules: different pipelines per team / user / model / endpoint
Beyond the legacy overrides.byEndpoint map, the config supports overrides.rules — an
ordered list that lets one org run different strategy pipelines for different teams,
users, models, or endpoints. Each rule has a when matcher and an action:
{
"overrides": {
"rules": [
{
"when": { "teams": ["platform"], "models": ["gpt-4o*"] },
"disable": ["prompt_compression"]
},
{
"when": { "endpoints": ["/v1/embeddings"] },
"enable": ["semantic_cache"],
"params": { "semantic_cache": { "threshold": 0.95 } }
}
]
}
}
whenmatches onendpoints,models,users,teams, andmetadata— each a list or glob; identity comes from thex-anyray-metadataheader. A rule with multiple matchers applies only when all of them match.disable/enableturn strategies off/on for matching requests, andparamsoverrides their params.disablewins overenablewhen both touch the same strategy.- Rules are evaluated in order, layered on top of the base
strategies[]pipeline.
Configure them from the console Optimizer page or via
PUT /admin/optimizer/settings — runtime-mutable and audit-logged, like the rest of the
optimizer config.
Editing the config
There are two equivalent ways to change optimizer.config.json, both gated by your single
ANYRAY_ADMIN_TOKEN:
- Console Settings page — toggle, reorder, and parametrize strategies in the UI.
- Admin API —
GET /admin/optimizer/settingsreturns the current config pluscapabilities(the registry kinds and their default params);PUT /admin/optimizer/settingswrites a new config.
The config file path is ANYRAY_OPTIMIZER_CONFIG (a baked default seed ships with the
image). Every change — from the console or the API — is audit-logged.
Content-privacy mode
How much request/response content Anyray stores is an org-wide setting, configured here and togglable at runtime from the console (audit-logged). It is driven by environment defaults and switchable live:
Mode (ANYRAY_CONTENT_MODE) | Behavior |
|---|---|
encrypted (default) | Content stored encrypted at rest (AES-256-GCM via ANYRAY_CONTENT_KEY); humans only ever see ciphertext. |
off | Store no content at all — metadata only. |
plaintext | Store content in the clear. Deploy-gated: only available when ANYRAY_ALLOW_PLAINTEXT=true. |
The fail-safe never degrades up to plaintext: if ANYRAY_ALLOW_PLAINTEXT is not set,
plaintext is unavailable regardless of runtime toggles. See
The data boundary and Security.
Gateway admin configuration
Some configuration belongs to the gateway, not the optimizer — provider credentials,
routing, and model pricing. These are also runtime-mutable behind the single
ANYRAY_ADMIN_TOKEN, each with its own console page and /admin/* endpoint.
Providers (server-held provider keys)
Provider API keys can be set at runtime from the console Providers page (or
GET/PUT /admin/provider-keys) instead of, or in addition to, the ANYRAY_PROVIDER_KEY_*
environment variables. The runtime admin path takes precedence over the env vars. Keys
are stored server-side and never exposed to clients (callers send only a placeholder); every
change is audit-logged by provider slug — never the key value. See
Security → server-held provider keys.
Routing (gateway routing config)
Choosing which model/provider a request goes to — routing, fallbacks, and
load-balancing — is the gateway's job, not an optimizer strategy. Set the default
routing strategy on the console Routing page (or GET/PUT /admin/routing-config):
| Strategy | Behavior |
|---|---|
single | Send every request to one target. |
loadbalance | Spread requests across multiple targets. |
fallback | Try targets in order; advance on failure. |
conditional | Pick a target from request conditions. |
Each target carries its own retry / fallback settings. (You can also configure routing on the Anyray gateway directly.)
:::info Roadmap: automated model routing An automated classifier that downgrades routine requests to a cheaper model and fail-safes uncertain ones to a frontier model — with a confidence threshold — is roadmap. The gateway today gives you explicit routing, fallbacks, and load-balancing; the classify-and-downgrade version is planned. :::
Pricing (model pricing table)
The console Pricing page (or GET/PUT /admin/pricing) holds the admin-editable model
pricing table — USD per token per model. These prices drive cost attribution in the
content-free spend store: token usage is multiplied by the configured price to produce the
costUsd on each spend row. Unpriced models still pass through — the request is served
normally, just with no cost attributed. Keep the table current as you adopt new models so
spend stays accurate. See Billing.
Adaptive optimization (roadmap)
Instead of hand-tuning the pipeline, an admin could opt into adaptive optimization — Anyray learning from your org's own traffic and improving its strategies on the fly, with a quality gate and automatic rollback.
:::warning Status: roadmap The closed-loop learner and its opt-in switch are planned, not yet implemented. See Adaptive optimization for the full design and its safety/data-boundary guarantees. :::
The fail-safe is not configurable away
However you compose the pipeline, strategies fail open: any optimizer error or timeout (the gateway uses an 800 ms hard timeout) lets the request pass through unchanged to the provider. You can make Anyray more conservative; you cannot make it block a request on optimizer failure. See The optimizer.
Validate before rollout
Change the config from the console (or PUT /admin/optimizer/settings), then send
representative requests for the use-case and check the
per-strategy savings before widening traffic. Because the config is
runtime-mutable, you can adjust and re-test without redeploying.