Developer FAQ

Will my answers get worse?

No. That is the core design constraint. The optimizer hook fails open (800 ms timeout), so if it is down or slow your request is forwarded unchanged. Model routing is the gateway's job; automated cheap/frontier routing is on the roadmap, and when it ships, anything uncertain will go to the requested/frontier model. The worst case is "you paid full price," never "you got a worse answer."
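
As a rough illustration of what "fails open" means, here is a minimal Python sketch. It is not Anyray's implementation: the `optimize` callable and the `optimize_or_passthrough` helper are hypothetical names, and only the 800 ms budget comes from the answer above.

```python
import concurrent.futures

OPTIMIZE_TIMEOUT_S = 0.8  # the 800 ms budget from the answer above

# Shared worker pool so the caller can stop waiting on a slow optimizer call.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def optimize_or_passthrough(request, optimize, timeout_s=OPTIMIZE_TIMEOUT_S):
    """Return optimize(request), or the untouched request on error or timeout.

    `optimize` is a hypothetical callable standing in for the optimizer hook.
    """
    future = _pool.submit(optimize, request)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Timeout, connection failure, or an optimizer bug: fail open.
        # cancel() is best effort; a call that already started keeps
        # running in the background, but the caller no longer waits on it.
        future.cancel()
        return request
```

The point of the pattern is that every failure path returns the original request object, so the caller can always forward something to the requested model.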

Is Anyray just a model router?

No. Model routing is the gateway's job (provider/model selection + fallbacks). The optimizer applies request optimizations: param_tuning, prompt_compression, tool_pruning, and semantic_cache (default off). Your org chooses which strategies run, and in what order, so what happens to your request depends on that configuration.
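
To make "which strategies run, and in what order" concrete, here is a toy Python sketch of a config-driven pipeline. Only the strategy names come from the answer above. The function bodies, the `REGISTRY` mapping, and `run_strategies` are hypothetical stand-ins and do not describe what the real strategies do.

```python
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Strategy = Callable[[Request], Request]


def param_tuning(req: Request) -> Request:
    # Toy stand-in: set a max_tokens cap only when the caller did not set one.
    return {**req, "max_tokens": req.get("max_tokens", 1024)}


def tool_pruning(req: Request) -> Request:
    # Toy stand-in: keep only tools whose name appears in the message text.
    text = " ".join(str(m.get("content", "")) for m in req.get("messages", []))
    kept = [t for t in req.get("tools", []) if t["name"] in text]
    return {**req, "tools": kept}


# Hypothetical registry; the real optimizer's wiring is not documented here.
REGISTRY: Dict[str, Strategy] = {
    "param_tuning": param_tuning,
    "tool_pruning": tool_pruning,
}


def run_strategies(req: Request, enabled: List[str]) -> Request:
    """Apply the enabled strategies in the configured order."""
    for name in enabled:
        req = REGISTRY[name](req)
    return req
```

With an empty `enabled` list the request comes back unchanged, which mirrors the idea that the outcome depends entirely on your org's configuration.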

Do I have to change my code or SDK?

No. You keep your OpenAI / Anthropic SDK and your request shapes. Only the base URL changes — set it yourself, or run anyray-connect to point Claude Code, Cursor, Windsurf, and your shell/SDK env at the gateway in one command.
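
If you set the base URL yourself, one option is environment variables: the official OpenAI and Anthropic Python SDKs read `OPENAI_BASE_URL` and `ANTHROPIC_BASE_URL` respectively. The sketch below is illustrative only. The gateway hostname, the `/v1` suffix on the OpenAI-style URL, the placeholder key value, and the `gateway_env` helper are all assumptions, so check the paths your gateway actually exposes.

```python
import os


def gateway_env(gateway_url, placeholder_key="anyray-placeholder"):
    """Build the env vars that point the SDKs at a gateway (hypothetical helper)."""
    base = gateway_url.rstrip("/")
    return {
        # Assumption: the gateway mirrors the providers' own URL layouts.
        "OPENAI_BASE_URL": base + "/v1",
        "ANTHROPIC_BASE_URL": base,
        # Placeholder only; the real provider key stays server-side.
        "OPENAI_API_KEY": placeholder_key,
        "ANTHROPIC_API_KEY": placeholder_key,
    }


if __name__ == "__main__":
    # Hypothetical gateway address; substitute your org's gateway URL.
    os.environ.update(gateway_env("https://anyray.example.internal"))
    print(os.environ["OPENAI_BASE_URL"])
```

SDK clients created after these variables are set pick up the gateway URL without any change to request code.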

Is there a CLI to point my tools at the gateway?

Yes: anyray-connect. Run npx anyray-connect --gateway <gateway> and it writes each tool's base URL and a placeholder key (your real provider key stays server-side), along with attribution metadata that is content-free. It's idempotent, previews with --dry-run, and undoes itself with --revert. Note: this CLI is the client-side on-ramp; it does not install the stack (that's docker compose, run by your admin).

Does streaming still work? Do tool calls still work?

Yes. Streaming, tool/function calls, and the response shape are the gateway's responsibility and are unchanged. The optimizer only rewrites the request (params/messages/tools), or serves a cache hit.

Will it add latency?

Generally no, and often it lowers latency, because cache hits skip the provider entirely. The one added step is the /v1/optimize call, which is designed to be fast and fails open (800 ms timeout) if the optimizer is slow or down. Spend is recorded in the gateway's own store and never blocks your response.

What happens if the optimizer is down?

Your request passes through unchanged to the requested model. An optimizer outage means "no optimization right now," not "inference is broken."

The model I requested isn't the one that ran — is that a bug?

No. Model routing is the gateway's job; if your org enables automated routing (on the roadmap, not yet shipped), you may see a cheaper capable model that the gateway judged safe. If you need a specific model to always run (e.g. for evals), that's an admin configuration, not a code change on your side.

Does my prompt data leave the company?

No. Anyray is fully self-hosted — nothing leaves your org's environment. Content (prompts/responses) is encrypted at rest by default and never shown in any UI; only content-free metadata is used. See The data boundary.

How do I see what happened to my request?

Traces are stored locally and are metadata-only by default. Ask your admin for access to the Anyray console to see the metadata (model, provider, tokens, cost, latency) for a given request. See Org Admin → Operate.
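
To picture what "metadata-only" implies, here is a small Python sketch that keeps an allowlist of content-free fields and drops everything else. The field names and the `metadata_only` helper are hypothetical, chosen to match the list above (model, provider, tokens, cost, latency). The actual trace schema may differ.

```python
# Hypothetical field names mirroring the metadata listed in the answer above.
METADATA_FIELDS = ("model", "provider", "tokens", "cost", "latency_ms")


def metadata_only(record):
    """Return a copy of a trace record with only allowlisted metadata fields."""
    return {key: record[key] for key in METADATA_FIELDS if key in record}
```

An allowlist, unlike a blocklist, means any new content-bearing field is excluded by default.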