The AdaptiveLLM Gateway
Routes every request. Learns from every response.
Know when a provider silently ships a regression. Cut model spend at equal quality — automatically. Answer "why did our bill double?" in one screen, not a grep. Built for teams shipping AI-powered products who've outgrown raw API access.
Catch regressions
before your users do
Replay bank + judge re-eval flags quality drops the moment they appear.
Cut spend at equal quality
automatically
Nightly cost migration with rollback. Judge-score gated — no quality cliff.
Answer "why did our bill double?"
in one screen
Per-user and per-token attribution. Month-to-date spend, forecast, and anomaly detection. CSV export.
Unified access to leading providers
The gateway, the ops platform, in one.
Adaptive routing is the foundation. Regression detection, auto cost migration, and spend intelligence are what you actually buy it for.
Catch what you'd otherwise miss
Most LLM regressions are invisible until a customer complains. Not here.
Silent-regression detection
A replay bank of your top historical prompts re-runs against current models on a schedule. When quality drops, you know before your users do.
Quality-adjusted spend
Every cost row carries the judge-score envelope — median, p25, p75, and cost per quality point. Know which dollar is buying a good answer.
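As a rough illustration of what such an envelope could contain, here is a minimal sketch. The `ScoredRequest` shape, the field names, and the definition of cost per quality point (total cost divided by total judge score) are assumptions made for this example, not Provara's actual schema or formula.

```typescript
// Hypothetical shape of one scored request; field names are assumptions.
interface ScoredRequest {
  costUsd: number;
  judgeScore: number; // assumed to be a positive numeric score
}

// Percentile with linear interpolation over a sorted copy of the values.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Summarize a group of scored requests as a quality envelope.
function qualityEnvelope(rows: ScoredRequest[]) {
  const scores = rows.map((r) => r.judgeScore);
  const totalCost = rows.reduce((sum, r) => sum + r.costUsd, 0);
  const totalScore = scores.reduce((sum, s) => sum + s, 0);
  return {
    median: percentile(scores, 50),
    p25: percentile(scores, 25),
    p75: percentile(scores, 75),
    // Assumed definition: dollars spent per point of judge score earned.
    costPerQualityPoint: totalCost / totalScore,
  };
}
```

Reading the spread (p25 to p75) next to the median shows whether a cheap model is consistently good or only good on average.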
Weight-drift × spend
Changed your routing weights last week? See the exact spend mix in the attribution window after each change. Unique to Provara.
Cut spend without cutting quality
Automated, auditable, rollback-able — not a black box that just picks cheaper.
Auto cost migration
A nightly job migrates routing cells to cheaper models when quality parity holds. Reports savings in dollars. One click to roll back.
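One way to picture a parity-gated migration is the sketch below. The tolerance, the minimum sample count, and all type and function names are hypothetical values chosen for illustration; the real job's thresholds and logic are not described on this page.

```typescript
// Hypothetical per-model stats for one routing cell; names are assumptions.
interface CellModel {
  model: string;
  costPerRequestUsd: number;
  medianJudgeScore: number;
  samples: number;
}

const PARITY_TOLERANCE = 0.1; // assumed: max allowed drop in median judge score
const MIN_CANDIDATE_SAMPLES = 20; // assumed: evidence required before migrating

// Returns a migration proposal, or null when no cheaper model holds parity.
function proposeMigration(
  current: CellModel,
  candidates: CellModel[],
  monthlyRequests: number,
) {
  const viable = candidates.filter(
    (c) =>
      c.costPerRequestUsd < current.costPerRequestUsd &&
      c.samples >= MIN_CANDIDATE_SAMPLES &&
      c.medianJudgeScore >= current.medianJudgeScore - PARITY_TOLERANCE,
  );
  if (viable.length === 0) return null;
  // Among models that hold parity, take the cheapest.
  const target = viable.reduce((a, b) =>
    b.costPerRequestUsd < a.costPerRequestUsd ? b : a,
  );
  return {
    from: current.model, // kept so the change can be rolled back
    to: target.model,
    projectedMonthlySavingsUsd:
      (current.costPerRequestUsd - target.costPerRequestUsd) * monthlyRequests,
  };
}
```

Keeping the `from` model in the proposal is what makes a one-click rollback cheap: reverting only means restoring the previous assignment.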
Savings recommendations
Per-cell "switch from X to Y" recommendations ranked by projected monthly savings, using real judge scores to guarantee quality parity.
Budgets with hard-stop
Monthly or quarterly caps with threshold alerts at 50/75/90/100%. Optional hard-stop returns HTTP 402 the moment a tenant hits the cap.
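The threshold and hard-stop behavior could look roughly like the sketch below. Only the 50/75/90/100% thresholds and the HTTP 402 status come from the description above; the function name, its parameters, and the return shape are assumptions for illustration.

```typescript
// Alert thresholds as percentages of the cap, from the feature description.
const ALERT_THRESHOLDS = [50, 75, 90, 100];

interface BudgetDecision {
  crossed: number[]; // thresholds newly crossed by this spend update
  status: 200 | 402; // 402 only when hard-stop is on and the cap is reached
}

// Hypothetical check run whenever a tenant's period spend changes.
function evaluateBudget(
  prevSpendUsd: number,
  newSpendUsd: number,
  capUsd: number,
  hardStop: boolean,
): BudgetDecision {
  const crossed = ALERT_THRESHOLDS.filter((t) => {
    const limit = (capUsd * t) / 100;
    // Alert once, on the update that moves spend across the threshold.
    return prevSpendUsd < limit && newSpendUsd >= limit;
  });
  const status = hardStop && newSpendUsd >= capUsd ? 402 : 200;
  return { crossed, status };
}
```

For example, a tenant with a $100 cap moving from $40 to $80 of spend would cross the 50% and 75% thresholds in one update and still be served.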
Stay on your terms
Self-host it or use the Cloud — same code, same features.
Self-host with Docker
One compose file, zero telemetry, BSL-licensed source. Prompts, keys, and scores never leave your infrastructure.
OpenAI-compatible API
Change the base URL in your existing code — any SDK that speaks /v1/chat/completions Just Works. OpenAI SDK, LangChain, LlamaIndex, all of it.
Audit log + compliance
Tenant-scoped audit trail with tier-gated retention (90d / 365d / 730d). CSV export, SIEM-pull API, SOC 2-shaped event vocabulary.
How it learns
Smarter with every request.
Every request is classified into a (task, complexity) cell. Explicit user ratings and the built-in LLM judge feed a per-cell quality EMA for every model you route to. Over time, models that actually perform on your traffic earn more of it — automatically.
- Weighted learning. User feedback nudges scores harder than automated judge scores — your signal always wins.
- Persistent across restarts. EMA scores live in SQLite, not memory. Weeks of signal survive every deploy.
- Sample-gated. A model needs real evidence before the router picks it on quality. Under-sampled cells fall back to cost-cheapest.
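The learning loop described above can be sketched in a few lines. The smoothing factors, the sample threshold, and every name below are assumptions chosen for illustration; the actual router may weight and select differently (for example, probabilistically rather than always picking the top score).

```typescript
type FeedbackSource = "user" | "judge";

// Assumed smoothing factors: user ratings move the EMA more than judge scores.
const ALPHA: Record<FeedbackSource, number> = { user: 0.3, judge: 0.1 };
const MIN_SAMPLES = 5; // assumed evidence threshold for quality-based routing

// Hypothetical per-model state within one (task, complexity) cell.
interface ModelStats {
  model: string;
  costPerMTok: number;
  ema: number | null; // null until the first score arrives
  samples: number;
}

// Exponential moving average update: new = alpha * score + (1 - alpha) * old.
function updateEma(stats: ModelStats, score: number, source: FeedbackSource): ModelStats {
  const alpha = ALPHA[source];
  const ema = stats.ema === null ? score : alpha * score + (1 - alpha) * stats.ema;
  return { ...stats, ema, samples: stats.samples + 1 };
}

// Pick the best-scoring model with enough samples; otherwise the cheapest.
function pickModel(cell: ModelStats[]): ModelStats {
  if (cell.length === 0) throw new Error("empty cell");
  const qualified = cell.filter((m) => m.ema !== null && m.samples >= MIN_SAMPLES);
  if (qualified.length === 0) {
    return cell.reduce((a, b) => (b.costPerMTok < a.costPerMTok ? b : a));
  }
  return qualified.reduce((a, b) => ((b.ema as number) > (a.ema as number) ? b : a));
}
```

Because each update only needs the previous EMA and the sample count, the whole state for a cell fits in one small row per model, which is what makes persisting it across restarts straightforward.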
Color = quality score · updates on every scored request
Get started in minutes
Three steps to an adaptive LLM gateway.
Sign up or self-host
Create an account with Google or GitHub, or deploy with Docker.
$ docker compose up -d
Add your API keys
Connect any provider through the dashboard. Keys are encrypted at rest.
Route requests
Point your app at Provara. Drop-in OpenAI SDK compatible.
baseURL: "https://provara/v1"
Drop-in compatible
Provara exposes an OpenAI-compatible API. Change two lines in your existing code — the base URL and the API key — and you're routing through Provara.
Works with the OpenAI SDK, LangChain, LlamaIndex, and any tool that speaks the OpenAI chat completions format.
import OpenAI from "openai";

// Point the standard OpenAI client at your Provara deployment.
const client = new OpenAI({
  baseURL: "https://your-provara.example/v1",
  apiKey: "your-provara-token",
});

const response = await client.chat.completions.create({
  model: "gpt-4o", // or any model from any provider
  messages: [{ role: "user", content: "Hello!" }],
});
Ready to stop flying blind?
Catch regressions before your users do. Cut spend at equal quality. See who burned your API budget. Self-host for free or use the Cloud.