BSL-licensed · Self-host or Cloud · OpenAI-compatible

The Adaptive LLM Gateway

Routes every request. Learns from every response.

Know when a provider silently ships a regression. Cut model spend at equal quality — automatically. Answer "why did our bill double?" in one screen, not a grep. Built for teams shipping AI-powered products who've outgrown raw API access.

Catch regressions

before your users do

Replay bank + judge re-eval flags quality drops the moment they appear.

Cut spend at equal quality

automatically

Nightly cost migration with rollback. Judge-score gated — no quality cliff.

Answer "why did our bill double?"

in one screen

Per-user and per-token attribution. Month-to-date spend, forecast, and anomaly detection. CSV export.

Unified access to leading providers

OpenAI

Anthropic

Google

Mistral

xAI

Ollama

Groq

Together AI

The gateway and the ops platform, in one.

Adaptive routing is the foundation. Regression detection, auto cost migration, and spend intelligence are what you actually buy it for.

Catch what you'd otherwise miss

Most LLM regressions are invisible until a customer complains. Not here.

Silent-regression detection

A replay bank of your top historical prompts re-runs against current models on a schedule. When quality drops, you know before your users do.
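One way to picture the replay check is a comparison of stored judge scores against fresh ones. The sketch below is illustrative only: `ReplayResult`, `detectRegression`, and the 0.3-point threshold are assumptions made for this example, not Provara's actual implementation.

```typescript
// Hypothetical record: one replayed prompt with its stored and fresh judge scores.
interface ReplayResult {
  promptId: string;
  baselineScore: number; // judge score recorded when the prompt entered the bank
  currentScore: number; // judge score from the latest scheduled replay
}

interface RegressionReport {
  regressed: boolean;
  meanDrop: number;
  worstPrompts: string[];
}

// Flags a regression when the average score drop exceeds maxMeanDrop
// (an assumed threshold chosen for illustration).
function detectRegression(
  results: ReplayResult[],
  maxMeanDrop = 0.3,
): RegressionReport {
  if (results.length === 0) {
    return { regressed: false, meanDrop: 0, worstPrompts: [] };
  }
  const drops = results.map((r) => r.baselineScore - r.currentScore);
  const meanDrop = drops.reduce((sum, d) => sum + d, 0) / drops.length;
  // Prompts whose individual drop exceeds the same threshold.
  const worstPrompts = results
    .filter((r) => r.baselineScore - r.currentScore > maxMeanDrop)
    .map((r) => r.promptId);
  return { regressed: meanDrop > maxMeanDrop, meanDrop, worstPrompts };
}
```

A scheduler would call something like this after each replay run and alert when `regressed` is true.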

Quality-adjusted spend

Every cost row carries the judge-score envelope — median, p25, p75, and cost per quality point. Know which dollar is buying a good answer.
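As a rough illustration of the envelope, the sketch below computes median, p25, p75, and cost per quality point for a batch of scored requests. The field names, the linear-interpolation percentile, and the definition of cost per quality point as total cost divided by total judge score are all assumptions made for this example.

```typescript
// Hypothetical shape of one scored request.
interface ScoredRequest {
  costUsd: number;
  judgeScore: number; // must be positive for the cost ratio to be meaningful
}

interface QualityEnvelope {
  median: number;
  p25: number;
  p75: number;
  costPerQualityPoint: number;
}

// Linear-interpolated percentile over an ascending-sorted array; p is in [0, 1].
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no scores");
  const rank = (sorted.length - 1) * p;
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

function qualityEnvelope(rows: ScoredRequest[]): QualityEnvelope {
  const scores = rows.map((r) => r.judgeScore).sort((a, b) => a - b);
  const totalCost = rows.reduce((sum, r) => sum + r.costUsd, 0);
  const totalScore = scores.reduce((sum, s) => sum + s, 0);
  return {
    median: percentile(scores, 0.5),
    p25: percentile(scores, 0.25),
    p75: percentile(scores, 0.75),
    // Assumed definition: dollars spent per judge point earned.
    costPerQualityPoint: totalCost / totalScore,
  };
}
```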

Weight-drift × spend

Changed your routing weights last week? See the exact spend mix in the attribution window after each change. Unique to Provara.

Cut spend without cutting quality

Automated, auditable, rollback-able — not a black box that just picks cheaper.

Auto cost migration

A nightly job migrates routing cells to cheaper models when quality parity holds. Reports savings in dollars. One click to roll back.
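The quality-parity gate can be sketched as a small decision function. Everything here is hypothetical: the `ModelStats` fields, the 0.1-point tolerance, and the 50-sample minimum are assumptions chosen to illustrate the idea, not values taken from Provara.

```typescript
// Hypothetical per-cell statistics for one model.
interface ModelStats {
  model: string;
  medianJudgeScore: number;
  costPerRequestUsd: number;
  samples: number;
}

interface MigrationDecision {
  migrate: boolean;
  projectedMonthlySavingsUsd: number;
  reason: string;
}

// Migrate only if the candidate is cheaper, has enough judge samples,
// and scores within `tolerance` of the current model.
function decideMigration(
  current: ModelStats,
  candidate: ModelStats,
  monthlyRequests: number,
  tolerance = 0.1,
  minSamples = 50,
): MigrationDecision {
  const savings =
    (current.costPerRequestUsd - candidate.costPerRequestUsd) * monthlyRequests;
  const refuse = (reason: string): MigrationDecision => ({
    migrate: false,
    projectedMonthlySavingsUsd: 0,
    reason,
  });
  if (savings <= 0) return refuse("candidate is not cheaper");
  if (candidate.samples < minSamples) return refuse("not enough judge samples");
  if (candidate.medianJudgeScore < current.medianJudgeScore - tolerance) {
    return refuse("quality below parity");
  }
  return {
    migrate: true,
    projectedMonthlySavingsUsd: savings,
    reason: "quality parity holds",
  };
}
```

Keeping the previous model name alongside each decision is what would make a one-click rollback straightforward.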

Savings recommendations

Per-cell "switch from X to Y" recommendations ranked by projected monthly savings, using real judge scores to guarantee quality parity.

Budgets with hard-stop

Monthly or quarterly caps with threshold alerts at 50/75/90/100%. Optional hard-stop returns HTTP 402 the moment a tenant hits the cap.
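The threshold and hard-stop behavior might look roughly like the sketch below. The function name, the result shape, and the choice to reject only once spend has already reached the cap are assumptions for illustration; only the 50/75/90/100% thresholds and the HTTP 402 status come from the description above.

```typescript
// Alert thresholds as percentages of the cap.
const ALERT_THRESHOLDS_PCT = [50, 75, 90, 100];

interface BudgetResult {
  httpStatus: 200 | 402;
  alerts: number[]; // thresholds newly crossed by this charge
  spentUsd: number; // spend after the charge (unchanged when blocked)
}

function applyCharge(
  capUsd: number,
  spentUsd: number,
  chargeUsd: number,
  hardStop: boolean,
): BudgetResult {
  // With hard-stop enabled, reject once the tenant has reached the cap.
  if (hardStop && spentUsd >= capUsd) {
    return { httpStatus: 402, alerts: [], spentUsd };
  }
  const after = spentUsd + chargeUsd;
  // A threshold fires only when this charge moves spend across its line.
  const alerts = ALERT_THRESHOLDS_PCT.filter((pct) => {
    const line = (capUsd * pct) / 100;
    return spentUsd < line && after >= line;
  });
  return { httpStatus: 200, alerts, spentUsd: after };
}
```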

Stay on your terms

Self-host it or use the Cloud — same code, same features.

Self-host with Docker

One compose file, zero telemetry, BSL-licensed source. Prompts, keys, and scores never leave your infrastructure.

OpenAI-compatible API

Change the base URL in your existing code — any SDK that speaks /v1/chat/completions Just Works. OpenAI SDK, LangChain, LlamaIndex, all of it.

Audit log + compliance

Tenant-scoped audit trail with tier-gated retention (90d / 365d / 730d). CSV export, SIEM-pull API, SOC 2-shaped event vocabulary.

How it learns

Smarter with every request.

Every request is classified into a (task, complexity) cell. Explicit user ratings and the built-in LLM judge feed a per-cell quality EMA for every model you route to. Over time, models that actually perform on your traffic earn more of it — automatically.

Weighted learning. User feedback moves a model's score more than an automated judge score does, so your own signal carries the most weight.
Persistent across restarts. EMA scores live in SQLite, not memory. Weeks of signal survive every deploy.
Sample-gated. A model needs real evidence before the router picks it on quality. Under-sampled cells fall back to cost-cheapest.
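The three properties above can be sketched as an EMA update plus a gated selection rule. This is a minimal illustration under stated assumptions: the smoothing factors (0.3 for user ratings, 0.1 for judge scores), the 20-sample gate, and the deterministic "highest EMA wins" choice are invented for the example and are not Provara's actual parameters or routing policy.

```typescript
type SignalSource = "user" | "judge";

// Hypothetical per-model state within one (task, complexity) cell.
interface ModelState {
  ema: number;
  samples: number;
  costPerRequestUsd: number;
}

// Assumed smoothing factors: user feedback moves the EMA harder than the judge.
const ALPHA: Record<SignalSource, number> = { user: 0.3, judge: 0.1 };
// Assumed minimum evidence before a model can be picked on quality.
const MIN_SAMPLES = 20;

function recordScore(
  state: ModelState,
  score: number,
  source: SignalSource,
): ModelState {
  // The first score seeds the EMA; later scores blend in by ALPHA[source].
  const ema =
    state.samples === 0
      ? score
      : state.ema + ALPHA[source] * (score - state.ema);
  return { ...state, ema, samples: state.samples + 1 };
}

function pickModel(cell: Record<string, ModelState>): string {
  const names = Object.keys(cell);
  if (names.length === 0) throw new Error("cell has no models");
  const qualified = names.filter((n) => cell[n].samples >= MIN_SAMPLES);
  if (qualified.length === 0) {
    // Under-sampled cell: fall back to the cheapest model.
    return names.reduce((best, n) =>
      cell[n].costPerRequestUsd < cell[best].costPerRequestUsd ? n : best,
    );
  }
  // Otherwise pick the highest quality EMA among sufficiently sampled models.
  return qualified.reduce((best, n) => (cell[n].ema > cell[best].ema ? n : best));
}
```

Persisting each `ModelState` row to a database after `recordScore` is what would let the scores survive a restart.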

Adaptive Routing (live)

[Live routing grid: task types (coding, creative, general) by complexity (simple, medium, complex). Cells list models with their quality scores, for example "nano · 4.7", "haiku · 4.8", "sonnet · 4.9", "opus · 4.9".]

Color = quality score · updates on every scored request

Get started in minutes

Three steps to an adaptive LLM gateway.

Sign up or self-host

Create an account with Google or GitHub, or deploy with Docker.

$ docker compose up -d

Add your API keys

Connect any provider through the dashboard. Keys are encrypted at rest.

OpenAI, Anthropic, Google, Mistral, +4 more

Route requests

Point your app at Provara. Drop-in OpenAI SDK compatible.

baseURL: "https://provara/v1"

Drop-in compatible

Provara exposes an OpenAI-compatible API. Change two lines in your existing code — the base URL and the API key — and you're routing through Provara.

Works with the OpenAI SDK, LangChain, LlamaIndex, and any tool that speaks the OpenAI chat completions format.

app.ts

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://your-provara.example/v1",
  apiKey: "your-provara-token",
});

const response = await client.chat.completions.create({
  model: "gpt-4o", // or any model from any provider
  messages: [{ role: "user", content: "Hello!" }],
});

Ready to stop flying blind?

Catch regressions before your users do. Cut spend at equal quality. See who burned your API budget. Self-host for free or use the Cloud.
