Context Optimizer

Context Optimizer is Provara's optimization and governance layer for retrieved context in RAG and agentic systems. It reduces duplicate, stale, risky, and low-value context before model routing, then records the savings and quality signals for operators.

What Is Shipped

The current implementation covers runtime context optimization, managed collections, connector ingestion, canonical block governance, quality evaluation, retrieval analytics, and dashboard visibility.

Exact and semantic duplicate removal
Lexical or embedding relevance scoring with reranking
Freshness, conflict, risk, and compression controls
Raw-vs-optimized quality evaluation and retrieval analytics
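
To make the first capability concrete, here is a minimal sketch of duplicate removal. It is an illustration, not Provara's implementation: exact duplicates are detected by hashing normalized text, and token-set Jaccard similarity stands in for the embedding similarity a semantic mode would use. The function names, the "reason" labels, and the reuse of 0.72 as a Jaccard threshold are all assumptions.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return re.sub(r"\s+", " ", text.strip().lower())


def token_set(text: str) -> set[str]:
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token overlap in [0, 1]; a stand-in for embedding similarity (assumption)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(chunks: list[dict], threshold: float = 0.72) -> tuple[list[dict], list[dict]]:
    """Return (kept, dropped); the first occurrence of each chunk wins."""
    kept: list[dict] = []
    kept_tokens: list[set[str]] = []
    dropped: list[dict] = []
    seen_hashes: set[str] = set()
    for chunk in chunks:
        digest = hashlib.sha256(normalize(chunk["content"]).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            dropped.append({**chunk, "reason": "exact_duplicate"})  # hypothetical label
            continue
        tokens = token_set(chunk["content"])
        if any(jaccard(tokens, other) >= threshold for other in kept_tokens):
            dropped.append({**chunk, "reason": "near_duplicate"})  # hypothetical label
            continue
        seen_hashes.add(digest)
        kept.append(chunk)
        kept_tokens.append(tokens)
    return kept, dropped
```

A real semantic mode would compare embedding vectors rather than token sets, so a threshold tuned for one measure does not transfer directly to the other.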

Runtime Quickstart

Send already-retrieved chunks to POST /v1/context/optimize. The response returns retained chunks, dropped chunks, risk buckets, conflict details, and metrics.

{
  "dedupeMode": "semantic",
  "semanticThreshold": 0.72,
  "rankMode": "embedding",
  "query": "What is the refund window?",
  "freshnessMode": "metadata",
  "conflictMode": "scored",
  "compressionMode": "extractive",
  "scanRisk": true,
  "chunks": [
    {
      "id": "refunds.md#4",
      "content": "Refunds are available within 30 days.",
      "source": "help-center",
      "metadata": { "updatedAt": "2026-04-01T00:00:00.000Z" }
    }
  ]
}
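
The request above could be sent from Python roughly as follows. This is a sketch under stated assumptions: the gateway base URL, the API key, and the Bearer authorization scheme are placeholders not confirmed by this guide, and the helper name build_optimize_request is hypothetical. Only the path and the payload fields come from the example above.

```python
import json
import urllib.request


def build_optimize_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/context/optimize."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/context/optimize",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer auth. Check the API reference for the real scheme.
            "Authorization": f"Bearer {api_key}",
        },
    )


payload = {
    "dedupeMode": "semantic",
    "semanticThreshold": 0.72,
    "rankMode": "embedding",
    "query": "What is the refund window?",
    "freshnessMode": "metadata",
    "conflictMode": "scored",
    "compressionMode": "extractive",
    "scanRisk": True,
    "chunks": [
        {
            "id": "refunds.md#4",
            "content": "Refunds are available within 30 days.",
            "source": "help-center",
            "metadata": {"updatedAt": "2026-04-01T00:00:00.000Z"},
        }
    ],
}

# Placeholder host and key; replace with your gateway URL and credentials.
request = build_optimize_request("https://gateway.example.com", "YOUR_API_KEY", payload)

# To actually send the request:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it keeps the example runnable without network access and makes the payload easy to inspect before the call.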

Managed Collections

Collections are tenant-scoped containers for reusable context. The dashboard can create the first collection explicitly or auto-create a default collection when the first file upload or external connector source is submitted. Document ingestion stores deterministic blocks with content hashes, token estimates, source metadata, and collection counters.
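
As an illustration of what "deterministic blocks" can mean, the sketch below splits a document on blank lines and attaches a content hash and a token estimate to each block. The splitting rule, the field names, and the four-characters-per-token heuristic are assumptions for the example, not a description of Provara's ingestion pipeline.

```python
import hashlib


def split_into_blocks(text: str, source: str) -> list[dict]:
    """Split text on blank lines; identical input always yields identical blocks."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    blocks = []
    for index, paragraph in enumerate(paragraphs):
        blocks.append(
            {
                "index": index,
                "content": paragraph,
                # The hash depends only on the block text, so re-ingesting
                # unchanged content produces the same value.
                "contentHash": hashlib.sha256(paragraph.encode("utf-8")).hexdigest(),
                # Rough heuristic (assumption): about four characters per token.
                "tokenEstimate": max(1, len(paragraph) // 4),
                "source": source,
            }
        )
    return blocks
```

Because the hash is derived from content alone, it can double as a cheap change detector when the same document is ingested again.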

Deleting a collection removes its sources, documents, raw blocks, canonical blocks, and canonical review events. Connector credentials are tenant-level records and remain available for other collections.

Raw document object storage is optional. When the gateway runs with DOCUMENT_STORAGE_DRIVER=r2, Provara writes raw document text to Cloudflare R2 before committing searchable rows.

Connectors

Connector sources start as tenant-scoped records and sync into the same document and block pipeline as manual ingestion. Failed syncs persist status and last error so operators can diagnose credential, storage, rate-limit, or source issues.

Connector	Configuration	Sync Behavior
File upload	Text, Markdown, JSON, CSV, and other text files up to 500,000 UTF-8 bytes.	Creates a file_upload source and syncs it into stored documents and blocks.
GitHub	Owner, repository, branch, path, extensions, file count, file size, and optional token credential.	Fetches selected tree/blob content and skips files whose blob SHA already synced.
S3	Bucket, region, prefix, extensions, file count, file size, and encrypted AWS credential.	Uses SigV4 ListObjectsV2/GetObject and skips objects whose ETag already synced.
Confluence	Base URL, space key, labels, title filter, page count, page size, and encrypted API token.	Uses Confluence Cloud content search and skips pages whose version already synced.
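
The skip behavior in the table follows one pattern: compare a version marker from the remote system (blob SHA, ETag, or page version) with the marker recorded at the last sync. A minimal sketch of that comparison is shown below; the function name, the item shape, and the synced_versions mapping are hypothetical.

```python
def plan_sync(remote_items: list[dict], synced_versions: dict[str, str]) -> tuple[list[dict], list[dict]]:
    """Partition remote items into (to_fetch, skipped).

    Each item carries a "path" and a "version"; the version plays the role of
    a GitHub blob SHA, an S3 ETag, or a Confluence page version.
    """
    to_fetch, skipped = [], []
    for item in remote_items:
        if synced_versions.get(item["path"]) == item["version"]:
            skipped.append(item)  # unchanged since the last successful sync
        else:
            to_fetch.append(item)  # new, or changed since the last sync
    return to_fetch, skipped
```

For example, with synced_versions = {"docs/a.md": "sha1"}, an item at docs/a.md with version sha1 is skipped, while the same path with version sha2 or any unseen path is fetched.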

Credential Requirements

Connector credentials are encrypted tenant-scoped secrets. Credential creation requires PROVARA_MASTER_KEY on the gateway. API and dashboard responses return credential metadata and hasSecret, never raw secret values.

GitHub sources can use an optional encrypted GitHub token credential.
S3 sources require an encrypted AWS access-key credential.
Confluence sources require an encrypted email/API-token credential.
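
The rule that responses expose metadata and hasSecret but never the secret itself can be sketched as a serialization step. The record shape and every field name other than hasSecret are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredCredential:
    """Hypothetical stored record; only hasSecret is named in this guide."""

    id: str
    tenant_id: str
    kind: str  # for example "github", "s3", or "confluence"
    name: str
    encrypted_secret: Optional[bytes]


def to_response(credential: StoredCredential) -> dict:
    """Serialize a credential for API or dashboard output without any secret material."""
    return {
        "id": credential.id,
        "kind": credential.kind,
        "name": credential.name,
        # Report only whether a secret exists, never its value or ciphertext.
        "hasSecret": credential.encrypted_secret is not None,
    }
```

Building the response from an explicit allow-list of fields, rather than deleting the secret from a copy of the record, means a newly added sensitive field is not exposed by default.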

Canonical Governance

Stored blocks can be distilled into canonical context blocks. Canonical blocks start as drafts, preserve source block and document IDs, and export only after review approval.

Policy checks run active Guardrails rules against canonical content before approval. Blocking or quarantine decisions persist evidence and prevent approval until the block is fixed or rejected.
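
The review rules above amount to a small state machine. The sketch below models them under assumptions: the status names ("draft", "approved"), the decision names ("allow", "block", "quarantine"), and all function and field names are illustrative and are not taken from the API schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CanonicalBlock:
    id: str
    content: str
    source_block_ids: list[str]
    status: str = "draft"  # canonical blocks start as drafts
    policy_decision: Optional[str] = None
    evidence: list[str] = field(default_factory=list)


def record_policy_check(block: CanonicalBlock, decision: str, evidence: list[str]) -> None:
    """Persist the latest policy decision and its evidence on the block."""
    block.policy_decision = decision
    block.evidence = list(evidence)


def approve(block: CanonicalBlock) -> None:
    """Approve a block unless its latest policy decision blocks or quarantines it."""
    if block.policy_decision in {"block", "quarantine"}:
        raise ValueError(f"{block.id}: policy decision {block.policy_decision!r} prevents approval")
    block.status = "approved"


def exportable(blocks: list[CanonicalBlock]) -> list[CanonicalBlock]:
    """Only approved blocks are eligible for export."""
    return [b for b in blocks if b.status == "approved"]
```

In this model a blocked draft becomes approvable again only after a later policy check records a non-blocking decision, which corresponds to the block having been fixed.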

Dashboard Workflow

The dashboard at /dashboard/context shows optimizer metrics, quality and retrieval analytics, managed collections, connector credentials, connector source creation, source sync status, canonical review, and policy-check actions.

For a fresh tenant, file upload and connector source creation can auto-create a default managed collection once the required source fields are filled. The collection table includes a delete action for removing a collection and its associated context.

API Surface

The generated API reference is the source of truth for request and response schemas. This guide groups the most important endpoints below.

Runtime and visibility

POST /v1/context/optimize
GET /v1/context/summary
GET /v1/context/events
POST /v1/context/evaluate
GET /v1/context/quality/summary
GET /v1/context/retrieval/summary

Collections and connectors

GET /v1/context/collections
POST /v1/context/collections
DELETE /v1/context/collections/{id}
POST /v1/context/collections/{id}/documents
GET /v1/context/collections/{id}/sources
POST /v1/context/collections/{id}/sources
POST /v1/context/sources/{id}/sync

Governance

POST /v1/context/collections/{id}/distill
GET /v1/context/collections/{id}/canonical-blocks
POST /v1/context/canonical-blocks/{id}/policy-check
PATCH /v1/context/canonical-blocks/{id}/review
GET /v1/context/collections/{id}/export
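
One plausible order for calling the governance endpoints is distill, list, policy-check, review, export. The helper below only builds the method and path pairs for that sequence; the function name and the ordering are assumptions, and request bodies are omitted because their schemas live in the API reference.

```python
from urllib.parse import quote


def governance_calls(collection_id: str, canonical_block_id: str) -> list[tuple[str, str]]:
    """Return (method, path) pairs for one pass through the governance flow."""
    collection = f"/v1/context/collections/{quote(collection_id, safe='')}"
    block = f"/v1/context/canonical-blocks/{quote(canonical_block_id, safe='')}"
    return [
        ("POST", f"{collection}/distill"),          # create draft canonical blocks
        ("GET", f"{collection}/canonical-blocks"),  # list drafts for review
        ("POST", f"{block}/policy-check"),          # run policy checks on one block
        ("PATCH", f"{block}/review"),               # record the review decision
        ("GET", f"{collection}/export"),            # export canonical blocks
    ]
```

Percent-encoding the identifiers keeps the paths valid even if an ID contains characters such as "/" or "#".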

What Comes Later

The shipped scope does not yet include Google Drive, SharePoint, Notion, Zendesk, Intercom, permission-aware connectors, managed vector export, retrieval A/B tests, or a full context policy engine. These remain on the roadmap.