Context Optimizer
Context Optimizer is Provara's optimization and governance layer for retrieved context in RAG and agentic systems. It reduces duplicate, stale, risky, and low-value context before model routing, then records the savings and quality signals for operators.
What Is Shipped
The current implementation covers runtime context optimization, managed collections, connector ingestion, canonical block governance, quality evaluation, retrieval analytics, and dashboard visibility.
- Exact and semantic duplicate removal
- Lexical or embedding relevance scoring with reranking
- Freshness, conflict, risk, and compression controls
- Raw-vs-optimized quality evaluation and retrieval analytics
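To make the first bullet concrete, here is a minimal sketch of exact and semantic duplicate removal. It is not Provara's implementation. The `dedupe` and `cosine` helpers, the normalization step, and the use of cosine similarity against a 0.72 threshold are all assumptions for illustration, and the two-dimensional embeddings are toy values.

```python
import hashlib
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedupe(chunks, threshold=0.72):
    """Return (kept, dropped). Each chunk is a dict with 'id', 'content', 'embedding'.

    Hypothetical helper: exact duplicates are detected by hashing normalized
    content, semantic duplicates by similarity to an already-kept chunk.
    """
    kept, dropped, seen_hashes = [], [], set()
    for chunk in chunks:
        normalized = chunk["content"].strip().lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            dropped.append((chunk["id"], "exact"))
            continue
        if any(cosine(chunk["embedding"], k["embedding"]) >= threshold for k in kept):
            dropped.append((chunk["id"], "semantic"))
            continue
        seen_hashes.add(digest)
        kept.append(chunk)
    return kept, dropped


chunks = [
    {"id": "a", "content": "Refunds are available within 30 days.", "embedding": [1.0, 0.0]},
    {"id": "b", "content": "refunds are available within 30 days. ", "embedding": [1.0, 0.0]},
    {"id": "c", "content": "Customers may request a refund for 30 days.", "embedding": [0.9, 0.1]},
    {"id": "d", "content": "Shipping takes five days.", "embedding": [0.0, 1.0]},
]
kept, dropped = dedupe(chunks)
```

In this toy input, chunk `b` differs from `a` only in case and trailing whitespace, so it is dropped as an exact duplicate, while `c` is dropped because its embedding is nearly parallel to `a`.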
Runtime Quickstart
Send already-retrieved chunks to POST /v1/context/optimize. The response returns retained chunks, dropped chunks, risk buckets, conflict details, and metrics.
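As a rough sketch, the request could be built from Python's standard library as shown here. The gateway URL, the API key, the bearer-token header, and the `build_optimize_request` helper are assumptions for illustration; the generated API reference defines the actual authentication and schema.

```python
import json
import urllib.request

# Hypothetical values; replace with your deployment's gateway URL and key.
BASE_URL = "https://gateway.example.com"
API_KEY = "YOUR_API_KEY"


def build_optimize_request(payload, base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) a POST request for /v1/context/optimize."""
    return urllib.request.Request(
        url=f"{base_url}/v1/context/optimize",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; check the generated API reference.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = {
    "dedupeMode": "semantic",
    "semanticThreshold": 0.72,
    "rankMode": "embedding",
    "query": "What is the refund window?",
    "freshnessMode": "metadata",
    "conflictMode": "scored",
    "compressionMode": "extractive",
    "scanRisk": True,
    "chunks": [
        {
            "id": "refunds.md#4",
            "content": "Refunds are available within 30 days.",
            "source": "help-center",
            "metadata": {"updatedAt": "2026-04-01T00:00:00.000Z"},
        }
    ],
}
request = build_optimize_request(payload)
# To send it: with urllib.request.urlopen(request) as resp: result = json.load(resp)
```

The same request as a raw JSON body: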
{
  "dedupeMode": "semantic",
  "semanticThreshold": 0.72,
  "rankMode": "embedding",
  "query": "What is the refund window?",
  "freshnessMode": "metadata",
  "conflictMode": "scored",
  "compressionMode": "extractive",
  "scanRisk": true,
  "chunks": [
    {
      "id": "refunds.md#4",
      "content": "Refunds are available within 30 days.",
      "source": "help-center",
      "metadata": { "updatedAt": "2026-04-01T00:00:00.000Z" }
    }
  ]
}
Managed Collections
Collections are tenant-scoped containers for reusable context. The dashboard can create the first collection explicitly or auto-create a default collection when the first file upload or external connector source is submitted. Document ingestion stores deterministic blocks with content hashes, token estimates, source metadata, and collection counters.
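The idea of deterministic blocks can be sketched as follows. This is an illustration only: the `to_blocks` helper, the blank-line splitting rule, the field names, and the four-characters-per-token estimate are assumptions, not Provara's actual chunking or schema.

```python
import hashlib


def to_blocks(document_id, text, source):
    """Split text on blank lines into blocks with stable hashes and token estimates.

    Hypothetical helper: the same input always yields the same blocks, so a
    re-ingested document can be compared block by block via its hashes.
    """
    blocks = []
    for index, part in enumerate(p.strip() for p in text.split("\n\n")):
        if not part:
            continue  # skip empty segments produced by extra blank lines
        blocks.append({
            "documentId": document_id,
            "index": index,
            "contentHash": hashlib.sha256(part.encode("utf-8")).hexdigest(),
            # Rough heuristic: about four characters per token.
            "tokenEstimate": max(1, len(part) // 4),
            "source": source,
            "content": part,
        })
    return blocks


text = "Refunds are available within 30 days.\n\nContact support to start a return."
blocks = to_blocks("doc_1", text, "help-center")
```

Because the hash depends only on block content, ingesting the same text twice produces identical `contentHash` values.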
Deleting a collection removes its sources, documents, raw blocks, canonical blocks, and canonical review events. Connector credentials are tenant-level records and remain available for other collections.
Raw document object storage is optional. When the gateway runs with DOCUMENT_STORAGE_DRIVER=r2, Provara writes raw document text to Cloudflare R2 before committing searchable rows.
Connectors
Connector sources start as tenant-scoped records and sync into the same document and block pipeline as manual ingestion. Failed syncs persist status and last error so operators can diagnose credential, storage, rate-limit, or source issues.
| Connector | Configuration | Sync Behavior |
|---|---|---|
| File upload | Text, Markdown, JSON, CSV, and other text files up to 500,000 UTF-8 bytes. | Creates a file_upload source and syncs it into stored documents and blocks. |
| GitHub | Owner, repository, branch, path, extensions, file count, file size, and optional token credential. | Fetches selected tree/blob content and skips files whose blob SHA already synced. |
| S3 | Bucket, region, prefix, extensions, file count, file size, and encrypted AWS credential. | Uses SigV4 ListObjectsV2/GetObject and skips objects whose ETag already synced. |
| Confluence | Base URL, space key, labels, title filter, page count, page size, and encrypted API token. | Uses Confluence Cloud content search and skips pages whose version already synced. |
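The skip behavior in the table, together with persisted sync status, can be illustrated with a small sketch. The `sync_source` helper, the state layout, and the status strings are hypothetical. The fingerprint stands in for a GitHub blob SHA, an S3 ETag, or a Confluence page version.

```python
def sync_source(source_state, remote_items, ingest):
    """Ingest items whose fingerprint changed; record status and last error.

    source_state: dict with 'synced' (item key -> fingerprint), 'status', 'lastError'.
    remote_items: iterable of (key, fingerprint, load) where load() returns text.
    ingest: callable(key, text) that stores the document and its blocks.
    """
    ingested, skipped = [], []
    try:
        for key, fingerprint, load in remote_items:
            if source_state["synced"].get(key) == fingerprint:
                skipped.append(key)  # unchanged since the last sync
                continue
            ingest(key, load())
            source_state["synced"][key] = fingerprint
            ingested.append(key)
        source_state["status"], source_state["lastError"] = "synced", None
    except Exception as exc:
        # Persist the failure so operators can diagnose it later.
        source_state["status"], source_state["lastError"] = "failed", str(exc)
    return ingested, skipped


state = {"synced": {"a.md": "sha1"}, "status": "pending", "lastError": None}
stored = {}
items = [("a.md", "sha1", lambda: "old text"), ("b.md", "sha2", lambda: "new text")]
ingested, skipped = sync_source(state, items, stored.__setitem__)
```

Here `a.md` is skipped because its fingerprint matches the recorded one, and only `b.md` is ingested.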
Credential Requirements
Connector credentials are encrypted tenant-scoped secrets. Credential creation requires PROVARA_MASTER_KEY on the gateway. API and dashboard responses return credential metadata and hasSecret, never raw secret values.
- GitHub sources can use an optional encrypted GitHub token credential.
- S3 sources require an encrypted AWS access-key credential.
- Confluence sources require an encrypted email/API-token credential.
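The rule that responses expose metadata and hasSecret but never the secret itself can be sketched as a projection over a stored record. Apart from hasSecret, the field names and the `to_public_credential` helper are assumptions for illustration.

```python
def to_public_credential(record):
    """Return response-safe metadata for a stored credential record.

    Hypothetical helper: fields are copied by allow-list, so a newly added
    secret field cannot leak into responses by accident.
    """
    return {
        "id": record["id"],
        "type": record["type"],
        "name": record["name"],
        "hasSecret": bool(record.get("encryptedSecret")),
    }


record = {
    "id": "cred_1",
    "type": "github_token",
    "name": "docs-repo",
    "encryptedSecret": b"\x8f\x02ciphertext",  # placeholder bytes, not a real secret
}
public = to_public_credential(record)
```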
Canonical Governance
Stored blocks can be distilled into canonical context blocks. Canonical blocks start as drafts, preserve source block and document IDs, and export only after review approval.
Policy checks run active Guardrails rules against canonical content before approval. Blocking or quarantine decisions persist evidence and prevent approval until the block is fixed or rejected.
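The review lifecycle described above can be modeled as a small state machine. This is a sketch under assumptions: the `CanonicalBlock` class, the status strings, and the decision names `allow`, `block`, and `quarantine` are illustrative and may not match the actual schema.

```python
class CanonicalBlock:
    """Hypothetical model of a canonical block moving through review."""

    def __init__(self, block_id, content, source_block_ids):
        self.id = block_id
        self.content = content
        self.source_block_ids = list(source_block_ids)  # provenance is preserved
        self.status = "draft"
        self.policy_decision = None  # None until a policy check runs
        self.evidence = []

    def record_policy_check(self, decision, evidence=()):
        """Store the latest decision ('allow', 'block', or 'quarantine') and its evidence."""
        self.policy_decision = decision
        self.evidence = list(evidence)

    def approve(self):
        """Approve the block unless the latest policy decision forbids it."""
        if self.policy_decision in ("block", "quarantine"):
            raise ValueError(f"cannot approve: policy decision is {self.policy_decision}")
        self.status = "approved"

    def reject(self):
        self.status = "rejected"


def export(blocks):
    """Only approved blocks are exported."""
    return [b for b in blocks if b.status == "approved"]


clean = CanonicalBlock("cb_1", "Refunds are available within 30 days.", ["blk_4"])
clean.record_policy_check("allow")
clean.approve()

risky = CanonicalBlock("cb_2", "Internal API key: ...", ["blk_9"])
risky.record_policy_check("block", evidence=["matched secret-pattern rule"])
```

Calling `risky.approve()` raises `ValueError` until a later policy check clears the block, so it stays out of the export.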
Dashboard Workflow
The dashboard at /dashboard/context shows optimizer metrics, quality and retrieval analytics, managed collections, connector credentials, connector source creation, source sync status, canonical review, and policy-check actions.
For a fresh tenant, file upload and connector source creation can auto-create a default managed collection once the required source fields are filled. The collection table includes a delete action for removing a collection and its associated context.
API Surface
The generated API reference is the source of truth for request and response schemas. This guide groups the most important endpoints below.
Runtime and visibility
- POST /v1/context/optimize
- GET /v1/context/summary
- GET /v1/context/events
- POST /v1/context/evaluate
- GET /v1/context/quality/summary
- GET /v1/context/retrieval/summary
Collections and connectors
- GET /v1/context/collections
- POST /v1/context/collections
- DELETE /v1/context/collections/{id}
- POST /v1/context/collections/{id}/documents
- GET /v1/context/collections/{id}/sources
- POST /v1/context/collections/{id}/sources
- POST /v1/context/sources/{id}/sync
Governance
- POST /v1/context/collections/{id}/distill
- GET /v1/context/collections/{id}/canonical-blocks
- POST /v1/context/canonical-blocks/{id}/policy-check
- PATCH /v1/context/canonical-blocks/{id}/review
- GET /v1/context/collections/{id}/export
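One plausible ordering of the governance endpoints for a single block is sketched here. The ordering is inferred from the Canonical Governance section rather than stated by the API. The `governance_calls` helper and the example IDs are hypothetical, and request bodies are omitted because the generated API reference defines them.

```python
def governance_calls(collection_id, block_id):
    """Return (method, path) pairs for a distill-to-export pass over one block."""
    collection = f"/v1/context/collections/{collection_id}"
    block = f"/v1/context/canonical-blocks/{block_id}"
    return [
        ("POST", f"{collection}/distill"),           # create draft canonical blocks
        ("GET", f"{collection}/canonical-blocks"),   # list drafts for review
        ("POST", f"{block}/policy-check"),           # run Guardrails rules
        ("PATCH", f"{block}/review"),                # approve or reject
        ("GET", f"{collection}/export"),             # export approved blocks
    ]


calls = governance_calls("col_1", "cb_9")
```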
What Comes Later
The shipped scope does not yet include Google Drive, SharePoint, Notion, Zendesk, Intercom, permission-aware connectors, managed vector export, retrieval A/B tests, or a full context policy engine. These remain on the roadmap.