ContextUnity RAG Extension

ContextUnity RAG Extension is a specialized extension that provides an agnostic Retrieval-Augmented Generation (RAG) API and Chat Gateway for the ContextUnity ecosystem. Rather than just passing traffic, it acts as a robust state and interface layer — handling session persistence, citations distribution, frontend rendering, and multi-channel chat integrations (Web, Telegram, MCP).

Session Persistence Layer

Maintains user and chat message states locally via SQLite or PostgreSQL to isolate session metadata from core ContextUnity services.

Nuxt 3 AG-UI Frontend

A ready-to-use modern chat UI that supports markdown rendering, citations, search suggestions, and SSE streaming.

Agnostic Chat Gateway

HTTP + SSE API providing generic chat completions routing requests to the ContextRouter.

Telegram Bot Hub

A native webhook-based Telegram connector (aiogram) that streams ContextRouter outputs directly to Telegram users.

Model Context Protocol

FastMCP implementation making ContextUnity RAG Extension capabilities consumable by external agent systems.

1. Architectural Role & Execution Flow

ContextUnity RAG Extension acts as a Frontend Gateway and State Relayer. It enforces a strict separation of concerns from the semantic ML layers (taxonomies, vector chunking, graph search), which are exclusively handled by ContextRouter and ContextBrain.

graph TD
    Client[Web UI / Telegram] -->|POST /agui| RAG[ContextUnity RAG Extension API]
    RAG -->|RouterClient.stream_agent| Router[ContextRouter]
    Router -->|BrainClient.query| Brain[(ContextBrain DB)]
    Router -.->|SSE Events| RAG
    RAG -.->|Event Stream| Client
    RAG -->|save_message| SessionDB[(SQLite/PG Session Store)]

The `/agui` Streaming Endpoint

When a client application initiates a chat via POST /agui:

It passes messages and a threadId (which acts as the session_id).
The endpoint invokes ContextRouter via the async Python SDK (RouterClient().stream_agent).
It transparently streams Server-Sent Events (SSE) back to the client.
Interception: When ContextRouter emits the final result event, ContextUnity RAG Extension intercepts the answer, citations, and searchSuggestions, and persists them into the local session database under the assistant system role.

The `/sessions` Subsystem

Because SSE streaming pushes metadata incrementally, reloading the UI requires historically accurate transcripts. ContextUnity RAG Extension solves this by providing:

GET /sessions/{session_id}: Returns the clean chat transcript. To conserve UI payload size, complex metadata (like citations) is intentionally stripped.
GET /sessions/{session_id}/citations: Resolves the specific assistant messageId entries and provides their corresponding citations, intent, and searchSuggestions so the UI can rehydrate source annotations non-blockingly.

2. Core Components

Backend API (`/api`)

The core FastAPI framework built strictly with Pydantic V2.

Employs ContextUnity Middleware for shared Request Context and Tracing (RequestContextMiddleware).
Enforces strict Route Isolation. All HTTP state interactions (e.g., changing session names, truncating message trees) hit the abstract ctx.session_store interface, ensuring raw database operations never bleed into endpoint code.

Telegram Webhook Bot (`/bot`)

Contains an independent entrypoint for Telegram interactions. It parses Telegram objects, translates them into the universal RAG payload (with corresponding threadId), and connects to the exact same stream_agent flow used by the Web UI. Responses are continuously formatted from SSE frames into Telegram message edits.

Nuxt 3 Chat Frontend (`/ui`)

An agnostic frontend application. Rather than locking into a rigid system, the UI sends arbitrary content_configs to ContextUnity RAG Extension, which are swallowed by ContextRouter to dictate underlying agent behaviors without modifying the RAG gateway code.

MCP Server (`/mcp`)

Exposes FastMCP tools, establishing ContextUnity RAG Extension as an agnostic knowledge conduit for desktop assistant orchestrators like Claude Desktop or Cursor.

3. Running ContextUnity RAG Extension

The package relies on uv and mise for reproducible, isolated process management. Docker Compose is deliberately avoided for local dev cycles.

Sync dependencies:
Terminal window
```
mise run api-sync
```
Start the API Server:
Spins up the FastAPI server on PORT 7200. ContextRouter bootstrapping is executed securely within the app lifespan.
Terminal window
```
mise run api-dev
```
Start the Telegram Bot:
Terminal window
```
mise run bot-dev
```
Start the UI:
Terminal window
```
mise run ui-install
mise run ui-dev
```

4. Environment variables

ContextUnity RAG Extension uses .env variables mapped into Pydantic Settings.

Variable	Description	Default
`DB_TYPE`	Type of DB for session persistence (`sqlite`, `postgres`)	`sqlite`
`POSTGRES_DSN`	Connection string if using Postgres	`<empty>`
`TELEGRAM_BOT_TOKEN`	Token for the Telegram webhook integration	`<empty>`

Note: System credentials for routing external LLM requests are managed strictly by ContextRouter’s project.yaml manifests. ContextUnity RAG Extension only requires data-persistence environments.

5. Integrating with Projects

You can use the ContextUnity RAG Extension within your project to handle chat streaming, citations, and metadata. Because ContextUnity follows a strict Service Mesh paradigm, your application does not spin up the Router or Brain directly. Instead, your project declares its capabilities (such as the retrieval_augmented template) and its API keys inside the standard contextunity.project.yaml manifest. ContextUnity RAG Extension simply acts as the conversational bridge.

Step 1: The Project Manifest

Add a manifest to your project to configure exactly how your downstream retrieval_augmented graph operates.

apiVersion: "contextunity/v1alpha3"
kind: "ContextUnityProject"

project:
  id: "my-rag-project"
  name: "My Internal RAG Portal"
  tenant: "my_company"

# Declare requirements for ContextUnity bootstrapping
services:
  router: { enabled: true }
  brain: { enabled: true }
  shield: { enabled: true }

router:
  graph:
    id: "my-rag-project"
    mode: "template"
    template: "retrieval_augmented" # The canonical ContextUnity RAG template

    config:
      max_retries: 2
      knowledge_domains: ["internal_docs", "hr_policy"]
      model: "openai/gpt-4o"

    # Explicit node overrides for specific tasks within the RAG pipeline
    overrides:
      - name: "detect_intent"
        config:
          model: "openai/gpt-4o-mini"
          model_secret_ref: "OPENAI_API_KEY"

  policy:
    tracing_enabled: true

secrets:
  - owner: "contextunity"
    resolver: "env"
    keys:
      - "OPENAI_API_KEY"
      - "POSTGRES_DSN"

Step 2: Client Integration (AG-UI)

With your project communicating with ContextRouter, your UI simply points to the ContextUnity RAG Extension API via the /agui streaming endpoint. ContextUnity RAG Extension handles all stateless streaming, and automatically attaches the conversation to your defined retrieval_augmented graph.

const response = await fetch('http://localhost:7200/agui', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    messages: [
      { role: "user", content: "What is ContextUnity?" }
    ],
    threadId: "session-123",
    content_configs: {
      "language": "ukrainian",
      "instruction": "Reply like a pirate."
    }
  })
});