Model Registry & Providers
ContextRouter never instantiates LLM provider SDKs directly. All AI generation is routed through the unified Model Registry.
Model Keys
Models are selected by registry key: `<provider>/<name>`
| Provider | Key Pattern | Modalities |
|---|---|---|
| Vertex AI | vertex/gemini-2.5-flash | Text + Image + Audio + Video |
| OpenAI | openai/gpt-5-mini | Text + Image + Audio (ASR) |
| Anthropic | anthropic/claude-sonnet-4 | Text + Image |
| Groq | groq/llama-3.3-70b-versatile | Text + Image + Audio (ASR) |
| Perplexity | perplexity/sonar | Text |
| OpenRouter | openrouter/openai/gpt-5.1 | Text + Image |
| RLM | rlm/gpt-5-mini | Text |
| Local (Ollama) | local/llama3.1 | Text + Image |
| Local (vLLM) | local-vllm/meta-llama/Llama-3.1-8B-Instruct | Text + Image |
| RunPod | runpod/custom-model | Text + Image |
| HuggingFace | hf/distilgpt2 | Task-dependent |
| HF Hub | hf-hub/model-name | Task-dependent |
| OpenAI Batch | openai-batch/gpt-5-mini | Text |
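The key format above can be illustrated with a small parsing sketch. `parse_model_key` is a hypothetical helper (not part of the registry API) that splits only on the first `/`, since OpenRouter keys embed a second slash in the model name:

```python
def parse_model_key(key: str) -> tuple[str, str]:
    """Split a registry key into (provider, name).

    Hypothetical helper for illustration; splits on the first "/" only,
    because keys like "openrouter/openai/gpt-5.1" keep a slash in the name.
    """
    provider, sep, name = key.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"invalid model key: {key!r}")
    return provider, name
```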
Multimodal Interface
All providers share a unified multimodal contract:
```python
from contextunity.router.modules.models.types import ModelRequest, TextPart, ImagePart

# Text-only request
request = ModelRequest(
    parts=[TextPart(text="Hello, world!")],
    system="You are a helpful assistant",
    temperature=0.7,
)

# Multimodal request (text + image)
request = ModelRequest(
    parts=[
        TextPart(text="What's in this image?"),
        ImagePart(mime="image/jpeg", data_b64="...", uri="https://example.com/image.jpg"),
    ]
)
```
Fallback System
Strategies
| Strategy | Behavior | Streaming |
|---|---|---|
| `fallback` (sequential) | Try candidates in order | Sequential only; never switches mid-stream |
| `parallel` | Run all concurrently, return first success | Falls back to sequential |
| `cost-priority` | Same as `fallback`, ordered cheapest to most expensive | Sequential |
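The `cost-priority` ordering can be sketched as a plain sort before the same sequential loop. The price table below is purely illustrative (not real provider pricing), and `cost_priority_order` is a hypothetical helper:

```python
# Illustrative price table (USD per 1M tokens); NOT real provider pricing.
EST_COST = {
    "openai/gpt-5-mini": 0.25,
    "groq/llama-3.3-70b-versatile": 0.79,
    "anthropic/claude-sonnet-4": 3.00,
}

def cost_priority_order(candidates: list[str]) -> list[str]:
    # cost-priority is the same sequential fallback, with candidates
    # reordered cheapest first; unknown models sort last.
    return sorted(candidates, key=lambda k: EST_COST.get(k, float("inf")))
```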
Error Handling
```python
# Quota exhaustion → immediate fallback (no retries)
except ModelQuotaExhaustedError:
    continue

# Rate limiting → fallback with delay
except ModelRateLimitError:
    continue
```
Project vs Global Fallback
- Project: specifies `fallback_keys` in the `contextunity.project.yaml` manifest (per-node control)
- Global: `CU_ROUTER_ALLOW_GLOBAL_FALLBACK=true` + `CU_ROUTER_FALLBACK_LLMS` (safety net)
If a node exhausts its fallback_keys and no global fallback is configured, the request fails gracefully.
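A minimal sketch of how that candidate ordering could be assembled, assuming the env-var semantics above (`resolve_fallback_chain` is a hypothetical helper, not the router's actual function):

```python
import os

def resolve_fallback_chain(project_keys, env=None):
    # Hypothetical sketch: project-level fallback_keys come first; the
    # global safety net is appended only when explicitly enabled.
    env = env if env is not None else os.environ
    chain = list(project_keys)
    if env.get("CU_ROUTER_ALLOW_GLOBAL_FALLBACK", "").lower() == "true":
        extra = env.get("CU_ROUTER_FALLBACK_LLMS", "")
        chain += [k.strip() for k in extra.split(",") if k.strip()]
    return chain
```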
Reasoning Models (gpt-5, o1, o3)
Reasoning models require special handling:
```python
# Use max_completion_tokens, not max_tokens.
# Include extra budget for chain-of-thought reasoning.
if is_reasoning_model:
    bind_kwargs["max_completion_tokens"] = 8000  # 4k reasoning + 4k response
    # Temperature must be 1 for the reasoning model API
```
RLM (Recursive Language Models)
RLM wraps any base LLM with recursive REPL capabilities. It is designed for processing massive contexts (50k+ items), where standard LLMs experience context degradation.
Reference: arXiv:2512.24601 | GitHub
Key Benefits:
- GPT-5-mini with RLM outperforms GPT-5 on long-context tasks
- Context stored as Python variable, not in prompt
- Model can `grep`, `filter`, iterate, and recursively analyze the context
- 60-70% cost reduction for bulk processing
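The "context stored as a Python variable" idea can be shown in plain Python. This is only a toy illustration of what the model does inside the REPL, not RLM's implementation:

```python
# Toy illustration: a 50k-item context held as a variable, not in the prompt.
context = [f"item-{i}: status=ok" for i in range(50_000)]
context[123] = "item-123: status=error"

# Inside the REPL the model can filter instead of reading everything:
errors = [line for line in context if "status=error" in line]
# A recursive step would hand only `errors` to a sub-call of the base LLM.
```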
```python
model = model_registry.create_llm(
    "rlm/gpt-5-mini",
    config=config,
    environment="docker",  # Isolated execution (recommended)
)
```
| Environment | Use Case | Safety |
|---|---|---|
| `local` | Development | Low (same process) |
| `docker` | Production | High (isolated container) |
| `modal` | Cloud scaling | High |
Local Models
vLLM (OpenAI-compatible)
```shell
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.1-8B-Instruct --port 8000
```
Set `LOCAL_VLLM_BASE_URL=http://localhost:8000/v1`. Use `local-vllm/meta-llama/Llama-3.1-8B-Instruct`.
Ollama
```shell
ollama serve && ollama pull llama3.1
```
Set `LOCAL_OLLAMA_BASE_URL=http://localhost:11434/v1`. Use `local/llama3.1`.
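Both local backends follow the same pattern: an OpenAI-compatible endpoint selected by key prefix. A hypothetical helper (not the router's API) sketches that mapping:

```python
import os

# Assumed mapping from local key prefixes to their base-URL env vars.
LOCAL_BACKENDS = {
    "local": ("LOCAL_OLLAMA_BASE_URL", "http://localhost:11434/v1"),
    "local-vllm": ("LOCAL_VLLM_BASE_URL", "http://localhost:8000/v1"),
}

def local_base_url(model_key: str, env=None) -> str:
    # Hypothetical helper: resolve the OpenAI-compatible base URL
    # for a local/* or local-vllm/* registry key.
    env = env if env is not None else os.environ
    provider = model_key.partition("/")[0]
    env_var, default = LOCAL_BACKENDS[provider]
    return env.get(env_var, default)
```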
API Key Resolution
The Router resolves API keys via a two-tier fallback:
| Priority | Source | Path |
|---|---|---|
| 1 | Shield | {tenant}/api_keys/{provider} |
| 2 | Router env | OPENAI_API_KEY, etc. |
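The two-tier lookup can be sketched as follows. `shield_lookup` stands in for the Shield secret store and is purely hypothetical, as is the env-var naming helper:

```python
import os

def resolve_api_key(provider, tenant, shield_lookup, env=None):
    # Tier 1: tenant-scoped Shield secret at {tenant}/api_keys/{provider}.
    # Tier 2: the router's own environment (e.g. OPENAI_API_KEY).
    env = env if env is not None else os.environ
    key = shield_lookup(f"{tenant}/api_keys/{provider}")
    if key:
        return key
    return env.get(f"{provider.upper().replace('-', '_')}_API_KEY")
```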