Analytics & Healing
ContextView includes an analytics and self-healing subsystem that monitors service health, detects error patterns, and can automatically attempt recovery actions.
Analytics Agent
The AnalyticsAgent analyzes system metrics across three dimensions:
Trend Analysis
| Metric | What’s Tracked |
|---|---|
| Request Rate | Volume changes, traffic spikes, drop-offs |
| Error Rate | Error frequency trends, new error types |
| Latency | P50/P95/P99 latency distributions, regression detection |
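As a rough illustration of the latency row, the sketch below derives P50/P95/P99 from a list of latency samples using only Python's standard library. The function name `latency_percentiles` and the sample data are hypothetical; how the AnalyticsAgent computes these values internally is not documented here.

```python
from statistics import quantiles


def latency_percentiles(samples_ms):
    """Return P50, P95, and P99 for a list of latency samples (milliseconds)."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1 through 99.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Made-up samples: 1.0 ms through 101.0 ms.
samples = [float(v) for v in range(1, 102)]
stats = latency_percentiles(samples)
print(stats)
```

Tracking these three values over successive time windows is one simple way to spot a latency regression: a rising P95 or P99 with a stable P50 usually points at a slow tail rather than a general slowdown.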
Anomaly Detection
The analytics agent detects anomalies by comparing current metrics against historical baselines:
- Sudden error spikes — new error types or rate increases > 2σ
- Latency regressions — P95 latency exceeding historical 95th percentile
- Traffic anomalies — unusual request patterns (load spikes, drop-offs)
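The first two checks above can be sketched with plain statistics. This is a minimal illustration under stated assumptions, not the AnalyticsAgent implementation: the function names and sample data are hypothetical, the 2σ rule is read as "more than two standard deviations above the historical mean", and the latency rule is read as comparing the current P95 against the 95th percentile of historical latency samples.

```python
from statistics import mean, quantiles, stdev


def is_error_spike(history, current, sigmas=2.0):
    """Flag `current` if it exceeds the historical mean by more than `sigmas` standard deviations."""
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation; needs at least 2 points
    return current > baseline + sigmas * spread


def is_latency_regression(historical_ms, current_p95_ms):
    """Flag a regression when the current P95 exceeds the historical 95th percentile."""
    historical_p95 = quantiles(historical_ms, n=100, method="inclusive")[94]
    return current_p95_ms > historical_p95


# Made-up baselines for illustration only.
error_rates = [0.010, 0.012, 0.011, 0.009, 0.013]
latencies_ms = [float(v) for v in range(1, 102)]

normal_rate = is_error_spike(error_rates, 0.011)
spiking_rate = is_error_spike(error_rates, 0.030)
slow_p95 = is_latency_regression(latencies_ms, 120.0)
healthy_p95 = is_latency_regression(latencies_ms, 90.0)
print(normal_rate, spiking_rate, slow_p95, healthy_p95)
```

A fixed σ threshold is easy to reason about but assumes the baseline is roughly stable; strongly seasonal traffic would need a baseline computed per time-of-day or per day-of-week window.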
Error Detection
The ErrorDetector classifies errors from trace data:
```python
from contextunity.view.analytics.error_detector import ErrorDetector

# recent_traces is assumed to hold trace data collected elsewhere.
detector = ErrorDetector()
errors = detector.detect(traces=recent_traces)

# Print a short summary of each detected error pattern.
for error in errors:
    print(f"[{error.severity}] {error.service}: {error.pattern}")
    print(f"  Occurrences: {error.count}")
    print(f"  First seen: {error.first_seen}")
```

Service Healing
The ServiceHealer attempts automated recovery when errors are detected:
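```python
from contextunity.view.healing.service_healer import ServiceHealer

# admin_client is assumed to be an already-configured admin client.
healer = ServiceHealer(admin_client)

# heal() is awaited, so this call must run inside an async function.
result = await healer.heal(
    service_endpoint="router:50052",
    error_type="UNAVAILABLE",
)
# result = {"actions": ["restart_attempted"], "success": True}
```

Healing Actions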
| Error Type | Recovery Action |
|---|---|
| UNAVAILABLE | Service restart, connection pool refresh |
| DEADLINE_EXCEEDED | Timeout adjustment, load balancing |
| RESOURCE_EXHAUSTED | Cache eviction, memory pressure relief |
| INTERNAL | Log collection, diagnostic trace creation |
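One way to picture the table above is as a lookup from gRPC status name to a list of recovery steps. The sketch below is purely illustrative: `RECOVERY_ACTIONS`, `plan_recovery`, and the snake_case action names are hypothetical and are not taken from the ServiceHealer source.

```python
# Hypothetical lookup mirroring the Healing Actions table; names are illustrative.
RECOVERY_ACTIONS = {
    "UNAVAILABLE": ["service_restart", "connection_pool_refresh"],
    "DEADLINE_EXCEEDED": ["timeout_adjustment", "load_balancing"],
    "RESOURCE_EXHAUSTED": ["cache_eviction", "memory_pressure_relief"],
    "INTERNAL": ["log_collection", "diagnostic_trace_creation"],
}


def plan_recovery(error_type):
    """Return the recovery steps for a gRPC status name, or an empty list if unmapped."""
    # Copy the list so callers cannot mutate the shared table.
    return list(RECOVERY_ACTIONS.get(error_type, []))


unavailable_plan = plan_recovery("UNAVAILABLE")
unknown_plan = plan_recovery("NOT_FOUND")
print(unavailable_plan, unknown_plan)
```

Returning an empty plan for unmapped error types keeps the healer from taking any action it was not explicitly configured for.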
Code Fixer
The CodeFixer module analyzes recurring error patterns and suggests code-level fixes:
- Identifies common anti-patterns in graph configurations
- Suggests timeout and retry parameter adjustments
- Recommends model fallback chain modifications
gRPC RPCs
All analytics and healing operations are exposed as RPCs on the AdminService gRPC service:
| RPC | Permission | Description |
|---|---|---|
| GetSystemAnalytics | admin:read | System-wide analytics |
| GetErrorAnalytics | admin:read | Error pattern analysis |
| DetectSystemErrors | admin:write | Run error detection scan |
| TriggerSelfHealing | admin:write | Initiate automated healing |
| GetHealingStatus | admin:read | Check healing operation status |
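The permission column can be read as a simple per-RPC requirement. The sketch below shows one way a caller might pre-check its own permissions before issuing a request. `RPC_PERMISSIONS` and `can_call` are hypothetical helpers, and the sketch assumes `admin:write` does not implicitly grant `admin:read`, which the table does not specify.

```python
# RPC names and permissions copied from the table; the helper itself is hypothetical.
RPC_PERMISSIONS = {
    "GetSystemAnalytics": "admin:read",
    "GetErrorAnalytics": "admin:read",
    "DetectSystemErrors": "admin:write",
    "TriggerSelfHealing": "admin:write",
    "GetHealingStatus": "admin:read",
}


def can_call(rpc_name, granted):
    """Return True if `granted` (a set of permission strings) covers the RPC's requirement."""
    required = RPC_PERMISSIONS.get(rpc_name)
    # Unknown RPCs are rejected rather than allowed by default.
    return required is not None and required in granted


read_only = {"admin:read"}
print(can_call("GetHealingStatus", read_only))
print(can_call("TriggerSelfHealing", read_only))
```

Note that the two scan-and-heal RPCs require `admin:write`, so a read-only dashboard user could view analytics and healing status without being able to start a healing run.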
Integration with Dashboard
Analytics data powers the Dashboard overview charts:
- KPI cards — real-time counts and rates
- Time-series charts — Chart.js graphs of request/error trends
- Alert badges — anomaly indicators on service health panel