Skip to content

Analytics & Healing

ContextView includes an analytics and self-healing subsystem that monitors service health, detects error patterns, and can automatically attempt recovery actions.

Analytics Agent

The AnalyticsAgent analyzes system metrics across three dimensions:

Trend Analysis

MetricWhat’s Tracked
Request RateVolume changes, traffic spikes, drop-offs
Error RateError frequency trends, new error types
LatencyP50/P95/P99 latency distributions, regression detection

Anomaly Detection

The analytics agent detects anomalies by comparing current metrics against historical baselines:

  • Sudden error spikes — new error types or rate increases > 2σ
  • Latency regressions — P95 latency exceeding historical 95th percentile
  • Traffic anomalies — unusual request patterns (load spikes, drop-offs)

Error Detection

The ErrorDetector classifies errors from trace data:

from contextunity.view.analytics.error_detector import ErrorDetector
detector = ErrorDetector()
errors = detector.detect(traces=recent_traces)
for error in errors:
print(f"[{error.severity}] {error.service}: {error.pattern}")
print(f" Occurrences: {error.count}")
print(f" First seen: {error.first_seen}")

Service Healing

The ServiceHealer attempts automated recovery when errors are detected:

from contextunity.view.healing.service_healer import ServiceHealer
healer = ServiceHealer(admin_client)
result = await healer.heal(
service_endpoint="router:50052",
error_type="UNAVAILABLE",
)
# result = { "actions": ["restart_attempted"], "success": True }

Healing Actions

Error TypeRecovery Action
UNAVAILABLEService restart, connection pool refresh
DEADLINE_EXCEEDEDTimeout adjustment, load balancing
RESOURCE_EXHAUSTEDCache eviction, memory pressure relief
INTERNALLog collection, diagnostic trace creation

Code Fixer

The CodeFixer module analyzes recurring error patterns and suggests code-level fixes:

  • Identifies common anti-patterns in graph configurations
  • Suggests timeout and retry parameter adjustments
  • Recommends model fallback chain modifications

gRPC RPCs

All analytics and healing operations are exposed via the AdminService gRPC:

RPCPermissionDescription
GetSystemAnalyticsadmin:readSystem-wide analytics
GetErrorAnalyticsadmin:readError pattern analysis
DetectSystemErrorsadmin:writeRun error detection scan
TriggerSelfHealingadmin:writeInitiate automated healing
GetHealingStatusadmin:readCheck healing operation status

Integration with Dashboard

Analytics data powers the Dashboard overview charts:

  • KPI cards — real-time counts and rates
  • Time-series charts — Chart.js graphs of request/error trends
  • Alert badges — anomaly indicators on service health panel