SLIs, SLOs & Review Workflows

Definitions, targets, and operational procedures for the DataVibe AI interception platform. Operators and reviewers should read this before handling the approval queue in a production environment.

Service Level Indicators (SLIs)

SLIs are the raw measurements behind the SLO calculations. The slo-watchdog cron recomputes each SLI every 15 minutes and surfaces the results on the system health page.

SLI	What we measure	Unit	Source
Review latency p50	Median time from submission to reviewer decision	minutes	gate_submissions.reviewed_at − created_at
Review latency p95	95th-percentile review time: worst-of-normal latency	minutes	gate_submissions (percentile)
SLA compliance rate	% of reviewed submissions resolved before breach threshold	%	gate_submissions.sla_breached_at IS NULL
SLA breach count	Submissions that exceeded the breach threshold	count	gate_submissions.sla_breached_at IS NOT NULL
Escalation rate	% of reviews escalated (approaching SLA + autoEscalate=true)	%	gate_submissions.escalated_at IS NOT NULL
Block rate	% of submissions blocked by policy (hard BLOCK rule)	%	gate_submissions.status = BLOCKED
Queue depth	Total QUEUED submissions at time of measurement	count	gate_submissions.status = QUEUED
Oldest queue age	Age of the oldest QUEUED submission	seconds	NOW() − MIN(created_at) WHERE status = QUEUED
False-positive proxy	% of WARN-only submissions that were ultimately approved	%	approved / queued WHERE no BLOCK violations
Audit chain integrity	Whether the tamper-evident hash chain is unbroken	boolean	GET /api/audit/verify-chain
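
As an illustration of how the two latency SLIs could be derived from reviewed_at − created_at, here is a minimal Python sketch. It assumes a nearest-rank percentile and rows already loaded as (created_at, reviewed_at) pairs. The function names are hypothetical, and the watchdog's actual percentile method is not documented here.

```python
import math
from datetime import datetime, timedelta


def review_latency_minutes(rows):
    """Convert (created_at, reviewed_at) pairs into latencies in minutes.

    Rows with reviewed_at set to None (still QUEUED) are skipped.
    """
    return [
        (reviewed_at - created_at).total_seconds() / 60
        for created_at, reviewed_at in rows
        if reviewed_at is not None
    ]


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100.

    Returns the smallest observed value with at least pct% of the
    observations at or below it. This is one common definition; the
    production query may use a different interpolation.
    """
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_slis(rows):
    """Return the p50 and p95 review latency in minutes."""
    latencies = review_latency_minutes(rows)
    return {"p50": percentile(latencies, 50), "p95": percentile(latencies, 95)}
```

Nearest-rank always returns a latency that was actually observed, which keeps the number easy to trace back to a specific submission.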

Service Level Objectives (SLOs)

Default SLO targets are listed below. Workspaces can override the warn and breach thresholds via Settings → SLA Configuration or via PUT /api/workspaces/{slug}/sla.

SLO	Default target	Configurable?	Alert when
Review latency: warn	≤ 10 min p50	Yes (warnMinutes)	Any submission exceeds this in the QUEUED state
Review latency: breach	≤ 30 min p95	Yes (breachMinutes)	Any submission exceeds this → sla_breached_at set
SLA compliance rate (7d)	≥ 90%	No (improvement target)	Drops below 90% on the /sla dashboard
Queue depth	< 100 QUEUED	Via QUEUE_BACKUP_THRESHOLD env	Threshold crossed → banner in queue UI
Audit chain integrity	100% (no breaks)	No	GET /api/audit/verify-chain returns ok: false
DLQ depth	0 queued items	No	> 10 items → slo.dlq.depth_high system event
Core API availability	99.5% / 30d	No	Non-200 health response → slo.core_api.unhealthy event
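
The targets in the table can be read as a set of simple threshold checks. The sketch below shows one way to evaluate a snapshot against them. The SloSnapshot type and every alert label except slo.dlq.depth_high are hypothetical, and treating a queue depth equal to the threshold as a violation is an assumption based on the "< 100" target.

```python
from dataclasses import dataclass


@dataclass
class SloSnapshot:
    reviewed_total: int   # reviewed submissions in the 7-day window
    breached_total: int   # of those, rows where sla_breached_at is set
    queue_depth: int      # submissions currently QUEUED
    dlq_depth: int        # items in the dead-letter queue
    chain_ok: bool        # "ok" field from GET /api/audit/verify-chain


def compliance_rate(snapshot):
    """Percentage of reviewed submissions resolved without an SLA breach."""
    if snapshot.reviewed_total == 0:
        return 100.0  # nothing reviewed, so nothing breached
    resolved_in_time = snapshot.reviewed_total - snapshot.breached_total
    return 100.0 * resolved_in_time / snapshot.reviewed_total


def slo_alerts(snapshot, queue_backup_threshold=100):
    """Return labels for every default SLO the snapshot violates."""
    alerts = []
    if compliance_rate(snapshot) < 90.0:
        alerts.append("sla_compliance_below_target")  # hypothetical label
    if snapshot.queue_depth >= queue_backup_threshold:
        alerts.append("queue_backup")                 # hypothetical label
    if snapshot.dlq_depth > 10:
        alerts.append("slo.dlq.depth_high")           # event named in the table
    if not snapshot.chain_ok:
        alerts.append("audit_chain_broken")           # hypothetical label
    return alerts
```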

Review workflow

Every AI-generated action that triggers a WARN or BLOCK policy rule is routed to the human approval queue before dispatch. The workflow is:

Submission queued. The edge gateway (or Core API) writes a gate_submissions row with status = QUEUED and notifies reviewers via SSE pubsub + optional Slack/Teams channel.
SLA clock starts. The slo-watchdog cron tracks age against the workspace SLA config. At warn threshold: yellow badge. At breach threshold: sla_breached_at is set, a system event is emitted, and an optional escalation notification is sent.
Reviewer action. A REVIEWER, ADMIN, or OWNER opens the queue item, reads the policy violations, and approves or rejects with an optional note.
Dispatch or closure. Approval fires the downstream dispatch (email provider, webhook, etc.) via the reliability layer. Rejection closes the item with no outbound action.
Audit record. Every decision is appended to the tamper-evident audit chain with reviewed_by, reviewed_at, and the decision. Verifiable at GET /api/audit/verify-chain.
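
The SLA clock step above amounts to comparing a submission's age with the two workspace thresholds. A minimal sketch, assuming "exceeds" means strictly greater than the threshold and using the default 10 and 30 minute values; the function name and state labels are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def sla_state(created_at, now, warn_minutes=10, breach_minutes=30):
    """Classify a QUEUED submission as "ok", "warn", or "breach" by age."""
    age = now - created_at
    if age > timedelta(minutes=breach_minutes):
        return "breach"  # the watchdog would set sla_breached_at here
    if age > timedelta(minutes=warn_minutes):
        return "warn"    # shown as a yellow badge in the queue
    return "ok"
```

Passing now explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.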

Assigning SLAs and escalation targets

For high-stakes submissions, reviewers can set a hard deadline and an escalation contact:

# Set a 2-hour SLA on a specific submission with a named escalation contact
PATCH /api/gate/{submissionId}/sla
{
  "dueAt": "2026-05-14T18:00:00.000Z",
  "escalateTo": "user_compliance_lead",
  "note": "Finance team deal, must clear before 6pm."
}

# Configure workspace-wide default SLA (escalateTo: null broadcasts to all ADMINs)
PUT /api/workspaces/{slug}/sla
{
  "warnMinutes": 10,
  "breachMinutes": 30,
  "autoEscalate": true,
  "escalateTo": null
}
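
For scripted configuration, the workspace-wide request above could be built with Python's standard library as sketched below. The base URL, the bearer-token header, and the rule that warnMinutes must be lower than breachMinutes are all assumptions for illustration; check your deployment's actual host and authentication scheme.

```python
import json
import urllib.request

BASE_URL = "https://datavibe.example.com"  # hypothetical host


def build_workspace_sla_request(slug, warn_minutes, breach_minutes,
                                auto_escalate=True, escalate_to=None,
                                token="REPLACE_ME"):
    """Build (but do not send) the PUT request for a workspace SLA config."""
    # Assumed sanity check: the warn threshold should come before the breach one.
    if not 0 < warn_minutes < breach_minutes:
        raise ValueError("expected 0 < warn_minutes < breach_minutes")
    body = {
        "warnMinutes": warn_minutes,
        "breachMinutes": breach_minutes,
        "autoEscalate": auto_escalate,
        "escalateTo": escalate_to,  # None serializes to null
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/workspaces/{slug}/sla",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )
```

Sending the request is then a call to urllib.request.urlopen(request); building it separately makes the payload easy to inspect before anything reaches production.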

On-call runbook

When the slo-watchdog emits a slo.gate_queue.sla_breach system event:

Go to /queue and sort by oldest first. Identify submissions with a red "SLA BREACH" badge.
Triage: does the violation require legal review, or is it a clear pass? Reject with a note or approve.
If queue depth > backup threshold: page the designated on-call reviewer (configured in workspace SLA → escalateTo). Do not batch-approve to clear the queue; each submission needs individual review.
If the Core API is down (slo.core_api.unreachable): new submissions are blocked at the edge gateway. Existing QUEUED items can still be reviewed and approved from the dashboard. Check the Render dashboard for core-api status.
After resolution: verify audit chain integrity at /audit → "Verify chain". Ensure ok: true.

Audit chain break runbook

Call GET /api/audit/verify-chain?workspace=<slug>. The response includes firstBreakAt (the row ID and timestamp where the chain diverges).
Identify the break: was it a manual DB edit, a failed write, or a race condition? Check systemEvent rows for kind = audit_log_write_failed near the break timestamp.
Do NOT delete or modify the broken rows. The audit log is immutable, and altering it destroys the evidence needed to diagnose the break. Open an incident ticket and escalate to engineering.
Communicate to affected workspace admins that the integrity break has been detected and is under investigation.
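
To make the idea of a chain break concrete, here is a generic hash-chain sketch. It is not DataVibe's actual scheme, which this page does not specify: the SHA-256 construction, the genesis value, and the field names prev_hash and hash are assumptions, and firstBreakAt is simplified to a list index rather than a row ID and timestamp.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first entry


def entry_hash(prev_hash, payload):
    """Hash an entry's payload together with the previous entry's hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(chain, payload):
    """Append a new entry that links back to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, payload),
    })


def verify_chain(chain):
    """Walk the chain from the start and report the first entry that fails."""
    prev_hash = GENESIS
    for index, entry in enumerate(chain):
        links_ok = entry["prev_hash"] == prev_hash
        hash_ok = entry["hash"] == entry_hash(prev_hash, entry["payload"])
        if not (links_ok and hash_ok):
            return {"ok": False, "firstBreakAt": index}
        prev_hash = entry["hash"]
    return {"ok": True, "firstBreakAt": None}
```

Because each hash covers the previous one, editing any row invalidates that row and breaks the link for everything after it. This is why the runbook reports only the first break and why "repairing" rows in place cannot restore integrity.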

Reviewer efficiency signals

The SLA performance dashboard shows per-reviewer metrics. Three signals warrant follow-up:

Rejection rate = 0% with > 10 decisions (rubber-stamping). A reviewer who approves everything may not be reading submissions carefully. Review their queue history and consider spot-checking approved items.
Rejection rate > 80% (over-blocking). The reviewer may have a misaligned understanding of policy intent. Schedule a calibration session with the policy owner.
p95 decision time > 2× breach threshold. The reviewer is a bottleneck. Consider adding a second reviewer to the workspace or adjusting the escalation path.
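
The signals listed above can be expressed as a small rule check over per-reviewer counts. This is a sketch only: the function name, parameters, and flag labels are hypothetical, and the dashboard's own calculation may differ.

```python
def reviewer_flags(decisions, rejections, p95_minutes, breach_minutes=30):
    """Return follow-up flags for one reviewer.

    decisions:   total approve/reject decisions in the reporting window
    rejections:  how many of those were rejections
    p95_minutes: the reviewer's p95 decision time in minutes
    """
    flags = []
    if decisions > 10 and rejections == 0:
        flags.append("rubber-stamping")
    # Integer comparison avoids float rounding at exactly 80%.
    if decisions > 0 and rejections * 100 > 80 * decisions:
        flags.append("over-blocking")
    if p95_minutes > 2 * breach_minutes:
        flags.append("bottleneck")
    return flags
```

For example, reviewer_flags(25, 0, 12) flags rubber-stamping only, while a reviewer with 10 decisions and no rejections is not flagged because the sample is too small.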


Last updated · April 2026
