DataVibe

© 2026 DataVibe. Built for fintech analytics, ML, and data operations.


Last updated · April 2026


SLIs, SLOs & Review Workflows

Definitions, targets, and operational procedures for the DataVibe AI interception platform. Operators and reviewers should read this before handling the approval queue in a production environment.

Service Level Indicators (SLIs)

SLIs are the raw measurements that feed the SLO calculations. The slo-watchdog cron recomputes each SLI on every run (every 15 minutes) and surfaces the results on the system health page.

| SLI | What we measure | Unit | Source |
| --- | --- | --- | --- |
| Review latency p50 | Median time from submission to reviewer decision | minutes | gate_submissions.reviewed_at − created_at |
| Review latency p95 | 95th-percentile review time (worst-of-normal latency) | minutes | gate_submissions (percentile) |
| SLA compliance rate | % of reviewed submissions resolved before breach threshold | % | gate_submissions.sla_breached_at IS NULL |
| SLA breach count | Submissions that exceeded the breach threshold | count | gate_submissions.sla_breached_at IS NOT NULL |
| Escalation rate | % of reviews escalated (approaching SLA + autoEscalate=true) | % | gate_submissions.escalated_at IS NOT NULL |
| Block rate | % of submissions blocked by policy (hard BLOCK rule) | % | gate_submissions.status = BLOCKED |
| Queue depth | Total QUEUED submissions at time of measurement | count | gate_submissions.status = QUEUED |
| Oldest queue age | Age of the oldest QUEUED submission | seconds | NOW() − MIN(created_at) WHERE status = QUEUED |
| False-positive proxy | % of WARN-only submissions that were ultimately approved | % | approved / queued WHERE no BLOCK violations |
| Audit chain integrity | Whether the tamper-evident hash chain is unbroken | boolean | GET /api/audit/verify-chain |
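
To make the latency and compliance rows concrete, here is a minimal sketch of how those three SLIs could be derived from submission rows. It assumes each row is a dict carrying the created_at, reviewed_at, and sla_breached_at columns named in the table, and it uses a nearest-rank percentile. The function names and the percentile method are illustrative assumptions, not DataVibe's actual implementation.

```python
from math import ceil


def percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100]. Assumed method, not confirmed by the docs."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def review_slis(rows):
    """Compute review latency p50/p95 (minutes) and SLA compliance (%) over reviewed rows.

    Each row is a dict with datetime values under created_at, reviewed_at
    (None while still queued) and sla_breached_at (None if never breached).
    """
    reviewed = [r for r in rows if r["reviewed_at"] is not None]
    latencies = [
        (r["reviewed_at"] - r["created_at"]).total_seconds() / 60 for r in reviewed
    ]
    within_sla = sum(1 for r in reviewed if r["sla_breached_at"] is None)
    return {
        "latency_p50_min": percentile(latencies, 50),
        "latency_p95_min": percentile(latencies, 95),
        "sla_compliance_pct": 100 * within_sla / len(reviewed),
    }
```

Rows that are still queued are excluded, matching the table's wording that compliance is measured over reviewed submissions. With no reviewed rows the sketch raises ValueError rather than reporting a misleading zero.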

Service Level Objectives (SLOs)

Default SLO targets are listed below. Workspaces can override the warn and breach thresholds via Settings → SLA Configuration or via PUT /api/workspaces/{slug}/sla.

| SLO | Default target | Configurable? | Alert when |
| --- | --- | --- | --- |
| Review latency (warn) | ≤ 10 min p50 | Yes (warnMinutes) | Any submission exceeds this in the QUEUED state |
| Review latency (breach) | ≤ 30 min p95 | Yes (breachMinutes) | Any submission exceeds this → sla_breached_at set |
| SLA compliance rate (7d) | ≥ 90% | No (improvement target) | Drops below 90% on the /sla dashboard |
| Queue depth | < 100 QUEUED | Via QUEUE_BACKUP_THRESHOLD env | Threshold crossed → banner in queue UI |
| Audit chain integrity | 100% (no breaks) | No | GET /api/audit/verify-chain returns ok: false |
| DLQ depth | 0 queued items | No | > 10 items → slo.dlq.depth_high system event |
| Core API availability | 99.5% / 30d | No | Non-200 health response → slo.core_api.unhealthy event |
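
The alert conditions in the table can be read as a small set of threshold checks. The sketch below evaluates a snapshot of SLI values against the default targets. The dictionary keys and every alert label except slo.dlq.depth_high are hypothetical names chosen for illustration, and treating a queue depth equal to the threshold as "crossed" is an assumption based on the "< 100" target.

```python
def evaluate_slos(slis, queue_backup_threshold=100):
    """Return the list of alert labels triggered by one SLI snapshot.

    slis is a dict with the (hypothetical) keys sla_compliance_pct_7d,
    queue_depth, audit_chain_ok and dlq_depth.
    """
    alerts = []
    if slis["sla_compliance_pct_7d"] < 90.0:
        alerts.append("sla_compliance_below_target")  # hypothetical label
    if slis["queue_depth"] >= queue_backup_threshold:
        alerts.append("queue_backup")  # hypothetical label; shown as a banner in the queue UI
    if not slis["audit_chain_ok"]:
        alerts.append("audit_chain_broken")  # hypothetical label
    if slis["dlq_depth"] > 10:
        alerts.append("slo.dlq.depth_high")  # event name taken from the table
    return alerts
```

A healthy snapshot returns an empty list, which makes the function easy to wire into a periodic check that only notifies when something is returned.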

Review workflow

Every AI-generated action that triggers a WARN or BLOCK policy rule is routed to the human approval queue before dispatch. The workflow is:

  1. Submission queued. The edge gateway (or Core API) writes a gate_submissions row with status = QUEUED and notifies reviewers via SSE pubsub + optional Slack/Teams channel.
  2. SLA clock starts. The slo-watchdog cron tracks age against the workspace SLA config. At warn threshold: yellow badge. At breach threshold: sla_breached_at is set, a system event is emitted, and an optional escalation notification is sent.
  3. Reviewer action. A REVIEWER, ADMIN, or OWNER opens the queue item, reads the policy violations, and approves or rejects with an optional note.
  4. Dispatch or closure. Approval fires the downstream dispatch (email provider, webhook, etc.) via the reliability layer. Rejection closes the item with no outbound action.
  5. Audit record. Every decision is appended to the tamper-evident audit chain with reviewed_by, reviewed_at, and the decision. Verifiable at GET /api/audit/verify-chain.
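
Step 2 of the workflow amounts to classifying each queued submission by its age. A minimal sketch, assuming the default warnMinutes and breachMinutes values and assuming that reaching a threshold exactly already counts as crossing it (the source does not say whether the boundary is inclusive):

```python
from datetime import datetime


def sla_state(created_at: datetime, now: datetime,
              warn_minutes: int = 10, breach_minutes: int = 30) -> str:
    """Classify a QUEUED submission as "ok", "warn" or "breach" by its age."""
    age_minutes = (now - created_at).total_seconds() / 60
    if age_minutes >= breach_minutes:
        return "breach"  # the watchdog would set sla_breached_at at this point
    if age_minutes >= warn_minutes:
        return "warn"    # shown as a yellow badge in the queue
    return "ok"
```

Passing now explicitly, rather than reading the clock inside the function, keeps the classification deterministic and easy to test.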

Assigning SLAs and escalation targets

For high-stakes submissions, reviewers can set a hard deadline and an escalation contact:

# Set a 2-hour SLA on a specific submission with a named escalation contact
PATCH /api/gate/{submissionId}/sla
{
  "dueAt": "2026-05-14T18:00:00.000Z",
  "escalateTo": "user_compliance_lead",
  "note": "Finance team deal — must clear before 6pm."
}

# Configure workspace-wide default SLA.
# escalateTo: null broadcasts escalations to all ADMINs.
PUT /api/workspaces/{slug}/sla
{
  "warnMinutes": 10,
  "breachMinutes": 30,
  "autoEscalate": true,
  "escalateTo": null
}
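
Client code often needs to build these request bodies programmatically, for example to turn "two hours from now" into a dueAt timestamp. The helpers below are a hypothetical sketch that only constructs the JSON payloads and sends nothing. The function names are invented for illustration, and the check that warnMinutes is smaller than breachMinutes is an assumed sanity rule rather than documented server behavior.

```python
from datetime import datetime, timedelta, timezone


def submission_sla_payload(now, hours, escalate_to, note=""):
    """Body for PATCH /api/gate/{submissionId}/sla, with dueAt set `hours` after `now`.

    `now` must be a timezone-aware datetime; the result is rendered in UTC.
    """
    due = now.astimezone(timezone.utc) + timedelta(hours=hours)
    due_at = due.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return {"dueAt": due_at, "escalateTo": escalate_to, "note": note}


def workspace_sla_payload(warn_minutes, breach_minutes,
                          auto_escalate=True, escalate_to=None):
    """Body for PUT /api/workspaces/{slug}/sla.

    escalate_to=None serializes to JSON null, which the docs describe as
    broadcasting to all ADMINs.
    """
    if not 0 < warn_minutes < breach_minutes:
        # Assumed rule: a warning must fire before the breach does.
        raise ValueError("expected 0 < warn_minutes < breach_minutes")
    return {
        "warnMinutes": warn_minutes,
        "breachMinutes": breach_minutes,
        "autoEscalate": auto_escalate,
        "escalateTo": escalate_to,
    }
```

Either dictionary can be passed to json.dumps and sent with whichever HTTP client the integration already uses.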

On-call runbook

When the slo-watchdog emits a slo.gate_queue.sla_breach system event:

  1. Go to /queue and sort by oldest first. Identify submissions with a red "SLA BREACH" badge.
  2. Triage each breached item: decide whether the violation requires legal review or is a clear pass, then approve it or reject it with a note.
  3. If queue depth > backup threshold: page the designated on-call reviewer (configured in workspace SLA → escalateTo). Do not batch-approve to clear the queue — each submission needs individual review.
  4. If the Core API is down (slo.core_api.unreachable): new submissions are blocked at the edge gateway. Existing QUEUED items can still be reviewed and approved from the dashboard. Check Render dashboard for core-api status.
  5. After resolution: verify audit chain integrity at /audit → "Verify chain". Ensure ok: true.
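
Steps 1 and 3 of the runbook can be expressed as a small triage helper: order the queue oldest first, pick out breached items, and decide whether the backlog justifies paging. This is a sketch over plain dicts with hypothetical field names (id, created_at, sla_breached_at), not a DataVibe API. It deliberately returns only a review order and never approves anything, since each submission needs individual review.

```python
def triage(queue, backup_threshold=100):
    """Summarize a list of QUEUED submissions for the on-call reviewer.

    Each item is a dict with id, created_at (any sortable timestamp) and
    sla_breached_at (None if the item has not breached).
    """
    ordered = sorted(queue, key=lambda s: s["created_at"])
    return {
        "review_order": [s["id"] for s in ordered],  # oldest first
        "breached": [s["id"] for s in ordered if s["sla_breached_at"] is not None],
        "page_on_call": len(queue) > backup_threshold,
    }
```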

Audit chain break runbook

  1. Call GET /api/audit/verify-chain?workspace=<slug>. The response includes firstBreakAt (the row ID and timestamp where the chain diverges).
  2. Identify the break: was it a manual DB edit, a failed write, or a race condition? Check systemEvent rows for kind = audit_log_write_failed near the break timestamp.
  3. Do NOT delete or modify the broken rows — this is an immutable log. Open an incident ticket and escalate to engineering.
  4. Communicate to affected workspace admins that the integrity break has been detected and is under investigation.
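
For readers unfamiliar with tamper-evident logs, the sketch below shows one common way a hash chain can be built and verified, and why the first break can be located precisely. The hashing scheme (SHA-256 over the previous hash plus a canonical JSON payload) and the row layout are assumptions for illustration, since the source does not describe DataVibe's actual chain format. Here firstBreakAt is simply a list index, whereas the real endpoint reports a row ID and timestamp.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash that precedes the first row


def row_hash(prev_hash, payload):
    """Hash a row's payload together with the hash of the row before it."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_row(chain, payload):
    """Append a decision record, linking it to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload, "hash": row_hash(prev, payload)})


def verify_chain(chain):
    """Walk the chain from the start and report the first row that does not link up."""
    prev = GENESIS
    for index, row in enumerate(chain):
        if row["prev_hash"] != prev or row["hash"] != row_hash(prev, row["payload"]):
            return {"ok": False, "firstBreakAt": index}
        prev = row["hash"]
    return {"ok": True, "firstBreakAt": None}
```

Because every hash depends on all earlier rows, editing one record in place invalidates that row on verification. This is also why the broken rows should be preserved as evidence rather than repaired.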

Reviewer efficiency signals

The SLA performance dashboard shows per-reviewer metrics. Three signals warrant follow-up:

  • Rejection rate = 0% with > 10 decisions (rubber-stamping). A reviewer approving everything without rejections may not be reading submissions carefully. Review their queue history and consider spot-checking approved items.
  • Rejection rate > 80% (over-blocking). The reviewer may have a misaligned understanding of policy intent. Schedule a calibration session with the policy owner.
  • p95 decision time > 2× breach threshold. The reviewer is a bottleneck. Consider adding a second reviewer to the workspace or adjusting the escalation path.
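
The three signals reduce to simple threshold checks on per-reviewer counts. A minimal sketch, assuming the dashboard exposes approved and rejected decision counts plus a p95 decision time in minutes. The function name, flag labels, and input shape are hypothetical.

```python
def reviewer_flags(approved, rejected, p95_decision_minutes, breach_minutes=30):
    """Return the follow-up signals that apply to one reviewer."""
    total = approved + rejected
    flags = []
    if total > 10 and rejected == 0:
        flags.append("rubber_stamping")   # 0% rejections over more than 10 decisions
    if total > 0 and rejected / total > 0.8:
        flags.append("over_blocking")     # rejection rate above 80%
    if p95_decision_minutes > 2 * breach_minutes:
        flags.append("bottleneck")        # p95 above twice the breach threshold
    return flags
```

A flag is a prompt for a human follow-up such as a spot check or calibration session, not an automatic judgment about the reviewer.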