Outages And Recovery

What Happens If Kvasyr Is Down

While the process is down:

  • No new chain indexing ticks run.
  • No webhook deliveries are attempted.
  • No backfill jobs are processed.

After restart:

  • Indexing resumes from the last confirmed point for each chain.
  • Kvasyr catches up to current finalized chain state, then returns to normal near-real-time delivery.
  • Finality rules still apply during catch-up: Kvasyr uses chain finality tags (finalized/safe) where the chain supports them, and falls back to a block-depth threshold otherwise.
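
The resume behavior above can be pictured with a minimal sketch. This is not Kvasyr's actual implementation: the function names, the shape of the inputs, and the fallback depth of 12 are all assumptions made for illustration.

```javascript
// Hypothetical sketch: pick the highest block number that is safe to index.
// `finalizedBlock` is the block number reported by the chain's finalized (or safe)
// tag, or null when the chain does not support finality tags.
// `fallbackDepth` is an assumed value, not a Kvasyr default.
function finalityCutoff({ headBlock, finalizedBlock = null, fallbackDepth = 12 }) {
  if (finalizedBlock !== null) {
    // Never report a cutoff beyond the current head.
    return Math.min(finalizedBlock, headBlock);
  }
  // Depth fallback: treat blocks at least `fallbackDepth` behind head as final.
  return Math.max(headBlock - fallbackDepth, 0);
}

// After a restart, the range to catch up on is (lastConfirmed, cutoff].
// Returns null when there is nothing new to index.
function catchUpRange(lastConfirmed, cutoff) {
  return cutoff > lastConfirmed ? { from: lastConfirmed + 1, to: cutoff } : null;
}
```

For example, with a last confirmed block of 80 and a finalized block of 90, the sketch would index blocks 81 through 90 before returning to near-real-time operation.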

What Catch-Up Means For Webhooks

During catch-up, webhook traffic can spike temporarily because events that finalized during the outage are delivered in quick succession.

  • Delivery semantics remain at-least-once.
  • Event arrival order can vary during catch-up and retries.
  • Your webhook handler should be idempotent.

If A Client Webhook Is Unreachable

Success condition:

  • Only HTTP 2xx marks a delivery as delivered.

Failure behavior (4xx, 5xx, timeout, connect errors):

  • Kvasyr retries automatically with exponential backoff.
  • Retries stop after the configured maximum attempts.
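
As a rough illustration of the retry rules above, the sketch below computes an exponential backoff delay and decides whether another attempt is due. The base delay, growth factor, cap, and function names are assumptions chosen for the example; Kvasyr's configured values may differ.

```javascript
// Hypothetical sketch of an exponential backoff schedule.
// baseMs, factor, and capMs are illustrative values, not Kvasyr defaults.
function retryDelayMs(attempt, { baseMs = 1000, factor = 2, capMs = 3600000 } = {}) {
  // `attempt` is 1-based: the delay to wait before retry number `attempt`.
  return Math.min(baseMs * factor ** (attempt - 1), capMs);
}

// `statusCode` is the HTTP status of the last attempt, or null for
// timeouts and connection errors.
function shouldRetry(statusCode, attemptsSoFar, maxAttempts) {
  const delivered = statusCode !== null && statusCode >= 200 && statusCode < 300;
  // Only 2xx counts as delivered; everything else retries until the limit.
  return !delivered && attemptsSoFar < maxAttempts;
}
```

With these assumed settings, the waits before the first four retries would be 1 s, 2 s, 4 s, and 8 s, growing until the one-hour cap.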

If the endpoint becomes reachable later:

  • If the maximum number of attempts has not been reached, a later retry can still succeed.
  • If the maximum number of attempts has been reached, an admin can retry manually and, when needed, reset the attempt counters for a clean re-drive.

Tracking Catch-Up Bursts

Deduplicate by event ID, and track unique versus duplicate traffic during recovery windows.

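// Sketch of an idempotent webhook handler (Express-style req/res).
// seenBefore, markSeen, and processBusinessLogic are placeholders for your own
// storage and application logic.
async function handleWebhook(req, res) {
  const events = Array.isArray(req.body?.events) ? req.body.events : [];
  if (events.length === 0) return res.status(400).send("missing events");

  for (const event of events) {
    const eventId = event?.id;
    if (!eventId) continue; // skip malformed entries that carry no identity
    if (await seenBefore(eventId)) continue; // duplicate delivery, already handled
    await processBusinessLogic(event);
    // Mark as seen only after processing succeeds. If processing throws, the
    // request fails with a non-2xx status and the retry can process the event again.
    await markSeen(eventId, { ttlSeconds: 7 * 24 * 60 * 60 }); // remember for 7 days
  }
  return res.status(200).send("ok");
}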

Operationally, it helps to chart:

  • Incoming webhook rate per minute.
  • Unique event IDs per minute.
  • Duplicate ratio (duplicates / total).
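
The three metrics above could be collected with something like the following in-memory sketch. The class and method names are hypothetical, it counts individual events rather than HTTP requests, and a production setup would more likely emit counters to an existing metrics system than hold state in process memory.

```javascript
// Hypothetical in-memory counter for per-minute webhook metrics.
class DedupMetrics {
  constructor() {
    this.seen = new Set(); // event IDs observed so far
    this.buckets = new Map(); // minute start (ms) -> { total, unique, duplicates }
  }

  // Record one received event; `nowMs` is injectable to simplify testing.
  record(eventId, nowMs = Date.now()) {
    const minute = Math.floor(nowMs / 60000) * 60000;
    const bucket = this.buckets.get(minute) ?? { total: 0, unique: 0, duplicates: 0 };
    bucket.total += 1;
    if (this.seen.has(eventId)) {
      bucket.duplicates += 1;
    } else {
      this.seen.add(eventId);
      bucket.unique += 1;
    }
    this.buckets.set(minute, bucket);
  }

  // Duplicate ratio (duplicates / total) for the minute starting at `minuteMs`.
  duplicateRatio(minuteMs) {
    const bucket = this.buckets.get(minuteMs);
    return bucket && bucket.total > 0 ? bucket.duplicates / bucket.total : 0;
  }
}
```

A duplicate ratio that rises during a recovery window and then falls back is consistent with at-least-once catch-up delivery; a ratio that stays high may indicate that your endpoint is returning non-2xx responses for events it has already processed.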

Client Footguns Checklist

  • Deduplicate by event identity (payload.events[i].id).
  • Treat delivery as at-least-once, not exactly-once.
  • Do not assume strict ordering.
  • Accept delayed deliveries after outages and process by event content, not arrival time.
  • Verify signatures and timestamp freshness on every request.
  • Monitor delivery failures and retry backlog so you can respond before events age out.
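
For the signature and freshness check in the list above, here is a minimal sketch. This page does not describe Kvasyr's signing scheme, so everything specific below is an assumption: a hex-encoded HMAC-SHA256 over `${timestamp}.${rawBody}` with a shared secret, a timestamp in Unix seconds that reflects when the request was signed (not when the event occurred), and a 5-minute tolerance. Adapt it to the scheme Kvasyr actually documents.

```javascript
const crypto = require("node:crypto");

// ASSUMPTION: signature = hex HMAC-SHA256 of `${timestamp}.${rawBody}` using a
// shared secret. This scheme is illustrative only, not Kvasyr's documented one.
function verifyWebhook({ rawBody, timestamp, signature, secret, nowMs = Date.now(), toleranceSec = 300 }) {
  const ts = Number(timestamp);
  if (!Number.isFinite(ts)) return false;
  // Reject stale or far-future signing timestamps to limit replay.
  if (Math.abs(nowMs / 1000 - ts) > toleranceSec) return false;

  const expected = crypto
    .createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();
  const given = Buffer.from(String(signature), "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}
```

Verification should run against the raw request body, before JSON parsing, since re-serializing the payload can change the bytes that were signed.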