Outages And Recovery

What Happens If Kvasyr Is Down

While the process is down:

No new chain indexing ticks run.
No webhook deliveries are attempted.
No backfill jobs are processed.

After restart:

Indexing resumes from the last confirmed point for each chain.
Kvasyr catches up to current finalized chain state, then returns to normal near-real-time delivery.
Finality still applies: events from the newest head blocks are held until those blocks pass the chain's finality_depth.
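
To make the finality rule concrete, here is a minimal sketch of how a finality cutoff is commonly computed. It assumes finality_depth is measured in blocks behind the chain head, which this page does not confirm. The function names finalizedHeight and isDeliverable are hypothetical and not part of Kvasyr.

```javascript
// Hypothetical helpers; assumes finality_depth counts blocks behind the head.
function finalizedHeight(headBlock, finalityDepth) {
  // Highest block number treated as final. A negative result means the
  // chain is shorter than the depth, so nothing is final yet.
  return headBlock - finalityDepth;
}

function isDeliverable(eventBlock, headBlock, finalityDepth) {
  return eventBlock <= finalizedHeight(headBlock, finalityDepth);
}

// With the head at block 1000 and a depth of 12, block 988 is final and 989 is not.
console.log(isDeliverable(988, 1000, 12)); // true
console.log(isDeliverable(989, 1000, 12)); // false
```

Under this assumption, events in the last finality_depth blocks stay pending after a restart even once catch-up has finished.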

What Catch-Up Means For Webhooks

During catch-up, webhook traffic can spike temporarily because delayed finalized events are delivered in quick succession.

Delivery semantics remain at-least-once.
Event arrival order can vary during catch-up and retries.
Your webhook handler should be idempotent.
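
Because arrival order can vary, one common approach is to order by a position carried in the event itself rather than by arrival time. The sketch below assumes each payload carries entityId, blockNumber, and logIndex fields. These field names are illustrative and are not confirmed by Kvasyr's payload schema.

```javascript
// Hypothetical payload fields: entityId, blockNumber, logIndex, value.
const latestByEntity = new Map();

// True when event a happened on-chain strictly after event b.
function isNewer(a, b) {
  if (a.blockNumber !== b.blockNumber) return a.blockNumber > b.blockNumber;
  return a.logIndex > b.logIndex;
}

// Apply an event only if it is newer than what is already stored.
function applyEvent(event) {
  const current = latestByEntity.get(event.entityId);
  if (current && !isNewer(event, current)) return false; // stale or duplicate
  latestByEntity.set(event.entityId, event);
  return true;
}

// A late-arriving older event does not overwrite newer state.
applyEvent({ entityId: "acct-1", blockNumber: 120, logIndex: 3, value: "new" });
applyEvent({ entityId: "acct-1", blockNumber: 118, logIndex: 0, value: "old" });
console.log(latestByEntity.get("acct-1").value); // "new"
```

This pattern suits "latest state" projections. If every event must be applied (for example, a ledger), store events keyed by ID and sort by on-chain position when reading instead of discarding older ones.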

If A Client Webhook Is Unreachable

Success condition:

Only HTTP 2xx marks a delivery as delivered.

Failure behavior (4xx, 5xx, timeouts, connection errors):

Kvasyr retries automatically with exponential backoff.
Retries stop after the configured maximum attempts.
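
The retry timing can be pictured with a small schedule calculation. The base delay, multiplier, cap, and attempt limit below are illustrative assumptions, not Kvasyr's actual defaults, and the sketch also assumes the attempt limit includes the initial delivery. Check your configuration for the real values.

```javascript
// Illustrative values only; not Kvasyr's real defaults.
function backoffScheduleSeconds({ baseSeconds, factor, capSeconds, maxAttempts }) {
  const delays = [];
  // Attempt 1 is the initial delivery, so there are maxAttempts - 1 retries.
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    delays.push(Math.min(capSeconds, baseSeconds * factor ** retry));
  }
  return delays;
}

const delays = backoffScheduleSeconds({
  baseSeconds: 10,
  factor: 2,
  capSeconds: 3600,
  maxAttempts: 6,
});
console.log(delays); // [ 10, 20, 40, 80, 160 ]
```

Under these assumed values, an endpoint has 310 seconds in total to recover before retries stop. Running the same calculation with your configured values shows how long an endpoint outage can last before deliveries need a manual retry.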

If the endpoint becomes reachable later:

If the maximum number of attempts has not been reached, a later retry can still succeed.
If the maximum number of attempts has been reached, an admin can manually retry the delivery and, when needed, reset the attempt counters for a clean re-drive.

Tracking Catch-Up Bursts

Deduplicate by event ID and track unique versus duplicate traffic during recovery windows.

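// Sketch of an idempotent webhook handler (Express-style req/res).
// seenBefore, markSeen, and processBusinessLogic are placeholders you implement.
async function handleWebhook(req, res) {
  const eventId = req.header("x-kvasyr-event-id") ?? req.body?.id;
  if (!eventId) return res.status(400).send("missing event id");

  // Answer duplicates with 2xx so they are not retried again.
  if (await seenBefore(eventId)) {
    return res.status(200).send("duplicate ignored");
  }

  try {
    // Process before marking as seen, so a failed attempt can be retried.
    await processBusinessLogic(req.body);
    await markSeen(eventId, { ttlSeconds: 7 * 24 * 60 * 60 }); // keep IDs for 7 days
  } catch (err) {
    // A non-2xx response tells Kvasyr the delivery failed and should be retried.
    return res.status(500).send("processing failed");
  }
  return res.status(200).send("ok");
}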

Operationally, it helps to chart:

Incoming webhook rate per minute.
Unique event IDs per minute.
Duplicate ratio (duplicates / total).
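
The three series above can be derived from two counters per minute. The following is a minimal in-memory sketch. The DeliveryStats class is hypothetical, and a real deployment would more likely emit these counters to an existing metrics system and use the same shared store as the deduplication check.

```javascript
// Per-minute counters for webhook traffic; in-memory for illustration only.
class DeliveryStats {
  constructor() {
    this.buckets = new Map(); // minute start (ms) -> { total, unique }
    this.seen = new Set();    // stand-in for a shared dedup store
  }

  // Count one incoming delivery; "unique" means first time this ID is seen.
  record(eventId, nowMs = Date.now()) {
    const minute = Math.floor(nowMs / 60000) * 60000;
    const bucket = this.buckets.get(minute) ?? { total: 0, unique: 0 };
    bucket.total += 1;
    if (!this.seen.has(eventId)) {
      this.seen.add(eventId);
      bucket.unique += 1;
    }
    this.buckets.set(minute, bucket);
  }

  // duplicates / total for the minute starting at the given timestamp.
  duplicateRatio(minute) {
    const bucket = this.buckets.get(minute);
    if (!bucket || bucket.total === 0) return 0;
    return (bucket.total - bucket.unique) / bucket.total;
  }
}

const stats = new DeliveryStats();
["evt-1", "evt-2", "evt-1", "evt-3"].forEach((id) => stats.record(id, 0));
console.log(stats.duplicateRatio(0)); // 0.25
```

A rising incoming rate with a steady unique rate suggests retries or re-drives rather than new events.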

Client Footguns Checklist

Deduplicate by event identity (payload.id or X-Kvasyr-Event-Id).
Treat delivery as at-least-once, not exactly-once.
Do not assume strict ordering.
Accept delayed deliveries after outages and process by event content, not arrival time.
Verify signatures and timestamp freshness on every request.
Monitor delivery failures and the retry backlog so you can respond before events exhaust their maximum attempts.
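
As an illustration of the signature and freshness check, the sketch below uses Node's built-in crypto module. The signing scheme is an assumption: a hex-encoded HMAC-SHA256 over `${timestamp}.${rawBody}`, with a timestamp that reflects when the request was sent rather than when the event occurred. Kvasyr's actual scheme, header names, and tolerance may differ, so treat sign and verifyWebhook as hypothetical helpers.

```javascript
const crypto = require("node:crypto");

// Assumed scheme (not confirmed by Kvasyr): hex HMAC-SHA256 of `${timestamp}.${rawBody}`.
function sign(secret, timestamp, rawBody) {
  return crypto
    .createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");
}

function verifyWebhook({ secret, timestamp, rawBody, signature, nowSeconds, toleranceSeconds = 300 }) {
  // Reject stale or far-future timestamps to limit replay.
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > toleranceSeconds) return false;

  const expected = Buffer.from(sign(secret, timestamp, rawBody), "hex");
  const received = Buffer.from(String(signature), "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return crypto.timingSafeEqual(expected, received);
}

const secret = "test-secret";
const body = JSON.stringify({ id: "evt-1" });
const good = sign(secret, "1700000000", body);
console.log(verifyWebhook({ secret, timestamp: "1700000000", rawBody: body, signature: good, nowSeconds: 1700000060 })); // true
console.log(verifyWebhook({ secret, timestamp: "1700000000", rawBody: body, signature: good, nowSeconds: 1700009999 })); // false
```

Verify against the raw request body bytes, not a re-serialized JSON object, since re-serialization can change the bytes that were signed.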

Configuration Overview