Webhook Retry Strategies: Exponential Backoff Done Right

How major providers retry failed webhooks — Stripe, GitHub, Shopify — and how to design retry logic for your own outbound webhooks.

Ozer

Developer & Founder of HookSense

Webhook delivery is best-effort. Networks drop packets, servers crash, and acknowledgements get lost. The provider's job is to retry until your handler confirms success or the retry window runs out. The receiver's job is to handle those retries gracefully.

This guide covers how the major providers retry, how to design your own outbound retry logic, and the role of exponential backoff with jitter.

How major providers retry

| Provider | Retry window | Schedule |
| --- | --- | --- |
| Stripe | 3 days (live), 3 hours (test) | Exponential: 5s, 5m, 30m, 2h, etc. |
| GitHub | 8 hours | Up to 5 retries with increasing delay |
| Shopify | 48 hours | 19 retries with exponential backoff |
| Slack | 1 hour | 3 retries: 1m, 5m, 30m |
| Twilio | 24 hours | ~3 retries depending on response |

All of them retry on non-2xx responses and timeouts. Most treat 410 Gone as "stop retrying" — useful if you want a provider to stop sending to a removed endpoint.
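From the sender's side, the rule in the paragraph above could be sketched roughly as follows. The names `DeliveryOutcome` and `classifyResponse` are hypothetical and not taken from any provider's API, and real providers may treat individual status codes differently.

```typescript
// Hypothetical names for illustration; not any provider's actual API.
type DeliveryOutcome = "delivered" | "retry" | "stop";

/**
 * Decide what to do after one delivery attempt.
 * `status` is the receiver's HTTP status code, or null when the request
 * timed out or the connection failed before a response arrived.
 */
function classifyResponse(status: number | null): DeliveryOutcome {
  if (status === null) return "retry"; // timeout or network error
  if (status >= 200 && status < 300) return "delivered"; // any 2xx counts as success
  if (status === 410) return "stop"; // 410 Gone: the endpoint was removed
  return "retry"; // every other non-2xx response
}
```

A sender that wants to give up earlier on other 4xx codes would add those cases before the final `return`.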

Why exponential backoff

If a provider retried instantly after a failure, your server would get hammered the moment it recovered — and likely fail again from the load. Exponential backoff spaces retries out: 1 min → 5 min → 25 min → 2 hr. Each attempt gives your service more time to actually recover before the next batch arrives.

The common pattern: each retry's delay is roughly 2-5× the previous one. The total retry window typically caps at a few days because, beyond that, the event is probably stale anyway.
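To make the arithmetic concrete, here is a small sketch that lists the delays an exponential schedule produces until the retry window is used up. The function name `backoffSchedule` and its parameters are assumptions made for this example, not a provider's documented schedule.

```typescript
/**
 * List the retry delays (in ms) of a plain exponential schedule.
 * Each delay is `factor` times the previous one; the schedule stops
 * once the next delay would push the total past `windowMs`.
 */
function backoffSchedule(initialMs: number, factor: number, windowMs: number): number[] {
  if (initialMs <= 0 || factor < 1) {
    throw new RangeError("initialMs must be > 0 and factor must be >= 1");
  }
  const delays: number[] = [];
  let delay = initialMs;
  let elapsed = 0;
  while (elapsed + delay <= windowMs) {
    delays.push(delay);
    elapsed += delay;
    delay *= factor;
  }
  return delays;
}

// With a 1-minute start, a 5x factor and a 3-day window, the delays in
// minutes are 1, 5, 25, 125, 625, 3125 (six retries in total).
```

Note how few attempts fit: with a 5× factor, only six retries land inside a three-day window.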

Why jitter

Pure exponential backoff has a problem: if 10,000 receivers behind the same load balancer all return 503 at the same second, all 10,000 retries fire at the same second 1 minute later. The recovering service gets a thundering herd.

Adding random jitter (±20% of the calculated delay) spreads retry timing across receivers. Stripe, AWS SNS, and most modern providers add jitter by default.

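// Delay in milliseconds before retry number `attempt` (0-based).
function nextDelay(attempt: number): number {
  // Double the delay on each attempt, starting at 1 minute and capping at 2 hours.
  const base = Math.min(60_000 * Math.pow(2, attempt), 2 * 60 * 60_000);
  // Shift the delay by a random amount within ±20% so retries don't fire in lockstep.
  const jitter = base * 0.2 * (Math.random() * 2 - 1);
  return Math.round(base + jitter);
}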
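Because the function above calls `Math.random()` directly, its output is hard to check in a unit test. One possible variant, shown here as a sketch, takes the random source as a parameter so the ±20% bounds can be verified deterministically. The name `nextDelayWith` and the `random` parameter are additions made for this example.

```typescript
/**
 * Same calculation as nextDelay, but with an injectable random source.
 * `random` must return a value in [0, 1), like Math.random.
 */
function nextDelayWith(attempt: number, random: () => number = Math.random): number {
  const base = Math.min(60_000 * 2 ** attempt, 2 * 60 * 60_000); // 1 min doubling, capped at 2h
  const jitter = base * 0.2 * (random() * 2 - 1); // between -20% and +20% of base
  return Math.round(base + jitter);
}

// nextDelayWith(0, () => 0)   returns 48000 (lower bound, base minus 20%)
// nextDelayWith(0, () => 0.5) returns 60000 (no jitter)
```

Note that the jitter is applied after the cap, so the longest possible delay is slightly above two hours.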

Receiver-side guidance

If you're receiving webhooks, your job is to:

  1. Return 2xx fast (under 5 seconds, ideally under 1). If your business logic takes longer, persist the event and respond 200 immediately — process out-of-band.
  2. Handle retries idempotently. See the idempotency guide.
  3. Return 410 Gone if you've removed an endpoint and don't want more retries.
  4. Return 5xx, not 4xx, on transient failures. 4xx tells most providers "this will never succeed" and they'll give up faster.
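The four points above can be sketched as a framework-free receiver. Everything here is hypothetical: `createReceiver`, `WebhookEvent`, and the in-memory `Set` and array stand in for whatever HTTP framework, durable store, and queue you actually use, and the sketch assumes each event carries a unique `id`.

```typescript
// Hypothetical event shape; real providers name the ID field differently.
interface WebhookEvent {
  id: string;
  payload: unknown;
}

/**
 * Build a receiver. `storeAvailable` simulates whether the durable
 * store can accept writes right now (an assumption for this sketch).
 */
function createReceiver(storeAvailable: () => boolean = () => true) {
  const seenIds = new Set<string>(); // in production: a durable dedupe table
  const pending: WebhookEvent[] = []; // in production: a persistent queue

  /** Returns the HTTP status to send back. Runs no business logic inline. */
  function receive(event: WebhookEvent): number {
    if (seenIds.has(event.id)) return 200; // duplicate retry: acknowledge and skip
    if (!storeAvailable()) return 503; // transient failure: 5xx asks for a retry
    seenIds.add(event.id);
    pending.push(event); // persist first, process out-of-band
    return 200;
  }

  /** Out-of-band worker: handles queued events and returns how many ran. */
  function drain(handle: (event: WebhookEvent) => void): number {
    let processed = 0;
    let next: WebhookEvent | undefined;
    while ((next = pending.shift()) !== undefined) {
      handle(next);
      processed++;
    }
    return processed;
  }

  return { receive, drain };
}
```

Returning 410 for a removed endpoint belongs in the routing layer in front of this function, so it is left out of the sketch.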

Sender-side guidance (you're emitting webhooks)

If you're sending webhooks to your own customers, design retry logic that mirrors what Stripe does:

  • Queue every event. Don't fire-and-forget.
  • Mark deliveries as failed on non-2xx or timeout.
  • Retry with exponential backoff and jitter, with the total retry window capped at 24-48 hours.
  • Expose a delivery log to your customers so they can replay failed events.
  • Auto-disable endpoints after N consecutive failures over M days (Stripe disables after 30 days of failure).
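As a rough illustration of the bookkeeping behind this list, the sketch below records one delivery attempt and decides whether to schedule another. All type and function names are hypothetical, and it makes a simplifying assumption: an endpoint is disabled after N consecutive failures, without the "over M days" condition.

```typescript
interface RetryPolicy {
  baseDelayMs: number; // delay before the first retry
  factor: number; // multiplier applied on each further retry
  windowMs: number; // total retry window, measured from the first attempt
  disableAfter: number; // consecutive failures before the endpoint is disabled
}

interface Endpoint {
  consecutiveFailures: number;
  disabled: boolean;
}

interface Delivery {
  firstAttemptAt: number;
  attempts: number;
  status: "pending" | "delivered" | "failed";
  nextAttemptAt: number | null;
  log: { at: number; ok: boolean }[]; // delivery log to expose to customers
}

/** Record the result of one attempt and schedule the next one if allowed. */
function recordAttempt(
  delivery: Delivery,
  endpoint: Endpoint,
  ok: boolean,
  now: number,
  policy: RetryPolicy,
  random: () => number = Math.random,
): void {
  delivery.attempts += 1;
  delivery.log.push({ at: now, ok });

  if (ok) {
    delivery.status = "delivered";
    delivery.nextAttemptAt = null;
    endpoint.consecutiveFailures = 0;
    return;
  }

  endpoint.consecutiveFailures += 1;
  if (endpoint.consecutiveFailures >= policy.disableAfter) endpoint.disabled = true;

  // Exponential backoff with ±20% jitter.
  const base = policy.baseDelayMs * policy.factor ** (delivery.attempts - 1);
  const next = now + Math.round(base * (1 + 0.2 * (random() * 2 - 1)));

  if (endpoint.disabled || next - delivery.firstAttemptAt > policy.windowMs) {
    delivery.status = "failed"; // stays in the log so it can be replayed later
    delivery.nextAttemptAt = null;
  } else {
    delivery.nextAttemptAt = next;
  }
}
```

A queue worker would pick up deliveries whose `nextAttemptAt` has passed, send them, and call `recordAttempt` with the outcome.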

Or skip building this yourself and use Hookdeck or Svix as an outbound webhook gateway. See Hookdeck vs HookSense for the tradeoffs.

Testing retry behavior

HookSense's auto-retry feature (Sense plan) replays failed captured webhooks with exponential backoff — useful for testing how your handler behaves under realistic retry patterns. Set a request to fail, observe the retry schedule, confirm idempotency holds.
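The same kind of check can also be approximated locally in a unit test. The sketch below is independent of HookSense and uses only hypothetical helper names: it delivers one event several times, as a provider would after lost acknowledgements or 5xx responses, and lets you assert that the side effect ran exactly once.

```typescript
type Handler = (eventId: string) => void;

/** Wrap a handler so each event ID is processed at most once. */
function makeIdempotent(handler: Handler): Handler {
  const done = new Set<string>(); // in production: a durable store
  return (eventId) => {
    if (done.has(eventId)) return; // retry of an event that already succeeded
    handler(eventId); // if this throws, the ID is not marked as done
    done.add(eventId);
  };
}

/**
 * Deliver the same event `times` times, as repeated retries would.
 * A thrown error stands in for a 5xx response; returns the failure count.
 */
function deliverRepeatedly(handler: Handler, eventId: string, times: number): number {
  let failures = 0;
  for (let i = 0; i < times; i++) {
    try {
      handler(eventId);
    } catch {
      failures++;
    }
  }
  return failures;
}
```

Wrapping a handler that increments a counter and delivering the same ID three times should leave the counter at 1.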
