Webhook Retry Strategies: Exponential Backoff Done Right
How major providers retry failed webhooks — Stripe, GitHub, Shopify — and how to design retry logic for your own outbound webhooks.
Ozer
Developer & Founder of HookSense
Webhook delivery is best-effort. Networks drop packets, servers crash, ack messages get lost. The provider's job is to retry until your handler confirms success. The receiver's job is to handle those retries gracefully.
This guide covers how the major providers retry, how to design your own outbound retry logic, and the role of exponential backoff with jitter.
How major providers retry
| Provider | Retry window | Schedule |
|---|---|---|
| Stripe | 3 days (live), 3 hours (test) | Exponential — 5s, 5m, 30m, 2h, etc. |
| GitHub | None (no automatic retries) | Failed deliveries must be redelivered manually, via the UI or REST API |
| Shopify | 48 hours | 19 retries with exponential backoff |
| Slack | A few minutes | 3 retries: almost immediately, then after 1m and 5m |
| Twilio | 24 hours | ~3 retries depending on response |
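The providers that retry automatically do so on non-2xx responses and timeouts. Some senders also treat 410 Gone as a signal to stop retrying, which is useful if you want a provider to stop sending to a removed endpoint; check each provider's documentation before relying on it.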
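That shared rule fits in a tiny classifier. This is a sketch assuming a sender that retries every non-2xx response and every timeout but treats 410 Gone as a permanent stop; `classifyResponse` and `Outcome` are hypothetical names, and each real provider differs in the details.

```typescript
// Hypothetical retry decision for a webhook sender.
// `status` is the HTTP status code, or null if the request timed out
// or the connection failed before any response arrived.
type Outcome = "delivered" | "retry" | "stop";

function classifyResponse(status: number | null): Outcome {
  if (status === null) return "retry"; // timeout or network error: try again later
  if (status >= 200 && status < 300) return "delivered"; // any 2xx counts as success
  if (status === 410) return "stop"; // 410 Gone: endpoint removed, stop retrying
  return "retry"; // every other status (3xx, 4xx, 5xx) is retried
}
```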
Why exponential backoff
If a provider retried instantly after a failure, your server would get hammered the moment it recovered — and likely fail again from the load. Exponential backoff spaces retries out: 1 min → 5 min → 25 min → 2 hr. Each attempt gives your service more time to actually recover before the next batch arrives.
The rule of thumb: each retry's delay is roughly 2 to 5 times the previous one. The total retry window typically caps at somewhere between a few hours and a few days, because beyond that the event is probably stale anyway.
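The 1 min → 5 min → 25 min → 2 hr progression multiplies each delay by 5 (125 minutes rounds to about 2 hours). A minimal sketch that lists retry delays until a total window is used up; `retrySchedule`, the factor of 5, the 6-hour cap, and the 3-day window are illustrative assumptions, not any provider's published values.

```typescript
// delay_k = firstDelayMs * factor^k, each delay capped at maxDelayMs,
// stopping before the cumulative wait would exceed windowMs.
// Assumes firstDelayMs > 0 and factor >= 1.
function retrySchedule(firstDelayMs: number, factor: number, maxDelayMs: number, windowMs: number): number[] {
  const delays: number[] = [];
  let delay = firstDelayMs;
  let elapsed = 0;
  while (true) {
    const capped = Math.min(delay, maxDelayMs);
    if (elapsed + capped > windowMs) break; // next retry would fall outside the window
    delays.push(capped);
    elapsed += capped;
    delay *= factor;
  }
  return delays;
}

const MIN = 60_000;
const schedule = retrySchedule(1 * MIN, 5, 6 * 60 * MIN, 3 * 24 * 60 * MIN);
```

With these inputs the schedule is 1, 5, 25 and 125 minutes, followed by eleven 6-hour waits, about 68.6 hours in total. The cap keeps late retries from drifting days apart while the window bounds how long a stale event keeps being sent.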
Why jitter
Pure exponential backoff has a problem: if 10,000 deliveries fail in the same second (say, because the load balancer in front of one receiver starts returning 503), all 10,000 retries fire in the same second one minute later. The recovering service gets hit by a thundering herd.
Adding random jitter (for example ±20% of the calculated delay) spreads those retries over a window instead of a single instant. Jitter is standard practice in retry logic, though most webhook providers don't publish exactly how much they apply.
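```typescript
// Delay in ms before retry number `attempt` (0-based): 1 min, 2 min, 4 min, ...
// capped at 2 h, then randomized by ±20% so simultaneous failures don't retry in lockstep.
function nextDelay(attempt: number): number {
  const base = Math.min(60_000 * Math.pow(2, attempt), 2 * 60 * 60_000); // cap at 2h
  const jitter = base * 0.2 * (Math.random() * 2 - 1); // uniform in [-20%, +20%)
  return Math.round(base + jitter);
}
```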
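To see what the ±20% buys, a rough simulation of the thundering-herd scenario: 10,000 deliveries fail at the same instant and each schedules its first retry one minute out, once without jitter and once with it. `busiestSecond` and `jittered` are hypothetical helpers; `jittered` applies the same ±20% formula as `nextDelay` so the snippet stands alone.

```typescript
const ONE_MINUTE = 60_000;

// Same jitter shape as nextDelay: base delay randomized by ±20%.
function jittered(baseMs: number): number {
  return Math.round(baseMs + baseMs * 0.2 * (Math.random() * 2 - 1));
}

// Largest number of retries that land in any single one-second bucket.
function busiestSecond(delaysMs: number[]): number {
  const counts: Record<number, number> = {};
  let max = 0;
  for (const d of delaysMs) {
    const second = Math.floor(d / 1000);
    counts[second] = (counts[second] ?? 0) + 1;
    if (counts[second] > max) max = counts[second];
  }
  return max;
}

const N = 10_000;
const fixedDelays: number[] = [];
const spreadDelays: number[] = [];
for (let i = 0; i < N; i++) {
  fixedDelays.push(ONE_MINUTE); // pure backoff: everyone waits exactly 60 s
  spreadDelays.push(jittered(ONE_MINUTE)); // jittered: anywhere from 48 s to 72 s
}
const withoutJitter = busiestSecond(fixedDelays);
const withJitter = busiestSecond(spreadDelays);
```

Without jitter all 10,000 retries share one second; with ±20% jitter they spread over roughly 24 seconds, so the busiest second sees a few hundred.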
Receiver-side guidance
If you're receiving webhooks, your job is to:
- Return 2xx fast. Provider timeouts are short (Slack expects a response within 3 seconds), so aim for under a second. If your business logic takes longer, persist the event, respond 200 immediately, and process it out-of-band.
- Handle retries idempotently: the same event can arrive more than once, so deduplicate on the provider's event ID. See the idempotency guide.
- Return 410 Gone if you've removed an endpoint and don't want more retries (for providers that honor it).
- Return 5xx, not 4xx, on transient failures. A 4xx says the request itself is wrong and won't succeed on retry, and some providers give up sooner on it.
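A framework-agnostic sketch of these rules: acknowledge fast, deduplicate on an event ID, return 5xx on transient failures and 410 for a retired endpoint. `handleDelivery`, `EventStore`, and `MemoryStore` are hypothetical stand-ins; a real handler would read the event ID from the provider's payload or headers and persist to a database or queue.

```typescript
// Minimal persistence interface; in production this is a database table or a queue.
interface EventStore {
  has(eventId: string): boolean;
  enqueue(eventId: string, body: string): void; // may throw on a transient failure
}

// Decide the HTTP status to return for one incoming delivery.
function handleDelivery(store: EventStore, endpointActive: boolean, eventId: string, body: string): number {
  if (!endpointActive) return 410; // endpoint retired: ask senders that honor 410 to stop
  if (store.has(eventId)) return 200; // duplicate (a retry): already stored, just acknowledge
  try {
    store.enqueue(eventId, body); // persist only; business logic runs later, off the request path
    return 200;
  } catch {
    return 503; // transient failure: a 5xx invites a retry
  }
}

// In-memory store for demonstration; `failNext` simulates a brief storage outage.
class MemoryStore implements EventStore {
  private seen: Record<string, boolean> = {};
  queue: string[] = [];
  failNext = false;

  has(eventId: string): boolean {
    return this.seen[eventId] === true;
  }

  enqueue(eventId: string, body: string): void {
    if (this.failNext) {
      this.failNext = false;
      throw new Error("storage unavailable");
    }
    this.seen[eventId] = true;
    this.queue.push(body);
  }
}
```

In production the duplicate check and the insert should be a single atomic operation (for example a unique constraint on the event ID), since two retries of the same event can arrive concurrently.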
Sender-side guidance (you're emitting webhooks)
If you're sending webhooks to your own customers, design retry logic that mirrors what Stripe does:
- Queue every event. Don't fire-and-forget.
- Mark deliveries as failed on non-2xx or timeout.
- Retry with exponential backoff and jitter, capped at 24-48 hours.
- Expose a delivery log to your customers so they can replay failed events.
- Auto-disable endpoints after N consecutive failures over M days (Stripe disables after 30 days of failure).
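The retry and auto-disable rules can be sketched as a state update that runs after each delivery attempt. Everything here is illustrative: the `Delivery` and `Endpoint` shapes, the 48-hour retry window, and the "20 failures over 3 days" disable threshold are assumptions, and the HTTP call is left out so the logic can be tested on its own. `backoffMs` uses the same formula as `nextDelay`.

```typescript
const MINUTE = 60_000;
const HOUR = 60 * MINUTE;
const DAY = 24 * HOUR;
const RETRY_WINDOW_MS = 48 * HOUR; // give up on an event after 48 h (assumption)
const DISABLE_AFTER_FAILURES = 20; // N consecutive failures... (assumption)
const DISABLE_AFTER_MS = 3 * DAY; // ...spanning at least M = 3 days (assumption)

interface Endpoint {
  consecutiveFailures: number;
  failingSince: number | null; // time of the first failure in the current streak
  disabled: boolean;
}

interface Delivery {
  createdAt: number; // ms timestamp when the event was queued
  attempts: number; // delivery attempts made so far
  status: "pending" | "delivered" | "failed";
  nextAttemptAt: number | null; // when the queue worker should try again
}

// Same formula as nextDelay: 1 min doubling, capped at 2 h, ±20% jitter.
function backoffMs(retryIndex: number): number {
  const base = Math.min(MINUTE * Math.pow(2, retryIndex), 2 * HOUR);
  return Math.round(base + base * 0.2 * (Math.random() * 2 - 1));
}

// Update state after one attempt. `httpStatus` is null on timeout or network error.
function recordAttempt(d: Delivery, ep: Endpoint, httpStatus: number | null, now: number): void {
  d.attempts += 1;
  if (httpStatus !== null && httpStatus >= 200 && httpStatus < 300) {
    d.status = "delivered";
    d.nextAttemptAt = null;
    ep.consecutiveFailures = 0;
    ep.failingSince = null;
    return;
  }
  // Non-2xx or timeout: count the failure against the endpoint.
  ep.consecutiveFailures += 1;
  if (ep.failingSince === null) ep.failingSince = now;
  if (ep.consecutiveFailures >= DISABLE_AFTER_FAILURES && now - ep.failingSince >= DISABLE_AFTER_MS) {
    ep.disabled = true;
  }
  const next = now + backoffMs(d.attempts - 1);
  if (ep.disabled || next - d.createdAt > RETRY_WINDOW_MS) {
    d.status = "failed"; // keep it in the delivery log so the customer can replay it
    d.nextAttemptAt = null;
  } else {
    d.nextAttemptAt = next;
  }
}
```

Keeping the state transition free of I/O means the queue worker only has to send the request, call `recordAttempt`, and persist the result.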
Or skip building this yourself and use Hookdeck/Svix as an outbound webhook gateway. See Hookdeck vs HookSense for the tradeoffs.
Testing retry behavior
HookSense's auto-retry feature (Sense plan) replays failed captured webhooks with exponential backoff — useful for testing how your handler behaves under realistic retry patterns. Set a request to fail, observe the retry schedule, confirm idempotency holds.
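To run the same experiment against a sender you are building yourself, a test double can fail the first few deliveries of each event and record when they arrived. `FlakyEndpoint` is a hypothetical helper, not part of HookSense; if the sender backs off exponentially, the gaps it reports should grow roughly geometrically.

```typescript
// Test double: returns 503 for the first `failures` deliveries of each event,
// then 200, and records arrival times so the retry gaps can be inspected.
class FlakyEndpoint {
  private arrivals: Record<string, number[]> = {};
  private failures: number;

  constructor(failures: number) {
    this.failures = failures;
  }

  receive(eventId: string, atMs: number): number {
    if (!this.arrivals[eventId]) this.arrivals[eventId] = [];
    const seen = this.arrivals[eventId];
    seen.push(atMs);
    return seen.length <= this.failures ? 503 : 200;
  }

  // Time between consecutive deliveries of one event, in ms.
  gapsMs(eventId: string): number[] {
    const seen = this.arrivals[eventId] ?? [];
    const gaps: number[] = [];
    for (let i = 1; i < seen.length; i++) gaps.push(seen[i] - seen[i - 1]);
    return gaps;
  }
}
```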