Webhook Returns 500 in Production: Diagnose and Fix

Your webhook handler returns 500 and the provider keeps retrying. How to find the real error, stop retry storms, make handlers idempotent, and acknowledge fast.

Production, Debugging, Reliability, Webhooks
Ozer

Developer & Founder of HookSense

Your webhook handler works on every test, then returns 500 in production — and because providers retry failed deliveries, one bad payload quietly turns into a retry storm. This guide covers how to find the real error, stop the storm, and restructure the handler so a downstream bug never causes it again.

What a 500 actually triggers

Providers treat a non-2xx response (or a timeout) as a failed delivery, and most retry with exponential backoff: Stripe for up to 3 days in live mode, others on their own schedules. Not every provider retries automatically; GitHub, for example, records the failed delivery and leaves redelivery to you. Where automatic retries apply and your handler fails on every attempt, those retries accumulate. Two things follow: your error tracker fills with duplicates, and after sustained failures the provider may disable the endpoint entirely. So a 500 isn't just one error; it's a compounding one.
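To get a feel for how retries compound, here is a rough sketch that lists when an exponentially backed-off retry would fire inside a given window. The function name, base delay, factor, and window are all made-up illustration values, not any provider's documented schedule.

```javascript
// Illustrative only: base delay, factor, and window are assumptions,
// not a real provider's retry schedule.
function retryOffsets(baseSeconds, factor, windowSeconds) {
  const offsets = [];
  let delay = baseSeconds;
  let elapsed = 0;
  // Keep scheduling retries while the next one still fits in the window.
  while (elapsed + delay <= windowSeconds) {
    elapsed += delay;
    offsets.push(elapsed); // seconds after the first failed delivery
    delay *= factor;
  }
  return offsets;
}

// One permanently failing event, retried for an hour, starting at 60 s and doubling.
const perEvent = retryOffsets(60, 2, 3600);
console.log("retries for one event:", perEvent.length);
// Every failing event in flight multiplies that number.
console.log("failed requests for 200 events:", perEvent.length * 200);
```

The per-event count grows slowly, but the total scales with every event that hits the same bug, which is what fills the error tracker.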

Step 1: Find the payload that breaks it

Production 500s are almost always an unhandled exception on a payload shape you didn't see in testing — a missing optional field, a null where you expected a string, a new event subtype, an unexpectedly large body. Reading stack traces helps, but the fastest fix is to get the exact failing payload and run it against your handler locally.
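A minimal sketch of reading a payload defensively, so a missing or null field becomes a handled case instead of an unhandled exception. The field names (`data.object.customer_email`, `metadata.plan`) and the function name are hypothetical; substitute the shape your provider actually sends.

```javascript
// Hypothetical payload shape: the field names below are illustrative only.
function summarizeEvent(event) {
  // Reject anything that is not an object with a string `type`.
  if (event === null || typeof event !== "object" || typeof event.type !== "string") {
    return { ok: false, reason: "malformed event" };
  }
  // `data` or `data.object` may be missing or null on some event subtypes.
  const object = event.data?.object ?? {};
  // A null where a string was expected becomes an explicit null, not a crash.
  const email = typeof object.customer_email === "string" ? object.customer_email : null;
  const plan = object.metadata?.plan ?? "unknown";
  return { ok: true, type: event.type, email, plan };
}

console.log(summarizeEvent({ type: "invoice.paid", data: { object: { customer_email: null } } }));
console.log(summarizeEvent(null));
```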

Capture the request with an inspector, then replay it at localhost with a debugger attached. You can fire the same bytes as many times as you need, instead of waiting for the provider to resend on its backoff schedule.

# Reproduce locally: forward captured webhooks to your dev server
npx hooksense listen -p 3000 --path /api/webhooks
# then replay the failing request from the dashboard, debugger attached
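If you have exported the failing request yourself, a short script can replay it as well. This is a sketch under assumptions: the `captured` object shape, the placeholder header values, and the local URL are hypothetical, and it relies on the global `fetch` available in Node 18 and later.

```javascript
// Assumed shape of a captured request; adapt it to whatever your inspector exports.
const captured = {
  headers: { "content-type": "application/json", "stripe-signature": "t=0,v1=placeholder" },
  rawBody: '{"id":"evt_example","type":"invoice.paid"}',
};

// Build the request without touching the body, so the bytes match what the provider signed.
function buildReplay(capturedRequest, targetUrl) {
  return {
    url: targetUrl,
    init: {
      method: "POST",
      headers: { ...capturedRequest.headers },
      body: capturedRequest.rawBody, // raw string, never re-serialized JSON
    },
  };
}

async function replay(capturedRequest, targetUrl) {
  const { url, init } = buildReplay(capturedRequest, targetUrl);
  const res = await fetch(url, init); // global fetch, Node 18+
  return res.status;
}

// Usage (needs your dev server running):
// replay(captured, "http://localhost:3000/api/webhooks").then(console.log);
```

Re-serializing the body with `JSON.stringify` can change whitespace or key order and break signature verification, which is why the raw string is passed through. Handlers that enforce a timestamp tolerance on signatures may still reject an old capture, so you may need to relax that check locally.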

Step 2: Acknowledge fast, process later

The structural fix for most webhook 500s is to stop doing real work inside the request. Verify the signature, persist the raw event durably (a queue or a table), return 2xx immediately, and process asynchronously:

app.post("/webhooks", express.raw({ type: "application/json" }), async (req, res) => {
  let event;
  try {
    event = verifyAndParse(req.body, req.header("Stripe-Signature"));
  } catch {
    return res.sendStatus(400); // bad signature or malformed body: reject, don't enqueue
  }
  await queue.add("process-webhook", { raw: req.body.toString(), id: event.id });
  res.sendStatus(200); // acknowledge before any business logic runs
});

Now a bug in your business logic can't cause endless provider retries — the provider already got its 2xx. You retry from your own queue, on your own terms, with your own backoff and alerting.
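Retrying "on your own terms" could look something like the sketch below: a small wrapper that retries a job with exponential backoff and calls a hook when it gives up. The names `runWithBackoff` and `onGiveUp` and the default values are assumptions for illustration; many queue libraries offer equivalent retry options built in, and those are usually the better choice.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retries `job` up to `attempts` times with doubling delays.
async function runWithBackoff(job, { attempts = 5, baseMs = 500, onGiveUp = () => {} } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await job();
    } catch (err) {
      if (attempt === attempts) {
        onGiveUp(err); // hook for alerting or a dead-letter table
        throw err;
      }
      // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
      await sleep(baseMs * 2 ** (attempt - 1));
    }
  }
}
```

Because the provider already received its 2xx, a failure here costs only your own retry budget, and the give-up hook is where an alert belongs.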

Step 3: Make the handler idempotent

Once you accept retries (whether from the provider or your own queue), the same event will be processed more than once. Without idempotency that means double charges, duplicate emails, or duplicate rows. Deduplicate by the provider's event ID:

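async function processEvent(event) { // not `process`: that would shadow Node's global
  const { rowCount } = await db.query(
    "INSERT INTO processed_events (id) VALUES ($1) ON CONFLICT DO NOTHING",
    [event.id],
  );
  if (rowCount === 0) return; // already handled, safe to skip
  try {
    await applyBusinessLogic(event);
  } catch (err) {
    // release the claim so a retry can process the event again
    await db.query("DELETE FROM processed_events WHERE id = $1", [event.id]);
    throw err;
  }
}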
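To see the effect in isolation, here is a toy demonstration that uses an in-memory `Set` as a stand-in for the `processed_events` table. The names `handleOnce`, `processedIds`, and `sentEmails` are hypothetical, and a real system needs the durable unique key shown above, since a `Set` is lost on restart and not shared between workers.

```javascript
// In-memory stand-ins, for demonstration only.
const processedIds = new Set();
const sentEmails = [];

async function handleOnce(event) {
  if (processedIds.has(event.id)) return false; // duplicate delivery, skip
  processedIds.add(event.id);
  sentEmails.push(`receipt for ${event.id}`); // the side effect that must not repeat
  return true;
}

// The same event delivered twice, as happens after a retry.
(async () => {
  await handleOnce({ id: "evt_1" });
  await handleOnce({ id: "evt_1" });
  console.log("emails sent:", sentEmails.length);
})();
```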

Step 4: Stop an active retry storm

If you're already in a storm, work in this order:

  1. Stop returning non-2xx. Switch to acknowledge-then-process so each pending retry succeeds on its next attempt and the provider stops rescheduling it.
  2. Add idempotency so the backlog of in-flight retries can flush without duplicate side effects.
  3. Replay deliberately. Once the bug is fixed, replay the events that genuinely failed from your own captured store — not by waiting on the provider.
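The deliberate replay in the last step can be sketched as a loop over your own stored events. The record shape (`id`, `status`, `rawBody`) and the `enqueue` callback are assumptions; use whatever your event table and queue actually expose.

```javascript
// Hypothetical record shape: { id, status, rawBody }, with status "failed" or "done".
async function replayFailed(records, enqueue) {
  const replayed = [];
  for (const record of records) {
    if (record.status !== "failed") continue; // only events that genuinely failed
    // One at a time, so the replay does not become a second, self-made storm.
    await enqueue({ id: record.id, raw: record.rawBody });
    replayed.push(record.id);
  }
  return replayed;
}
```

With idempotency in place, replaying an event that had in fact succeeded is harmless, so it is safer to err on the side of replaying too much.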

Don't return 200 to hide failures

One caveat: acknowledge fast, but only after the event is durably stored. Returning 200 and then crashing before you persist the event means it's gone forever — the provider believes it succeeded and won't retry. The contract is "I have safely taken responsibility for this event," not "I looked at it."
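That contract can be written down as a status decision that is independent of the web framework. In this sketch `makeReceiver`, `verify`, and `save` are hypothetical names: `verify` stands in for your signature check and `save` for a durable write to a queue or table.

```javascript
// Hypothetical factory: returns a function that maps a delivery to an HTTP status.
function makeReceiver({ verify, save }) {
  return async function receive(rawBody, signature) {
    let event;
    try {
      event = verify(rawBody, signature);
    } catch {
      return 400; // unverifiable request: reject it
    }
    try {
      await save(event.id, rawBody);
    } catch {
      return 500; // not stored, so let the provider retry
    }
    return 200; // stored durably: responsibility is now ours
  };
}
```

A 500 here is the one case where a non-2xx is the right answer: the event is not yet safe, so the provider's retry is the only remaining copy.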

HookSense keeps a searchable history of every request your endpoint received, including the ones that 500'd, and replays any of them to your handler so you can reproduce and fix production failures on demand. Create a free endpoint to capture your next failing webhook.
