How does an AI agent receive an async callback?

The agent creates a callback endpoint (a webhook URL) over MCP, hands that URL to the long-running job, then calls wait_for_callback. The server blocks until the webhook arrives, verifies its signature, decrypts the body, and returns it — so the agent gets the result the moment it lands instead of polling.

Why is polling bad for AI agents?

Each poll is a tool call that consumes context-window tokens and budget, and it still adds latency between the result being ready and the agent noticing. A blocking wait_for_callback makes one call and is woken the instant the webhook arrives.

What is wait_for_callback?

An MCP tool that blocks until the next webhook hits your endpoint, then returns the request (signature-verified and decrypted). It accepts a timeout and an `after` cursor so a callback that arrived between calls is returned immediately rather than missed.

Last updated: June 19, 2026 · 4 min read

How AI Agents Receive Async Callbacks Without Polling

AI agents that kick off async work — deploys, renders, approvals, long tool calls — usually poll for the result, burning context and budget. Here's the callback pattern: give the agent a webhook URL it can await over MCP with wait_for_callback.

AI Agents · MCP · Callbacks · Webhooks

Ozer

Developer & Founder of HookSense

AI agents are great at deciding what to do. The awkward part is waiting. The moment an agent kicks off something that doesn't finish instantly — a deploy, a video render, a human approval, a long-running tool call, a hand-off to another agent — it has to find out when that work is done. The default answer has been polling, and for agents, polling is uniquely expensive.

This post covers the callback pattern for agents: instead of looping "is it done yet?", give the agent a webhook URL it can await, and wake it the instant the result lands.

Why polling is worse for agents than for servers

In a normal backend, a polling loop is a cheap setInterval. In an agent, every "check again" is a tool call that:

Burns context. Each poll and its response are tokens in a finite context window. Poll a render that takes ten minutes and you can spend more context on "not done yet" than on the actual task.
Costs money and latency. Each round trip is an LLM call. You either poll often (expensive) or rarely (slow to notice the result).
Loses the thread. Long gaps between polls invite the agent to wander off, summarize, or drop the original goal.
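The money-versus-latency trade-off is easy to put numbers on. The sketch below is a back-of-the-envelope estimator, not a measurement: the function name is hypothetical, and it assumes the first check happens one interval after the job starts and costs a fixed (made-up) 300 tokens per round trip.

```typescript
// Back-of-the-envelope cost of polling a job instead of awaiting a callback.
// Every input is an assumption you supply; nothing here is measured.
interface PollEstimate {
  calls: number; // tool calls made until the result is first seen
  tokens: number; // context consumed by those calls
  detectionDelayMs: number; // time between the job finishing and the agent noticing
}

// Assumes the first check happens one interval after the job starts and that a
// check landing exactly at completion sees the result.
function estimatePolling(jobMs: number, intervalMs: number, tokensPerPoll: number): PollEstimate {
  const calls = Math.ceil(jobMs / intervalMs);
  return {
    calls,
    tokens: calls * tokensPerPoll,
    detectionDelayMs: calls * intervalMs - jobMs,
  };
}

// A ten-minute render, with a hypothetical 300 tokens per poll round trip.
const tenMinutes = 10 * 60_000;
const eager = estimatePolling(tenMinutes, 15_000, 300); // 40 calls, 12000 tokens, 0 ms late
const lazy = estimatePolling(tenMinutes, 240_000, 300); // 3 calls, 900 tokens, 120000 ms late
console.log({ eager, lazy });
```

Polling every 15 seconds spends over thirteen times the context of polling every four minutes, and the slower loop notices the finished render two minutes late. Tuning the interval only trades one cost for the other; a blocking wait removes the detection delay entirely.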

The fix is the same one backends learned years ago: stop asking, get told. Webhooks. The twist for agents is delivering that webhook back into the agent's own loop — over the Model Context Protocol (MCP).

The callback pattern, in three steps

Create a callback endpoint. The agent calls a tool that returns a unique URL.
Hand the URL to the job. Pass it as the webhook/callback target of whatever async work you started.
Await it. The agent calls a blocking tool that returns the moment the webhook arrives — already verified and decrypted.
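Step 2 is the only part that depends on the job itself. As one concrete case, Replicate's create-prediction endpoint accepts a `webhook` URL and a `webhook_events_filter`. A minimal sketch that builds (but does not send) such a request might look like this; the helper name, prompt, and placeholder values are our own, so check Replicate's API reference for the current request shape.

```typescript
// Hypothetical helper: builds (does not send) a request that starts a Replicate
// prediction and registers the agent's callback URL as its webhook.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildPredictionRequest(
  version: string,
  input: Record<string, unknown>,
  callbackUrl: string,
  apiToken: string,
): HttpRequest {
  return {
    url: "https://api.replicate.com/v1/predictions",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      version,
      input,
      webhook: callbackUrl, // Replicate POSTs the prediction here
      webhook_events_filter: ["completed"], // notify only when it finishes, not on every log line
    }),
  };
}

// Placeholder values; the callback URL is the one create_callback_endpoint returned.
const req = buildPredictionRequest(
  "<model-version-id>",
  { prompt: "a lighthouse at dusk" },
  "https://hooksense.com/w/ab12cd",
  "<replicate-api-token>",
);
// On Node 18+, send it with:
// await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```

Any job that accepts a callback URL works the same way: the agent never talks to the job again, it just waits on the URL it handed over.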

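With HookSense's MCP server, steps 1 and 3 map to the create_callback_endpoint and wait_for_callback tools; step 2 is whatever job you were already starting. Add the server to any MCP client (Claude Desktop, Cursor, Claude Code) with a few lines of config that launch it via npx: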

{
  "mcpServers": {
    "hooksense": {
      "command": "npx",
      "args": ["-y", "@hooksense/mcp"],
      "env": { "HOOKSENSE_TOKEN": "hsk_..." }
    }
  }
}

Then the agent's flow looks like this:

// 1. agent opens a callback URL
create_callback_endpoint()
  -> { callbackUrl: "https://hooksense.com/w/ab12cd" }

// 2. you start the job with that URL as its callback
//    (Replicate prediction, deploy hook, approval step, another agent...)

// 3. agent awaits the result instead of polling
wait_for_callback({ slug: "ab12cd", timeoutMs: 30000 })
  -> { status: "received", request: { body: { status: "done", ... } } }

One call. The agent is suspended until the webhook lands, then resumes with the payload in hand.
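To make "suspended until the webhook lands" concrete, here is a minimal in-memory model of what a blocking wait can look like on the server side. It is our own illustration, not HookSense's implementation: `CallbackHub`, `deliver`, and `wait` are hypothetical names, and the design is deliberately naive because it stores nothing.

```typescript
// Minimal in-memory model of a blocking "wait for the next callback" tool.
// Hypothetical names, not HookSense's implementation. Deliberately naive:
// nothing is stored, so a delivery with no one waiting is lost.
type WaitResult = { status: "received"; body: unknown } | { status: "pending" };

class CallbackHub {
  // slug -> resolvers for agents currently blocked on that endpoint
  private waiters = new Map<string, Array<(body: unknown) => void>>();

  // Webhook receiver side: wake every waiter for this slug; returns how many were woken.
  deliver(slug: string, body: unknown): number {
    const pending = this.waiters.get(slug) ?? [];
    this.waiters.delete(slug);
    for (const wake of pending) wake(body);
    return pending.length;
  }

  // Agent tool side: resolve on the next delivery, or report "pending" after timeoutMs.
  wait(slug: string, timeoutMs: number): Promise<WaitResult> {
    return new Promise<WaitResult>((resolve) => {
      const wake = (body: unknown) => {
        clearTimeout(timer);
        resolve({ status: "received", body });
      };
      const timer = setTimeout(() => {
        // Unregister so a later delivery does not count this expired waiter.
        const rest = (this.waiters.get(slug) ?? []).filter((w) => w !== wake);
        this.waiters.set(slug, rest);
        resolve({ status: "pending" });
      }, timeoutMs);
      this.waiters.set(slug, [...(this.waiters.get(slug) ?? []), wake]);
    });
  }
}

const hub = new CallbackHub();
const waiting = hub.wait("ab12cd", 30_000); // the agent blocks here
const woken = hub.deliver("ab12cd", { status: "done" }); // the webhook lands: 1 waiter woken
const missed = hub.deliver("ab12cd", { status: "done" }); // nobody waiting: 0, and the payload is gone
```

The last line is the weakness: a callback that arrives while no one is waiting simply vanishes, which is exactly the early-callback race covered next.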

Don't miss the callback that arrives too early

There is one race to handle: what if the job finishes between creating the endpoint and calling wait_for_callback? A naive blocking wait only listens for callbacks that arrive after it starts, so it would sit out its whole timeout waiting for an event that already happened.

The fix is a cursor. Pass `after`, the timestamp of the last callback you saw, and the server returns any callback newer than that immediately, blocking only if there genuinely isn't one yet:

wait_for_callback({ slug: "ab12cd", after: "2026-06-19T10:00:00.000Z" })

This makes the wait idempotent and safe to retry: if it returns { status: "pending" } after the timeout, just call it again. No event slips through the gap.
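Server-side, the cursor can be as simple as storing each callback with its arrival time before waking anyone. The in-memory sketch below is our own illustration under that assumption (class and function names are hypothetical, and a real service would persist the log). It uses ISO-8601 timestamps as the cursor, like the example above, and includes the agent-side loop that treats "pending" as "ask again with the same cursor".

```typescript
// In-memory sketch of a cursor-aware wait (hypothetical names, not HookSense's code).
// Callbacks are stored with an ISO-8601 UTC arrival time before anyone is woken,
// so a callback that lands before the agent starts waiting is still found.
type StoredCallback = { receivedAt: string; body: unknown };
type CursorWaitResult =
  | { status: "received"; request: StoredCallback }
  | { status: "pending" };

class CallbackLog {
  private log = new Map<string, StoredCallback[]>();
  private waiters = new Map<string, Set<(cb: StoredCallback) => void>>();

  // Webhook receiver side: append to the log, then wake anyone blocked on this slug.
  record(slug: string, body: unknown, receivedAt = new Date().toISOString()): StoredCallback {
    const cb: StoredCallback = { receivedAt, body };
    const entries = this.log.get(slug) ?? [];
    entries.push(cb);
    this.log.set(slug, entries);
    const blocked = this.waiters.get(slug);
    if (blocked) for (const wake of Array.from(blocked)) wake(cb);
    return cb;
  }

  // Oldest stored callback strictly newer than the cursor. Timestamps in the same
  // toISOString() format sort chronologically as plain strings.
  nextAfter(slug: string, after: string): StoredCallback | undefined {
    return (this.log.get(slug) ?? []).find((cb) => cb.receivedAt > after);
  }

  // Agent tool side: return at once if something newer exists, otherwise block.
  wait(slug: string, after: string, timeoutMs: number): Promise<CursorWaitResult> {
    const ready = this.nextAfter(slug, after);
    if (ready) return Promise.resolve<CursorWaitResult>({ status: "received", request: ready });
    return new Promise<CursorWaitResult>((resolve) => {
      const blocked = this.waiters.get(slug) ?? new Set<(cb: StoredCallback) => void>();
      this.waiters.set(slug, blocked);
      const finish = (result: CursorWaitResult) => {
        clearTimeout(timer);
        blocked.delete(wake);
        resolve(result);
      };
      const wake = (cb: StoredCallback) => {
        if (cb.receivedAt > after) finish({ status: "received", request: cb });
      };
      const timer = setTimeout(() => finish({ status: "pending" }), timeoutMs);
      blocked.add(wake);
    });
  }
}

// Agent side: "pending" only means the timeout elapsed, so retry with the same cursor.
async function awaitNext(log: CallbackLog, slug: string, after: string): Promise<StoredCallback> {
  for (;;) {
    const result = await log.wait(slug, after, 30_000);
    if (result.status === "received") return result.request;
  }
}

// The job finishes at 10:00:05, before the agent's first wait. Using the endpoint's
// creation time as the initial cursor, the early callback is returned immediately.
const log = new CallbackLog();
const createdAt = "2026-06-19T10:00:00.000Z";
log.record("ab12cd", { status: "done" }, "2026-06-19T10:00:05.000Z");
const result = awaitNext(log, "ab12cd", createdAt);
```

To wait for the following callback, pass the returned `receivedAt` as the new cursor. The ordering matters: because `record` writes to the log before waking waiters, there is no window in which a callback is neither stored nor delivered.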

Verify before you act

A callback is an external instruction. If your agent acts on it, you want to know it's genuinely from the sender — not forged by anyone who guessed the URL. HookSense verifies the signature and decrypts the body before handing the callback to the agent, and exposes a verify_signature tool for an explicit timing-safe HMAC check. We cover that in depth in Verifying Webhook Signatures Inside an MCP Server.
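HookSense runs this check before the agent sees the callback. For readers curious what a timing-safe HMAC check involves, here is a minimal Node sketch. It assumes a hex-encoded HMAC-SHA256 of the raw body with a shared secret, a common scheme but not necessarily the one a given sender (or the verify_signature tool) uses; the function name is hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Assumed scheme: hex-encoded HMAC-SHA256 of the raw request body with a shared secret.
// Verify against the raw body you received, before any JSON parsing or re-serialization.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws if lengths differ, so reject mismatched lengths up front.
  if (received.length !== expected.length) return false;
  // Constant-time comparison: no early exit on the first differing byte.
  return timingSafeEqual(received, expected);
}

const secret = "test-secret";
const rawBody = JSON.stringify({ status: "done" });
const signature = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
const genuine = verifySignature(rawBody, signature, secret); // true
const forged = verifySignature(rawBody.replace("done", "failed"), signature, secret); // false
```

A plain `===` on the two hex strings would also return the right answer, but it can stop at the first mismatched character, which leaks timing information an attacker can use to guess a valid signature byte by byte.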

When to use this

Long tool calls — renders, builds, exports that outlast a single request.
Human-in-the-loop — block until a person approves, then continue.
Multi-agent hand-offs — one agent waits for another to report back.
External jobs — Replicate, Temporal, CI/CD, payment confirmations.

If your agent is polling for an async result today, it's a candidate. Give it a callback URL it can await instead — get a HookSense endpoint in one second, no signup, at hooksense.com.


Jun 19, 2026·Security