The AI coding assistant has finished its task. It has refactored the legacy module, updated the dependencies, and even added a few new tests. Now, it needs to get its work merged. It opens a pull request, assigns it to the lead developer, and waits. And waits. The lead is busy, in meetings, or on vacation. The automation, for all its speed and efficiency, has slammed into a wall of human latency. The throughput of the entire system is now bottlenecked by a single, manual approval step.
This is the central paradox of human-in-the-loop (HITL) governance. We need human oversight for safety, quality, and authority. But if implemented naively, that oversight can completely negate the benefits of automation.
Effective agentic systems don’t just ask for help; they have a sophisticated, multi-layered escalation framework. They treat human attention as the scarcest and most valuable resource in the system and are designed to consume it as efficiently as possible. This means not all escalations are created equal.
Many HITL systems operate on a simple, binary model: the agent either works fully autonomously, or it stops and waits for a human. This “all or nothing” approach is inefficient and frustrating for both the agent and the human.
We are using a sledgehammer—a full, synchronous stop—for every problem, whether it’s a fly or a boulder.
The crudest form of this is the synchronous ask_user tool. The agent’s loop is blocked until the human responds to a prompt like, “I have written the code. May I proceed with opening a pull request? [Y/n]”.
This is the worst of all worlds. It demands the human’s immediate, undivided attention. It couples the agent’s liveness directly to human responsiveness. It doesn’t provide the human with enough context to make an informed decision (where is the code? what are the changes?). And it doesn’t allow the agent to do anything else while it waits.
It’s a pattern that guarantees low throughput and high frustration.
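In code, the anti-pattern looks something like this (ask_user_sync and its names are illustrative, not any real agent framework’s API):

```python
import time

def ask_user_sync(prompt: str, get_response) -> bool:
    """A synchronous gate: the agent loop stops dead until get_response()
    returns, however many hours the human takes to look at the prompt."""
    start = time.monotonic()
    answer = get_response(f"{prompt} [Y/n]")  # the loop blocks right here
    waited = time.monotonic() - start         # all of this is dead time
    print(f"Blocked for {waited:.1f}s waiting on a human.")
    return answer.strip().lower() in ("", "y", "yes")
```

Every tool call routed through a gate like this couples the agent’s throughput directly to the reviewer’s inbox.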
An effective governance framework is asynchronous and tiered. It recognizes that different situations require different levels of human intervention. The agent should be able to continue with other tasks while an escalation is pending, and the system should be able to act on its own if a human doesn’t respond within a reasonable timeframe.
Here is a model for a tiered escalation system:
Tier 1: FYI (For Your Information)
The agent acts autonomously and notifies the human after the fact; no response is required. Example: a notification that the agent has pushed its changes to the branch feature/logging-refactor.
Tier 2: Soft Approval (Veto Power)
The agent announces its intent and proceeds after a timeout unless a human objects. Example: “I will merge the feature/logging-refactor PR in 15 minutes unless you veto this action.”
Tier 3: Hard Approval (Gated Action)
The agent blocks until it receives explicit sign-off; without it, the action never happens. Example: “I am requesting approval to delete the old-customer-records-2022 database. This action is irreversible. [Approve] [Deny]”
Tiers can be assigned statically, per tool: send_slack_message might be Tier 1. git_merge might be Tier 2. aws_s3_delete_bucket is definitely Tier 3. The agent can also be given an explicit escalation tool, request_human_review(tier, message, context_data). This allows the model to use its own reasoning to decide when it’s uncertain. If it’s trying to fix a bug and has two plausible but conflicting approaches, it can proactively request a Tier 2 review from a human to get their opinion.
Human governance is not an obstacle to be routed around; it’s a critical component of a robust and trustworthy AI system. The goal is not to eliminate human oversight, but to make it as efficient and impactful as possible.
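The per-tool tier mapping and routing can be sketched roughly as follows. The tool names come from the text; the Tier enum and the channel interface (notify, veto_window, hard_gate) are illustrative assumptions, not a specific framework’s API:

```python
from enum import Enum

class Tier(Enum):
    FYI = 1            # notify and keep going
    SOFT_APPROVAL = 2  # proceed once a veto window elapses
    HARD_APPROVAL = 3  # block until an explicit [Approve]/[Deny]

# Static per-tool assignments; anything unlisted defaults to the most
# restrictive tier, so newly added tools are safe by default.
TOOL_TIERS = {
    "send_slack_message": Tier.FYI,
    "git_merge": Tier.SOFT_APPROVAL,
    "aws_s3_delete_bucket": Tier.HARD_APPROVAL,
}

def tier_for(tool_name: str) -> Tier:
    return TOOL_TIERS.get(tool_name, Tier.HARD_APPROVAL)

def escalate(tool_name: str, message: str, channel) -> bool:
    """Route a proposed action through its escalation tier.
    Returns True if the agent may proceed."""
    tier = tier_for(tool_name)
    if tier is Tier.FYI:
        channel.notify(message)                          # fire-and-forget
        return True
    if tier is Tier.SOFT_APPROVAL:
        return channel.veto_window(message, minutes=15)  # True unless vetoed
    return channel.hard_gate(message)                    # blocks, by design
```

A request_human_review(tier, message, context_data) tool can then simply route through the same dispatch, with the tier chosen by the model itself.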
By moving away from the simplistic, blocking “ask for permission” model and toward a sophisticated, tiered, and asynchronous escalation framework, we can have the best of both worlds. We get the speed and scalability of automation, guided by the wisdom and authority of human experts. We build systems that respect human attention, preserve human authority, and still allow our agents to get their work done.
A well-designed automated system doesn’t just run on its own; it knows how and when to ask for help. This escalation strategy should be built into the system’s core, treating human attention as its most valuable and expensive resource.
Tiered Escalation Pathways. The system must differentiate between events that are informational (FYI), those requiring optional oversight (a soft check), and those demanding mandatory approval (a hard gate). A single, monolithic “ask for permission” model creates bottlenecks and review fatigue.
Asynchronous by Default. Human review should not block agent execution unless absolutely necessary. The system should favor asynchronous notifications and veto-based timeouts over synchronous, blocking calls to preserve automation throughput.
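A veto-based timeout can be sketched with a cancellable timer: the action fires once the window closes unless a reviewer vetoes it first. This is a minimal illustration, not a production scheduler:

```python
import threading

class VetoableAction:
    """Schedule `action` to run after `timeout_s` seconds unless
    veto() is called first -- a Tier 2 soft approval."""

    def __init__(self, action, timeout_s: float):
        self._timer = threading.Timer(timeout_s, action)
        self._timer.daemon = True
        self._timer.start()

    def veto(self) -> None:
        """Cancel the pending action; a no-op if it has already run."""
        self._timer.cancel()
```

The agent stays free to work on other tasks while the window is open; a reviewer who objects simply calls veto() before the timer fires.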
Human Authority as a System Primitive. The requirement for human approval on high-stakes actions is not an edge case; it is a core feature. The architecture must treat human intervention as a primary, auditable event, not an informal interruption. The final authority of the human user must be structurally guaranteed.