You have two AI agents. Agent A is given a tool with a classic programmatic API: a Python function get_user_data(user_id: int, fields: list[str]) -> dict. Agent B is given a different kind of tool: a command-line interface (CLI) that it can execute in a shell: userctl get --id=<user_id> --fields=<comma,separated,list>.
Both agents are asked to retrieve a user’s name and email. Agent A, the Python user, sometimes struggles. It might pass a string for user_id, forget to wrap the fields in a list, or stumble when navigating the nested dictionary that comes back. It has to understand Python’s type system and data structures just to complete a simple lookup.
Agent B, the CLI user, finds the task much easier. It constructs a string: userctl get --id=123 --fields=name,email. The output it gets is simple, human-readable text. The complexity of the underlying system is hidden behind a clean, composable, text-native interface.
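To make the contrast concrete, here is a minimal sketch of what each agent must produce (get_user_data is stubbed with invented data; userctl is the hypothetical CLI from the example):

```python
def get_user_data(user_id: int, fields: list[str]) -> dict:
    """Stub of the example API; a real implementation would query a datastore."""
    data = {"name": "Ada Lovelace", "email": "ada@example.com"}
    return {field: data[field] for field in fields}

# What Agent A must emit: a syntactically valid, correctly typed call,
# plus whatever code it needs to navigate the returned structure.
user = get_user_data(123, ["name", "email"])

# Typical failure modes for a text-first model:
#   get_user_data("123", "name,email")   # wrong type for both arguments
#   get_user_data(id=123)                # wrong keyword name, missing fields

# What Agent B must emit: a single line of text.
command = "userctl get --id=123 --fields=name,email"
```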
This difference is not trivial. It points to a fundamental principle of reliable agent engineering: agents are more effective when their tools speak the same language they do—text.
When we give an agent tools in the form of native programming language functions (like Python or TypeScript), we are forcing it to operate in two different domains at once. It has to “think” in natural language and then translate its intent into the rigid, typed syntax of a programming language.
This creates friction at every step. Most fundamentally, we are asking the agent to be a programmer. While modern models are certainly capable of writing code, it is an added layer of complexity that often gets in the way of the actual task: every call must be syntactically valid, correctly typed, and correctly shaped before the agent can even begin its real work.
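Concretely, the typed-function route forces a translation step between the model’s natural-language intent and a rigid schema, while the text-native route barely needs one. A sketch (the tool-call shape below follows common function-calling conventions and is illustrative, not any specific API):

```python
# The model's intent, in its native medium:
intent = "get the name and email of user 123"

# The typed-function route demands a translation into a rigid schema.
# One wrong key, type, or bracket and the call is rejected:
tool_call = {
    "name": "get_user_data",
    "arguments": {"user_id": 123, "fields": ["name", "email"]},
}

# The text-native route demands almost no translation at all:
command = "userctl get --id=123 --fields=name,email"
```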
A common anti-pattern is to provide the agent with a software development kit (SDK) full of opaque function calls. The agent is given a prompt like, “You have access to a commerce_api object with methods like list_products and create_order.”
The agent can’t see the source code of these methods. It can’t inspect how they work. It’s interacting with a black box. If a call fails, the agent gets a stack trace, which can be hard to interpret without the underlying context. It can’t self-correct by examining the tool’s implementation. It’s flying blind.
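The two failure surfaces differ sharply. A hedged sketch of both, using an invented order-creation tool (the SDK raise is simplified to a single frame; real stack traces are deeper and worse):

```python
import argparse

# Failure mode 1: an opaque SDK call raises deep inside code the agent
# cannot read; all it gets back is a traceback.
def create_order(sku: str, qty: int) -> dict:
    if not sku:
        # In a real SDK this surfaces several frames down, as a stack trace.
        raise ValueError("field 'sku' is required")
    return {"sku": sku, "qty": qty}

# Failure mode 2: a CLI prints a usage message to stderr and exits nonzero,
# which is plain text the agent can read and act on.
parser = argparse.ArgumentParser(prog="order-create")
parser.add_argument("--sku", required=True, help="product SKU to order")
parser.add_argument("--qty", type=int, default=1, help="quantity to order")

try:
    parser.parse_args(["--qty", "2"])  # the agent forgot --sku
except SystemExit:
    # argparse has already printed:
    #   order-create: error: the following arguments are required: --sku
    pass
```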
The alternative is to model your agent’s tools on the design philosophy of the Unix shell: a collection of simple, text-oriented, and composable command-line tools.
In this model, every tool is a “program” that the agent can execute. Each tool takes its input as plain-text arguments and flags (e.g., --verbose, --output=json), writes its results to standard output, reports problems on standard error, and signals success or failure with an exit code.

This approach has profound benefits for agent reliability:
- Self-documentation. The agent can run tool --help to get the full documentation. This makes the system self-documenting and allows the agent to recover from errors by teaching itself the correct usage.
- Composability. The agent can chain simple tools into pipelines: userctl list --all | grep 'status=active' | userctl-format --template '{name} <{email}>'. This allows the agent to construct complex workflows from simple, modular parts without needing a new, specialized tool for every task.
- Human-agent synergy. Human operators can run the very same userctl commands in their own terminal. The tools are shared, understood, and vetted by both humans and AIs, leading to a more robust ecosystem.

The agent harness’s job becomes much simpler. It’s not a Python interpreter or a JSON parser. It’s a shell. It takes the text generated by the model, executes it, and returns the standard output, standard error, and exit code back to the model for its next reasoning step.
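A minimal sketch of such a harness, assuming commands run through the system shell (a production harness would add sandboxing and resource limits):

```python
import subprocess

def run_tool(command: str, timeout: int = 60) -> str:
    """Execute one model-generated command and render the result as text."""
    result = subprocess.run(
        command,
        shell=True,           # the model writes real shell syntax, pipes included
        capture_output=True,  # collect stdout and stderr separately
        text=True,
        timeout=timeout,
    )
    # Everything the model needs for its next reasoning step, as plain text:
    return (
        f"exit code: {result.returncode}\n"
        f"stdout:\n{result.stdout}\n"
        f"stderr:\n{result.stderr}"
    )

# Usage: the observation goes straight back into the model's context.
# observation = run_tool("userctl list --all | grep 'status=active'")
```

Because the contract is just text in, text out, the same harness works for every tool; adding a capability means installing a new command, not writing new glue code.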
A few practical guidelines for designing these tool surfaces (a sketch of a tool that follows them appears after the list):

- Keep tools small and single-purpose. Don’t build one monster tool that does everything. Build user-list, user-get, user-create.
- Offer a structured-output flag (e.g., --output=json) to get structured, machine-readable output. This gives the agent the option to get raw data when it needs it.
- Invest in the --help text. The built-in documentation for your tool is part of its API. It should be clear, comprehensive, and provide examples. The agent will read it.
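Here is the sketch promised above: a hypothetical user-get tool that follows all three guidelines, written with Python’s argparse:

```python
import argparse
import json

def main() -> None:
    parser = argparse.ArgumentParser(
        prog="user-get",
        description="Fetch fields for a single user.",
        epilog="example: user-get --id=123 --fields=name,email --output=json",
    )
    parser.add_argument("--id", type=int, required=True, help="numeric user id")
    parser.add_argument("--fields", default="name,email",
                        help="comma-separated list of fields to return")
    parser.add_argument("--output", choices=["text", "json"], default="text",
                        help="output format; json gives machine-readable data")
    args = parser.parse_args()

    # Stub lookup; a real tool would query its backing service here.
    record = {"name": "Ada Lovelace", "email": "ada@example.com"}
    selected = {f: record.get(f, "") for f in args.fields.split(",")}

    if args.output == "json":
        print(json.dumps(selected))
    else:
        for key, value in selected.items():
            print(f"{key}={value}")

if __name__ == "__main__":
    main()
```

Note how the --help output doubles as agent-facing documentation: the description, the per-flag help strings, and the epilog example are exactly what the agent will read when it runs user-get --help.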
By designing text-native tool surfaces, we are meeting the agents where they are. We are creating an environment that is natural to them, one that leverages their core strength: generating and understanding text. We stop forcing them to be mediocre Python programmers and instead empower them to be expert users of a powerful, composable toolset.

This shift in interface design is a crucial step in building more reliable and resilient agents. It simplifies the agent harness, improves discoverability, enables powerful composition, and creates a seamless synergy between the agent and its human supervisors. Before you write another Python function for your agent to call, ask yourself: could this be a CLI instead? The answer will often lead you to a more robust and elegant system.
The interface between an agent and its tools is a critical control surface. Forcing an LLM, a native text processor, to conform to rigid programmatic APIs creates unnecessary friction and fragility.
- Text as the Universal Interface. An agent’s native language is text. Tools should, whenever possible, expose a text-native interface (e.g., a command-line interface) rather than a programmatic one (e.g., a Python function). This minimizes the “impedance mismatch” between the agent’s text-based reasoning and the tool’s structured input requirements.
- Embrace Composability. Following the Unix philosophy, tools should be small, single-purpose, and designed to be composed. An agent that can pipe the text output of one tool to the text input of another can create complex workflows dynamically, without requiring a new, monolithic tool to be written for every unique task.
- Enable Introspection for Self-Correction. A key advantage of text-native (specifically, CLI-style) tools is that they can be self-documenting. The agent can be taught to run tool --help to retrieve its own instructions, a powerful pattern for self-correction and adaptation that is impossible with opaque, black-box SDK functions (a minimal sketch follows this list).
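A minimal sketch of that self-correction pattern, assuming the hypothetical userctl tool from earlier and a shell-style harness:

```python
import shlex
import subprocess

def run(command: str) -> subprocess.CompletedProcess:
    return subprocess.run(command, shell=True, capture_output=True, text=True)

command = "userctl get --user=123"   # the model guessed a flag that doesn't exist
attempt = run(command)
if attempt.returncode != 0:
    tool = shlex.split(command)[0]            # -> "userctl"
    help_text = run(f"{tool} --help").stdout  # the tool documents itself
    observation = (
        f"command failed:\n{attempt.stderr}\n"
        f"tool documentation:\n{help_text}"
    )
    # observation returns to the model's context; with the docs in hand it
    # can correct itself to: userctl get --id=123
```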