What’s Still Missing Before AI Agents Can Replace Entire Roles in the Enterprise?
The latest advances in AI agent frameworks like Claude Cowork, OpenClaw and models like Opus 4.6 raise an uncomfortable question: Are we planning too conservatively in our transformation strategies?
The Magical Feeling Is Back
In my last blog post on Monday, I described how I installed and tried out OpenClaw myself, with use cases in communication, note management, and tool integration. Now we finally have AI agents that really feel like actual agents. What’s special about this agent framework: it doesn’t work via the MCP protocol, but through the command-line interface. The AI operates the operating system directly and can therefore, in principle, control every application the computer offers.
Not always the exact apps we use in everyday life—not the Chrome browser with its graphical interface, but the counterparts optimized for the command line. But what emerges from this truly feels magical.
It’s a feeling I’ve only experienced at a few moments in my personal technology biography: Holding the first smartphone. The first personal website that was suddenly accessible to the entire world. Or in the early ’90s, dialing into a BBS server with a modem to chat and exchange files. A moment where you sense: something fundamentally new is emerging here.
The Starting Position in Most Enterprises
Of course, I didn’t run this experiment purely out of personal curiosity. I’m interested in the implications for our AI strategies.
In the vast majority of enterprises, the starting position today looks like this: planning is based on incremental productivity gains. And that’s understandable. Technology development is a black box. You don’t know for certain whether it will continue to evolve exponentially. You don’t know for certain whether the organization is ready. So you plan conservatively: 10 percent in the first year, 20 the next, 30 after that, and perhaps target 35 percent by 2030.
For highly standardized business models—an insurer, for example, where business processes like claims management always follow similar patterns—you can perhaps be bolder in your projections. But in knowledge-work-intensive service businesses—IT, consulting, professional services—where workflows are relatively unstructured, you plan cautiously.
The consequence: When someone retires or a role opens up, you still fill it with a human. AI doesn’t yet replace an entire role. Instead, you work through scale effects: ten consultants each become 10 percent more productive—so you save yourself the eleventh. Eighty network technicians become more efficient—so you invest the salary of the 81st in automation.
That’s roughly how the discussion goes in German knowledge-work enterprises.
Why the Timelines Could Be Accelerating
And now: fast forward to the new agent systems.
I believe the timelines are potentially accelerating more than many of us (myself very much included) had previously assumed. I too had long assumed that a plateau might come and that a breakthrough in foundational technology would be needed.
I now see this with more nuance: yes, the plateau may come. But the scaling pathways already open today (better inference, intelligent model routing, agent-based architectures) offer enough capability to unlock significantly more than 20-30 percent productivity gains. Provided organizations can manage the rollout. And provided the frameworks get better.
That’s exactly what I want to explore today.
The Central Question: What’s Still Missing?
I tested Claude Code as an agent system. There are others by now as well. The question that’s been on my mind:
What is actually still missing before these systems can be deployed in the enterprise? Not for marginal productivity gains, but to completely take over the tasks of knowledge workers?
In other words, no longer: “We save ourselves the eleventh consultant.” But rather: “We no longer need this role. The AI agent handles it.”
What Today’s AI Agents Already Bring to the Table
Before we get to the gaps, a look at what already works today:
Intelligent model routing. The agents can connect to different models—with varying capabilities and cost structures. Opus 4 is the most powerful model, feels highly agentic, autonomously controls the operating system—but is also relatively expensive. Cheaper models handle simpler tasks. This allows costs to be significantly reduced.
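The routing idea can be sketched in a few lines. Everything here (the model names, the prices, the keyword heuristic) is an illustrative assumption, not any vendor’s actual routing logic:

```python
# Minimal sketch of cost-aware model routing. Model names and per-token
# prices are illustrative assumptions, not real price lists.
PRICES = {"small-model": 0.25, "large-model": 15.00}  # $ per million tokens

def route(task: str) -> str:
    """Send open-ended, multi-step work to the strong model,
    routine lookups and reformatting to the cheap one."""
    hard_markers = ("plan", "debug", "refactor", "analyze")
    if any(marker in task.lower() for marker in hard_markers):
        return "large-model"
    return "small-model"

print(route("Summarize this email thread"))       # routine -> cheap model
print(route("Plan the migration to the new CRM")) # agentic -> strong model
```

Real frameworks classify tasks with far more signal than a keyword list, but the economic structure is the same: most calls should never hit the expensive model.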
Self-managed memory. The agents independently store skills, long-term memories, emergency instructions, and knowledge about their environment—in Markdown files. They expand and maintain this knowledge autonomously during the conversation, without needing to be explicitly prompted. Though in all honesty, this still requires a lot of tinkering and maintenance.
Powerful tool access via the CLI. Unlike the MCP protocol, where each tool must be individually connected, these agents use the command-line interface. On a typical Linux system, they can do practically everything a normal user would do.
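The power of this approach comes from a single generic “run a command” tool instead of one connector per application. A deliberately restricted sketch, with a hypothetical allow-list:

```python
# Why CLI access is so powerful: one generic "run a command" tool gives
# the agent every program on the machine. The allow-list is an assumption;
# real deployments need far stricter sandboxing.
import shlex
import subprocess

ALLOWED = {"ls", "grep", "cat", "echo", "git"}

def run_tool(command: str) -> str:
    argv = shlex.split(command)
    if argv[0] not in ALLOWED:
        raise PermissionError(f"{argv[0]} is not on the allow-list")
    result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    return result.stdout

print(run_tool("echo hello from the agent"))
```

One tool, the whole POSIX userland. That asymmetry is exactly why the security section below matters so much.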
That sounds promising. But for enterprise deployment, critical building blocks are still missing.
The Six Missing Building Blocks for Enterprise Deployment
1. Windows and Office Tools via the Command-Line Interface
An enterprise runs on Windows. Daily work happens in Teams, Outlook, Excel, SharePoint, and SAP. On a Linux system, CLI-based control already works impressively well. But can this be transferred to the Windows world?
Windows has actually made significant strides in recent years regarding the power of its command line. PowerShell is now a mature automation tool deeply integrated into the Windows ecosystem. Microsoft 365 applications can be programmatically controlled via the Microsoft Graph API—reading and sending emails, managing calendar entries, sending Teams messages, editing SharePoint documents. Excel can be operated from scripts via COM automation or the Office JavaScript API. Even SAP offers programmatic access through RFC interfaces, the SAP CLI, and SAP Business Technology Platform APIs.
However: the ecosystem is more fragmented than under Linux. While on a Linux system nearly everything can be controlled through uniform CLI conventions, under Windows you need to combine different APIs, SDKs, and authentication mechanisms. For an AI agent, this means: it’s technically feasible, but integration is more complex, more error-prone, and requires more configuration per tool. We’re perhaps at 60-70 percent of the journey here. The foundations are in place, but for robust, productive deployment, quite a bit of fine-tuning is still needed—especially with heavily customized enterprise systems. You can also see that some vendors are now quickly adding CLI support after OpenClaw made headlines (e.g., Obsidian).
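As a concrete illustration of the Graph route: reading a mailbox is a documented `GET /me/messages` call against the Graph REST endpoint. Everything else here (the token, the helper name) is stubbed or assumed, and no request is actually sent:

```python
# Sketch of how an agent might reach Outlook mail through the Microsoft
# Graph REST API. The endpoint and header shape follow the documented
# pattern; token acquisition is stubbed, so this only *builds* the request.
GRAPH_BASE = "https://graph.microsoft.com/v1.0"

def build_list_messages_request(access_token: str, top: int = 10) -> dict:
    """Return the pieces of a 'list my inbox' call: GET /me/messages."""
    return {
        "method": "GET",
        "url": f"{GRAPH_BASE}/me/messages?$top={top}",
        "headers": {"Authorization": f"Bearer {access_token}"},
    }

req = build_list_messages_request("<token-from-your-identity-provider>")
print(req["url"])
```

The fragmentation shows up around exactly this snippet: the token comes from Entra ID with its own consent flows, Excel COM automation looks nothing like this, and SAP RFC is a third world again. Each integration is feasible; none is free.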
2. Security—and the Prompt Injection Problem
My experiment ran on an isolated system with data whose compromise would have been tolerable. For enterprise deployment, the agent would need the same access rights as the employee it replaces: emails, documents, internal systems.
And this is where the prompt injection problem lurks. If the agent reads emails, someone could write an email that tricks the AI into executing malicious code. If the agent searches the web, a prepared website could contain an injection. Essentially a honeypot, waiting for an AI to come along and execute the command.
The good news: model resistance to prompt injections has improved significantly recently. Opus 4 is now very difficult to trick into executing malicious code. And to be fair: humans aren’t flawless either. Many employees click on phishing links. Social engineering demonstrably works. The CEO fraud call that convinces someone to make a wire transfer happens every day.
So the AI doesn’t need to be perfect. It needs to reach a security level comparable to the human’s. I consider this solvable given the current technology trajectory. But as long as it isn’t solved, no productive deployment is conceivable, not even as a pilot.
3. Cost
If an AI agent works eight hours a day, routing between different models, we currently land at $1,000 to $2,000 or more per month, even with optimized usage. When replacing a human worker, the math needs to work out.
I would say: the price point needs to be below $1,000 per month for a full-time agent to become attractive for broad adoption. And I’m confident we’ll get there. Inference costs reliably decrease from generation to generation (though token consumption rises correspondingly). What Opus 4 costs today will likely be 5-10x cheaper in a year. If by 2027 we land at one-tenth of today’s costs, the business case for full-time agentic work suddenly becomes very attractive—without even needing to argue through scale effects.
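A quick back-of-the-envelope check, with loudly assumed numbers: the tokens-per-hour figure and the blended price are my guesses for a busy, routed agent, not vendor prices.

```python
# Back-of-the-envelope cost of one full-time agent. All inputs are
# illustrative assumptions, not actual vendor pricing.
TOKENS_PER_HOUR = 400_000   # assumption: blended input+output tokens of a busy agent
PRICE_PER_MTOK = 20.00      # assumption: blended $ per million tokens with routing
HOURS_PER_MONTH = 8 * 21    # one full-time equivalent

monthly_cost = TOKENS_PER_HOUR * HOURS_PER_MONTH * PRICE_PER_MTOK / 1_000_000
print(f"today:            ${monthly_cost:,.0f}/month")
print(f"at 1/10 the price: ${monthly_cost / 10:,.0f}/month")
```

At these assumptions the agent lands around $1,300 per month today, inside the range above, and a 10x price drop pushes it deep below the $1,000 threshold, which is the whole business case in one division.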
4. Communication Channels and Access to Tacit Knowledge
Here lies a problem that has received little discussion so far. In an enterprise, process knowledge is not fully documented. Customer knowledge, organizational knowledge, implicit experiential knowledge—it lives in the heads of employees.
For highly standardized processes like in insurance, everything relevant may be captured in systems and databases. But in knowledge-intensive service businesses, where daily work involves manual adjustments, consulting, and decision-making, things look different. The same applies in administrative functions.
What would be needed: an architecture in which AI agents can query not only their principal but also other people in the organization. Via Teams message, via email—the way a new colleague would when they have a question. The agent wouldn’t know everything, but it would know whom to ask.
A fascinating side effect: every time the agent retrieves tacit knowledge from a human, it can store it and access it independently next time. The “undocumentable” organizational knowledge would gradually become documented—as a byproduct of the agent’s work. The number of follow-up questions decreases over time. And with it, the necessary human capacity.
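The ask-then-remember loop could look like this. The messaging side (Teams or email) is stubbed as a function; only the caching behavior, the part that makes tacit knowledge gradually explicit, is real here:

```python
# Sketch of the "ask a colleague, then remember the answer" loop.
# ask_human() stands in for a Teams/email message to an expert.
memory: dict[str, str] = {}
human_questions_asked = 0

def ask_human(question: str) -> str:
    """Stand-in for messaging a subject-matter expert."""
    global human_questions_asked
    human_questions_asked += 1
    return "Claims above 10k EUR go to the senior adjuster first."

def answer(question: str) -> str:
    if question in memory:        # tacit knowledge already captured
        return memory[question]
    reply = ask_human(question)   # escalate, like a new colleague would
    memory[question] = reply      # byproduct: the knowledge is now documented
    return reply

question = "Who handles large claims?"
answer(question)
answer(question)
print(f"human asked {human_questions_asked} time(s)")  # the second call hits memory
```

The counter is the whole point: each piece of tacit knowledge costs exactly one human interruption, ever.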
Curiously, I haven’t heard of such concepts yet. The closest thing would be OpenClaw and its Discord connector.
A thought-provoking aside: imagine an experienced subject-matter expert eventually spending their entire day answering the knowledge-gap queries of AI agents. Job profiles could change radically. From executor to knowledge supplier. Until all the knowledge has been extracted.
5. Operating Graphical User Interfaces
Not everything has a CLI. SAP screens, proprietary business applications, heavily customized workflows. Much of this is only operable through graphical interfaces. What if the AI could simply operate the user interface the way a human does?
Google is researching this, and Manus AI has demonstrated early approaches. But as of today, visual UI control by large language models doesn’t work reliably enough. Especially not with the specialized systems that every enterprise has configured individually: customized SAP screens, industry-specific workflows, legacy interfaces. This is a long tail of edge cases that is difficult to train for.
I’d estimate we’re at about 50 percent of the journey here. Foundational research shows that it can work. The question is how long the last 50 percent will take to reach production readiness.
6. Governance and Regulation
When it’s not one agent working in the organization but dozens, hundreds, thousands, you need an orchestration platform: monitoring, logging, deterministic guardrails on top of the non-deterministic AI logic, compliance evidence, audit trails.
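What such a deterministic guardrail layer might look like, in miniature. The policy rule and the action shape are assumptions for illustration; the point is that the check and the audit trail sit outside the non-deterministic model:

```python
# Sketch of a deterministic guardrail around a non-deterministic agent:
# every proposed action passes a hard-coded policy check and lands in an
# audit log, whether it was allowed or not. Rules are illustrative.
import time

AUDIT_LOG: list[dict] = []

def guarded_execute(agent_id: str, action: dict) -> bool:
    # Deterministic rule: block payments above a hard limit.
    allowed = not (action["type"] == "payment" and action.get("amount", 0) > 500)
    AUDIT_LOG.append({
        "ts": time.time(),
        "agent": agent_id,
        "action": action,
        "allowed": allowed,
    })
    # A real implementation would invoke the actual tool call here when allowed.
    return allowed

print(guarded_execute("agent-7", {"type": "email", "to": "team@example.com"}))
print(guarded_execute("agent-7", {"type": "payment", "amount": 9000}))
```

Monitoring, compliance evidence, and audit trails all fall out of the same pattern: the agent proposes, a deterministic layer disposes and records.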
Platforms like n8n are moving toward workflow orchestration. But a complete governance platform for the scenario I’m describing here—autonomous AI agents as full-time workers—doesn’t exist yet. However: in a world of agentic engineering, where the marginal cost of software development approaches zero, such a platform will emerge quickly once the other building blocks are in place. I’m not too worried about that.
What concerns me more is the regulatory dimension. A governance platform can be built quickly. But whether regulation and society will allow organizations to automate themselves to this extent—data protection law, labor law, the AI Act—that’s a different matter entirely.
That said: in a global competition where inaction means that another economic system will simply go ahead and do it, regulation won’t be a showstopper. At most a delayer.
What Does This Mean for Our Transformation Plans?
My assessment, as of today:
Does anything need to change in the plans right now? For general knowledge work: not yet. Continue planning with gradual efficiency gains. Organizations are more sluggish than you’d think. More cautious. More risk-averse.
Although—and this is an ironic thought—this sluggishness naturally comes from the people. And in the scenario I described above, you no longer need precisely these people for the rollout. The IT director could simply switch the agents on. The question, then, is whether the organizational braking factors even apply anymore when the introduction itself no longer requires organizational development.
My time horizon:
- 2026: No, nothing fundamental happens in the enterprise world yet. I’m confident about that. In the consumer space, however, it does.
- 2027: Experiments, cautious first steps, initial pilots now also in enterprises.
- 2028-2029: Increasing maturity of the building blocks, first productive deployments.
- 2030: The potential tipping point at which AI agents achieve their breakthrough.
And when that tipping point arrives, we’re no longer talking about 30 percent productivity gains. We’re talking about 80 percent. We’re no longer talking about gradual efficiency gains, but about complete societal disruption—AI agents working directly on the value-creation product, no longer humans. With market dynamics that are barely foreseeable today. One scenario has been well described in the article by Citrini (absolutely worth reading). That one also put some healthy pressure on stock prices.
My Conclusion
The six building blocks I’ve identified, with the Linux CLI baseline for comparison and governance and regulation broken out separately:
| Building Block | Maturity Level (estimated) |
|---|---|
| CLI control (Linux) | ✅ 90%+ |
| CLI/API control (Windows/Office) | 🔶 60-70% |
| Security / Prompt injection resistance | 🔶 70% |
| Cost efficiency | 🔶 50-60% |
| Communication & tacit knowledge access | 🔴 30% |
| UI operation (graphical) | 🔴 50% |
| Governance platform | 🔴 20% |
| Regulatory framework | 🔴 10-20% |
None of these building blocks requires a fundamental technology breakthrough. It’s about engineering, integration, cost reduction, and regulatory clarification. These are solvable problems. The only question is: how fast.
And precisely this question—how fast—is the one that will either validate or render obsolete our transformation plans.