This is the second article in a five-part series on agentic AI in the enterprise. In Part 1, we explored what agentic AI is and how it differs from generative AI, highlighting the shift from hype to pragmatic reality. Here in Part 2, we focus on how organisations progress towards autonomy through distinct maturity phases, and why taking it step by step matters.
Deploying autonomous agents isn’t an overnight revolution. It’s a journey of increasing capability and trust. In practice, enterprise adoption of AI agents can be viewed as a maturity spectrum with four broad phases, progressing from basic assistive tools to fully autonomous systems. Think of it as crawl - walk - run - fly in terms of an organisation’s AI capability. Understanding where you are on this curve helps set realistic expectations and next steps for your AI projects. Most enterprises today are somewhere in the middle, experimenting with advanced assistants or narrow autonomous agents, rather than at the finish line of “AI doing everything.” Let’s define each phase and what it looks like in real life.
Phase 1 - Assisted Intelligence (Crawl): At the base of the ladder are the traditional automation and analytics solutions that have been around for years. Think rule-based workflows, simple chatbots, robotic process automation (RPA) bots, or classical machine learning models that make isolated predictions. These systems can automate repetitive, well-defined tasks (for example, flagging a fraudulent transaction or generating a report from a template) and assist humans by handling grunt work. However, they have no dynamic planning or true autonomy - they execute predetermined rules or model outputs in a fixed way. Most enterprises already have this foundation in place (perhaps a basic customer service chatbot or an ML classifier sorting incoming emails). The impact is real (e.g. efficiency gains for narrow tasks) but it’s limited by the lack of adaptability or initiative. In short, Phase 1 is like having a scripted assistant that only does exactly what it’s pre-programmed to do, nothing more.
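To make the “scripted assistant” point concrete, here is a minimal sketch of Phase 1 logic in Python. The thresholds, country codes, and rules are invented for illustration, not a real fraud model; what matters is that every decision path is written out in advance, so the system can never handle a case its authors didn’t anticipate.

```python
# A minimal sketch of Phase 1 "assisted intelligence": a rule-based
# fraud flag. The rules and thresholds are illustrative placeholders,
# not a real fraud model - the point is that the logic is fixed upfront.

def flag_transaction(amount: float, country: str, hour: int) -> bool:
    """Return True if the transaction should be flagged for review."""
    HIGH_VALUE = 10_000             # hard-coded threshold (assumption)
    RISKY_COUNTRIES = {"XX", "YY"}  # placeholder country codes
    if amount > HIGH_VALUE:
        return True
    if country in RISKY_COUNTRIES and not (9 <= hour <= 17):
        return True
    return False                    # anything the rules don't cover passes

print(flag_transaction(12_500, "GB", 14))  # True - over the value threshold
print(flag_transaction(200, "XX", 3))      # True - risky origin, odd hour
print(flag_transaction(200, "GB", 3))      # False - no rule fires
```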
Phase 2 - Generative AI Assistants (Walk): The last couple of years have seen an explosion of generative AI-powered assistants that operate with far more flexibility. This is the era of tools like Microsoft’s Copilot in Office apps, Google’s Duet AI for Workspace, or custom GPT-based chatbots that can understand natural language and handle more complex requests. These assistants represent a big step up in capability - for instance, they can summarise documents, draft emails, and answer free-form questions - providing a significant productivity boost across many knowledge-work tasks. However, these assistants are still mostly reactive. They work one query or command at a time and rely on the user to initiate each interaction. In other words, they’re assistive tools that enhance human work, not autonomous agents that can initiate or chain together tasks independently. Phase 2 is where many companies’ AI efforts blossomed during the generative AI boom: lots of proofs of concept with chatbots and helpers that can respond intelligently, but don’t truly act on their own. It’s like having a very smart colleague on call, but one who only speaks when spoken to.
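The reactive pattern is easy to see in code. Below is a minimal sketch of a Phase 2 assistant loop; call_llm() is a hypothetical stub standing in for whichever model API you actually use - the shape of the loop, not the vendor, is the point.

```python
# A minimal sketch of Phase 2's reactive pattern. call_llm() is a
# hypothetical stub, not a real client library; swap in your own SDK.

def call_llm(prompt: str) -> str:
    """Placeholder for a single model call; replace with a real SDK call."""
    return f"[model response to: {prompt!r}]"

def assistant_session() -> None:
    # The defining trait of Phase 2: nothing happens until the human
    # types something, and each reply ends the interaction. The
    # assistant never plans ahead or acts on its own.
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in {"quit", "exit"}:
            break
        print("Assistant:", call_llm(user_input))

if __name__ == "__main__":
    assistant_session()
```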
Phase 3 - Goal-Driven AI Agents (Run): Here we reach true agentic AI. Systems in Phase 3 can be given a high-level goal and will proactively devise and execute a multi-step plan to achieve it. They incorporate capabilities like planning algorithms, tool use (e.g. calling APIs), memory of prior context, and dynamic learning from feedback. In practice, these are “digital colleagues” that can handle well-bounded objectives end-to-end. For example, an IT support agent at this level might autonomously handle a user’s request from start to finish: read the ticket, diagnose the issue (maybe by querying logs or a knowledge base), apply a fix, and then confirm resolution, escalating to a human only if it hits an unknown problem. In 2025, many enterprise pilots are hovering in this Phase 3 category: agents that can do non-trivial tasks (data analysis, marketing campaign optimisation, incident response, etc.) with minimal intervention. This is a major leap in capability - moving from one-step answers to multi-step autonomous execution - but it also brings major complexity. To work reliably, it demands a robust architecture and strong guardrails (those “Seven Pillars” we’ll discuss in Part 3). Most successful “agentic AI” stories today fall into this Phase 3 zone, often with a human-in-the-loop for oversight or final approval on important actions. In other words, the agent is running, but with a safety harness attached.
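To show how Phase 3 differs structurally from Phase 2, here is a heavily simplified agent loop in Python. The tools, the canned ticket data, and choose_next_action() (standing in for an LLM-based planner) are all illustrative assumptions; the real content is the plan-act-observe cycle plus a human approval gate on risky actions.

```python
# A minimal sketch of a Phase 3 agent loop: goal in, repeated
# plan -> act -> observe cycles, with a human approval gate on risky
# actions. The tools and planner below are illustrative stand-ins.

from typing import Callable

def query_logs(ticket_id: str) -> str:
    return f"logs for {ticket_id}: disk 98% full"   # canned observation

def apply_fix(ticket_id: str) -> str:
    return f"cleared temp files for {ticket_id}"    # canned action

TOOLS: dict[str, Callable[[str], str]] = {
    "query_logs": query_logs,
    "apply_fix": apply_fix,
}
RISKY = {"apply_fix"}  # actions that need a human sign-off

def choose_next_action(goal: str, history: list[str]) -> str | None:
    # Stand-in for the planning step (in practice an LLM prompt that
    # sees the goal plus the history). Here: diagnose, fix, then stop.
    if not history:
        return "query_logs"
    if len(history) == 1:
        return "apply_fix"
    return None  # goal judged complete

def run_agent(goal: str, ticket_id: str) -> list[str]:
    history: list[str] = []
    while (action := choose_next_action(goal, history)) is not None:
        if action in RISKY:
            # The "safety harness": a human approves important actions.
            if input(f"Approve '{action}'? [y/n] ").lower() != "y":
                history.append(f"{action}: blocked, escalated to human")
                break
        history.append(TOOLS[action](ticket_id))
    return history

print(run_agent("resolve ticket", "TICKET-42"))
```

Note where the human sign-off sits: inside the loop. The agent is running, but the harness is part of the control flow rather than an afterthought bolted on at the end.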
Phase 4 - Fully Autonomous Agentic Systems (Fly): The aspirational end-state is a system (or an ecosystem of systems) of AI agents that operate with minimal human involvement - effectively functioning as a digital workforce for certain tasks or processes. A Phase 4 scenario might be, say, an autonomous order-fulfilment agent (or a team of agents) that receives customer orders and then handles everything from inventory checks to arranging shipment and updating the customer, adapting to issues along the way, all without hand-holding. In theory, you could delegate an entire business process to AI agents. In practice, very few organisations have anything close to this in production yet. The technical, ethical, and organisational challenges are significant, and understandably, most companies aren’t ready to let an AI roam free in critical operations. Fully autonomous systems raise hard questions of control, liability, and trust that are still being figured out. For now, Phase 4 remains largely in the realm of experiments and conceptual pilots. It’s a compelling vision of the future (like “flying”) but most enterprises will get there (if ever) only after mastering the earlier phases and proving value step by step.
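For contrast, here is what the Phase 4 idea looks like as a sketch: the same kind of flow, but with no approval gate on the happy path and humans pulled in only on exceptions. Every function here is a hypothetical stand-in, and the sketch deliberately omits the monitoring, audit trails, and rollback machinery a real deployment would demand.

```python
# A minimal sketch of the Phase 4 idea: an end-to-end process delegated
# to agents, with humans involved only on exceptions. All functions are
# hypothetical stand-ins; real systems at this level remain rare.

class OutOfStock(Exception):
    pass

def check_inventory(order: dict) -> None:
    if order["qty"] > 10:                 # illustrative stock limit
        raise OutOfStock(order["sku"])

def arrange_shipment(order: dict) -> str:
    return f"shipment booked for {order['sku']}"

def notify_customer(order: dict, status: str) -> None:
    print(f"email to {order['customer']}: {status}")

def fulfil(order: dict) -> None:
    # No approval gate inside the happy path - autonomy is the default,
    # and humans see only the exceptions the agents cannot resolve.
    try:
        check_inventory(order)
        notify_customer(order, arrange_shipment(order))
    except OutOfStock as exc:
        notify_customer(order, "delayed - sourcing stock")
        print(f"escalated to human: out of stock ({exc})")

fulfil({"sku": "A-100", "qty": 2, "customer": "alice@example.com"})
fulfil({"sku": "B-200", "qty": 50, "customer": "bob@example.com"})
```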
Where do companies stand today? In my experience (echoed by industry surveys), most companies right now cluster in Phases 2-3 - they’re using generative AI assistants to augment staff, and maybe running a pilot of a more autonomous agent in a specific use case. It’s common to start by deploying a GenAI assistant to help employees (Phase 2), then pilot a goal-driven agent for one high-value process (Phase 3). Each step up the maturity curve requires not just better tech, but stronger processes and cultural readiness. Not every organisation will need or want to reach Phase 4 in all areas - the aim isn’t autonomy for its own sake, but improved outcomes. In many cases, a Phase 3 agent with a human overseer delivers the best balance of efficiency and risk management. The maturity model is a guide to help you decide where to apply agentic AI next and how to chart a safe path forward.
Crucially, knowing your current phase helps manage expectations. For example, if your firm is still “learning to crawl” with basic RPA bots, jumping straight to a fully autonomous agent managing critical tasks would be asking for trouble. It might be wiser to introduce a generative assistant first, get comfortable with AI outputs, then gradually give the AI more autonomy in a controlled area. Conversely, if you’ve done successful Phase 2 pilots, you might be ready to experiment with a Phase 3 agent - but you’ll need to invest in the architecture and governance to support it. The message is: walk before you run (and certainly before you fly).
In the next part of this series, we’ll move from this conceptual roadmap to the architecture needed for success. What does it take under the hood to turn a nifty prototype agent into a production-grade solution? As it turns out, successful agentic AI systems share a common DNA. In Part 3, we’ll break down the seven key pillars of an enterprise AI agent’s architecture, from how it perceives input to how it is governed, and share design tips for each.
