What's real, what to build, what to buy, and how to wire it all together
TL;DR
Don't slap AI on everything. Most enterprise problems are still best solved with good engineering and automation.
Where AI does belong, five patterns cover every use case, and knowing which one you're in determines what to build, what to buy, and how to govern it.
- 5 patterns: Search & Discovery, Conversational, Agentic Workflows, Autonomous Agents, Predictive.
- Build vs buy: You buy understanding, you build action. If the AI writes to a System of Record, you are building.
- Building blocks: Interaction, Orchestration, Intelligence, Connectivity, Data. Governance wraps everything.
- Ship early, iterate fast. A Slack bot calling an LLM and querying your data warehouse is a valid v1.
- Read "What's real" before you build anything. The hard truths and traps will save you six months of pain.
What's real
AI is not automation
Automation does exactly what you tell it. Every time. The same way. System integrations, bots, cron jobs, workflow engines. A human authored every decision.
AI handles things it's never been programmed for: it interprets intent, reasons over unstructured data, and deals with ambiguity.
The test: hand it a request it's never seen before. Does it figure it out? If yes → AI. If it falls over → automation.
In practice, you use both. AI decides what needs to happen (reads the Slack message, understands the intent, figures out which systems to hit). Automation executes the deterministic parts (calls the API, updates the record, sends the email). If you can replace the LLM with a switch statement and get the same result, you didn't need AI.
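To make that division of labor concrete, here's a minimal sketch. The intent names, the handlers, and the keyword-based `classify_intent` stub (standing in for a real LLM call) are all illustrative:

```python
# AI decides *what* needs to happen; automation executes *how*.
# classify_intent is a stub for an LLM call so the flow is runnable.

def classify_intent(message: str) -> str:
    """Placeholder for an LLM call that maps free text to a known intent."""
    text = message.lower()
    if "refund" in text or "charged" in text:
        return "billing_dispute"
    if "password" in text or "login" in text:
        return "access_issue"
    return "unknown"

# The deterministic half: plain automation, one handler per intent.
def handle_billing_dispute(message: str) -> str:
    return "opened billing ticket"      # would call the billing API

def handle_access_issue(message: str) -> str:
    return "sent password reset"        # would call the identity system

HANDLERS = {
    "billing_dispute": handle_billing_dispute,
    "access_issue": handle_access_issue,
}

def process(message: str) -> str:
    intent = classify_intent(message)   # AI decides what
    handler = HANDLERS.get(intent)
    if handler is None:
        return "escalated to a human"   # never guess on unknown intents
    return handler(message)             # automation does how
```

Note the fallback: when the AI can't map the request to a known intent, the system escalates rather than improvising.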
Hard truths
- If your foundation isn't solid, AI will make it worse. AI amplifies what's underneath it. If your data is inconsistent, your domains are tangled, your integrations are spaghetti - the AI will reflect all of that mess back at your users, confidently and at scale.
- Hallucination is real and it matters most where the stakes are high. LLMs will confidently generate plausible-sounding answers that are factually wrong. For a search query, that's annoying. For compliance determination, it's a material business risk. Never let an LLM be the sole decision-maker on anything that moves money, affects customers, or touches compliance.
- You have to decide how much risk you're willing to take with your corporate data. Every AI system needs access to your data to be useful. That means customer records, financial data, and proprietary business logic flowing through LLM providers, vector stores, and potentially third-party platforms. Know exactly where your data goes, who can access it, what gets retained.
- Eval is harder than building. Getting a prototype working takes days. Knowing whether it's giving good answers takes months. Most teams ship without a real evaluation framework and wonder why users stop trusting it. If you can't measure quality, you can't improve it. Build your eval before you build your second feature.
- Your AI is only as fresh as your data pipeline. Users will ask "what's our current MRR" and expect a real-time answer. If your vector index was refreshed three days ago and your Snowflake view syncs nightly, the AI will confidently give a stale answer with no indication it's outdated. Freshness SLOs directly determine whether users trust the system.
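On the eval point above: a golden set doesn't have to be elaborate to be useful. A minimal sketch, assuming a `run_system` callable wrapping your pipeline; the questions, expected terms, and keyword scoring are placeholders for real graded evals:

```python
# Golden-set eval sketch: fixed questions, required facts, pass rate.

GOLDEN_SET = [
    {"question": "What is our refund window?", "must_contain": ["30 days"]},
    {"question": "Who approves credits over $5k?", "must_contain": ["finance"]},
]

def score(answer: str, must_contain: list[str]) -> bool:
    """Crude keyword check; real evals add LLM-as-judge, citations, etc."""
    return all(term.lower() in answer.lower() for term in must_contain)

def run_eval(run_system) -> float:
    """Run every golden case through the system and return the pass rate."""
    passed = sum(
        score(run_system(case["question"]), case["must_contain"])
        for case in GOLDEN_SET
    )
    return passed / len(GOLDEN_SET)
```

Run this on every prompt or retrieval change; a dropping pass rate is your regression signal.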
Traps to avoid
- The demo is not the product. Every AI demo looks magical: the LLM nails the happy path, and the project gets green-lit. Then you hit production: edge cases, stale data, permission errors, users who phrase things in ways you never anticipated. The gap between demo and production-grade AI is 10x the effort.
- You will want to put AI everywhere. Don't. Each new use case is a new data pipeline, new guardrails, new eval, new support burden. Pick 2-3 high-impact use cases, nail them, then expand.
- Don't wait for the perfect architecture. The architecture is a target state, not a prerequisite. Start with a basic use case. Ship it. Learn. Improve.
- Agents without a kill switch. An agent confidently executing the wrong plan at 2am, creating records in the CRM with nobody able to stop it: that's a production incident. Every agent needs budget caps, time limits, scope boundaries, and a manual halt.
- AI doesn't replace good software engineering, it demands more of it. AI systems are harder to reason about, the failure modes are subtle, and the blast radius spans multiple systems. Treat your AI codebase like what it is - production software that happens to call an LLM.
The Core AI solution patterns
Every AI use case I have encountered falls into one of five patterns. Each has a different architectural footprint and different infrastructure, governance, and integration needs.
1. Search & Discovery "What do I need to know?" One question in, one answer out. Stateless retrieval across enterprise data, synthesized into a response. The foundation everything else builds on.
"What's our refund policy for enterprise customers?" "Find the latest SOC 2 report"
2. Conversational "Help me think through this" Multi-turn, investigative, contextual. Two sub-types: single-shot (one query → direct answer) and investigative (multi-step reasoning using ReAct loops: Thought → Action → Observation → Repeat).
"What's our churn rate?" → "How does that compare to last quarter?" → "Draft a summary for the sales team"
3. Agentic Workflows "Do this for me, with guardrails" The AI acts within defined boundaries. Known process, AI orchestrates the steps, loops in humans at approval points. Shares the same ReAct engine as Conversational-Investigative; the difference is scope.
"Triage this production ticket and run RCA"
4. Autonomous Agents "Handle this end-to-end" More autonomy. The AI plans its own approach, selects tools, self-corrects. The workflow shape isn't predefined, the agent figures it out.
"Research these 50 vendor proposals and give me a shortlist with rationale"
5. Predictive "What's likely to happen?" ML/analytics pattern. Forecasting, anomaly detection, scoring. Feeds signals into the other four.
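The ReAct loop behind patterns #2 (investigative) and #3 fits in a few lines. In this sketch the `llm_step` argument scripts the model's decisions so the control flow is runnable; the tool names and step budget are illustrative:

```python
# Thought → Action → Observation loop with a hard step budget.

def react_loop(llm_step, tools: dict, max_steps: int = 5) -> str:
    observations = []
    for _ in range(max_steps):
        decision = llm_step(observations)       # Thought: choose next action
        if decision["action"] == "finish":
            return decision["answer"]
        tool = tools[decision["action"]]        # Action: invoke a tool
        observations.append(tool(decision["input"]))  # Observation: record result
    return "step budget exhausted"              # guardrail, not an answer
```

The `max_steps` cap is the smallest possible kill switch: a looping agent terminates instead of burning tokens forever.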
Pattern comparison
| Pattern | User Control | AI Autonomy | Speed | Best For |
|---|---|---|---|---|
| Search & Discovery | High | Low | Real-time | Information retrieval |
| Conversational | High | Low | Real-time | Q&A, exploration, generation |
| Agentic Workflows | Medium | Medium | Minutes–hours | Process automation |
| Autonomous Agents | Low | High | Hours–days | Complex goals |
| Predictive | Low–Medium | Medium | Real-time / batch | Forecasting & alerts |
The moment you cross from #2 to #3, governance requirements jump significantly. That's why the build vs buy line sits between these two patterns.
Build vs buy: the understand → act spectrum
The answer isn't per-pattern; it's based on where your use case sits on the understand → act spectrum.
The dividing line: you buy understanding, you build action.
| Pattern | Decision | Rationale |
|---|---|---|
| Search & Discovery | Buy | Retrieval, connectors, and permission sync are commodity. Maintaining 20+ enterprise connectors isn't where your engineers should spend time. |
| Conversational | Buy (mostly) | Multi-turn is being added to search platforms. Build only when the conversation needs to act on your systems. |
| Agentic Workflows | Build | Buy the infrastructure primitives. Build the orchestration: no vendor knows your business processes. |
| Autonomous Agents | Build | No credible platform exists. Governance and boundary enforcement are too critical to outsource. Don't start here until #3 is solid. |
| Predictive | Buy | ML platforms are mature. Build feature engineering and domain models on top. |
What "build" actually means
"Build" doesn't mean writing everything from scratch. It means assembling and orchestrating mature components with your business logic as the glue. The ecosystem has matured significantly.
- Orchestration frameworks are production-ready. LangGraph, CrewAI, AWS Agent Core, Semantic Kernel, Haystack - these give you the state machine, checkpointing, tool calling, and agent patterns out of the box. You're not building the framework; you're defining the workflow graph. That graph is your business logic, and it's where the value lives.
- LLM access is commodity. You don't need to build an LLM gateway from scratch. AWS Bedrock gives you multi-model access (Claude, Llama, Mistral, Cohere) with built-in guardrails, logging, and cost tracking behind a single API.
- Vector search is solved infrastructure. You don't need a standalone vector database for every use case. Amazon Bedrock Knowledge Bases gives you managed RAG with S3 as the document source - upload your docs, it handles chunking, embedding, indexing, and retrieval. Pgvector adds vector search to your existing PostgreSQL. OpenSearch has native vector capabilities. The choice depends on your scale and existing infrastructure, but none of these require significant build effort.
- Governed system access is the real build. This is where your effort goes. The connectors to your Systems of Record that let AI safely read from and write to your CRM, billing platform, ERP, and internal services. No vendor provides this for your specific systems with your specific business rules, field-level access controls, and approval thresholds. This is the moat.
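Here's a sketch of what a governed connector's write path can look like. The field whitelist, the $4,000 threshold, and the `issue_credit` operation are hypothetical examples for illustration, not a vendor API:

```python
# Governed, parameterized write to a System of Record:
# strict schema, approval threshold, no open-ended input.

ALLOWED_FIELDS = {"customer_id", "invoice_id", "amount"}
APPROVAL_THRESHOLD = 4_000  # writes above this need a human (illustrative)

class ApprovalRequired(Exception):
    """Raised when a write must go through a human-in-the-loop gate."""

def issue_credit(params: dict, approved: bool = False) -> dict:
    # Strict schema: reject anything outside the allowed fields.
    unknown = set(params) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {unknown}")
    # Precondition: HiL gate above the auto-approve limit.
    if params["amount"] > APPROVAL_THRESHOLD and not approved:
        raise ApprovalRequired("credit exceeds auto-approve limit")
    # A real connector would call the billing API here, idempotently
    # (keyed on invoice_id) and stamped with a trace id for audit.
    return {"status": "credited", **params}
```

The point is that the LLM never sees the billing API directly; it can only call this narrow, validated interface.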
So when we say "build" for pattern #3, the actual work breakdown looks like this:
| Component | Effort | How |
|---|---|---|
| LLM access | Buy | Bedrock, Azure OpenAI, or direct API with LiteLLM |
| Vector search / RAG | Buy / configure | Bedrock Knowledge Bases, Pgvector, Pinecone |
| Orchestration framework | Buy / adopt | LangGraph, CrewAI, AWS Agent Core, Semantic Kernel |
| Orchestration logic | Build | Your workflow graph, decision nodes, routing rules |
| System connectors | Build | Governed, parameterized tool interfaces to your SoRs |
| Approval workflows | Build | HiL gates, thresholds, escalation paths |
| Domain-specific prompts | Build | System prompts, few-shot examples, output schemas |
| Eval harness | Build | Golden test sets, regression suites, quality metrics |
| Observability | Buy | Datadog, LangSmith → commodity |
The building blocks
You've picked your components: Bedrock for LLM access, LangGraph for orchestration, Pgvector for retrieval, custom connectors for your systems. Now the question is: how do you organize all of this so it doesn't become a tangled mess six months from now?
Most production AI architectures converge on a similar layered model: five layers, each with a clear responsibility. When something breaks (and it will), you know exactly which layer to look at. When you swap a component - a new LLM provider, a different vector DB - only one layer changes.
- Interaction layer: Where users and triggers enter the system. Chat UIs, Slack bots, approval portals, webhooks, scheduled jobs. Keep this layer thin: it captures the request, identifies the user, injects context (role, permissions, tenant), and hands off to orchestration. The same AI capability should work through Slack, a web app, or an API without rebuilding any logic underneath.
- Orchestration layer: The brain that decides what happens. Routes requests to the right workflow, enforces policies and budgets, manages agent state, runs HiL gates, handles retries and error recovery. This is where your LangGraph or CrewAI workflows live. It doesn't do the reasoning or call external systems directly; it coordinates the layers that do.
- Intelligence layer: Where the AI thinks. Your LLM calls (via Bedrock, Azure OpenAI, or direct API), your RAG pipeline (Bedrock Knowledge Bases, Pgvector, Pinecone), your prompt registry, and your tool/policy registry that defines what the AI is allowed to do. Everything here is policy-aware: no unguarded LLM calls, no retrieval without access filters.
- Connectivity layer: Governed gateway to your Systems of Record. This is where your custom-built system connectors live: parameterized, idempotent, audited interfaces to CRM, billing, ERP, and internal services. Strict schemas, field-level access controls, pre/post-conditions on every call. This layer is why you can let an AI agent write to production systems without losing sleep.
- Data layer: Everything the system needs to remember or reference. Knowledge store (vector DB, curated data marts - the long-term memory), state store (session data, workflow state, audit logs - the working memory), and event transport (Kafka, EventBridge - for publishing domain facts, keeping indexes fresh, and triggering downstream workflows).
- Governance isn't a layer; it wraps everything. Access control, policy enforcement, HiL gates, end-to-end tracing, data management, and cost tracking. If you can't trace a request from user input to final response and prove what happened at every step, you're not production-ready.
Why this layering matters: When your CRM connector breaks, you fix the connectivity layer; the orchestration and intelligence layers don't change. When you swap from OpenAI to Claude, you update the intelligence layer; everything else stays the same. When you add a new channel (a Teams bot alongside Slack), you add to the interaction layer; nothing downstream cares. Clean separation means you can move fast without breaking things.
Putting it together - Building a Specialized Support Agent
Everything above is abstract until you see it work. Let's design a specialized support agent that handles customer concerns across product support, billing, subscriptions, and contracts. We'll take one request, "Customer says they were charged twice," and map every piece to the layers and the build vs buy decisions we just covered.
The problem
A support agent gets a ticket. Today they manually check the back-end systems - CRM, ERP, payment gateway, billing - cross-reference consumption records in the data warehouse, read the refund policy in Confluence, and decide whether to issue a credit. It takes a long time and requires deep system knowledge.
The design, layer by layer
- Interaction layer: The support agent types in Slack: "Customer Acme Corp says they were double-charged. Invoice INV-X." The layer identifies the user, confirms they have billing permissions, and passes the request to orchestration.
- Buy decision: Slack (existing), no custom UI needed for v1.
- Orchestration layer: A LangGraph workflow receives the request and plans the investigation. It knows this is a billing discrepancy - route to the billing investigation graph. The nodes: retrieve policy → query billing system → query CRM → query data warehouse → synthesize → decide → act (if approved).
- Policy enforcer checks: this agent can investigate but writes need HiL approval above $4,000.
- Buy decision: LangGraph (adopt the framework), build the workflow graph and business rules.
- Intelligence layer: RAG pipeline (Bedrock Knowledge Bases backed by S3) retrieves the refund and credit memo policies. LLM (Claude via Bedrock) interprets the request, plans which system connectors to call, and later synthesizes the findings:
- "Invoice INV-X and INV-Y both cover the same period; INV-X was generated from a mid-cycle amendment that didn't cancel the original."
- "Per the refund policy, this qualifies for a $1,500 credit."
- Buy decision: Bedrock for LLM + RAG (buy), prompt templates and few-shot examples (build).
- Connectivity layer: Governed connectors execute the calls: writes to the billing platform and CRM, plus a governed read to the data warehouse for consumption records. Every call is parameterized (no open-ended queries), idempotent, and stamped with a trace_id.
- Buy decision: this is the real build. No vendor provides governed, parameterized access to your specific systems with your field-level access rules.
- FastMCP: governed MCP servers fronting the ERP, billing, CRM, data warehouse, etc.
- Data layer: Vector store (Bedrock Knowledge Bases with S3, or Pgvector) holds the indexed policy documents. The data warehouse provides the curated data. A session store tracks the investigation state. After resolution, event transport publishes credit_memo.created and case.resolved.
- Buy decision: Pgvector or Bedrock Knowledge Bases (buy), event transport on Kafka or EventBridge (buy/configure), domain event schemas (build).
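Stripped of the framework, the investigation graph above reduces to a few nodes. This toy version uses plain functions and stubbed connectors; the `policy_store` and `billing` callables and the duplicate-detection rule are illustrative, and LangGraph would give you the same shape plus checkpointing and retries:

```python
# retrieve policy → query billing → synthesize → decide, with a HiL gate.

def investigate(ticket: dict, policy_store, billing, threshold: int = 4_000) -> dict:
    policy = policy_store(ticket["topic"])        # retrieve policy (RAG stand-in)
    invoices = billing(ticket["invoice_id"])      # governed billing read
    duplicate = len(invoices) > 1                 # synthesize: same period, two invoices
    if not duplicate:
        return {"decision": "no_action", "policy": policy}
    amount = invoices[0]["amount"]
    needs_approval = amount > threshold           # HiL gate above the threshold
    return {"decision": "credit", "amount": amount,
            "needs_approval": needs_approval, "policy": policy}
```

The decision node never writes anything itself; the "act" step goes through the governed connector, with human approval when `needs_approval` is set.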
What's next
- Start with pattern #1 - get retrieval right before anything else
- Stand up shared services early - LLM gateway, observability, eval from day one
- Build the thin client first - a Slack bot calling an LLM and querying your data warehouse is a valid v1.
- Graduate through the patterns - each builds on the last
- Revisit build vs buy quarterly - the landscape moves fast
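For a sense of scale, the thin v1 really is this small. Here `llm` and `query_warehouse` are stand-ins for a Bedrock call and a governed warehouse read, and the Slack event wiring is omitted:

```python
# Thin v1: retrieve rows, ground the prompt in them, let the LLM answer.

def answer(question: str, query_warehouse, llm) -> str:
    rows = query_warehouse(question)                  # governed read
    context = "\n".join(str(row) for row in rows)     # serialize for the prompt
    prompt = (
        "Answer using only this data:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
    return llm(prompt)
```

Everything else in this article - orchestration, connectors, governance - grows out of this loop as the use cases demand it.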
The goal isn't the perfect AI platform. It's getting AI into users' hands in a way that's useful, safe, and maintainable - then iterating from there.
