Why Bolting AI Onto Existing Processes Doesn't Work And What to Do Instead Deloitte is a world-class consulting firm with decades of expertise. They know how to deliver complex compliance reviews. They have quality processes, multiple review layers, and partner sign-offs. So how did they ship a government report with fabricated citations, invented court cases, and references to academic papers that don't exist? The answer isn't that their AI was bad. The answer is that they bolted AI onto an existing process without redesigning the system around it. And that's the mistake hundreds of organizations are making right now. What "Bolting On" Looks Like Here's what probably happened at Deloitte: Someone said: "What if we use AI to help write reports? It could speed up research and drafting." That sounds reasonable. So they got access to Azure OpenAI, told people they could use it, and assumed the existing review processes would catch any problems. The architecture looked something like this: Consultant uses AI to draft sections AI generates citations and legal references Consultant includes that content in the report Report goes through normal review process Report goes to client Consultant uses AI to draft sections AI generates citations and legal references Consultant includes that content in the report Report goes through normal review process Report goes to client The problem: steps 2–4 assumed humans would carefully verify every AI-generated citation. But the system didn't require it, didn't make it easy, and didn't track whether it happened. Result: a report with hallucinated content made it to a government client. Why This Architecture Fails Why This Architecture Fails The fundamental mistake is treating AI like a fancy word processor instead of what it actually is: a probabilistic system that generates plausible-sounding text regardless of whether it's accurate. Think about how the architecture should work: For Human-Written Content: For Human-Written Content: Human makes claim Human provides evidence Reviewer checks claim against evidence If accurate, approve; if not, reject Human makes claim Human provides evidence Reviewer checks claim against evidence If accurate, approve; if not, reject For AI-Generated Content: For AI-Generated Content: AI makes claim AI generates "evidence" (sometimes fabricated) Reviewer needs to verify both claim AND evidence But AI-generated evidence looks legitimate Reviewer may not realize it needs independent verification AI makes claim AI generates "evidence" (sometimes fabricated) Reviewer needs to verify both claim AND evidence But AI-generated evidence looks legitimate Reviewer may not realize it needs independent verification The process was designed for humans who don't fabricate sources. It doesn't work when the content generator confidently invents references that sound real. What Deloitte Should Have Built Instead Here's the architecture that would have prevented this problem. It's not complicated, it's just designed around how AI actually works. Layer 1: Separate Facts from Generation The Problem: AI was generating citations from its training data memory, which is unreliable. The Problem: The Solution: Separate where facts come from and what AI does with them. The Solution: Consultant needs to support a claim ↓ Search Deloitte's knowledge base (past reports, verified research) ↓ System returns actual citations from verified sources ↓ AI formats those citations into readable text ↓ Consultant includes in report Consultant needs to support a claim ↓ Search Deloitte's knowledge base (past reports, verified research) ↓ System returns actual citations from verified sources ↓ AI formats those citations into readable text ↓ Consultant includes in report In this architecture, AI never generates citations. It only formats real citations from real sources. Can't hallucinate what it doesn't control. Layer 2: Validation Gates The Problem: Review processes existed but weren't enforced. The Problem: The Solution: Make validation automatic and mandatory. The Solution: Content enters system ↓ Automated check: Does every citation have a verifiable source? ↓ If yes → proceed If no → flag for human review (can't proceed without it) ↓ Human verifies flagged items ↓ Only then can content be included in final report Content enters system ↓ Automated check: Does every citation have a verifiable source? ↓ If yes → proceed If no → flag for human review (can't proceed without it) ↓ Human verifies flagged items ↓ Only then can content be included in final report This isn't about trusting people to do reviews. It's about making the system incapable of producing a final report until validations pass. Layer 3: Risk-Based Controls The Problem: All reports were treated the same, regardless of stakes. The Problem: The Solution: Different controls for different risk levels. The Solution: New project starts ↓ System asks: Who's the client? What's the subject? What's the risk? ↓ If high risk (government, regulatory, legal): - AI can only be used for drafting, not final content - All citations must be independently verified - Partner must review and approve every section - System tracks compliance with all requirements ↓ If low risk (internal report, preliminary analysis): - AI can generate more freely - Spot-check verification acceptable - Standard review process sufficient New project starts ↓ System asks: Who's the client? What's the subject? What's the risk? ↓ If high risk (government, regulatory, legal): - AI can only be used for drafting, not final content - All citations must be independently verified - Partner must review and approve every section - System tracks compliance with all requirements ↓ If low risk (internal report, preliminary analysis): - AI can generate more freely - Spot-check verification acceptable - Standard review process sufficient Deloitte's report was high risk: government client, compliance framework, legal citations. It should have triggered maximum controls. Instead, it was treated like any other project. Layer 4: Audit Everything The Problem: When errors emerged, couldn't quickly identify what AI generated. The Problem: The Solution: Track everything at the time of creation. The Solution: Every piece of content tagged with: Source: Human wrote this / AI generated this / AI assisted If AI: what prompt was used, what was generated, what was edited Verification status: Verified / Needs review / Not yet checked Approver: Who signed off on including this Source: Human wrote this / AI generated this / AI assisted If AI: what prompt was used, what was generated, what was edited Verification status: Verified / Needs review / Not yet checked Approver: Who signed off on including this This isn't about surveillance. It's about being able to answer basic questions: "Where did this claim come from? Who verified it? Who approved it?" The Pattern That Works Strip away the details and here's the architecture that prevents these problems: Use AI for what it's good at AI excels at language tasks: understanding questions, formatting information, summarizing content, suggesting phrasing. AI is terrible at facts: It will confidently state things that aren't true because it's generating plausible text, not retrieving verified information. Architecture implication: Use AI for interfaces and formatting. Use databases and search systems for facts. Validate before, not after Don't generate content and then try to check if it's accurate. Get accurate information first, then use AI to present it. This is the difference between: "AI, write a section about compliance frameworks" (and then verify everything it said) "Here are three past compliance reviews. AI, synthesize the common findings" (facts are already verified) The second architecture is cheaper, faster, and more reliable. Make validation automatic Humans are busy. Humans make assumptions. Humans skip steps when deadlines loom. Design systems that won't proceed without validation, not systems that rely on people remembering to validate. Match controls to risk Not every use of AI needs extensive controls. Internal brainstorming? Low risk, minimal controls. Client deliverables? High risk, extensive controls. Build the architecture to enforce different workflows based on what's at stake. Use AI for what it's good at AI excels at language tasks: understanding questions, formatting information, summarizing content, suggesting phrasing. AI is terrible at facts: It will confidently state things that aren't true because it's generating plausible text, not retrieving verified information. Architecture implication: Use AI for interfaces and formatting. Use databases and search systems for facts. Use AI for what it's good at AI excels at language tasks: understanding questions, formatting information, summarizing content, suggesting phrasing. Use AI for what it's good at AI is terrible at facts: It will confidently state things that aren't true because it's generating plausible text, not retrieving verified information. Architecture implication: Use AI for interfaces and formatting. Use databases and search systems for facts. Validate before, not after Don't generate content and then try to check if it's accurate. Get accurate information first, then use AI to present it. This is the difference between: "AI, write a section about compliance frameworks" (and then verify everything it said) "Here are three past compliance reviews. AI, synthesize the common findings" (facts are already verified) The second architecture is cheaper, faster, and more reliable. Validate before, not after Don't generate content and then try to check if it's accurate. Get accurate information first, then use AI to present it. Validate before, not after This is the difference between: "AI, write a section about compliance frameworks" (and then verify everything it said) "Here are three past compliance reviews. AI, synthesize the common findings" (facts are already verified) "AI, write a section about compliance frameworks" (and then verify everything it said) "Here are three past compliance reviews. AI, synthesize the common findings" (facts are already verified) The second architecture is cheaper, faster, and more reliable. Make validation automatic Humans are busy. Humans make assumptions. Humans skip steps when deadlines loom. Design systems that won't proceed without validation, not systems that rely on people remembering to validate. Make validation automatic Humans are busy. Humans make assumptions. Humans skip steps when deadlines loom. Make validation automatic Design systems that won't proceed without validation, not systems that rely on people remembering to validate. Match controls to risk Not every use of AI needs extensive controls. Internal brainstorming? Low risk, minimal controls. Client deliverables? High risk, extensive controls. Build the architecture to enforce different workflows based on what's at stake. Match controls to risk Not every use of AI needs extensive controls. Internal brainstorming? Low risk, minimal controls. Client deliverables? High risk, extensive controls. Match controls to risk Build the architecture to enforce different workflows based on what's at stake. Track everything, always You can't manage what you don't measure. You can't debug what you didn't log. You can't improve what you don't track. Track everything, always This isn't optional for AI systems. The technology is too new, the failure modes too unpredictable, and the stakes too high to run blind. The Real Competitive Advantage Here's what most organizations miss: Deloitte's actual advantage isn't access to GPT-4. Anyone can pay for that. Deloitte's advantage is decades of past reports, methodologies, case studies, and expertise. That's proprietary. That's valuable. That's what clients pay for. The right architecture would have: Taken all those past reports and built a searchable knowledge base Trained specialized models on Deloitte's specific methodologies Used AI to help consultants find relevant past work instantly Used AI to ensure new reports are consistent with Deloitte's standards Used AI to draft sections based on verified past content Taken all those past reports and built a searchable knowledge base Trained specialized models on Deloitte's specific methodologies Used AI to help consultants find relevant past work instantly Used AI to ensure new reports are consistent with Deloitte's standards Used AI to draft sections based on verified past content That architecture would make consultants faster and more consistent while eliminating hallucination risk. Why? Because the AI would be working with Deloitte's actual proprietary knowledge, not trying to recreate it from training data. Instead, they used generic GPT-4 and hoped it would "know" about compliance frameworks and Australian case law. It didn't. It couldn't. And the architecture didn't prevent that from causing problems. The Three Questions Every Leader Should Ask Question 1: "Where does our AI get its information?" Question 1: "Where does our AI get its information?" If the answer is "from its training data" or "it generates it," you have hallucination risk. Better answer: "It retrieves information from our verified databases and formats it for users." Question 2: "What happens if AI produces something wrong?" Question 2: "What happens if AI produces something wrong?" If the answer is "someone should catch it in review," you're relying on humans to catch machines being confidently wrong. That fails. Better answer: "The system validates outputs against source data before they can be used." Question 3: "Can we show what AI did and who approved it?" Question 3: "Can we show what AI did and who approved it?" If the answer is "probably not in detail," you can't do incident analysis or compliance audits. Better answer: "Yes, we log all AI interactions and track the approval chain for anything AI-touched." What This Means for Your Organization You're probably not writing government compliance reports. But you might be: Using AI to draft customer communications Using AI to analyze business data Using AI to generate code Using AI to summarize documents Using AI to make recommendations Using AI to draft customer communications Using AI to analyze business data Using AI to generate code Using AI to summarize documents Using AI to make recommendations In every case, the same architectural principles apply: Don't let AI generate facts. Let it work with facts you've verified. Don't let AI generate facts. Don't rely on human review to catch errors. Design systems that validate before content is used. Don't rely on human review to catch errors. Don't treat all use cases the same. High-risk activities need different architecture than low-risk ones. Don't treat all use cases the same. Don't skip the audit trail. You need to know what AI did and who approved it. Don't skip the audit trail. The Path Forward Good news: you don't need to build everything at once. Start with your highest-risk AI usage and ask: Where could this go wrong? Would we know if it did? Could that happen without someone catching it? Where could this go wrong? Would we know if it did? Could that happen without someone catching it? If the answer to #3 is yes, you need architectural changes, not better training or clearer policies. Deloitte learned this lesson publicly and expensively. You can learn it privately and proactively. The choice is yours. The fundamental principle: AI should enhance human work within a system designed for its limitations, not replace human work within a system designed for human strengths. Get the architecture right, and AI becomes a powerful tool. Get it wrong, and you're one mistake away from a very public, very expensive failure. The fundamental principle: AI should enhance human work within a system designed for its limitations, not replace human work within a system designed for human strengths. Get the architecture right, and AI becomes a powerful tool. Get it wrong, and you're one mistake away from a very public, very expensive failure.