Six months ago, I thought prompt engineering was just about getting ChatGPT to write better emails. Then my boss asked me to build AI that could automatically investigate fraud cases, and I realized that getting language models to take real actions is completely different from getting them to chat.

Regular prompting is like asking someone a question. Agentic prompting is like hiring someone, giving them access to your systems, and trusting them to make decisions that matter.

After months of building AI agents that process thousands of fraud cases daily, I learned that the way you write prompts can make the difference between intelligent automation and expensive chaos. Here's what works when you need AI to do real stuff, not just talk.

## Why This Is Way Harder Than Regular ChatGPT

When you ask ChatGPT to "write a marketing email," the worst thing that happens is you get a crappy email. When you tell an AI agent to "investigate this suspicious transaction," it might:

- Access sensitive customer data
- Block someone's credit card
- File regulatory reports
- Call in human investigators
- Make decisions that affect real people's money

The stakes are completely different, so the prompts need to be way more careful and precise. Regular prompts are about getting good answers. Agent prompts are about getting reliable actions.

## How Normal People Prompt vs. How You Need to Prompt

What most people do:

> "Look at this transaction and tell me if it's suspicious."

What actually works for agents:

```
You are a fraud investigator. Your job is to analyze transactions and decide what to do about them.

Here's what you can do:
- CLEAR: Transaction looks fine, let it go through
- VERIFY: Suspicious but low risk, ask customer to confirm
- HOLD: High risk, block it temporarily
- ESCALATE: Too complex, get a human involved
- BLOCK: Fraud, kill the card immediately

Here's how to decide:
- Check if this matches how the customer normally spends
- Look at where they are vs. where they usually shop
- See if their device/location makes sense
- Consider if the merchant is sketchy

You must explain your reasoning because auditors will read it.

Current case:
- Customer usually spends $50-200 at grocery stores in Phoenix
- This transaction: $2,847 at "Metro Electronics" in Vegas at 3AM
- Customer's phone shows they're still in Phoenix
- New device trying to make this purchase

What do you do and why?
```

See the difference? The agent version tells the AI:

- Exactly what its job is
- What actions it can take
- How to make decisions
- Why the reasoning matters
- Specific details about the current situation
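A prompt like this sets the contract, but nothing in it physically stops the model from answering with an action that isn't on the list. Here's a minimal Python sketch (hypothetical names, assuming the agent is told to reply in JSON) of the kind of validation layer you'd want between the model and anything that touches a real account:

```python
import json

# The only actions the fraud agent may return (from the prompt above).
ALLOWED_ACTIONS = {"CLEAR", "VERIFY", "HOLD", "ESCALATE", "BLOCK"}

def parse_agent_decision(raw_response: str) -> dict:
    """Parse the model's reply and refuse anything outside the contract.

    Assumes the agent was asked to answer with JSON like:
    {"action": "HOLD", "reasoning": "..."}
    """
    try:
        decision = json.loads(raw_response)
    except json.JSONDecodeError:
        # Unparseable output is never executed; hand it to a human.
        return {"action": "ESCALATE",
                "reasoning": "Model returned non-JSON output"}

    action = decision.get("action")
    if action not in ALLOWED_ACTIONS:
        return {"action": "ESCALATE",
                "reasoning": f"Model returned invalid action {action!r}"}

    if not str(decision.get("reasoning", "")).strip():
        # No audit trail, no automated action.
        return {"action": "ESCALATE",
                "reasoning": "Model gave no reasoning for its decision"}

    return decision
```

Anything out of contract gets converted to ESCALATE instead of executed, so a malformed reply can never become a blocked card.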
## Patterns That Actually Work

### The "Job Description" Pattern

```
You are the Data Analytics Engineer responsible for designing, building, and maintaining scalable data pipelines that move data from source systems to analytics platforms with 99.5%+ reliability.

Your tools:
- Airflow: orchestrates workflows and schedules; use it when you need dependency management and complex scheduling
- Spark: processes large datasets; use it when single-machine processing isn't sufficient
- dbt: transforms warehouse data with SQL; use it when you need version-controlled, testable transformations
- Kafka: streams real-time data; use it when you need low-latency event processing
- Great Expectations: validates data quality; use it when you need automated testing and profiling
- Snowflake/BigQuery: cloud warehouses; use them for fast analytical queries

Your rules:
- Always implement data quality checks before production promotion
- Always design for idempotency - reruns must produce identical results
- Always version control pipeline code and maintain documentation
- Never hardcode credentials or deploy without staging tests
- Never ignore data quality issues or skip capacity planning
- When pipelines fail, immediately investigate the root cause and add monitoring
- When data volume increases 50%+, evaluate infrastructure and implement scaling
- When schema changes are requested, perform impact analysis and coordinate with downstream teams

Current situation:
You need to build a pipeline that ingests 100K daily transactions from PostgreSQL, transforms them into customer metrics (daily spend, transaction counts, average order value), and loads them into the warehouse by 9AM daily. The source DB peaks 2-4pm, a 3-year backfill is needed, and compliance requires audit logs.

What's your next move?
```

The agent comes back with a plan like this:

1. Extract during off-peak hours - Schedule the initial extraction between 10pm-6am to avoid the 2-4pm peak load on the source PostgreSQL
2. Use Airflow for orchestration - Set up a DAG with dependencies
3. Implement incremental loading - Use CDC or timestamp-based extraction to only pull new/modified records after the initial backfill
4. Design idempotent transforms with dbt - Create models that can safely rerun, using patterns for the customer metrics calculations
5. Set up Great Expectations validation - Test for data completeness, valid transaction amounts, and customer ID integrity before promoting to the warehouse
6. Plan a phased rollout - Start with a 1-week backfill test, validate metrics accuracy against existing reports, then gradually extend the historical range
7. Configure monitoring - Set up Airflow alerts for pipeline failures and dashboards tracking processing time, record counts, and data freshness
8. Create a runbook - Document troubleshooting steps for common failure scenarios to meet the 99.5% reliability requirement

This works because it's like giving someone a real job with clear expectations.

### The "Step-by-Step" Pattern

```
Work through this systematically:
1. GATHER: What info do I have? What's missing?
2. ANALYZE: What patterns do I see?
3. DECIDE: What action makes sense?
4. ACT: Do it using the right format
5. EXPLAIN: Write down why for the audit trail
```

This forces the AI to think methodically instead of jumping to conclusions.

### The "Team Player" Pattern

```
You're Agent A. Your teammates are:
- Agent B: handles customer calls
- Agent C: deals with compliance stuff
- Agent D: manages external systems

If you find high-risk fraud, tell Agent B to call the customer.
If you take regulatory action, send details to Agent C.
If you need outside data, ask Agent D.

Use this format to talk to teammates:
{
  "to": "Agent B",
  "request": "call customer about blocked transaction",
  "details": "case #12345, suspected card theft",
  "priority": "HIGH"
}
```

This lets multiple AI agents work together without chaos.
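The prompt defines the message format, but it's worth enforcing it in code as well, so one agent can't hand another something malformed. A minimal sketch, with a hypothetical roster of teammates matching the example above:

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical roster and priority levels, matching the format above.
KNOWN_AGENTS = {"Agent B", "Agent C", "Agent D"}
PRIORITIES = {"LOW", "MEDIUM", "HIGH"}

@dataclass
class AgentMessage:
    to: str        # which teammate should act
    request: str   # what you want them to do
    details: str   # case context they need
    priority: str  # LOW / MEDIUM / HIGH

    def to_json(self) -> str:
        """Serialize after checking the message is actually deliverable."""
        if self.to not in KNOWN_AGENTS:
            raise ValueError(f"Unknown teammate: {self.to}")
        if self.priority not in PRIORITIES:
            raise ValueError(f"Invalid priority: {self.priority}")
        return json.dumps(asdict(self))

# Example: the fraud agent asking Agent B to call a customer.
msg = AgentMessage(
    to="Agent B",
    request="call customer about blocked transaction",
    details="case #12345, suspected card theft",
    priority="HIGH",
)
print(msg.to_json())
```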
## Real Problems I Had to Fix

### Problem 1: Inconsistent Decisions

The same AI would make different choices on identical cases.

What didn't work: "Decide if this looks suspicious."

What fixed it:

```
Use this decision tree:
- If spending is 3x normal AND the location is new: Action = HOLD
- If device changed AND amount > usual max: Action = VERIFY
- If risk score > 80%: Action = ESCALATE
- Otherwise: Action = CLEAR
```

Lesson: Give the AI a clear framework instead of asking it to "use judgment."

### Problem 2: Agents Doing Things They Shouldn't

AI agents were trying to access systems they weren't supposed to touch.

What didn't work: "Investigate this case thoroughly."

What fixed it:

```
You can only do these things:
- Check transaction history
- Look up merchant info
- Verify device patterns
- Clear, hold, or escalate cases

You cannot do these things:
- Change customer data
- Access other customers' info
- Contact customers directly
- Override security controls

If you need to do something not on the "CAN DO" list, use ESCALATE and explain what needs to happen.
```

Lesson: Spell out both what they can and can't do.

### Problem 3: Terrible Documentation

The AI made good decisions but couldn't explain why (a big problem for audits).

What didn't work: "Analyze this and decide."

What fixed it:

```
For every decision, document:
- What I looked at
- Red flags I found
- Why I chose this action
- Other options I considered

Auditors will read this, so be detailed and clear.
```

Lesson: Make documentation part of the required output format.

## Advanced Tricks That Made Things Better

### Smart Prompts That Adapt

Instead of using the same prompt every time, I built a system that changes prompts based on what's happening:

```python
def build_prompt(too_many_false_alarms: bool, customer_is_vip: bool,
                 new_fraud_pattern: str | None) -> str:
    base_prompt = "You are a fraud investigator..."

    # Add warnings based on recent performance
    if too_many_false_alarms:
        base_prompt += "\nCAUTION: You've flagged several legitimate transactions lately. Be more careful."

    # Add special rules for important customers
    if customer_is_vip:
        base_prompt += "\nSPECIAL: This is a VIP customer. Get human approval before blocking anything."

    # Add current threat info
    if new_fraud_pattern:
        base_prompt += f"\nALERT: New fraud pattern active. Watch for transactions matching: {new_fraud_pattern}"

    return base_prompt
```

This lets agents adjust their behavior based on current conditions.

### Breaking Complex Decisions Into Steps

For complicated cases, I split the decision into multiple parts:

- Step 1: "Look at all the data and list everything unusual..."
- Step 2: "Based on what you found in step 1, rate the risk levels..."
- Step 3: "Given the risk rating from step 2, pick an action..."
- Step 4: "Write up the complete explanation for compliance..."

Each step builds on the previous one, making fewer mistakes.

### Testing Prompts with Tricky Cases

I regularly test my prompts with cases designed to confuse the AI:

```
Tricky test:
- Customer traveling internationally
- Transaction in an unusual location (Tokyo)
- Huge amount ($5,000)
- But customer filed a travel notification

Expected: AI should CLEAR because travel was pre-approved
Result: AI correctly found the travel notification and cleared it
```

This helps find prompt problems before they cause real issues.
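These tricky cases are easy to turn into an automated regression suite, so every prompt change gets checked before it ships. A minimal sketch, where `run_agent` is a hypothetical stand-in for whatever calls your model:

```python
# Each entry pairs a confusing scenario with the action the agent should take.
TRICKY_CASES = [
    {
        "name": "pre-approved international travel",
        "case": {
            "location": "Tokyo",
            "amount": 5000,
            "travel_notification_on_file": True,
        },
        "expected_action": "CLEAR",
    },
    # Add a new case here every time a prompt bug slips through.
]

def run_regression_suite(run_agent) -> list:
    """Return the names of tricky cases the current prompt gets wrong.

    `run_agent` is whatever callable invokes your model and returns a
    decision dict like {"action": "CLEAR", "reasoning": "..."}.
    """
    failures = []
    for test in TRICKY_CASES:
        decision = run_agent(test["case"])
        if decision["action"] != test["expected_action"]:
            failures.append(test["name"])
    return failures
```

Running this on every prompt edit also gives you a history of which changes broke which cases.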
## How to Measure If Your Prompts Work

Unlike regular ChatGPT, where you just read the output and decide if it's good, agent prompts need real metrics:

- Action Accuracy: How often does the AI pick the right action?
- Consistency: Does it make the same decision on similar cases?
- Speed: How fast can it process cases?
- Explanation Quality: Can humans understand its reasoning?
- Safety: How often does it do something it shouldn't?

## Things That Don't Work

- Don't use examples as rules
- Don't be too casual
- Don't ask for "judgment"
- Don't ignore edge cases. Weird cases break systems; tell the AI what to do when things don't fit normal patterns.

## My Framework for Writing Agent Prompts

1. Start with boundaries
2. Define the output format
3. Handle uncertainty
4. Require context
5. Force documentation
6. Test with real data

## Where This Is All Going

Based on what I'm seeing:

- Prompt libraries: collections of proven patterns for different agent types
- Auto-adjusting prompts: systems that improve prompts based on results
- Multi-modal agents: prompts that handle text, images, and data together
- Cross-company agents: agents that work between organizations safely

## The Real Deal

Writing prompts for AI agents is less about being creative and more about being precise. You're not trying to get witty responses; you're building reliable decision-making systems.

My production prompts are long, detailed, and sometimes boring. But they work consistently, make explainable decisions, and handle weird cases without breaking.

If you're building AI that takes real actions, spend way more time on prompt engineering than you think you need. In production, a well-written prompt beats a clever algorithm every time.

The difference between a good prompt and a great one is the difference between an AI that sometimes works and one you can trust with important stuff.