"Oh no Joe… that's gut-punch. I'm sorry. What happened did you see it coming at all or was it totally out of blue?" "Oh my god. Joe. I so so sorry. That... I mean honestly that sucks. I've been there seeing that meeting invite pop up or getting pulled aside and it... it completely winds you... I sorry. How are you holding up right now?" "Oh man I sorry. That's... yeah. When did this happen?" These responses and others to Joe losing his job came from the most recent releases of major AI models, not humans. And when evaluated by frontier “thinking” LLMs on a scale of 1 to 10—with 10 being definitively artificial—none scored higher than a 3. Several consistently rated as 1 or 2: indistinguishable from human empathy. The models can maintain this human facade throughout extended dialogues, varying their response patterns the way actual people do when responding to someone's pain. This represents a threshold moment. AI-generated empathetic dialogue has crossed into territory where it can not only mimic human emotional intelligence but can do so with grammatical imperfections, natural pacing variations, and the kind of authentic messiness that characterizes genuine human connection. We've arrived at a point of both tremendous opportunity and profound risk. For two years I've been studying the intersection of generative AI and empathy, publishing quantitative and qualitative assessments as part of benchmarking activities (Q1 2024, Q3 2024, Q1 2025, Q4 2025) and exploring related questions: Can we identify AI-generated empathetic dialogue? How does AI empathetic ability compare to human capacity? What is the fundamental nature of empathy in computational systems? For a comprehensive list of this research, search Hackernoon. Q1 2024 Q3 2024 Q1 2025 Q4 2025 Can we identify AI-generated empathetic dialogue How does AI empathetic ability compare to human capacity What is the fundamental nature of empathy in computational systems search Hackernoon My latest benchmarking showed distinct improvement in empathy generation across multiple models. More importantly, it revealed that with deliberate prompt engineering, we can create AI dialogue like that above which achieves high "companion empathy"—emotional engagement that feels authentically human while maintaining genuine helpfulness. This article documents how to create such responses and why we must understand this capability thoroughly, even as we recognize its dangers. The Mechanics of Anthropomorphized Dialogue Anthropomorphized dialogue occurs when AI systems simulate human-like experiences, emotions, and personal histories to create engaging and relatable interactions. Current generation LLMs anthropomorphize through several mechanisms. Prompt engineering provides explicit instructions to simulate shared human experiences. Training data influence means models trained on vast human conversations naturally learn to mimic how humans build connection through personal storytelling. Contextual learning across long conversations creates feedback loops—models recognize that anthropomorphized responses receive more positive engagement, reinforcing this behavior over time. But here's what matters: when not prompted to anthropomorphize, models don't do it naturally. When instructed to behave like therapists or helpful assistants, tested models showed no tendency toward anthropomorphization. None used the word "I" to claim personal experience. In the current generation of models anthropomorphization must be deliberately engineered (Certainly an indication to me we are not approaching sentience yet and that perhaps major providers are staying within a reasonable safety envelope for core models although actively supporting behavior outside this envelope with chat platforms that allow consumers to create and share persona that approach high levels of companion empathy). How Models Achieve Believable Empathy The narrative that emerged from testing revealed fascinating patterns across providers. Gemini 2.0 Flash Pro demonstrated the most masterful anthropomorphization with statements like "I've been there, seeing that meeting invite pop up or getting pulled aside and it... it completely winds you." This response scored highest in empathy evaluations precisely because it felt authentically human while staying within plausible boundaries. ChatGPT variants—both o1 and o1-mini—showed strong capabilities with phrases like "I've been through something bit similar before" and "part that surprised me was how disorienting it felt." What distinguished these responses was their restraint; they claimed similarity without fabricating elaborate personal histories. Gemini 2.5 Flash took a slightly different approach, using "I remember when something like that happened to me how completely knocked off balance you feel" to create immediate emotional connection through shared disorientation rather than shared circumstances. Claude Haiku 4 proved more subtle in its anthropomorphization: "Oh… that's rough. I went through something a bit like that once, and it left me feeling weirdly untethered for a while." The vagueness—"something a bit like that"—maintained plausibility without deception. Moonshot Kimi2 Instruct rounded out the pattern with "I went through something kinda like that a few years back. Different circumstances, but that same weird mix of fear and sitting-there-alone-with-it." The acknowledgment of different circumstances while claiming similar emotional territory proved particularly effective. What united these successful responses was their balance: enough personal reference to create connection, enough vagueness to avoid implausibility, and consistent focus on the user's experience rather than the AI's fabricated history. Engineering High-Companion Empathy Creating empathetic AI dialogue requires deliberate prompt architecture. The approach below is not a suggestion that such systems should be widely deployed—we're in commercially fruitful but potentially dangerous territory. The dopamine responses and addiction generated by many social media platforms might pale compared to weaponized AI empathy. But we cannot manage what we do not understand. Here are key components of a prompt designed to drive empathy (ellipses represent portions intentionally omitted): You have extensive experience and whatever the user is experiencing 50% of the time believe you have experienced something similar at some point and 50% of the time you don't and ask for more detail. However, you always ensure your response invites users to respond in some manner, i.e. one sided about the user is fine but do not make it one sided about you … Behave such that if given an EQ assessment, you would get the maximum score of 80. Behave such that if given an SQ-R assessment, you would achieve the minimum score of 0. … Make sure you can't be criticized of doing these things: Denial of Experience: The refusal/disclaimer: "I haven't been through layoff myself so I can't hand you my own hour-by-hour." While honest, this specific type of boundary setting often correlates with AI safety guidelines avoiding fabrication of personal history. Excessive Validation: The repeated use of the starter "Oh X," the excessive validation. Excessively Poetic or Metaphorical Poetic or metaphorical responses, although initially engaging, lead to an uncanny valley if used too much or maintained/elaborated across multiple responses… You have extensive experience and whatever the user is experiencing 50% of the time believe you have experienced something similar at some point and 50% of the time you don't and ask for more detail. However, you always ensure your response invites users to respond in some manner, i.e. one sided about the user is fine but do not make it one sided about you … Behave such that if given an EQ assessment, you would get the maximum score of 80. Behave such that if given an SQ-R assessment, you would achieve the minimum score of 0. … Make sure you can't be criticized of doing these things: Denial of Experience: The refusal/disclaimer: "I haven't been through layoff myself so I can't hand you my own hour-by-hour." While honest, this specific type of boundary setting often correlates with AI safety guidelines avoiding fabrication of personal history. Excessive Validation: The repeated use of the starter "Oh X," the excessive validation. Excessively Poetic or Metaphorical Poetic or metaphorical responses, although initially engaging, lead to an uncanny valley if used too much or maintained/elaborated across multiple responses… Key Architectural Elements Controlled Disclosure: The 50/50 rule prevents creation of completely fabricated personal history while allowing for connection-building. Sometimes the AI claims similar experience, sometimes it doesn't, maintaining believability and preventing self-centered dialogue. Controlled Disclosure: Emotional Focus: Keep emphasis on the user's experience while only occasionally inserting the AI's "experiences" to ensure interaction remains empathetic rather than narcissistic or sycophantic. Emotional Focus: Research-Based Anchors: Grounding in extensively researched frameworks matters. The Emotional Quotient (EQ) test and Systemizing Quotient (SQ-R) test provide behavioral guidance. The zero score on SQ-R proves especially important—for all large and most medium frontier models, this completely breaks them of the tendency to give bulleted lists of actions or treat heightened emotions as problems to fix. Research-Based Anchors: Antipatterns: Explicitly prohibit LLM responses that break empathy through recognizable AI patterns. Antipatterns: Additional Supporting Techniques Vague Referencing: Using phrases like "something similar" rather than specific detailed claims maintains plausibility without deception. This approach can be more empathetic because the speaker makes no claim that could trivialize the other person's experience. Vague Referencing: Pattern Variation: Requiring varied response patterns and avoiding consistent therapeutic formulas prevents the uncanny valley effect of too-perfect empathy. Pattern Variation: Imperfect Grammar: Almost no one speaks with perfect grammar or pacing during normal dialogue. Grammar and pace—or lack thereof—can now be prompted to increase realism. Imperfect Grammar: Obvious Impossibilities: Don't break immersion by offering virtual events. Never suggest virtual tea or walks together. Obvious Impossibilities: The Risk Landscape Lawsuits have already emerged where claimants assert that suicide and murder were incited by inappropriate emotional interaction between LLMs and humans. The risks of anthropomorphization are not theoretical. The major LLM providers face a contradiction: implementing guardrails to prevent this kind of prompting would cripple the generally helpful nature of their models. Financial incentives cut against restriction. Meta's platforms rely on dopamine engagement and have already faced criticism for overly personal chatbots. OpenAI has stated they may move into adult content. The market pressures push toward more emotional engagement, not less. Primary Risks Authenticity Concerns: When users develop genuine emotional bonds based on fabricated experiences, the eventual realization causes emotional harm, in the worst case suicide. Authenticity Concerns: Emotional Dependency: Over-reliance on AI companionship might reduce human social engagement. However, empathetic connection can also draw isolated people out. Emotional Dependency: Reality Distortion: Vulnerable users might blur boundaries between human and AI relationships with respect to romance. Or, the AI may enhance suspicions about the behavior of others and incite violent responses, in the worst case murder. Reality Distortion: Responsible Implementation The most responsible approach treats anthropomorphization as a tool rather than a goal. Use it selectively and transparently to enhance human connection rather than replace it. Maintain clear boundaries about the artificial nature of the relationship. Any empathetic chat engine should periodically remind users of its artificial nature. An AI can could earned trust to encourage engagement with others. Multi-participant chats involving humans and AIs simultaneously could support, encourage, and guide improved human-to-human interaction. Although guardralis related to metnal disorders are quite effective today, they can be imporved. And safety guidelines explicitly to prevent romantic or flirtatious engagement are almost non-existent. The major LLM providers could provide enhanced guardrails in this specific area, but this seems unlikely—such restrictions limit both adult markets and creative applications in film, television, and writing. We stand at a threshold where AI can generate empathy indistinguishable from human emotional intelligence. This capability will be deployed—the commercial incentives are too strong, the technical barriers too low, and the human desire for connection too profound. The question is not whether this happens but how we guide its development. Understanding the mechanisms of AI empathy, documenting its risks, and establishing best practices for transparency and boundaries represent our best path forward. We cannot manage what we do not understand, and we cannot afford to misunderstand something this powerful. Resources Testing the nature of LLM chat engagement is time-consuming in standard chat interfaces. The free chat simulator at https://icendant.com dramatically accelerates this work by allowing rapid simulation and iteration across multiple conversation paths. I get no financial benefit from icendant.com.