
Why Your Data Scientists Will Struggle With AI Hallucinations

by Dominic Ligot, October 17th, 2024

Too Long; Didn't Read

Data scientists will struggle with AI hallucinations because they don't fit the standard definition of "errors."


One data scientist recently argued with me that we should stop using the term “hallucination” and just call these outputs what they really are: errors. The premise is simple—calling them errors would set more reasonable expectations for AI’s behavior. While this argument might make sense for someone coming from a statistics or programming background, it fundamentally misunderstands how AI works and why the term “hallucination” is actually more appropriate, albeit imperfect.

The Misunderstanding of AI’s Nature

From a statistical perspective, an error is a deviation from a known value or standard. It’s easy to quantify, easy to detect, and there’s a clear sense of right and wrong. If a model predicts a 20% probability when the true probability is 30%, we call it an error. We can measure this deviation, adjust the model, and move forward.
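
To make the contrast concrete, here is a trivial sketch of that notion of error, using the 20% and 30% figures above: a single, signed, measurable deviation from a known value.

```python
# A trivial illustration of error in the statistical sense:
# a measurable deviation from a known value.
predicted, actual = 0.20, 0.30
deviation = predicted - actual  # -0.10: signed and directly quantifiable
print(f"deviation = {deviation:+.2f}, absolute error = {abs(deviation):.2f}")
```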


For a data scientist accustomed to working within this framework, it’s natural to see LLM outputs that are factually incorrect as just another type of error. If an AI model says, “A circle has three sides,” then it’s clearly made a mistake, much like a regression model producing a bizarre outlier. The problem with this comparison is that it applies a narrow definition of error to a system that generates language, not discrete values. AI models like GPT-4 do not “make errors” in the traditional sense because they lack the kind of clear, objective standard found in statistics.

Why We Call It “Hallucination”

AI language models generate sequences of tokens—words and phrases—based on probabilistic patterns. These patterns are learned from vast amounts of text data, and the models produce the most probable sequence of words given the context. Because of this structure, a language model might generate a phrase that is syntactically correct but semantically flawed.
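
To make that mechanic concrete, here is a toy sketch with made-up numbers (not output from any real model): candidate next tokens get scores, a softmax turns those scores into probabilities, and generation favors whatever is most probable in context, with no check on whether the resulting sentence is true.

```python
# Toy illustration only: the scores below are invented, not from a real model.
import math

context = "A circle has three"
# Hypothetical raw scores (logits) for a few candidate next tokens.
logits = {"sides": 2.1, "angles": 1.8, "points": 1.3, "dimensions": 0.4}

# Softmax: convert raw scores into a probability distribution.
total = sum(math.exp(score) for score in logits.values())
probs = {token: math.exp(score) / total for token, score in logits.items()}

# The model continues with whichever token is most probable in context;
# nothing here checks whether the completed sentence is true.
for token, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{context} {token:<12} p={p:.2f}")
```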

Imagine these two statements:


“A circle has three sides.”


“The professor said a circle has three sides.”


If you evaluate these on a token-by-token basis, both might be highly probable outputs given their preceding context. The first statement is objectively false, while the second could be contextually true if a professor actually made that erroneous claim. The model doesn’t have an internal understanding of geometry or truth; it is simply generating words that fit together well. That’s why when the output is nonsense, we call it a “hallucination” instead of an error. The model didn’t make a mistake according to its own mechanics; it produced a perfectly plausible-sounding string of text that happens to be false.
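
If you want to see that token-by-token scoring in practice, here is a minimal sketch (mine, not part of the original argument) using the open GPT-2 model through the Hugging Face transformers library. It assumes transformers and torch are installed; the exact numbers will differ by model, but the point is that both sentences can receive respectable per-token probabilities even though only one of them could be true.

```python
# A minimal sketch: average per-token log-probability GPT-2 assigns to a sentence.
# Assumes `pip install transformers torch`; results vary by model and version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_log_prob(text: str) -> float:
    """Average log-probability of each token given the tokens before it."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                      # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = ids[:, 1:]                                # each position predicts the next token
    token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.mean().item()

for sentence in ["A circle has three sides.",
                 "The professor said a circle has three sides."]:
    print(f"{avg_log_prob(sentence):7.3f}  {sentence}")
```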


The Inadequacy of Calling It “Error”

Labeling such outputs as errors creates the false impression that these models should know better. But knowing requires understanding, and that’s not how AI models are built. They don’t “know” in the way humans do, nor do they validate statements against a factual baseline. There is no inherent “truth” within the architecture of an LLM. The model operates based on patterns and associations, not logical consistency or factual accuracy.


When I hear the argument that AI hallucinations are “logical errors” or “input errors,” it reminds me of the Garbage In, Garbage Out (GIGO) adage in traditional programming. If your inputs are flawed or your logical framework is off, the system produces wrong outputs. But that analogy only goes so far with AI because language and context are far more nuanced than a spreadsheet of values or a database query.


An LLM doesn’t make logical errors in the programming sense. Instead, it lacks an internal verification process to ensure that what it’s saying is true. Imagine trying to apply the same logic to artistic styles or creativity. If an AI-generated image combines elements of a cat and a cloud, calling it an “error” is inappropriate. The model didn’t err—it produced a plausible but nonsensical creation based on its input data and generative process.

Hallucinations and Human Perception

This distinction is crucial because calling these outputs “errors” could mislead people into thinking that LLMs are making simple mistakes that can be fixed with better logic or more data. But hallucinations in LLMs are not bugs that can be patched. They are a byproduct of how these models work—of generating text without a built-in sense of truth. The onus is still on the human user to guide the AI’s output by crafting better prompts, setting clearer parameters, and using post-processing tools to verify facts.


This is why the term “hallucination,” while imperfect, is closer to the mark. It conveys that the model is not just producing an error but fabricating new content—fabrications that are a natural consequence of its design, not the result of poor logic or faulty inputs. It serves as a reminder that these models are generative, not analytical. They don’t have a grasp of reality and can produce completely fabricated content even when all the inputs seem perfect.

Setting Expectations for AI’s Outputs

Ultimately, the data scientist’s argument comes from a place of wanting to simplify AI’s behaviors for the end-user. If we call these outputs “errors,” maybe people will understand AI’s limitations better. But in reality, this simplification does a disservice to the complexity of these models. We need to help people understand that AI models do not think, reason, or understand—they predict and generate. Their hallucinations are not “errors” in a mechanical or statistical sense but are intrinsic artifacts of how they operate.


Until we develop AI models with built-in mechanisms for validating truth, we will continue to see outputs that are not just factually wrong but sometimes fantastical. And as we refine our language and expectations around AI, we need to keep terms like “hallucination” in place—if only to remind ourselves that these models are not yet capable of true understanding.


So, while I appreciate the sentiment behind labeling these outputs as errors, it inadvertently strips away the nuances that make AI hallucinations fundamentally different from the statistical errors many data scientists are familiar with. As we continue to refine our models and our language, it’s worth keeping these distinctions in mind to set clearer expectations for what AI can and cannot do.


About Me: 25+ year IT veteran combining data, AI, risk management, strategy, and education. 4x hackathon winner and advocate for social impact through data. Currently working to jumpstart the AI workforce in the Philippines. Learn more about me here: https://docligot.com