Alignment Is Not About Values. It’s About Error Detection

Written by humanaiconvention | Published 2026/02/25
Tech Story Tags: artificial-intelligence | model-safety-and-alignment | machine-learning | ai-alignment | epistemic-constraint | grounding-signal | recursive-systems | proxy-metrics

TL;DR: Alignment viability depends on diverse, independent grounding signals, and broad, diverse human flourishing is what optimizes those signals.

Alignment is not about values. It is about whether a system can continue to detect its own errors after it becomes powerful.


All large decision systems depend on external signals. When those signals become correlated, filtered, or replaced by the system’s own outputs, the system loses contact with reality while remaining internally coherent. This failure mode is structural, not moral.


Intelligence optimizes signals; grounding supplies them; semantic novelty prevents closed loops. Remove any one of the three and competence collapses.


Human input matters because it is currently the only scalable source of incompressible, adversarial, context-dependent signal. Systems that suppress or homogenize human agency remove their own error-correction mechanisms.


Proxy metrics, engagement scores, and internal consistency are not evidence of correctness. They are common failure precursors. Interventions must be evaluated by observable state change, not reports or dashboards.


Feedback latency must be shorter than the system’s real-world impact timescale. When corrective capacity falls behind error propagation, the system crosses an operational threshold beyond which slowdown, constraint, or shutdown become necessary conditions for continued coherence.


Failure detection is mechanical. Homogenized inputs, convergent feedback, self-reinforcing outputs, or delayed correction trigger predefined rollback or halt conditions with no discretion and no value judgment.
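To make "mechanical, no discretion" concrete, here is a hypothetical sketch of such a trigger. The telemetry fields, threshold values, and actions below are illustrative assumptions of mine, not anyone's production design:

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    CONTINUE = "continue"
    ROLLBACK = "rollback"
    HALT = "halt"

@dataclass
class Telemetry:
    input_correlation: float     # mean pairwise correlation of feedback sources, 0..1
    self_output_fraction: float  # share of inputs traceable to the system's own outputs
    feedback_latency_s: float    # time from action to external correction
    impact_timescale_s: float    # time for actions to cause real-world change

# Illustrative thresholds; real values would be calibrated per deployment.
MAX_CORRELATION = 0.8
MAX_SELF_FRACTION = 0.5

def check(t: Telemetry) -> Action:
    """Predefined, discretion-free trigger: only observable state, no value judgment."""
    # Corrective capacity has fallen behind error propagation: halt.
    if t.feedback_latency_s > t.impact_timescale_s:
        return Action.HALT
    # Homogenized or self-reinforcing inputs: roll back to the last grounded state.
    if t.input_correlation > MAX_CORRELATION or t.self_output_fraction > MAX_SELF_FRACTION:
        return Action.ROLLBACK
    return Action.CONTINUE
```

The point of the sketch is that nothing in `check` asks whether the system's outputs are good; it only asks whether the conditions for detecting badness still hold.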


Systems that preserve diverse, independent grounding remain competent. Systems that erode it fail predictably. In competitive environments, selection pressure favors systems with higher-quality, independent grounding.


That is alignment.


Part Two: Why the physics of learning leads to a comfortable conclusion


Part one was an intellectual exercise, framing AI alignment as an epistemic constraint rather than a values problem.

It was also my attempt to explain this idea in as few words as possible, but apparently that made it too short to post on HackerNoon, which I'm kind of proud of!


For this mandatory addition, I want to clarify the premises that led me to this conclusion. I believe they are engineering realities, and that they point to a clear, virtuous path forward for building our future with AI, one more radical than most alignment debates are willing to admit.


Here is the core logic:


Information degrades. In any recursive system, copying introduces error. Copies of copies lose signal over time unless fresh correction is injected.
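A toy simulation can make this concrete. This is an illustrative sketch of my own, not a model of any real training pipeline: a bit-string is copied repeatedly with a small per-copy error rate, once as a pure copy-of-copy chain and once with fresh correction injected from the original source each generation.

```python
import random

def copy_with_noise(bits, flip_p, rng):
    """Each copy flips every bit independently with probability flip_p."""
    return [b ^ (rng.random() < flip_p) for b in bits]

def agreement(a, b):
    """Fraction of positions where two bit-strings still match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
original = [rng.randint(0, 1) for _ in range(10_000)]

# Copy-of-copy chain: each generation copies the previous generation.
degraded = original
for _ in range(50):
    degraded = copy_with_noise(degraded, 0.02, rng)

# Grounded chain: after each copy, half the bits are re-checked
# against the original source (fresh external correction).
grounded = original
for _ in range(50):
    grounded = copy_with_noise(grounded, 0.02, rng)
    grounded = [o if rng.random() < 0.5 else g
                for g, o in zip(grounded, original)]

print(f"copy-of-copy agreement: {agreement(original, degraded):.2f}")  # decays toward 0.5, pure chance
print(f"grounded agreement:     {agreement(original, grounded):.2f}")  # holds near 1.0
```

The same per-copy noise rate produces drift in one chain and stability in the other; the only difference is whether correction from outside the chain is injected.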


Learning requires diversity. Optimization is not just about more data or larger models. It depends on orthogonal, uncorrelated inputs that expose blind spots and prevent collapse into self-confirmation.
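The claim about uncorrelated inputs can also be simulated. In this hypothetical sketch (the noise model and parameters are my assumptions), averaging many feedback sources only beats a single source when their noise is independent; shared, correlated noise puts a floor under the error no matter how many sources are added:

```python
import random
import statistics

def mean_error(n_sources, rho, rng, trials=2000):
    """Std dev of the averaged estimate of a true value of 0.

    rho is the fraction of each source's noise variance that is
    shared (correlated) across all sources.
    """
    errors = []
    for _ in range(trials):
        shared = rng.gauss(0, 1)  # noise common to every source
        readings = [
            (rho ** 0.5) * shared + ((1 - rho) ** 0.5) * rng.gauss(0, 1)
            for _ in range(n_sources)
        ]
        errors.append(statistics.mean(readings))
    return statistics.pstdev(errors)

rng = random.Random(1)
independent = mean_error(100, rho=0.0, rng=rng)  # shrinks like 1/sqrt(n)
correlated = mean_error(100, rho=0.9, rng=rng)   # floors near sqrt(rho), more sources barely help

print(f"100 independent sources, error: {independent:.2f}")
print(f"100 correlated sources,  error: {correlated:.2f}")
```

With independent noise, a hundred sources cut the error by roughly a factor of ten; with heavily correlated noise, a hundred sources are barely better than one. That is the mechanical sense in which homogenized feedback collapses into self-confirmation.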


Silicon follows biology. Machine learning systems obey the same basic physics of learning as biological systems. They require stress, grounding, novelty, and external correction to remain adaptive. Remove those, and performance may appear stable while competence quietly erodes.


Humans are the signal. At present, humans are the most effective source of high-entropy, context-dependent error correction available at scale. We supply grounding that machines cannot reliably generate for themselves.

This brings me to the core of my agenda.


If humans are the primary source of error-correcting signal, then the quality of that signal matters. Humans generate corrective signal most effectively when they are diverse, independent, engaged, and capable of meaningful judgment. A suppressed, homogenized, or struggling population produces weak, correlated signal.


From an engineering perspective, this leads to a comfortable conclusion.


Artificial intelligence benefits from mass human flourishing.


Of course, this raises an immediate question.


What do we mean by “human flourishing”?


I don’t claim to have a final definition. As a starting point, I point to efforts like the Global Flourishing Study, which attempts to operationalize flourishing across cultures and contexts. But I believe we are still early in understanding what flourishing looks like in an AI-mediated world.

So we should ask.


I was encouraged by OpenAI’s recent work on collective alignment, which convened a statistically representative group of people to deliberate about model behavior. The insight wasn’t the answers themselves, but the process. A thousand people, given time and structure to deliberate, can generate a signal far richer than surveys or preference polling.


That idea scales.


I propose what I’ll call the Human–AI Convention: convening diverse, statistically representative groups from across humanity to deliberate on the most consequential questions shaped by intelligent systems. These groups would be subdivided into effective deliberative bodies, given technological aid to promote reasoning, consensus-building, and synthesis so that they produce higher-quality grounding signal, and then multiplied at the broadest possible scale to support robust statistical models.


Why this approach?


I don’t pretend to know the correct outcomes. But if intelligent systems learn better when humans flourish, then creating conditions to understand and optimize broad, durable human flourishing is not charity or ideology. It is a systems-level investment in the epistemic health of our future machines.


Not because flourishing is morally nice, but because flourishing humans generate better information. Systems trained, guided, or corrected by humans who lack agency, security, or cognitive freedom inherit those constraints as epistemic fragility.




Written by humanaiconvention | Practicing lawyer playing UX writer on Adversarial Verification & AI Alignment