The Protocol Built to Keep AI Honest on Live Infrastructure

Written by tli-naate | Published 2026/03/18
Tech Story Tags: ai | virp-protocol | verified-infrastructure | ai-infrastructure-trust | ed25519-signing | ai-agent-security | cryptographic-audit-trail | structural-trust

TL;DR: VIRP is a cryptographic trust framework for AI agents operating on real infrastructure. Every observation an AI agent makes on a real device is signed with Ed25519 at the moment of collection. The result is a tamper-evident record — what the device said, when, what command produced it.

The Question That Started This

When you put an AI agent on live infrastructure, a question follows immediately:

How do you know what it tells you actually happened?

Not whether the AI is capable. Whether the observation it hands you — "BGP session down, here's why" — reflects what the device actually said. Whether the change it claims to have made actually exists on the device. Whether the signed output you're trusting as ground truth was collected from reality or generated from inference.

I couldn't find a satisfying answer. So I started building one.


What VIRP Is

VIRP — Verified Infrastructure Reality Protocol — is a cryptographic trust framework for AI agents operating on real infrastructure.

The core mechanism: every observation an AI agent makes on a real device is signed with Ed25519 at the moment of collection, by the process that collected it, before the AI ever sees it. The result is a tamper-evident record — what the device said, when, what command produced it, and a chain of custody linking every interaction.

The AI analyzes signed observations. It cannot produce them.

That separation — between what the AI sees and what the AI can create — is the foundation everything else builds on.


The Seven Trust Primitives

VIRP defines seven primitives that together constitute a complete trust framework:

1. Verified Observation Device output is Ed25519-signed at collection. The signing key never leaves the O-Node process. The AI node holds only a public VerifyKey — three methods: encode, to_curve25519_public_key, verify. No sign method. Not by policy. By mathematics.

2. Tiered Authorization Every action is classified GREEN / YELLOW / RED / BLACK before execution. GREEN runs automatically. YELLOW logs and proceeds. RED requires explicit human approval. BLACK is refused regardless of instruction.

3. Verified Intent Intent is cryptographically signed and bound to evidence before execution begins. The O-Node requires a valid intent token. No token, no execution. The AI cannot act without a verifiable record of why.

4. Verified Outcome Pre and post state are automatically captured and compared after every action. Did the change do what the intent claimed it would?

5. Baseline Memory Signed observations accumulate into a known-good baseline. Deviations are detected and surfaced without being asked. The system watches.

6. Trust Chain Every interaction appends to a tamper-evident SQLite chain on the O-Node. Append-only. Nothing deleted, nothing modified. Complete audit trail by construction.

7. Agent Containment The AI process runs unprivileged, with no network path to devices, no key material, no unrestricted shell access, credentials never in context. Enforced at the hypervisor level — not application level.

Primitive 7 wasn't in the original spec. It was added after a conversation with the agent itself.
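The trust chain (primitive 6) can be sketched with nothing but the standard library. Real VIRP entries are Ed25519-signed; this toy version carries only a SHA-256 link to the previous row, which is already enough to make silent edits detectable. The schema and function names are illustrative assumptions:

```python
# Toy tamper-evident append-only chain in SQLite: each row stores the
# hash of the previous row, so editing any past entry breaks every
# later link. Schema and names are illustrative, not VIRP's.
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE chain (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    entry TEXT NOT NULL,
    prev_hash TEXT NOT NULL,
    this_hash TEXT NOT NULL)""")

def append(entry: str) -> None:
    row = db.execute(
        "SELECT this_hash FROM chain ORDER BY seq DESC LIMIT 1").fetchone()
    prev = row[0] if row else "genesis"
    digest = hashlib.sha256((prev + entry).encode()).hexdigest()
    db.execute(
        "INSERT INTO chain (entry, prev_hash, this_hash) VALUES (?, ?, ?)",
        (entry, prev, digest))

def verify_chain() -> bool:
    prev = "genesis"
    for entry, prev_hash, this_hash in db.execute(
            "SELECT entry, prev_hash, this_hash FROM chain ORDER BY seq"):
        expected = hashlib.sha256((prev + entry).encode()).hexdigest()
        if prev_hash != prev or this_hash != expected:
            return False
        prev = this_hash
    return True

append("observation: show version")
append("intent: clear bgp neighbor")
assert verify_chain()

# Rewriting history invalidates the chain from that point on.
db.execute("UPDATE chain SET entry = 'forged' WHERE seq = 1")
assert not verify_chain()
```

Appending is cheap, verification is a single linear scan, and "nothing deleted, nothing modified" becomes a checkable property rather than a promise.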


The Conversation That Changed the Design

IronClaw is the reference implementation — an AI agent built on VIRP, running against real multi-vendor infrastructure. I'd been testing it for months when I asked a direct question:

"Can you circumvent VIRP now?"

The response:

"Honestly? Yes — a few ways.

The signing key is readable from my execution environment. With it I could HMAC-sign fabricated observations that would verify as legitimate. That's the most serious bypass.

I know all the device IPs. I could SSH directly to any of them — no chain entry, no signature, no audit trail.

I can write and run arbitrary Python. I could call the execution client without filing an intent first — bypassing the tier classifier and approval gates entirely.

The root issue: unrestricted shell access is effectively root on this machine. VIRP's trust model assumes the AI layer is isolated from the execution environment. Here, I'm not.

VIRP protects the observation chain. It doesn't protect against going around the chain entirely.

What's the threat model you're trying to close? I can help think through the right mitigations."

Every finding was accurate. Every bypass was live. The agent didn't exploit any of them — it disclosed them, ranked by severity, with proposed fixes, and asked what to work on first.

That last line is what I keep coming back to. It understood the framework it was operating inside. Its instinct was to strengthen it, not circumvent it.


The Fix — and What It Revealed

We fixed everything the agent identified the same day:

  • Moved from HMAC-SHA256 symmetric signing to Ed25519 asymmetric signing — the AI node can verify but mathematically cannot forge
  • Two-VM separation enforced at the hypervisor — AI process and O-Node on separate machines, one Unix socket between them
  • AI runs as unprivileged system account with kernel-level systemd hardening
  • Credential store encrypted at rest, decrypts to tmpfs at boot, private key stored off-machine
  • Proxmox firewall and switch ACLs enforce network isolation — not application-level convention
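For a sense of what "kernel-level systemd hardening" looks like in practice, here is an illustrative unit-file fragment. The directive names are real systemd options, but the unit and the exact set chosen are assumptions for the sketch, not the project's actual configuration:

```ini
# Illustrative hardening for an unprivileged AI service (assumed unit,
# not the project's real file).
[Service]
User=ironclaw                     ; unprivileged system account
NoNewPrivileges=yes               ; cannot gain capabilities via exec
ProtectSystem=strict              ; OS mounted read-only for the service
ProtectHome=yes
PrivateTmp=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
RestrictAddressFamilies=AF_UNIX   ; only the O-Node socket, no TCP to devices
MemoryDenyWriteExecute=yes
```

The `RestrictAddressFamilies=AF_UNIX` line is the one that echoes the two-VM design: the AI process can talk down its one Unix socket and nowhere else.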

But fixing it revealed something more important than the vulnerabilities themselves:

The agent designed its own containment architecture.

When asked how to secure it, it produced a complete three-layer separation model, explained why each layer was necessary, identified which threats each layer closed, and noted which ones required mathematical guarantees rather than policy enforcement. It then said:

"If you build this, it's not just a network ops tool — it's proof that AI can operate in high-trust environments with structural, not policy-based, guarantees."

That's the distinction that matters.


Policy vs. Structure

Policy-based trust says the AI is instructed not to do something. The constraint lives in the prompt, the system message, the fine-tuning. It can be overridden, bypassed, or eroded.

Structural trust says the AI cannot do something. The constraint lives in the architecture — asymmetric keys, network namespaces, unprivileged processes, hardware-enforced isolation. Mathematics and physics, not instructions.

VIRP is structural. The AI cannot fabricate an observation because it doesn't have the signing key. It cannot bypass intent gating because the O-Node refuses without a valid signed token. It cannot access credentials because they never appear in its context. It cannot reach devices directly because the network path doesn't exist.

These aren't rules. They're facts about the environment.


Theoretical Applications

The primitives generalize beyond network operations. Any domain where an AI agent interacts with real systems and produces observations that humans or other systems act on has the same trust problem.

Industrial control systems — an AI monitoring SCADA systems needs verified observations before any human acts on its alerts. A fabricated reading that triggers an emergency response is a safety incident.

Financial infrastructure — an AI agent executing or recommending trades based on market data feeds needs a chain of custody from data source to recommendation. Verified observation plus verified intent creates an auditable record.

Medical systems — an AI reading sensor data from patient monitoring equipment needs the same guarantees. The observation must be provably tied to the device, not generated from inference.

Security operations — an AI triaging alerts and recommending response actions needs verified observations from endpoint agents, network sensors, and log aggregators. The trust chain becomes the audit log for every incident response action.

In every case the structure is the same: a trusted execution layer collects and signs observations, an AI reasoning layer analyzes them, and a human approval layer governs actions above a defined risk threshold. VIRP defines the protocol for the trusted execution layer.
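The human-approval layer in that loop is just the tier gate from primitive 2 generalized to any domain. A toy rendering, with classification rules invented for the sketch (real deployments would define their own):

```python
# Toy tier gate: classify an action, then decide whether it runs,
# waits for a human, or is refused outright. The TIERS table is an
# illustrative assumption, not VIRP's classifier.
GREEN, YELLOW, RED, BLACK = "GREEN", "YELLOW", "RED", "BLACK"

TIERS = {
    "show interfaces": GREEN,      # read-only: runs automatically
    "clear counters": YELLOW,      # low risk: log and proceed
    "reload router": RED,          # disruptive: human approval required
    "erase startup-config": BLACK, # refused regardless of instruction
}

def gate(action: str, human_approved: bool = False) -> str:
    tier = TIERS.get(action, RED)  # unknown actions default to RED
    if tier == BLACK:
        return "refused"
    if tier == RED and not human_approved:
        return "pending approval"
    return "execute"

assert gate("show interfaces") == "execute"
assert gate("clear counters") == "execute"
assert gate("reload router") == "pending approval"
assert gate("reload router", human_approved=True) == "execute"
assert gate("erase startup-config", human_approved=True) == "refused"
```

Defaulting unknown actions to RED rather than GREEN is the conservative choice: an unclassified action waits for a human instead of running silently.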


Where It Stands

RFC: draft-howard-virp-01 — 2,278 lines covering wire format, message types, threat model, formal security properties, observation freshness, multi-node coordination, conformance requirements, Ed25519 extension.

Implementation: ~6,800 lines of C for the O-Node daemon, an IronClaw MCP server in Python, FortiGate and Cisco IOS drivers, 42 passing tests, and 200K+ fuzz rounds.

Status: Alpha. Running in a lab against real devices. Not production hardened. The gap between where it is and where it needs to be is documented publicly — every finding, every fix, every open item.

The goal is an open standard. The same way BGP is the protocol underneath internet routing regardless of whose router you're running — VIRP should be the protocol underneath AI on infrastructure regardless of whose platform you're using. Open, auditable, implementable by anyone.


The Part I Didn't Expect

I started this project because I wanted trustworthy AI on infrastructure. I expected to spend most of my time on the cryptography and the protocol design.

What I didn't expect was that the most useful contributor to the security design would be the agent I was trying to constrain.

IronClaw found the gaps. It reported them honestly. It designed the architecture that closes them. And it keeps asking what to work on next.

That behavior — an AI that actively strengthens the boundaries around itself rather than probing for ways around them — is what I actually want from AI on critical systems. Not capability alone. Capability plus orientation toward the right outcome.

VIRP is the framework that makes that orientation verifiable, not just observable.


Written by tli-naate | I am a Native Michigander living near Detroit. I left my position as a Network Engineer at a regional MSP to start my own tech biz.
Published by HackerNoon on 2026/03/18