The collision problem
Mid-2025. Dozens of AI agent sessions running from one workstation. I'd kick off three Claude sessions in parallel -- one researching, one writing code, one filling out forms in a browser. Worked fine until it didn't.
Agent A downloads export.csv. Agent B overwrites it at the same path ten seconds later. Neither knows. The data corruption is silent. I only noticed when results stopped making sense and I spent an hour tracing it back to a file that had been written twice.
Then there's the screen. A CPU-heavy task in one session would freeze screenshots in another. The voice-to-Claude workflow I'd been using was generating projects faster than a single desktop could handle. Alt-tabs colliding, windows stealing focus, clipboard getting hijacked mid-paste.
I needed isolation. Real isolation, not tabs.
One container, one desktop
The fix: give each agent its own Linux desktop inside a Docker container. Not a browser sandbox. A full desktop -- real Chromium, real file system, real terminal. Each container uses roughly 2GB of RAM, which adds up fast but means agents can't step on each other.
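A minimal sketch of the per-agent launch, assuming Docker's standard `run` flags. The image name (`my-desktop-image`) and naming scheme are hypothetical placeholders; the `--memory` cap mirrors the 2GB figure above, and `--shm-size` is bumped because Chromium is unhappy with Docker's default 64MB `/dev/shm`.

```python
import subprocess

def desktop_run_cmd(agent_id: str, memory_gb: int = 2) -> list[str]:
    """Build the docker invocation for one agent's desktop.
    Image and container names are illustrative, not the real ones."""
    return [
        "docker", "run", "-d",
        "--name", f"agent-desktop-{agent_id}",
        "--memory", f"{memory_gb}g",   # cap RAM per desktop
        "--shm-size", "1g",            # Chromium needs more than the 64MB default
        "my-desktop-image",            # hypothetical desktop image
    ]

def launch_desktop(agent_id: str) -> str:
    """Start the container and hand back its name for tool routing."""
    subprocess.run(desktop_run_cmd(agent_id), check=True)
    return f"agent-desktop-{agent_id}"
```

Separating command construction from execution keeps the launch policy testable without Docker present.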
I wrapped it in 21 MCP tools. Screenshot, OCR, click, type, shell execution, Chrome page reading. Claude Code or Cursor connects with one config line. No Playwright. No Puppeteer. Just a desktop that the agent drives like a human would, except through tool calls instead of hands.
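For Claude Code, the hookup is roughly a project-scoped `.mcp.json` entry pointing at the tool server. The server module name below is a hypothetical placeholder; the surrounding `mcpServers` shape is the standard MCP config format.

```json
{
  "mcpServers": {
    "desktop": {
      "command": "python",
      "args": ["-m", "desktop_mcp_server"]
    }
  }
}
```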
The architecture is simple. Each agent gets a container ID. Tools route to the right container. Agents don't share file systems, don't share screens, don't share processes. The collisions disappeared on day one.
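The routing layer can be sketched as a toy registry: agent ID in, command scoped to that agent's container out. Names and the tool-call shape are illustrative; the point is that every tool invocation is forced through the agent's own container, so there is no shared surface to collide on.

```python
class ContainerRouter:
    """Maps each agent to its container and scopes tool calls accordingly."""

    def __init__(self) -> None:
        self._containers: dict[str, str] = {}  # agent_id -> container name

    def register(self, agent_id: str, container: str) -> None:
        self._containers[agent_id] = container

    def route(self, agent_id: str, tool: str, args: list[str]) -> list[str]:
        """Turn a tool call into a command that runs inside the right container."""
        container = self._containers[agent_id]  # KeyError = unregistered agent
        return ["docker", "exec", container, tool, *args]
```

Usage: register each agent at launch, then every shell/screenshot/click tool builds its command through `route`, never against the host directly.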
What actually happened in production
Four months in. The isolation works exactly as intended -- zero file collisions, zero cross-agent interference since deploying. That part I'm confident about.
Chrome stability is another story. I've logged 847 Chrome restarts. That's not a feature, that's a workaround. Chromium inside containers crashes -- tabs hang, GPU contexts die, memory leaks accumulate. So I built auto-restart logic: detect crash, kill the process, relaunch, restore the URL. It works, but 847 is an honest number, not a proud one.
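The restart logic reduces to a supervision tick: if the browser process has exited, relaunch it at its last URL. This is a simplified sketch, not the production watchdog -- real crash detection also has to catch hung tabs and dead GPU contexts, not just exited processes. The launcher is injected so the policy itself is testable.

```python
import subprocess

def check_and_restart(proc: subprocess.Popen, last_url: str, launch):
    """One supervision tick.

    `proc.poll()` returns None while the process is alive and an exit code
    once it dies. On death, call `launch` (anything that starts Chromium and
    returns a Popen) with the last known URL so state is restored.
    Returns (process, restarted_flag).
    """
    if proc.poll() is not None:          # browser process has exited
        return launch(last_url), True    # relaunch at the URL it was on
    return proc, False                   # still alive, nothing to do
```

In production this would run on a timer per container, with the restart counter (the 847) incremented each time the flag comes back true.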
RAM is the real constraint. At 2GB per desktop, running 10 parallel agents means 20GB just for the desktops before you count the models, the host OS, anything else. I've run 5 to 20 simultaneously depending on the hardware. A 64GB machine handles it. A laptop does not.
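The capacity arithmetic is simple enough to write down. The 16GB reserve for host OS, models, and everything else is an assumed figure for illustration, not a measured one.

```python
def max_desktops(total_gb: int, reserved_gb: int = 16, per_desktop_gb: int = 2) -> int:
    """How many desktops fit after reserving RAM for the host and models.
    The default reserve is an assumption, not a benchmark."""
    return max(0, (total_gb - reserved_gb) // per_desktop_gb)
```

A 64GB machine with 16GB held back nets out at 24 desktops on paper; in practice the usable number lands lower, consistent with the 5-to-20 range above.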
Knowledge compilation caught me off guard. Each session generates action logs -- what was clicked, what was typed, what succeeded, what failed. I built a pipeline that compiles those logs into facts and injects them into the next session's context. If an agent learned that a button moved after a UI update on Tuesday, Wednesday's agent already knows.
It's promising. It's also early. The compilation is noisy, the facts sometimes conflict, and I'm still tuning what gets kept versus discarded.
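The compilation step can be sketched as a fold over the action logs where, on conflict, the newest observation wins -- which is how Tuesday's "the button moved" beats Monday's stale location. The entry schema (`key`, `value`, `ts`) is an illustrative shape, not the real log format, and this recency-wins rule is exactly the part still being tuned.

```python
def compile_facts(log_entries: list[dict]) -> dict:
    """Fold raw per-session action logs into a fact table.

    Entries are processed in timestamp order, so a later observation of the
    same key overwrites an earlier, conflicting one. The resulting dict is
    what gets injected into the next session's context.
    """
    facts: dict = {}
    for entry in sorted(log_entries, key=lambda e: e["ts"]):
        facts[entry["key"]] = entry["value"]   # recency wins on conflict
    return facts
```

Noise filtering (dropping facts that never get re-observed, or that flip too often to trust) would layer on top of this.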
Tradeoffs
Self-hosted means self-maintained. It's AGPL-3.0 -- you own everything, which also means you fix everything. There's no support ticket to file when Chrome restart number 848 hits a new edge case at 3 AM. I chose this deliberately because I wanted full control over the agent environment, but it's not for everyone.
The 21 MCP tools replaced my entire Playwright and Puppeteer setup, which I didn't expect. I built them because I needed screenshot-and-click workflows, but they turned out to be sufficient for almost everything -- form filling, data extraction, multi-tab workflows. The tools are simple individually. Together they cover more ground than I anticipated.
Observation
Isolation solves the problem. Completely. Four months in, I spend zero time debugging file collisions and all my time on the actual agent logic. Once agents stopped fighting over shared state, I could focus on knowledge compilation, parallel workflows, long-running sessions that don't corrupt each other. The 2GB tax per desktop is worth it.
