A real experiment with GPT-5.3 Codex, a browser game arcade, and a methodology gap nobody seems to have solved yet.
Let's be honest about the two camps developers fall into when using AI coding agents today.
Camp 1: Vibe Coders. Single prompt, see what happens, iterate reactively, ship something that mostly works. Fast. Messy. Surprisingly effective for simple projects.
Camp 2: Spec-Driven Developers. Formal requirements. Structured documentation. Change proposals. Review cycles before touching a line of code. Rigorous. Slow to start. Built for scale.
Most tutorials and hot takes online will push you firmly into one of these camps. But there's a real problem neither side fully addresses: what do you do when you start a project and genuinely don't know how complex it is?
That's the problem I ran into. And trying to solve it broke more things than I expected.
Watch Full Video
https://youtu.be/WXz1MWYho30
The Setup: A Browser Game Arcade
I wanted AI to build a browser-based arcade of classic games — Tetris, Pac-Man, Space Invaders. I'd done a version of this before using Claude Sonnet 4.5 inside GitHub Copilot with Visual Studio 2026. That went well enough.
This time: GPT-5.3 Codex in VS Code. Same challenge. Different agent.
My hypothesis was that this should be straightforward. These are well-known games. The rules are universally understood. There's no ambiguous business logic to figure out. Surely this is a solved problem for a frontier model in 2026.
Spoiler: it was not.
First Attempt: The Instructions.md Approach
Rather than raw vibe coding, I started with an instructions.md file — a single document telling the agent what to build. Four games, separate feature files, .NET 10, C# Blazor, separation of concerns.
This felt like a reasonable middle ground. More intentional than a chat prompt, less formal than a full spec.
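To make the approach concrete, here is a sketch of the kind of instructions.md I mean. The contents below are illustrative, reconstructed from the constraints described above, not the exact file from the experiment:

```markdown
# Instructions

Build a browser-based arcade of classic games.

## Games
- Tetris
- Pac-Man
- Space Invaders
- (one more classic of your choice)

## Constraints
- .NET 10, C# Blazor
- One feature file per game
- Maintain separation of concerns between game logic and rendering
```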
What Actually Happened
The agent spent 13–15 minutes trying to scaffold the .NET solution. It couldn't do it automatically. Basic project creation — the thing you'd expect to take 30 seconds with a CLI — became a multi-minute failure loop.
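For context, the scaffolding the agent looped on for minutes is, by hand, a handful of CLI commands. This is a sketch assuming the .NET SDK's standard templates; the solution and project names are hypothetical:

```shell
# Create an empty solution, a Blazor Web App project, and wire them together
# (names are illustrative, not the ones from the experiment)
dotnet new sln -n GameArcade
dotnet new blazor -n GameArcade.Web
dotnet sln add GameArcade.Web/GameArcade.Web.csproj
dotnet build
```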
Then, when the build finally ran: Tetris was not playable. The first game out of the gate. Complete fail.
The Uncomfortable Takeaway
Being explicit about the tech stack — Blazor, C# — was the wrong call. A pure vibe coding approach, with no technology constraints, would likely have produced working games faster. Specificity backfired.
The Pivot: Let the AI Plan First
Rather than retrying with the same approach, I asked GPT-5.3 a different question entirely: what technology stack would you actually recommend for building browser-based games?
The answer was not Blazor.
This opened up a more interesting question: if the AI can recommend the stack, why not have it go further and plan the entire approach — the architecture, the file structure, the execution order — before writing a single line of game code?
So that's what I did.
What the AI Generated
GPT-5.3 produced a docs folder containing a high-level spec, an architecture overview, and a sequenced execution plan. All AI-generated. No developer-written requirements.
Suddenly the project had real structure. There was context to work against. The agent had something to anchor its decisions to beyond a one-line prompt.
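The layout was roughly this shape. File names here are illustrative stand-ins for the spec, architecture overview, and execution plan described above, not the exact names the agent chose:

```
docs/
  spec.md          # high-level description of the four games
  architecture.md  # recommended stack and module boundaries
  plan.md          # sequenced execution steps
```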
So Is This a New Methodology?
Here's where it gets philosophically messy.
What I ended up with wasn't vibe coding. There was too much intentional pre-planning for that label to fit. But it also wasn't Spec-Driven Development. I had no intention of reviewing formal specs every time a bug needed fixing or a feature needed changing. The games themselves have well-known rules — Tetris doesn't need a requirements document.
What it was: AI-generated planning artifacts used as a lightweight execution context, without the overhead of a formal spec review cycle.
Is that a thing? Does it have a name? I genuinely don't know.
Why Existing Tools Don't Quite Cover This
Plan Mode (Claude Code)
Claude Code's Plan Mode has real overlap with this approach — the agent reasons about the task before acting. But it's not universal. It lives inside one specific tool and doesn't persist across sessions or agents.
GitHub Copilot's Pre-Execution Review
Copilot lets you see what the agent intends to do before it does it. Useful. But you can't edit or shape that plan directly. You're a reviewer, not a co-author.
Full Spec Frameworks (openSpec, SpecKit)
These are well-designed for teams and complex projects. But they come with real overhead — change proposals, version control for specs, formal review gates. For a solo developer building a game arcade, that's overkill.
None of these land exactly where this experiment ended up.
The Slippery Slope Problem
Here's the risk with this middle-ground approach that I can't ignore.
The moment you need to update something — fix a bug, change a mechanic, add a feature — you face a choice. Do you update the planning docs? If yes, you're now maintaining living documentation, and you've drifted into lightweight Spec-Driven Development. If no, your docs fall out of sync and become noise.
What starts as a pragmatic shortcut can quietly become a formal process you never agreed to. The overhead creeps in without you noticing.
This isn't an argument against having planning artifacts. It's an argument for being deliberate about what they are, what they're for, and when they stop being useful.
Where This Lands
At the end of Part 1 of this experiment, the AI has produced:
- A recommended technology stack (not Blazor)
- A high-level architecture doc
- A sequenced execution plan
- A docs folder to work from
The developer wrote none of it. The agent planned its own attack. Whether that plan produces working games is the subject of Part 2.
But the bigger open question — what to actually call this approach, and whether it holds up under the pressure of iteration — is one I'm still working through.
If you've landed on a clean definition, a workflow, or even just a name for this space between reactive prompting and formal spec management, I'd genuinely want to hear it.
Part 2 coming soon — where the code actually gets written and we find out if AI-generated planning produces AI-generated games that work.
