I Asked GitHub Copilot to Plan My Next Sprint: It Failed Spectacularly

Written by incompletedeveloper | Published 2026/04/08
Tech Story Tags: ai-coding | agile-software-development | scrum | github-copilot | ai-programming | openai-codex | ai-assisted-coding | ai-coding-tools

TL;DR: Tried using GitHub Copilot in Visual Studio 2026 to generate a full Agile sprint plan for rewriting a legacy application. Results: Codex Mini produced a vague, waterfall-style plan and a clear misunderstanding of Agile. Full Codex used better terminology (Definition of Done, backlog, etc.) but was still mostly fluff with unrealistic timelines. Both plans focused on mechanical code conversion, missing the actual domain logic rewrite, and effort estimates were unrealistic without historical sprint velocity or team input. Conclusion: AI can generate something that looks like a sprint plan, but realistic Agile planning still requires human developers who understand the codebase, requirements, and team velocity.

Testing whether AI can replace your Scrum Master (spoiler: it can't)

Here's a question that's been bouncing around tech Twitter: Can AI do sprint planning?

Not "can it generate user stories" or "can it estimate story points" — I mean actual, realistic sprint planning that a human Scrum Master would do. The kind where you account for developer velocity, context switching, technical debt, and all the messy reality of building software.

I decided to find out by throwing GitHub Copilot into the deep end.

The experiment: Give Copilot a legacy codebase, ask it to plan a complete rewrite using Scrum methodology, and see if the estimates match reality.

The result: It was like watching someone who read a book about Agile try to plan sprints without ever actually working in one.

Let me show you exactly where it went wrong.


Watch Video

https://youtu.be/ErwuATHHXw4?embedable=true


The Setup: A Fair(ish) Test

I gave Copilot in Visual Studio 2026 a real-world scenario:

The codebase:

  • Legacy .NET application (older framework)
  • Needs a complete rewrite to .NET 10 with Clean Architecture
  • Domain entities, services, repositories, the whole stack

The constraints:

  • 1 developer (me)
  • 5 hours/day of actual coding time (realistic for senior devs with meetings)
  • 2-week sprints where only 7 out of 10 days are development days
  • No previous sprint velocity data (this is important later)
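These constraints pin down the real sprint capacity, which matters for judging the estimates later. A quick sanity check, using only the numbers stated above:

```python
# Sprint capacity under the constraints above:
# 1 developer, 5 focused hours/day, 7 development days per 2-week sprint.
DEVELOPERS = 1
HOURS_PER_DAY = 5
DEV_DAYS_PER_SPRINT = 7  # only 7 of the 10 working days are dev days

capacity_hours = DEVELOPERS * HOURS_PER_DAY * DEV_DAYS_PER_SPRINT
print(capacity_hours)  # → 35 hours of actual coding time per sprint
```

Thirty-five hours per sprint, not eighty. Keep that number in mind when we get to the AI's estimates.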

The task: Review the code using templates, then generate a complete sprint plan with effort estimates.

I tested two models:

  1. ChatGPT 5.1 Codex mini
  2. ChatGPT 5.1 Codex (full version)

Round 1: ChatGPT 5.1 Codex Mini — The Waterfall Disaster

The mini model gave me what it proudly called a "detailed Sprint and iterations plan."

What I got:

Sprint 1: Foundation & Domain Entities
- Set up .NET 10 project structure
- Migrate domain entities
- Configure dependency injection

Sprint 2: Repositories & Data Access
- Implement repository pattern
- Set up Entity Framework Core
- Database migrations

Sprint 3: Service Layer & Testing
- Build service layer
- Write unit tests
- Integration tests

Sprint 4: Documentation & Final Sign-Off
- API documentation
- Code documentation
- Final review and deployment

The problem?

This isn't Agile. This is a textbook waterfall disguised as sprints.

Let me break down the anti-patterns:

Anti-Pattern #1: No Working Software Until Sprint 3

Sprint 1 builds entities. Sprint 2 builds repositories. Cool story — what does the user see?

Nothing. Zero. Nada.

In real Agile, every sprint should deliver something demonstrable. Even if it's just one feature end-to-end. This plan gives you infrastructure for two sprints before anything actually works.

Anti-Pattern #2: Testing is a Separate Phase

"Sprint 3: Write unit tests."

Excuse me, what?

In Agile, tests are written during development, not after. This is literally the waterfall "testing phase" approach that Agile was invented to replace.

Anti-Pattern #3: Documentation Sprint

Sprint 4 is just documentation and sign-off. No new features, no bug fixes, just... paperwork.

This is what happens when you give AI a list of tasks and ask it to group them into timeboxes without understanding why we timebox in the first place.

Verdict: ChatGPT 5.1 Codex mini gets an F in Agile 101.

It doesn't understand the philosophy. It's just playing Mad Libs with sprint terminology.


Round 2: ChatGPT 5.1 Codex (Full) — Better, But Still Wrong

Okay, maybe the mini model was too... mini. Let's try the full version.

This time, I split it into two steps:

  1. Code review first
  2. Sprint planning second

The Code Review

Cost: 3 premium requests (approximately $0.45 in API credits)

The review was actually decent. It identified:

  • Architectural patterns in use
  • Technical debt hotspots
  • Complexity metrics
  • Areas needing refactoring

No complaints here. This part worked.

The Sprint Plan

This is where things got interesting. The full model generated:

Definition of Done:

"Code reviewed, tested, integrated, and deployed to staging."

Okay, that's... generic. Every sprint has the same DoD? What about feature-specific acceptance criteria?

Definition of Ready:

"Requirements clear, dependencies identified, estimates agreed upon."

Cool. Standard Scrum terminology. Sounds professional. But also completely useless without specifics.

Sprint 1 Breakdown: The Devil in the Details

Sprint Goal: "Establish .NET 10 Clean Architecture foundation."

Tasks:

  • Set up project structure (4 hours)
  • Configure dependency injection (3 hours)
  • Implement base domain entities (8 hours)
  • Set up logging framework (2 hours)
  • Configure application settings (1 hour)
  • Initial database schema (4 hours)

Total: 22 hours across 7 working days
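The arithmetic does check out against the stated capacity (5 hours/day × 7 dev days = 35 hours), which is part of why the plan looks plausible at first glance. A quick check, with the task list copied from the plan above:

```python
# Sanity-check the AI's Sprint 1 estimates against the sprint capacity.
tasks = {
    "Set up project structure": 4,
    "Configure dependency injection": 3,
    "Implement base domain entities": 8,
    "Set up logging framework": 2,
    "Configure application settings": 1,
    "Initial database schema": 4,
}
total = sum(tasks.values())
capacity = 5 * 7  # hours/day x dev days per sprint

print(f"Planned: {total} h, capacity: {capacity} h, "
      f"utilization: {total / capacity:.0%}")
# → Planned: 22 h, capacity: 35 h, utilization: 63%
```

So the plan fits the timebox on paper. The problem isn't the arithmetic.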

At first glance, this looks reasonable. But let me tell you why it's not.

The Reality Check

I actually did this rewrite. Here's what the AI missed:

What the AI got right (about 80%):

  • The basic tasks exist
  • Hour estimates aren't completely insane
  • Logical grouping of related work

What the AI completely missed:

  1. Automation wasn't factored in properly
    • I told Copilot some tasks would be automated
    • Actual time for project setup + DI + entities: ~10 hours max (by day 3)
    • AI estimated 22 hours (entire sprint)
  2. No actual domain logic
    • Sprint 1 and Sprint 2 were just "convert old entities to new format."
    • This is repetitive labor, not building features
    • Where's the business logic? Where's the value?
  3. The dreaded end-of-sprint testing phase
    • Milestone: "Service & Test Coverage" at the end of Sprint 3
    • There it is again — testing as a separate phase
    • This is Agile cosplay, not Agile

Sprint 2: More of the Same

Sprint 2 continued the entity conversion work. Still no working features. Still no demonstrable value.

The pattern: The AI treated this like a checklist migration, not a product rewrite.

The Fundamental Problem: AI Doesn't Understand Context

Here's what became painfully obvious:

AI is great at:

  • Listing tasks
  • Using correct Scrum terminology
  • Making estimates that sound plausible
  • Generating structured output

AI is terrible at:

  • Understanding why you're rewriting the code
  • Knowing the business domain
  • Judging the complexity of logic vs. boilerplate
  • Factoring in real-world developer constraints
  • Planning for value delivery instead of task completion

The Missing Pieces

Even a human developer wouldn't know all the details without poking around the codebase for a few days. But a human would:

  1. Ask clarifying questions

    • "What's the most critical feature to deliver first?"
    • "Are there any risky technical assumptions?"
    • "What's the user-facing priority?"
  2. Plan for vertical slices

    • Pick one feature end-to-end
    • Build it through all layers
    • Get feedback
  3. Adjust based on sprint velocity

    • Oh, wait, I didn't give the AI historical velocity data
    • (This was intentional — most teams don't have clean velocity metrics anyway)
  4. Account for unknowns

    • Buffer time for unexpected complexity
    • Technical debt discovered mid-sprint
    • Dependency issues
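A human planner with even rough historical data would fold points 3 and 4 into the commitment. As a sketch of the kind of adjustment the AI skipped (the velocity numbers here are made up for illustration; the AI had none to work with):

```python
# Rough velocity-based forecast with a buffer for unknowns.
# Historical velocities are hypothetical, for illustration only.
past_velocities = [28, 31, 25]  # completed hours in the last three sprints

avg_velocity = sum(past_velocities) / len(past_velocities)
buffer = 0.8  # commit to ~80% of average to absorb mid-sprint surprises

commit_hours = avg_velocity * buffer
print(round(commit_hours, 1))
```

Crude, but it encodes two things the AI's plan never did: what the team has actually delivered before, and an explicit margin for the complexity you haven't found yet.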

Was This Test Fair?

Short answer: No.

Long answer: That's the point.

Real-world sprint planning is done by:

  • Developers who know the codebase
  • Teams with historical velocity data
  • Planning poker sessions where multiple people debate estimates
  • Scrum Masters who understand team capacity and context

I gave Copilot:

  • Code it had never seen before
  • No velocity data
  • No team context
  • No business domain knowledge

But guess what? That's actually a realistic scenario for:

  • New teams
  • New projects
  • Consultants brought in mid-project
  • Startups without established processes

And in those scenarios, human developers still do better than AI because they:

  • Ask questions
  • Make assumptions explicit
  • Plan iteratively
  • Adjust based on feedback

The Fluff Factor: AI's Secret Weapon

Here's something I noticed across both models:

AI generates a LOT of impressive-sounding fluff.

  • "Definition of Done"
  • "Definition of Ready"
  • "Sprint Goals"
  • "Milestones"
  • "Acceptance Criteria"

It all looks professional. It sounds like someone who knows Agile wrote it.

But when you actually analyze the content:

  • Definitions are too vague to be useful
  • Goals don't align with value delivery
  • Milestones reveal waterfall thinking
  • Acceptance criteria are missing or generic

It's the software development equivalent of a student padding a paper to hit the word count.

What AI Actually Did Well

To be fair, there were useful outputs:

1. Task Decomposition

Breaking "rewrite application" into concrete steps is helpful, even if the sequencing was wrong.

2. Hour Estimates (Sort Of)

Individual task estimates weren't terrible — they just failed to account for automation and developer experience.

3. Structured Output

Having everything in a standardized format is better than nothing.

4. Starting Point for Discussion

If a human reviewed this plan, they could quickly identify and fix the issues.

So AI sprint planning isn't useless — it's just not autonomous.

The Real Use Case: AI as a Junior PM

Here's where I landed:

Don't ask AI to do sprint planning.

Ask AI to draft a sprint plan that a human will review and fix.

The workflow should be:

  1. AI generates an initial plan
  2. Human identifies anti-patterns
  3. Human adjusts for reality
  4. Team discusses and commits

This is basically how you'd work with a junior PM who:

  • Knows the theory
  • Doesn't have domain experience
  • Needs supervision

And that's fine! Junior PMs are valuable. They do the grunt work of structuring information. Then senior people refine it.

My Recommendations

If you're considering using AI for sprint planning:

DO:

  • Use it to generate task lists
  • Let it estimate individual tasks
  • Have it structure backlog items
  • Generate templates and frameworks

DON'T:

  • Trust its understanding of Agile philosophy
  • Accept sprint sequencing without review
  • Assume estimates account for your context
  • Skip human validation

DEFINITELY DON'T:

  • Use it as a replacement for experienced PMs
  • Let it make the final decisions on priorities
  • Trust it with value-based planning

The Bigger Picture: What This Means

This experiment reveals something important about current AI limitations:

AI is great at pattern matching, terrible at pattern breaking.

Sprint planning requires:

  • Understanding trade-offs
  • Challenging assumptions
  • Adjusting based on context
  • Optimizing for human factors

These are all areas where AI struggles because they require:

  • Domain expertise
  • Emotional intelligence
  • Strategic thinking
  • Real-world experience

The models are getting better at generating correct syntax. They're not getting better at understanding semantics.

Final Verdict

Can AI do realistic sprint planning?

No.

Not yet. Maybe not ever.

Sprint planning isn't just about dividing work into timeboxes. It's about:

  • Understanding team capacity
  • Managing risk
  • Delivering value iteratively
  • Adapting to feedback

AI can help with the mechanical parts. But the strategic thinking that makes Agile actually work?

That still requires humans.


TL;DR:

  • Tested GitHub Copilot for sprint planning a .NET rewrite
  • ChatGPT 5.1 Codex mini produced a pure waterfall disguised as Agile
  • ChatGPT 5.1 Codex (full) was better but still missed critical context
  • AI generates impressive-sounding fluff without strategic thinking
  • Use AI as a junior PM who needs supervision, not as a replacement
  • Sprint planning requires human judgment AI doesn't have

Hot take: If your sprint planning can be automated by AI, your process is probably too rigid anyway.


Learn About Spec-Driven Development

https://www.youtube.com/watch?v=0atkW_janVg&list=PLphsQTGN5DbJnaiy-89QitCMkg-8toQac&embedable=true


Written by incompletedeveloper | .NET C# developer writing about AI-assisted software development, focused on how modern tools change productivity
Published by HackerNoon on 2026/04/08