Engineers come into system design interviews well-prepared. They've studied architecture diagrams, read the blog posts, watched the breakdowns. They know the components. They draw them confidently — load balancer, API gateway, a few services, a database, Redis somewhere in the mix.
And then the interviewer starts asking why.
That's where it falls apart. The components were never the point. This article is about what interviewers actually evaluate — and what to focus on instead.
What Most Candidates Get Wrong
Most system design prep teaches the same lesson: system design equals drawing the right boxes. So candidates practice drawing boxes.
Load balancers: check. API gateway: check. Message queue: check. Cache layer: check.
The problem is that every candidate draws the same boxes. A load balancer conveys nothing about judgment — it's table stakes. Saying "I'll add a load balancer here" is about as informative as saying "I'll write this in a programming language."
What interviewers are actually evaluating is whether candidates think like engineers who have shipped real systems — someone who asks the right questions, identifies the hard parts, makes defensible trade-offs, and can explain why each decision was made.
The Real Anatomy of a System Design Interview
Phase 1: Foundation — Requirements and Estimates
Every good design starts here.
Requirements
The biggest differentiator between weak and strong candidates is what happens in the first five minutes.
Consider the prompt: "Design Twitter." Most candidates immediately start sketching. Strong candidates stop and ask:
- Which features are in scope: posting tweets, the timeline, search?
- What scale are we designing for? 1M users or 1B?
- Read-heavy or write-heavy?
- What's the consistency requirement? Can users see a tweet a few seconds late?
- What are the top 2-3 features to support? Everything, or just the feed?
- Any latency SLAs?
The constraints determine the design. "Design a URL shortener for 100 requests/day" has a completely different answer than "design one for 10 billion requests/day." Two types worth capturing:
- Functional requirements: what the system does
- Non-functional requirements: how well it does it — latency, availability, consistency, scale
Capacity Estimation
Most candidates treat back-of-envelope math as a ritual performed for the interviewer's benefit. They either skip it or treat it as an afterthought. The real goal of this step is to figure out what the hard problem actually is — and to show the interviewer that the analysis is grounded in real constraints, not intuition. Some things to cover:
- Requests per second (peak)
- Data written per second
- Total storage over N years
- Bandwidth in/out
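As a concrete illustration, the arithmetic for a Twitter-scale service might look like the sketch below. Every input number is an assumption invented for the example, not a real figure:

```python
# Back-of-envelope sketch for a Twitter-like service.
# All inputs below are illustrative assumptions, not real numbers.

DAU = 200_000_000             # daily active users (assumed)
tweets_per_user_per_day = 2   # assumed average
reads_per_user_per_day = 100  # feed/timeline reads (assumed)
avg_tweet_bytes = 300         # text + metadata (assumed)
SECONDS_PER_DAY = 86_400

writes_per_sec = DAU * tweets_per_user_per_day / SECONDS_PER_DAY   # ~4,630
reads_per_sec = DAU * reads_per_user_per_day / SECONDS_PER_DAY     # ~231,000
peak_writes = writes_per_sec * 3                                   # rough peak factor

storage_per_day_gb = DAU * tweets_per_user_per_day * avg_tweet_bytes / 1e9  # ~120 GB/day
storage_5yr_tb = storage_per_day_gb * 365 * 5 / 1000                        # ~219 TB

print(f"writes/sec ~{writes_per_sec:,.0f}, peak ~{peak_writes:,.0f}")
print(f"reads/sec  ~{reads_per_sec:,.0f}")
print(f"storage: ~{storage_per_day_gb:.0f} GB/day, ~{storage_5yr_tb:.0f} TB over 5 years")
```

Under these assumed inputs, the read:write ratio comes out around 50:1 — that single number tells us the system is read-heavy and that the feed read path, not the write path, is where the design work lives.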
Once these questions are answered, it's time to think — what problem are they pointing to?
Storing 10TB/day means the storage tier is the bottleneck. Handling 500K writes/second means the database write path is the constraint. If 1% of data is serving 90% of reads, the cache strategy is where the design work lives.
The point of this exercise is not to impress the interviewer but to aim the conversation at the right problem. Interviewers want to see that design decisions follow from data, not from habit.
Phase 2: Prioritize — Ask What's Primary and What's Secondary
System design problems are intentionally broad — broad enough that covering everything in 50-55 minutes isn't realistic. After establishing requirements and estimates, candidates should ask the interviewer directly: which of these requirements are primary, and which are secondary? This mirrors real work: every new product or feature starts with the question of what belongs in the MVP.
Some interviewers want to go deep on data modeling. Some want the API design first. Some want a high-level end-to-end system. The answer shapes everything that follows — and most interviewers will say exactly what they care about. Candidates should take advantage of that.
This is not a sign of weakness. It's a signal that the candidate understands scope and is being deliberate about where to invest time. The goal is to produce the design the interviewer is looking for — not to run through a checklist.
Phase 3: Address Primary Requirements
With priorities established, candidates should go deep on the primary requirements. The areas most interviewers care about:
Data Modeling
How candidates model their data determines almost everything else about the system, and it's often where interviewers are paying the closest attention.
For any design, we should start with:
- What are the core entities and their relationships?
- What are the access patterns — how will this data be read and written?
- What consistency guarantees are needed?
- How does data evolve over time — versioning, history, soft deletes?
Here's a concrete example: designing Instagram's feed. The core data modeling question isn't "SQL or NoSQL?" It's: do we precompute feeds (fan-out on write) or compute them at read time (fan-out on read)?
That single decision — driven by the read/write ratio and distribution of follower counts — shapes the entire architecture. It determines whether we need a write-time job queue, what storage looks like, how we handle celebrity accounts with 50M followers, and how fresh feed data is.
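A minimal in-memory sketch of the hybrid approach (precompute feeds for ordinary authors, merge celebrity tweets at read time); all names and the threshold are illustrative, and real systems would use durable stores and a job queue instead of dicts:

```python
from collections import defaultdict

followers = defaultdict(set)   # author -> set of followers
feeds = defaultdict(list)      # user -> precomputed feed (fan-out on write)
tweets = defaultdict(list)     # author -> tweets (read at query time for celebrities)

CELEBRITY_THRESHOLD = 10_000   # assumed cutoff for the hybrid approach

def post(author, tweet):
    tweets[author].append(tweet)
    # Fan-out on write: cheap reads later, but a write touches every
    # follower's feed — too expensive for celebrity-sized audiences.
    if len(followers[author]) < CELEBRITY_THRESHOLD:
        for follower in followers[author]:
            feeds[follower].append(tweet)

def read_feed(user, following):
    feed = list(feeds[user])   # precomputed part, already materialized
    # Fan-out on read: pull celebrity tweets at read time instead.
    for author in following:
        if len(followers[author]) >= CELEBRITY_THRESHOLD:
            feed.extend(tweets[author])
    return feed
```

The sketch makes the trade-off tangible: `post` pays for ordinary authors so `read_feed` stays cheap, while celebrity content shifts the cost to the read path, where it's amortized across far fewer merge operations than 50M feed writes.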
Many times, candidates pick the wrong data model and then try to fix it with a cache. That doesn't work. A cache is a read optimization on top of the data model — if the model itself is wrong, caching only returns the wrong answer faster. The data model is the foundation; no amount of caching rescues a bad one.
API Design
Defining APIs before building the system forces clarity on:
- What operations does the system expose?
- What are the inputs and outputs?
- What are the failure modes?
- How does a client know something succeeded?
For a URL shortener:
POST /shorten
body: { url: string, expiry?: timestamp }
returns: { short_code: string }
GET /{short_code}
returns: 301 redirect to original URL
or: 404 if not found / 410 if expired
Working through the API surfaces questions that might otherwise get missed. What happens if the same URL is shortened twice? What if a short code collides? What does the client do on failure? These aren't edge cases—they're design questions that belong in the conversation from the start.
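One way the collision question resolves is sketched below — random base62 codes with insert-if-absent retry. The in-memory store and retry budget are stand-ins for a real datastore and a product decision, respectively:

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # base62
CODE_LEN = 7                                     # 62^7 ≈ 3.5 trillion codes

store = {}  # short_code -> url; stand-in for the real datastore

def shorten(url):
    # Whether the same URL should return the same code twice is a
    # product decision; this sketch always mints a new code and
    # retries on the (rare) collision.
    for _ in range(5):
        code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LEN))
        if code not in store:          # atomic insert-if-absent in a real DB
            store[code] = url
            return code
    raise RuntimeError("too many collisions; key space may be near exhaustion")

def resolve(code):
    return store.get(code)  # None maps to 404 in the API layer
```

Even this small sketch forces the design questions into the open: the check-then-insert must be atomic in a real database, and the retry loop is only safe because the key space is vastly larger than the number of stored codes.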
Naming the Hard Parts
Interviewers want to know: does this candidate understand what makes this problem hard? Strong candidates surface that early.
- For a ride-matching service: the hard part is efficient geospatial queries at scale.
- For a notification system: the hard part is reliable delivery across millions of users (in practice, at-least-once delivery plus deduplication, since true end-to-end exactly-once delivery isn't attainable).
- For a search typeahead: the hard part is sub-50ms prefix matching over a large dataset.
Once the hard part is named, the conversation shifts. Now we can talk about what data structures apply, what trade-offs exist, what the real design decisions are. That's the conversation interviewers are hoping to have.
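For the typeahead case, the standard trick is to cache the top-k completions at every trie node, so a lookup costs O(len(prefix)) with no subtree walk on the hot path. A hypothetical sketch (re-adding or re-scoring an existing term is not handled here):

```python
class TrieNode:
    __slots__ = ("children", "top")
    def __init__(self):
        self.children = {}
        self.top = []  # top-k (score, term) pairs cached at this node

class Typeahead:
    """Trie where each node caches its best completions, so suggest()
    never walks the subtree at query time."""
    def __init__(self, k=5):
        self.root = TrieNode()
        self.k = k

    def add(self, term, score):
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
            # Maintain the cached top-k along the insertion path.
            node.top.append((score, term))
            node.top.sort(reverse=True)
            del node.top[self.k:]

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [term for _, term in node.top]
```

The trade-off is exactly the kind worth naming out loud: inserts do extra work and memory is duplicated down every path, in exchange for reads that meet a sub-50ms budget.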
Phase 4: Bottlenecks and Trade-off Analysis
Once the primary requirements are addressed, the next step is to stress-test the design—not add more features to it. Interviewers don't expect a perfect design. They expect a defensible one. What they're evaluating is the reasoning — not the outcome.
For every major decision, candidates should be able to say: "I chose X over Y because [specific reason]. The downside is [honest acknowledgment]. We'd revisit this if [observable signal]."
Some trade-offs that come up in most designs:
Consistency vs. availability: When a node goes down, do we fail requests or serve stale data? For a social feed, stale is fine. For a bank balance, it isn't.
Normalization vs. denormalization: Normalized is cleaner but requires joins. Denormalized duplicates storage but reads faster. Depends on the read/write ratio and latency needs.
SQL vs. NoSQL: Not which is "better" — do we need transactions? Do access patterns map to documents, graphs, or wide columns? What query flexibility is required?
Synchronous vs. asynchronous: When can we afford eventual consistency? What's the failure mode if a message is delayed?
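The delayed-or-duplicated message case is worth sketching: under at-least-once delivery, consumers must be idempotent. A minimal version, with the dedup store as an in-memory stand-in for a database table or Redis set:

```python
ledger = []        # the side effect that must not be applied twice
processed = set()  # dedup store; in production, durable and shared

def apply_side_effect(message):
    ledger.append(message["amount"])

def handle(message):
    """Idempotent consumer: with at-least-once delivery the same
    message can arrive twice (broker retries, redelivery after a crash)."""
    if message["id"] in processed:
        return "skipped"              # duplicate: safe to drop
    apply_side_effect(message)
    # In production, recording the id and applying the side effect
    # must happen in the same transaction, or a crash between the two
    # reintroduces the duplicate-processing failure mode.
    processed.add(message["id"])
    return "applied"
```

Noting that the dedup write and the side effect need to be atomic is precisely the failure-mode reasoning interviewers are probing for in the async discussion.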
A good engineer doesn't just walk through the happy path. They acknowledge when and under what conditions the system fails, then evaluate whether that's something we can live with or needs to be addressed now. Interviewers are not looking for a system that never fails — they're looking for a candidate who understands where the system will fail and has thought about whether that's acceptable.
Phase 5: Secondary Requirements — If Time Permits
If time remains after the primary requirements are addressed and the bottleneck analysis is complete, candidates should move to secondary requirements and repeat the same process — design, then bottlenecks and trade-offs. The depth here will naturally be less, and that's fine. Interviewers understand the time constraint. What matters is that the primary areas were done well.
We should never sacrifice depth on primary requirements to squeeze in secondary ones. A shallow end-to-end design is worse than a thorough design of the parts that matter most.
A Practical Interview Framework
1. Requirements (5 min) Functional and non-functional. Capture scale, consistency needs, top features, and latency SLAs.
2. Capacity Estimation (3-5 min) Back-of-envelope math. Find the hard number — the one that actually constrains the design.
3. Prioritize (2 min) Candidates should ask the interviewer which requirements are primary and which are secondary, and confirm what a good answer looks like in their eyes.
4. Address Primary Requirements (20-25 min) This is where candidates go deep on what the interviewer cares about — data model, API design, core logic. Not just plumbing.
5. Bottlenecks and Trade-offs (5-10 min) Where does this design break? What's the hot path? What are the failure modes? Which are acceptable? This is where infrastructure components like caching or CDN come in — each solving a specific problem that's already been named.
6. Secondary Requirements — if time permits (remaining time) Same process as primary. Design first, then bottlenecks and trade-offs. If time runs out, say so explicitly and summarize what the approach would be.
What Interviewers Remember
After the debrief, a few things come up consistently:
- Did the candidate ask smart, clarifying questions, or just dive in?
- Did they correctly identify the hard part of the problem?
- Were their design choices connected to the constraints, or arbitrary?
- Could they reason about failure modes?
- Were they honest about trade-offs?
Nobody remembers whether the load balancer was drawn in the right place.
What interviewers do notice: candidates who throw around concepts they can't back up. Consistent hashing, CAP theorem, ZooKeeper — these get name-dropped constantly. When probed, it often becomes clear the understanding is superficial. Candidates should stick to what they know. If something is outside their domain, say so: "I know this area exists, but I'd need to research it before recommending it here." That's not a weakness — it's the judgment to know the difference between understanding something and having heard of it.
The Mindset Shift
System design interviews aren't a test of whether famous architectures have been memorized. They're a test of whether candidates can think through an ambiguous problem, identify what matters, make decisions under uncertainty, and explain the reasoning.
Every design decision should have a reason — a constraint it satisfies, a trade-off it accepts, a problem it solves.
Engineers tend to retrofit designs to what they're most familiar with. The stack we've shipped on becomes the default answer for every new problem. But a distributed message queue isn't the right tool for every async workflow, and a relational schema isn't the right model for every data problem. Having knowledge of various architectural patterns is what lets us pick the right design instead of reaching for a one-size-fits-all solution. The more patterns we've internalized — event sourcing, CQRS, log-structured storage, graph models, stream processing — the more likely we are to recognize which one fits, rather than bending the problem to suit what we already know.
For every component in the diagram, there should be an answer to: why is this here? What does it solve? What does it cost? What breaks if it goes down?
If those questions have answers, that's system design. If they don't, it's just boxes.
