System Design Interview from Scratch: 5-Step Method and 6 Classic Problems
Master the 5-step method for system design interviews, break down 6 classic system design problems from URL shorteners to message push, with architecture diagram approaches and interviewer scoring criteria.
What Are System Design Interviews Really Testing?
In technical interviews, the system design round is the dividing line between junior and senior engineers. Unlike coding problems, system design interviews have no standard answers — interviewers want to see how you derive a feasible architecture from ambiguous requirements, step by step.
System design interviews truly assess four dimensions: requirements analysis ability, architectural trade-off ability, scalability thinking, and communication skills. Many candidates jump straight to drawing architecture diagrams, missing that interviewers care most about your reasoning process of "why you designed it this way."
This article breaks down the 6 most frequent problem types in architecture interviews, paired with a 5-step method, to help you build a complete thought chain from requirements to solutions.
5-Step Method: Requirements Analysis → High-Level Design → Detailed Design → Scalability → Summary
No matter what system design problem you encounter, you can structure your approach with these 5 steps:
- Requirements Analysis: Confirm functional and non-functional requirements with the interviewer. Functional requirements define "what the system does"; non-functional requirements define "what scale it handles, what latency, what availability." Spend at least 3-5 minutes on this step — don't rush to draw diagrams.
- High-Level Design: Draw the core components and data flow of the system, typically including clients, API gateway, application services, databases, and caches. This step doesn't require diving into each component's details — the focus is demonstrating the overall architecture's rationality.
- Detailed Design: Choose 1-2 core components to discuss in depth, including data model design, API design, key algorithm selection, and storage solutions. Interviewers typically guide you deeper in a specific direction — follow their lead.
- Scalability Analysis: Discuss how the system scales when user volume grows 10x or 100x. This involves horizontal scaling, sharding, caching strategies, and asynchronous processing. This is the step that most clearly separates strong candidates from average ones.
- Summary: Spend 1-2 minutes reviewing your design decisions, explaining trade-offs, and pointing out potential improvements. This demonstrates your big-picture thinking and self-reflection ability.
Master this 5-step method, and you'll have a universal framework for any system design interview. Next, let's practice with 6 classic problem types.
Problem 1: URL Shortener System
Requirements Analysis
URL shortening is the most classic entry-level problem in system design interviews. Core functionality: given a long URL, generate a short URL; when accessing the short URL, redirect to the long URL.
- Functional requirements: Generate long URL → short URL mapping, short URL → long URL redirection, custom short links (optional), link expiration (optional)
- Non-functional requirements: 100 million daily read/write operations, redirection latency <100ms, 99.9% availability, short links kept as short as possible
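A quick back-of-envelope calculation turns these requirements into numbers you can design against. The sketch below takes the 100 million daily operations from the requirements; the peak factor of 3 and the 5-year retention period are illustrative assumptions, and it treats every operation as a write to get a worst-case key count.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_ops = 100_000_000   # from the non-functional requirements
peak_factor = 3           # assumption: peak traffic is about 3x the average

avg_qps = daily_ops / SECONDS_PER_DAY
peak_qps = avg_qps * peak_factor


def min_base62_length(n_links: int) -> int:
    """Smallest number of Base62 characters that can label n_links distinct links."""
    length = 1
    while 62 ** length < n_links:
        length += 1
    return length


# Assumption (worst case): every daily operation creates a link, kept for 5 years.
links_5y = daily_ops * 365 * 5

print(f"average QPS ~ {avg_qps:,.0f}, peak QPS ~ {peak_qps:,.0f}")
print(f"links after 5 years: {links_5y:,}")
print(f"minimum short-link length: {min_base62_length(links_5y)}")
```

Stating numbers like these out loud gives the interviewer something concrete to challenge before you start drawing.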
High-Level Design
Core components: API Service (receives long URL generation and short URL access requests), ID Generator (generates unique short link IDs), Database (stores long-short URL mappings), Cache (caches hot short links to accelerate redirection).
Data flow: User submits long URL → API service calls ID generator to get unique ID → encodes ID to Base62 short link → stores in database → returns short link. When accessing short link: parse short link to get ID → check cache → if miss, check database → 301/302 redirect to long URL.
Detailed Design
ID generation is the core challenge, with three options:
- Auto-increment ID + Base62 encoding: Simple and reliable, but short links can be enumerated. Suitable for scenarios with low security requirements.
- Pre-generated ID pool: A separate service pre-generates a batch of unique IDs into a pool, and the API service draws from it. Avoids the predictability of auto-increment IDs but adds system complexity.
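- MD5/SHA-1 hash + take first N characters: No centralized ID generation needed, but collisions are possible. Seven Base62 characters give roughly 62^7 ≈ 3.5 trillion values, so by the birthday bound a collision becomes likely once a few million links exist. Check for an existing key on insert and re-hash with a salt when a collision occurs.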
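The first option relies on converting a numeric ID to a Base62 string and back. A minimal sketch follows; the alphabet order (digits, then lowercase, then uppercase) is an assumption, and any fixed ordering of the 62 symbols works as long as encoding and decoding agree.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (the ordering is an arbitrary choice for this sketch)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def base62_decode(s: str) -> int:
    """Decode a Base62 string back to the integer ID."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Because the mapping is reversible, the redirect path can decode the short link back to the primary key instead of storing the short string as a separate indexed column.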
301 or 302 for redirection? 301 is a permanent redirect — browsers cache it, reducing server load but preventing click tracking. 302 is a temporary redirect — every request goes through the server, enabling click tracking. Most URL shortener services choose 302.
Scalability Analysis
Database sharding: Shard by the hash value of the short link ID to support horizontal scaling. Cache hot short links: Use Redis to cache the most popular 20% of short links; under a typical skewed (80/20) access pattern this can serve roughly 80% of reads from cache. Global deployment: Use a CDN to accelerate redirection and deploy API services in multiple data centers.
Problem 2: Message Push System
Requirements Analysis
The message push system is a high-frequency problem in architecture interviews, testing real-time communication and large-scale connection management capabilities.
- Functional requirements: Support single push, group push, broadcast, message read/unread status, offline message push
- Non-functional requirements: 10 million concurrent connections, message latency <500ms, no message loss, message ordering
High-Level Design
Core components: Connection Management Service (maintains user long connections), Message Routing Service (routes messages to the server node where the target user is connected), Message Storage (persists messages), Push Gateway (integrates with third-party push channels like APNs/FCM).
Detailed Design
Long connection options:
- WebSocket: Full-duplex communication, lowest latency, suitable for high-frequency messaging. The server needs to maintain a large number of connection states.
- Server-Sent Events (SSE): Server-side unidirectional push, simple implementation, suitable for scenarios requiring only server push.
- Long Polling: Best compatibility, but higher latency, suitable for scenarios with low real-time requirements.
Key message routing problem: When User A sends a message to User B, how do you know which node User B is connected to? Option 1: Use consistent hashing to map each user to a fixed node, so the target node can be computed from the user ID; if users may connect to any node, keep a routing table (user ID → node) and look it up instead. Option 2: Use the Pub/Sub pattern: publish messages to channels, and the nodes subscribed to a channel receive them.
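The routing-table idea can be sketched in a few lines. This is an in-memory illustration only: the class name `ConnectionRouter` and its methods are hypothetical, and the per-node outbox stands in for a network forward to the connection server.

```python
from collections import defaultdict


class ConnectionRouter:
    """Hypothetical in-memory routing table: user ID -> connection node."""

    def __init__(self):
        self.user_to_node = {}
        # Stand-in for "forward the message to that node over the network".
        self.node_outbox = defaultdict(list)

    def on_connect(self, user_id: str, node_id: str) -> None:
        self.user_to_node[user_id] = node_id

    def on_disconnect(self, user_id: str, node_id: str) -> None:
        # Only remove the entry if the user has not already reconnected elsewhere.
        if self.user_to_node.get(user_id) == node_id:
            del self.user_to_node[user_id]

    def route(self, user_id: str, message: str) -> bool:
        """Forward to the user's node; return False if the user is offline."""
        node_id = self.user_to_node.get(user_id)
        if node_id is None:
            return False
        self.node_outbox[node_id].append((user_id, message))
        return True
```

A `False` result from `route` is the signal to fall back to offline storage or a third-party push channel.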
Offline message handling: When a user is offline, messages are stored in the database. When the user comes online, pull offline messages and push them. Key: ensure ordering and deduplication of offline messages.
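One way to get both ordering and deduplication is a per-user inbox that assigns increasing sequence numbers and remembers message IDs it has already stored. The sketch below is an assumption about how this could look, not a prescribed design; `OfflineInbox` and its fields are hypothetical names, and a real system would persist this state in the message store.

```python
from dataclasses import dataclass, field


@dataclass
class OfflineInbox:
    """Hypothetical per-user inbox for messages received while offline."""

    next_seq: int = 1
    messages: dict = field(default_factory=dict)  # seq -> (message_id, body)
    seen_ids: set = field(default_factory=set)

    def store(self, message_id: str, body: str) -> bool:
        """Store a message once; retried deliveries with the same ID are ignored."""
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.messages[self.next_seq] = (message_id, body)
        self.next_seq += 1
        return True

    def pull(self, after_seq: int = 0) -> list:
        """Return (seq, message_id, body) tuples in sequence order."""
        return [
            (seq, *self.messages[seq])
            for seq in sorted(self.messages)
            if seq > after_seq
        ]

    def ack(self, up_to_seq: int) -> None:
        """Delete messages the client has confirmed receiving."""
        for seq in [s for s in self.messages if s <= up_to_seq]:
            del self.messages[seq]
```

The client acknowledges the highest sequence number it has processed, so a reconnect after a dropped connection resumes from that point instead of replaying everything.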
Scalability Analysis
Single-machine WebSocket connections are limited (typically 100K-500K). 10 million connections therefore require 20-100 connection servers. Use a connection routing table to record which node each user is currently connected to. When routing messages, check the routing table first, then forward. Connection servers hold only connection state and no business state, so you can scale horizontally by adding nodes and updating the routing table.
Problem 3: News Feed
Requirements Analysis
News Feed is the problem in system design interviews closest to real-world business, testing your understanding of core social product functionality.
- Functional requirements: Publish posts, view posts from followed users, like/comment, timeline sorted in reverse chronological order
- Non-functional requirements: 100 million users, 100 million daily posts, Feed loading latency <200ms, support follower counts ranging from 1 to 10 million
High-Level Design
Core components: Publishing Service (writes posts), Feed Generation Service (assembles user timelines), Social Graph Service (manages follow relationships), Content Storage (stores post content).
Detailed Design
Two core Feed models — the soul of this problem:
- Push model (Fan-out on write): When a user publishes a post, write it to all followers' Feed lists. Advantage: O(1) retrieval when reading Feed. Disadvantage: Massive write volume when a celebrity posts (1M followers = 1M writes).
- Pull model (Fan-out on read): When a user reads their Feed, pull the latest posts from followed users in real-time and merge-sort them. Advantage: Lightweight writes. Disadvantage: High latency when reading Feed due to querying multiple users.
- Push-Pull hybrid: Use push model for regular users, pull model for celebrities. This is the industry's actual solution, balancing read and write performance.
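The hybrid model can be sketched with in-memory dictionaries. Everything here is illustrative: the function names are hypothetical, and `CELEBRITY_THRESHOLD` is set to 3 only so the example is easy to follow (a real threshold would be far higher).

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # assumption: tiny value for demonstration only

followers = defaultdict(set)         # author -> set of followers
following = defaultdict(set)         # user -> set of followed authors
inbox = defaultdict(list)            # user -> [(timestamp, post_id)] pushed on write
posts_by_author = defaultdict(list)  # author -> [(timestamp, post_id)]


def follow(user: str, author: str) -> None:
    followers[author].add(user)
    following[user].add(author)


def is_celebrity(author: str) -> bool:
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def publish(author: str, post_id: str, timestamp: int) -> None:
    posts_by_author[author].append((timestamp, post_id))
    if not is_celebrity(author):
        # Push model: fan out on write to every follower's inbox.
        for user in followers[author]:
            inbox[user].append((timestamp, post_id))


def read_feed(user: str, limit: int = 10) -> list:
    # Start from pushed entries; the set removes duplicates if an author
    # crossed the celebrity threshold after some posts were already pushed.
    entries = set(inbox[user])
    for author in following[user]:
        if is_celebrity(author):
            # Pull model: fetch celebrity posts at read time.
            entries.update(posts_by_author[author])
    newest_first = sorted(entries, reverse=True)
    return [post_id for _, post_id in newest_first[:limit]]
```

The write cost for a celebrity post is one append regardless of follower count, while regular users' posts are still read from a precomputed inbox.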
Data model design: Post table (post ID, author ID, content, timestamp), Follow relationship table (follower ID, followee ID), Feed table (user ID, post ID, timestamp). The Feed table is the core of the push model, maintaining an inbox for each user.
Scalability Analysis
The Feed table is a write hotspot. Use Redis Lists or Sorted Sets to store each user's Feed: Lists give O(1) head insertion, while Sorted Sets cost O(log N) per insert but keep entries ordered by timestamp and support range reads for pagination. Post content is persisted in the database, with the Feed table accelerated by caching. Social graphs use graph databases (e.g., Neo4j) or relational databases + caching. Pagination: Use cursor-based pagination instead of offset pagination to avoid deep pagination performance issues.
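Cursor-based pagination can be illustrated on a plain list. In this sketch the cursor is the last `(timestamp, post_id)` pair the client saw; the function name `page_feed` is hypothetical. Against a database, the same idea is typically a keyset condition on an indexed `(timestamp, id)` pair rather than an `OFFSET`.

```python
def page_feed(feed, cursor=None, limit=2):
    """Return one page plus the cursor for the next page.

    feed:   list of (timestamp, post_id) tuples sorted newest first.
    cursor: the last tuple of the previous page, or None for the first page.
    """
    # Keyset condition: only entries strictly older than the cursor.
    older = feed if cursor is None else [entry for entry in feed if entry < cursor]
    page = older[:limit]
    # A short page means we reached the end, so there is no next cursor.
    next_cursor = page[-1] if len(page) == limit else None
    return page, next_cursor
```

Unlike offset pagination, new posts arriving at the top of the feed do not shift later pages, because each page is defined relative to the last item seen.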
Problem 4: Flash Sale System
Requirements Analysis
The flash sale system is the highest-pressure scenario in system design problems, testing inventory deduction and overselling prevention under high concurrency.
- Functional requirements: Product flash sale, inventory deduction, order creation, payment
- Non-functional requirements: Peak QPS 100K+, absolutely no overselling, flash sale result latency <1s, anti-bot and anti-scalper
High-Level Design
Core components: Flash Sale API Gateway (rate limiting + authentication), Inventory Service (deducts inventory), Order Service (asynchronously creates orders), Message Queue (peak shaving), Payment Service (processes payments).
Detailed Design
Preventing overselling is the core challenge, with three approaches:
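- Database conditional update (often loosely called optimistic locking): UPDATE stock SET count=count-1 WHERE id=? AND count>0. The count>0 guard makes the decrement safe, and a request succeeds only if one row was affected. Simple, but every request contends on the same row, which puts heavy pressure on the database; a QPS ceiling of around 5,000 is a common rule of thumb.
- Redis atomic deduction: Use the Redis DECR command for atomic inventory deduction. DECR does not stop at 0, so reject the request when the returned value is negative and restore the counter with INCR. QPS can reach 100K+, but requires handling Redis-database consistency.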
- Distributed lock + pre-deduction: Load inventory into Redis before the flash sale starts, use Lua scripts to ensure atomicity. After successful deduction, asynchronously create orders and eventually sync to the database.
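What all three approaches share is that "check stock" and "decrement stock" must happen as one atomic step. The sketch below illustrates that property with an in-memory counter guarded by a lock; it is a stand-in for the atomicity a Redis Lua script or a guarded SQL UPDATE provides, not an implementation of either. The names `InventoryCounter` and `run_flash_sale` are hypothetical.

```python
import threading


class InventoryCounter:
    """In-memory stand-in for an atomic check-and-decrement operation."""

    def __init__(self, stock: int):
        self._stock = stock
        self._lock = threading.Lock()

    def try_deduct(self, quantity: int = 1) -> bool:
        # The check and the decrement happen under one lock, so two buyers
        # can never both observe the last unit as available.
        with self._lock:
            if self._stock >= quantity:
                self._stock -= quantity
                return True
            return False

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._stock


def run_flash_sale(stock: int, buyers: int):
    """Simulate concurrent buyers; return (units sold, units remaining)."""
    counter = InventoryCounter(stock)
    winners = []

    def buy(user_id: int) -> None:
        if counter.try_deduct():
            winners.append(user_id)

    threads = [threading.Thread(target=buy, args=(i,)) for i in range(buyers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(winners), counter.remaining
```

With 200 buyers competing for 10 units, exactly 10 succeed and the counter never goes below zero, which is the no-overselling guarantee the requirements ask for.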
Peak shaving strategy: User requests first enter the message queue, and backend services consume at their processing capacity. This way, even if 100K QPS floods in instantly, the backend processes at its own pace without being overwhelmed.
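Peak shaving can be sketched with a bounded in-process queue. This is an illustration of the idea only: `queue.Queue` stands in for a real message queue, and the function names and the capacity value are assumptions.

```python
import queue


def accept_requests(request_ids, capacity: int):
    """Buffer incoming requests; reject anything beyond the queue capacity."""
    buffer = queue.Queue(maxsize=capacity)
    rejected = []
    for request_id in request_ids:
        try:
            buffer.put_nowait(request_id)
        except queue.Full:
            # Fail fast: tell the user the sale is over instead of piling up work.
            rejected.append(request_id)
    return buffer, rejected


def drain(buffer: queue.Queue, batch_size: int) -> list:
    """Consume at most batch_size requests, i.e. at the backend's own pace."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(buffer.get_nowait())
        except queue.Empty:
            break
    return batch
```

Sizing the buffer close to the available inventory keeps queueing delay bounded, since requests beyond that point cannot succeed anyway.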
Anti-bot strategy: CAPTCHA + IP rate limiting + user-level rate limiting + quiz verification. Core idea: make bot costs exceed benefits.
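User-level and IP-level rate limiting can share one mechanism keyed by different identifiers. Below is a minimal sliding-window sketch; the class name is hypothetical, and the caller passes the current time explicitly so the behaviour is easy to test (production code would typically use a monotonic clock and a shared store).

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per key within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

The key can be a user ID, an IP address, or a combination of both, depending on which abuse pattern you are trying to make expensive.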
Scalability Analysis
The bottleneck of the flash sale system is inventory deduction. Redis single-node DECR can reach 100K QPS. If higher throughput is needed, shard by product ID across multiple Redis nodes; for a single hot product, split its inventory into several buckets placed on different nodes. After order creation is made asynchronous, the database is no longer the bottleneck. After the flash sale ends, async tasks sync Redis inventory back to the database.
Problem 5: Search Engine
Requirements Analysis
Search engine is one of the most complex problems in system design interviews, testing inverted indexing, ranking algorithms, and distributed storage capabilities.
- Functional requirements: Web crawling, index building, keyword search, search result ranking, search suggestions
- Non-functional requirements: Index 10 billion web pages, search latency <200ms, 1 billion daily queries, daily index updates
High-Level Design
Core components: Crawler Service (fetches web pages), Indexing Service (builds inverted index), Query Service (processes search requests), Ranking Service (ranks results), Cache Layer (caches popular query results).
Detailed Design
Inverted index is the cornerstone of search engines: Forward index maps "document → terms"; inverted index maps "term → document list." During search, look up query terms in the inverted index, take the intersection to get matching documents, then sort by relevance.
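The lookup-and-intersect step is short enough to show directly. This sketch assumes whitespace tokenization and lowercasing, which is far simpler than a real analyzer; the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return IDs of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "redis cache design",
    2: "cache eviction policy",
    3: "redis cluster",
}
index = build_inverted_index(docs)
print(search(index, "redis cache"))  # only document 1 contains both terms
```

Intersecting the shortest posting list first is a common optimization, since the result can never be larger than the smallest input.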
Ranking algorithms:
- TF-IDF: Term Frequency-Inverse Document Frequency, measuring a term's importance to a document. Simple but doesn't consider term position or semantics.
- PageRank: Calculates webpage importance based on link relationships between pages. Google's core algorithm in its early days.
- Machine Learning Ranking: Trains models using user behavior features like click-through rate and dwell time for dynamic ranking. The mainstream approach for modern search engines.
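TF-IDF is simple enough to compute by hand. The sketch uses one common variant (term frequency normalized by document length, and idf = ln(N / df)); other variants add smoothing or different normalization, so treat the exact formula as an assumption.

```python
import math
from collections import Counter


def tf_idf(term: str, doc_tokens: list, corpus: list) -> float:
    """Score a term for one document within a corpus of tokenized documents."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)
    return tf * idf


corpus = [
    ["redis", "cache"],
    ["cache", "policy"],
    ["redis", "redis", "cluster"],
]
doc = corpus[2]
# "cluster" appears once but only in this document, so it outranks "redis",
# which appears twice here but also in another document.
print(tf_idf("cluster", doc, corpus) > tf_idf("redis", doc, corpus))
```

A term that appears in every document gets an idf of zero under this variant, which is why common words contribute nothing to the score.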
Index sharding: The inverted index for 10 billion web pages can't fit on a single machine. Shard by document ID hash across multiple nodes. Queries search all shards in parallel, merge results, sort, and return. This is the scatter-gather pattern, similar in spirit to MapReduce.
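The parallel query and merge step can be sketched with a thread pool. Each shard here is a plain dictionary from term to scored hits, which is an assumption made to keep the example self-contained; `search_shard` and `scatter_gather` are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard: dict, term: str, k: int) -> list:
    """Return this shard's top-k (score, doc_id) hits for the term."""
    return heapq.nlargest(k, shard.get(term, []))


def scatter_gather(shards: list, term: str, k: int) -> list:
    """Query every shard in parallel, then merge to a global top-k."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term, k), shards))
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(k, merged)


shards = [
    {"redis": [(0.9, 1), (0.2, 4)]},
    {"redis": [(0.7, 2)]},
    {"cache": [(0.5, 3)]},
]
print(scatter_gather(shards, "redis", k=2))
```

Each shard only needs to return its local top-k, because no document outside a shard's top-k can appear in the global top-k.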
Scalability Analysis
Index update strategy: Rebuild the full index once daily, update incremental index in real-time. Query service is stateless and horizontally scalable. Popular query result cache hit rate can reach 30-50%, significantly reducing backend load. Search suggestions use Trie trees or prefix matching, independent of the main search service.
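A Trie for search suggestions fits in a few dozen lines. This sketch returns completions in alphabetical order; a real suggestion service would usually rank by popularity, which is omitted here.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def suggest(self, prefix: str, limit: int = 5) -> list:
        """Return up to `limit` stored words starting with prefix, alphabetically."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        self._collect(node, prefix, results, limit)
        return results

    def _collect(self, node, word, results, limit) -> None:
        if len(results) >= limit:
            return
        if node.is_end:
            results.append(word)
        for ch in sorted(node.children):
            self._collect(node.children[ch], word + ch, results, limit)
```

Lookup cost depends on the prefix length and the number of results collected, not on the total number of stored queries.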
Problem 6: Distributed Cache
Requirements Analysis
Distributed cache is an infrastructure problem in architecture interviews, testing understanding of cache consistency, eviction policies, and distributed systems.
- Functional requirements: KV read/write, support TTL, support eviction policies, support cluster mode
- Non-functional requirements: Read/write latency <1ms, QPS 1M+, availability 99.99%, eventual data consistency
High-Level Design
Core components: Client (routes requests to the correct node), Cache Node Cluster (stores KV data), Configuration Center (manages node lists and routing info), Monitoring Service (monitors hit rates and node health).
Detailed Design
Data sharding strategies:
- Consistent hashing: Map keys and nodes onto a hash ring, find the nearest node clockwise. When nodes are added or removed, only adjacent nodes' data needs migration — the industry's mainstream approach.
- Virtual nodes: Solves the data skew problem of consistent hashing. Each physical node corresponds to multiple virtual nodes, making data distribution more even.
- Range sharding: Shard by key range, suitable for range query scenarios, but may create hotspots.
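Consistent hashing with virtual nodes can be sketched with a sorted list and binary search. The choice of MD5 as the ring hash and 100 virtual nodes per physical node are assumptions for illustration; `HashRing` is a hypothetical name.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Position of a key on the ring (MD5 is used only for its even spread)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, physical node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node is placed on the ring at `vnodes` positions.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        """Walk clockwise from the key's position to the first virtual node."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

When a node is added, the only keys that change owner are the ones the new node takes over; every other key stays where it was, which is the property that makes resizing cheap.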
Cache eviction policies:
- LRU (Least Recently Used): Evicts the least recently accessed data. Simple to implement, but a one-off scan of rarely used keys can push hot data out of the cache.
- LFU (Least Frequently Used): Evicts data with the lowest access frequency. Better suited for business scenarios, but maintaining frequency counters is expensive.
- Redis's actual approach: Approximate LRU — randomly samples several keys and evicts the least recently accessed one. Balances performance and precision.
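An exact LRU is a common follow-up coding question. The sketch below uses `collections.OrderedDict` from the standard library; it shows exact LRU, not the sampling-based approximation described for Redis.

```python
from collections import OrderedDict


class LRUCache:
    """Exact LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```

Both operations run in O(1) time, which is the property interviewers usually ask you to justify.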
Cache consistency: The Cache-Aside pattern is the most commonly used approach — read requests check the cache first; on miss, query the database and write to cache. Write requests update the database first, then delete the cache. Why delete instead of update? Because updating the cache can introduce concurrency issues — deletion is safer.
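The read and write paths of Cache-Aside can be sketched with two dictionaries standing in for the cache and the database. The class name and the `db_reads` counter are hypothetical and exist only to make the behaviour observable.

```python
class CacheAsideStore:
    """Cache-Aside with plain dicts standing in for the cache and the database."""

    def __init__(self, database: dict):
        self.database = database
        self.cache = {}
        self.db_reads = 0  # counts how often reads fall through to the database

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: load from the database and populate the cache.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value) -> None:
        # Update the database first, then delete (not update) the cached copy.
        self.database[key] = value
        self.cache.pop(key, None)
```

After a write, the next read misses and reloads the fresh value, so the cache never has to be told what the new value is.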
Scalability Analysis
Cache penetration: Queries for non-existent keys go straight to the database. Solution: Bloom filter interception + cache empty values. Cache avalanche: Large numbers of keys expire simultaneously, overwhelming the database. Solution: Add random offsets to expiration times + multi-level caching. Cache stampede: A hot key expires, causing a flood of requests to hit the database. Solution: Mutex lock + hot keys that never expire.
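Two of these mitigations are easy to sketch: a Bloom filter in front of the database, and a randomized TTL. The bit-array size, the number of hash functions, and the use of SHA-256 to derive positions are all assumptions chosen for readability rather than tuned values.

```python
import hashlib
import random


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def ttl_with_jitter(base_seconds: int, spread_seconds: int, rng=random) -> int:
    """Spread expirations so keys written together do not expire together."""
    return base_seconds + rng.randint(0, spread_seconds)
```

A `False` from `might_contain` means the key definitely does not exist, so the request can be rejected without touching the cache or the database.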
4 Bonus Tips for System Design Interviews
- Draw diagrams before talking: In system design interviews, architecture diagrams are more intuitive than text descriptions. Draw and explain simultaneously, keeping the interviewer following your thought process. Don't dive into details first — start from the big picture.
- Proactively discuss trade-offs: Every design decision has pros and cons. Proactively say "the benefit of choosing A is X, the cost is Y, and I choose A because X is more important in this scenario." This demonstrates your architectural trade-off ability — far better than giving solutions without reasoning.
- Use numbers: Don't say "many users" — say "10 million DAU." Don't say "high concurrency" — say "peak QPS 100K." Quantify your design to show your sensitivity to scale.
- Proactively suggest improvements: After completing your design, proactively say "given more time, I would also consider direction X." This demonstrates your big-picture thinking and continuous optimization mindset — interviewers will see you don't stop at "good enough."
FAQ
Do I need to draw diagrams in system design interviews?
Yes. Architecture diagrams are the core expression method in system design interviews. If the interview platform supports a whiteboard, definitely use it; for phone interviews, describe components and data flow in text. Diagrams don't need to be beautiful, but component relationships and data flow must be clear.
What if I don't have real architecture experience?
System design interviews don't require you to have built real distributed systems. The focus is demonstrating your analysis process and reasoning ability. Starting from requirements and reasoning step by step — even if the final solution isn't optimal, as long as the reasoning process is sound, interviewers will recognize it. We recommend reading "Designing Data-Intensive Applications" and open-source project design documents.
What if I run out of time in the system design interview?
In a 45-minute system design interview, time management matters more than a perfect solution. Requirements analysis: 5 minutes. High-level design: 10 minutes. Detailed design: 15 minutes. Scalability: 10 minutes. Summary: 5 minutes. If you're stuck on a section, proactively say "I'll give a preliminary solution for this part and dive deeper if time allows," then keep moving forward.
What does it mean when the interviewer keeps asking about details?
Usually two reasons: either your solution has a gap and they're guiding you to discover and fix it, or they're interested in a specific direction and want to see your depth. Either way, don't panic — follow their direction deeper. If you truly don't know, honestly say "I have limited experience in this area, but my understanding is..." — it's better than making things up.
Do system design interview expectations differ by level?
Yes. Junior engineers focus on requirements analysis and basic component design; mid-level engineers focus on detailed design and trade-off analysis; senior engineers focus on scalability, consistency, and fault-tolerance design; architect-level focuses on global architecture and technology selection. Adjust your answer depth based on your target level.