Big Tech System Design Interview Guide: 8 Real Questions from Flash Sales to Message Queues
A comprehensive guide covering 8 frequently asked system design questions at big tech companies, including flash sale systems, URL shorteners, message queues, social feeds, search engines, recommendation systems, rate limiters, and distributed caches
Background
I started preparing for system design interviews during the 2024 spring hiring season. Before that, I'd spent 3 years as a backend developer at a mid-size internet company. I'd worked on some high-concurrency scenarios, but I'd never systematically organized a framework for answering system design questions. Honestly, the first time someone asked me to "design a flash sale system," my mind went completely blank. I rambled for a while without hitting the key points. After that wake-up call, I spent two months going through every system design question I could find and distilled them into the 8 most frequently asked ones. Each one is broken down using the Requirements Analysis → Architecture Design → Core Components → Scaling Solutions framework. This article is my complete compilation — I hope it helps you.
Interview Process Review
I interviewed at four FAANG-tier companies for system design rounds. The overall takeaway: interviewers don't expect a perfect system. They want to see your analytical thinking and your ability to navigate trade-offs. The typical flow looks like this:
Step 1: Requirements Clarification (5 min) — After the interviewer poses the question, do NOT jump straight into designing. Ask about user scale, expected QPS, consistency requirements, and availability SLAs. In my first interview, I didn't clarify and designed for millions of QPS, only for the interviewer to say "our scenario is 10K QPS" — instant over-engineering red flag.
Step 2: High-Level Design (10 min) — Sketch the core architecture diagram, explain the main components and data flow. I recommend organizing around the Client → Gateway → Service Layer → Storage Layer four-tier model, which covers most scenarios.
Step 3: Deep Dive (20 min) — The interviewer will pick one or two areas to drill into. "How do you prevent overselling inventory?" "How do you handle cache-database consistency?" You need to explain concrete implementation approaches here.
Step 4: Scalability Discussion (5 min) — What happens when the system scales 10x? Where are the single points of failure? How do you handle disaster recovery? Nailing this section can earn significant bonus points.
Question 1: Design a Flash Sale System
Requirements Analysis: The core challenge of a flash sale system is burst traffic. A typical scenario: 100K users competing for 100 items. Key metrics: QPS can hit 100K+, inventory must never oversell, and users should not experience widespread timeouts.
Architecture Design: The overall pipeline follows Client-side Throttling → CDN → API Gateway → Flash Sale Service → Inventory Service → Order Service. The core principle is progressive filtering — block most requests upstream so only a tiny fraction reaches the inventory service.
Core Components:
1. Redis Pre-deduction: Load inventory into Redis before the sale starts. Requests first deduct from Redis; only on success does the system asynchronously create orders. Lua scripts ensure atomicity.
2. Message Queue for Peak Shaving: After successful deduction, publish an MQ message. The order service consumes it asynchronously to create orders, preventing the database from absorbing peak traffic directly.
3. Client-side Throttling: Button graying + CAPTCHA + random drop. Together these can cut the QPS that actually reaches the backend to roughly 1/10 of the original.
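To make the pre-deduction and peak-shaving steps above concrete, here is a minimal in-process sketch. It is not Redis: a `threading.Lock` stands in for the atomicity a Lua script would give you, and a `queue.Queue` stands in for the MQ. The names `InventoryPrededuct`, `try_purchase`, and `run_flash_sale`, and the one-item-per-user rule, are my own assumptions for illustration.

```python
import queue
import threading


class InventoryPrededuct:
    """In-process stand-in for the Redis pre-deduction step.

    The lock plays the role a Lua script plays in Redis: the
    check-and-decrement runs as a single atomic unit.
    """

    def __init__(self, stock):
        self._stock = stock
        self._lock = threading.Lock()
        self._buyers = set()
        # Stand-in for the message queue that feeds the order service.
        self.order_queue = queue.Queue()

    def try_purchase(self, user_id):
        with self._lock:
            if user_id in self._buyers:  # one item per user (assumed rule)
                return False
            if self._stock <= 0:         # sold out: reject before any DB work
                return False
            self._stock -= 1
            self._buyers.add(user_id)
        # Order creation happens asynchronously, off the hot path.
        self.order_queue.put(user_id)
        return True

    @property
    def remaining(self):
        with self._lock:
            return self._stock


def run_flash_sale(stock, users):
    """Simulate `users` concurrent buyers; return (successes, remaining)."""
    inventory = InventoryPrededuct(stock)
    results = []
    results_lock = threading.Lock()

    def buyer(uid):
        ok = inventory.try_purchase(f"user-{uid}")
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=buyer, args=(i,)) for i in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results), inventory.remaining
```

The point to make in an interview is that the stock check and the decrement must be one atomic step. If they are two separate calls, two requests can both see "1 left" and both succeed, which is exactly the overselling bug.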
Scaling Solutions: For multi-datacenter deployment, use a distributed Redis cluster with sharded inventory, where each shard handles a subset of products. Add a local cache as a secondary filter to reduce Redis access pressure.
Question 2: Design a URL Shortener
Requirements Analysis: Convert long URLs to short ones. Core requirements: short codes should be as brief as possible, redirects must be fast, and codes must be unique. Assume 100M DAU, 1000 short URLs generated per second.
Architecture Design: Write Path: Long URL → Hash/ID Generator → Short Code → Store Mapping. Read Path: Short Code → Lookup Storage → 302 Redirect.
Core Components:
1. ID Generator: Rather than MD5 hashing, I recommend auto-increment ID + Base62 encoding: the short code length is controllable and uniqueness is guaranteed. For distributed scenarios, use Snowflake or Redis INCR.
2. Caching Layer: Cache hot short URLs in Redis. Hit rate can exceed 95%, cutting typical read latency from around 10ms to around 1ms.
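3. Bloom Filter: Before writing, check a Bloom filter to see whether a short code may already exist. A negative answer is definitive, so the write can proceed without touching storage; a positive answer can be a false positive and should be confirmed with a storage lookup. This matters mainly for hash-based codes, since auto-increment IDs are unique by construction.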
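The auto-increment ID + Base62 approach from the ID Generator item can be sketched in a few lines. This is an illustration under my own assumptions: the alphabet order (digits, lowercase, uppercase), the `Shortener` class, and the in-memory dict standing in for the mapping store are all hypothetical.

```python
import string

# 10 digits + 26 lowercase + 26 uppercase = 62 symbols (order is an assumption).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n):
    """Encode a non-negative integer ID as a Base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code):
    """Decode a Base62 short code back to its integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)  # raises ValueError on bad chars
    return n


class Shortener:
    """In-memory stand-in for the write and read paths."""

    def __init__(self, start_id=1):
        self._next_id = start_id  # in production: Snowflake or Redis INCR
        self._urls = {}           # id -> long URL

    def shorten(self, long_url):
        new_id = self._next_id
        self._next_id += 1
        self._urls[new_id] = long_url
        return encode_base62(new_id)

    def resolve(self, code):
        try:
            return self._urls.get(decode_base62(code))
        except ValueError:
            return None
```

A quick capacity check is worth saying out loud in the interview: 62^7 is about 3.5 trillion codes, and at 1000 new URLs per second you generate about 31.5 billion per year, so 7 characters last roughly a century, while 6 characters (about 56.8 billion codes) would run out in under two years.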
Scaling Solutions: For global deployment, use GeoDNS to route users to the nearest data center. Data centers synchronize short URL mappings via asynchronous replication.
Question 3: Design a Message Queue
Requirements Analysis: Core functionality: producers send messages, consumers receive messages, messages are neither lost nor duplicated. Key metrics: throughput, latency, message reliability.
Architecture Design: The core model is Topic → Partition → Consumer Group. Messages are sequentially appended to Partitions; consumers pull by Offset.
Core Components:
1. Broker Cluster: Each Broker handles several Partitions. Replication ensures high availability — the Leader handles reads/writes while Followers sync data.
2. Consumer Offset Management: Offsets are stored in an internal Topic. Consumers commit periodically. Exactly-once semantics require transactions + idempotence.
3. Partitioning Strategy: Key-based hashing ensures ordering for the same Key; keyless messages use round-robin for load balancing.
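The Topic → Partition → Offset model and the partitioning rule above fit in a small in-memory sketch. This is a toy, not how any real broker is implemented: the `Topic` class and its methods are hypothetical, CRC32 is just a convenient stable hash I picked, and there is no replication or persistence.

```python
import zlib
from itertools import count


class Topic:
    """Toy topic: each partition is an append-only list, offsets are indexes."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self._round_robin = count()
        self._committed = {}  # (group, partition) -> next offset to read

    def partition_for(self, key=None):
        if key is None:
            # Keyless messages: round-robin for load balancing.
            return next(self._round_robin) % len(self.partitions)
        # Keyed messages: stable hash, so the same key keeps its ordering.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, value, key=None):
        """Append a message; return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1

    def poll(self, group, partition, max_records=10):
        """Pull records starting from the group's committed offset."""
        start = self._committed.get((group, partition), 0)
        return start, self.partitions[partition][start:start + max_records]

    def commit(self, group, partition, next_offset):
        """Record progress; uncommitted records are redelivered on next poll."""
        self._committed[(group, partition)] = next_offset
```

Note what happens if a consumer polls, processes, and crashes before `commit`: the next poll returns the same records again. That is at-least-once delivery, and it is why the consumer side needs idempotence if you want exactly-once behavior.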
Scaling Solutions: When a single Partition's throughput is insufficient, increase the Partition count. For cross-datacenter scenarios, use MirrorMaker for asynchronous replication.
Question 4: Design a Social Feed (Moments/Timeline)
Requirements Analysis: Core features: post updates, browse feed, like and comment. The key challenge is choosing between push-on-write vs. pull-on-read. Assume 100M users, average 200 friends per user.
Architecture Design: I recommend a hybrid approach: push-on-write for regular users + pull-on-read for influencers. When a regular user posts, the update is fanned out to all friends' inboxes (push). When a celebrity posts, followers pull on demand (pull).
Core Components:
1. Feed Storage: Each user maintains a Timeline table sorted in reverse chronological order. Implemented with Redis Sorted Sets, where the Score is the timestamp.
2. Fan-out Service: After posting, asynchronously fan out to all friends' Timelines via a message queue for decoupling.
3. Interaction Service: Likes and comments are stored separately and aggregated in real time on the detail page.
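Here is a rough sketch of the hybrid push/pull idea, with plain Python containers standing in for Redis Sorted Sets and the fan-out happening inline instead of through a message queue. The `FeedService` class, its method names, and the `celebrity_threshold` parameter are assumptions of mine for illustration.

```python
from collections import defaultdict


class FeedService:
    """Hybrid feed: push for regular authors, pull for high-follower authors."""

    def __init__(self, celebrity_threshold=5000):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> [(ts, post_id)] pushed here
        self.outbox = defaultdict(list)    # author -> [(ts, post_id)] they wrote

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def _is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def post(self, author, post_id, ts):
        self.outbox[author].append((ts, post_id))
        if not self._is_celebrity(author):
            # Push-on-write: fan out to every follower's inbox.
            for user in self.followers[author]:
                self.inbox[user].append((ts, post_id))

    def timeline(self, user, limit=20):
        # Start from what was pushed; a set removes duplicates that appear
        # when an author crosses the threshold after earlier pushes.
        entries = set(self.inbox[user])
        for author in self.following[user]:
            if self._is_celebrity(author):
                # Pull-on-read: merge the celebrity's own posts at read time.
                entries.update(self.outbox[author])
        newest_first = sorted(entries, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]
```

The trade-off is visible in the code: `post` is O(followers) for regular users but O(1) for celebrities, while `timeline` pays a little extra per followed celebrity. That is the sentence interviewers are usually waiting to hear.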
Scaling Solutions: Users with 5000+ friends use pull-on-read — followers check cache first, then database. Cold data can be downgraded to HBase storage to reduce costs.
Question 5: Design a Search Engine
Requirements Analysis: Core functionality: crawl web pages → build index → query and return results. Key metrics: query latency < 100ms, index scale in the tens of billions.
Architecture Design: Four-layer architecture: Crawler → Document Store → Inverted Index Builder → Query Service.
Core Components:
1. Inverted Index: The mapping from Term → DocID list is the core data structure of a search engine. Use skip lists to accelerate multi-keyword AND operations.
2. Sharding & Distribution: Index is sharded by DocID across multiple nodes. Queries search all shards in parallel and merge TopK results.
3. Ranking Model: Start with TF-IDF + PageRank; advance to machine learning models (e.g., LambdaMART) for Learning to Rank.
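The inverted index and the multi-keyword AND query can be illustrated with a small sketch. I use a plain two-pointer merge over sorted posting lists here; skip pointers would speed up the same loop by letting it jump ahead, but I left them out to keep the example short. The function names and the whitespace tokenizer are my own simplifications.

```python
from collections import defaultdict


def build_index(docs):
    """Build Term -> sorted DocID list from a {doc_id: text} mapping."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        # set() so each doc appears at most once per term.
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return dict(index)


def intersect(a, b):
    """Two-pointer intersection of two sorted DocID lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index, query):
    """Return DocIDs that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start with the shortest list so intermediate results stay small.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = postings[0]
    for plist in postings[1:]:
        result = intersect(result, plist)
        if not result:
            break
    return result
```

In a sharded setup, each shard would run something like `search_and` on its own slice of DocIDs, and the query service would merge the per-shard TopK results.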
Scaling Solutions: Real-time indexing uses NRT (Near Real-Time) mechanism, refreshing Segments every second. Hot/cold separation — cache hot query results in Redis.
Question 6: Design a Recommendation System
Requirements Analysis: Core functionality: recommend content based on user behavior. Key challenges: recall rate, diversity, real-time performance.
Architecture Design: Classic three-layer architecture: Recall Layer → Ranking Layer → Re-ranking Layer. The recall layer filters from massive candidates down to thousands; the ranking layer refines to hundreds; the re-ranking layer adjusts for diversity and business rules.
Core Components:
1. Multi-path Recall: Collaborative filtering recall, content-based recall, popularity recall, and vector recall (ANN) run in parallel; results are merged and deduplicated.
2. Feature Platform: Unified management of user features, item features, and context features. Real-time features are computed via Kafka + Flink streaming.
3. Ranking Model: Coarse ranking uses a two-tower model (low latency); fine ranking uses deep models like DIN/DIEN (high accuracy).
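The "merged and deduplicated" step of multi-path recall is easy to hand-wave, so here is one possible way to do it: cap each path at a quota, interleave the paths rank by rank, and remember which paths recalled each item. The `merge_recall` function, the quota idea, and the path names in the usage are my own assumptions, not a standard API.

```python
from itertools import zip_longest


def merge_recall(paths, quota, limit):
    """Merge ranked candidate lists from several recall paths.

    paths: {path_name: [item_id, ...]} with each list already ranked.
    Returns [(item_id, [path_names that recalled it])] in merged order.
    """
    # Cap every path so one noisy path cannot flood the candidate set.
    capped = {name: items[:quota] for name, items in paths.items()}
    sources = {}  # item_id -> recalling paths; dict keeps insertion order
    # Walk rank 0 of every path, then rank 1, and so on (round-robin).
    for rank_slice in zip_longest(*capped.values()):
        for name, item in zip(capped, rank_slice):
            if item is None:  # this path has run out of candidates
                continue
            sources.setdefault(item, []).append(name)
    return list(sources.items())[:limit]
```

Usage, with made-up item IDs:

```python
candidates = merge_recall(
    {"cf": ["a", "b", "c"], "hot": ["b", "d"], "ann": ["e", "a"]},
    quota=2,
    limit=10,
)
# [('a', ['cf', 'ann']), ('b', ['hot', 'cf']), ('e', ['ann']), ('d', ['hot'])]
```

Keeping the list of source paths per item is handy later: the ranking layer can use "recalled by N paths" as a feature.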
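Scaling Solutions: An A/B experimentation platform supports multiple concurrent experiments through layered experimentation: each layer splits traffic independently, so experiments in different layers can share the same users instead of competing for mutually exclusive traffic. Cold start uses explore-exploit strategies to balance trying new items against serving known good ones.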
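Layered experimentation usually comes down to hashing the user ID together with a per-layer salt, so a user's bucket in one layer tells you nothing about their bucket in another. A minimal sketch of that idea follows; the `bucket` and `assign` helpers, the layer names, and the 100-bucket default are hypothetical choices of mine.

```python
import hashlib


def bucket(user_id, layer, num_buckets=100):
    """Deterministically map a user to a bucket within one layer."""
    # Salting with the layer name makes assignments independent across layers.
    digest = hashlib.sha256(f"{layer}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def assign(user_id, layer, experiments, num_buckets=100):
    """Return the experiment arm whose bucket range contains this user.

    experiments: {arm_name: range_of_buckets}; returns None if no arm matches.
    """
    b = bucket(user_id, layer, num_buckets)
    for name, buckets in experiments.items():
        if b in buckets:
            return name
    return None
```

Usage might look like this:

```python
arm = assign(
    "user-42",
    layer="ranking",
    experiments={"control": range(0, 50), "new_model": range(50, 100)},
)
```

Because the same user hashes differently in a "ranking" layer and a "ui" layer, each layer can use 100% of traffic for its own experiments.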
Question 7: Design a Rate Limiter
Requirements Analysis: Core functionality: limit the number of requests within a given time window. Key requirements: distributed, low latency, approximately accurate.
Architecture Design: The rate limiting algorithm is the core. The four common approaches are Fixed Window, Sliding Window, Token Bucket, and Leaky Bucket.
Core Components:
1. Token Bucket Algorithm: Tokens are added to the bucket at a fixed rate; requests consume tokens; if the bucket is empty, the request is rejected. The advantage is allowing burst traffic — this is the most commonly used approach.
2. Sliding Window Log: Record the timestamp of each request and count requests within the window. Accurate but memory-intensive; suitable for low-QPS scenarios.
3. Distributed Rate Limiting: Use Redis + Lua scripts for centralized counting, or a token sharding approach where each node is allocated a portion of tokens.
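The token bucket described above is short enough to write on a whiteboard. Here is a single-process sketch; the `TokenBucket` class and its parameters are my own naming, it is not thread-safe, and a distributed version would move this same logic into a Redis Lua script. The clock is injectable only so the behavior can be tested without sleeping.

```python
import time


class TokenBucket:
    """Single-process token bucket (not thread-safe)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)  # start full so bursts are allowed
        self._last = clock()

    def allow(self, cost=1.0):
        """Return True and consume tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill lazily based on elapsed time, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

Usage: `limiter = TokenBucket(rate=100, capacity=200)` allows a burst of 200 requests and then a steady 100 per second; when `limiter.allow()` returns False, the caller would respond with HTTP 429.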
Scaling Solutions: Multi-tier rate limiting: Gateway-level global limiting → Service-level limiting → Per-instance limiting. When rate-limited, return HTTP 429 + Retry-After header for graceful client-side degradation.
Question 8: Design a Distributed Cache
Requirements Analysis: Core functionality: KV storage, high throughput, low latency. Key challenges: cache penetration, cache breakdown, cache avalanche.
Architecture Design: Client → Consistent Hashing → Cache Node → Database. Consistent hashing solves the data migration problem during scaling.
Core Components:
1. Consistent Hashing: Virtual nodes solve the data skew problem. Each physical node maps to 100-200 virtual nodes. Scaling only affects data on adjacent nodes.
2. Caching Strategy: LRU/LFU eviction policies. Write strategy uses Cache-Aside pattern (update database first, then delete cache), combined with delayed double-delete for consistency.
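3. Protection Mechanisms: Cache penetration (queries for keys that do not exist anywhere): use Bloom filters to block them before they reach the database. Cache breakdown (a hot key expires and many requests miss at once): use a mutex so only one request rebuilds the entry. Cache avalanche (many keys expire at the same moment): use randomized expiration times to spread the load.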
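Consistent hashing with virtual nodes, from the first item above, is a common follow-up question, so here is a compact sketch. The `ConsistentHashRing` class, the `node#i` virtual-node naming, and SHA-256 as the hash are assumptions I made for the example; 150 virtual nodes per physical node is just a value inside the 100-200 range mentioned above.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes; keys map to the next vnode clockwise."""

    def __init__(self, nodes=(), vnodes=150):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, physical_node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        # Each physical node owns many small arcs, which evens out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First vnode whose hash is >= the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

The property worth demonstrating: when a fourth node joins a three-node ring, the only keys that move are the ones the new node takes over (roughly a quarter of them), whereas with `hash(key) % N` almost every key would be remapped.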
Scaling Solutions: Multi-replica for high availability. Primary-replica sync uses asynchronous replication (performance-first) or Raft consensus (consistency-first). Hot keys can use a local cache + remote cache two-tier architecture.
Key Takeaways
1. Always clarify requirements first. Don't jump straight into architecture diagrams. Interviewers value your analytical ability far more than your ability to recite solutions.
2. Master a universal framework: Requirements Clarification → High-Level Design → Deep Dive → Scalability Discussion. This framework fits nearly every system design question I encountered.
3. Prepare versatile building blocks: Load balancers, caches, message queues, and sharding strategies. These components appear in most system designs.
4. Proactively discuss trade-offs: For example, "Strong consistency here would sacrifice availability; I'd lean toward eventual consistency with a compensation mechanism." This kind of statement earns major points.
5. Draw diagrams: A system design interview without diagrams is like a presentation without slides. Practice expressing architecture with simple boxes and arrows.
FAQ
Q: How deeply do I need to prepare for system design interviews?
A: You don't need paper-level depth, but you should be able to explain the rationale and trade-offs behind core component choices. For example, why Redis over Memcached, why a message queue instead of synchronous calls.
Q: What if I don't have large-scale system experience?
A: System design interviews test design thinking, not hands-on experience. You can supplement by reading papers (e.g., Dynamo, Kafka papers) and engineering blogs. Being upfront — "I don't have direct experience, but here's my design approach..." — is perfectly fine.
Q: How should I allocate time when it's tight?
A: Requirements clarification 5 min, high-level design 10 min, deep dive 15-20 min, scalability 5 min. If time is tight, prioritize a complete high-level design and go deep on one specific area.