Amazon System Design Interview: Designing Amazon Shopping Cart System Complete Review

System Design | June 25, 2024 | Author: BeautyResume Team

A Java backend engineer with 4 years of experience reviews an Amazon system design interview, detailing the complete process of designing Amazon's shopping cart system: MySQL + Redis dual storage, caching strategies, add-to-cart workflows, high-concurrency optimization, and peak event handling.

Background

In April 2024, with 4 years of experience, I interviewed at Amazon for a Java backend developer role. Honestly, this was the most hardcore system design interview I've ever experienced. The interviewer didn't just ask for an architecture diagram; they drilled down to database schema design, distributed transactions, and cache consistency at a very granular level. The team I was interviewing for was Amazon's core shopping experience team, so the interviewer asked me to design Amazon's shopping cart system. The problem seems simple on the surface, but digging deeper reveals a wealth of technical challenges: high-concurrency read/write, data consistency, distributed locks, caching strategies, message queues... I spoke for about 40 minutes and fielded at least 20 follow-up questions. Here's my complete review.

Interview Process Review

The interviewer was a Principal Engineer (L7) who introduced themselves as being on Amazon's Shopping Experience Architecture team. The entire interview lasted 55 minutes at an extremely tight pace with barely a moment to breathe.

First 3 minutes: The interviewer briefly outlined the format and dove straight in: "Design a shopping cart system similar to Amazon's." As usual, I started with requirements clarification.

3-12 minutes: Requirements clarification. I asked the following key questions:

1. User scale? — "Assume 200M DAU, with QPS reaching 500K during peak events like Prime Day."

2. Cart capacity? — "Maximum 120 items per user."

3. Core features? — "Add to cart, update quantity, remove, select/deselect, merge carts."

4. Consistency requirements? — "Eventual consistency is fine; strong consistency is not required."

5. Multi-device sync? — "Need to support desktop, mobile app, and mobile web synchronization."

The interviewer acknowledged my questions: "Good understanding of requirements — let's start designing."

12-40 minutes: This was the core segment. I presented in the order of Storage Design → Cache Design → Core Workflows → High-Concurrency Optimization → Scaling Solutions. The interviewer drilled into very specific questions at each stage — some I handled smoothly, others left me grasping for words.

40-50 minutes: The interviewer shifted to special handling during peak events: "What happens when shopping cart QPS spikes 100x during Prime Day?" "How does the cart integrate with the inventory system?" "How do you keep cart prices updated in real time?"

Last 5 minutes: The interviewer let me ask questions. I asked about the team's technical challenges and the shopping cart system's next evolution.

Detailed Question: Design Amazon's Shopping Cart System

1. Storage Design

Cart storage is the foundation of the entire system. I designed a MySQL + Redis dual-storage approach:

1. MySQL Primary Storage: Cart table design — user_id, item_id, sku_id, quantity, checked (selected flag), create_time, update_time. Core index: (user_id, item_id, sku_id) composite unique index.

The interviewer asked: "Why MySQL instead of pure Redis?" I gave three reasons: (1) Redis alone doesn't guarantee durability, and cart data is a user asset that can't be lost; (2) MySQL supports complex queries (e.g., grouping by store), whereas Redis needs multiple lookups plus application-side assembly; (3) MySQL enables data analytics and reconciliation.

The interviewer followed up: "Then why not all MySQL?" — Because MySQL's read/write performance can't sustain 500K QPS; Redis is needed for cache acceleration.
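To make the table design concrete, here is a minimal sketch of the cart table and its composite unique index. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The table name cart_item, the column types, and the helper names are assumptions for illustration, and MySQL's upsert syntax (INSERT ... ON DUPLICATE KEY UPDATE) differs from the SQLite form shown here.

```python
import sqlite3

# SQLite stands in for MySQL so the sketch is runnable anywhere.
# Column names follow the article; types and the table name are assumptions.
SCHEMA = """
CREATE TABLE cart_item (
    user_id     INTEGER NOT NULL,
    item_id     INTEGER NOT NULL,
    sku_id      INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    checked     INTEGER NOT NULL DEFAULT 1,  -- selected flag
    create_time TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    update_time TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, item_id, sku_id)        -- composite unique index
);
"""

# Upsert keyed on the unique index (needs SQLite 3.24 or newer).
UPSERT = """
INSERT INTO cart_item (user_id, item_id, sku_id, quantity)
VALUES (?, ?, ?, ?)
ON CONFLICT (user_id, item_id, sku_id)
DO UPDATE SET quantity = excluded.quantity,
              update_time = CURRENT_TIMESTAMP;
"""


def open_db():
    """Create an in-memory database with the cart table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def set_quantity(conn, user_id, item_id, sku_id, quantity):
    """Insert the cart line, or overwrite its quantity if it already exists."""
    conn.execute(UPSERT, (user_id, item_id, sku_id, quantity))
    conn.commit()


def load_cart(conn, user_id):
    """Return all (item_id, sku_id, quantity) rows for one user."""
    rows = conn.execute(
        "SELECT item_id, sku_id, quantity FROM cart_item "
        "WHERE user_id = ? ORDER BY item_id, sku_id",
        (user_id,),
    )
    return rows.fetchall()
```

Because user_id is the leading column of the unique index, loading one user's cart is an index range scan, and writing the same (user_id, item_id, sku_id) twice updates the existing row instead of creating a duplicate.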

2. Redis Cache: Use Hash structure. Key is cart:{user_id}, field is item_id:sku_id, value is JSON (containing quantity, checked, etc.). This way, a single HGETALL retrieves the entire cart — far more efficient than individual GETs with String structure.

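The interviewer asked: "Why Hash instead of String?" With String, each item is a separate key; retrieving the entire cart requires MGET, which performs poorly with many keys. Hash handles it with one HGETALL and supports HINCRBY for atomic quantity increments. One caveat: HINCRBY only works on fields whose value is an integer, so it cannot increment a quantity embedded in a JSON value. Using it means storing the quantity in its own integer field, or doing the read-modify-write in a Lua script.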
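The Hash layout described above can be sketched as follows. To stay runnable without a Redis server, the sketch uses a tiny in-memory class, FakeRedis, that imitates only the HSET, HGETALL, and HINCRBY behavior needed here; that class and the helper names are assumptions, not a real client. The separate integer field "10:100:qty" is likewise a hypothetical naming choice that shows where HINCRBY can and cannot be applied.

```python
import json


class FakeRedis:
    """In-memory stand-in for the few Redis hash commands used below."""

    def __init__(self):
        self._hashes = {}

    def hset(self, key, field, value):
        self._hashes.setdefault(key, {})[field] = str(value)

    def hgetall(self, key):
        return dict(self._hashes.get(key, {}))

    def hincrby(self, key, field, amount):
        fields = self._hashes.setdefault(key, {})
        # Like Redis, this fails when the stored value is not an integer.
        new_value = int(fields.get(field, "0")) + amount
        fields[field] = str(new_value)
        return new_value


def cart_key(user_id):
    return f"cart:{user_id}"


def item_field(item_id, sku_id):
    return f"{item_id}:{sku_id}"


def put_item(r, user_id, item_id, sku_id, quantity, checked=True):
    """Store one cart line as a JSON value inside the user's hash."""
    value = json.dumps({"quantity": quantity, "checked": checked})
    r.hset(cart_key(user_id), item_field(item_id, sku_id), value)


def read_cart(r, user_id):
    """Fetch the whole cart with a single HGETALL and decode each line."""
    raw = r.hgetall(cart_key(user_id))
    return {field: json.loads(value) for field, value in raw.items()}
```

With this layout, reading the cart is one round trip regardless of how many lines it holds. Calling hincrby on the JSON field raises an error (real Redis replies that the hash value is not an integer), while hincrby on a plain integer field such as "10:100:qty" works.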

2. Cache Strategy Design

Cache strategy is one of the most critical designs in the cart system. I covered read/write strategies and consistency guarantees:

1. Write Strategy: Write-Behind + Async DB Persistence. When a user adds to cart, write Redis first, then asynchronously write MySQL. The interviewer asked: "What if the async MySQL write fails?" I answered: (1) failed MySQL writes retry 3 times; (2) if all 3 retries fail, the message goes to a dead letter queue for manual handling; (3) a periodic reconciliation task runs every 5 minutes between Redis and MySQL and fixes any inconsistencies.

The interviewer followed up: "What about inconsistent data read during reconciliation?" — I acknowledged this as a trade-off. In the cart scenario, brief inconsistency is acceptable because the cart is a decision-support tool, not the transaction system itself. At checkout, real-time data from the inventory and pricing systems is used — cart data is advisory.
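The retry-then-dead-letter handling from the write strategy can be sketched roughly like this. The function name persist_with_retry, the list used as a dead letter queue, and the FlakyWriter test double are all hypothetical; a real consumer would also log failures and back off between attempts. "Retry 3 times" is read here as one initial attempt plus up to three retries, which is an interpretation.

```python
def persist_with_retry(write_to_db, message, dead_letters, max_retries=3):
    """Try the MySQL write once, then retry up to max_retries times.

    Returns True if the write eventually succeeded. Otherwise the message
    is appended to dead_letters (standing in for a dead letter queue) and
    False is returned.
    """
    attempts = 1 + max_retries
    for _ in range(attempts):
        try:
            write_to_db(message)
            return True
        except Exception:
            # A real consumer would log the error and sleep with backoff.
            continue
    dead_letters.append(message)
    return False


class FlakyWriter:
    """Test double that fails a fixed number of times before succeeding."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0
        self.rows = []

    def __call__(self, message):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated MySQL outage")
        self.rows.append(message)
```

A writer that fails twice still ends up persisting the message on the third call, whereas a writer that never recovers is called four times and the message lands in the dead letter list for manual handling.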

2. Read Strategy: Cache-Aside. When reading the cart, check Redis first; on cache miss, query MySQL and backfill Redis with a 7-day TTL.

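3. Cache Breakdown Protection: For hot users' carts, use a mutex lock to prevent concurrent cache misses from all hitting MySQL at once (a cache stampede). The interviewer asked: "How do you implement the mutex?" Use Redis SETNX with an expiry (for example SET with the NX and PX options), so that a crashed lock holder cannot block everyone else forever; the thread that acquires the lock queries MySQL and backfills, while other threads wait and retry Redis.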
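The cache-aside read path with a mutex on cache miss might look like the sketch below. FakeCache is an in-memory, single-process stand-in for Redis and is an assumption, as are the key names, the 5-second lock TTL, and the wait parameters; the 7-day cart TTL comes from the read strategy above. In real Redis, the "delete the lock only if I still own it" step would typically be done atomically in a Lua script.

```python
import time
import uuid

CART_TTL_SECONDS = 7 * 24 * 3600  # 7-day TTL from the read strategy


class FakeCache:
    """Single-process stand-in for Redis GET / SET NX with expiry."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds=None, nx=False):
        if nx and self.get(key) is not None:
            return False  # someone else already holds the key
        expires_at = None
        if ttl_seconds is not None:
            expires_at = time.monotonic() + ttl_seconds
        self._data[key] = (value, expires_at)
        return True

    def delete_if_equals(self, key, expected):
        if self.get(key) == expected:
            del self._data[key]
            return True
        return False


def get_cart(cache, load_from_db, user_id,
             lock_ttl=5.0, max_waits=50, wait_seconds=0.01):
    """Cache-aside read: only the lock holder queries the database."""
    key = f"cart:{user_id}"
    lock_key = f"lock:cart:{user_id}"
    for _ in range(max_waits):
        cart = cache.get(key)
        if cart is not None:
            return cart  # cache hit
        token = uuid.uuid4().hex
        if cache.set(lock_key, token, ttl_seconds=lock_ttl, nx=True):
            try:
                cart = load_from_db(user_id)
                cache.set(key, cart, ttl_seconds=CART_TTL_SECONDS)
                return cart
            finally:
                cache.delete_if_equals(lock_key, token)
        time.sleep(wait_seconds)  # another caller is backfilling; retry
    raise TimeoutError(f"cart {user_id} was not backfilled in time")
```

The random token ensures a caller releases only its own lock, and the lock TTL bounds how long others wait if the holder dies before backfilling.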

3. Core Workflow Design

1. Add to Cart Workflow:

User clicks add → Gateway authentication → Cart service → Check item validity (call product service) → Check inventory (call inventory service) → Write Redis (HSET) → Send MQ message → Return success. An MQ consumer then writes MySQL asynchronously, so the user does not wait for the database write.

The interviewer asked: "Won't checking item validity and inventory slow down the add-to-cart flow?" — It would, so I made these checks async pre-checks: don't check at add time, just write to the cart, then check asynchronously. If the item is invalid or out of stock, mark it as "invalid status" — the user sees a grayed-out prompt when opening the cart. This reduced the add-to-cart API RT from 200ms to 20ms.

The interviewer followed up: "Won't users complain about adding invalid items?" — We clearly indicate "This item is no longer available" or "Out of stock" on the cart page. Most users don't check out immediately after adding; by the time they open the cart, the async check is complete and the status is current.

2. Update Quantity Workflow:

User changes quantity → Cart service → Redis HINCRBY (atomic increment/decrement) → Send MQ message → Async update MySQL. Key point: HINCRBY is atomic — no distributed lock needed.

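The interviewer asked: "If a user rapidly clicks +1 multiple times, will the quantity become inconsistent?" No, because HINCRBY is atomic and each +1 increments correctly. However, MQ messages may arrive out of order, so MySQL updates use the absolute quantity value rather than deltas. Absolute values alone are not enough, though: each message should also carry a version or timestamp so that a late-arriving older message cannot overwrite a newer quantity.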
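One way to make the MySQL consumer tolerate out-of-order messages is sketched below. The version field on each message is an assumption that the interview answer did not spell out (it could be a per-cart counter or a timestamp), and the dict named rows merely stands in for the MySQL table.

```python
def apply_quantity_update(rows, message):
    """Apply an absolute-quantity message unless a newer one was already applied.

    rows maps (user_id, item_id, sku_id) to {"quantity", "version"} and
    stands in for the MySQL cart table. Returns True if the row changed.
    """
    key = (message["user_id"], message["item_id"], message["sku_id"])
    current = rows.get(key)
    if current is not None and current["version"] >= message["version"]:
        return False  # stale or duplicate message: ignore it
    rows[key] = {
        "quantity": message["quantity"],
        "version": message["version"],
    }
    return True
```

If the user clicks +1 twice and the messages (quantity 2, version 1) and (quantity 3, version 2) arrive in reverse order, the second arrival is discarded and the stored quantity stays at 3. With deltas instead of absolute values, a redelivered message would be counted twice.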

3. Merge Cart Workflow:

When a user is not logged in, cart data is stored locally (Cookie/LocalStorage). After login, it needs to be merged with the server-side cart. Merge strategy: compare each local item with the server; if the server already has it, take the larger quantity; if not, add it as new.

The interviewer asked: "How do you handle concurrent conflicts during merge?" — Use a distributed lock (Redis SETNX) on the user's cart. During merge, other add-to-cart requests queue and wait. Lock timeout is 5 seconds to prevent deadlocks.
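The merge rule itself (take the larger quantity for items present on both sides, add the rest as new) is small enough to sketch directly. Carts are represented here as plain dicts from an "item_id:sku_id" field to a quantity, which is a simplification. The 120-item cap comes from the requirements, but dropping local items that would exceed it is an assumption, since the interview did not cover that case. The function would run while holding the per-user lock described above.

```python
def merge_carts(local_cart, server_cart, max_items=120):
    """Merge an anonymous local cart into the logged-in server cart.

    Both carts map "item_id:sku_id" to a quantity. Items present in both
    keep the larger quantity; local-only items are added until the cart
    reaches max_items distinct lines.
    """
    merged = dict(server_cart)
    for field, quantity in local_cart.items():
        if field in merged:
            merged[field] = max(merged[field], quantity)
        elif len(merged) < max_items:
            merged[field] = quantity
        # Otherwise the cart is full and the local item is dropped
        # (assumed policy; a real system might surface this to the user).
    return merged
```

Taking the maximum rather than the sum avoids inflating quantities when the same local cart is merged twice, for example after a retried login request.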

4. High-Concurrency Optimization

1. Hot User Handling: During peak events, a small number of users' carts receive extremely high traffic (e.g., influencer-driven shopping). Solution: cache these users' cart data in the service's local (in-process) cache; read requests prefer the local cache, while writes update both the local cache and Redis.

2. Batch Operation Optimization: When a user selects/deselects all items, don't update Redis one by one — use Pipeline batch execution, reducing RT from N network round-trips to 1.

3. Rate Limiting & Degradation: During peak events with surging cart QPS, use token bucket rate limiting; over-limit requests return "System busy." Degradation strategy: disable non-core features (cart recommendations, bundle suggestions), keeping only core add/view/remove functionality.

4. Data Sharding: MySQL is sharded by user_id — 16 databases × 8 tables each = 128 tables. Redis Cluster assigns slots by user_id hash.
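As an illustration of the 16 × 8 layout, the sketch below maps a user_id to a database and table. Plain modulo routing on a numeric user_id, and the names cart_db_NN and cart_item_N, are assumptions; the interview only fixed the shard key and the counts.

```python
DB_COUNT = 16
TABLES_PER_DB = 8


def route(user_id):
    """Map a numeric user_id to a (database, table) pair.

    The 128 logical slots are laid out so that consecutive slots fill one
    database's 8 tables before moving on to the next database.
    """
    slot = user_id % (DB_COUNT * TABLES_PER_DB)
    db_index = slot // TABLES_PER_DB
    table_index = slot % TABLES_PER_DB
    return f"cart_db_{db_index:02d}", f"cart_item_{table_index}"
```

For example, route(0) yields ("cart_db_00", "cart_item_0") and route(127) yields ("cart_db_15", "cart_item_7"). Because every cart query carries the user_id, each request touches exactly one table and never needs a cross-shard join.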

5. Peak Event Special Handling

The interviewer specifically asked about Prime Day scenarios:

1. Cart-Inventory Integration: Adding to cart is just an "intent" — it doesn't lock inventory. Inventory is deducted at checkout. The interviewer asked: "Is the inventory shown in the cart real-time?" — No, it's near-real-time, synced from the inventory system every 5 minutes. If inventory is insufficient at checkout, the user sees "Item sold out."

2. Real-Time Price Updates: Cart item prices must reflect current promotions. Solution: the cart service subscribes to price change events via MQ and updates Redis prices on receipt. The interviewer asked: "What if MQ messages are delayed?" — When the user opens the cart, the frontend calls the pricing service for the latest price as a fallback.

3. Cart Pre-warming: 1 hour before Prime Day, pre-load active users' cart data from MySQL into Redis, avoiding a flood of cache misses that could overwhelm MySQL when the event starts.

6. Scaling Solutions

1. Multi-Device Sync: Use MQ to broadcast cart change events; each device refreshes its local cache on receipt. The interviewer asked: "What if a user modifies the cart on different devices simultaneously?" — Last Write Wins, because cart conflicts are extremely rare and the impact of conflicts is limited.

2. Cart Recommendations: Recommend related items based on cart contents. Solution: cart service calls the recommendation service, which uses feature vectors of cart items for similarity-based recommendations.

3. Cart Sharing: Generate a cart snapshot and share it with friends. Solution: serialize cart data to JSON, store in DynamoDB, and generate a share link.

Key Takeaways

1. The core of a cart system is the caching strategy. Interviewers will definitely drill into Redis-MySQL consistency. Have your write strategy (sync Redis + async MySQL) and reconciliation plan figured out.

2. Don't ignore peak event scenarios. Amazon interviewers love asking "What about Prime Day?" Prepare pre-warming, rate limiting, and degradation strategies in advance.

3. Explain your storage choices with reasoning. Why MySQL + Redis dual storage? Why Hash over String? Every choice needs solid justification.

4. Proactively discuss trade-offs. For example: "Cart data allows brief inconsistency because real-time validation happens at checkout." This kind of statement earns major points.

5. Prepare database schema design. Amazon interviewers may ask you to draw ER diagrams or write CREATE TABLE SQL. Have your cart table indexing and sharding strategies ready.

FAQ

Q: How does Amazon's system design interview differ from Google's?

A: Amazon is more engineering-focused, drilling into database schema design, distributed transactions, and cache consistency at a very granular level. Google is more architecture-focused, emphasizing your analytical framework and trade-off discussions.

Q: How much data does a shopping cart system handle?

A: 200M users, averaging 20 items each = ~4 billion records. Spread across the 128 sharded MySQL tables, that is ~31M records per table, which is within a reasonable range.
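The estimate in this answer can be checked with a few lines of arithmetic. The figures are the ones quoted in the interview (200M users, 20 items each, 16 databases × 8 tables); the variable names are illustrative.

```python
USERS = 200_000_000
AVG_ITEMS_PER_USER = 20
DATABASES = 16
TABLES_PER_DATABASE = 8

total_rows = USERS * AVG_ITEMS_PER_USER          # rows across all shards
total_tables = DATABASES * TABLES_PER_DATABASE   # number of physical tables
rows_per_table = total_rows // total_tables      # assumes an even spread

print(f"{total_rows:,} rows over {total_tables} tables "
      f"= {rows_per_table:,} rows per table")
```

This assumes user_ids distribute evenly across shards; a skewed distribution would leave some tables larger than the average.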

Q: What if I don't have e-commerce experience?

A: The cart system design approach is universal: storage selection → cache strategy → core workflows → high-concurrency optimization → scaling. You can analogize from other scenarios (favorites, to-do lists).

#System Design #Alibaba #Shopping Cart #Amazon #Caching Strategies #High Concurrency #MySQL #Redis