20 Most Frequent MySQL Interview Questions: Indexing, Transactions, and Optimization Complete Coverage
Covers 20 high-frequency MySQL interview questions on indexing, transactions, logs, and optimization, each with exam focus and answer direction to help you fully prepare for MySQL interviews.
Background
To be honest, I always thought I was decent at MySQL. After all, I'd been writing CRUD for over two years and added plenty of indexes. But when I faced big tech interviews, I realized how shallow my understanding was. The moment interviewers dug into underlying principles, I started stumbling. I then spent three full weeks going through all the high-frequency MySQL questions and compiled these 20 most frequently asked ones. I hope this helps you in your interview prep.
I. Indexing (6 Questions)
1. B+ Tree Principles? Why MySQL Uses B+ Trees Instead of B Trees?
Exam Focus: Understanding index data structures
A B+ tree is a variant of the B tree. Key differences: all data is stored in leaf nodes, non-leaf nodes only store index keys; leaf nodes are connected via a doubly linked list. Three reasons MySQL chose B+ trees: first, the tree height is lower with fewer IO operations since non-leaf nodes don't store data, allowing more index entries per page; second, range queries are efficient as the leaf node linked list enables sequential scanning; third, query performance is stable since every query must reach leaf nodes, unlike B trees where performance varies.
2. Clustered Index vs Non-Clustered Index?
Exam Focus: Index organization understanding
A clustered index is InnoDB's primary key index, where leaf nodes store entire row data. Data and index are together, so a table can only have one clustered index. Non-clustered index (secondary index) leaf nodes store primary key values. If the query needs columns not in the index, a table lookup is required — going back to the clustered index with the primary key. This is why covering indexes improve performance: they avoid table lookups entirely.
3. What is a Covering Index?
Exam Focus: Index optimization strategies
A covering index means all columns needed by the query are contained in the index, eliminating the need for table lookups. For example, with a composite index (a,b), executing SELECT a,b FROM t WHERE a=1 — the index already contains a and b, so data is returned directly from the index. Interviewers love this question because it's one of the most practical optimization techniques. I once optimized a slow query by adding a covering index, reducing query time from 800ms to 5ms.
4. What is the Leftmost Prefix Rule?
Exam Focus: Composite index usage rules
The matching rule for composite index (a,b,c) starts from the leftmost column and stops when encountering range queries (>, <, between, like prefix). So WHERE a=1 AND b=2 AND c>3 uses all three columns, but WHERE a=1 AND c=3 only uses column a — column b is skipped so c can't be used either. One easily overlooked point: index column order doesn't depend on the WHERE clause writing order; the optimizer adjusts automatically.
5. When Do Indexes Become Invalid?
Exam Focus: Index usage condition judgment
This is a must-ask question! Common invalidation scenarios: using functions or operations on indexed columns (WHERE YEAR(create_time)=2025); implicit type conversion (querying varchar column with int); LIKE starting with wildcard (LIKE '%abc'); OR condition where one column has no index; not satisfying leftmost prefix; NOT IN, NOT EXISTS in certain cases. When I interviewed at ByteDance, the interviewer asked me to list more than 5 scenarios. Good thing I was well-prepared.
6. How to Optimize Indexes?
Exam Focus: Practical optimization skills
Core optimization approach: prioritize covering indexes to reduce table lookups; composite indexes should follow leftmost prefix and place high-cardinality columns first; avoid operations on indexed columns; use EXPLAIN to analyze execution plans checking type and Extra fields; consider indexes for ORDER BY and GROUP BY too. Don't forget — more indexes isn't always better. Each index needs maintenance, and write performance degrades.
II. Transactions (5 Questions)
7. What is ACID?
Exam Focus: Transaction properties
Atomicity is implemented via undo log — transactions either fully succeed or fully rollback; Consistency is the goal, guaranteed collectively by the other three properties; Isolation is implemented via locks and MVCC; Durability is implemented via redo log. Interviewers often follow up: Can consistency be guaranteed by the database alone? No — application-level constraints are also needed, like ensuring account balances can't go negative in transfer scenarios.
8. Four Isolation Levels? MySQL Default?
Exam Focus: Transaction isolation mechanisms
Read Uncommitted (dirty reads), Read Committed (non-repeatable reads), Repeatable Read (phantom reads), Serializable. MySQL defaults to Repeatable Read (RR). Under RR, InnoDB solves most phantom read problems through MVCC + gap locks, but not all — snapshot reads won't have phantom reads, but current reads might. The "not all" detail is what interviewers love to dig into.
9. MVCC Principles?
Exam Focus: Concurrency control mechanisms
MVCC is implemented through hidden columns (trx_id, roll_pointer), undo log version chains, and ReadViews. Each row has two hidden columns: transaction ID and rollback pointer. ReadView contains the list of currently active transaction IDs, and visibility is determined by comparing each version's transaction ID in the version chain with the ReadView. RC level generates a new ReadView for each SELECT; RR level only generates one on the first SELECT. This mechanism truly helped me understand why RR solves non-repeatable reads.
10. InnoDB Lock Mechanism?
Exam Focus: Lock types and usage scenarios
By granularity: table locks, row locks, gap locks, next-key locks (next-key lock = row lock + gap lock). By mode: shared locks (S), exclusive locks (X), intention locks. InnoDB row locks are placed on indexes — if no index is used, row locks degrade to table locks. Next-key locks are InnoDB's key to preventing phantom reads at RR level, locking index records and the gaps before them.
11. How Do Deadlocks Occur? How to Avoid?
Exam Focus: Deadlock handling
Four necessary conditions for deadlock: mutual exclusion, hold and wait, no preemption, circular wait. InnoDB detects deadlocks and automatically rolls back the transaction with the lowest cost. Avoidance methods: access tables and rows in fixed order; split large transactions into smaller ones; lower isolation level; add proper indexes to prevent lock escalation. I once encountered a deadlock in production caused by two transactions updating the same set of rows in different orders. After unifying the update order, the problem was solved.
III. Logs (3 Questions)
12. What is redo log?
Exam Focus: Crash recovery mechanism
Redo log is InnoDB's physical log recording "what modification was made on which data page," used for crash recovery to ensure durability. It uses WAL (Write-Ahead Logging) strategy — write log first, then write to disk. Redo log uses fixed-size circular writing, composed of ib_logfile0 and ib_logfile1. Interviewers may follow up: Why write redo log first instead of directly writing data files? Because redo log is sequential writing, far more performant than random writes to data files.
13. What is undo log?
Exam Focus: Transaction rollback mechanism
Undo log records pre-modification data values, used for transaction rollback to ensure atomicity, and is also a core component of the MVCC version chain. INSERT produces insert undo log (can be deleted after transaction commit); UPDATE/DELETE produce update undo log (needs to be retained for other transactions' snapshot reads). This is why long transactions cause undo log bloat, which in turn affects performance.
14. Differences Between binlog and redo log?
Exam Focus: Log system understanding
Binlog is the Server layer's logical log recording all DDL and DML operations, mainly used for master-slave replication and data recovery; redo log is InnoDB's physical log recording page-level physical modifications, used for crash recovery. Key differences: binlog is append-only, redo log is circular; binlog has two formats (statement/row), redo log only has physical format; redo log is continuously written during transaction execution, binlog is written all at once at commit. Two-phase commit coordinates consistency between these two logs.
IV. Optimization (3 Questions)
15. How to Optimize Slow Queries?
Exam Focus: Performance tuning practice
Step one: enable slow query log to identify slow SQL; step two: use EXPLAIN to analyze execution plans, focusing on type (avoid ALL full table scan), key (whether index is hit), rows (scanned rows), Extra (Using filesort/Using temporary); step three: targeted optimization — add indexes, optimize SQL writing, avoid SELECT *, decompose large queries. I once had a pagination query that was very slow with large datasets — after optimizing with deferred join, it went from 3 seconds to 50 milliseconds.
16. How to Implement Database Sharding?
Exam Focus: Architecture design capability
Vertical sharding splits by business domain; horizontal sharding splits by rules (e.g., user ID modulo). Same for table sharding. Common middleware includes ShardingSphere and MyCat. Problems introduced: distributed transactions, cross-shard JOINs, global ID generation, data migration. Interviewers typically follow up: When should you shard? Generally when a single table exceeds 20 million rows or the data file exceeds 20GB. But sharding is a last resort — try index optimization, read-write splitting, and caching first.
17. How to Implement Read-Write Splitting?
Exam Focus: Architecture solution design
Based on master-slave replication — master handles writes, slaves handle reads. At the code level, dynamic data source switching can be implemented via Spring's AbstractRoutingDataSource or ShardingSphere. Issues to note: master-slave delay causing stale reads (critical business can force reads from master); reads and writes within a transaction must use the same data source. Delay is typically in milliseconds, but may be unacceptable for financial scenarios.
V. Others (3 Questions)
18. Master-Slave Replication Principles?
Exam Focus: High availability architecture understanding
Three threads: master's Binlog Dump thread (sends binlog), slave's IO thread (receives and writes to relay log), slave's SQL thread (executes relay log). Three synchronization modes: async replication (default, may lose data), semi-sync replication (at least one slave confirms receipt), fully sync replication (all slaves confirm, poor performance). Interviewers often ask: How to solve master-slave delay? Answer: parallel replication, semi-sync, force reads from master at business layer.
19. SQL Execution Flow?
Exam Focus: MySQL overall architecture understanding
Connector → Query Cache (removed in 8.0) → Analyzer (lexical/syntax analysis) → Optimizer (choose execution plan) → Executor (call storage engine interface) → Storage Engine (return data). Update statements also involve two-phase commit of redo log and binlog: first write redo log (prepare state) → write binlog → commit redo log (commit state). I drew this flow diagram more than ten times before I truly memorized it.
20. InnoDB vs MyISAM?
Exam Focus: Storage engine selection
InnoDB supports transactions, row locks, foreign keys, MVCC, crash recovery; MyISAM doesn't support transactions or row locks, only table locks, but SELECT performance can be better in some scenarios, and COUNT(*) doesn't require scanning. Nowadays InnoDB is almost universally used — MySQL 5.5+ defaults to InnoDB. The only scenario still using MyISAM might be read-only configuration tables.
Key Takeaways
For MySQL interview prep, my advice: first understand the overall architecture (how a SQL statement executes), then dive into each module (indexing, transactions, logs), and finally connect everything together. Just memorizing answers isn't enough — interviewers will keep pressing. Only by truly understanding the principles can you handle their questions. I recommend combining source code reading with hands-on practice, like using EXPLAIN to analyze various SQL execution plans and manually creating deadlock scenarios to observe behavior. These practical experiences are especially impressive in interviews.
FAQ
Q: Do I need to read MySQL source code for interviews?
Generally no, but understanding source implementations of key data structures (like Buffer Pool, B+ tree node structure) is a big plus.
Q: Any quick way to get started with index optimization?
First learn to read EXPLAIN output, understand type and Extra fields, then optimize slow queries one by one from the slow query log.
Q: How to prepare for transaction isolation level questions?
Focus on understanding MVCC implementation under RR level and how gap locks prevent phantom reads — this is the most frequently asked follow-up question.