Booking.com Java Engineer Interview: Middleware and Distributed Systems Full Assessment

Interview Experience

Author: BeautyResume Team

A complete account of a 3-round Booking.com Java engineer interview from a candidate with 4 years of experience. Covers Spring Cloud, Dubbo, RocketMQ, and distributed transactions, based on a 2026 interview.

Background

Let me share my background first — 4 years of Java development experience, previously at a travel company working on order and payment systems. I spent 4 years there, growing from junior to senior developer, but the company's tech stack was outdated — still using the Spring Cloud Netflix stack with many components no longer maintained. I wanted to find a platform with a better engineering culture. In May this year, a recruiter reached out saying Booking.com was hiring Java engineers for their middleware team, which aligned well with my distributed systems experience, so I scheduled an interview.

The interview process was 3 technical rounds + 1 HR round, and it took about 2 weeks from the first round to receiving the offer. Booking.com's interview efficiency was quite good — results came out 1-2 days after each round.

Round 1: Java Concurrency + JVM + MySQL

Java Concurrency Programming

The first-round interviewer was a developer on the middleware team. They started with a self-introduction, then went straight into technical questions. Java concurrency was a major focus area, with quite deep questioning.

Thread Pool: The interviewer asked "What are the core parameters of a thread pool and what does each do?" I listed the 7 core parameters: corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue, threadFactory, and handler. The interviewer followed up with "What's the workflow of a thread pool?" I said: First create core threads → when core threads are full, put tasks in the queue → when the queue is full, create non-core threads → when maximum threads are reached, execute the rejection policy. The interviewer then asked "What queue do you use in your project?" I said we use a bounded LinkedBlockingQueue with a capacity of 500, and the rejection policy is CallerRunsPolicy — having the submitting thread execute the task itself, which provides some rate-limiting effect.
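The configuration described above can be sketched roughly as follows. The queue capacity of 500 and CallerRunsPolicy come from the answer; the pool sizes, keep-alive time, class name, and thread-name prefix are illustrative assumptions, not the actual project values.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class OrderPoolFactory {
    // Pool sizes, keep-alive, and the thread name prefix are assumptions for illustration.
    static ThreadPoolExecutor newOrderPool() {
        AtomicInteger seq = new AtomicInteger();
        ThreadFactory factory = r -> new Thread(r, "order-worker-" + seq.incrementAndGet());
        return new ThreadPoolExecutor(
                8,                                          // corePoolSize
                16,                                         // maximumPoolSize
                60L,                                        // keepAliveTime
                TimeUnit.SECONDS,                           // unit
                new LinkedBlockingQueue<>(500),             // workQueue (bounded)
                factory,                                    // threadFactory
                new ThreadPoolExecutor.CallerRunsPolicy()); // handler
    }
}
```

With this setup, once all 16 threads are busy and the queue holds 500 tasks, the next submit() runs the task on the submitting thread, which slows the producer down instead of dropping work.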

Lock Mechanisms: The interviewer asked "What's the difference between synchronized and ReentrantLock?" I answered from several dimensions: implementation (synchronized is built into the JVM, ReentrantLock is implemented at the API level on top of AQS), features (ReentrantLock supports fair locking, interruptible and timed acquisition, and multiple condition variables), and performance (after the JDK 6 optimizations, the performance gap is minimal). The interviewer followed up with "Explain the lock upgrade process for synchronized." I said: No lock → Biased lock → Lightweight lock → Heavyweight lock, explaining the trigger conditions and applicable scenarios for each stage. (Worth noting: biased locking has been deprecated and disabled by default since JDK 15, so the biased stage applies to older JDKs.) The interviewer also asked about AQS principles. I said the core of AQS is a volatile int state plus a FIFO wait queue, a doubly-linked variant of the CLH queue. In exclusive mode, state 0 means unlocked and a positive value means locked (ReentrantLock uses it as the reentrant hold count); in shared mode, state represents the available resource count, for example Semaphore permits.
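To make the AQS description concrete, here is a minimal non-reentrant mutex built on AbstractQueuedSynchronizer, modeled on the pattern shown in the AQS Javadoc. The class name SimpleMutex is hypothetical, and this is a teaching sketch rather than how ReentrantLock itself is written.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

class SimpleMutex {
    // state == 0 means unlocked, state == 1 means locked (non-reentrant).
    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int ignored) {
            // CAS on the volatile state field decides who gets the lock.
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false; // acquire() will enqueue and park the caller
        }

        @Override
        protected boolean tryRelease(int ignored) {
            if (getState() == 0) {
                throw new IllegalMonitorStateException();
            }
            setExclusiveOwnerThread(null);
            setState(0); // release() then unparks the next queued thread
            return true;
        }

        boolean locked() {
            return getState() == 1;
        }
    }

    private final Sync sync = new Sync();

    void lock() { sync.acquire(1); }

    boolean tryLock() { return sync.tryAcquire(1); }

    void unlock() { sync.release(1); }

    boolean isLocked() { return sync.locked(); }
}
```

The subclass only decides whether the state transition succeeds; queuing, parking, and waking threads are all handled by AQS in acquire() and release().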

JVM Tuning

The interviewer asked "Have you done JVM tuning? How?" I said we had a payment service that frequently experienced Full GC. Through GC log analysis, we found the old generation was running out of space. Our solution: adjust heap size (from 2G to 4G), adjust young-to-old generation ratio (from the default 1:2 to 1:1.5, since most of our objects were short-lived), and switch from CMS to G1 collector. After tuning, Full GC frequency dropped from 3-4 times daily to 1-2 times weekly.

The interviewer followed up with "How do you troubleshoot OOM issues?" I said first examine the Heap Dump file using MAT or VisualVM to find the largest objects and reference chains. If it's a memory leak, look for objects that should have been collected but weren't; if it's an overflow, consider increasing heap size or optimizing object creation. The interviewer also asked about common JVM parameters — I mentioned -Xms, -Xmx, -Xmn, -XX:+UseG1GC, -XX:MaxGCPauseMillis, -XX:+HeapDumpOnOutOfMemoryError, etc.

MySQL Index Optimization

The interviewer asked "What's MySQL's index structure? Why B+ trees?" I said InnoDB uses B+ trees because: leaf nodes are connected via linked lists, making range queries efficient; non-leaf nodes only store keys, so each node holds more keys, making the tree shorter and reducing disk IO; and query performance is stable since every lookup reaches a leaf node. The interviewer followed up with "What's the leftmost prefix rule for composite indexes?" I said a composite index (a,b,c) is a single index that can serve lookups on the prefixes (a), (a,b), and (a,b,c), so query conditions must start matching from the leftmost column. The interviewer gave a practical scenario: "If the query condition is a=1 AND c=3, can the index be used?" I said the condition on a can use the index, but the condition on c cannot narrow the index range because column b is skipped; at best it is filtered within the index via index condition pushdown (MySQL 5.6+). If this query pattern is common and column c has high cardinality, consider adjusting the index column order.

Round 2: Spring Cloud + Dubbo + RocketMQ + Distributed Transactions

Spring Cloud Components

The second-round interviewer was the team's architect, with questions more focused on architecture and middleware. They first asked about Spring Cloud component usage. I said we previously used Spring Cloud Netflix (Eureka + Ribbon + Hystrix + Zuul) and were migrating to Spring Cloud Alibaba (Nacos + Sentinel) together with Spring Cloud Gateway. The interviewer asked "Why migrate?" I said most of the Netflix components are in maintenance mode and no longer actively developed, and Alibaba's ecosystem has better community support. Nacos serves as both configuration center and service registry, making it simpler than running Eureka plus Config Server.

Dubbo vs Spring Cloud

The interviewer asked "What's the difference between Dubbo and Spring Cloud? How do you choose?" I said Dubbo is an RPC framework while Spring Cloud is a full microservice suite — they're not directly comparable. Dubbo has better performance (TCP-based custom protocol), suitable for high-frequency internal service calls; Spring Cloud has a more complete ecosystem, suitable for quickly building microservice architectures. In practice, choose Dubbo for performance-critical, high-frequency service calls; choose Spring Cloud for development efficiency and ecosystem completeness. The interviewer followed up with "What are Dubbo's load balancing strategies?" I said Random (default), RoundRobin, LeastActive, and ConsistentHash.

RocketMQ Principles

The interviewer asked "Why choose RocketMQ over Kafka?" I gave several reasons: RocketMQ supports transaction messages, suitable for our order system's eventual consistency scenarios; RocketMQ's delayed message feature is heavily used (e.g., auto-canceling unpaid orders after 30 minutes); RocketMQ has higher message reliability and supports message tracing. The interviewer followed up with "How are RocketMQ transaction messages implemented?" I said the core flow is: send half message → execute local transaction → commit or rollback based on local transaction result → if no commit/rollback for a long time, the Broker checks back on the local transaction status. The interviewer then asked "How does RocketMQ ensure no message loss?" I said at three levels: Producer uses synchronous sending + retries, Broker uses synchronous flushing + master-slave replication, Consumer uses manual ACK.
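The half-message flow above can be illustrated with a toy in-memory model. This is not the RocketMQ client API: ToyBroker, TxState, and every method name here are hypothetical and exist only to show the state transitions (hidden half message, commit or rollback, check-back for unresolved transactions).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class HalfMessageSketch {
    enum TxState { COMMIT, ROLLBACK, UNKNOWN }

    /** Toy broker: half messages stay invisible to consumers until committed. */
    static final class ToyBroker {
        private final Map<String, String> halfMessages = new HashMap<>();
        final List<String> deliverable = new ArrayList<>();

        /** Step 1: store the half message; consumers cannot see it yet. */
        void sendHalf(String txId, String body) {
            halfMessages.put(txId, body);
        }

        /** Step 3: the producer reports the local transaction result. */
        void resolve(String txId, TxState state) {
            if (state == TxState.UNKNOWN) {
                return; // stays pending until a later check-back
            }
            String body = halfMessages.remove(txId);
            if (body != null && state == TxState.COMMIT) {
                deliverable.add(body); // now visible to consumers
            }
            // ROLLBACK: the half message is simply discarded
        }

        /** Step 4: ask the producer about every still-pending transaction. */
        void checkBack(Function<String, TxState> localTxChecker) {
            for (String txId : new ArrayList<>(halfMessages.keySet())) {
                resolve(txId, localTxChecker.apply(txId));
            }
        }

        int pending() {
            return halfMessages.size();
        }
    }
}
```

In the real system the check-back callback would look up the local transaction record (for example an order row) to decide between commit and rollback, which is why the local transaction needs a durable, queryable outcome.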

Distributed Transactions

The interviewer asked "What are the distributed transaction solutions? What are their pros and cons?" I detailed three mainstream approaches:

2PC (Two-Phase Commit): Strong consistency, but participants block synchronously and the coordinator is a single point of failure. Suitable for scenarios requiring strict consistency. Rarely used directly in practice; the XA protocol is the common implementation of 2PC.

TCC (Try-Confirm-Cancel): Two-phase commit at the business layer — each service must implement Try, Confirm, and Cancel interfaces. Pros: good performance and high flexibility. Cons: strong business invasiveness and high development cost. Our payment system uses TCC through the Seata framework.

Saga: A long-transaction pattern where each step has a corresponding compensation operation. Suitable for scenarios with long business processes and many participating services. Pros: good performance, no blocking. Cons: only eventual consistency, intermediate states are visible.
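As an illustration of the compensation idea behind Saga, here is a minimal orchestration sketch in plain Java. It does not use Seata or any framework; SagaSketch, Step, and run are hypothetical names, and a production implementation would also need persistence, retries, and idempotent compensations.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class SagaSketch {
    /** One saga step: a forward action plus the compensation that undoes it. */
    static final class Step {
        final String name;
        final Runnable action;
        final Runnable compensation;

        Step(String name, Runnable action, Runnable compensation) {
            this.name = name;
            this.action = action;
            this.compensation = compensation;
        }
    }

    /**
     * Runs the steps in order. If one fails, the already completed steps are
     * compensated in reverse order. Returns true only if every step succeeded.
     */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

Between a failed step and the end of compensation, other readers can observe the partially applied state, which is the "intermediate states are visible" drawback mentioned above.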

The interviewer followed up with "Which one do you actually use in your projects?" I said the payment system uses TCC, while the order system uses RocketMQ transaction messages for eventual consistency — the two systems have different consistency requirements, hence different solutions.

Round 3: Architecture Design + High Concurrency + Production Issues + Algorithms

Project Architecture Design

The third-round interviewer was the department's technical lead, asking more macro-level questions. They had me first describe the overall architecture of the order system I'd worked on, then followed up on several key design decisions. I said our order system uses a microservice architecture with core services including: Order Service, Inventory Service, Payment Service, and Notification Service. Services communicate via Dubbo RPC, with RocketMQ for async scenarios. The interviewer asked "What's the complete order creation flow?" I said: User places order → Verify inventory → Lock inventory → Create order → Initiate payment → Payment callback → Update order status → Release/deduct inventory → Send notification. Each step has failure handling and idempotency design.

High Concurrency Solutions

The interviewer asked "How would you design for a flash sale scenario with 100K QPS?" I outlined several layers of defense: Layer 1 — CDN + Nginx rate limiting to filter most invalid requests; Layer 2 — Redis pre-deduction of inventory, only allowing requests with sufficient stock to proceed; Layer 3 — Message queue for peak shaving, with order requests queued for async processing; Layer 4 — Database-level optimistic locking to prevent overselling. The interviewer followed up with "How do you ensure Redis pre-deduction stays consistent with the database?" I said use Lua scripts for atomicity, sync database inventory updates after successful orders, and replenish Redis inventory if ordering fails. We also have a scheduled reconciliation task to ensure eventual consistency.
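The pre-deduction step relies on an atomic "check stock, then decrement" operation, which the Lua script provides on the Redis side. The sketch below is an in-process stand-in using AtomicInteger, only to show the semantics; StockPreDeduct and its methods are hypothetical names, and a real deployment would run this logic inside Redis rather than in application memory.

```java
import java.util.concurrent.atomic.AtomicInteger;

class StockPreDeduct {
    private final AtomicInteger stock;

    StockPreDeduct(int initialStock) {
        this.stock = new AtomicInteger(initialStock);
    }

    /** Atomically checks and decrements; returns false when stock is insufficient. */
    boolean tryDeduct(int qty) {
        while (true) {
            int current = stock.get();
            if (current < qty) {
                return false; // reject early, request never reaches the queue
            }
            if (stock.compareAndSet(current, current - qty)) {
                return true;  // caller may now enqueue the order request
            }
            // CAS lost a race with another request: re-read and retry
        }
    }

    /** Puts stock back when the downstream order creation fails. */
    void replenish(int qty) {
        stock.addAndGet(qty);
    }

    int remaining() {
        return stock.get();
    }
}
```

The check and the decrement must happen as one atomic unit; doing a separate read followed by a write is what allows overselling under concurrency.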

Production Issue Troubleshooting

The interviewer asked "What's the most challenging production issue you've encountered? How did you troubleshoot it?" I said once our payment service's API suddenly slowed down — P99 spiked from 200ms to 5s. Troubleshooting process: First checked monitoring and found database queries were slow → checked slow query logs and found one SQL with abnormal execution time → used EXPLAIN and found index invalidation (a function wrapping the indexed column) → fixed the SQL and recovered. The root cause was someone committed code that applied a function transformation to an indexed column in the WHERE clause, causing a full table scan. The interviewer asked "How do you prevent this from happening again?" I said we added SQL review to our code review process and deployed a SQL audit platform that automatically detects slow queries and index usage.

Algorithm Questions

The algorithm section had two problems: TopK and Binary Search Tree validation.

TopK: The interviewer asked "Find the top 100 numbers from 100 million." I presented three approaches: min-heap (maintain a heap of size 100, O(nlogk) time complexity), quickselect (based on quicksort partition, average O(n)), and divide-and-conquer + merge (suitable for distributed scenarios). The interviewer asked me to write the min-heap solution — I finished in about 10 minutes, and they said it was fine.
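A min-heap solution along these lines might look like the following. This is a reconstruction of the general technique, not the exact code written in the interview; the class and method names are my own.

```java
import java.util.PriorityQueue;

class TopK {
    /** Returns the k largest values in ascending order, in O(n log k) time. */
    static int[] largest(int[] nums, int k) {
        if (k <= 0) {
            return new int[0];
        }
        // PriorityQueue is a min-heap by default: the root is the smallest
        // of the k largest values seen so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(k);
        for (int n : nums) {
            if (heap.size() < k) {
                heap.offer(n);
            } else if (n > heap.peek()) {
                heap.poll();   // evict the current smallest candidate
                heap.offer(n);
            }
        }
        int[] result = new int[heap.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = heap.poll();
        }
        return result;
    }
}
```

Because the heap never grows beyond k entries, memory stays O(k) even when the input is streamed from disk, which is the main reason to prefer it over sorting 100 million numbers.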

BST Validation: Given a binary tree, determine if it's a valid BST. I used the in-order traversal + predecessor node approach — an in-order traversal of a BST should yield strictly increasing values. The interviewer followed up with "What if the tree is very large and recursion causes stack overflow?" I said you can use Morris traversal with O(1) space complexity, or iterative in-order traversal.
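The iterative in-order variant mentioned in the follow-up can be sketched like this. It replaces the call stack with an explicit ArrayDeque, so deep trees do not overflow the thread stack; the Node class and names are assumptions for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BstValidator {
    static final class Node {
        final int val;
        final Node left;
        final Node right;

        Node(int val) {
            this(val, null, null);
        }

        Node(int val, Node left, Node right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }
    }

    /** A tree is a valid BST if its in-order sequence is strictly increasing. */
    static boolean isValid(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        Node cur = root;
        Integer prev = null; // value of the in-order predecessor, if any
        while (cur != null || !stack.isEmpty()) {
            while (cur != null) { // walk down the left spine
                stack.push(cur);
                cur = cur.left;
            }
            cur = stack.pop();
            if (prev != null && cur.val <= prev) {
                return false;     // not strictly increasing
            }
            prev = cur.val;
            cur = cur.right;
        }
        return true;
    }
}
```

This version uses O(h) extra space for the explicit stack, where h is the tree height; Morris traversal gets that down to O(1) at the cost of temporarily rewiring right pointers.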

HR Round: Career Plan + Compensation + Start Date

The HR round was fairly relaxed, mainly covering career planning, compensation expectations, and start date. For career planning, I said short-term I want to deepen my middleware expertise, and long-term I'd like to move toward an architect role. On compensation, I gave an expected range, and HR said Booking.com's structure is base + performance bonus + equity, with specifics depending on leveling. For start date, I said at least 1 month. HR also asked if I had other offers — I said I was interviewing elsewhere but Booking.com was my top choice.

Interview Questions Summary

Java Concurrency

1. Thread pool core parameters and workflow

2. Differences between synchronized and ReentrantLock

3. Synchronized lock upgrade process

4. AQS principles

JVM

5. JVM tuning experience

6. OOM troubleshooting methods

7. Common JVM parameters

MySQL

8. B+ tree index structure and advantages

9. Composite index leftmost prefix rule

Spring Cloud and Dubbo

10. Reasons for migrating from Spring Cloud Netflix to Alibaba

11. Dubbo vs Spring Cloud selection criteria

12. Dubbo load balancing strategies

RocketMQ

13. RocketMQ vs Kafka selection criteria

14. RocketMQ transaction message implementation

15. How RocketMQ ensures no message loss

Distributed Transactions

16. Differences and use cases for 2PC, TCC, and Saga

17. Seata framework usage

Architecture Design

18. High-concurrency design for flash sale systems

19. Redis pre-deduction and database consistency

20. Production issue troubleshooting experience

Algorithms

21. TopK problem (min-heap / quickselect / divide-and-conquer)

22. BST validation (in-order traversal / Morris traversal)

Key Takeaways and Advice

1. Java fundamentals must be rock-solid. Booking.com's interview demands a lot from Java fundamentals, especially concurrency and JVM. Simply memorizing textbook answers won't cut it — you need to explain concepts in the context of real projects. I recommend preparing a real-world case for each knowledge point.

2. Deeply understand middleware principles. For middleware like RocketMQ and Dubbo, knowing how to use them isn't enough — you need to explain underlying principles. Interviewers will probe to source-code-level details, like RocketMQ's half-message mechanism or Dubbo's service export and reference flow.

3. Distributed transactions are a high-frequency topic. You must be able to clearly explain all three approaches — 2PC, TCC, and Saga — ideally with real project examples of which one you chose, why, and what challenges you encountered.

4. System design should be layered. When answering high-concurrency questions, I recommend a layered approach from outside in: CDN/Gateway layer → Application layer → Cache layer → Message queue layer → Database layer. Be clear about what each layer does and how they work together.

5. Production troubleshooting needs methodology. Don't just state conclusions — walk through the troubleshooting process: Discover the issue → Narrow the scope → Analyze the cause → Verify the hypothesis → Fix the issue → Prevent recurrence. Interviewers care more about your troubleshooting approach.

FAQ

Q1: What's the focus of Booking.com's Java interview?

The focus is on Java concurrency, JVM, middleware (RocketMQ, Dubbo), and distributed transactions. System design and high-concurrency solutions may also come up, but not in every round.

Q2: Can I interview at Booking.com without middleware experience?

Middleware team positions have higher middleware experience requirements, but for business development roles, understanding the basic principles is sufficient. I recommend adjusting your preparation focus based on the position requirements.

Q3: How difficult are the algorithm questions?

Medium difficulty — not extremely hard. TopK, LRU, and binary tree problems are high-frequency — I recommend focusing on these classic problem types.

Q4: How is Booking.com's compensation?

Booking.com's compensation is mid-range among tech companies. With 4 years of experience, you'd likely be at a mid-senior level. Check levels.fyi for specific data.

Q5: How long does the interview process take?

For me, it was about 2 weeks from the first round to offer, with results 1-2 days after each round. Overall efficiency was quite high — faster than many companies.
