Elasticsearch Search Relevance Engineer Interview: Semantic Search, Vector Retrieval, and Ranking
A detailed review of the three-round Elasticsearch Search Relevance Engineer interview, from a candidate with 2 years of search algorithm experience. Covers inverted indexes, semantic search, vector retrieval (ANN), ranking models, and business understanding.
Background
Let me start with my situation: software engineering undergrad, master's in information retrieval, then 2 years as a search algorithm engineer at a search engine company, mainly working on relevance ranking and semantic search. While Elasticsearch is known for its search infrastructure, their search relevance team has strong technical capabilities, especially with unique challenges in enterprise search, so I really wanted to try.
I applied for the Search Relevance Engineer position at Elasticsearch, a remote role. The whole interview process took about two weeks: three technical rounds at a fast pace. Elasticsearch's interviews lean practical, with many questions built around real business scenarios rather than pure theory. Let me walk through the details.
Interview Process Review
Round 1: Information Retrieval + Inverted Index
My first interviewer was a very pragmatic engineer who asked direct questions right from the start.
First question: What's the principle and construction process of an inverted index? This was very familiar territory. I covered document tokenization, term dictionary construction, posting list organization, and skip lists for faster posting list intersection. The interviewer asked about inverted index compression methods, and I mentioned VByte encoding, Roaring Bitmaps, and SIMD-accelerated decoding.
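To make the construction steps concrete, here is a minimal sketch of building an inverted index and intersecting two posting lists. The whitespace tokenizer and the toy documents are assumptions for illustration; a real engine adds analyzers, compression, and skip data on top of this.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of doc IDs (its posting list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive lowercase + whitespace tokenizer (an assumption for this sketch).
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Intersect two sorted posting lists with a two-pointer walk (AND query)."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


docs = {
    1: "vector search with HNSW",
    2: "inverted index and BM25 search",
    3: "BM25 ranking for enterprise search",
}
index = build_inverted_index(docs)
print(intersect(index["bm25"], index["search"]))  # [2, 3]
```

Keeping posting lists sorted is what makes the linear-time intersection possible, and it is also the property that skip structures and delta-based compression rely on.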
Then retrieval models: What's the principle of BM25? How does it differ from TF-IDF? I said BM25 adds document length normalization and term-frequency saturation on top of TF-IDF. In the formula, k1 controls how quickly TF saturates and b controls the strength of length normalization. The interviewer asked how to tune BM25 parameters. I said k1 is typically between 1.2 and 2.0 and b is around 0.75, and both can be tuned by grid search on a validation set.
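The role of k1 and b is easier to see in code. The following is a small sketch of BM25 scoring over pre-tokenized documents. It assumes a commonly used non-negative IDF variant, log(1 + (N - n + 0.5) / (n + 0.5)); other formulations exist, and the toy documents are made up.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` (lists of tokens) against `query`."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = 1 - b + b * len(d) / avgdl  # length normalization, scaled by b
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


docs = [
    ["bm25", "a", "b", "c"],           # term appears once
    ["bm25", "bm25", "bm25", "bm25"],  # term appears four times, same length
    ["x", "y", "z", "w"],              # term absent
]
scores = bm25_scores(["bm25"], docs)
```

With equal document lengths, the second document scores higher than the first but far less than four times higher, which is the saturation effect that k1 controls.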
A deeper question: How do you evaluate search relevance? I covered offline evaluation (DCG/NDCG, MRR, MAP) and online evaluation (click-through rate, satisfaction, A/B testing). The interviewer asked about NDCG calculation — I detailed the gain, cumulative gain, discounted cumulative gain, and normalization process.
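The gain, discount, and normalization steps can be sketched as follows. This version assumes the exponential gain 2^rel - 1 and a log2(position + 1) discount, which is one common convention; a linear gain is also widely used.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(
        # rank is 0-based, so the discount is log2(position + 1) for 1-based positions.
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels in the order the system returned the documents.
print(ndcg([3, 2, 1, 0], k=4))  # ideal ordering
print(ndcg([0, 1, 3], k=3))     # best document ranked last
```

Normalizing by the ideal ordering is what makes scores comparable across queries with different numbers of relevant documents.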
A system design question: Design a near-real-time index update strategy for a search engine. I described a dual-index strategy: a main index (full) plus a real-time index (incremental). New documents are first written to the real-time index and periodically merged into the main index. The interviewer asked about merge strategies, and I covered segment merging and handling deletions by marking documents as deleted until a later merge removes them.
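As a rough illustration of the main-plus-real-time idea, here is a toy in-memory sketch. The class and method names are hypothetical, documents are reduced to term sets, and nothing here reflects how any particular engine implements segments.

```python
class NearRealTimeIndex:
    """Toy main + real-time index with deletion markers; doc IDs map to term sets."""

    def __init__(self):
        self.main = {}        # large segment, rebuilt only on merge
        self.realtime = {}    # small buffer that receives fresh writes
        self.deleted = set()  # markers for docs still physically present in `main`

    def add(self, doc_id, text):
        # An update is a delete of the old copy plus an insert of the new one.
        if doc_id in self.main:
            self.deleted.add(doc_id)
        self.realtime[doc_id] = set(text.lower().split())

    def delete(self, doc_id):
        self.realtime.pop(doc_id, None)
        if doc_id in self.main:
            self.deleted.add(doc_id)

    def search(self, term):
        term = term.lower()
        hits = {d for d, terms in self.realtime.items() if term in terms}
        hits |= {
            d for d, terms in self.main.items()
            if term in terms and d not in self.deleted
        }
        return sorted(hits)

    def merge(self):
        """Fold the real-time buffer into main and physically drop marked docs."""
        for doc_id in self.deleted:
            self.main.pop(doc_id, None)
        self.main.update(self.realtime)
        self.realtime.clear()
        self.deleted.clear()


idx = NearRealTimeIndex()
idx.add(1, "hello world")
idx.merge()
idx.add(1, "goodbye world")   # visible immediately, before the next merge
print(idx.search("goodbye"))  # [1]
print(idx.search("hello"))    # []
```

Searching both structures and filtering by deletion markers keeps writes cheap, at the cost of a periodic merge that reclaims space.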
Round 1 lasted about 50 minutes. The interviewer said "solid fundamentals, deep understanding of search" and told me to prepare for Round 2.
Round 2: Semantic Search + Vector Retrieval (ANN)
Round 2's interviewer was likely a technical lead on the search team, with questions leaning toward frontier topics.
Started with semantic search: What's the difference between Dense Retrieval and Sparse Retrieval? I said sparse retrieval (BM25, query likelihood) is based on lexical matching and struggles when the query and document use different words, which hurts long-tail queries in particular; dense retrieval (DPR, ColBERT) is based on semantic vectors and can handle vocabulary mismatch. The interviewer asked about DPR's training approach. I said it uses contrastive learning with in-batch negatives plus hard negatives, with the query and the passage going through separate encoders to produce vectors, then ranking by vector similarity (dot product in the original DPR setup).
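The in-batch negatives objective can be sketched without any deep learning framework. The snippet below assumes query and passage vectors have already been produced by the two encoders and uses dot-product similarity; the function name and the toy vectors are illustrative only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_batch_negatives_loss(query_vecs, passage_vecs):
    """Mean softmax cross-entropy where passage i is the positive for query i."""
    total = 0.0
    for i, q in enumerate(query_vecs):
        # Every other passage in the batch serves as a negative for this query.
        scores = [dot(q, p) for p in passage_vecs]
        max_s = max(scores)  # subtract the max for numerical stability
        log_norm = max_s + math.log(sum(math.exp(s - max_s) for s in scores))
        total += log_norm - scores[i]  # negative log-likelihood of the positive
    return total / len(query_vecs)


queries = [[5.0, 0.0], [0.0, 5.0]]
aligned = [[5.0, 0.0], [0.0, 5.0]]   # each query matches its own passage
swapped = [[0.0, 5.0], [5.0, 0.0]]   # each query matches the wrong passage
print(in_batch_negatives_loss(queries, aligned))
print(in_batch_negatives_loss(queries, swapped))
```

The loss is near zero when each query scores highest against its own passage and grows when the positives are outscored, which is what pushes the two encoders toward a shared embedding space during training.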
Then focused on vector retrieval: What ANN (Approximate Nearest Neighbor) retrieval methods exist? I detailed several categories: tree-based (Annoy, KD-Tree), hash-based (LSH), quantization-based (PQ, OPQ, IVF-PQ), and graph-based (HNSW, NSW). The interviewer asked about HNSW's principle — I said it constructs a multi-layer navigable small world graph, searching from the top layer downward with greedy search at each layer, balancing efficiency and accuracy. The interviewer also asked about HNSW vs IVF-PQ comparison — I said HNSW has higher recall but larger memory footprint, while IVF-PQ is memory-friendly but requires tuning the nprobe parameter.
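The greedy step that HNSW performs within a layer can be sketched on a single proximity graph. This is a deliberate simplification: it shows one layer only, with no candidate queue and no graph construction, and the graph and vectors are hand-built toy data.

```python
import math


def greedy_search(graph, vectors, query, entry_point):
    """Move to whichever neighbor is closest to the query until none improves."""
    current = entry_point
    current_dist = math.dist(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = math.dist(vectors[neighbor], query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:  # local minimum: no neighbor is closer
            return current, current_dist
        current, current_dist = best, best_dist


# Five points on a line, connected as a chain (toy adjacency lists).
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0), 4: (4.0, 0.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
node, dist = greedy_search(graph, vectors, query=(3.2, 0.0), entry_point=0)
print(node)  # 3
```

In the multi-layer version, the node found in a sparse upper layer becomes the entry point for the denser layer below, so most of the long-distance travel happens where there are few nodes to compare against.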
A system design question: Design a vector retrieval system for hundreds of millions of documents with P99 latency < 50ms. I mentioned several key designs: quantization (OPQ+PQ, compressing each 768-dimensional vector down to tens of bytes), sharding (by business line to reduce per-shard data volume), caching (hot query caching), and multi-stage retrieval (coarse filtering, then fine ranking). The interviewer asked about quantization's impact on recall. I said PQ compression typically reduces recall by 3-5%, which can be compensated for by reranking the top candidates.
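A quick back-of-the-envelope calculation shows why quantization matters at this scale. The numbers below are assumptions picked for illustration: 300 million vectors, float32 storage, and PQ with 64 subvectors at 8 bits each. Codebook and index overhead are ignored.

```python
def pq_footprint(num_vectors, dim, num_subvectors, bits_per_code=8):
    """Compare raw float32 storage with PQ code storage (codebooks excluded)."""
    assert dim % num_subvectors == 0, "dim must split evenly into subvectors"
    raw_bytes_per_vec = dim * 4  # 4 bytes per float32 component
    pq_bytes_per_vec = num_subvectors * bits_per_code / 8
    return {
        "raw_gib": num_vectors * raw_bytes_per_vec / 2**30,
        "pq_gib": num_vectors * pq_bytes_per_vec / 2**30,
        "compression_ratio": raw_bytes_per_vec / pq_bytes_per_vec,
    }


stats = pq_footprint(num_vectors=300_000_000, dim=768, num_subvectors=64)
print(stats["compression_ratio"])  # 48.0
print(round(stats["raw_gib"]), round(stats["pq_gib"]))
```

Under these assumptions each vector shrinks from 3072 bytes to 64 bytes, which is the difference between needing many machines just to hold the vectors and fitting the codes in the memory of a few shards.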
A newer direction: What impact do LLMs have on search? I mentioned several aspects: query understanding (intent recognition, query rewriting), generative retrieval (directly generating document IDs), RAG (Retrieval-Augmented Generation), and LLM-based reranking. The interviewer was interested in generative retrieval and asked about DSI and GENRE approaches.
Round 2 lasted about 60 minutes — a very rewarding conversation.
Round 3: Ranking Models + Project Deep Dive
Round 3 was with the search team lead. This round focused on ranking models and projects.
Started with ranking models: What stages does search ranking typically include? I said retrieval (multi-channel recall), pre-ranking (lightweight model for fast scoring), ranking (complex model for precise ordering), and re-ranking (business rules + diversity). The interviewer asked about model selection for pre-ranking vs ranking — I said pre-ranking typically uses dual-tower models (fast but shallow interaction), while ranking uses cross models (slower but deeper interaction).
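The funnel structure can be sketched as a list of (scorer, keep_n) stages applied in order. The scorers here are stand-ins (precomputed toy scores) rather than real models, and all names are hypothetical.

```python
def cascade(candidates, stages):
    """Run candidates through (scorer, keep_n) stages, cheapest scorer first."""
    for scorer, keep_n in stages:
        candidates = sorted(candidates, key=scorer, reverse=True)[:keep_n]
    return candidates


# Toy candidates with precomputed scores standing in for model outputs.
candidates = [
    {"id": "a", "cheap": 0.9, "expensive": 0.2},
    {"id": "b", "cheap": 0.8, "expensive": 0.9},
    {"id": "c", "cheap": 0.7, "expensive": 0.5},
    {"id": "d", "cheap": 0.1, "expensive": 1.0},
]
stages = [
    (lambda doc: doc["cheap"], 3),      # pre-ranking: fast model, keeps more
    (lambda doc: doc["expensive"], 2),  # ranking: slow model, keeps fewer
]
result = [doc["id"] for doc in cascade(candidates, stages)]
print(result)  # ['b', 'c']
```

Document "d" has the best expensive score but never reaches that stage, which is the main risk of a cascade: the final ranker can only reorder what the earlier, cheaper stages let through.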
Then ranking models: What are the differences between DeepFM, DCN, and DIN? I said DeepFM is a parallel structure of FM + DNN that automatically learns feature interactions; DCN uses Cross Network to explicitly model bounded-order feature interactions; DIN uses attention mechanisms for adaptive aggregation of user behavior sequences. The interviewer asked why feature interaction is important — I said manually constructing cross features is costly and prone to omissions, while automatic interaction can discover implicit combinations.
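To show what "explicit, bounded-order interaction" means in DCN, here is a sketch of a single cross layer in the original (vector-weight) formulation, x_{l+1} = x_0 * (w . x_l) + b + x_l. The weights and inputs are toy values; a real model learns w and b and runs a DNN alongside the cross network.

```python
def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x_{l+1} = x0 * (w . xl) + b + xl."""
    scalar = sum(wi * xi for wi, xi in zip(w, xl))  # w . xl collapses xl to a scalar
    return [x0i * scalar + bi + xli for x0i, bi, xli in zip(x0, b, xl)]


x0 = [1.0, 2.0]
w = [1.0, 1.0]
b = [0.0, 0.0]
x1 = cross_layer(x0, x0, w, b)  # contains second-order terms of x0
x2 = cross_layer(x0, x1, w, b)  # contains up to third-order terms of x0
print(x1)  # [4.0, 8.0]
print(x2)  # [16.0, 32.0]
```

Each layer multiplies by x0 once more, so the highest interaction order grows by one per layer and is capped by the network depth, while the "+ xl" term acts as a residual connection.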
Deep project dive: How do you handle cold start and long-tail queries in your search ranking project? I said for cold start, new items use content features (title, category, attributes) for embeddings, and new users use demographic features and short-term behavior; for long-tail queries, I use query rewriting (synonym expansion, spelling correction), semantic matching (dense retrieval as fallback), and multi-task learning (shared bottom representations). The interviewer asked how to do query rewriting — I said seq2seq models or LLMs can generate rewrite candidates, then filter using click logs.
A business question: How does enterprise search differ from general web search? I mentioned several differences: enterprise search has domain-specific vocabulary, access control requirements, structured data integration needs, and queries are often more specific but less well-formed. The interviewer appreciated this analysis.
Round 3 lasted about 55 minutes. The interviewer said "great analysis" at the end and told me to wait for the HR round.
Key Questions Summary
1. Principles and construction of inverted index?
2. BM25's principle? Differences from TF-IDF?
3. How to evaluate search relevance? NDCG calculation?
4. Design a near-real-time index update strategy?
5. Differences between dense and sparse retrieval?
6. DPR's training approach?
7. What ANN retrieval methods exist? HNSW's principle?
8. Design a vector retrieval system for hundreds of millions of documents with P99 latency < 50ms?
9. Impact of LLMs on search?
10. What stages does search ranking include?
11. Differences between DeepFM, DCN, and DIN?
12. How to handle cold start and long-tail queries?
13. How does enterprise search differ from general search?
Insights and Advice
1. Search fundamentals are core: Inverted index, BM25, NDCG — these are the cornerstones of search algorithms. They're guaranteed interview topics and must be second nature.
2. Vector retrieval is key: Semantic search and vector retrieval are hot topics in the search field. You need to master the principles and engineering implementation of HNSW, PQ, and similar methods.
3. System design must be practical: Elasticsearch's interviews lean practical. System design questions need to consider engineering metrics like latency, memory, and recall — not just algorithms.
4. Go deep on ranking models: DeepFM, DCN, DIN — don't just know the names. Understand their feature interaction mechanisms and applicable scenarios.
5. Understand business differences: Different companies' search systems have different technical challenges. Before interviewing, make sure you understand the target company's business characteristics.
FAQ
Q: What background is needed for search algorithm roles?
A: Information retrieval fundamentals + machine learning + engineering ability. Search combines algorithms and engineering — pure algorithms or pure engineering alone aren't enough.
Q: How to transition without search experience?
A: Start with classic IR textbooks (Manning's IR book), then build a few search-related projects for practice.
Q: What's Elasticsearch's tech stack?
A: Java backend, Python for algorithms, Elasticsearch for base retrieval, custom vector retrieval engine, TensorFlow/PyTorch for model training.
Q: How difficult is the interview?
A: Above average. Round 1 focuses on fundamentals, Round 2 on frontiers and system design, Round 3 on business understanding and project deep dives.
Q: What's the career outlook for search algorithms?
A: Very good. In the LLM era, search is undergoing a paradigm shift. RAG and generative retrieval are new directions with many opportunities.