NEC Face Recognition Algorithm Interview: Face Detection, Feature Extraction, and Liveness Detection
Two years of face recognition experience: a detailed review of my three-round technical interview for an NEC face recognition algorithm engineer role, covering face detection, feature extraction, liveness detection, large-scale retrieval, and more.
Background
I've been working in face recognition algorithms for quite a while. My master's research focused on face recognition, and after graduating, I spent two years at a security AI company, working across the entire pipeline from face detection to feature extraction to liveness detection. Honestly, while face recognition is a mature field, there are still many interesting engineering challenges in real-world deployment — face detection under extreme lighting, performance optimization for large-scale face retrieval, liveness detection against novel attack methods, and more.
NEC is one of the leading players in face recognition, with deep expertise in financial security and smart transportation. When I saw they were hiring face recognition algorithm engineers, I applied. During preparation, I focused on the latest advances in face detection (MTCNN, RetinaFace, SCRFD), face feature extraction (ArcFace, AdaFace), liveness detection (silent and interactive), and engineering optimization for large-scale face retrieval. Preparation took about a month.
About a week after applying, HR called and scheduled the first technical interview after a brief discussion. The entire process was three technical rounds plus one HR round. Here's my detailed review.
Interview Process Review
Round 1: CV Fundamentals + Face Detection (about 60 minutes)
The first-round interviewer was a veteran CV engineer. After a brief self-introduction, we jumped straight into technical questions. The pace was fast with high question density.
Question 1: What's the difference between face detection and general object detection? Why can't you just use YOLO for face detection?
I said the core differences are: face scale varies much more (from tens to hundreds of pixels), small targets are far more common, and faces overlap heavily in crowded scenes. Problems with using YOLO directly: its anchor design doesn't suit faces (faces are roughly 1:1, while general objects vary widely in aspect ratio); its small-target capability is insufficient, and its FPN-style feature fusion isn't tuned for faces; and NMS in crowded scenes tends to suppress true detections by mistake. Face detection therefore typically uses specially designed networks: RetinaFace uses multi-task learning (face detection + landmark localization), and SCRFD redistributes training samples toward small scales and searches for an efficient allocation of computation across backbone, neck, and head.
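To make the aspect-ratio point concrete, here is a minimal NumPy sketch of the square (1:1) anchors a face detector typically tiles over each pyramid level. The strides (8, 16, 32) and per-level sizes are assumptions modeled on common open-source RetinaFace configurations, not values from the interview.

```python
import numpy as np

def square_anchors(img_h, img_w, strides=(8, 16, 32),
                   sizes=((16, 32), (64, 128), (256, 512))):
    """Return (N, 4) anchors as [x1, y1, x2, y2], all with a 1:1 aspect ratio.

    strides/sizes are illustrative assumptions: one feature-map level per
    stride, with several square anchor sizes per level.
    """
    all_anchors = []
    for stride, level_sizes in zip(strides, sizes):
        fh, fw = int(np.ceil(img_h / stride)), int(np.ceil(img_w / stride))
        # Anchor centers sit at the center of each feature-map cell.
        ys, xs = np.meshgrid(np.arange(fh), np.arange(fw), indexing="ij")
        cx = (xs.ravel() + 0.5) * stride
        cy = (ys.ravel() + 0.5) * stride
        for s in level_sizes:
            half = s / 2.0
            all_anchors.append(np.stack([cx - half, cy - half,
                                         cx + half, cy + half], axis=1))
    return np.concatenate(all_anchors, axis=0)

anchors = square_anchors(640, 640)
print(anchors.shape)  # (16800, 4)
```

Small faces are covered by the dense stride-8 level, which is where most of the 16,800 anchors for a 640x640 input come from.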
Question 2: What's the design philosophy of RetinaFace? Why does it outperform MTCNN?
I said RetinaFace's core innovations are: first, using FPN for multi-scale feature fusion in a single stage, instead of MTCNN's image pyramid plus three-stage cascade; second, adding an extra-supervised branch that regresses five facial landmarks, where landmark supervision helps the features focus on face regions; third, adding a self-supervised dense regression branch that predicts 3D face shape and renders it back to 2D, giving an even richer supervision signal. Compared to MTCNN, RetinaFace's advantages are: single-stage detection avoids the cascade overhead, multi-task learning improves detection accuracy, and FPN feature fusion handles small faces better. The interviewer followed up on RetinaFace's loss function design. I said it is a weighted sum of four terms: a softmax face/non-face classification loss (with online hard example mining to handle the flood of negative anchors), Smooth L1 for box regression, Smooth L1 for the five landmarks, and a pixel-wise photometric loss for the dense 3D branch.
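For reference, the RetinaFace paper writes the per-anchor objective as a weighted sum of the four terms. Here $p_i$ is the predicted face probability, $t_i$ the box offsets, $l_i$ the five-landmark offsets, and $L_{pixel}$ the dense photometric term; $p_i^{*}$ is 1 for a positive anchor and 0 for a negative one, so only positives contribute to the regression terms:

```latex
\mathcal{L} = L_{cls}(p_i, p_i^{*})
  + \lambda_1 \, p_i^{*} \, L_{box}(t_i, t_i^{*})
  + \lambda_2 \, p_i^{*} \, L_{pts}(l_i, l_i^{*})
  + \lambda_3 \, p_i^{*} \, L_{pixel}
```

The paper sets the weights to $\lambda_1 = 0.25$, $\lambda_2 = 0.1$, $\lambda_3 = 0.01$; check the original before relying on these values.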
Question 3: What are the pros and cons of Anchor-Free vs. Anchor-Based face detection methods?
I said Anchor-Based methods have the advantage of prior knowledge guidance, more stable training, and better adaptability to scale variation; the downside is needing manual anchor design, many hyperparameters, and time-consuming NMS post-processing. Anchor-Free methods (like CenterFace) don't need anchor design, have simpler post-processing, and faster inference; but training is less stable, small target detection is worse, and feature alignment is less precise. In practice, use Anchor-Based for high-precision requirements and Anchor-Free for high-speed requirements, or a hybrid approach.
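A minimal sketch of why anchor-free post-processing can be simpler: CenterNet-style detectors (CenterFace follows this design) read face centers as local maxima of a predicted heatmap, which a 3x3 max filter finds without box-level NMS. The heatmap, `threshold`, and `top_k` below are hypothetical inputs; a real decoder would also read the size and offset maps at each peak.

```python
import numpy as np

def heatmap_peaks(heatmap, threshold=0.5, top_k=100):
    """Return (row, col, score) for 3x3 local maxima at or above threshold."""
    heatmap = np.asarray(heatmap, dtype=np.float64)
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # 3x3 max filter built from the nine shifted views of the padded map.
    neighborhood_max = np.max(
        np.stack([padded[dy:dy + h, dx:dx + w]
                  for dy in range(3) for dx in range(3)]),
        axis=0,
    )
    is_peak = (heatmap == neighborhood_max) & (heatmap >= threshold)
    rows, cols = np.nonzero(is_peak)
    scores = heatmap[rows, cols]
    order = np.argsort(-scores)[:top_k]
    return [(int(rows[i]), int(cols[i]), float(scores[i])) for i in order]

hm = np.zeros((8, 8))
hm[2, 3] = 0.9   # one face center
hm[2, 4] = 0.6   # neighbor of the same face, suppressed by the max filter
hm[6, 6] = 0.7   # a second face center
peaks = heatmap_peaks(hm)
print(peaks)  # [(2, 3, 0.9), (6, 6, 0.7)]
```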
Coding Problem: Implement a simple NMS (Non-Maximum Suppression) function that takes detection boxes and confidence scores as input and outputs filtered results.
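This is a classic problem. I implemented standard greedy NMS: sort the boxes by confidence, repeatedly take the highest-scoring remaining box, and remove every remaining box whose IoU with it exceeds the threshold. When the interviewer asked about time complexity, I said O(N^2) in the worst case, on top of O(N log N) for the initial sort. The interviewer then asked about acceleration, and I mentioned Fast-NMS (computing all pairwise IoUs as one matrix operation) and CUDA parallelization, plus Soft-NMS, which decays overlapping scores instead of deleting boxes and targets accuracy in crowded scenes rather than speed.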
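A minimal NumPy sketch of the greedy NMS described above, assuming boxes in `[x1, y1, x2, y2]` pixel format; the function name and defaults are illustrative, not the code from the interview.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes: (N, 4) [x1, y1, x2, y2]; returns kept indices."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    x1, y1, x2, y2 = boxes.T
    areas = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    order = np.argsort(-scores)  # O(N log N) sort by descending confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU of the current top box against every remaining box.
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter + 1e-12)
        # Drop boxes that overlap the kept box too much; each pass is O(N).
        order = rest[iou <= iou_threshold]
    return keep

boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.5))  # [0, 2]
```

The vectorized IoU keeps each pass linear, but the number of passes can still reach N, which is where the O(N^2) worst case comes from.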
Round 2: Feature Extraction + Liveness Detection (about 75 minutes)
The second-round interviewer was a senior researcher in face recognition. The questions were deeper and more cutting-edge.
Question 1: How is ArcFace's loss function designed? Why does it outperform SphereFace and CosFace?
I said ArcFace's core innovation is adding a margin directly in angular space, making same-class features more compact and different-class features more separated. Specifically, with normalized features and class weights, the target logit s·cos(θ) becomes s·cos(θ+m); since cos(θ+m) < cos(θ) for θ+m < π, the target class becomes harder to classify and the model must shrink θ to compensate. Compared to SphereFace (multiplicative angular margin, cos(mθ)) and CosFace (additive cosine margin, cos(θ) − m), ArcFace's advantages are: the additive angular margin corresponds to a constant geodesic distance on the hypersphere, so the margin is the same at every angle; and ArcFace doesn't need the softmax annealing that SphereFace relies on to converge, which makes training more stable. The interviewer asked about ArcFace's issues with long-tail distributions. I said ArcFace does perform worse on tail classes, whose few samples give poorly estimated class centers, and this can be mitigated with class-adaptive margins or re-sampling. AdaFace's adaptive margin, by comparison, adapts to image quality rather than class frequency, so it mainly helps with low-quality samples.
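A minimal NumPy sketch of how ArcFace turns cosine logits into margin logits, assuming the scale s = 64 and margin m = 0.5 used in the ArcFace paper. The fallback branch for θ + m > π mirrors a trick common in open-source implementations; it is an implementation detail, not part of the core formula.

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Logits s*cos(theta + m) for the target class and s*cos(theta) otherwise.

    embeddings: (B, D); weights: (C, D) class centers; labels: (B,) ints.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)  # (B, C) cosine to each class center
    rows = np.arange(len(labels))
    cos_t = cos[rows, labels]
    sin_t = np.sqrt(1.0 - cos_t ** 2)
    # cos(theta + m) = cos(theta)cos(m) - sin(theta)sin(m)
    phi = cos_t * np.cos(m) - sin_t * np.sin(m)
    # Past theta + m > pi, cos(theta + m) stops decreasing in theta, so fall
    # back to a linear penalty there (common open-source convention).
    th = np.cos(np.pi - m)
    mm = np.sin(np.pi - m) * m
    phi = np.where(cos_t > th, phi, cos_t - mm)
    logits = cos.copy()
    logits[rows, labels] = phi
    return s * logits  # feed into ordinary softmax cross-entropy
```

Only the target column changes; the non-target logits stay s·cos(θ), so the margin is applied purely as a penalty on the correct class.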
Question 2: How do you handle large pose (profile face) problems in face feature extraction?
I said large pose is a classic challenge in face recognition. The main approaches are: multi-view models, with separate models or branches for different poses; face frontalization, using GANs to generate frontal views from profile faces; pose-robust feature learning, with large-pose training samples and hard sample mining; and 3D face reconstruction from 2D images. The third approach (pose-robust feature learning) is the most common in industry, since it needs no extra generative model and is efficient for both training and inference.
Question 3: What liveness detection methods are there? How do you defend against Deepfake attacks?
I said liveness detection falls into two categories: interactive liveness (blinking, head turning, mouth opening) and silent liveness (no user cooperation needed). Interactive liveness has high accuracy but poor user experience; silent liveness has better experience but higher technical difficulty. Mainstream silent liveness approaches include: RGB-based methods (using CNNs to extract spoofing cues like screen borders and moiré patterns), depth-based methods (using stereo or structured light for depth information — real faces are 3D while photos are 2D), and infrared-based methods (real faces and fake faces differ significantly in infrared images). For Deepfake attacks, traditional screen reflection and border-based methods may fail, requiring more advanced detection: frequency domain analysis (Deepfake images have specific patterns in frequency domain), temporal consistency detection (inter-frame inconsistencies in videos), and biosignal-based methods (detecting whether heart rate and other physiological signals exist).
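As an illustration of the frequency-domain idea, here is a sketch of one simple feature of the kind such methods feed to a classifier: an azimuthally averaged log power spectrum, where generated or re-captured images often show unusual high-frequency energy. This is a swapped-in illustrative feature, not a complete Deepfake detector, and `n_bins` is an arbitrary choice.

```python
import numpy as np

def azimuthal_spectrum(gray, n_bins=32):
    """1D log-power spectrum averaged over rings of equal spatial frequency.

    gray: (H, W) float image. Returns n_bins values from low to high frequency.
    """
    f = np.fft.fftshift(np.fft.fft2(gray))  # move the DC term to the center
    power = np.log1p(np.abs(f) ** 2)
    h, w = gray.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2)
    # Normalize radius to [0, 1] and assign each frequency to a ring bin.
    bins = np.minimum((radius / radius.max() * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)
```

The last few bins summarize high-frequency content; a lightweight classifier (for example logistic regression) trained on real versus fake faces would consume the whole vector.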
Question 4: What's the overall architecture of a face recognition system? What's the complete pipeline from image input to final recognition result?
I said the complete pipeline is: face detection → face alignment (a similarity or affine transform from the detected landmarks to a canonical template) → feature extraction → feature matching. The interviewer followed up on feature matching. For 1:1 verification, compute the cosine similarity between the two features and compare it with a threshold; for 1:N identification, build an index over the gallery features and use Faiss for (approximate) nearest-neighbor search. The interviewer asked about Faiss index selection. I said that for small galleries (under about 1 million), an exact flat index (IndexFlatIP on L2-normalized features, which equals cosine similarity, or IndexFlatL2) or IVF works; for large galleries (over about 10 million), use IVF-PQ or HNSW.
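The alignment step can be sketched as a least-squares similarity transform (Umeyama's method) from five detected landmarks to a fixed template. The 112x112 template values below are the ones used by common open-source ArcFace pipelines and should be treated as an assumption, as should the function name.

```python
import numpy as np

# Five-point template (left eye, right eye, nose tip, left and right mouth
# corners) for a 112x112 crop; values assumed from common ArcFace pipelines.
TEMPLATE_112 = np.array([
    [38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
    [41.5493, 92.3655], [70.7299, 92.2041],
])

def similarity_transform(src, dst):
    """Least-squares similarity transform (Umeyama) mapping src -> dst.

    Returns a 2x3 matrix M such that dst ~= src @ M[:, :2].T + M[:, 2].
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s_c, d_c = src - mu_s, dst - mu_d
    cov = d_c.T @ s_c / len(src)  # cross-covariance of the centered points
    u, sig, vt = np.linalg.svd(cov)
    d = np.ones(2)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        d[-1] = -1  # avoid returning a reflection
    r = u @ np.diag(d) @ vt
    scale = (sig * d).sum() / s_c.var(axis=0).sum()
    t = mu_d - scale * r @ mu_s
    return np.hstack([scale * r, t[:, None]])
```

In practice the landmarks come from the detector, `M = similarity_transform(landmarks, TEMPLATE_112)`, and the crop is produced with an image-warping routine such as OpenCV's `cv2.warpAffine(img, M, (112, 112))`. A similarity transform (rotation, uniform scale, translation) is preferred over a full affine fit because it cannot shear the face.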
Coding Problem: Implement a simple cosine similarity calculation function and Top-K retrieval.
I implemented cosine similarity and heap-based Top-K retrieval. The interviewer asked about retrieval time complexity. I said brute-force search is O(N*D) for scoring, where N is the number of features in the database and D is the feature dimension, plus O(N log K) to maintain a size-K heap. When asked about acceleration, I mentioned Faiss's IVF index and PQ compression.
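A minimal sketch of the coding problem, assuming one query vector against an (N, D) gallery held in NumPy; the function names are illustrative.

```python
import heapq
import numpy as np

def cosine_similarity(query, gallery):
    """Cosine similarity between one query (D,) and a gallery (N, D)."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q  # (N,), O(N * D)

def top_k(query, gallery, k=5):
    """Return [(index, score), ...] for the k most similar gallery rows."""
    sims = cosine_similarity(query, gallery)
    # heapq.nlargest keeps a size-k heap: O(N log k) on top of the scoring.
    best = heapq.nlargest(k, range(len(sims)), key=sims.__getitem__)
    return [(i, float(sims[i])) for i in best]

gallery = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query = np.array([1.0, 0.1])
res = top_k(query, gallery, k=2)
print([i for i, _ in res])  # [0, 2]
```

In a real system the gallery would be normalized once at enrollment time, so each query costs a single matrix-vector product plus the heap.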
Round 3: Deep Project Dive + Large-Scale Face Retrieval (about 85 minutes)
The third round was with the department's technical lead, focusing more on project experience and systems-level thinking.
Question 1: What was the most challenging work you've done in face recognition projects?
I described a smart access control project: 1:N face retrieval had to finish within 1 second while the feature database grew from 100K to 5 million people. My optimizations: first, compressing features from 512 to 256 dimensions (starting from PCA, with quantization-aware training to preserve accuracy); second, raising the Faiss IVF nlist from 100 to 4096, so each inverted list holds far fewer vectors and each probed cluster (at a fixed nprobe) scans a much smaller slice of the database; third, GPU-accelerated feature matching, which cut a single retrieval from 50 ms to 8 ms. The hardest part was feature compression: plain PCA dimensionality reduction dropped accuracy by 2%, so I used knowledge distillation, with the 512-dim model as teacher guiding the 256-dim student, which brought the accuracy drop down to only 0.3%.
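To show what `nlist` and `nprobe` control, here is a toy inverted-file index in NumPy. It is an illustrative stand-in, not Faiss: it uses a few rounds of spherical k-means as the coarse quantizer and exact inner products inside each list. The class name and defaults are hypothetical.

```python
import numpy as np

class ToyIVF:
    """Minimal inverted-file index over L2-normalized vectors (inner product).

    nlist: number of coarse clusters; nprobe: clusters scanned per query.
    """

    def __init__(self, nlist, nprobe=1, iters=10, seed=0):
        self.nlist, self.nprobe, self.iters = nlist, nprobe, iters
        self.rng = np.random.default_rng(seed)

    def fit(self, x):
        self.x = x / np.linalg.norm(x, axis=1, keepdims=True)
        idx = self.rng.choice(len(self.x), self.nlist, replace=False)
        self.centroids = self.x[idx].copy()
        for _ in range(self.iters):  # spherical k-means on the unit sphere
            assign = np.argmax(self.x @ self.centroids.T, axis=1)
            for c in range(self.nlist):
                members = self.x[assign == c]
                if len(members):
                    mean = members.mean(axis=0)
                    self.centroids[c] = mean / np.linalg.norm(mean)
        assign = np.argmax(self.x @ self.centroids.T, axis=1)
        # Inverted lists: database ids grouped by their nearest centroid.
        self.lists = [np.nonzero(assign == c)[0] for c in range(self.nlist)]
        return self

    def search(self, q, k=5):
        q = q / np.linalg.norm(q)
        # Scan only the nprobe closest clusters; a larger nlist means smaller
        # lists, so each probe touches fewer database vectors.
        probe = np.argsort(-(self.centroids @ q))[: self.nprobe]
        cand = np.concatenate([self.lists[c] for c in probe])
        scores = self.x[cand] @ q
        order = np.argsort(-scores)[:k]
        return cand[order], scores[order]
```

With `nprobe == nlist` the search degenerates to exact brute force; in Faiss the corresponding knobs are the `nlist` argument of `IndexIVFFlat` and the index's `nprobe` attribute, which trade recall against the fraction of the database scanned.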
Question 2: How do you handle cross-domain face recognition (e.g., ID photos vs. live photos)?
I said the core challenge is domain shift — ID photos are taken in controlled environments while live photos have uncontrolled lighting, pose, and blur. Solutions include: domain adaptation using MMD or adversarial training to narrow feature distribution differences; style transfer to generate ID photo-style images from live photos; and mixed-domain training using both ID and live photo data with domain-conditional normalization. Our project used the third approach since we had sufficient data and didn't need additional generative models. The interviewer asked about handling very low-quality live photos (blurred, occluded) — I said we can add a quality assessment module to lower matching thresholds for low-quality images or request re-capture.
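MMD, mentioned above as one domain-adaptation option, measures the distance between two feature distributions; during training it can be added as a penalty between ID-photo and live-photo embeddings in a batch. A minimal NumPy sketch of the biased Gaussian-kernel estimate, where the bandwidth `sigma` is an assumption that would need tuning:

```python
import numpy as np

def gaussian_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x (n, d) and y (m, d)
    under a Gaussian RBF kernel with bandwidth sigma."""
    def kernel(a, b):
        # Pairwise squared distances, clipped at 0 against rounding error.
        sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2 * a @ b.T
        return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```

A training loop would compute this on the two domains' embeddings (in an autodiff framework rather than NumPy) and add it, with a weight, to the recognition loss, pulling the two feature distributions together.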
Question 3: How do you ensure the security of face recognition systems? What attack vectors exist?
I said face recognition systems face photo attacks, video replay attacks, 3D mask attacks, Deepfake attacks, and adversarial example attacks. Defense strategies should be layered: collection layer with liveness detection, transmission layer with encryption and tamper-proofing, matching layer with adversarial example detection, and system layer with multi-factor authentication. The interviewer asked about adversarial example defense — I said we can use adversarial training to enhance model robustness or use denoising and compression in input preprocessing to disrupt adversarial perturbations.
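Feature squeezing (bit-depth reduction plus median smoothing) is one simple, well-known instance of the preprocessing defense mentioned above; it is swapped in here as an illustration, and `bits=4`, `k=3` are arbitrary choices. Input transforms like this can be bypassed by attacks that anticipate them, so they complement adversarial training rather than replace it.

```python
import numpy as np

def reduce_bit_depth(img, bits=4):
    """Quantize an image in [0, 1] to 2**bits levels, erasing tiny perturbations."""
    levels = 2 ** bits - 1
    return np.round(img * levels) / levels

def median_smooth(img, k=3):
    """k x k median filter on an (H, W) image with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(k) for dx in range(k)])
    return np.median(windows, axis=0)

def squeeze(img, bits=4, k=3):
    """Preprocessing applied to the face crop before the recognition model."""
    return median_smooth(reduce_bit_depth(img, bits), k)
```

A detection variant compares the model's embedding of the raw input with that of the squeezed input and flags the request when they diverge sharply.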
Question 4: What do you know about NEC's technical direction? What unsolved problems remain in face recognition?
I said NEC has deep expertise in ID verification, financial security, and smart transportation, especially in cross-domain recognition and large-scale retrieval. I think several problems remain unsolved: recognition under extreme conditions (total darkness, heavy occlusion) is still far from solved; privacy protection and regulatory compliance, since GDPR and personal information protection laws increasingly restrict face data usage; and fairness, meaning accuracy differences across races, ages, and genders. The interviewer was very interested in fairness and said it is also a direction they're researching.
Interview Questions Summary
1. Face detection vs. general object detection differences
2. RetinaFace design philosophy and MTCNN comparison
3. Anchor-Free vs. Anchor-Based face detection comparison
4. NMS algorithm implementation and acceleration
5. ArcFace loss function design and comparison
6. Large pose face recognition approaches
7. Liveness detection methods and Deepfake defense
8. Face recognition system overall architecture
9. Cosine similarity and Top-K retrieval implementation
10. Large-scale face retrieval performance optimization
11. Cross-domain face recognition approaches
12. Face recognition system security and attack defense
Tips and Advice
NEC's face recognition interview is very professional. Interviewers have deep domain understanding and questions hit the key points. A few tips:
1. Know the complete face recognition pipeline: From detection to alignment to feature extraction to matching, understand the principles and mainstream methods for each stage. Don't focus only on feature extraction while neglecting detection and liveness detection.
2. Read ArcFace series papers thoroughly: From SphereFace to CosFace to ArcFace to AdaFace, understand the evolution, especially loss function design motivation and mathematical derivation.
3. Engineering optimization experience is a big plus: NEC is a deployment-focused company that values engineering capability. Being able to discuss large-scale retrieval optimization experience and model compression practice details is very advantageous.
4. Pay attention to security and privacy: Ethical and security issues in face recognition are increasingly important and may come up in interviews. I recommend understanding GDPR, personal information protection laws, and privacy-preserving technologies like federated learning and differential privacy.
FAQ
Q: What does a NEC face recognition algorithm engineer do day-to-day?
A: Mainly responsible for R&D and optimization of face detection, feature extraction, liveness detection, and other algorithms, covering model training, accuracy tuning, performance optimization, and production deployment. Requires both algorithm research and engineering deployment skills.
Q: Are publications required?
A: Top conference papers aren't required, but deep understanding of mainstream methods is expected. Relevant publications are a plus but not mandatory.
Q: Can I apply without face recognition experience?
A: If you have general CV deep learning experience, transitioning to face recognition is feasible. Face detection and feature extraction fundamentals overlap significantly with general object detection and image classification.
Q: What's NEC's tech stack?
A: Training uses PyTorch; detection models are RetinaFace/SCRFD; feature extraction uses the ArcFace family; deployment uses C++ and TensorRT; retrieval uses Faiss.
Q: How long until results come out?
A: For me, Round 2 was scheduled 3 days after Round 1, Round 3 was 5 days after Round 2, and results came out one week after Round 3. The entire process took about 2.5 weeks.