15 Must-Know Autonomous Driving Interview Questions: Perception, Planning, and Control Full-Chain Coverage

Author: BeautyResume Team

15 high-frequency autonomous driving interview questions: 3D detection/BEV perception/multi-sensor fusion, behavior prediction/motion planning/Lattice Planner, MPC/PID control, HD maps/simulation testing, with assessment points and answer directions

Background

I previously worked in traditional computer vision and transitioned to autonomous driving last year. I interviewed at about five or six companies working on autonomous driving, including EV startups, traditional automakers' autonomous driving divisions, and a few L4 startups. Honestly, autonomous driving interviews cover a very broad range—from perception to planning to control, each module could have its own set of interview questions. I've compiled 15 high-frequency must-know questions that basically cover the knowledge points interviewers love to test, hoping to help anyone currently preparing.

Interview Process Review

The interview process for autonomous driving positions typically follows: resume screening → first technical round (fundamentals + perception) → second technical round (planning + control) → third technical round (system design or deep project dive) → HR round. A very obvious characteristic is that interviewers will ask targeted questions based on your background. For example, if your resume mentions 3D object detection, they'll go from BEV perception all the way to multi-sensor fusion; if you wrote about planning, they'll go from behavior prediction to motion planning to spatiotemporal joint optimization. So whatever you put on your resume must withstand deep probing. A friend of mine was asked in a second-round interview, "How much does your 3D detection performance degrade at night, and how did you address it?" He hadn't prepared for that and froze on the spot.

Question Collection

1. Perception (5 Questions)

1. What are the mainstream methods for 3D object detection? What's the difference between PointPillars and CenterPoint?

Assessment point: Understanding the technical roadmap and representative methods of 3D detection.

Answer direction: 3D detection is mainly divided into point cloud methods and multi-modal methods. Point cloud methods are further divided into voxel- or pillar-based detectors (VoxelNet, SECOND, PointPillars) and point-based detectors (PointRCNN and other detectors built on PointNet++ style backbones). PointPillars divides the point cloud into vertical pillars, encodes the points in each pillar with a simplified PointNet, scatters the pillar features into a BEV pseudo-image, and then applies a 2D CNN backbone and detection head. Its advantage is speed, which makes it suitable for real-time deployment. CenterPoint is not a point-based method: it keeps a voxel or pillar backbone to extract BEV features and replaces the anchor-based head with a center head that predicts object centers as heatmap keypoints and regresses the other attributes (size, height, orientation, velocity) at each center. Because it is anchor-free, centers can be read off the heatmap with a local-maximum test, so it depends much less on box NMS, although some benchmark configurations still apply a light NMS step. Accuracy is generally higher, at somewhat higher cost. In practice, PointPillars is widely deployed for its speed, while CenterPoint performs better on public benchmarks.
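If you are asked to explain how a center head turns a heatmap into detections, a small sketch helps. The code below is a minimal illustration and not CenterPoint's actual implementation: the function name `extract_peaks`, the 0.3 threshold, and the toy heatmap are all assumptions made up for this example. It keeps a cell only if it beats all of its neighbours in a 3x3 window.

```python
def extract_peaks(heatmap, threshold=0.3):
    """Return (row, col, score) for cells that beat all 8 neighbours."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue  # too weak to be an object centre
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score > n for n in neighbours):
                peaks.append((r, c, score))
    # Strongest centres first.
    return sorted(peaks, key=lambda p: p[2], reverse=True)


# Toy BEV heatmap for one class (values are invented).
heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.4, 0.6],
    [0.0, 0.0, 0.0, 0.5, 0.8],
]
peaks = extract_peaks(heatmap)
```

In a real detector the same test is typically done with a max-pooling layer on the GPU, and box attributes are then gathered from the regression maps at each peak location.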

2. What is BEV perception? Why is it the current mainstream approach?

Assessment point: Understanding the motivation and advantages of Bird's Eye View perception.

Answer direction: BEV perception converts multi-camera images into a unified bird's eye view space for perception. Reasons it became mainstream: unified representation (features from all cameras are aligned in one BEV grid, so per-camera detections do not have to be stitched together afterwards); a good fit for downstream tasks (prediction and planning already reason in a top-down ego frame, so perception results are directly usable); convenient temporal fusion (features from different time steps can be aligned in BEV space after ego-motion compensation). Mainstream approaches are divided into LSS-based (Lift-Splat-Shoot, explicitly predicting a depth distribution per pixel to lift image features into 3D) and Transformer-based (BEVFormer, where BEV queries gather image features through cross-attention). BEVFormer tends to be more accurate but more expensive to compute, while the LSS family is usually lighter. Tesla's Occupancy Network can be seen as an extension of BEV perception from a 2D grid to a 3D voxel grid.
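To make the "lift" and "splat" steps of the LSS idea concrete, here is a heavily simplified sketch. It is not the LSS codebase: it assumes a flat 2D world where each pixel is described only by a horizontal bearing angle, and `lift_splat`, `depth_bins`, and `cell_size` are hypothetical names chosen for this example. Each pixel's feature is spread over depth bins according to a softmax depth distribution, then sum-pooled into the BEV cell each depth hypothesis lands in.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lift_splat(pixels, depth_bins, cell_size=1.0):
    """pixels: list of (bearing_rad, feature_vector, depth_logits).

    Returns a dict mapping BEV cell (ix, iy) to a pooled feature vector.
    """
    bev = {}
    for bearing, feature, depth_logits in pixels:
        probs = softmax(depth_logits)  # lift: depth distribution per pixel
        for depth, p in zip(depth_bins, probs):
            x = depth * math.cos(bearing)
            y = depth * math.sin(bearing)
            cell = (math.floor(x / cell_size), math.floor(y / cell_size))
            acc = bev.setdefault(cell, [0.0] * len(feature))
            for i, f in enumerate(feature):
                acc[i] += p * f  # splat: sum-pool weighted features
    return bev


# Two pixels looking straight ahead, each unsure between 2.5 m and 7.5 m.
depth_bins = [2.5, 7.5]
pixels = [
    (0.0, [1.0, 2.0], [0.0, 0.0]),
    (0.0, [3.0, 0.0], [0.0, 0.0]),
]
bev = lift_splat(pixels, depth_bins, cell_size=5.0)
```

Because the depth weights sum to one, the total feature mass in the BEV grid equals the sum of the input features; an uncertain depth simply smears a pixel's feature along its ray.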

3. What are the methods for multi-sensor fusion? What's the difference between early and late fusion?

Assessment point: Understanding sensor fusion strategies and choices.

Answer direction: Fusion methods are usually classified by the level at which fusion happens. Early fusion (data-level): raw data is fused directly at the input; it preserves the most information but has extremely high calibration and time synchronization requirements. Feature-level fusion: each sensor extracts features that are then fused at intermediate layers; this is the current mainstream approach and trend, for example BEVFusion fusing LiDAR point cloud features and camera image features in BEV space. Late fusion (object-level): each sensor detects independently and the results are fused at the output; it is the simplest to implement, but it loses complementary information because each detector has already discarded whatever it could not resolve on its own.
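Late fusion is the easiest of the three levels to show in a few lines. The sketch below is one possible object-level scheme, not a standard algorithm: the 2 m gate, the nearest-neighbour matching, and the "noisy-OR" style score combination are all assumptions for illustration, and `late_fuse` is a made-up name.

```python
import math


def late_fuse(lidar_dets, camera_dets, gate=2.0):
    """Each detection is a dict with x, y (metres, ego frame) and score."""
    fused, used = [], set()
    for ld in lidar_dets:
        best_j, best_d = None, gate
        for j, cd in enumerate(camera_dets):
            if j in used:
                continue
            d = math.hypot(ld["x"] - cd["x"], ld["y"] - cd["y"])
            if d < best_d:
                best_j, best_d = j, d
        if best_j is None:
            fused.append({**ld, "sources": ("lidar",)})
            continue
        used.add(best_j)
        cd = camera_dets[best_j]
        # Assumed rule: two agreeing sensors raise confidence.
        score = 1.0 - (1.0 - ld["score"]) * (1.0 - cd["score"])
        # Keep the LiDAR position, which is usually the better range estimate.
        fused.append({"x": ld["x"], "y": ld["y"], "score": score,
                      "sources": ("lidar", "camera")})
    for j, cd in enumerate(camera_dets):
        if j not in used:
            fused.append({**cd, "sources": ("camera",)})
    return fused


lidar = [{"x": 10.0, "y": 0.0, "score": 0.6}]
camera = [{"x": 10.5, "y": 0.2, "score": 0.5},
          {"x": 30.0, "y": -3.0, "score": 0.7}]
fused = late_fuse(lidar, camera)
```

Note what the sketch cannot do: if neither sensor detects an object on its own, there is nothing left to fuse, which is exactly the information loss described above.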

4. How is lane line detection done? What are the challenges?

Assessment point: Understanding methods and difficulties of lane line detection.

Answer direction: Lane line detection methods: segmentation methods (segmenting lane line pixels, e.g., SCNN); parametric regression methods (directly regressing lane line parameters, e.g., PolyLaneNet); anchor-based methods (e.g., Line-CNN and LaneATT using line anchors, UFLD using row anchors, and CLRNet, which refines line anchors across feature levels); Transformer methods (e.g., LSTR, which predicts lane shape parameters with a Transformer). Main challenges: occlusion (front vehicle occlusion, curve occlusion); wear (old lane lines unclear); complex topology (merges/splits, intersections); lighting changes (backlight, nighttime, rain/snow). The industry currently tends to prefer parametric representations (e.g., cubic polynomials or Bézier curves) to model lane lines, since they are compact, easy for downstream modules to consume, and often more robust to partial occlusion than raw pixel-level segmentation.
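As a small illustration of the parametric representation mentioned above, a lane line can be stored as four control points of a cubic Bézier curve and sampled on demand. This is a generic sketch using De Casteljau's algorithm; the control point values are invented and `sample_lane` is a hypothetical helper name.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] with De Casteljau's algorithm."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        # Repeated linear interpolation between neighbouring points.
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


def sample_lane(control_points, n=5):
    """Return n evenly spaced (in parameter t) points along the lane line."""
    return [bezier_point(control_points, i / (n - 1)) for i in range(n)]


# Four control points in the ego frame (x forward, y left), in metres.
lane = [(0.0, 0.0), (10.0, 0.0), (20.0, 2.0), (30.0, 4.0)]
samples = sample_lane(lane, n=5)
```

Eight numbers describe the whole curve, and the curve stays smooth even when part of the painted line is hidden behind a vehicle.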

5. What are the mainstream methods for Multi-Object Tracking (MOT)?

Assessment point: Understanding the technical roadmap for object tracking.

Answer direction: MOT is divided into Detection-Based Tracking (DBT, also called tracking-by-detection) and Joint Detection and Tracking (JDT). DBT detects first and then associates detections across frames, with representative methods including SORT and DeepSORT (a Kalman filter for motion prediction plus the Hungarian algorithm for data association). JDT does detection and tracking end-to-end, with representative methods including TrackFormer and MOTR (both built on a DETR-style architecture). In autonomous driving, DBT remains mainstream because it is more controllable in engineering. The core of the association step is matching on motion cues and appearance cues. DeepSORT adds ReID features for appearance matching; ByteTrack associates in two passes, first matching high-score detections and then using low-score detections to recover the remaining tracks. 3D MOT also needs to consider 3D position and velocity information for association.
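The association step in tracking-by-detection can be sketched briefly. To keep the example short and dependency-free, it uses greedy IoU matching rather than the Hungarian algorithm that SORT uses, so treat it as a simplification; the function names and the 0.3 IoU threshold are assumptions for this example.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def associate(predicted_tracks, detections, min_iou=0.3):
    """Greedy matching: take the best remaining IoU pair until none qualify."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(predicted_tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap even less
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_tracks = [i for i in range(len(predicted_tracks)) if i not in used_t]
    unmatched_dets = [i for i in range(len(detections)) if i not in used_d]
    return matches, unmatched_tracks, unmatched_dets


# Track boxes here stand for Kalman-predicted positions in the current frame.
tracks = [(0.0, 0.0, 2.0, 2.0), (10.0, 10.0, 12.0, 12.0)]
dets = [(10.5, 10.0, 12.5, 12.0), (0.0, 0.5, 2.0, 2.5), (20.0, 20.0, 22.0, 22.0)]
matches, lost_tracks, new_dets = associate(tracks, dets)
```

Unmatched detections would normally start new tracks, and unmatched tracks are kept alive for a few frames before being dropped.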

2. Planning (5 Questions)

6. How is behavior prediction done? What uncertainties exist?

Assessment point: Understanding methods for trajectory prediction and uncertainty modeling.

Answer direction: Behavior prediction is divided into deterministic prediction and probabilistic prediction. Deterministic prediction outputs one trajectory; probabilistic prediction outputs multiple possible trajectories (multimodal). Mainstream methods: encode-interact-decode framework—first use vector encoders to extract road and agent features, then use interaction modules to model inter-agent relationships, and finally use decoders to generate trajectories. Representative methods: MultiPath (anchors + offsets), CoverNet (covering sets), MotionFormer (Transformer architecture). Sources of uncertainty: epistemic uncertainty (model's own uncertainty, estimable through ensembling) and aleatoric uncertainty (inherent randomness of the environment, e.g., pedestrian intent is unpredictable). The industry commonly uses multimodal prediction (outputting top-K trajectories with probabilities) to handle uncertainty.
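The "anchors + offsets" idea and the top-K output format can be sketched as follows. This is only an illustration of the decoding step, with invented anchors, offsets, and logits standing in for network outputs; `decode_modes` is a hypothetical name and not an API from any of the papers above.

```python
import math


def decode_modes(anchors, offsets, logits, k=2):
    """Combine anchor trajectories with predicted offsets and keep the top-k.

    anchors, offsets: one list of (x, y) waypoints per mode.
    logits: one unnormalised score per mode.
    """
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    modes = []
    for anchor, offset, p in zip(anchors, offsets, probs):
        traj = [(ax + dx, ay + dy) for (ax, ay), (dx, dy) in zip(anchor, offset)]
        modes.append((p, traj))
    modes.sort(key=lambda mode: mode[0], reverse=True)
    return modes[:k]


# Three anchors: go straight, turn left, turn right (two waypoints each).
anchors = [
    [(5.0, 0.0), (10.0, 0.0)],
    [(5.0, 1.0), (9.0, 4.0)],
    [(5.0, -1.0), (9.0, -4.0)],
]
offsets = [
    [(0.2, 0.1), (0.4, 0.2)],
    [(0.0, 0.0), (0.0, 0.0)],
    [(0.0, 0.0), (0.0, 0.0)],
]
logits = [2.0, 0.5, -1.0]
top_modes = decode_modes(anchors, offsets, logits, k=2)
```

The planner then receives a short list of (probability, trajectory) pairs instead of a single guess, which is how the multimodal output handles uncertainty about intent.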

7. What are the methods for motion planning? How to handle dynamic obstacles?

Assessment point: Understanding core algorithms for motion planning and dynamic scene handling.

Answer direction: Motion planning is commonly divided into search methods (A*, Hybrid A*), sampling methods (RRT, Lattice Planner), and optimization methods (quadratic programming, nonlinear optimization). Search methods explore a discretized configuration space and suit scenarios with complex constraints; optimization methods iteratively improve an initial solution, which is fast but makes the result dependent on that initial solution. Handling dynamic obstacles: spatiotemporal planning (planning in x-y-t 3D space, considering both space and time); prediction + planning (first predict dynamic obstacle trajectories, then plan against the predictions); interactive planning (considering the impact of ego vehicle behavior on other vehicles, for example with game-theoretic planning). The industry commonly uses a hierarchical architecture: behavior planning (decision) → motion planning (trajectory generation) → control (tracking).
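Interviewers sometimes ask you to hand-write A*. Below is a minimal grid version under simple assumptions: a 4-connected occupancy grid, unit step cost, and a Manhattan-distance heuristic. Hybrid A* additionally tracks heading and uses vehicle-feasible motion primitives, which this sketch does not attempt.

```python
import heapq


def astar(grid, start, goal):
    """Shortest path on a 4-connected grid; grid[r][c] == 1 means blocked."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from = {}
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if g > best_g.get(cell, float("inf")):
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None  # goal unreachable


grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
```

The only gap in the wall is at column 2, so the path has to detour through it before coming back to the goal.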

8. What is the principle of Lattice Planner?

Assessment point: Understanding the mechanism of Lattice Planner.

Answer direction: The core idea of Lattice Planner is sampling in endpoint space, generating a set of candidate endpoints, connecting each endpoint to the starting point using polynomials (typically quintic polynomials) to generate candidate trajectories, then performing cost evaluation on candidate trajectories (considering comfort, safety, efficiency, etc.), and selecting the trajectory with minimum cost. Lattice Planner's advantages: good real-time performance (controllable computation for sampling + evaluation); strong interpretability (cost function weights are tunable); easy to add constraints (directly add penalty terms in the cost function). Disadvantages: trade-off between sampling density and computation (too sparse may miss good trajectories, too dense increases computation); sensitive to endpoint space design (endpoint sampling strategy directly affects planning quality).
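The sample-connect-evaluate loop described above can be sketched for the lateral direction only. This is a toy version under stated assumptions: a fixed time horizon `T`, zero lateral velocity and acceleration at the endpoint, and a cost made of integrated squared jerk plus a penalty on the final offset from the lane center. The weights and function names are hypothetical.

```python
def quintic_coeffs(x0, v0, a0, x1, v1, a1, T):
    """Coefficients c0..c5 of a quintic matching both boundary states."""
    A = x1 - x0 - v0 * T - 0.5 * a0 * T ** 2
    B = v1 - v0 - a0 * T
    C = a1 - a0
    c3 = (10 * A - 4 * B * T + 0.5 * C * T ** 2) / T ** 3
    c4 = (-15 * A + 7 * B * T - C * T ** 2) / T ** 4
    c5 = (6 * A - 3 * B * T + 0.5 * C * T ** 2) / T ** 5
    return [x0, v0, 0.5 * a0, c3, c4, c5]


def evaluate(coeffs, t, order=0):
    """Value of the order-th derivative of the polynomial at time t."""
    total = 0.0
    for i, c in enumerate(coeffs):
        if i < order:
            continue
        factor = 1.0
        for k in range(order):
            factor *= i - k
        total += c * factor * t ** (i - order)
    return total


def plan_lateral(d0, dv0, da0, end_offsets, T=4.0, w_jerk=1.0, w_offset=2.0, steps=20):
    """Sample end offsets, connect with quintics, return the cheapest one."""
    best = None
    dt = T / steps
    for d_end in end_offsets:
        coeffs = quintic_coeffs(d0, dv0, da0, d_end, 0.0, 0.0, T)
        # Comfort term: approximate integral of squared jerk.
        jerk_cost = sum(evaluate(coeffs, i * dt, order=3) ** 2
                        for i in range(steps + 1)) * dt
        cost = w_jerk * jerk_cost + w_offset * d_end ** 2
        if best is None or cost < best[0]:
            best = (cost, d_end, coeffs)
    return best


# Ego starts 1 m left of the lane centre; candidates end at -1, 0, or +1 m.
cost, chosen_offset, coeffs = plan_lateral(1.0, 0.0, 0.0, [-1.0, 0.0, 1.0])
```

A full planner would also sample longitudinal end states and time horizons, and would add collision checks against obstacles before the cost ranking.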

9. How are decision trees used in autonomous driving?

Assessment point: Understanding methods for behavioral decision-making.

Answer direction: Decision trees are used in the behavioral decision layer of autonomous driving to determine what behavior the ego vehicle should execute (car-following, lane change, yielding, etc.). Strictly speaking, a decision tree is only one of several approaches at this layer, and it helps to place it among the alternatives: rule-based decision trees and state machines (if-else rules built from expert experience; simple and interpretable but hard to cover all scenarios); POMDP (Partially Observable Markov Decision Process; can model uncertainty but has high computational complexity); reinforcement learning (learning the decision policy from interaction or data; potentially better generalization but hard to guarantee safety). The industry currently relies primarily on rules + state machines, supplemented by some learning methods. The core challenges of rule-based decision trees are scenario coverage and rule conflicts, which require extensive testing and iteration to refine the rule base.
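A rules + state machine decision layer can be sketched in a few lines. Everything in this example is invented for illustration: the behavior set, the thresholds (50 m, 60 m, 3 m/s), and the observation fields are assumptions, not values from any production stack. Rules are checked in a fixed priority order, which is one simple way to resolve rule conflicts, and the current state is used to add hysteresis.

```python
from enum import Enum


class Behavior(Enum):
    CRUISE = "cruise"
    FOLLOW = "follow"
    CHANGE_LEFT = "change_left"
    YIELD = "yield"


def next_behavior(current, obs):
    """Pick the next behavior from the current one and an observation dict."""
    # Rule 1 (highest priority): safety overrides everything else.
    if obs["pedestrian_ahead"]:
        return Behavior.YIELD
    # Rule 2: no relevant lead vehicle. Hysteresis: once following,
    # require a larger gap before returning to cruise.
    gap = obs["lead_gap_m"]
    release_gap = 60.0 if current is Behavior.FOLLOW else 50.0
    if gap is None or gap > release_gap:
        return Behavior.CRUISE
    # Rule 3: overtake a clearly slower lead if the left lane is free.
    lead_is_slow = obs["lead_speed_mps"] < obs["ego_speed_mps"] - 3.0
    if lead_is_slow and obs["left_lane_free"]:
        return Behavior.CHANGE_LEFT
    # Rule 4: default to car-following.
    return Behavior.FOLLOW


base = {
    "pedestrian_ahead": False,
    "lead_gap_m": 30.0,
    "lead_speed_mps": 10.0,
    "ego_speed_mps": 20.0,
    "left_lane_free": True,
}
decision = next_behavior(Behavior.CRUISE, base)
```

Even this toy version shows the coverage problem: every new scenario (cut-ins, merges, emergency vehicles) needs another rule and another decision about its priority.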

10. What is spatiotemporal joint planning? What advantages does it have over decoupled planning?

Assessment point: Understanding the advantages of spatiotemporal joint optimization.

Answer direction: Traditional decoupled planning first does path planning (s-l space) and then speed planning (s-t space) along the chosen path, solving two sub-problems separately. Spatiotemporal joint planning simultaneously optimizes path and speed in x-y-t 3D space. Joint planning advantages: better solutions (even if each sub-problem is solved optimally, the combination may not be optimal overall, because the path is fixed before timing is considered; joint planning searches over both at once); more natural handling of dynamic scenes (a moving obstacle occupies a tube-shaped region in spatiotemporal space, sometimes described as an "obstacle column", which joint planning can avoid by changing either path or timing); smoother (simultaneously optimizing path and speed tends to produce smoother, more comfortable trajectories). Disadvantages: higher computation (the 3D optimization space is much larger than 2D); harder to solve (a more complex non-convex optimization). In practice, a coarse planning + fine optimization strategy is commonly used: first find a coarse solution in spatiotemporal space using search methods, then refine it with optimization methods.
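A tiny x-y-t example shows why timing matters as much as geometry. In the sketch below the ego path is identical in all three cases and only the speed changes; a crossing obstacle (invented numbers) conflicts with just one of them. The 1 s sampling and the 2 m safety distance are assumptions, and a real checker would sample much more finely and use vehicle footprints.

```python
import math


def first_conflict(ego_traj, obstacle_traj, safe_dist=2.0):
    """Return the first time at which ego and obstacle are too close, else None.

    Both trajectories are lists of (t, x, y) sampled at the same timestamps.
    """
    for (t, ex, ey), (_, ox, oy) in zip(ego_traj, obstacle_traj):
        if math.hypot(ex - ox, ey - oy) < safe_dist:
            return t
    return None


def straight_traj(speed, n_steps=5, dt=1.0):
    """Ego drives along the x axis at constant speed."""
    return [(i * dt, speed * i * dt, 0.0) for i in range(n_steps + 1)]


# Obstacle crosses the ego path at x = 20 m, reaching y = 0 at t = 3 s.
obstacle = [(float(i), 20.0, 9.0 - 3.0 * i) for i in range(6)]

conflict_fast = first_conflict(straight_traj(10.0), obstacle)  # passes before
conflict_mid = first_conflict(straight_traj(7.0), obstacle)    # arrives together
conflict_slow = first_conflict(straight_traj(4.0), obstacle)   # passes after
```

A decoupled planner that has already committed to this path can only fix the conflict by changing speed, while a joint planner may also nudge the path.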

3. Control (3 Questions)

11. How is MPC used in autonomous driving? What's the difference from PID?

Assessment point: Understanding the principles and advantages of Model Predictive Control.

Answer direction: MPC's core idea is to solve a finite-horizon optimal control problem at each time step, execute only the first control action, then roll the horizon forward and repeat. MPC's advantages: it can handle constraints explicitly (speed constraints, acceleration constraints, road boundary constraints, etc.); it can anticipate the future (it predicts future states with a model and plans control actions in advance); it can optimize multiple objectives (weighted optimization of comfort, safety, tracking accuracy, etc.). Difference from PID: PID is model-free feedback control that reacts to the current error, its accumulated history, and its rate of change; MPC is model-based predictive control that optimizes over the errors within a future horizon. MPC handles constraints and previewed references better but is more computationally intensive and needs a reasonably accurate model; PID is simpler and cheaper but cannot handle constraints explicitly. Practice varies by company: MPC is attractive for longitudinal control when following distance constraints matter, and lateral control is implemented with both MPC and PID-style controllers.
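The receding-horizon idea can be shown without an optimization library. The sketch below is a toy longitudinal MPC that brute-forces every sequence over a small set of candidate accelerations, which stands in for the QP or nonlinear solver a real controller would use. The model, cost weights, constraint values, and names (`mpc_step`, `ACCELS`) are all assumptions for illustration.

```python
from itertools import product

ACCELS = (-3.0, -1.5, 0.0, 1.5)  # candidate accelerations in m/s^2


def simulate(gap, v, lead_v, a, dt):
    """One step of a point-mass model following a constant-speed lead."""
    v_next = max(0.0, v + a * dt)
    gap_next = gap + (lead_v - 0.5 * (v + v_next)) * dt
    return gap_next, v_next


def mpc_step(gap, v, lead_v, v_ref=15.0, min_gap=10.0, horizon=4, dt=0.5):
    """Return the first action of the cheapest feasible action sequence."""
    best_cost, best_seq = float("inf"), None
    for seq in product(ACCELS, repeat=horizon):
        g, s, cost, feasible = gap, v, 0.0, True
        for a in seq:
            g, s = simulate(g, s, lead_v, a, dt)
            if g < min_gap:  # hard constraint on following distance
                feasible = False
                break
            cost += (s - v_ref) ** 2 + 0.1 * a ** 2  # tracking + effort
        if feasible and cost < best_cost:
            best_cost, best_seq = cost, seq
    # If nothing is feasible, fall back to the hardest braking available.
    return best_seq[0] if best_seq is not None else min(ACCELS)


# Open road: far from the lead and below the reference speed.
a_free = mpc_step(gap=100.0, v=10.0, lead_v=15.0)
# Closing in on a slower lead: the gap constraint forces early braking.
a_close = mpc_step(gap=15.0, v=15.0, lead_v=10.0)
```

Only the first action is applied; at the next control cycle the problem is solved again from the newly measured state. A PID on speed error alone would keep pushing toward the reference speed in the second case, since it has no notion of the gap constraint.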

12. How to tune PID control parameters? Any tips?

Assessment point: Understanding practical experience with PID tuning.

Answer direction: Basic PID tuning order: tune P first (proportional gain, increasing P makes response faster but may cause oscillation); then tune D (derivative gain, increasing D suppresses oscillation but may amplify noise); finally tune I (integral gain, eliminates steady-state error but may cause overshoot). Practical tips: incremental tuning (adjust only one parameter at a time, observe effects); frequency domain analysis (use Bode plots to analyze stability margins); adaptive PID (different parameters at different speeds/curvatures using lookup tables); anti-windup (add integral clamping to prevent windup). The difficulty of PID tuning in autonomous driving is that different operating conditions require different parameters—the optimal parameters differ greatly between high and low speeds, straight roads and curves, typically requiring parameter scheduling.
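Two of the tips above, anti-windup and lookup-table gain scheduling, are easy to show in code. This is a minimal sketch: the gain values and speed breakpoints in `GAIN_TABLE` are placeholders invented for the example, not tuned numbers, and integral clamping is only one of several anti-windup schemes.

```python
class PID:
    def __init__(self, kp, ki, kd, integral_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral_limit = integral_limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # Anti-windup: clamp the accumulated integral term.
        self.integral += error * dt
        self.integral = max(-self.integral_limit,
                            min(self.integral_limit, self.integral))
        # No derivative kick on the very first sample.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# (max speed in m/s, (kp, ki, kd)); hypothetical values for illustration.
GAIN_TABLE = [
    (5.0, (0.8, 0.10, 0.05)),
    (15.0, (0.5, 0.05, 0.08)),
    (30.0, (0.3, 0.02, 0.10)),
]


def scheduled_gains(speed):
    """Look up gains for the current speed; reuse the last row above the table."""
    for max_speed, gains in GAIN_TABLE:
        if speed <= max_speed:
            return gains
    return GAIN_TABLE[-1][1]


pid = PID(kp=1.0, ki=0.5, kd=0.0, integral_limit=2.0)
outputs = [pid.update(error=10.0, dt=1.0) for _ in range(5)]
```

With a sustained error of 10 the integral would grow without bound; the clamp holds it at 2.0, so the output stays at 11.0 instead of ramping up on every cycle.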

13. What are the sources of trajectory tracking error? How to reduce them?

Assessment point: Understanding accuracy issues in trajectory tracking and improvement methods.

Answer direction: Error sources: model error (imprecise vehicle dynamics model, e.g., inaccurate tire cornering stiffness estimation); delay error (sensor delay, computation delay, actuator delay, total delay can reach 100-200ms); discretization error (planned trajectory is continuous, control is executed discretely); disturbances (road grade, wind, load changes, etc.). Methods to reduce errors: delay compensation (predict vehicle state after delay period during control computation as current state); model improvement (use more precise vehicle models, e.g., models considering tire nonlinearity); feedforward control (calculate feedforward steering angle based on reference trajectory curvature to reduce tracking error); adaptive control (online estimation of model parameters, real-time controller adjustment).
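Feedforward steering and delay compensation can both be sketched with a kinematic bicycle model. This is an illustration under simplifying assumptions: no tire slip, a wheelbase of 2.8 m picked arbitrarily, constant speed and steering over the delay window, and simple Euler integration. The function names are hypothetical.

```python
import math


def feedforward_steer(curvature, wheelbase=2.8):
    """Steering angle that follows a given path curvature (kinematic model)."""
    return math.atan(wheelbase * curvature)


def predict_state(x, y, yaw, speed, steer, delay, wheelbase=2.8, dt=0.01):
    """Roll the kinematic bicycle forward by `delay` seconds.

    The controller then computes its command from this predicted state
    instead of the measured one, which compensates for the loop delay.
    """
    steps = int(round(delay / dt))
    for _ in range(steps):
        x += speed * math.cos(yaw) * dt
        y += speed * math.sin(yaw) * dt
        yaw += speed / wheelbase * math.tan(steer) * dt
    return x, y, yaw


# Reference path with a 50 m radius curve.
delta_ff = feedforward_steer(1.0 / 50.0)

# At 10 m/s with 150 ms total delay, the car is about 1.5 m further
# along the path by the time the command takes effect.
x_pred, y_pred, yaw_pred = predict_state(0.0, 0.0, 0.0, 10.0, 0.0, delay=0.15)
```

The feedback controller (PID, LQR, or MPC) then only has to correct the residual error on top of the feedforward angle, rather than generating the entire steering command from error alone.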

4. System (2 Questions)

14. What is the role of HD maps in autonomous driving? Are mapless solutions feasible?

Assessment point: Understanding the value of HD maps and the trend toward mapless approaches.

Answer direction: HD maps provide prior environmental information: lane-level topology, road geometry, traffic sign positions, etc. Their role is to reduce perception burden, provide beyond-line-of-sight information, and assist localization. HD map issues: freshness maintenance (maps need timely updates after road changes, which is extremely costly); coverage (only surveyed areas are covered, so the system cannot be used immediately in new cities); regulatory risks (surveying license restrictions). Mapless approaches: online mapping (using vehicle sensors to construct local HD map elements in real time, e.g., StreamMapNet); end-to-end (going from perception to planning without an explicit HD map, as in the approach Tesla has described publicly). The current industry trend is "light map" solutions: retaining topological information while removing most geometric details, supplemented by vehicle-side perception. A reasonable position in an interview is that completely mapless operation is feasible for L2+, where a driver supervises, while L4 deployments still generally rely on map assistance.

15. How is autonomous driving simulation testing done? What are the metrics?

Assessment point: Understanding simulation testing methods and evaluation systems.

Answer direction: Simulation testing is divided into Software-in-the-Loop (SIL), Hardware-in-the-Loop (HIL), and Vehicle-in-the-Loop (VIL). SIL is the lightest, pure software simulation; HIL connects real ECUs to test software-hardware integration; VIL uses real vehicles + virtual scenarios in closed testing grounds. Simulation platforms: CARLA (open source, commonly used in academia), VTD, PreScan, Waymo's Simulation City. Key metrics: collision rate (the most basic safety metric); pass rate (percentage of scenarios completed); comfort metrics (longitudinal and lateral acceleration, jerk, etc.); efficiency metrics (arrival time, traffic efficiency). The core challenge of simulation is the Sim2Real gap: imprecise sensor models, environment models, and vehicle models all lead to inconsistency between simulation results and real-world performance.
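The key metrics can be computed from per-scenario logs with very little code. The log format below (a dict with `collided`, `completed`, and an `accel` series) is an assumption made up for this example and not the output format of CARLA or any other platform; here a scenario counts as passed only if it was completed without a collision.

```python
def summarize(runs, dt=0.1):
    """Aggregate safety, completion, and comfort metrics over scenario runs.

    runs: list of dicts with 'collided' (bool), 'completed' (bool),
          and 'accel' (longitudinal acceleration samples in m/s^2).
    dt:   sampling period of the acceleration series in seconds.
    """
    n = len(runs)
    max_jerk = 0.0
    for run in runs:
        acc = run["accel"]
        # Jerk is the finite-difference derivative of acceleration.
        for a0, a1 in zip(acc, acc[1:]):
            max_jerk = max(max_jerk, abs(a1 - a0) / dt)
    return {
        "collision_rate": sum(r["collided"] for r in runs) / n,
        "pass_rate": sum(r["completed"] and not r["collided"] for r in runs) / n,
        "max_abs_jerk": max_jerk,
    }


runs = [
    {"collided": False, "completed": True, "accel": [0.0, 0.5, 0.0]},
    {"collided": False, "completed": True, "accel": [0.0, 0.2, 0.4]},
    {"collided": True, "completed": False, "accel": [0.0, -0.5, -0.5]},
    {"collided": False, "completed": True, "accel": [0.1, 0.1, 0.1]},
]
report = summarize(runs, dt=0.5)
```

In practice these numbers are tracked per scenario category (cut-in, unprotected left turn, and so on), since an aggregate pass rate can hide a regression in one rare but important case.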

Key Takeaways

The biggest characteristic of autonomous driving interviews is their strong systematic nature. You need to understand not just your own module but also how upstream and downstream modules work together. For example, if you work on perception, interviewers will ask "how are your detection results used by planning, what format, what latency requirements"; if you work on planning, they'll ask "how do you handle false detections from perception." So when preparing for interviews, you must have full-chain thinking.

My second piece of advice is to ensure your project details can withstand follow-up questions. Interviewers especially love to ask "why did you choose this approach over that one" and "where is the performance bottleneck in this module and how did you optimize it." If you just called an API without understanding the principles, it's easy to get stuck.

My third piece of advice is to stay current with industry developments. Autonomous driving technology iterates quickly—BEV perception, end-to-end, Occupancy Network, and other new technologies are frequently asked about in interviews. I recommend reading the latest technical blogs and papers from major companies in the past six months before your interview.

FAQ

Q: Do I need to know C++ for autonomous driving interviews?

A: Basically required. Autonomous driving engineering implementation is almost entirely in C++, and you may be asked to hand-write simple algorithm implementations during interviews. Python is also needed for data processing and model training.

Q: How to transition without autonomous driving experience?

A: You can enter from your existing direction. CV folks can move to perception, optimization folks to planning, control folks are directly aligned. The key is filling in domain knowledge—for example, those doing perception need to learn 3D vision and point cloud processing.

Q: What if asked about an unfamiliar module in an interview?

A: Be honest about limited depth of understanding, but share your reasoning. Interviewers care more about whether you can quickly understand new domains than knowing everything.

Q: Which papers should I read?

A: Perception: PointPillars, CenterPoint, BEVFormer, BEVFusion. Planning: MultiPath, CoverNet, MotionFormer. Control: MPC-related textbooks. System: CARLA simulation-related. At least 3-5 representative papers per direction.

Q: Will end-to-end autonomous driving replace modular approaches?

A: Not in the short term. End-to-end has made progress in L2+ scenarios (like Tesla), but L4's safety and interpretability requirements make modular approaches still mainstream. The future may be a hybrid of modular + end-to-end.

#Autonomous Driving #BEV Perception #3D Detection #Motion Planning #MPC #HD Mapping #Interview Trivia #Perception Fusion