Robust Multi-Modal Autonomous Driving

GraphBEV++

Multi-Modal Feature Alignment
for Autonomous Driving

+8.5 mAP vs. BEVFusion under noise
69.1 mAP on nuScenes-C
51.1 AMOTA for end-to-end tracking
88.7 PDMS on NAVSIM

01

Motivation

Alignment is a prerequisite
for reliable fusion.

Feature misalignment in BEV perception is a critical yet often overlooked challenge, particularly under calibration uncertainty between LiDAR and camera sensors.

GraphBEV++ systematically mitigates projection-induced misalignment with two complementary modules. LocalAlign-v2 corrects local correspondence errors through neighborhood-aware graph matching, while GlobalAlign-v2 resolves representation-level inconsistencies through deformable offset learning or diffusion-based denoising.

The framework supports both LSS-based and query-based BEV representations, generalizes from detection to occupancy prediction, and improves perception, prediction, and planning in end-to-end autonomous driving.

Autonomous Driving · Multi-Modal Fusion · Feature Alignment · Bird's-Eye View
02

Method

Local correspondence.
Global consistency.

GraphBEV++ treats misalignment as a hierarchical error propagation process, correcting it at both BEV construction and fusion stages.

Figure 2. GraphBEV++ within a multi-modal end-to-end autonomous driving framework.
A

BEV Construction

LocalAlign-v2

Builds neighborhood-aware representations to compensate for inaccurate LiDAR-to-camera projections and reference-point deviations.

  • LSS variant encodes projected and neighboring depth.
  • Query variant refines BEV queries using adjacent queries.
  • Adaptive KNN allocates context by object scale and depth.
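
The neighborhood idea behind the LSS variant can be illustrated with a minimal sketch. It assumes LiDAR points have already been projected to pixel coordinates as `(u, v, depth)` tuples. The function names, the `k_min`/`k_max` bounds, and the linear rule that gives nearer points more neighbors are all hypothetical choices made for illustration; the page does not specify how the adaptive KNN is parameterized.

```python
import math

def adaptive_k(depth_m, k_min=4, k_max=16, max_depth_m=60.0):
    """Hypothetical rule: nearer points (which cover more pixels) get more neighbors."""
    ratio = min(max(depth_m / max_depth_m, 0.0), 1.0)
    return round(k_max - ratio * (k_max - k_min))

def neighbor_depths(points, index, k):
    """Depths of the k projected points closest to points[index] in pixel space."""
    u0, v0, _ = points[index]
    candidates = [
        (math.hypot(u - u0, v - v0), depth)
        for i, (u, v, depth) in enumerate(points)
        if i != index
    ]
    candidates.sort(key=lambda pair: pair[0])
    return [depth for _, depth in candidates[:k]]

def local_depth_feature(points, index, k_min=4, k_max=16, max_depth_m=60.0):
    """Projected depth followed by neighboring depths, as a simple context vector."""
    own_depth = points[index][2]
    k = adaptive_k(own_depth, k_min, k_max, max_depth_m)
    return [own_depth] + neighbor_depths(points, index, k)

# Toy projected points: (u, v, depth in meters).
points = [(10, 10, 5.0), (11, 10, 5.2), (30, 30, 40.0), (12, 11, 5.1)]
print(local_depth_feature(points, 0))
```

If a small calibration error shifts the projected pixel, the neighboring depths still describe the surface around it, which is the intuition for encoding them next to the projected depth.
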
B

BEV Fusion

GlobalAlign-v2

Aligns heterogeneous BEV representations after local errors accumulate into spatial shifts and semantic inconsistencies.

  • Deformable variant learns explicit spatial offsets.
  • Diffusion variant progressively denoises implicit features.
  • Four-step refinement balances robustness and efficiency.
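
To make the deformable variant concrete, the sketch below resamples a toy camera BEV map at positions shifted by per-cell offsets, using bilinear interpolation. In the actual module the offsets would be predicted by a learned network; here they are supplied by hand, and `bilinear_sample` and `align_with_offsets` are hypothetical helper names rather than part of any released code.

```python
import math

def bilinear_sample(grid, x, y):
    """Sample a 2D grid (list of rows) at continuous (x, y), clamped to the border."""
    height, width = len(grid), len(grid[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * grid[y0][x0] + ax * grid[y0][x1]
    bottom = (1 - ax) * grid[y1][x0] + ax * grid[y1][x1]
    return (1 - ay) * top + ay * bottom

def align_with_offsets(camera_bev, offsets):
    """Resample so that cell (row, col) reads from (col + dx, row + dy)."""
    return [
        [
            bilinear_sample(camera_bev, col + offsets[row][col][0], row + offsets[row][col][1])
            for col in range(len(camera_bev[0]))
        ]
        for row in range(len(camera_bev))
    ]

# Toy example: the camera BEV map is displaced by one cell along x.
camera_bev = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
offsets = [[(1.0, 0.0)] * 3 for _ in range(3)]  # hand-set (dx, dy) per cell
print(align_with_offsets(camera_bev, offsets))
```

Because the sampling is bilinear, fractional offsets are valid, which is what would allow such offsets to be learned by gradient descent in a real model.
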
LocalAlign-v2 for LSS and Query representations.
GlobalAlign-v2 with deformable and diffusion alignment.
03

Experiments

Robust across tasks,
datasets, and noise.

Evaluation spans 3D detection, BEV segmentation, semantic occupancy, and end-to-end driving under clean and misaligned settings.

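| Method             | Clean mAP ↑ | Noisy mAP ↑ | Clean NDS ↑ | Noisy NDS ↑ | Relative mAP drop ↓ |
|--------------------|-------------|-------------|-------------|-------------|---------------------|
| SparseFusion       | 70.4        | 64.7        | 72.8        | 67.1        | 8.1%                |
| BEVFusion-MIT      | 68.5        | 60.8        | 71.4        | 65.7        | 11.2%               |
| BEVFormer-M        | 70.9        | 63.2        | 73.0        | 66.3        | 10.8%               |
| GraphBEV++ (LSS)   | 70.7        | 69.3        | 73.2        | 72.3        | 2.0%                |
| GraphBEV++ (Query) | 71.4        | 69.1        | 73.4        | 71.2        | 3.2%                |
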
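
The relative mAP drop column appears to follow the usual definition, (clean − noisy) / clean. The short sketch below applies that assumed formula to three rows of the table above; the function names are illustrative only.

```python
def relative_drop(clean: float, noisy: float) -> float:
    """Percentage of the clean-setting score lost under noise."""
    return (clean - noisy) / clean * 100.0

def retained(clean: float, noisy: float) -> float:
    """Percentage of the clean-setting score kept under noise."""
    return noisy / clean * 100.0

# (clean mAP, noisy mAP) pairs copied from the table above.
results = {
    "BEVFusion-MIT": (68.5, 60.8),
    "GraphBEV++ (LSS)": (70.7, 69.3),
    "GraphBEV++ (Query)": (71.4, 69.1),
}

for name, (clean, noisy) in results.items():
    drop = relative_drop(clean, noisy)
    kept = retained(clean, noisy)
    print(f"{name}: drop {drop:.1f}%, retained {kept:.1f}%")
```

For the LSS row this gives a 2.0% drop, equivalently 98.0% of the clean-setting mAP retained.
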
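| Method             | AMOTA ↑ | minADE ↓ | minFDE ↓ | MR ↓ | EPA ↑ | Avg. collision ↓ |
|--------------------|---------|----------|----------|------|-------|------------------|
| UniAD              | 35.9    | 0.71     | 1.02     | 15.1 | 45.6  | 0.31             |
| FusionAD           | 50.1    | 0.39     | 0.62     | 8.6  | 62.6  | 0.12             |
| GraphBEV++ (LSS)   | 49.8    | 0.40     | 0.59     | 8.5  | 64.7  | 0.21             |
| GraphBEV++ (Query) | 51.1    | 0.38     | 0.52     | 7.7  | 64.5  | 0.13             |
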
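
For readers unfamiliar with the forecasting columns, minADE and minFDE are commonly defined as the best average and best final displacement error over the predicted trajectory modes. The sketch below follows that common definition on toy 2D waypoints; it is not the benchmark's evaluation code, and some protocols instead report the final error of the mode with the lowest average error.

```python
import math

def displacement_errors(pred, gt):
    """Return (average, final) Euclidean error between two waypoint lists."""
    dists = [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]

def min_ade_fde(modes, gt):
    """Best average and best final error, each minimized independently over modes."""
    errors = [displacement_errors(mode, gt) for mode in modes]
    return min(ade for ade, _ in errors), min(fde for _, fde in errors)

# Toy ground truth and two predicted modes (x, y in meters).
gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
modes = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)],  # accurate early, drifts at the end
    [(1.0, 0.5), (2.0, 0.5), (3.0, 0.2)],  # offset early, close at the end
]
print(min_ade_fde(modes, gt))
```

In this toy case the two minima come from different modes, which is why the two metrics can rank methods differently.
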
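| Benchmark / task    | Baseline      | GraphBEV++    | Key metric | Gain          |
|---------------------|---------------|---------------|------------|---------------|
| Waymo-C L2          | 32.44 / 29.95 | 36.48 / 33.08 | mAP / mAPH | +4.04 / +3.13 |
| Argoverse2          | 43.1          | 46.7          | mAP        | +3.6          |
| 3D occupancy, noisy | 27.86 / 17.63 | 29.41 / 19.37 | IoU / mIoU | +1.55 / +1.74 |
| NAVSIM              | 88.3          | 88.7          | PDMS       | +0.4          |
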
Robustness analysis. As misalignment severity increases, GraphBEV++ degrades more gracefully than the compared baselines.
04

Key findings

What the experiments establish.

01

Cross-paradigm

A single alignment principle extends to dense LSS-based and sparse query-based BEV representations.

02

Noise-resilient

Under nuScenes-C, the LSS variant retains 98% of its clean-setting mAP (69.3 vs. 70.7).

03

Task-general

Better geometric correspondence improves detection, tracking, forecasting, occupancy, and planning.

04

Efficient

GraphBEV++ (LSS) runs at 7.1 FPS with only marginal alignment overhead on an A100 GPU.

05

Citation

Cite GraphBEV++

If this work supports your research, please cite the paper.

@misc{song2026graphbevplusplus,
  title   = {GraphBEV++: Multi-Modal Feature Alignment
             for Autonomous Driving},
  author  = {Song, Ziying and Jia, Caiyan and Liu, Lin and
             Xu, Shaoqing and Yang, Lei and Luo, Yadan},
  note    = {Manuscript},
  year    = {2026}
}