Robust Multi-Modal Autonomous Driving

GraphBEV++

Multi-Modal Feature Alignment for Autonomous Driving

Ziying Song · Caiyan Jia · Lin Liu · Shaoqing Xu · Lei Yang · Yadan Luo

+8.5 mAP vs. BEVFusion under noise

69.1 mAP on nuScenes-C

51.1 AMOTA for end-to-end tracking

88.7 PDMS on NAVSIM

Motivation

Alignment is a prerequisite for reliable fusion.

Feature misalignment in BEV perception is a critical yet often overlooked challenge, particularly under calibration uncertainty between LiDAR and camera sensors.

GraphBEV++ systematically mitigates projection-induced misalignment with two complementary modules. LocalAlign-v2 corrects local correspondence errors through neighborhood-aware graph matching, while GlobalAlign-v2 resolves representation-level inconsistencies through deformable offset learning or diffusion-based denoising.

The framework supports both LSS-based and query-based BEV representations, generalizes from detection to occupancy prediction, and improves perception, prediction, and planning in end-to-end autonomous driving.

Autonomous Driving · Multi-Modal Fusion · Feature Alignment · Bird's-Eye View

Method

Local correspondence.
Global consistency.

GraphBEV++ treats misalignment as a hierarchical error-propagation process and corrects it at two stages: BEV construction and BEV fusion.

Figure 2. GraphBEV++ within a multi-modal end-to-end autonomous driving framework.

BEV Construction

LocalAlign-v2

Builds neighborhood-aware representations to compensate for inaccurate LiDAR-to-camera projections and reference-point deviations.

LSS variant encodes projected and neighboring depth.
Query variant refines BEV queries using adjacent queries.
Adaptive KNN allocates context by object scale and depth.
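
The neighborhood idea in the list above can be sketched as follows, assuming LiDAR points have already been projected to pixel coordinates as (u, v, depth) tuples. The names `adaptive_k` and `neighbor_depths`, the linear depth-to-K rule, and every constant are hypothetical illustrations rather than the authors' implementation; the object-scale term is left out to keep the sketch short.

```python
import math

def adaptive_k(depth, k_min=2, k_max=8, max_depth=60.0):
    """Hypothetical rule: grow K linearly with depth, clamped to [k_min, k_max]."""
    ratio = min(max(depth / max_depth, 0.0), 1.0)
    return k_min + math.floor(ratio * (k_max - k_min))

def neighbor_depths(points, index):
    """Depths of the K projected points closest (in pixels) to points[index]."""
    u0, v0, d0 = points[index]
    k = adaptive_k(d0)
    # Pixel-space distance from the query point to every other projected point.
    others = [
        (math.hypot(u - u0, v - v0), d)
        for i, (u, v, d) in enumerate(points)
        if i != index
    ]
    others.sort(key=lambda pair: pair[0])
    return [d for _, d in others[:k]]

# Toy projected points: (u, v, depth in metres). Values are made up.
points = [
    (10.0, 10.0, 5.0),
    (11.0, 10.0, 5.2),
    (10.0, 12.0, 30.0),
    (40.0, 40.0, 55.0),
    (12.0, 11.0, 5.1),
]
print(neighbor_depths(points, 0))  # -> [5.2, 30.0]
```

In this toy setup the query point at 5 m gets K = 2, so only its two closest projections contribute context. A depth encoder could then consume the projected depth together with these neighboring depths instead of trusting a single, possibly misprojected value.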

BEV Fusion

GlobalAlign-v2

Aligns heterogeneous BEV representations after local errors accumulate into spatial shifts and semantic inconsistencies.

Deformable variant learns explicit spatial offsets.
Diffusion variant progressively denoises implicit features.
Four-step refinement balances robustness and efficiency.
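
To make the explicit-offset idea concrete, here is a minimal sketch that resamples a toy camera BEV map at positions shifted by per-cell (dx, dy) offsets using bilinear interpolation. The function names `bilinear` and `warp` are hypothetical, and the offsets are fixed by hand; in the deformable variant described above they would be learned, which this sketch does not attempt.

```python
import math

def bilinear(grid, x, y):
    """Sample grid (a list of rows) at continuous (x, y), clamping to the border."""
    h, w = len(grid), len(grid[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def warp(grid, offsets):
    """Resample every cell at its own location plus a per-cell (dx, dy) offset."""
    return [
        [
            bilinear(grid, x + offsets[y][x][0], y + offsets[y][x][1])
            for x in range(len(grid[0]))
        ]
        for y in range(len(grid))
    ]

# Toy 3x3 single-channel camera BEV map and a uniform one-cell shift in x.
camera_bev = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
shift_right = [[(1.0, 0.0)] * 3 for _ in range(3)]
aligned = warp(camera_bev, shift_right)
print(aligned)  # -> [[1.0, 2.0, 2.0], [4.0, 5.0, 5.0], [7.0, 8.0, 8.0]]
```

Each output cell reads the value one cell to its right, with the last column clamped at the border. Fractional offsets blend neighboring cells, which is what lets a learned offset field correct sub-cell spatial shifts between modalities.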

Figure. LocalAlign-v2 pipelines for LSS and Query representations.

Figure. GlobalAlign-v2 pipelines with deformable and diffusion alignment.

Experiments

Robust across tasks, datasets, and noise.

Evaluation spans 3D detection, BEV segmentation, semantic occupancy, and end-to-end driving under clean and misaligned settings.

Method	Clean mAP ↑	Noisy mAP ↑	Clean NDS ↑	Noisy NDS ↑	Relative mAP drop ↓
SparseFusion	70.4	64.7	72.8	67.1	8.1%
BEVFusion-MIT	68.5	60.8	71.4	65.7	11.2%
BEVFormer-M	70.9	63.2	73.0	66.3	10.8%
GraphBEV++ (LSS)	70.7	69.3	73.2	72.3	2.0%
GraphBEV++ (Query)	71.4	69.1	73.4	71.2	3.2%
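
The last column of the table above can be reproduced from the clean and noisy mAP columns, assuming the relative drop is defined as (clean - noisy) / clean. That definition is an assumption consistent with the reported numbers, not one stated on this page. The helper names `relative_drop` and `retention` are illustrative.

```python
# (clean mAP, noisy mAP) copied from the table above.
results = {
    "SparseFusion": (70.4, 64.7),
    "BEVFusion-MIT": (68.5, 60.8),
    "BEVFormer-M": (70.9, 63.2),
    "GraphBEV++ (LSS)": (70.7, 69.3),
    "GraphBEV++ (Query)": (71.4, 69.1),
}

def relative_drop(clean, noisy):
    """Percentage of clean-setting mAP lost under noise."""
    return 100.0 * (clean - noisy) / clean

def retention(clean, noisy):
    """Percentage of clean-setting mAP kept under noise."""
    return 100.0 * noisy / clean

for name, (clean, noisy) in results.items():
    drop = relative_drop(clean, noisy)
    kept = retention(clean, noisy)
    print(f"{name}: drop {drop:.1f}%, kept {kept:.1f}%")
```

For GraphBEV++ (LSS) this gives a drop of about 2.0%, which is the 98% retention quoted under Key findings, and the noisy-mAP gap to BEVFusion-MIT is 69.3 - 60.8 = 8.5. Recomputing from the rounded table entries can differ from the reported column in the last digit (BEVFormer-M comes out at 10.9% here), presumably because the reported values were derived from unrounded mAP.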

Method	AMOTA ↑	minADE ↓	minFDE ↓	MR ↓	EPA ↑	Avg. collision ↓
UniAD	35.9	0.71	1.02	15.1	45.6	0.31
FusionAD	50.1	0.39	0.62	8.6	62.6	0.12
GraphBEV++ (LSS)	49.8	0.40	0.59	8.5	64.7	0.21
GraphBEV++ (Query)	51.1	0.38	0.52	7.7	64.5	0.13

Benchmark / task	Baseline	GraphBEV++	Key metric	Gain
Waymo-C L2	32.44 / 29.95	36.48 / 33.08	mAP / mAPH	+4.04 / +3.13
Argoverse2	43.1	46.7	mAP	+3.6
3D occupancy, noisy	27.86 / 17.63	29.41 / 19.37	IoU / mIoU	+1.55 / +1.74
NAVSIM	88.3	88.7	PDMS	+0.4

Figure. Robustness analysis (performance versus misalignment severity). GraphBEV++ degrades substantially more gracefully as misalignment severity increases.

Key findings

What the experiments establish.

Cross-paradigm

A single alignment principle extends to dense LSS-based and sparse query-based BEV representations.

Noise-resilient

On nuScenes-C, the LSS variant retains 98% of its clean-setting mAP (69.3 vs. 70.7).

Task-general

Better geometric correspondence improves detection, tracking, forecasting, occupancy, and planning.

Efficient

GraphBEV++ (LSS) runs at 7.1 FPS with only marginal alignment overhead on an A100 GPU.

Citation

Cite GraphBEV++

If this work supports your research, please cite the paper.

@misc{song2026graphbevplusplus,
  title   = {GraphBEV++: Multi-Modal Feature Alignment
             for Autonomous Driving},
  author  = {Song, Ziying and Jia, Caiyan and Liu, Lin and
             Xu, Shaoqing and Yang, Lei and Luo, Yadan},
  note    = {Manuscript},
  year    = {2026}
}