Feature misalignment in bird's-eye-view (BEV) perception is a critical yet often overlooked challenge, particularly when the calibration between LiDAR and camera sensors is uncertain.
GraphBEV++ systematically mitigates projection-induced misalignment with two complementary modules. LocalAlign-v2 corrects local correspondence errors through neighborhood-aware graph matching, while GlobalAlign-v2 resolves representation-level inconsistencies through deformable offset learning or diffusion-based denoising.
The framework supports both LSS-based and query-based BEV representations, generalizes from detection to occupancy prediction, and improves perception, prediction, and planning in end-to-end autonomous driving.
Keywords: Autonomous Driving, Multi-Modal Fusion, Feature Alignment, Bird's-Eye View
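
To make the idea of neighborhood-aware local alignment more concrete, the following is a minimal sketch and not the authors' LocalAlign-v2 implementation. It assumes that LiDAR points have already been projected to image-plane pixel coordinates, and it swaps graph matching for a much simpler stand-in: each point's depth is averaged with the depths of its k nearest neighbours in the image plane, so that a small projection error has less influence on the depth used for the view transform. The function name `knn_depth_refine`, its parameters, and the plain mean aggregation are all hypothetical choices made for illustration.

```python
import math


def knn_depth_refine(pixels, depths, k=3):
    """Smooth projected LiDAR depths over a k-nearest-neighbour graph.

    pixels: list of (u, v) image-plane coordinates of projected points.
    depths: list of depth values, one per projected point.
    k:      number of neighbours per point (hypothetical default).

    Returns a new list in which each depth is the mean of the point's own
    depth and the depths of its k nearest neighbours in the image plane.
    This is an illustrative simplification, not the method from the paper.
    """
    if len(pixels) != len(depths):
        raise ValueError("pixels and depths must have the same length")

    refined = []
    for i, (u, v) in enumerate(pixels):
        # Brute-force distances to every other projected point.
        dists = sorted(
            (math.hypot(u - pu, v - pv), j)
            for j, (pu, pv) in enumerate(pixels)
            if j != i
        )
        # Edges of the neighbourhood graph for point i.
        neighbours = [j for _, j in dists[:k]]
        # Aggregate the point's own depth with its neighbours' depths.
        group = [depths[i]] + [depths[j] for j in neighbours]
        refined.append(sum(group) / len(group))
    return refined


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (100, 100)]
    d = [10.0, 12.0, 14.0, 50.0]
    print(knn_depth_refine(pts, d, k=2))
```

The brute-force neighbour search is quadratic in the number of points; a real pipeline would more likely use a spatial index such as a KD-tree and a learned aggregation rather than a plain mean.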