| 1 |
LI H Y, SIMA C H, DAI J F, et al. Delving into the devils of bird’s-eye-view perception: a review, evaluation and recipe. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2024, 46 (4): 2151−2170.
doi: 10.1109/TPAMI.2023.3333838
|
| 2 |
LIU Z J, TANG H T, AMINI A, et al. BEVFusion: multi-task multi-sensor fusion with unified bird’s-eye view representation. https://arxiv.org/abs/2205.13542.
|
| 3 |
LI Q, WANG Y, WANG Y L, et al. HDmapnet: an online HD map construction and evaluation framework. Proc. of the International Conference on Robotics and Automation, 2022: 4628−4634.
|
| 4 |
PAN B, SUN J K, LEUNG H Y T, et al. Cross-view semantic segmentation for sensing surroundings. IEEE Robotics and Automation Letters, 2020, 5 (3): 4867−4873.
doi: 10.1109/LRA.2020.3004325
|
| 5 |
ZHOU B, KRAHENBUHL P. Cross-view transformers for real-time map-view semantic segmentation. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 13760−13769.
|
| 6 |
LI Z Q, WANG W H, LI H Y, et al. BEVFormer: learning bird’s-eye-view representation from lidar-camera via spatiotemporal transformers. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2025, 47 (3): 2020−2036.
doi: 10.1109/TPAMI.2024.3515454
|
| 7 |
GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning. Proc. of the International Conference on Machine Learning, 2017: 1243−1252.
|
| 8 |
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770−778.
|
| 9 |
LANG A H, VORA S, CAESAR H, et al. Pointpillars: fast encoders for object detection from point clouds. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 12697−12705.
|
| 10 |
YANG Z D, ZENG A L, LI Z, et al. From knowledge distillation to self-knowledge distillation: a unified approach with normalized loss and customized soft labels. Proc. of the IEEE/CVF International Conference on Computer Vision, 2023: 17185−17194.
|
| 11 |
HAN K, WANG Y H, CHEN H T, et al. A survey on vision transformer. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2022, 45 (1): 87−110.
|
| 12 |
PHILION J, FIDLER S. Lift, splat, shoot: encoding images from arbitrary camera rigs by implicitly unprojecting to 3D. Proc. of the European Conference on Computer Vision, 2020: 194−210.
|
| 13 |
HUANG J J, HUANG G, ZHU Z, et al. BEVDet: high-performance multi-camera 3D object detection in bird-eye-view. https://arxiv.org/abs/2112.11790.
|
| 14 |
HUANG J J, HUANG G. BEVDet4D: exploit temporal cues in multi-camera 3D object detection. https://arxiv.org/abs/2203.17054.
|
| 15 |
LI Y H, GE Z, YU G Y, et al. BEVDepth: acquisition of reliable depth for multi-view 3D object detection. Proc. of the AAAI Conference on Artificial Intelligence, 2023: 1477−1485.
|
| 16 |
HU H T, WANG F Y, SU J W, et al. EA-LSS: edge-aware lift-splat-shot framework for 3D BEV object detection. https://arxiv.org/abs/2303.17895.
|
| 17 |
LIU Y F, WANG T C, ZHANG X Y, et al. PETR: position embedding transformation for multi-view 3D object detection. Proc. of the European Conference on Computer Vision, 2022: 531−548.
|
| 18 |
WANG S H, LIU Y F, WANG T C, et al. Exploring object-centric temporal modeling for efficient multi-view 3D object detection. Proc. of the IEEE/CVF International Conference on Computer Vision, 2023: 3621−3631.
|
| 19 |
VORA S, LANG A H, HELOU B, et al. Pointpainting: sequential fusion for 3D object detection. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 4604−4612.
|
| 20 |
CAI H X, ZHANG Z Y, ZHOU Z Y, et al. BEVFusion4D: learning LIDAR-camera fusion under bird’s-eye-view via cross-modality guidance and temporal aggregation. https://arxiv.org/abs/2303.17099.
|
| 21 |
ZHANG H C, LIANG L, ZENG P X, et al. SparseLIF: high-performance sparse LIDAR-camera fusion for 3D object detection. Proc. of the European Conference on Computer Vision, 2024: 109−128.
|
| 22 |
ZHOU S C, LIU W Z, HU C, et al. UniDistill: a universal cross-modality knowledge distillation framework for 3D object detection in bird’s-eye view. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 5116−5125.
|
| 23 |
LI J N, LU M, LIU J M, et al. BEV-LGKD: a unified lidar-guided knowledge distillation framework for multi-view BEV 3D object detection. IEEE Trans. on Intelligent Vehicles, 2023, 9 (1): 2489−2498.
doi: 10.1109/tiv.2023.3319430
|
| 24 |
JIANG Z, ZHANG J Q, ZHANG Y N, et al. FSD-BEV: foreground self-distillation for multi-view 3D object detection. Proc. of the European Conference on Computer Vision, 2024: 110−126.
|
| 25 |
WU Z F, SHEN C H, VAN DEN HENGEL A. Wider or deeper: revisiting the resnet model for visual recognition. Pattern Recognition, 2019, 90: 119−133.
doi: 10.1016/j.patcog.2019.01.006
|
| 26 |
CAESAR H, BANKITI V, LANG A H, et al. Nuscenes: a multimodal dataset for autonomous driving. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 11621−11631.
|
| 27 |
GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: the KITTI dataset. The International Journal of Robotics Research, 2013, 32 (11): 1231−1237.
doi: 10.1177/0278364913491297
|
| 28 |
NADERI B, CUTLER R, KHONGBANTABAM N S, et al. VCD: a video conferencing dataset for video compression. Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024: 3970−3974.
|
| 29 |
LIU H S, TENG Y, LU T, et al. SparseBEV: high-performance sparse 3D object detection from multi-camera videos. Proc. of the IEEE/CVF International Conference on Computer Vision, 2023: 18580−18590.
|
| 30 |
JI H, NI T, HUANG X F, et al. RoPETR: improving temporal camera-only 3D detection by integrating enhanced rotary position embedding. https://arxiv.org/abs/2504.12643.
|
| 31 |
WANG Z T, HUANG Z H, GAO Y L, et al. MV2DFusion: leveraging modality-specific object semantics for multi-modal 3D detection. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2026, 48 (1): 609−623.
doi: 10.1109/TPAMI.2025.3609348
|
| 32 |
WANG Z C, LI W L, SUN X Y, et al. Improved YOLOv5-based radar object detection. Journal of Systems Engineering and Electronics, 2025, 36 (4): 932−939.
|
| 33 |
DU H, WANG W, WANG X R, et al. Scene image recognition with knowledge transfer for drone navigation. Journal of Systems Engineering and Electronics, 2023, 34 (5): 1309−1318.
|