Journal of Systems Engineering and Electronics ›› 2026, Vol. 37 ›› Issue (3): 767-778. doi: 10.23919/JSEE.2026.000066

• CROSS-DOMAIN ELECTROMAGNETIC PERCEPTION AND COMMUNICATION & NETWORKING TECHNOLOGY (PART I) •

RF-IRSynNet: cross-modal radio frequency-infrared fusion for robust UAV recognition

Yongsheng DUAN, Junning ZHANG, Lei XUE, Ying XU

  • Received: 2025-12-25 Accepted: 2026-03-22 Online: 2026-06-18 Published: 2026-06-29
  • Contact: Junning ZHANG E-mail: 406810103@qq.com; zjn20101796@sina.cn; eeixuelei@163.com; eeixuying@163.com
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (62201602).

Abstract:

The rapid proliferation of unmanned aerial vehicles (UAVs) poses significant challenges for airspace security, particularly under long-range and visually degraded conditions. Effective UAV recognition is therefore critical, yet current methods typically depend on a single sensor input, such as infrared (IR) imaging or radio frequency (RF) analysis, each of which suffers inherent limitations in complex environments. Although multimodal sensing has been explored for UAV detection, the joint exploitation of IR imagery and RF signals for UAV type recognition remains largely underexplored. In particular, the structural heterogeneity between IR and RF features complicates joint representation and decision-making, a challenge that previous work has rarely addressed. To address this gap, this paper proposes RF-IRSynNet, a multimodal UAV classification framework that integrates IR imagery and in-flight RF emissions to enhance recognition performance. In RF-IRSynNet, IR images are processed with YOLOv11 to detect UAV candidates and extract structured semantic features, while RF signals are modeled with reservoir computing, which efficiently encodes temporal and spectral dynamics via feature sequences. The two modalities are fused through an adaptive confidence-weighted soft-voting strategy that dynamically balances their contributions according to the task. Experimental results demonstrate that RF-IRSynNet outperforms both unimodal baselines and existing multimodal approaches, achieving robust classification at long ranges. The framework also maintains high accuracy with reduced training data, indicating its efficiency for real-world UAV monitoring.

Key words: unmanned aerial vehicle (UAV) detection, multimodal, reservoir computing, YOLO, kernel canonical correlation analysis
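
Illustrative sketch: the abstract describes the fusion step only as an adaptive confidence-weighted soft-voting strategy and does not give the weighting rule. The Python sketch below shows one plausible reading, in which each modality's class-probability vector is weighted by a confidence score before averaging. The entropy-based confidence measure, the function names `entropy_confidence` and `fuse_soft_vote`, and the example probabilities are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def entropy_confidence(probs):
    """Hypothetical confidence score in [0, 1]: 1 minus normalized Shannon entropy.

    A peaked distribution scores near 1, a uniform one scores near 0.
    This measure is an assumption; the paper's rule is not stated in the abstract.
    """
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two classes")
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - h / math.log(n))


def fuse_soft_vote(p_ir, p_rf):
    """Fuse IR and RF class probabilities with confidence-proportional weights.

    Returns the fused probability vector and the (w_ir, w_rf) weights used.
    """
    if len(p_ir) != len(p_rf):
        raise ValueError("both modalities must score the same classes")
    c_ir = entropy_confidence(p_ir)
    c_rf = entropy_confidence(p_rf)
    total = c_ir + c_rf
    if total < 1e-12:
        # Neither modality is informative: fall back to equal weights.
        w_ir = w_rf = 0.5
    else:
        w_ir, w_rf = c_ir / total, c_rf / total
    fused = [w_ir * a + w_rf * b for a, b in zip(p_ir, p_rf)]
    return fused, (w_ir, w_rf)


if __name__ == "__main__":
    # Made-up scores: an ambiguous IR prediction and a confident RF prediction.
    p_ir = [0.40, 0.35, 0.25]
    p_rf = [0.05, 0.90, 0.05]
    fused, weights = fuse_soft_vote(p_ir, p_rf)
    print("weights (IR, RF):", weights)
    print("fused probabilities:", fused)
    print("predicted class index:", fused.index(max(fused)))
```

Under this assumed rule, an ambiguous IR prediction (for example, a small target at long range) contributes little to the fused decision, and a confident RF prediction dominates. Because the weights sum to one, the fused vector remains a valid probability distribution.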