Journal of Systems Engineering and Electronics, 2024, Vol. 35, Issue (2): 294-301. doi: 10.23919/JSEE.2023.000110

• ELECTRONICS TECHNOLOGY •

Sound event localization and detection based on deep learning

Dada ZHAO1,2, Kai DING2, Xiaogang QI1,*, Yu CHEN2, Hailin FENG1

  1 School of Mathematics and Statistics, Xidian University, Xi’an 710071, China
    2 Science and Technology on Near-Surface Detection Laboratory, Wuxi 214035, China
  • Received: 2021-06-06; Accepted: 2023-02-24; Online: 2024-04-18; Published: 2024-04-18
  • Contact: Xiaogang QI. E-mail: ddzhao@stu.xidian.edu.cn; winfast113@sina.com; xgqi@xidian.edu.cn; cy0520tool@sohu.com; hlfeng@xidian.edu.cn
  • About the authors:
    ZHAO Dada was born in 1997. He received his B.S. degree in statistics from the School of Mathematical Sciences, Shanxi University, Taiyuan, China, in 2015. He is pursuing his M.S. degree in statistics at Xidian University. His research interests are signal processing and acoustic source localization. E-mail: ddzhao@stu.xidian.edu.cn

    DING Kai was born in 1983. He received his Ph.D. degree in weapon science and technology from Army Engineering University of the PLA in 2013. He is an engineer in the Science and Technology on Near-Surface Detection Laboratory. His current research interests include passive target recognition and intelligent network. E-mail: winfast113@sina.com

    QI Xiaogang was born in 1973. He is a professor and Ph.D. supervisor in the School of Mathematics and Statistics, Xidian University. He joined Xidian University as a faculty member in 2002, received his Ph.D. degree in applied mathematics there in 2005, and became an associate professor in 2006. From September 2012 to August 2013, he was a visiting scholar in the School of Electrical, Computer and Energy Engineering of Arizona State University. His research interests include system modeling and simulation, resource management and scheduling, performance evaluation and optimization algorithm design, and fault diagnosis in various networks. E-mail: xgqi@xidian.edu.cn

    CHEN Yu was born in 1980. He received his M.S. degree in physical electronics from National University of Defense Technology in 2005. He is an assistant researcher in the Science and Technology on Near-Surface Detection Laboratory. His current research interests include passive target recognition and intelligent network. E-mail: cy0520tool@sohu.com

    FENG Hailin was born in 1966. She received her B.S. degree in mathematics from Yan’an University, Yan’an, China, in 1988, and M.S. and Ph.D. degrees in applied mathematics from Xidian University, Xi’an, China, in 1991 and 2004, respectively. She is a professor in the School of Mathematics and Statistics, Xidian University. Her current research interests include system reliability modeling and survival data analysis. E-mail: hlfeng@xidian.edu.cn
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61877067), the Foundation of Science and Technology on Near-Surface Detection Laboratory (TCGZ2019A002; TCGZ2021C003; 6142414200511), and the Natural Science Basic Research Program of Shaanxi (2021JZ-19).

Abstract:

Acoustic source localization (ASL) and sound event detection (SED) are two widely studied but largely independent research fields. In recent years, sound event localization and detection (SELD), which combines the two to obtain a more complete spatial and temporal representation of the sound field, has become a very active research topic. This paper presents a deep learning-based algorithm for localizing and detecting multiple overlapping sound events in three-dimensional space. Log-Mel spectra and generalized cross-correlation (GCC) spectra are concatenated along the channel dimension to form the input features. These features are processed by a trained neural network whose parallel classification and regression branches produce the sound recognition and localization results, respectively. A channel attention mechanism is also introduced into the network to selectively enhance the features that carry essential information and suppress uninformative ones. Finally, a thorough comparison confirms the efficiency and effectiveness of the proposed SELD algorithm. Field experiments show that the proposed algorithm is robust to reverberation and to varying environments, and achieves higher recognition and localization accuracy than the baseline method.
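The abstract describes the input stage and the channel attention only in words. The block below is a minimal NumPy sketch of one commonly used way to build such an input: per-microphone log-Mel maps plus pairwise GCC maps whose lag axis is truncated to the Mel width, so that both can be stacked channel-wise. It then applies a squeeze-and-excitation style gate as a stand-in for the channel attention module. Everything concrete here is an assumption, not the authors' setting: the PHAT weighting of the GCC, the HTK mel scale, the sample rate, FFT size, hop, 64 Mel bands, the 4-microphone array, and the hypothetical function names `seld_input_features` and `channel_attention`.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; the abstract does not name one)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def stft(x, n_fft, hop):
    """x: (n_mics, n_samples) -> complex array (n_mics, n_frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    frames = np.stack([x[:, t * hop:t * hop + n_fft] for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames * win, axis=-1)


def seld_input_features(x, sr=24000, n_fft=1024, hop=480, n_mels=64):
    """Hypothetical helper: stack per-mic log-Mel maps and pairwise GCC-PHAT maps.

    Needs at least two microphones. Returns (n_mics + n_pairs, n_frames, n_mels).
    """
    spec = stft(x, n_fft, hop)
    # Log-Mel: one channel per microphone
    mel_power = (np.abs(spec) ** 2) @ mel_filterbank(sr, n_fft, n_mels).T
    logmel = np.log(mel_power + 1e-8)

    # GCC-PHAT: one channel per microphone pair
    half = n_mels // 2
    gcc = []
    n_mics = x.shape[0]
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cross = spec[i] * np.conj(spec[j])
            cc = np.fft.irfft(cross / (np.abs(cross) + 1e-8), n=n_fft, axis=-1)
            # Keep n_mels lags centred on zero (negative lags first) so the
            # GCC maps have the same width as the Mel maps and can be stacked.
            gcc.append(np.concatenate([cc[:, n_fft - half:], cc[:, :n_mels - half]], axis=-1))
    return np.concatenate([logmel, np.stack(gcc)], axis=0)


def channel_attention(feat, reduction=2, seed=0):
    """Squeeze-and-excitation style gate over the channels of a (C, T, F) map.

    The two weight matrices are random placeholders; in a network they are learned.
    """
    c = feat.shape[0]
    hidden = max(c // reduction, 1)
    rng = np.random.default_rng(seed)
    w1 = 0.1 * rng.standard_normal((hidden, c))
    w2 = 0.1 * rng.standard_normal((c, hidden))
    z = feat.mean(axis=(1, 2))  # squeeze: one descriptor per channel
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))  # excite: weights in (0, 1)
    return feat * s[:, None, None], s  # rescale each channel by its weight


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 24000))  # 1 s of noise on a hypothetical 4-mic array
    feats = seld_input_features(x)
    print(feats.shape)  # (10, 48, 64): 4 log-Mel + 6 GCC channels
    gated, weights = channel_attention(feats)
    print(weights.round(3))
```

Truncating each GCC map to the Mel width is what makes channel-wise concatenation possible; the abstract does not say whether the authors truncate, interpolate, or use a different lag range. The gate is applied to the raw input here only to keep the sketch short; in a CRNN it would normally act on intermediate convolutional feature maps.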

Key words: sound event localization and detection (SELD), deep learning, convolutional recurrent neural network (CRNN), channel attention mechanism