Journal of Systems Engineering and Electronics ›› 2023, Vol. 34 ›› Issue (1): 236-246. doi: 10.23919/JSEE.2023.000010

• RELIABILITY

Bug localization based on syntactical and semantic information of source code

Xuefeng YAN1,2,*, Shasha CHENG1, Liqin GUO3

    1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
    2 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 211106, China
    3 State Key Laboratory of Intelligent Manufacturing System Technology, Beijing Institute of Electronic System Engineering, Beijing 100854, China
  • Received: 2021-02-22 Online: 2023-02-18 Published: 2023-03-03
  • Contact: Xuefeng YAN E-mail: yxf@nuaa.edu.cn; shasha@nuaa.edu.cn; guo_96@163.com
  • About author:
    YAN Xuefeng was born in 1975. He received his Ph.D. degree from Beijing Institute of Technology, China, in 2005. He is currently a professor with the College of Computer Science and Technology at Nanjing University of Aeronautics and Astronautics. His research interests include intelligent modeling, big data analysis, model-based systems engineering, and the theory and methods of complex system modeling and simulation. E-mail: yxf@nuaa.edu.cn

    CHENG Shasha was born in 1996. She received her B.S. degree in computer science and technology from Anhui University of Technology in 2018. She is currently pursuing her M.S. degree at Nanjing University of Aeronautics and Astronautics, China. Her research interests are bug localization, systems engineering, and big data analysis. E-mail: shasha@nuaa.edu.cn

    GUO Liqin was born in 1976. She is currently a senior engineer at the State Key Laboratory of Intelligent Manufacturing System Technology. Her research interests include intelligent manufacturing and the theory and methods of complex system modeling and simulation. E-mail: guo_96@163.com
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFB1702700).

Abstract:

Existing software bug localization models treat source files as natural language, which discards their syntactic and structural information. This paper proposes a bug localization model based on the syntactic and semantic information of source code. First, the abstract syntax tree (AST) is split by node category into a sequence of statement trees, and each statement tree is encoded into a vector to capture lexical and syntactic knowledge at the statement level. Second, the source code is transformed into a vector representation by exploiting the naturalness of the statement sequence. As a result, the vanishing and exploding gradient problems caused by a large AST are avoided when the AST is used to represent the source code. Finally, the correlation between bug reports and source files is analyzed comprehensively from three aspects (syntax, semantics, and text) to locate the buggy code. Experiments show that, compared with other standard models, the proposed model improves bug localization performance as measured by mean reciprocal rank (MRR), mean average precision (MAP), and Top-N rank.
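
For readers unfamiliar with the three evaluation measures named in the abstract, the following is a minimal sketch of how MRR, MAP, and Top-N rank are commonly computed for bug localization. It is not taken from the paper: the function names, the input layout (one ranked file list and one set of truly buggy files per bug report), and the file names in the usage example are all illustrative assumptions. In particular, average precision is normalised here by the number of buggy files that appear in the ranking, which is one common convention; the paper may use a different one.

```python
from typing import Dict, List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Return 1 / rank of the first truly buggy file, or 0.0 if none is ranked."""
    for rank, path in enumerate(ranked, start=1):
        if path in buggy:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], buggy: Set[str]) -> float:
    """Average precision@k over the ranks k at which a buggy file appears."""
    hits = 0
    precision_sum = 0.0
    for rank, path in enumerate(ranked, start=1):
        if path in buggy:
            hits += 1
            precision_sum += hits / rank
    # Assumption: normalise by the number of buggy files found in the ranking.
    return precision_sum / hits if hits else 0.0


def hit_at_n(ranked: Sequence[str], buggy: Set[str], n: int) -> bool:
    """True if at least one buggy file is among the first n ranked files."""
    return any(path in buggy for path in ranked[:n])


def evaluate(results: List[Tuple[Sequence[str], Set[str]]], n: int = 10) -> Dict[str, float]:
    """Aggregate the metrics over all bug reports.

    results holds one (ranked_files, buggy_files) pair per bug report.
    """
    if not results:
        raise ValueError("at least one bug report is required")
    count = len(results)
    return {
        "MRR": sum(reciprocal_rank(r, b) for r, b in results) / count,
        "MAP": sum(average_precision(r, b) for r, b in results) / count,
        f"Top-{n}": sum(hit_at_n(r, b, n) for r, b in results) / count,
    }


if __name__ == "__main__":
    # Hypothetical rankings for two bug reports (file names are made up).
    demo = [
        (["A.java", "B.java", "C.java"], {"B.java"}),
        (["X.java", "Y.java", "Z.java"], {"X.java", "Z.java"}),
    ]
    print(evaluate(demo, n=1))
```

All three measures are higher-is-better: MRR rewards placing the first buggy file near the top, MAP rewards placing all buggy files near the top, and Top-N rank is the fraction of bug reports with at least one buggy file among the first N results.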

Key words: bug report, abstract syntax tree, code representation, software bug localization