Journal of Systems Engineering and Electronics, 2021, Vol. 32, Issue (2): 389-398. doi: 10.23919/JSEE.2021.000032

• ELECTRONICS TECHNOLOGY •

RFC: a feature selection algorithm for software defect prediction

Xiaolong XU1, Wen CHEN2, Xinheng WANG3,*

    1 Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    2 Institute of Big Data Research at Yancheng, Nanjing University of Posts and Telecommunications, Yancheng 224000, China
    3 School of Computing and Engineering, University of West London, London W5 5RF, UK
  • Received: 2020-01-20 Online: 2021-04-29 Published: 2021-04-29
  • Contact: Xinheng WANG E-mail: xuxl@njupt.edu.cn; 1216043012@njupt.edu.cn; xinheng.wang@uwl.ac.uk
  • About author: XU Xiaolong was born in 1977. He received his B.S. degree in computer and its applications, M.S. degree in computer software and theories and Ph.D. degree in communications and information systems at Nanjing University of Posts and Telecommunications, Nanjing, China, in 1999, 2002 and 2008, respectively. He worked as a postdoctoral researcher at Station of Electronic Science and Technology, Nanjing University of Posts and Telecommunications from 2011 to 2013. He is currently a professor in College of Computer, Nanjing University of Posts and Telecommunications. He is a senior member of China Computer Federation. His current research interests include cloud computing and big data, mobile computing, intelligent agent and information security. E-mail: xuxl@njupt.edu.cn
    CHEN Wen was born in 1994. He received his B.E. degree in computer science and technology from Anhui Engineering University, Wuhu, China, in 2016. He works as an engineer in Institute of Big Data Research at Yancheng, Nanjing University of Posts and Telecommunications, Yancheng, China. His research interest is data analysis. E-mail: 1216043012@njupt.edu.cn
    WANG Xinheng was born in 1968. He received his B.E. and M.S. degrees in electrical engineering from Xi’an Jiaotong University, Xi’an, China, in 1991 and 1994, respectively, and Ph.D. degree in computing and electronics from Brunel University, Uxbridge, UK, in 2001. He is currently a professor of networks with the School of Computing and Engineering, University of West London, London, UK. His current research interests include wireless networks, Internet of Things, converged indoor positioning, cloud computing, and applications of wireless and computing technologies for health care. E-mail: xinheng.wang@uwl.ac.uk
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2018YFB1003702) and the National Natural Science Foundation of China (62072255).

Abstract:

Software defect prediction (SDP) statistically analyzes historical defect data to find the distribution pattern of past defects, so that defects in new software can be predicted effectively. However, software defect datasets contain redundant and irrelevant features that degrade the performance of defect predictors. To identify and remove these features, we propose ReliefF-based clustering (RFC), a cluster-based feature selection algorithm. RFC calculates the correlation between features based on symmetric uncertainty. According to the degree of correlation, it partitions the features into k clusters using the k-medoids algorithm, and finally selects representative features from each cluster to form the final feature subset. In the experiments, we compare RFC with classical feature selection algorithms on nine National Aeronautics and Space Administration (NASA) software defect prediction datasets in terms of area under the curve (AUC) and F-value. The experimental results show that RFC can effectively improve the performance of SDP.

Key words: software defect prediction (SDP), feature selection, cluster
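
The clustering steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: features are taken to be already discretized, the distance between two features is taken as 1 - SU, the k-medoids routine is a plain alternating variant started from the first k features, and the representative of each cluster is the member with the highest score in a hypothetical `relevance` mapping (a stand-in for, say, a ReliefF weight; the abstract does not state how representatives are chosen).

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both features constant; convention assumed for this sketch
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Returns a dict mapping each medoid index to the indices in its cluster.
    Initial medoids are the first k items (assumption, for determinism).
    """
    n = len(dist)
    medoids = list(range(k))
    clusters = {}
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i in range(n):
            # A medoid always stays in its own cluster, so no cluster is empty.
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # New medoid of a cluster: the member with the smallest total distance.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


def select_features(columns, k, relevance):
    """Cluster features by 1 - SU and keep one representative per cluster.

    columns:   dict of feature name -> list of discrete values
    relevance: dict of feature name -> score (hypothetical, e.g. a ReliefF weight)
    """
    names = list(columns)
    dist = [
        [1.0 - symmetric_uncertainty(columns[a], columns[b]) for b in names]
        for a in names
    ]
    clusters = k_medoids(dist, k)
    return sorted(
        names[max(members, key=lambda i: relevance[names[i]])]
        for members in clusters.values()
    )
```

Using 1 - SU as the distance means that strongly correlated (redundant) features fall into the same cluster, so keeping one feature per cluster removes redundancy while the relevance score decides which member survives.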