Journal of Systems Engineering and Electronics, 2019, Vol. 30, Issue (1): 1-12. doi: 10.21629/JSEE.2019.01.01

• Electronics Technology

Multi-scale object detection by top-down and bottom-up feature pyramid network

Baojun ZHAO1,2, Boya ZHAO1,2, Linbo TANG1,2,*, Wenzheng WANG1,2, Chen WU1,2

  1. School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
  2. Beijing Key Laboratory of Embedded Real-time Information Processing Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2018-05-08; Online: 2019-02-27; Published: 2019-02-26
  • Contact: Linbo TANG E-mail:zbj@bit.edu.cn;zhaoboya@bit.edu.cn;tanglinbo@bit.edu.cn;wwz@bit.edu.cn;wuchen@gmail.com
  • About author:
    ZHAO Baojun was born in 1960. He received his Ph.D. degree in electromagnetic measurement technology and equipment from Harbin Institute of Technology (HIT), Harbin, China, in 1996. From 1996 to 1998, he was a postdoctoral fellow at Beijing Institute of Technology (BIT), Beijing, China. Since 1998, he has been engaged in teaching and research work at Radar Research Laboratory, BIT. His main research interests include image/video coding, image recognition, infrared/laser signal processing, and parallel signal processing. E-mail: zbj@bit.edu.cn
    ZHAO Boya was born in 1990. He received his B.Sc. degree from the School of Electrical Engineering and Information, Hebei University of Technology, Tianjin, China, in 2013. He is currently pursuing his Ph.D. degree with the School of Electrical and Information Engineering, Beijing Institute of Technology, Beijing, China. His current research interests include object detection, object tracking and machine learning. E-mail: zhaoboya@bit.edu.cn
    TANG Linbo was born in 1978. He received his B.Sc. degree in resources exploration engineering from Changchun University of Science and Technology, Changchun, China. He then received his M.Sc. degree in radio physics from China University of Petroleum, Beijing, China. Finally, he received his Ph.D. degree from the School of Electrical Engineering and Information, Hebei University of Technology, Tianjin, China, in 2005. Since 2005, he has been engaged in teaching and research work at Radar Research Laboratory, Beijing Institute of Technology. He has undertaken 863 and H863 projects. His research interests include image processing and real-time signal processing. E-mail: tanglinbo@bit.edu.cn
    WANG Wenzheng was born in 1988. He received his M.Sc. degree from the School of Electrical and Information Engineering, Beijing Institute of Technology, Beijing, China, in 2014. He is currently pursuing his Ph.D. degree with the School of Electrical and Information Engineering, Beijing Institute of Technology, Beijing, China. His current research interests include hyperspectral/optical imagery target detection, feature selection and machine learning. E-mail: wwz@bit.edu.cn
    WU Chen was born in 1994. He received his B.Sc. degree from the School of Electrical Engineering and Information, Xidian University, Xi'an, China, in 2017. He is currently pursuing his M.Sc. degree with the School of Electrical and Information Engineering, Beijing Institute of Technology, Beijing, China. His current research interests include object detection and machine learning. E-mail: wuchen@gmail.com
  • Supported by:
    This work was supported by the Program of Introducing Talents of Discipline to Universities (111 Plan) of China (B14010) and the National Natural Science Foundation of China (31727901).

Abstract:

With advances in object detection technology, especially deep neural networks, many related tasks, such as medical applications and industrial automation, have achieved great success. However, detecting objects with multiple aspect ratios and scales remains a key problem. This paper proposes a top-down and bottom-up feature pyramid network (TDBU-FPN), which combines multi-scale feature representation with anchor generation at multiple aspect ratios. First, to build the multi-scale feature maps, a number of fully convolutional layers are appended after the backbone. Second, to link neighboring feature maps, top-down and bottom-up flows are adopted: the top-down flow, implemented by deconvolution, introduces context information, and the bottom-up flow, implemented by pooling, supplements suboriginal information. Third, to adapt to different object aspect ratios, anchors of many shapes with different aspect ratios are placed on each multi-scale feature map. The proposed method is evaluated on the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) dataset and reaches an accuracy of 79%, a 1.8% improvement, at a detection speed of 23 fps.
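
The third step, placing anchors with several aspect ratios on every level of a multi-scale feature pyramid, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `make_anchors`, the pyramid sizes, the scales, and the aspect ratios are all hypothetical, and the width/height rule (w = s·sqrt(r), h = s/sqrt(r)) is the common SSD-style convention, assumed here only for illustration.

```python
import math


def make_anchors(fmap_size, scale, aspect_ratios):
    """Return normalized (cx, cy, w, h) anchors for one square feature map.

    Assumption: SSD-style anchors, where every aspect ratio keeps the same
    area (scale ** 2) and only the shape changes.
    """
    anchors = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Center of the current feature-map cell, in [0, 1] coordinates.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ratio in aspect_ratios:
                width = scale * math.sqrt(ratio)
                height = scale / math.sqrt(ratio)
                anchors.append((cx, cy, width, height))
    return anchors


# Hypothetical pyramid: (feature-map size, anchor scale) for each level.
pyramid = [(8, 0.2), (4, 0.4), (2, 0.6), (1, 0.8)]
# Hypothetical aspect ratios (width / height) shared by all levels.
ratios = [1.0, 2.0, 0.5]

all_anchors = [
    anchor
    for size, scale in pyramid
    for anchor in make_anchors(size, scale, ratios)
]
```

In this sketch, fine feature maps carry small anchors and coarse maps carry large ones, so each pyramid level covers a different object scale while the aspect ratios cover different object shapes within that scale.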

Key words: convolutional neural network (CNN), feature pyramid network (FPN), object detection, deconvolution