Journal of Systems Engineering and Electronics, 2019, Vol. 30, Issue (1): 1-12. doi: 10.21629/JSEE.2019.01.01
Baojun ZHAO^1,2, Boya ZHAO^1,2, Linbo TANG^1,2,*, Wenzheng WANG^1,2, Chen WU^1,2
Received: 2018-05-08
Online: 2019-02-27
Published: 2019-02-26
Contact: Linbo TANG
E-mail: zbj@bit.edu.cn; zhaoboya@bit.edu.cn; tanglinbo@bit.edu.cn; wwz@bit.edu.cn; wuchen@gmail.com
About author: ZHAO Baojun was born in 1960. He received his Ph.D. degree in electromagnetic measurement technology and equipment from Harbin Institute of Technology (HIT), Harbin, China, in 1996. From 1996 to 1998, he was a postdoctoral fellow at Beijing Institute of Technology (BIT), Beijing, China. Since 1998, he has been engaged in teaching and research work at the Radar Research Laboratory, BIT. His main research interests include image/video coding, image recognition, infrared/laser signal processing, and parallel signal processing.
Baojun ZHAO, Boya ZHAO, Linbo TANG, Wenzheng WANG, Chen WU. Multi-scale object detection by top-down and bottom-up feature pyramid network[J]. Journal of Systems Engineering and Electronics, 2019, 30(1): 1-12.
Feature | Height | Width | Number |
Conv3_3 | 0.035 4, 0.028 9, 0.070 7, 0.086 6, 0.050 0, 0.070 7 | 0.070 7, 0.086 6, 0.035 4, 0.028 9, 0.050 0, 0.070 7 | 33 750 |
Conv4_3 | 0.070 7, 0.057 7, 0.141 4, 0.173 2, 0.100 0, 0.141 4 | 0.141 4, 0.173 2, 0.070 7, 0.057 7, 0.100 0, 0.141 4 | 8 664 |
Conv6_2 | 0.141 4, 0.115 5, 0.282 8, 0.346 4, 0.200 0, 0.278 4 | 0.282 8, 0.346 4, 0.141 4, 0.115 5, 0.200 0, 0.278 4 | 2 166 |
Conv7_2 | 0.274 0, 0.223 7, 0.548 0, 0.671 2, 0.387 5, 0.472 0 | 0.548 0, 0.671 2, 0.274 0, 0.223 7, 0.387 5, 0.472 0 | 600 |
Conv8_2 | 0.406 6, 0.332 0, 0.813 2, 0.995 9, 0.575 0, 0.662 1 | 0.813 2, 0.995 9, 0.406 6, 0.332 0, 0.575 0, 0.662 1 | 150 |
Conv9_2 | 0.539 2, 0.440 2, 1.078 3, 1.320 7, 0.762 5, 0.851 1 | 1.078 3, 1.320 7, 0.539 2, 0.440 2, 0.762 5, 0.851 1 | 54 |
Conv10_2 | 0.671 8, 0.548 5, 1.343 5, 1.645 4, 0.950 0, 1.039 5 | 1.343 5, 1.645 4, 0.671 8, 0.548 5, 0.950 0, 1.039 5 | 6 |
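The height and width entries in the table above appear to follow the SSD default-box rule. For a base scale s_k and aspect ratio a (width/height), the box width is s_k·√a and the height is s_k/√a. Each location also gets one extra square box of side √(s_k·s_{k+1}).

The Python sketch below assumes that rule and the following parameters, none of which is stated in this section:

- Aspect ratios in the order 2, 3, 1/2, 1/3, 1, followed by the extra box.
- Base scales read off the 1:1 entries: 0.05, 0.1, 0.2, 0.387 5, 0.575, 0.762 5 and 0.95. The value 1.137 5 is back-solved from the last Conv10_2 entry.
- Square feature maps of side 75, 38, 19, 10, 5, 3 and 1, inferred from the box counts.

`LAYERS`, `default_boxes` and `box_count` are hypothetical names.

```python
import math

# Hypothetical per-layer parameters inferred from the table, not stated in the text:
# (layer name, feature-map side, base scale s_k, next scale s_{k+1}).
LAYERS = [
    ("Conv3_3", 75, 0.05, 0.1),
    ("Conv4_3", 38, 0.1, 0.2),
    ("Conv6_2", 19, 0.2, 0.3875),
    ("Conv7_2", 10, 0.3875, 0.575),
    ("Conv8_2", 5, 0.575, 0.7625),
    ("Conv9_2", 3, 0.7625, 0.95),
    ("Conv10_2", 1, 0.95, 1.1375),  # 1.1375 back-solved from the last extra box
]

# Assumed aspect ratios a = width / height, in the column order the table appears to use.
ASPECT_RATIOS = (2.0, 3.0, 1 / 2, 1 / 3, 1.0)


def default_boxes(s_k, s_next):
    """Return (heights, widths) of the default boxes at one location, SSD-style."""
    heights = [s_k / math.sqrt(a) for a in ASPECT_RATIOS]
    widths = [s_k * math.sqrt(a) for a in ASPECT_RATIOS]
    extra = math.sqrt(s_k * s_next)  # extra square box between adjacent scales
    heights.append(extra)
    widths.append(extra)
    return heights, widths


def box_count(side):
    """Boxes on a side x side feature map: one full set per spatial location."""
    return side * side * (len(ASPECT_RATIOS) + 1)


if __name__ == "__main__":
    total = 0
    for name, side, s_k, s_next in LAYERS:
        heights, widths = default_boxes(s_k, s_next)
        n = box_count(side)
        total += n
        print(name, [round(h, 4) for h in heights], [round(w, 4) for w in widths], n)
    print("total boxes:", total)
```

Under these assumptions the sketch reproduces the tabulated heights and widths to four decimal places. The fourth Conv3_3 height comes out as 0.086 6, and each layer's count is side × side × 6. For Conv4_3 that gives 38 × 38 × 6 = 8 664 boxes, and the seven layers total 45 390.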
"
Feature | Confidence kernel | Location kernel |
Conv3_3 | | |
Conv4_3 | | |
Conv6_2 | | |
Conv7_2 | | |
Conv8_2 | | |
Conv9_2 | | |
Conv10_2 | | |
Class (AP/%) | Faster R-CNN | ION | R-FCN | SSD300 | MR-CNN | TDBU-FPN |
Aeroplane | 76.5 | 79.2 | 79.0 | 81.0 | 80.3 | 82.6 |
Bicycle | 79.0 | 83.1 | 80.3 | 84.2 | 84.1 | 84.5 |
Bird | 70.9 | 77.6 | 76.6 | 76.7 | 78.5 | 78.6 |
Boat | 65.5 | 65.6 | 67.0 | 72.1 | 70.8 | 75.9 |
Bottle | 52.1 | 54.9 | 63.7 | 51.7 | 68.5 | 61.5 |
Bus | 83.1 | 85.4 | 84.8 | 86.1 | 88.0 | 85.5 |
Car | 84.7 | 85.1 | 85.6 | 86.1 | 85.9 | 86.9 |
Cat | 86.4 | 87.0 | 89.1 | 85.0 | 87.8 | 85.5 |
Chair | 52.0 | 54.4 | 62.2 | 63.0 | 60.3 | 64.1 |
Cow | 81.9 | 80.6 | 85.3 | 82.0 | 85.2 | 81.6 |
Dining_table | 65.7 | 73.8 | 67.9 | 76.9 | 73.7 | 78.1 |
Dog | 84.8 | 85.3 | 87.3 | 85.5 | 87.2 | 86.7 |
Horse | 84.6 | 82.2 | 86.6 | 87.3 | 86.5 | 88.5 |
Motorbike | 77.5 | 82.2 | 82.8 | 84.8 | 85.0 | 85.2 |
Person | 76.7 | 74.4 | 79.0 | 78.8 | 76.4 | 50.1 |
Potted_plant | 38.8 | 47.1 | 51.0 | 50.4 | 48.5 | 58.1 |
Sheep | 73.6 | 75.8 | 77.6 | 77.2 | 76.3 | 78.8 |
Sofa | 73.9 | 72.7 | 75.2 | 80.2 | 75.5 | 78.7 |
Train | 83.0 | 84.2 | 83.5 | 87.6 | 85.0 | 88.5 |
Tvmonitor | 72.6 | 80.4 | 76.5 | 76.2 | 81.0 | 77.8 |
mAP | 73.2 | 75.6 | 77.0 | 77.2 | 78.2 | 79.0 |
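The mAP row is the unweighted mean of the 20 per-class APs, following the PASCAL VOC convention. As a quick consistency check, the Python sketch below averages the Faster R-CNN column as printed. `FASTER_RCNN_AP` and `mean_ap` are hypothetical names, and the values are copied top to bottom in table order.

```python
# Per-class AP (%) of the Faster R-CNN column, top to bottom as in the table above.
FASTER_RCNN_AP = [
    76.5, 79.0, 70.9, 65.5, 52.1, 83.1, 84.7, 86.4, 52.0, 81.9,
    65.7, 84.8, 84.6, 77.5, 76.7, 38.8, 73.6, 73.9, 83.0, 72.6,
]


def mean_ap(per_class_ap):
    """mAP as the unweighted mean of the per-class average precisions."""
    if not per_class_ap:
        raise ValueError("need at least one class")
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    print(f"mAP = {mean_ap(FASTER_RCNN_AP):.3f}")
```

The mean is 73.165, which rounds to the 73.2 reported in the mAP row.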
1 | LOWE D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60 (2): 91-110. |
2 | DALAL N, TRIGGS B. Histograms of oriented gradients for human detection. Proc. of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, 886-893. |
3 | RÄTSCH G, ONODA T, MÜLLER K R. Soft margins for AdaBoost. Machine Learning, 2001, 42 (3): 287-320. |
4 | BREIMAN L. Random forests. Machine Learning, 2001, 45 (1): 5-32. |
5 | SUYKENS J A K, VANDEWALLE J. Least squares support vector machine classifiers. Neural Processing Letters, 1999, 9 (3): 293-300. doi: 10.1023/A:1018628609742 |
6 | FELZENSZWALB P, MCALLESTER D, RAMANAN D. A discriminatively trained, multiscale, deformable part model. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2008, 1-8. |
7 | ZITNICK C L, DOLLÁR P. Edge boxes: locating object proposals from edges. Proc. of the European Conference on Computer Vision, 2014, 391-405. |
8 | UIJLINGS J R R, VAN DE SANDE K E A, GEVERS T, et al. Selective search for object recognition. International Journal of Computer Vision, 2013, 104 (2): 154-171. |
9 | RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015, 115 (3): 211-252. |
10 | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks. Proc. of the Advances in Neural Information Processing Systems, 2012, 1097-1105. |
11 | LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989, 1 (4): 541-551. doi: 10.1162/neco.1989.1.4.541 |
12 | RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by back-propagating errors. Nature, 1986, 323 (6088): 533. doi: 10.1038/323533a0 |
13 | GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, 580-587. |
14 | SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition. Proc. of the International Conference on Learning Representations, 2015, 1-14. |
15 | GIRSHICK R. Fast R-CNN. Proc. of the IEEE International Conference on Computer Vision, 2015, 1440-1448. |
16 | REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. Proc. of the Advances in Neural Information Processing Systems, 2015, 91-99. |
17 | REDMON J, DIVVALA S, GIRSHICK R, et al. You Only Look Once: unified, real-time object detection. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, 779-788. |
18 | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector. Proc. of the European Conference on Computer Vision, 2016, 21-37. |
19 | FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, 1-11. |
20 | SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, 1-9. |
21 | BELL S, ZITNICK C L, BALA K, et al. Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, 2874-2883. |
22 | DAI J, LI Y, HE K, et al. R-FCN: object detection via region-based fully convolutional networks. Proc. of the Advances in Neural Information Processing Systems, 2016, 379-387. |
23 | HONG S, ROH B, KIM K H, et al. PVANet: lightweight deep neural networks for real-time object detection. Proc. of the Conference and Workshop on Neural Information Processing Systems, 2016, 1-7. |
24 | LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, 2117-2125. |
25 | NAIR V, HINTON G E. Rectified linear units improve restricted Boltzmann machines. Proc. of the 27th International Conference on Machine Learning, 2010, 807-814. |
26 | IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift. Proc. of the International Conference on Machine Learning, 2015, 448-456. |
27 | HU P, RAMANAN D. Finding tiny faces. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, 1522-1530. |
28 | LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation. IEEE Trans. on Pattern Analysis & Machine Intelligence, 2017, 39 (4): 640-651. |
29 | ERHAN D, SZEGEDY C, TOSHEV A, et al. Scalable object detection using deep neural networks. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, 2147-2154. |
30 | SZEGEDY C, REED S, ERHAN D, et al. Scalable, high-quality object detection. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, 1-8. |
31 | SHRIVASTAVA A, GUPTA A, GIRSHICK R. Training region-based object detectors with online hard example mining. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, 761-769. |
32 | HUBER P J. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 1964, 35 (1): 73-101. doi: 10.1214/aoms/1177703732 |
33 | EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 2010, 88 (2): 303-338. |
34 | YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions. Proc. of the International Conference on Learning Representations, 2016, 1-13. |
35 | GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks. Proc. of the 13th International Conference on Artificial Intelligence and Statistics, 2010, 249-256. |
36 | HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN. Proc. of the IEEE International Conference on Computer Vision, 2017, 2980-2988. |