
Journal of Systems Engineering and Electronics, 2026, Vol. 37, Issue (1): 84-93. doi: 10.23919/JSEE.2025.000165
• ELECTRONICS TECHNOLOGY •
Yu LIU, Diaoyin TAN, Wen ZHOU, Huaxin XIAO
Received: 2022-12-24
Accepted: 2025-11-08
Online: 2026-02-18
Published: 2026-03-09
Contact: Huaxin XIAO
E-mail: 704985427@qq.com; zhouwen@nudt.edu.cn; xiaohuaxin@nudt.edu.cn
Cite this article: Yu LIU, Diaoyin TAN, Wen ZHOU, Huaxin XIAO. Enhancing convolution for Transformer-based weakly supervised semantic segmentation [J]. Journal of Systems Engineering and Electronics, 2026, 37(1): 84-93.
Table 1
Comparison with state-of-the-art methods on the PASCAL VOC val set in terms of mIoU (%)
| Method | Backbone | Supervision | val mIoU (%) |
| IRN (CVPR19) | ResNet50 | I | 63.5 |
| SEAM (CVPR20) | ResNet38 | I | 64.5 |
| CONTA (NIPS20) | ResNet38 | I | 66.1 |
| ICD (CVPR20) | ResNet101 | I+S | 67.8 |
| AdvCAM (CVPR21) | ResNet101 | I | 68.1 |
| CPN (CVPR21) | ResNet38 | I | 67.8 |
| AuxSegNet (ICCV21) | ResNet38 | I+S | 69.0 |
| URN (AAAI22) | ResNet38 | I | 69.4 |
| TransCAM* | ResNet38 | I | 68.3 |
| ECCAM | ResNet38 | I | 70.2 (+1.9) |
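For readers unfamiliar with the metric in Table 1, the following is a minimal sketch of how mean intersection over union (mIoU) is commonly computed from a confusion matrix. It is an illustration only, not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou`, the ignore label 255, and the toy label maps are all assumptions introduced here.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Count (ground-truth class, predicted class) pairs over valid pixels.

    `ignore_index` is an assumed convention for unlabeled pixels.
    """
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    valid = gt != ignore_index
    # Encode each (gt, pred) pair as a single index, then histogram.
    idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Average the per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    conf = conf.astype(np.float64)
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0  # skip classes absent from both gt and prediction
    return float((tp[present] / union[present]).mean())


# Toy example with 3 classes; one pixel carries the ignore label.
gt = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
conf = confusion_matrix(gt, pred, num_classes=3)
miou = mean_iou(conf)
print(f"mIoU = {100 * miou:.1f}%")
```

In this toy case the per-class IoUs are 1/2, 2/3, and 1. In Table 1, the "(+1.9)" in the ECCAM row matches the difference between the ECCAM and TransCAM* rows (70.2 versus 68.3).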
References
| 1 | LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 3431–3440. |
| 2 | CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834–848. |
| 3 | ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2881–2890. |
| 4 | HE J J, DENG Z Y, ZHOU L, et al. Adaptive pyramid context network for semantic segmentation. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 7519–7528. |
| 5 | ZHOU B, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2921–2929. |
| 6 | AHN J, CHO S, KWAK S. Weakly supervised learning of instance segmentation with inter-pixel relations. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 2209–2218. |
| 7 | WANG Y D, ZHANG J, KAN M, et al. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 12275–12284. |
| 8 | WEI Y C, LIANG X D, CHEN Y P, et al. STC: a simple to complex framework for weakly-supervised semantic segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2016, 39(11): 2314–2320. |
| 9 | LEE J, KIM E, LEE S, et al. Ficklenet: weakly and semi-supervised semantic image segmentation using stochastic inference. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 5267–5276. |
| 10 | KIRILLOV A, GIRSHICK R, HE K, et al. Panoptic feature pyramid networks. IEEE, 2019. DOI: 10.1109/CVPR.2019.00656. |
| 11 | ZHOU T, ZHANG M, ZHAO F, et al. Regional semantic contrast and aggregation for weakly supervised semantic segmentation. 2022. DOI: 10.48550/arXiv.2203.09653. |
| 12 | WEI Y C, FENG J S, LIANG X D, et al. Object region mining with adversarial erasing: a simple classification to semantic segmentation approach. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 1568–1576. |
| 13 | HOU Q B, JIANG P T, WEI Y C, et al. Self-erasing network for integral object attention. Advances in Neural Information Processing Systems, 2018. DOI:10.48550/arXiv.1810.09821. |
| 14 | JIANG P T, HOU Q B, CAO Y, et al. Integral object mining via online attention accumulation. Proc. of the IEEE/CVF International Conference on Computer Vision, 2019: 2070–2079. |
| 15 | ZHANG F, GU C C, ZHANG C Y, et al. Complementary patch for weakly supervised semantic segmentation. Proc. of the IEEE/CVF International Conference on Computer Vision, 2021: 7242–7251. |
| 16 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need. https://arxiv.org/abs/1706.03762. |
| 17 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale. https://arxiv.org/abs/2010.11929. |
| 18 | LI R, MAI Z, TRABELSI C, et al. TransCAM: transformer attention-based CAM refinement for weakly supervised semantic segmentation. Journal of Visual Communication and Image Representation. 2023, 92: 103800. |
| 19 | XU L, OUYANG W, BENNAMOUN M, et al. Multi-class token Transformer for weakly supervised semantic segmentation. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 4310–4319. |
| 20 | RU L X, ZHAN Y B, YU B S, et al. Learning affinity from attention: end-to-end weakly-supervised semantic segmentation with Transformers. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 16846–16855. |
| 21 | PENG Z L, HUANG W, GU S Z, et al. Conformer: local features coupling global representations for visual recognition. Proc. of the IEEE/CVF International Conference on Computer Vision, 2021: 367–376. |
| 22 | HOU Q B, JIANG P T, WEI Y C, et al. Self-erasing network for integral object attention. https://arxiv.org/abs/1810.09821. |
| 23 | ZHANG X L, WEI Y C, FENG J S, et al. Adversarial complementary learning for weakly supervised object localization. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 1325–1334. |
| 24 | KIM B, HAN S, KIM J. Discriminative region suppression for weakly-supervised semantic segmentation. Proc. of the AAAI Conference on Artificial Intelligence, 2021: 1754–1761. |
| 25 | GAO W, WAN F, PAN X J, et al. TS-CAM: token semantic coupled attention map for weakly supervised object localization. Proc. of the IEEE/CVF International Conference on Computer Vision, 2021: 2886–2895. |
| 26 | CHEN Z W, WANG C G, WANG Y B, et al. LCTR: on awakening the local continuity of transformer for weakly supervised object localization. Proc. of the AAAI Conference on Artificial Intelligence, 2022, 36(1): 410–418. DOI: 10.1609/aaai.v36i1.19918. |
| 27 | SRINIVAS A, LIN T Y, PARMAR N, et al. Bottleneck transformers for visual recognition. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 16519–16529. |
| 28 | ZHAO Y C, WANG G T, TANG C X, et al. A battle of network structures: an empirical study of CNN, transformer, and MLP. https://arxiv.org/abs/2108.13002. |
| 29 | GUO J Y, HAN K, WU H, et al. CMT: convolutional neural networks meet vision transformers. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 12175–12185. |
| 30 | LI Y H, YAO T, PAN Y W, et al. Contextual transformer networks for visual recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2023, 45(2): 1489–1500. |
| 31 | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770–778. |
| 32 | HENDRYCKS D, GIMPEL K. Gaussian error linear units (GELUs). https://arxiv.org/abs/1606.08415. |
| 33 | SIFRE L, MALLAT S. Rigid-motion scattering for texture classification. https://arxiv.org/abs/1403.1687. |
| 34 | EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 2010, 88(2): 303–338. DOI: 10.1007/s11263-009-0275-4. |
| 35 | HARIHARAN B, ARBELAEZ P, BOURDEV L, et al. Semantic contours from inverse detectors. Proc. of the International Conference on Computer Vision, 2011: 991–998. |
| 36 | RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015, 115(3): 211–252. DOI: 10.1007/s11263-015-0816-y. |
| 37 | LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization. https://arxiv.org/abs/1711.05101. |
| 38 | AHN J, KWAK S. Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 4981–4990. |
| 39 | KRAHENBUHL P, KOLTUN V. Efficient inference in fully connected CRFs with Gaussian edge potentials. Proc. of the 25th International Conference on Neural Information Processing Systems, 2011: 109–117. |
| 40 | OH S J, BENENSON R, KHOREVA A, et al. Exploiting saliency for object segmentation from image level labels. Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 5038–5047. |
| 41 | ZHANG D, ZHANG H W, TANG J H, et al. Causal intervention for weakly-supervised semantic segmentation. Advances in Neural Information Processing Systems, 2020, 33: 655–666. |
| 42 | FAN J S, ZHANG Z X, SONG C F, et al. Learning integral objects with intra-class discriminator for weakly-supervised semantic segmentation. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 4283–4292. |
| 43 | LEE J, KIM E, YOON S. Anti-adversarially manipulated attributions for weakly and semi-supervised semantic segmentation. Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 4071–4080. |
| 44 | XU L, OUYANG W, BENNAMOUN M, et al. Leveraging auxiliary tasks with affinity learning for weakly supervised semantic segmentation. Proc. of the IEEE/CVF International Conference on Computer Vision, 2021: 6984–6993. |
| 45 | LI Y, DUAN Y Q, KUANG Z H, et al. Uncertainty estimation via response scaling for pseudo-mask noise mitigation in weakly-supervised semantic segmentation. Proc. of the AAAI Conference on Artificial Intelligence, 2022: 1447–1455. DOI: 10.1609/aaai.v36i2.20034. |