Weakly supervised semantic segmentation (WSSS) is a challenging task in which only image-level category labels are available to supervise segmentation prediction. The key stage of WSSS is therefore the generation of pseudo labels. Convolutional neural network (CNN) based methods typically use class activation mapping (CAM) to obtain pseudo labels, but CAM concentrates only on the most discriminative parts of objects. Recently, transformer-based methods have used attention maps from the multi-head self-attention (MHSA) module to predict pseudo labels, which usually contain obvious background noise and incoherent object areas. To address these problems, we adopt Conformer as our backbone, a parallel network composed of a CNN branch and a Transformer branch. The two branches generate and refine pseudo labels independently, effectively combining the advantages of CNNs and Transformers. However, the information exchange between the parallel branches is not close enough, so the pseudo labels can lack detail and background noise still remains. To alleviate this problem, we propose the enhancing convolution CAM (ECCAM) model, which contains three improved convolution-based modules: a deeper stem (DStem), a convolutional feed-forward network (CFFN), and a feature coupling unit with convolution (FCUConv). ECCAM gives Conformer a tighter interaction between its CNN and Transformer branches. Experiments verify that the proposed modules help the network perceive more local information from images, making the final segmentation results more refined. Compared with similar architectures, our modules greatly improve semantic segmentation performance, achieving 70.2% mean intersection over union (mIoU) on the PASCAL VOC 2012 dataset.