A Wavelet Transform and Spatial Positional Enhanced Method for Vision Transformer
Runyu HU, Xuesong TANG, Kuangrong HAO
Journal of Donghua University (English Edition), 2025, Vol. 42, Issue 3: 330-338.
In the vision transformer (ViT) architecture, image data are converted into sequential data for processing, which may cause spatial positional information to be lost. Although the self-attention mechanism strengthens the capacity of ViT to capture global features, it does so at the expense of fine-grained local feature information. To address these challenges, we propose a spatial positional enhancement module and a wavelet transform enhancement module tailored for ViT models. These modules aim to reduce the loss of spatial positional information during patch embedding and to enhance the model's feature extraction capabilities. The spatial positional enhancement module reinforces spatial information in the sequential data through convolutional operations and multi-scale feature extraction. The wavelet transform enhancement module uses multi-scale analysis and frequency decomposition to improve the ViT's understanding of global and local image structures, which also improves its ability to process complex structures and intricate image details. Experiments were conducted on the CIFAR-10, CIFAR-100, and ImageNet-1k datasets to compare the proposed method with advanced classification methods. The results show that the proposed model achieves higher classification accuracy, confirming its effectiveness and competitive advantage.
transformer / wavelet transform / image classification / computer vision
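The frequency decomposition mentioned in the abstract can be illustrated with a single-level 2D Haar transform, which splits an image into one coarse subband and three detail subbands at half resolution. This is only a minimal sketch: the abstract does not state which wavelet the authors use or how the subbands are fed into the ViT, so the Haar wavelet, the function name `haar_dwt2`, and the subband names are assumptions made for illustration.

```python
def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform (illustrative sketch).

    `image` is a 2D list of numbers with even height and width.
    Returns four half-resolution subbands:
    (approx, col_detail, row_detail, diag_detail).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")

    approx, col_detail, row_detail, diag_detail = [], [], [], []
    for r in range(0, rows, 2):
        a_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The four pixels of one non-overlapping 2x2 block.
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2)  # local average: coarse structure
            c_row.append((tl - tr + bl - br) / 2)  # change across columns
            r_row.append((tl + tr - bl - br) / 2)  # change across rows
            d_row.append((tl - tr - bl + br) / 2)  # diagonal change
        approx.append(a_row)
        col_detail.append(c_row)
        row_detail.append(r_row)
        diag_detail.append(d_row)
    return approx, col_detail, row_detail, diag_detail


# Usage: decompose a tiny 2x2 "image".
approx, col_detail, row_detail, diag_detail = haar_dwt2([[1, 2], [3, 4]])
print(approx, col_detail, row_detail, diag_detail)
# prints [[5.0]] [[-1.0]] [[-2.0]] [[0.0]]
```

The approximation subband carries the coarse, global structure, while the three detail subbands carry local edge information. Applying the same function again to `approx` gives the next, coarser scale, which is the sense in which a wavelet decomposition is multi-scale.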
National Natural Science Foundation of China (62176052)