SFNIC: Hybrid Spatial-Frequency Information for Lightweight Neural Image Compression

Youneng Bao, Wen Tan, Mu Li, Jiacong Chen, Qingyu Mao, Yongsheng Liang

CAAI Transactions on Intelligence Technology, 2025, Vol. 10, Issue 6: 1717-1730. DOI: 10.1049/cit2.70034

ORIGINAL RESEARCH


Abstract

Neural image compression (NIC) has shown remarkable rate-distortion (R-D) efficiency. However, the considerable computational and spatial complexity of most NIC methods presents deployment challenges on resource-constrained devices. We introduce a lightweight neural image compression framework designed to efficiently process both local and global information. In this framework, a convolutional branch extracts local information, whereas a frequency-domain branch extracts global information. To capture global information without the high computational cost of dense pixel operations such as attention mechanisms, the Fourier transform is employed, allowing global information to be manipulated in the frequency domain. Additionally, we employ feature shift operations to acquire large receptive fields without any computational cost, thus circumventing the need for large-kernel convolution. Our framework achieves a superior balance between rate-distortion performance and complexity. On test sets of varying resolution, our method not only achieves R-D performance on par with versatile video coding (VVC) intra and other state-of-the-art (SOTA) NIC methods but also exhibits the lowest computational requirements, at approximately 200 KMACs/pixel. The code will be available at https://github.com/baoyu2020/SFNIC.
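The abstract's frequency-domain branch rests on a standard idea: a point-wise product in the Fourier domain couples every spatial location at once, giving a global receptive field at FFT cost rather than the quadratic cost of dense attention. The paper's exact layer design is not given in this excerpt; the following is a minimal numpy sketch of the general mechanism, with the function name, weight shapes, and learnable-filter form all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def frequency_branch(x, w_real, w_imag):
    """Apply a learnable element-wise filter in the frequency domain.

    x: feature map of shape (C, H, W).
    w_real, w_imag: per-frequency weights of shape (C, H, W//2 + 1),
    matching the output shape of a real 2-D FFT.

    A point-wise multiply in the spectral domain is a circular
    convolution in the spatial domain, so every output pixel depends on
    every input pixel (global information) at O(HW log HW) cost.
    """
    X = np.fft.rfft2(x, axes=(-2, -1))           # (C, H, W//2 + 1), complex
    X = X * (w_real + 1j * w_imag)               # element-wise spectral filter
    return np.fft.irfft2(X, s=x.shape[-2:], axes=(-2, -1))

# Sanity check: unit real weights and zero imaginary weights give the
# identity filter, so the branch returns x unchanged.
x = np.random.randn(4, 8, 8)
w_real = np.ones((4, 8, 5))
w_imag = np.zeros((4, 8, 5))
y = frequency_branch(x, w_real, w_imag)
print(np.allclose(x, y))  # True
```

In a trained model the spectral weights would be learned parameters; here they are fixed to the identity only to make the round-trip verifiable.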
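The "feature shift" claim in the abstract (large receptive fields at zero computational cost) follows the familiar shift-operation pattern: displace channel groups spatially by one pixel so that a subsequent 1x1 convolution mixes information from neighbouring positions. The sketch below, assuming a four-direction split with zero padding (the grouping and directions are illustrative assumptions, not the paper's exact scheme), shows why the shift itself needs no multiply-accumulate operations.

```python
import numpy as np

def feature_shift(x):
    """Zero-parameter spatial shift over four channel groups.

    x: feature map of shape (C, H, W) with C divisible by 4.
    Each quarter of the channels is shifted one pixel left, right, up,
    or down; vacated rows/columns are zero-filled. The shift is pure
    memory movement (no MACs); a following 1x1 convolution would then
    see each pixel's four neighbours, emulating a larger kernel.
    """
    out = np.zeros_like(x)
    g = x.shape[0] // 4
    out[:g, :, :-1] = x[:g, :, 1:]            # group 0: shift left
    out[g:2*g, :, 1:] = x[g:2*g, :, :-1]      # group 1: shift right
    out[2*g:3*g, :-1, :] = x[2*g:3*g, 1:, :]  # group 2: shift up
    out[3*g:, 1:, :] = x[3*g:, :-1, :]        # group 3: shift down
    return out

x = np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2)
y = feature_shift(x)
print(y.shape)  # (4, 2, 2)
```

After the shift, position (i, j) in group 0 holds the value that was at (i, j+1), so a 1x1 convolution across all groups aggregates a cross-shaped neighbourhood for free.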

Keywords

deep learning / image coding / neural network / video coding

Cite this article

Youneng Bao, Wen Tan, Mu Li, Jiacong Chen, Qingyu Mao, Yongsheng Liang. SFNIC: Hybrid Spatial-Frequency Information for Lightweight Neural Image Compression. CAAI Transactions on Intelligence Technology, 2025, 10(6): 1717-1730. DOI: 10.1049/cit2.70034



Funding

National Natural Science Foundation of China (Grants 62031013, 62102339, 62472124)

Guangdong Province Key Construction Discipline Scientific Research Capacity Improvement Project (Grant 2022ZDJS117)

Shenzhen Colleges and Universities Stable Support Programme (Grant GXWD20220811170130002)

Shenzhen Science and Technology Programme (Grant RCBS20221008093121052)
