Striking the mantissa: how few bits are enough for accurate DNN inference?

Zhiyuan ZHANG , Ping ZHANG , Zhihua FAN , Wenming LI , Xiaochun YE , Xuejun AN

Front. Comput. Sci., 2027, Vol. 21, Issue 5: 2105105

DOI: 10.1007/s11704-025-51210-5

Architecture

LETTER



RIGHTS & PERMISSIONS

Higher Education Press
