Striking the mantissa: how few bits are enough for accurate DNN inference?

Zhiyuan ZHANG, Ping ZHANG, Zhihua FAN, Wenming LI, Xiaochun YE, Xuejun AN

Front. Comput. Sci., 2027, Vol. 21, Issue (5): 2105105

DOI: 10.1007/s11704-025-51210-5

Architecture

LETTER

Cite this article

Zhiyuan ZHANG, Ping ZHANG, Zhihua FAN, Wenming LI, Xiaochun YE, Xuejun AN. Striking the mantissa: how few bits are enough for accurate DNN inference? Front. Comput. Sci., 2027, 21(5): 2105105. DOI: 10.1007/s11704-025-51210-5

RIGHTS & PERMISSIONS

Higher Education Press
