BAFT: bubble-aware fault-tolerant framework for distributed DNN training with hybrid parallelism
Linked article:
BAFT: bubble-aware fault-tolerant framework for distributed DNN training with hybrid parallelism
Runzhe CHEN, Guandong LU, Yakai WANG, Rui ZHANG, Zheng HU, Yanming MIAO, Zhifang CAI, Jingwen LENG, Minyi GUO
Front. Comput. Sci., 2025, Vol. 19(1): 191102