End-to-end congestion control in datacenter networks: a survey

Zejia ZHOU , Shan HUANG , Dezun DONG , Yang BAI , Liquan XIAO

Front. Comput. Sci. ›› 2026, Vol. 20 ›› Issue (5) : 2005501

DOI: 10.1007/s11704-025-40212-y
Networks and Communication
REVIEW ARTICLE

Abstract

High-performance data center network infrastructure and hardware are evolving rapidly to provide a high-quality platform for increasingly complex and diverse cloud applications. However, equipment upgrades alone cannot fully address the challenges posed by rapidly changing internal traffic patterns. Efficient collaboration among congestion control, load balancing, flow scheduling, and other technologies is essential to improve traffic transmission performance in data centers. In this survey, we analyze the challenges of congestion control in data center networks. We reclassify congestion control protocols from temporal and spatial perspectives and propose a classification method, the data and credit and end-to-end feedback (DCEF) congestion control framework. We describe and summarize the features of each category, and we compare representative congestion control protocols in terms of performance, convergence, deployment, and other aspects. Finally, we discuss future directions for congestion control.

Keywords

datacenters / congestion control / network protocol

Cite this article

Zejia ZHOU, Shan HUANG, Dezun DONG, Yang BAI, Liquan XIAO. End-to-end congestion control in datacenter networks: a survey. Front. Comput. Sci., 2026, 20(5): 2005501 DOI:10.1007/s11704-025-40212-y

RIGHTS & PERMISSIONS

Higher Education Press
