HyRAS: a hybrid redundancy- and serialization-based fault-tolerant architecture for through-silicon vias

Chenglong SUN, Yanqing ZHOU, Qi WANG, Yan ZHANG

Eng Inform Technol Electron Eng, 2026, Vol. 27, Issue (5): 250156

DOI: 10.1631/ENG.ITEE.2025.0156
Research Article

Abstract

Three-dimensional networks-on-chip (3D NoCs) are increasingly used to improve scalability in multicore systems. Through-silicon vias (TSVs) are a critical technology for the vertical interconnects between NoC layers. However, TSV-based interlayer connections are highly prone to faults caused by manufacturing defects, aging, and other sources, which compromises system reliability. Robust fault-tolerant mechanisms are therefore crucial, particularly in chiplet-based 3D NoCs, for maintaining operational integrity in the presence of TSV faults. We introduce HyRAS, a hybrid redundancy- and serialization-based fault-tolerant architecture designed to keep communication reliable despite permanent vertical link failures. The approach is built on two complementary mechanisms. First, a lightweight spatial redundancy-based scheme uses shared TSV resources to mitigate the impact of isolated faults. Second, for more severe fault scenarios, an adaptive serialization-based strategy maintains connectivity by making efficient use of the remaining functional links. The architecture is evaluated through functional simulations using both synthetic traffic patterns and realistic application workloads. Compared with contemporary fault-tolerant methods, HyRAS achieves up to 28.2% higher throughput under realistic workloads with significant defect clusters. These gains come with modest overhead: a 14.53% increase in area and an 8.87% increase in power consumption relative to the standard redundancy-based router.
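
The two-tier idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `plan_link`, the parameters `width`, `spares`, `faulty_data`, and `faulty_spares`, and the rule that serialized transfers take `ceil(width / working)` cycles per flit are all assumptions made here for illustration. The sketch only shows how a link might fall back from spare-TSV remapping to serialization as the number of faulty TSVs grows.

```python
import math


def plan_link(width, spares, faulty_data, faulty_spares=0):
    """Pick a recovery mode for one vertical link (hypothetical model).

    width         -- number of data TSVs a flit needs in one cycle (assumed)
    spares        -- number of spare TSVs available to this link (assumed)
    faulty_data   -- number of faulty data TSVs
    faulty_spares -- number of faulty spare TSVs

    Returns (mode, cycles_per_flit); cycles_per_flit is None if the
    link has no working TSV left.
    """
    working_spares = spares - faulty_spares

    # No fault on the data TSVs: the link runs at full width.
    if faulty_data == 0:
        return ("normal", 1)

    # Isolated faults: every faulty data TSV is remapped to a working spare.
    if faulty_data <= working_spares:
        return ("redundancy", 1)

    # Severe faults: send each flit in several narrower pieces over
    # whatever TSVs still work (assumed serialization rule).
    working = (width - faulty_data) + working_spares
    if working <= 0:
        return ("disconnected", None)
    return ("serialization", math.ceil(width / working))


if __name__ == "__main__":
    print(plan_link(32, 2, 0))    # fault-free link
    print(plan_link(32, 2, 2))    # faults covered by spares
    print(plan_link(32, 2, 10))   # clustered faults, serialized transfer
```

In this toy model a link keeps full bandwidth while spares cover the faults and degrades gradually afterwards, instead of being declared dead at the first uncovered fault.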

Keywords

Three-dimensional network-on-chip (3D NoC) / Through-silicon vias (TSVs) / Redundancy / Fault-tolerant

Cite this article

Chenglong SUN, Yanqing ZHOU, Qi WANG, Yan ZHANG. HyRAS: a hybrid redundancy- and serialization-based fault-tolerant architecture for through-silicon vias. Eng Inform Technol Electron Eng, 2026, 27(5): 250156. DOI: 10.1631/ENG.ITEE.2025.0156

RIGHTS & PERMISSIONS

The Authors. Published by Zhejiang University Press Co., Ltd.
