WSC optimizer: an optimization tool for wafer-scale chip architecture exploration

Wenbo ZHANG; Bo DING; Shuai WEI; Qinrang LIU; Hong YU; Ke SONG; Wei GUO; Bo MEI; Rui ZHENG

doi:10.1631/ENG.ITEE.2025.0008

Eng Inform Technol Electron Eng, 2026, Vol. 27, Issue 4: 250008

Research Article

Abstract

In recent years, mature advanced packaging technologies have increasingly enabled the integration of multiple small dies into larger chips while retaining chip-scale density and high-bandwidth interconnects. To address the inefficiency of manual design and the challenges of heterogeneous optimization in wafer-scale chip (WSC) development, we systematically explore the key factors in WSC architecture design. We integrate chip layout, operator mapping, and hardware-software co-design, and formulate the WSC architecture exploration problem as a multi-objective optimization task. First, we establish a hierarchical architecture model for WSCs that quantifies core constraints and interconnect topology constraints in a unified way; second, we propose a hierarchical multi-objective collaborative optimization framework that jointly optimizes physical constraints and task-mapping communication patterns; finally, we develop the WSC optimizer toolchain, which supports mixed-granularity simulation and generates optimal configurations for representative workloads. Experimental results demonstrate that, compared with traditional computer architectures, the optimized architectures generated by our WSC optimizer achieve up to a 22× throughput improvement and a 5× latency reduction in application domains such as cryptographic decryption and signal processing.
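The abstract frames WSC architecture exploration as a multi-objective optimization task, in which design points trade off competing goals such as throughput and latency. As a minimal sketch of that idea, the Python below keeps only the Pareto-optimal candidates among a set of design points scored on two objectives (throughput to maximize, latency to minimize). The `Candidate` type, the two-objective setup, the brute-force Pareto filter, and the toy numbers are all illustrative assumptions; they are not the WSC optimizer's actual algorithm, interface, or results, none of which the abstract specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical WSC design point scored on two objectives."""
    name: str
    throughput: float  # higher is better
    latency: float     # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a.throughput >= b.throughput and a.latency <= b.latency
    strictly_better = a.throughput > b.throughput or a.latency < b.latency
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates (simple O(n^2) scan)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


if __name__ == "__main__":
    # Made-up toy values for illustration only; not results from the paper.
    designs = [
        Candidate("mesh-small", throughput=10.0, latency=4.0),
        Candidate("mesh-large", throughput=18.0, latency=6.0),
        Candidate("torus-large", throughput=18.0, latency=5.0),
        Candidate("ring", throughput=8.0, latency=7.0),
    ]
    for c in pareto_front(designs):
        print(c.name)  # prints mesh-small, then torus-large
```

Here "mesh-large" is dropped because "torus-large" matches its throughput at lower latency, and "ring" is dropped because "mesh-small" beats it on both objectives. A real exploration tool would typically use a more scalable search (for example an evolutionary or annealing-based optimizer) over a much larger design space; the exhaustive dominance check is used here only because it makes the multi-objective trade-off easy to see.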

Keywords

Wafer-scale chip / Hardware-software co-design / Chip layout / Design space exploration

Cite this article

Wenbo ZHANG, Bo DING, Shuai WEI, Qinrang LIU, Hong YU, Ke SONG, Wei GUO, Bo MEI, Rui ZHENG. WSC optimizer: an optimization tool for wafer-scale chip architecture exploration. Eng Inform Technol Electron Eng, 2026, 27(4): 250008 DOI:10.1631/ENG.ITEE.2025.0008
