Research on the development and challenges of PIM

Qingjie LANG; Ruoxi WANG; Donghuan XIE; Zhiwei WANG; Zhenyu GAO; Li SHEN

doi:10.1007/s11704-024-40318-9

Front. Comput. Sci., 2026, Vol. 20, Issue 5: 2005103. DOI: 10.1007/s11704-024-40318-9

Architecture

REVIEW ARTICLE



Abstract

The performance of current computer systems is limited by the Memory Wall, which stems from the unbalanced development of memory technology and processor technology. To reduce the overhead of data movement between memory and processor, a series of Processing-in-Memory (PIM) systems have been developed to move computation closer to memory. In this article, PIM refers specifically to approaches that exploit the analog operational properties of DRAM to compute inside DRAM. We provide a comprehensive summary from three perspectives: the development of PIM's basic operations, the programmability of PIM, and the challenges faced by PIM. The first covers current progress in implementing various types of computation, such as logic operations and complex arithmetic operations. The second covers the integration of PIM systems with existing systems and the development of ISAs, libraries, and compiler systems that make PIM easier to program. The third highlights crucial obstacles in software and architecture development. Current developments in PIM present both opportunities and challenges, and the goal of this article is to give researchers a comprehensive understanding of PIM's advancements.
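As a rough illustration of what "logic operations" computed through DRAM's analog behavior can look like, the sketch below models one widely described technique that the abstract itself does not spell out: simultaneously activating three rows so that charge sharing settles each bitline to the bitwise majority of the three cells. The choice of this technique, the 8-bit row width, and the helper names `maj`, `row_and`, and `row_or` are all illustrative assumptions; real rows are thousands of bits wide and the operation happens in the sense amplifiers, not in software.

```python
# Functional model only: rows are Python ints, one bit per DRAM cell.
ROW_BITS = 8                      # assumed toy row width
ALL_ONES = (1 << ROW_BITS) - 1    # a control row initialized to all 1s


def maj(a: int, b: int, c: int) -> int:
    """Bitwise majority of three rows (models triple-row activation)."""
    return (a & b) | (b & c) | (a & c)


def row_and(a: int, b: int) -> int:
    """AND falls out of majority when the control row is all 0s."""
    return maj(a, b, 0)


def row_or(a: int, b: int) -> int:
    """OR falls out of majority when the control row is all 1s."""
    return maj(a, b, ALL_ONES)


a = 0b11001010
b = 0b10100110
print(format(row_and(a, b), "08b"))  # 10000010
print(format(row_or(a, b), "08b"))   # 11101110
```

The point of the sketch is that a single primitive, selected by a constant control row, yields two logic operations across an entire row at once, which is where the parallelism of this style of PIM comes from.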


Keywords

DRAM / processing-in-memory / main memory / processing-using-DRAM / programming




RIGHTS & PERMISSIONS

Higher Education Press
