ARCHER: a ReRAM-based accelerator for compressed recommendation systems
Xinyang SHEN , Xiaofei LIAO , Long ZHENG , Yu HUANG , Dan CHEN , Hai JIN
Front. Comput. Sci. ›› 2024, Vol. 18 ›› Issue (5) : 185607
ARCHER: a ReRAM-based accelerator for compressed recommendation systems
Modern recommendation systems are widely used in modern data centers. The random and sparse embedding lookup operations are the main performance bottleneck for processing recommendation systems on traditional platforms as they induce abundant data movements between computing units and memory. ReRAM-based processing-in-memory (PIM) can resolve this problem by processing embedding vectors where they are stored. However, the embedding table can easily exceed the capacity limit of a monolithic ReRAM-based PIM chip, which induces off-chip accesses that may offset the PIM profits. Therefore, we deploy the decomposed model on-chip and leverage the high computing efficiency of ReRAM to compensate for the decompression performance loss. In this paper, we propose ARCHER, a ReRAM-based PIM architecture that implements fully on-chip recommendations under resource constraints. First, we make a full analysis of the computation pattern and access pattern on the decomposed table. Based on the computation pattern, we unify the operations of each layer of the decomposed model in multiply-and-accumulate operations. Based on the access observation, we propose a hierarchical mapping schema and a specialized hardware design to maximize resource utilization. Under the unified computation and mapping strategy, we can coordinate the inter-processing elements pipeline. The evaluation shows that ARCHER outperforms the state-of-the-art GPU-based DLRM system, the state-of-the-art near-memory processing recommendation system RecNMP, and the ReRAM-based recommendation accelerator REREC by , , and in terms of performance and , , and in terms of energy savings, respectively.
recommendation system / ReRAM / processing-in-memory / embedding layer
| [1] |
Ke L, Gupta U, Cho B Y, Brooks D, Chandra V, Diril U, Firoozshahian A, Hazelwood K, Jia B, Lee H H S, Li M, Maher B, Mudigere D, Naumov M, Schatz M, Smelyanskiy M, Wang X, Reagen B, Wu C J, Hempstead M, Zhang X. RecNMP: Accelerating personalized recommendation with near-memory processing. In: Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture. 2020, 790−803 |
| [2] |
|
| [3] |
|
| [4] |
|
| [5] |
|
| [6] |
|
| [7] |
Hwang R, Kim T, Kwon Y, Rhu M. Centaur: a chiplet-based, hybrid sparse-dense accelerator for personalized recommendations. In: Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture. 2020, 968−981 |
| [8] |
Kal H, Lee S, Ko G, Ro W W. SPACE: locality-aware processing in heterogeneous memory for personalized recommendations. In: Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture. 2021, 679−691 |
| [9] |
|
| [10] |
|
| [11] |
Imani M, Gupta S, Kim Y, Rosing T. FloatPIM: in-memory acceleration of deep neural network training with high precision. In: Proceedings of the 46th ACM/IEEE Annual International Symposium on Computer Architecture. 2019, 802−815 |
| [12] |
|
| [13] |
|
| [14] |
|
| [15] |
|
| [16] |
Zha Y, Li J. Hyper-AP: enhancing associative processing through a full-stack optimization. In: Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture. 2020, 846−859 |
| [17] |
|
| [18] |
|
| [19] |
|
| [20] |
|
| [21] |
Yin C, Acun B, Wu C J, Liu X. TT-Rec: Tensor train compression for deep learning recommendation models. 2021, arXiv preprint arXiv: 2101.11714 |
| [22] |
|
| [23] |
Xu C, Niu D, Muralimanohar N, Balasubramonian R, Zhang T, Yu S, Xie Y. Overcoming the challenges of crossbar resistive memory architectures. In: Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture. 2015, 476−488 |
| [24] |
|
| [25] |
|
| [26] |
|
| [27] |
|
| [28] |
|
| [29] |
|
| [30] |
|
| [31] |
|
| [32] |
|
| [33] |
|
| [34] |
|
| [35] |
|
| [36] |
|
| [37] |
|
| [38] |
|
| [39] |
Qu Y, Cai H, Ren K, Zhang W, Yu Y, Wen Y, Wang J. Product-based neural networks for user response prediction. In: Proceedings of the 16th IEEE International Conference on Data Mining. 2016, 1149−1154 |
| [40] |
|
| [41] |
|
| [42] |
|
| [43] |
|
| [44] |
|
| [45] |
|
Higher Education Press
Supplementary files
/
| 〈 |
|
〉 |