Cross-layer efforts for energy-efficient computing: towards peta operations per second per watt
Xiaobo Sharon HU, Michael NIEMIER
As Moore’s law-based device scaling and the accompanying performance scaling trends slow down, there is increasing interest in new technologies and computational models for faster and more energy-efficient information processing. Meanwhile, there is growing evidence that, for traditional Boolean circuits and von Neumann processors, it will be challenging for beyond-CMOS devices to compete with CMOS technology. Exploiting the unique characteristics of emerging devices, especially in the context of alternative circuit and architectural paradigms, has the potential to offer orders-of-magnitude improvements in power, performance, and capability. To take full advantage of beyond-CMOS devices, cross-layer efforts spanning devices, circuits, architectures, and algorithms are indispensable. This study examines energy-efficient neural network accelerators for embedded applications in this context. Several deep neural network accelerator designs based on cross-layer efforts spanning alternative device technologies, circuit styles, and architectures are highlighted. Application-level benchmarking studies are presented. The discussions demonstrate that cross-layer efforts can indeed lead to orders-of-magnitude gains towards achieving extreme-scale energy-efficient processing.
Moore’s law / Energy-efficient computing / Neural network accelerators / Beyond-CMOS devices