Energy-efficient reconfigurable FPGA accelerator for marine convolutional neural network-long short-term memory applications
Qiuyu Wang, Qi Wen, Zhiqiang Wei, Haiyang Mao, Ruitao Tao, Hao Zhang
Intelligent Marine Technology and Systems, 2026, Vol. 4, Issue 1: 9
Deep learning models that combine convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have demonstrated strong capabilities in spatiotemporal feature extraction and have proven effective for applications such as ocean environment monitoring and forecasting. Marine equipment with constrained computational resources and energy budgets often requires specialized artificial intelligence (AI) processors to handle these workloads. However, the distinct computational and memory access patterns of CNNs and LSTMs make efficient edge AI processors difficult to design: existing hardware accelerators often struggle to support the heterogeneous computational patterns, irregular dataflow, and dynamic precision requirements of such hybrid models. To address these challenges, this paper proposes a dynamically reconfigurable field-programmable gate array (FPGA)-based accelerator tailored for parallel CNN-LSTM computation. The proposed architecture integrates a mixed-precision computation array, multilevel reconfigurable processing elements, and a triple-mode dataflow controller that supports weight-stationary, output-stationary, and row-stationary dataflows, thereby enabling adaptive resource allocation and enhanced data reuse under diverse computation patterns. The accelerator is designed to efficiently execute both individual and hybrid CNN-LSTM workloads. Experimental evaluation on a representative ConvLSTM-based sea surface temperature prediction task demonstrates that the proposed design achieves high throughput and energy efficiency in both the convolutional and the recurrent computation phases.
FPGA accelerator / Convolutional neural network-long short-term memory / Mixed precision / Dataflow optimization / Ocean monitoring
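The three dataflow modes named in the abstract can be read as three loop orderings of the same convolution, each keeping a different operand local to a processing element. The following is a minimal software sketch of that idea, not the accelerator's hardware design: all function names are hypothetical, the data are small integers chosen for illustration, and the convolution is assumed to be a stride-1 "valid" correlation.

```python
# Illustrative sketch only: loop orderings that mimic three dataflows.
# Function names and the toy data are assumptions, not taken from the paper.

def _dims(x, w):
    """Return input size (H, W), filter size (R, S), and output size (P, Q)."""
    H, W, R, S = len(x), len(x[0]), len(w), len(w[0])
    return H, W, R, S, H - R + 1, W - S + 1

def conv2d_output_stationary(x, w):
    """Each output's partial sum stays in one accumulator until it is complete."""
    H, W, R, S, P, Q = _dims(x, w)
    out = [[0] * Q for _ in range(P)]
    for i in range(P):
        for j in range(Q):
            acc = 0  # the stationary operand: one output partial sum
            for r in range(R):
                for s in range(S):
                    acc += x[i + r][j + s] * w[r][s]
            out[i][j] = acc  # written back once
    return out

def conv2d_weight_stationary(x, w):
    """Each weight is fetched once and reused for every output it touches."""
    H, W, R, S, P, Q = _dims(x, w)
    out = [[0] * Q for _ in range(P)]
    for r in range(R):
        for s in range(S):
            weight = w[r][s]  # the stationary operand: one weight
            for i in range(P):
                for j in range(Q):
                    out[i][j] += x[i + r][j + s] * weight
    return out

def conv1d_row(x_row, w_row):
    """1D correlation of one input row with one filter row."""
    n = len(x_row) - len(w_row) + 1
    return [sum(x_row[j + s] * w_row[s] for s in range(len(w_row)))
            for j in range(n)]

def conv2d_row_stationary(x, w):
    """Each (filter row, output row) pair is a 1D row convolution; the
    row results are then summed across filter rows."""
    H, W, R, S, P, Q = _dims(x, w)
    out = [[0] * Q for _ in range(P)]
    for i in range(P):
        for r in range(R):  # one hypothetical processing element per pair
            partial = conv1d_row(x[i + r], w[r])
            out[i] = [a + b for a, b in zip(out[i], partial)]
    return out

x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
w = [[1, 0], [0, -1]]
results = [f(x, w) for f in (conv2d_output_stationary,
                             conv2d_weight_stationary,
                             conv2d_row_stationary)]
print(results[0])  # [[-4, -4], [-4, -4]]
```

All three functions compute the same result; they differ only in which operand is reused without being refetched, which is the property a reconfigurable dataflow controller would presumably exploit when switching between convolutional and recurrent layers.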