Energy-efficient reconfigurable FPGA accelerator for marine convolutional neural network-long short-term memory applications

Qiuyu Wang; Qi Wen; Zhiqiang Wei; Haiyang Mao; Ruitao Tao; Hao Zhang

doi:10.1007/s44295-026-00098-3

Intelligent Marine Technology and Systems, 2026, Vol. 4, Issue 1: 9. DOI: 10.1007/s44295-026-00098-3

Research Paper

Abstract

Deep learning models that combine convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have demonstrated strong capabilities in spatiotemporal feature extraction, proving effective for applications such as ocean environment monitoring and forecasting. Marine equipment with constrained computational resources and energy budgets often requires specialized artificial intelligence (AI) processors to handle AI workloads. However, the distinct computational and memory access patterns of CNNs and LSTMs present significant challenges for designing efficient edge AI processors: existing hardware accelerators often struggle to support the heterogeneous computational patterns, irregular dataflow, and dynamic precision requirements of such hybrid models. To address these challenges, this paper proposes a dynamically reconfigurable field-programmable gate array (FPGA)-based accelerator tailored for parallel CNN-LSTM computation. The proposed architecture integrates a mixed-precision computation array, multilevel reconfigurable processing elements, and a triple-mode dataflow controller supporting weight-stationary, output-stationary, and row-stationary dataflows, thereby enabling adaptive resource allocation and enhanced data reuse under diverse computation patterns. The accelerator is designed to efficiently execute both individual and hybrid CNN-LSTM workloads. Experimental evaluation on a representative ConvLSTM-based sea surface temperature prediction task demonstrates that the proposed design achieves high throughput and energy efficiency in both convolutional and recurrent computation phases.
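
As a rough illustration of two of the dataflow modes named in the abstract, the following Python sketch contrasts weight-stationary and output-stationary loop orderings on a toy 1-D convolution. It is not the authors' hardware design: the function names, the 1-D setting, and the pure-software loops are assumptions made here for illustration only. Row-stationary dataflow is omitted because it mainly becomes meaningful on a 2-D array of processing elements.

```python
def conv1d_weight_stationary(x, w):
    """Hypothetical helper: outer loop over weights.

    Each weight is fetched once and reused across every output
    position, while partial sums are updated repeatedly.
    """
    n_out = len(x) - len(w) + 1
    y = [0.0] * n_out
    for k, wk in enumerate(w):      # weight stays fixed in the inner loop
        for i in range(n_out):      # partial sums are revisited per weight
            y[i] += wk * x[i + k]
    return y


def conv1d_output_stationary(x, w):
    """Hypothetical helper: outer loop over outputs.

    Each partial sum lives in a local accumulator until it is
    complete, then is written out exactly once.
    """
    n_out = len(x) - len(w) + 1
    y = []
    for i in range(n_out):
        acc = 0.0                   # output stays fixed in the inner loop
        for k, wk in enumerate(w):  # weights are re-read per output
            acc += wk * x[i + k]
        y.append(acc)
    return y
```

Both functions compute the same valid-mode sliding dot product for the same inputs. Only the order in which weights and partial sums are touched differs, which is the kind of choice a reconfigurable dataflow controller would make per layer.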

Keywords

FPGA accelerator / Convolutional neural network-long short-term memory / Mixed precision / Dataflow optimization / Ocean monitoring

Cite this article

Qiuyu Wang, Qi Wen, Zhiqiang Wei, Haiyang Mao, Ruitao Tao, Hao Zhang. Energy-efficient reconfigurable FPGA accelerator for marine convolutional neural network-long short-term memory applications. Intelligent Marine Technology and Systems, 2026, 4(1): 9. DOI: 10.1007/s44295-026-00098-3
