Deep learning models fusing transformer-stacked long short-term memory: An efficient prediction method for nonlinear seismic response of buildings and bridges

Chang HE , Gao ZHANG , Jin ZHANG , Zhongyao ZHANG

Front. Struct. Civ. Eng., 2025, Vol. 19, Issue (12): 1951–1966. DOI: 10.1007/s11709-025-1251-y

RESEARCH ARTICLE

Abstract

The destructive mechanisms of earthquakes on engineering structures are highly complex, and developing high-precision response prediction models is a critical scientific issue for enhancing the seismic resilience of infrastructure. This paper proposes a hybrid deep learning architecture based on a stacked long short-term memory network combined with a Transformer (ST-LSTM). By constructing a spatiotemporal feature fusion mechanism, the model significantly improves the accuracy of structural seismic response prediction. To systematically validate the model's performance, two typical engineering cases were selected for comparative analysis: first, a six-story concrete hotel building in the United States was studied to thoroughly analyze its historical seismic displacement response characteristics; second, a 426-m-long rigid-frame bridge was targeted to predict its nonlinear curvature response behavior. The results indicate that the proposed ST-LSTM model significantly outperforms traditional long short-term memory models in both computational efficiency and prediction accuracy. In the two cases, the model's coefficient of determination (R²) reached 0.975 and 0.983, respectively, with predicted peak errors within 5% and 7%. These findings provide new technical means for real-time health monitoring and intelligent seismic assessment of engineering structures, holding significant theoretical and engineering value for enhancing the seismic resilience of infrastructure.

Keywords

deep learning / long short-term memory / transformer / earthquake engineering / response prediction

Cite this article

Chang HE, Gao ZHANG, Jin ZHANG, Zhongyao ZHANG. Deep learning models fusing transformer-stacked long short-term memory: An efficient prediction method for nonlinear seismic response of buildings and bridges. Front. Struct. Civ. Eng., 2025, 19(12): 1951–1966. DOI: 10.1007/s11709-025-1251-y


1 Introduction

In recent years, damage to buildings caused by natural disasters such as earthquakes and mudslides has become increasingly severe. Traditional approaches for analyzing structural dynamic responses under such hazards have relied primarily on experimental modeling or linear/nonlinear time-history analysis [1–3]. Despite significant advances in related research over the past decades, building structures subjected to strong seismic actions remain at high risk of severe damage or even collapse. With the continuous expansion of building scales, effectively mitigating structural damage under seismic loading has become a critical challenge requiring breakthroughs in China's engineering field [4,5].

The rapid advancement of deep learning techniques has brought transformative changes to structural engineering. While significant progress has been made in applying deep learning methods to this field, research remains in the exploratory and developmental stages [6–19]. Current investigations primarily focus on two key directions: convolutional neural network (CNN)-based image recognition techniques and time series-based neural network prediction methods. In image recognition research, several notable breakthroughs have been achieved. Yang et al. [19] proposed a U-Net++ CNN model that demonstrated robust predictive performance under varying data missing rates, achieving 92% recognition accuracy. Zoubir et al. [20] developed an innovative end-to-end semantic segmentation framework that significantly outperformed traditional segmentation models. Chen et al. [21] constructed an efficient automated structural damage recognition system by optimizing the AlexNet CNN through transfer learning. Teng et al. [22] introduced a damage detection method based on population analysis, establishing a random model population and integrating multi-source damage indicators as CNN inputs, thereby demonstrating the method's strong engineering applicability. Further advances include Fu et al. [23], who collected field acoustic emission data from concrete components and combined attention mechanisms with multi-scale convolutional modeling for efficient signal denoising and analysis, and Zhang et al. [24], who achieved high-precision inversion of concrete wall damage processes under seismic action using diffusion modeling techniques. However, image recognition-based methods still face challenges in practical engineering applications, including the requirement for extensive image data sets for model training and computational time constraints, which limit the technique's potential for widespread engineering implementation.

In numerical prediction, deep learning models such as recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and Transformers have been widely employed. RNNs have demonstrated success in predicting structural seismic responses by processing time-series data sequentially while retaining historical information through hidden states. Compared with conventional RNNs, LSTM and gated recurrent unit networks exhibit superior performance in structural dynamics analysis owing to their enhanced long-term dependency modeling capabilities [25,26]. Recent years have witnessed significant breakthroughs in this domain. Zhang et al. [27] achieved accurate seismic response prediction for multistory frame structures using a dual-layer LSTM network. Zhao et al. [28] precisely forecasted bridge member responses under combined earthquake-train loading by developing an LSTM-RNN model. Tang et al. [29] introduced an improved LSTM method to mitigate modeling challenges arising from limited experimental data for composite structures. Hao et al. [30] leveraged the complementary strengths of LSTM and CNN architectures to enable precise seismic response prediction for tunnel models. Yu and Li [31] implemented a hybrid LSTM-Transformer framework to accurately predict load-displacement curves of concrete structures. Furthermore, Abu Zouriq et al. [32] addressed inherent limitations of traditional structural health monitoring (SHM) systems, particularly concerning direct physical response measurements in complex infrastructure, through RNN-based modeling. These studies collectively demonstrate that deep learning methods achieve remarkable accuracy in structural response prediction, enabling rapid post-earthquake bridge damage assessment. Such capabilities significantly enhance post-disaster emergency response efficiency and provide critical time savings for rescue decision-making.

While deep learning models including RNNs, LSTMs, CNNs, and Transformers [33] have demonstrated considerable advances in structural response prediction, current research predominantly focuses on enhancing their capacity to capture both short- and long-term dependencies in time-series analysis [34]. Nevertheless, individual models have inherent limitations: LSTM networks, despite their proficiency in modeling temporal dependencies, exhibit constrained feature-representation capabilities when used independently, which limits prediction accuracy and reliability [35]. In comparison, Transformers leverage attention mechanisms to offer superior performance in feature extraction and deep representation learning, enabling more effective identification of complex data relationships and long-term dependencies.

To address these limitations, this paper proposes the stacked long short-term memory-Transformer deep learning (ST-LSTM) model (Fig. 1), which incorporates a stacking window mechanism to optimize computational efficiency while combining the temporal modeling capabilities of LSTM with the feature-enhancement properties of the Transformer. The proposed architecture consists of three key components: 1) an LSTM network for capturing temporal dependencies in structural dynamic responses; 2) a multi-head attention mechanism from the Transformer for enhanced feature extraction; and 3) a spatio-temporal feature fusion module for integrated multi-dimensional information processing. Through comprehensive evaluation on both frame structures and long-span rigid-frame bridges, the model demonstrates superior performance in seismic response prediction, exhibiting significant improvements in both accuracy and robustness compared to conventional approaches. These advances not only overcome the limitations of traditional single-model architectures but also provide an effective technical solution for real-time SHM and intelligent disaster early-warning systems.

2 Model introduction

Deep learning has gained widespread adoption across numerous domains owing to its exceptional predictive capabilities, robust classification performance, and advanced machine learning characteristics. As a specialized variant of RNN, LSTM networks retain the recursive node connections of traditional RNN for effective time-series modeling while substantially enhancing network performance through a sophisticated gating mechanism (comprising input, forget, and output gates). This gating architecture enables precise regulation of information flow, effectively mitigating the persistent challenges of gradient vanishing and explosion inherent in conventional RNN during extended sequence training.

2.1 Long short-term memory cell framework

The LSTM network represents a specialized RNN architecture comprising input layers, multiple hidden layers, and output layers. The architecture’s core innovation resides in its unique LSTM unit design, which incorporates four fundamental components that collectively establish an advanced gating system [36].

Input gate: regulates information inflow by employing a sigmoid activation function to determine which new information should be stored in the cell state, while coordinating with a tanh layer to facilitate state updates.

Forget gate: utilizes a sigmoid function to evaluate historical information within the cell state, implementing intelligent decisions regarding data retention or discarding to enable dynamic long-term dependency management.

tanh layer: applies hyperbolic tangent activation to normalize values to the [−1,1] range, thereby controlling information flow magnitude, maintaining gradient stability, and contributing to candidate value generation.

Output gate: governs final network outputs by combining sigmoid and tanh operations to determine which features should be propagated based on current inputs and cell state.

As shown in Fig. 2, the input data are defined as $x_t$, the forget gate as $f(t)$, the input gate as $i(t)$, the output gate as $o(t)$, the cell state as $c_t$, and the hidden-layer output as $h_t$; for the previous time step, the cell state is $c_{t-1}$ and the hidden-layer output is $h_{t-1}$. Each variable is defined as follows:

$f(t) = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f),$

$i(t) = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i),$

$o(t) = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o),$

$\tilde{c}(t) = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c),$

$c(t) = f(t) \odot c(t-1) + i(t) \odot \tilde{c}(t),$

$h_t = o(t) \odot \tanh(c(t)),$

where $W_{x\beta}$ and $W_{h\beta}$ ($\beta = \{f, i, o, c\}$) are the weight matrices of the forget gate, input gate, output gate, and candidate state, $\sigma$ is the sigmoid function, $\tanh$ is the hyperbolic tangent function, $b_\beta$ is the corresponding bias vector, and $\odot$ denotes element-by-element multiplication of matrices or vectors.
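For readers who prefer code to notation, the following is a minimal NumPy sketch of a single LSTM cell step implementing the six gate equations above; the weight and bias containers and their random initialization are illustrative assumptions, not the trained parameters of any model in this paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the gate equations above.

    W maps 'xf', 'hf', 'xi', 'hi', 'xo', 'ho', 'xc', 'hc' to weight
    matrices; b maps 'f', 'i', 'o', 'c' to bias vectors.
    """
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + b["f"])      # forget gate
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + b["i"])      # input gate
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + b["o"])      # output gate
    c_tilde = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_tilde   # element-wise cell-state update
    h_t = o_t * np.tanh(c_t)             # hidden-layer output
    return h_t, c_t

# Example with 1 input feature and 8 hidden units (shapes are arbitrary).
rng = np.random.default_rng(0)
n_in, n_h = 1, 8
W = {k: 0.1 * rng.standard_normal((n_h, n_in if k[0] == "x" else n_h))
     for k in ("xf", "hf", "xi", "hi", "xo", "ho", "xc", "hc")}
b = {k: np.zeros(n_h) for k in "fioc"}
h, c = lstm_cell_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), W, b)
```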

2.2 Long short-term memory model framework

The deep learning framework for time series analysis employs an LSTM architecture comprising multiple LSTM layers followed by a fully connected layer (Fig. 3). The fully connected layer establishes complete connections with all activation units from the preceding layer, serving as an intermediary between the LSTM layers and the output layer to ensure sequential prediction outputs.

To enhance training efficiency and model generalizability, we adopt the Adam optimization algorithm for network training. This optimizer integrates the advantages of momentum-based gradient descent and adaptive learning rates, significantly accelerating convergence.

To minimize the number of samples required for training and prediction while fully leveraging the data set’s information, we employ the time series k-means algorithm proposed by Huang et al. [37].

$J(U, Z, W) = \sum_{p=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} \beta_{ip} w_{pj} (x_{ij} - z_{pj})^2 + \frac{1}{2} \alpha \sum_{p=1}^{k} \sum_{j=2}^{m} (w_{pj} - w_{p,j-1})^2,$

where $U = [\beta_{ip}]$ is an $n \times k$ binary partition matrix, and $Z = \{Z_1, Z_2, Z_3, ..., Z_k\}$ and $W = \{W_1, W_2, W_3, ..., W_k\}$ denote the sets of $k$ cluster-center vectors and time-step weight vectors, respectively.

The entries $\beta_{ip}$ and $w_{pj}$ are constrained by:

$\sum_{p=1}^{k} \beta_{ip} = 1, \quad \beta_{ip} \in \{0, 1\}; \qquad \sum_{j=1}^{m} w_{pj} = 1, \quad 0 \leq w_{pj} \leq 1.$

The smoothness of the weights across neighboring time steps is governed by the weighting factor $\alpha$:

$\alpha = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( x_{ij} - \frac{1}{n} \sum_{i'=1}^{n} x_{i'j} \right)^2.$

The optimization problem is solved by an iterative sequential process:

$\beta_{ip} = \begin{cases} 1, & \text{if } D_{pi} \leq D_{p'i} \text{ for all } 1 \leq p' \leq k,\ p' \neq p, \\ 0, & \text{otherwise}, \end{cases}$

where $D_{pi} = \sum_{j=1}^{m} w_{pj} (x_{ij} - z_{pj})^2$.

Through iterative updates of U, Z, and W, the algorithm repeats this process until the objective function converges.
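To make one pass of this iteration concrete, the sketch below computes the weighted distances $D_{pi}$, forms the hard assignments $\beta_{ip}$, and recomputes the cluster centers. It is a simplified reading of the algorithm of Huang et al. [37]: the update of the time-step weights $W$ is omitted for brevity.

```python
import numpy as np

def ts_kmeans_iteration(X, Z, W):
    """One assignment-and-center-update pass of weighted time-series
    k-means. X: (n, m) series; Z: (k, m) centers; W: (k, m) time-step
    weights with rows summing to 1. The weight update for W is omitted.
    """
    diff2 = (X[None, :, :] - Z[:, None, :]) ** 2       # (k, n, m) squared gaps
    D = np.einsum("pj,pij->pi", W, diff2)              # weighted distances D_pi
    beta = np.zeros((X.shape[0], Z.shape[0]))
    beta[np.arange(X.shape[0]), D.argmin(axis=0)] = 1  # assign each series to nearest center
    for p in range(Z.shape[0]):                        # recompute centers from members
        members = X[beta[:, p] == 1]
        if len(members):
            Z[p] = members.mean(axis=0)
    return beta, Z
```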

2.3 Full sequence-to-sequence long short-term memory (F-LSTM)

The input ground motion time history is represented as $X = \{x_1, x_2, ..., x_m\}^{\mathrm{T}}$, while the corresponding structural response output is $Y = \{y_1, y_2, ..., y_m\}^{\mathrm{T}}$, where both $X$ and $Y$ are matrices whose rows represent time steps and whose columns contain the characteristic values of the inputs and outputs.

The LSTM framework processes the ground motion sequence $X$ as input to predict the structural response sequence $Y$, requiring iteration through all $m$ time steps. This results in the ultra-long chain structure characteristic of LSTM networks. However, two significant computational challenges emerge: 1) computational cost and training time increase with the number of time steps; and 2) substantial memory is required to store the extended network architecture. Both considerably reduce computational efficiency. Consequently, there is a critical need for alternative prediction methods that maintain accuracy while improving computational efficiency.

2.4 Stacked long short-term memory (S-LSTM)

To optimize computational efficiency and memory usage, the LSTM network processes the input ground motion time history $X = \{x_1, x_2, ..., x_m\}^{\mathrm{T}}$ and corresponding structural response $Y = \{y_1, y_2, ..., y_m\}^{\mathrm{T}}$ by partitioning the time series into non-overlapping subsequences using a sliding window of length $w$ [38].

As illustrated in Fig. 4, the original $m$-step ground motion input is segmented into $n$ subsequences, yielding modified input and output sequences $X' = \{X_1, X_2, ..., X_n\}^{\mathrm{T}}$ and $Y' = \{y_1, y_2, ..., y_n\}^{\mathrm{T}}$, respectively. The LSTM framework predicts only the terminal response value $y_t$ of each subsequence, resulting in $n$ output prediction steps, as sketched below. Notably, the subsequences must remain mutually exclusive (non-overlapping), and the window length $w$ critically influences prediction accuracy.
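A minimal sketch of this windowing step, assuming (as in the later case studies) a window length of w = 20 and that any trailing remainder of the series is simply dropped:

```python
import numpy as np

def stack_windows(x, y, w=20):
    """Partition paired input/response histories into non-overlapping
    windows of length w; each window is labeled with its final response.

    x, y: 1-D arrays of length m. Returns X: (n, w) and Y: (n,), n = m // w.
    """
    m = (len(x) // w) * w               # drop any trailing remainder
    X = x[:m].reshape(-1, w)            # n mutually exclusive subsequences
    Y = y[:m].reshape(-1, w)[:, -1]     # terminal response of each window
    return X, Y
```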

2.5 Transformer model

As shown in Fig. 5, the core component of the Transformer model, the self-attention mechanism, is inherently permutation-invariant, which prevents the model from directly perceiving the relative or absolute positional information of individual elements in the input sequence. To compensate for this shortcoming, the Transformer architecture explicitly introduces positional encoding to provide the model with the necessary sequence-order information [39,40].

For positional encoding, the Transformer uses a fixed scheme based on sine and cosine functions:

$PE_{(pos, 2i)} = \sin\left(\dfrac{pos}{10000^{2i/d_{\mathrm{model}}}}\right),$

$PE_{(pos, 2i+1)} = \cos\left(\dfrac{pos}{10000^{2i/d_{\mathrm{model}}}}\right),$

where $pos$ is the position of the data in the sequence, $i$ is the dimension index of the positional encoding vector, and $d_{\mathrm{model}}$ is the dimension of the data embedding vector.
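A direct NumPy implementation of this encoding (assuming an even embedding dimension):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Fixed sinusoidal positional encoding; d_model is assumed even."""
    pos = np.arange(seq_len)[:, None]              # positions 0..seq_len-1
    i = np.arange(0, d_model, 2)[None, :]          # even dimension indices 2i
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions: cosine
    return pe
```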

The self-attention mechanism operates as follows: for an input sequence $X = (x_1, x_2, ..., x_n)$, where each $x_i$ denotes a sequence element, the model computes the corresponding Query ($Q$), Key ($K$), and Value ($V$) vectors through three independent linear transformations:

$Q = X W^Q, \quad K = X W^K, \quad V = X W^V,$

where $W^Q$, $W^K$, and $W^V$ are trainable weight matrices.

As shown in Fig. 5, $Q$ represents the current position of interest, $K$ represents potential matches for the query, and $V$ contains the information to be aggregated according to the attention weights. The attention scores and outputs for each head are then computed as:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\dfrac{Q K^{\mathrm{T}}}{\sqrt{d_k}}\right) V,$

where $d_k$ is the dimension of the key vectors and the scaling factor $\sqrt{d_k}$ prevents vanishing gradients caused by excessively large dot-product values.
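The full computation, from the input projections through the scaled softmax, can be sketched for a single head as follows; the sequence length, feature width, and random projection matrices are placeholders, not values from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V

# Input projections Q = X W_Q, K = X W_K, V = X W_V (random placeholders).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 16))                        # 50 time steps, 16 features
WQ, WK, WV = (rng.standard_normal((16, 16)) for _ in range(3))
out = scaled_dot_product_attention(X @ WQ, X @ WK, X @ WV)   # (50, 16)
```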

2.6 Stacked long short-term memory-Transformer deep learning model

The proposed ST-LSTM architecture integrates one Transformer layer with two LSTM layers, with hyperparameters optimized through Bayesian optimization [41]. All experiments were conducted on an NVIDIA GeForce RTX 3060 GPU using the PyCharm 2024.2.3 development environment.
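The paper does not publish its implementation or name its deep learning framework, so the PyTorch sketch below is only one plausible reading of the described pipeline: two stacked LSTM layers, a Transformer-style multi-head self-attention block for feature enhancement, and a fully connected head predicting each window's terminal response. The layer widths, dropout, and learning rate follow the Case 1 settings in Section 3.2; the number of attention heads and the exact fusion strategy are assumptions.

```python
import torch
import torch.nn as nn

class STLSTM(nn.Module):
    """Sketch of the ST-LSTM hybrid: stacked LSTM layers for temporal
    modeling, multi-head self-attention for feature enhancement, and a
    fully connected output head. Head count and fusion are assumptions."""

    def __init__(self, n_features=1, hidden1=64, hidden2=200,
                 n_heads=4, fc_dim=100, dropout=0.2):
        super().__init__()
        self.lstm1 = nn.LSTM(n_features, hidden1, batch_first=True)
        self.lstm2 = nn.LSTM(hidden1, hidden2, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden2, n_heads,
                                          dropout=dropout, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(hidden2, fc_dim), nn.ReLU(),
            nn.Dropout(dropout), nn.Linear(fc_dim, 1))

    def forward(self, x):                  # x: (batch, window, n_features)
        h, _ = self.lstm1(x)
        h, _ = self.lstm2(h)
        a, _ = self.attn(h, h, h)          # self-attention over the window
        return self.fc(a[:, -1, :])        # predict the window's final response

model = STLSTM()
opt = torch.optim.Adam(model.parameters(), lr=4.5e-4)  # Adam, lr from Section 3.2
loss_fn = nn.MSELoss()                                  # MSE loss, Eq. (18)
```

With a stacking window of 20, this sketch maps each (batch, 20, 1) input window to a single response value, matching the output convention of the S-LSTM in Subsection 2.4.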

The correlation coefficient $R$ measures the correlation between the target sequence $P_i$ and the predicted sequence $Y_i$:

$R = \dfrac{\mathrm{Cov}(P_i, Y_i)}{\sqrt{\mathrm{Var}(P_i)\,\mathrm{Var}(Y_i)}},$

where $\mathrm{Cov}(P_i, Y_i)$ is the covariance between the predicted and target sequences, and $\mathrm{Var}(P_i)$ and $\mathrm{Var}(Y_i)$ are their respective variances. The $R$ value ranges between −1 and 1, with values approaching 1 indicating superior prediction performance.

The coefficient of determination ($R^2$) and root mean square error (RMSE) are used as the model evaluation indexes, expressed as follows:

$R^2 = 1 - \dfrac{\sum_{i=1}^{n} (y_i - \tilde{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$

$\mathrm{RMSE} = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (y_i - \tilde{y}_i)^2}.$

In addition, the mean square error (MSE) is used as the loss function:

$\mathrm{MSE} = \dfrac{1}{n} \sum_{i=1}^{n} (y_i - \tilde{y}_i)^2.$

In Eqs. (16)–(18), $n$ is the sample size, $\tilde{y}_i$ is the predicted value, $y_i$ is the target value, and $\bar{y}$ is the mean of the target values.
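All four metrics can be computed directly from paired target/prediction arrays; a small reference helper:

```python
import numpy as np

def evaluation_metrics(y_true, y_pred):
    """Correlation R, coefficient of determination R^2, RMSE, and MSE
    as defined in Eqs. (15)-(18)."""
    cov = np.mean((y_true - y_true.mean()) * (y_pred - y_pred.mean()))
    r = cov / np.sqrt(y_true.var() * y_pred.var())           # Eq. (15)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                               # Eq. (16)
    mse = ss_res / len(y_true)                               # Eq. (18)
    rmse = np.sqrt(mse)                                      # Eq. (17)
    return r, r2, rmse, mse
```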

3 Case 1 (building structure)

3.1 Model introduction

Case 1 examines a 6-story reinforced concrete hotel structure constructed in 1970 in San Bernardino, California [27,42]. This representative mid-rise concrete building was instrumented with nine seismic displacement recorders distributed across the first, third, and roof levels (Fig. 6) for SHM. From the seismic events recorded between 1987 and 2018, 21 valid data sets were selected and preprocessed. To ensure robust analysis, the data were partitioned into training (11 data sets), validation (4 data sets), and test (6 data sets) subsets. The model was trained for 7000 epochs, with the stacking window set to 20.

3.2 Parameter settings

For Case 1, hyperparameter optimization indicated that the optimal network configuration was a two-layer LSTM structure (64 neurons in the first layer and 200 neurons in the second), a dropout rate of 0.2, a batch size of 32, an initial learning rate of 4.5 × 10−4, and a fully connected layer dimension of 100. The model topology constructed from these optimal parameters is detailed in Table 1. Notably, the number of neurons in the second LSTM layer was significantly higher than in the first (200 vs 64), indicating that the system may require stronger deep feature extraction capability.

3.3 Loss curve

The training loss curves in Fig. 7 show that the ST-LSTM model converges significantly better than the traditional LSTM models on both the training and validation sets. For instance, the minimum validation loss values for ST-LSTM, S-LSTM, and F-LSTM were 5.27 × 10−7, 9.15 × 10−7, and 1.27 × 10−6, respectively. The ST-LSTM model achieved the lowest error and converged faster than the other two models. Notably, the ST-LSTM model also displayed more stable convergence behavior in the later training stages, suggesting superior generalization capability and convergence performance.

3.4 Results discussion

To comprehensively evaluate the performance of the proposed ST-LSTM model for earthquake response prediction, five representative deep learning models were selected for comparison: S-LSTM, F-LSTM, stacked bidirectional LSTM (S-BILSTM) [43], stacked Transformer (S-Transformer), and stacked RNN (S-RNN) [44]. These comparative models encompass the prevailing architectures in time-series prediction, enabling systematic validation of the ST-LSTM model's predictive capabilities from multiple perspectives.

Figure 8 presents the comparative prediction performance of the models on the test set. In terms of the correlation coefficient (R), the ST-LSTM model achieved optimal performance (R = 0.987), surpassing all other compared models (S-LSTM: 0.975; F-LSTM: 0.984; S-BILSTM: 0.977; S-Transformer: 0.975; S-RNN: 0.928). Except for S-RNN, all deep learning models based on stacked architectures exhibited excellent prediction performance (R ≥ 0.975), confirming the robustness and reliability of stacked architectures for earthquake response prediction. In addition, the traditional F-LSTM model required 59.48 h to complete training, whereas the proposed ST-LSTM required only 4.52 h, a 13.16-fold improvement in training efficiency. This indicates that the proposed model not only achieves higher accuracy but also significantly improves computational efficiency compared to traditional models.

To thoroughly validate the prediction reliability of the proposed ST-LSTM model, this study conducted a comparative analysis of three representative test cases (Work1, Work2, Work3) from the test set, as illustrated in Fig. 9. The results demonstrate strong agreement between the displacement response curves predicted by the ST-LSTM model and the measured data, with R values of 0.959 and 0.988 for Work1 and Work2, respectively, significantly higher than those of the other models. Notably, under the most challenging condition, Work3, the ST-LSTM model exhibited superior performance, achieving an R of 0.917 against the measured data and substantially outperforming all comparative models (S-LSTM: 0.79; F-LSTM: 0.831; S-BILSTM: 0.807; S-Transformer: 0.853; S-RNN: 0.676). Furthermore, the model maintained excellent peak displacement prediction accuracy, with peak errors consistently within 5% across all test cases, further confirming its reliability for practical engineering applications.

For quantitative evaluation, the R² and RMSE metrics were employed to validate the superiority of the ST-LSTM model through comparative experiments. As illustrated in Fig. 10, the proposed ST-LSTM model demonstrated consistent advantages across the training, validation, and test sets, with all evaluation metrics surpassing those of the comparative models. Notably, on the test set, the most representative evaluation scenario, the ST-LSTM model achieved an R² value of 0.975, a marked improvement over the competing models (S-LSTM: 0.933; F-LSTM: 0.891; S-BILSTM: 0.945; S-Transformer: 0.964; S-RNN: 0.927).

As detailed in Table 2, the ST-LSTM model exhibited significant enhancements over the S-LSTM baseline across all evaluation metrics. The R2 values improved by 1.4%, 2.4%, and 4.5% for the training, validation, and test sets, respectively, while corresponding RMSE reductions reached 32.8%, 20.5%, and 17.6%. These results not only confirmed the ST-LSTM model’s superior feature extraction capability and prediction accuracy but also validated the effectiveness of its spatiotemporal feature fusion mechanism.

4 Case 2 (bridge structures)

The preceding section has demonstrated the superior predictive capability of the ST-LSTM model in forecasting seismic responses of building structures. This section extends the evaluation to assess the model’s efficacy in predicting nonlinear structural responses of bridge systems, with particular emphasis on its comparative performance against two conventional LSTM variants: S-LSTM and F-LSTM. The comprehensive comparative analysis aims to further validate the predictive accuracy and robustness of the proposed ST-LSTM framework.

4.1 Rigid bridge model

Case 2 was a long-span rigid-frame bridge with a span arrangement of 68 m + 145 m + 145 m + 68 m and a total length of 426 m. The side-span piers were single piers, the mid-span piers were double-limb thin-walled piers, all foundations rested on pile groups, and the main girders used hollow sections. The bridge, mainly reinforced concrete, crossed a mountainous canyon and connected with tunnels at both ends; the geotechnical profile at the bridge foundation comprised silt, fine sand, and granite (see Fig. 11 for the bridge model) [45].

After extracting the nodal and mass information from the MIDAS/Civil finite element model of the bridge, a refined finite element model of the rigid-frame bridge accounting for material nonlinearity was established on the OPENSEES platform. The main girders were simulated with elastic beam-column elements, while the piers were simulated with nonlinear beam-column elements based on the flexibility method combined with fiber sections to capture their nonlinear characteristics. For the pile foundations, p–y springs simulated the lateral pile-side resistance (PySimple1 material), t–z springs simulated the vertical frictional resistance along the pile shaft (TzSimple1 material), and q–z springs simulated the vertical resistance at the pile tip (QzSimple1 material). The Concrete02 fiber material was used for the core and cover concrete, the Steel02 material for the steel reinforcement, and the Hardening material for the bearings, with the bearing connections realized through two-node link elements. This modeling approach achieved an accurate simulation of the overall mechanical behavior of the rigid-frame bridge through the combination of fiber sections and the various spring elements.
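The bridge model itself is not distributed with the paper; purely as an illustration of the soil-spring and fiber materials named above, an openseespy fragment might look like the following, where every numeric capacity, displacement, and strength is a hypothetical placeholder rather than a calibrated value from the actual bridge.

```python
import openseespy.opensees as ops

ops.wipe()
ops.model('basic', '-ndm', 3, '-ndf', 6)

# Soil springs for the pile foundations (all values are placeholders):
ops.uniaxialMaterial('PySimple1', 1, 2, 80.0e3, 0.01, 0.0)   # lateral p-y spring
ops.uniaxialMaterial('TzSimple1', 2, 2, 50.0e3, 0.005, 0.0)  # shaft-friction t-z spring
ops.uniaxialMaterial('QzSimple1', 3, 2, 500.0e3, 0.02)       # pile-tip q-z spring

# Fiber-section constituents (placeholder parameters):
ops.uniaxialMaterial('Concrete02', 4, -40.0e6, -0.002, -8.0e6, -0.006,
                     0.1, 4.0e6, 2.0e9)                      # core/cover concrete
ops.uniaxialMaterial('Steel02', 5, 400.0e6, 200.0e9, 0.01,
                     18.5, 0.925, 0.15)                      # reinforcing steel
```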

A comparative analysis of the natural vibration periods obtained from three finite element programs (OPENSEES, MIDAS, and LS-DYNA) was conducted, as presented in Table 3. The results demonstrate that the OPENSEES finite element model employed in this study accurately captures the dynamic characteristics of the actual structure, and that the established model can be effectively used to generate computational data sets for subsequent machine learning training and validation.

For Case 2, hyperparameter optimization showed that the optimal network configuration was a two-layer LSTM structure (128 neurons in the first layer and 100 in the second), a dropout rate of 0.23, a batch size of 32, an initial learning rate of 3.7 × 10−4, and a fully connected layer dimension of 100. The model topology constructed from these optimal parameters is shown in Table 4.

4.2 Database setup

The study selected suitable ground motion records from the PEER seismic database as model inputs, with all data undergoing uniform preprocessing to standardize the input format. Specifically, the time step of the acceleration time histories was standardized to 0.005 s, and each record was truncated to a 20 s duration, yielding 4000 data points per record to ensure dimensional consistency. The curvature response at the base of the 3# pier was selected as the prediction target because it effectively reflects the structure's nonlinear deformation characteristics. For data set construction, 37 ground motion records (including both pulse-like and non-pulse-like motions) and their corresponding structural responses were used for training, while 13 records served as the validation set. An additional 13 pulse-like ground motions were selected as the test set to evaluate prediction capability, with detailed ground motion parameters provided in Tables 5 and 6. As in Case 1, the stacking window was set to 20 and training ran for 7000 epochs. This approach not only ensured training data diversity but also effectively validated the model's predictive performance for unknown ground motions, thereby enhancing its engineering applicability.
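The preprocessing described here, resampling to a 0.005 s step and fitting each record to a 20 s (4000-point) window, can be sketched as below; the use of linear interpolation and zero padding for short records are assumptions about details the text leaves open.

```python
import numpy as np

def preprocess_record(acc, dt, target_dt=0.005, duration=20.0):
    """Standardize an acceleration history to a uniform 0.005 s time
    step and a 20 s duration (4000 points), as described in Section 4.2.
    Records shorter than 20 s are zero-padded at the end (assumption)."""
    t_old = np.arange(len(acc)) * dt
    t_new = np.arange(0.0, duration, target_dt)          # 4000 sample times
    return np.interp(t_new, t_old, acc, right=0.0)       # resample + truncate/pad
```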

4.3 Results discussion

Figure 12 presents a comprehensive evaluation of each comparison model's prediction performance on the test set. In terms of prediction accuracy, the proposed ST-LSTM model performed excellently, with an R of 0.946 between its predictions and the actual values, 3.4% higher than the S-LSTM model (R = 0.915). Notably, although the F-LSTM model was the most accurate (R = 0.988), its computational cost was significantly higher, with a training time (t = 167 h) roughly 18 times that of the ST-LSTM model (t = 8.63 h). Considering prediction accuracy and computational efficiency together, the ST-LSTM model showed the best overall performance: it maintained high prediction accuracy (R > 0.94) while significantly improving computational efficiency (training time only 5.6% of F-LSTM's). This finding corroborates the results of Case 1, further validating the advantages of the ST-LSTM model in balancing prediction accuracy and computational efficiency and reflecting its practical value in real engineering applications.

A detailed comparison of bridge curvature predictions revealed strong agreement between the ST-LSTM model outputs and actual measurements, as illustrated in Fig. 13. All models demonstrated good correlation with observed data (R > 0.95), with the ST-LSTM predictions showing particularly close alignment throughout the entire time history. The ST-LSTM model exhibited superior performance in peak value prediction, achieving a peak error of just 6.2%, significantly lower than the S-LSTM (14.7%) and F-LSTM (9.2%) models. These results not only confirmed the ST-LSTM model’s exceptional capability in temporal feature extraction but also demonstrated its engineering practicality, with prediction accuracy falling well within the acceptable engineering margin of 5%–10%. The model’s reliable prediction of extreme structural responses provided valuable technical support for seismic safety assessments.

The model performance was quantitatively evaluated using the R2 and RMSE metrics. As shown in Fig. 14, while the test set prediction performance was slightly lower than that of the training and validation sets, the ST-LSTM model still achieved excellent results on the test set, with an R2 value of 0.983, representing a 3.5% improvement over the S-LSTM model (R2 = 0.950). Furthermore, Table 7 demonstrates that the ST-LSTM model consistently outperformed other models across all data sets, showing R2 improvements of 0.5%, 2.1%, and 3.5% on the training, validation, and test sets, respectively. Corresponding RMSE reductions were even more substantial, reaching 27.6%, 4.3%, and 7.5% for these data sets. These findings confirmed that the ST-LSTM model possessed both strong generalization capability and high prediction accuracy for bridge response forecasting.

5 Conclusions

This study proposes a novel ST-LSTM hybrid deep learning model for seismic response prediction. Through comprehensive comparative experiments conducted on two representative structural types (a building and a rigid-frame bridge), the model demonstrates superior accuracy and reliability in earthquake-induced response prediction. The principal findings of this research are summarized as follows.

1) The developed ST-LSTM model exhibits remarkable accuracy and practical applicability in predicting structural responses to seismic excitations, particularly excelling in long-term sequence predictions compared to conventional prediction models.

2) In comparative case studies involving building structures and rigid bridge, the ST-LSTM model achieves substantial improvements over traditional S-LSTM and F-LSTM models across key evaluation metrics, including R, R2, and RMSE. The innovative architecture of ST-LSTM, which synergistically combines LSTM’s temporal modeling strengths with Transformer’s feature enhancement capabilities, effectively overcomes the challenges associated with long-term dependencies and nonlinear feature extraction in seismic response sequences.

3) The model maintains exceptional prediction accuracy while demonstrating significant computational efficiency advantages. Experimental results reveal that in building structure prediction tasks, the model achieves a training speed 13.16 times faster than conventional F-LSTM. For bridge structure predictions, the computational efficiency improvement is even more pronounced, reaching an 18-fold increase over F-LSTM. This substantial enhancement in computational performance enables the model to significantly reduce resource requirements while maintaining high prediction accuracy.

References

[1]

Tong L , Wang D S , Sun Z G , Shi F , Dai J C . Seismic performance and control methods of end span uplift for long-span rigid-frame bridges subjected to near-fault ground motions. Structures, 2024, 64: 106567

[2]

Earij A , Alfano G , Cashell K , Zhou X . Nonlinear three-dimensional finite-element modelling of reinforced–concrete beams: Computational challenges and experimental validation. Engineering Failure Analysis, 2017, 82: 92–115

[3]

Jumaa G B , Yousif A R . Numerical modeling of size effect in shear strength of FRP-reinforced concrete beams. Structures, 2019, 20: 237–254

[4]

Lee S H , Abolmaali A , Shin K J , Lee H D . ABAQUS modeling for post-tensioned reinforced concrete beams. Journal of Building Engineering, 2020, 30: 101273

[5]

Shi Y , Wang W , Qin H , Shi Y , Jiao Y . Nonlinear seismic response and damage analysis for continuous prestressed concrete rigid-frame bridge considering internal force state under near-fault ground motions. Structures, 2024, 61: 105993

[6]

Zhang H , Long H , Chen F , Luo Y , Xiao X , Deng Y , Lu N , Liu Y . Temperature field prediction for a PC beam bridge with corrugated steel webs using BP neural network and measured data. Structures, 2024, 68: 107232

[7]

Eshaghi M S , Valizadeh N , Anitescu C , Wang Y , Zhuang X , Rabczuk T . Multi-head neural operator for modelling interfacial dynamics. 2025, arXiv: 2507.17763

[8]

Zhang G , He C , Zhang J , Zhai Y , Zhang Z , Jiang L , Guo W . Prediction of bridge structure response and resilience assessment under main-aftershock: LSTM-Transformer model based on adaptive learning rate framework. Engineering Structures, 2025, 345: 121449

[9]

Eshaghi M S , Bamdad M , Anitescu C , Wang Y , Zhuang X , Rabczuk T . Applications of scientific machine learning for the analysis of functionally graded porous beams. Neurocomputing, 2025, 619: 129119

[10]

Es-haghi M S , Anitescu C , Rabczuk T . Methods for enabling real-time analysis in digital twins: A literature review. Computers & Structures, 2024, 297: 107342

[11]

Wang Y , Bai J , Lin Z , Wang Q , Anitescu C , Sun J , Eshaghi M S , Gu Y , Feng X Q , Zhuang X , et al. Artificial intelligence for partial differential equations in computational mechanics: A review. 2024, arXiv: 2410.19843

[12]

Wang Y , Sun J , Li W , Lu Z , Liu Y . CENN: Conservative energy method based on neural networks with subdomains for solving variational problems involving heterogeneous and complex geometries. Computer Methods in Applied Mechanics and Engineering, 2022, 400: 115491

[13]

Sun J , Liu Y , Wang Y , Yao Z , Zheng X . BINN: A deep learning approach for computational mechanics problems based on boundary integral equations. Computer Methods in Applied Mechanics and Engineering, 2023, 410: 116012

[14]

Bai W , Zhang Z , Zhang J , Guo X , Yang X , Luo Y , Guo F , Zhang B , Wang L . Biomass-derived N-doped dendritic 3D carbon@ZnO nanoparticles as high-performance anode materials for lithium-ion batteries. Energy Storage, 2025, 7(3): e70150

[15]

Lyu P , Tang T , Ling F , Luo J J , Boers N , Ouyang W , Bai L . ResoNet: robust and explainable ENSO forecasts with hybrid convolution and transformer networks. Advances in Atmospheric Sciences, 2024, 41(7): 1289–1298

[16]

Wang Y , Sun J , Rabczuk T , Liu Y . DCEM: A deep complementary energy method for linear elasticity. International Journal for Numerical Methods in Engineering, 2024, 125(24): e7585

[17]

Eshaghi M S , Anitescu C , Thombre M , Wang Y , Zhuang X , Rabczuk T . Variational physics-informed neural operator (VINO) for solving partial differential equations. Computer Methods in Applied Mechanics and Engineering, 2025, 437: 117785

[18]

Wang Y , Bai J , Eshaghi M S , Anitescu C , Zhuang X , Rabczuk T , Liu Y . Transfer learning in physics-informed neural networks: Full fine-tuning, lightweight fine-tuning, and low-rank adaptation. International Journal of Mechanical System Dynamics, 2025, 5(2): 212–235

[19]

Yang Y , Xin J , Tang Q , Wang Y , Yang S X , Zhou J . Prediction method of condition degradation for network-level bridges based on U-Net++ convolutional neural network. Measurement, 2025, 241: 115748

[20]

Zoubir H , Rguig M , El Aroussi M , Saadane R , Chehri A . Pixel-level concrete bridge crack detection using convolutional neural networks, gabor filters, and attention mechanisms. Engineering Structures, 2024, 314: 118343

[21]

Chen L , Chen W , Wang L , Zhai C , Hu X , Sun L , Tian Y , Huang X , Jiang L . Convolutional neural networks (CNNs)-based multi-category damage detection and recognition of high-speed rail (HSR) reinforced concrete (RC) bridges using test images. Engineering Structures, 2023, 276: 115306

[22]

Teng S , Chen X , Chen G , Cheng L , Bassir D . Structural damage detection based on convolutional neural networks and population of bridges. Measurement, 2022, 202: 111747

[23]

Fu W , Zhou R , Gao Y , Guo Z , Yu Q . A diffusion model-based deep learning approach for denoising acoustic emission signals in concrete. Measurement, 2025, 251: 117143

[24]

Zhang T , Lu Y , Cai Y , Xu W , Wang S , Du D , Miao Q . Rapid inversion of seismic damage to masonry infill walls based on diffusion models. Engineering Failure Analysis, 2025, 171: 109371

[25]

Zhong Q M , Feng D C , Chen S Z . Multi-fidelity enhanced few-shot time series prediction model for structural dynamics analysis. Computer Methods in Applied Mechanics and Engineering, 2025, 434: 117583

[26]

Yu D , Gai T , Yang S , Zeng S , Lin J C W . An effective multi-time series model of RC column backbone curve identification. Case Studies in Construction Materials, 2024, 20: e03183

[27]

Zhang R , Chen Z , Chen S , Zheng J , Büyüköztürk O , Sun H . Deep long short-term memory networks for nonlinear structural seismic response prediction. Computers & Structures, 2019, 220: 55–68

[28]

Zhao H , Wei B , Zhang P , Guo P , Shao Z , Xu S , Jiang L , Hu H , Zeng Y , Xiang P . Safety analysis of high-speed trains on bridges under earthquakes using a LSTM-RNN-based surrogate model. Computers & Structures, 2024, 294: 107274

[29]

Tang K , Cui Y , Chen P . A deep learning method for addressing the scarcity of experimental data in composite structures: Multi-Fidelity Triple LSTM. Thin-walled Structures, 2025, 211: 113106

[30]

Hao W , Yin H , Liu J , Ma R , Zhao D , Lu S . Seismic response prediction of soil tunnel structure based on CNN-LSTM model. Structures, 2025, 73: 108381

[31]

Yu Z , Li B . Reinforced concrete beam full response prediction with hybrid feature-orientation transformer-LSTM model. Engineering Structures, 2025, 332: 120040

[32]

Abu Zouriq M F , Linzell D G , Azam S E . Image-based strain response estimation of in-situ bridge using Recurrent Neural Networks (RNNs). Structures, 2025, 73: 108436

[33]

Zhang Q , Guo M , Zhao L , Li Y , Zhang X , Han M . Transformer-based structural seismic response prediction. Structures, 2024, 61: 105929

[34]

Wang X , Bai Y , Liu X . Prediction of railroad track geometry change using a hybrid CNN-LSTM spatial-temporal model. Advanced Engineering Informatics, 2023, 58: 102235

[35]

Sun X , Wang H , Mei S . Explainable highway performance degradation prediction model based on LSTM. Advanced Engineering Informatics, 2024, 61: 102539

[36]

Lim J Y , Kim S , Kim H K , Kim Y K . Long short-term memory (LSTM)-based wind speed prediction during a typhoon for bridge traffic control. Journal of Wind Engineering and Industrial Aerodynamics, 2022, 220: 104788

[37]

Huang X , Ye Y , Xiong L , Lau R Y K , Jiang N , Wang S . Time series k-means: A new k-means type smooth subspace clustering for time series data. Information Sciences, 2016, 367–368: 1–13

[38]

Liao Y , Lin R , Zhang R , Wu G . Attention-based LSTM (AttLSTM) neural network for seismic response modeling of bridges. Computers & Structures, 2023, 275: 106915

[39]

Bao G , Liu X , Zou B , Yang K , Zhao J , Zhang L , Chen M , Qiao Y , Wang W , Tan R , et al. Collaborative framework of transformer and LSTM for enhanced state-of-charge estimation in lithium-ion batteries. Energy, 2025, 322: 135548

[40]

Li W , Liu C , Xu Y , Niu C , Li R , Li M , Hu C , Tian L . An interpretable hybrid deep learning model for flood forecasting based on Transformer and LSTM. Journal of Hydrology. Regional Studies, 2024, 54: 101873

[41]

Ueki M . A deflation-adjusted Bayesian information criterion for selecting the number of clusters in K-means clustering. Computational Statistics & Data Analysis, 2025, 209: 108170

[42]

Hamid H , Shakal A , Stephens C , Savage W , Huang M , Leith W , Parrish J , Borcherdt R . Center for engineering strong-motion data (CESMD). In: Proceedings of the 14th World Conference on Earthquake Engineering. Beijing, 2008, 12–17

[43]

Yuan Q , He M , Chen Z , Liu M , Chen X . A real-time prediction method for rate of penetration sequence in offshore deep wells drilling based on attention mechanism-enhanced BiLSTM model. Ocean Engineering, 2025, 325: 120820

[44]

Kaushal A , Gupta A K , Sehgal V K . Earthquake prediction optimization using deep learning hybrid RNN-LSTM model for seismicity analysis. Soil Dynamics and Earthquake Engineering, 2025, 195: 109432

[45]

Zhang G , Zhang J , Liu Y , Cao Y . Seismic fragility analysis of long-span rigid-frame bridge on mountainous soft clay site. Advances in Bridge Engineering, 2024, 5(1): 25

RIGHTS & PERMISSIONS

Higher Education Press
