Key Laboratory of Power Machinery and Engineering (Ministry of Education), Shanghai Jiao Tong University, Shanghai 200240, China
zhslm@sjtu.edu.cn
History: Received 2020-04-08 · Accepted 2020-09-21 · Published 2020-12-15 · Issue date 2020-12-09
Abstract
Since gas turbines play a key role in electric power generation, the requirements on the safety and reliability of this classical thermal system are becoming increasingly strict. With a large amount of renewable energy being integrated into the power grid, the demand for deep peak load regulation, needed to satisfy the varying demand of users and maintain the stability of the whole power grid, leads to more unstable working conditions of gas turbines: startup, shutdown, and load fluctuation dominate their operation. Hence, simulating and analyzing the dynamic behavior of the engines under such unstable working conditions is important for improving their design, operation, and maintenance. However, conventional dynamic simulation methods based on physical differential equations are unable to tackle the uncertainty and noise encountered in varied real-world operations. Although data-driven simulation methods can mitigate this problem to some extent, they cannot perform simulations when data are insufficient. To tackle this issue, a novel transfer learning framework is proposed that transfers knowledge from the physics-equation domain to the real-world application domain to compensate for the lack of data. A strongly dynamic operating data set with steep-slope signals is created based on physics equations, and a feature similarity-based learning model with an encoder and a decoder is then built and trained to achieve feature-adaptive knowledge transfer. Compared with the baseline model, the simulation accuracy is significantly increased by 24.6% and the prediction error is reduced by 63.6%. Moreover, compared with other classical transfer learning modes, the proposed method achieves the best simulation performance on the field testing data set.
Furthermore, the study of the effects of the hyperparameters indicates that the proposed method can adaptively balance the weight of learning knowledge from the physical theory domain and from the real-world operation domain.
With the progress made in energy science and technology, the scale and complexity of thermodynamic systems are gradually increasing, especially in electricity generation; therefore, the requirements on the safety and reliability of thermodynamic systems are becoming increasingly strict [1]. According to statistics collected by the International Energy Agency (IEA), natural gas-fired power generation accounted for 21% of the electric power worldwide, 490.7 GW in total, in 2016 [2], and by 2024 it will approach 40% [3]. It is reported that natural gas will become the second major energy source by 2030, that its consumption rate will equal that of oil by 2040, and that it will surpass oil to become the largest consumer energy source by 2050 [4]. Gas turbines are widely used as important energy equipment in natural gas-fired power plants and natural gas transmission [5].
As a large amount of renewable energy is integrated into the power grid, the ability of deep peak load regulation is urgently required of traditional thermodynamic systems for electricity generation. Gas turbines, fast-response direct-cycle thermodynamic systems with good dynamic response characteristics, are often used for dynamic peak regulation to satisfy the varying demand of users and maintain the stability of the whole power grid [6]. Thus, startup, shutdown, and load fluctuation dominate the operating conditions of gas turbines [7]. Hence, simulating and analyzing the dynamic behavior of gas turbines under such transient process conditions is important for improving their design, operation, and maintenance [8].
Gas turbine dynamic models, whether based on physical equations or data-driven methods, attempt to simulate the behavior over a wide range of operating conditions, with dynamic environmental and operating condition parameters as the input and the other thermal parameters and performance indices (such as efficiency and degradation) as the output, and can thus determine the performance of gas turbines at off-design points and during transient processes. Accurate dynamic simulation models can facilitate the identification of potentially harmful transient situations in real operating conditions. Therefore, a more robust and stable control system can be developed to prevent the occurrence of abnormal working conditions and ensure the safe and efficient operation of gas turbines [9]. For the purposes mentioned above, studies on AI techniques for the dynamic operating conditions of power systems have attracted broad attention, and numerous approaches have been proposed to better simulate the dynamic behavior [10].
Traditional dynamic simulation methods for gas turbines are based on physical differential equations, built on thermodynamic and fluid mechanics principles derived from the mass, energy, and momentum conservation laws [11]. The general constitutive relationships between the thermal parameters are settled, while some unknown structure parameters remain. For an accurate dynamic simulation model, such unknown structure parameters must be evaluated precisely, i.e., the model needs to be calibrated. Thus, the traditional models need a large amount of high-quality data to correct the parameters of the simulation models so as to describe the system characteristics accurately and quantitatively. There are two obstacles to traditional dynamic simulation. First, the lack of dynamic operational data makes it difficult to calibrate such simulation models. Second, the strong nonlinearity of the system and the strong coupling of the components make it harder for the solver to achieve a satisfactory result for global optimization. Due to these two obstacles, it is hard for traditional dynamic simulation of the gas turbine to accurately describe the behavior of the system during startup, shutdown, and load fluctuation over a wide range of operating conditions. Wang et al. [12] proposed a simulation model for the aircraft gas turbine system with a control system, achieving good dynamic response performance for each component; however, the accuracy of the model was not validated in a real case. Chaibakhsh and Amirkhani [13] studied the dynamic transient behavior of the gas turbine by calibrating the unknown parameters of the traditional model with real data, but the case was conducted over a relatively narrow range of operating conditions.
Nowadays, dynamic models of gas turbines are mainly based on mechanism models and data models. Xie et al. proposed a new method for building extended dynamic models of the gas turbine [14]. A novel model was developed in a MATLAB/Simulink environment by Tsoutsanis et al.; after verification, the simulation results of the model were found to be consistent with the field data [15]. Badami et al. developed a dynamic model of a gas microturbine with few data, which was able to predict the dynamic performance, and the simulation results matched well with the experimental results [16]. Mehrpanahi et al. combined thermodynamic equations with some key parameters in the loading and unloading modes to generate a dynamic adaptive model, which was validated [17]. Tsoutsanis et al. developed a dynamic engine model of a gas turbine for diagnosis and prognosis and demonstrated the effectiveness of the method [9].
Data models have also been proposed by many researchers. Mehrpanahi et al. built dynamic models of the loading and startup modes using neural networks based on nonlinear auto-regressive exogenous (NARX) and Hammerstein-Wiener (HW) structures [18]. Asgari et al. also trained NARX models of a gas turbine for prediction; the models could learn the trend changes of the data and predict accurately [19]. Nikpey et al. reported an artificial neural network model for appropriate monitoring of a micro gas turbine and verified its effectiveness by simulation and experiment [20]. Tsoutsanis and Meskin proposed a regression method that can estimate the performance of a gas turbine in a dynamic engine model for diagnosis [21]. Baklacioglu et al. generated a data-driven dynamic model of a turboprop engine, and the results showed that the simulation data fit the field data [22]. Weng et al. reviewed the modeling, optimization, and dispatch of integrated energy systems with a gas turbine [23].
The ideal scenario for machine learning is a large amount of labeled training data with the same distribution as the testing data, while in many applications it is often expensive, time consuming, or even impractical to collect sufficient training data. Semi-supervised learning solves this problem to some extent by reducing the need for large amounts of labeled data: such methods usually require only a small amount of labeled data and use a large amount of unlabeled data to train the models and achieve the required predictive accuracy. However, in many cases, unlabeled samples are also difficult to collect. Therefore, the above traditional machine learning modes hardly attain satisfactory accuracy in real engineering applications, especially for dynamic systems. Transfer learning, regarded as a promising technology, focuses on transferring knowledge across domains with a similarity, leveraging knowledge from a related domain (the source domain) to mitigate data-insufficiency problems and improve the predictive performance in a target domain.
Transfer learning is a technology that is expected to fundamentally address both the complexity of the coupled thermodynamic system and the lack of labeled field dynamic operating data. Related research on gas turbines has been conducted and proven effective [24,25]. The basic operating pattern of the dynamic behavior is learned by training the networks on rich simulation data based on physical knowledge [26]. Then, the detailed characteristics of the dynamic system are approximated at fine grain with a small amount of real data.
Transfer learning has three main advantages. To begin with, the simulating neural networks are built through an overall training procedure on the data sets, thus avoiding the need to tackle the coupling relationships among complicated parts, which is a great obstacle for the traditional methods. Next, the qualitative descriptions of differential equations and of data are considered separately in traditional methods, while transfer learning combines the two perspectives and fully exploits the complementarity between them. Finally, updating the model via real-data training at a small cost can effectively overcome the time variability of the whole thermodynamic system.
Some studies have explored the potential of transfer learning methods on engineering physics systems for accurately simulating and analyzing their behavior under operating conditions. Klenk and Forbus [27] attempted to move knowledge in the vocabulary of abstractions, assumptions, causal relationships, and models from similar examples to an AP Physics style problem-solving system, at the qualitative reasoning level only, which can be considered the first application of the transfer learning methodology to an engineering problem. For thermal dynamics modeling problems, Jiang and Lee [28] attempted to better predict building indoor temperature evolution and energy consumption with a limited amount of data using a pretrain-finetune transfer learning framework. Moreover, Tang et al. [26] presented a multi-state data-driven gas path analysis method by constructing a sub-model diagnostic network and applying the multi-source task-related boosting algorithm [29] to transfer knowledge among sub-models, which enhanced the effectiveness of the data and attained a stable diagnostic accuracy over the whole running life. Although the above studies on thermodynamic systems truly mitigated the data-insufficiency problem and improved the performance of system modeling and diagnosis, the behavior of the system was not studied discriminatively from the steady-state and dynamic-state perspectives. Therefore, the applied transfer learning methods do not fully consider the feature adaptation between the data in the source domain and the target domain, which can cause negative transfer during the training process.
In this paper, a feature adaptive transfer learning method focusing on the dynamic simulation of the gas turbine system is proposed to realize data self-adaptation from the ideal physics knowledge domain to the real-world operating domain. The main contributions of this paper are as follows. Since the transfer learning frameworks reviewed above do not fully consider the dynamic-state working conditions, and the target knowledge is obtained only from the field data, the typical dynamic characteristics have not been learned by the model. Hence, a set of simulation data with strong dynamic characteristics and different slopes is created via physical differential equations to boost the transfer learning performance for dynamic simulation. Moreover, to avoid negative transfer and improve the transfer learning performance during training, a cosine-distance similarity measurement with an encoder-decoder LSTM neural network is presented to achieve self-adaptive transferring, and a feature similarity-based transfer learning model is built and trained on the created artificial data sets and the field data sets. Furthermore, the impacts of different hyperparameters of the method are studied to inspect the internal transferring mechanism.
Background
Dynamic equations
Compared with the steady-state model, the dynamic model fully considers the dynamic characteristics of the gas turbine, including thermal inertia, rotating inertia, and volume inertia. The component efficiency η and mass flow rate Q of the components are calculated by consulting the relevant characteristic maps. Details of these functions are listed in Table 1.
Combustion thermal and volume inertia
The combustor model focuses on the dynamic response processes of the pressure and temperature. The pressure P is calculated in the combustor model, while the mass flow rates Qc and Qht are obtained from the characteristic maps shown in Table 1. Hence, the state equation of the pressure is expressed in Eq. (1), where f denotes the fuel, V represents the combustor volume, t is the time variable, Rg is the gas constant, and Tg is the gas temperature.
The inlet temperature is crucial for the performance analysis of a gas turbine and can be obtained from the combustion model. The state equation for the inlet temperature can be described by Eq. (2), where HV denotes the fuel heating value, h denotes the enthalpy, cp,g stands for the heat capacity of the gas at a certain pressure level, and ρg is the combustor gas density.
Rotating inertia
The compressor and turbine load are connected by the rotor. The dynamic performance of the rotor can be described by Eq. (3), where I denotes the moment of inertia, Wht denotes the turbine input power, nc is the rotational speed of the gas compressor, and Wc represents the output power that drives the gas compressor.
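Numerically, the rotor dynamics can be integrated step by step. The sketch below assumes the common shaft-power-balance form I·(π/30)²·nc·(dnc/dt) = Wht − Wc for Eq. (3), since the equation itself is not reproduced here; the function name and all numbers are illustrative assumptions.

```python
import math

def rotor_speed_step(n_c, W_ht, W_c, inertia, dt):
    """One forward-Euler step of the rotor dynamics.

    Assumed form of Eq. (3): I * (pi/30)**2 * n_c * dn_c/dt = W_ht - W_c,
    with n_c in r/min, powers in W, and inertia in kg*m^2.
    """
    dn_dt = (W_ht - W_c) / (inertia * (math.pi / 30.0) ** 2 * n_c)
    return n_c + dn_dt * dt

# Surplus turbine power accelerates the shaft (illustrative values).
n = 5000.0
n_next = rotor_speed_step(n, W_ht=2.0e6, W_c=1.8e6, inertia=50.0, dt=0.01)
```

When Wht exceeds Wc the shaft accelerates; when the two powers balance, the speed holds, which is the steady-state condition.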
Volume inertia
The spaces in the compressor and turbine can be regarded as volumes with input and output flows. The pressure and temperature can be expressed by Eqs. (4) and (5), where Qin and Qout are the input and output mass flow rates, hin and hout denote the enthalpies of the input and output flows, and V is the volume of the space. For more details, please refer to Refs. [9,30,31].
Long short-term memory
Recurrent neural network (RNN) [32] research began in the 1980s, and RNNs developed into an important deep learning algorithm in the early 2000s. The RNN, with memory, parameter sharing, and Turing completeness, is able to learn the nonlinear features of multi-dimensional data sequences with high efficiency. The network structure of the RNN is shown in Fig. 1. Long short-term memory (LSTM) [33] is an improved neural network unit based on the RNN. As in the RNN, sequence data are taken as input, recursion is performed along the time axis in the direction of sequence evolution, and all neurons are connected in chains.
Given the input sequence x, the forward propagation can be calculated via Eqs. (6) and (7), where the hidden vector is h and the output vector is y, W and b are the weight and bias parameter vectors of the RNN, F is the activation function, and t denotes the time step. Note that the subscripts of W and b indicate the vectors related by the linear transformation. For example, the subscript xh in Wxh indicates that the weight matrix Wxh transfers the input sequence x to the hidden vector h, and the subscript h in bh means the bias is added to the hidden vector h. The same form of representation is used in the following description to clarify the complex transformations inside the LSTM unit.
To mitigate gradient disappearance and find the optimal length of memory time, LSTM neurons are used to replace the hidden-layer neurons in the RNN network. The structure of the LSTM neuron, improved over that of the RNN, is demonstrated in Fig. 2, where i, f, c, and o respectively represent the vectors of the input gate, forget gate, cell state, and output gate.
As demonstrated in Fig. 2, the hidden layer of the LSTM neuron at time step t accepts the previous hidden vector and the present input vector xt, and then filters the information via Eq. (8). Equation (9) updates the forget gate ft based on the cell state vector; based on ft, Eq. (10) calculates the present cell state vector ct, while the final outputs of the neuron, ht and yt, are calculated via Eqs. (6) and (7). The LSTM model adopts a linear accumulation form when processing sequence data to avoid gradient disappearance, and it has the ability to learn long-period information, which overcomes the shortcomings of the RNN model. It has been broadly utilized and studied in various engineering applications to extract the features of multidimensional time-related data.
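The gate computations described above can be sketched in NumPy. This is a generic LSTM cell; the stacked-gate weight layout and the layer sizes are illustrative assumptions, not the exact parameterization of Eqs. (8)-(10).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """Single LSTM step: input gate i, forget gate f, cell state c,
    output gate o (cf. Fig. 2). W maps [x_t, h_prev] to the four
    stacked gate pre-activations."""
    z = W @ np.concatenate([x_t, h_prev]) + b
    H = h_prev.size
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate
    g = np.tanh(z[2*H:3*H])      # candidate cell update
    o = sigmoid(z[3*H:4*H])      # output gate
    c_t = f * c_prev + i * g     # linear accumulation avoids vanishing gradients
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
X, H = 4, 8                      # e.g., 4 gas-path inputs, hidden size 8 (assumed)
W = rng.standard_normal((4 * H, X + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(16):              # unroll over a window of 16 time steps
    h, c = lstm_cell(rng.standard_normal(X), h, c, W, b)
```

The additive update of the cell state c_t is the mechanism the text refers to as "linear accumulation", which lets gradients flow over long horizons.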
Transfer learning
In recent years, deep learning has become one of the main data-driven methodologies for tackling the high nonlinearity and heavy multidimensionality of engineering problems. However, achieving the required fitting accuracy of a deep model places strict demands on data richness and extensiveness. In many real-world applications, it is often expensive, time consuming, or even impractical to collect the corresponding training data. Transfer learning has been developed as one of the main frameworks to address such data deficiency problems. Following Ref. [34], transfer learning is specified by the three definitions below.
Definition 1. (Domain) A domain D is defined as a two-tuple, D={X,P(X)}, where X is the feature space and P(X) is the marginal distribution.
Definition 2. (Task) A task T consists of a label space Y and a predicting function F, defined as T={Y,F}, where the predicting function F, which maps samples to predictive variables, is learned from labeled data.
Definition 3. (Transfer learning) Given some source domains with their tasks, and a target domain with its task, transfer learning leverages the knowledge in the source domains to improve the predictive performance of the learned decision function in the target domain, where D and T denote the domain and task, while the subscripts S and T represent the source and target.
Moreover, according to different control perspectives on the training process [35], transfer learning methods can be categorized into data-based, model parameter-based, and feature-based methods. Data-based transfer learning approaches are mainly based on data mixture and instance weighting algorithms. Model parameter-based transfer learning controls the parameters of the neural networks via parameter sharing, pretrain-finetuning, parameter restriction, and so on. Feature-based approaches attempt to map or encode the features of the source domain and the target domain into similar representations through training, so as to enable the target decision function to learn the knowledge of the source domain adaptively. As presented in Table 2, three kinds of transferring modes are performed and compared in this paper. Data-based transfer learning simply uses the data-mixed mode; model parameter-based transfer learning transfers the knowledge embedded in the neural networks at the parameter level via pretrain-finetuning. In particular, a novel feature similarity-based adaptive transfer learning framework is proposed, which transfers knowledge at the data and feature levels.
The parameters in all networks are updated and optimized by applying a back-propagation algorithm [36] to reduce the overall error between the actual values and the predicted ones. This supervised process iterates until the terminal condition is fulfilled.
Methodology
In this paper, three kinds of knowledge transfer learning methods are performed using different data sets generated from the physical knowledge-based simulation system, and a novel feature similarity-based transfer learning method (FSTL) is proposed, compared, and studied. Note that the source domain is the simulation system built on the differential equations, while the target domain is the real-world startup simulation of the gas turbine. This section illustrates the two core techniques of the proposed FSTL framework: transient process simulation data acquisition and the feature similarity-based knowledge transfer learning model.
Artificial transient process sample generation
To capture the dynamic behavior of the gas turbine in actual startup operating conditions from the thermodynamic differential equations depicted in Section 2.1, a dynamic simulation system modeling the startup condition of a double-shaft gas turbine is constructed and applied to generate simulation data from which the LSTM neural networks learn the dynamic operating patterns. Two kinds of data sets are constructed, each with the same input signals as the actual startup operating conditions; this section illustrates their construction. As exhibited in Fig. 3, the input controlling signals of the built dynamic simulation system are the rotating speed of the gas generator NGG and the compressor surge margin, while the compressor inlet temperature T1, pressure P1, and the rotating speed of the power turbine NPT are set to default values. Here, to simulate the startup conditions of a double-shaft gas turbine and obtain data sequences that reflect the dynamic relationships between the gas-path thermodynamic parameters along the time axis, the rotating speeds of the gas generator NGG and the power turbine NPT, the compressor inlet temperature T1, and the pressure P1 are set as the input signals, while the combustor inlet temperature T2 and pressure P2, the gas generator outlet temperature T34 and pressure P34, and the power turbine outlet temperature T4 are set as the outputs of the simulating system.
To enhance the generalization of the simulating data, the gas generator inlet pressure P1 is set as the standard atmospheric pressure 101325 Pa plus the Gaussian noise with a variance of 100 Pa, and the temperature T1 is set as 273.15 K plus the Gaussian noise with a variance of 10 K.
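This boundary-condition setup can be sketched as follows. The paper states the noise in terms of variance, so the sketch samples with a standard deviation equal to the square root of the stated variance; the function name and sequence length are assumptions.

```python
import numpy as np

def ambient_boundary_signals(n_steps, seed=0):
    """Inlet boundary conditions as described in the text: standard
    atmosphere plus Gaussian noise with variance 100 Pa (pressure)
    and 10 K (temperature)."""
    rng = np.random.default_rng(seed)
    P1 = 101325.0 + rng.normal(0.0, np.sqrt(100.0), n_steps)
    T1 = 273.15 + rng.normal(0.0, np.sqrt(10.0), n_steps)
    return P1, T1

P1, T1 = ambient_boundary_signals(24)
```

The small perturbations keep the simulated sequences near nominal ambient conditions while preventing the network from overfitting to a single constant boundary value.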
The collections of input signals are represented as the vector X, and the output signals of the simulation model are denoted as the vector Y, as depicted in Eqs. (13) and (14), where m is the length of the data sequence. Thus, the data set constructed from X and Y is formulated as Eq. (15), where n is the number of sequences in the data set after data preprocessing.
To fully study the impact of different types of simulation data on the differential equation-embedded physics knowledge transferring methods presented in this paper, some signals regarded as much more "dynamic" than those in the real starting-up conditions are simulated, in addition to the 4 field input signals of the training data set. As seen in Tables 3 and 4, 12 startup inputs of NGG and NPT with steeper accelerating procedures are constructed, where the signals of NGG are determined by the starting point (0, 0) and ending point (24, 9000). Moreover, 6 different controlling points are set to regulate the slope of the accelerating process and enrich the startup patterns of the transient process data set. Note that NPT is given as 60% of NGG.
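A piecewise-linear construction of such startup inputs might look as follows; the control-point coordinates below are illustrative placeholders, not the values of Table 3.

```python
import numpy as np

def startup_profile(control_point, n_steps=25):
    """Piecewise-linear NGG startup signal from (0 min, 0 r/min) to
    (24 min, 9000 r/min) through one controlling point (t_c, n_c) that
    sets the slope of the accelerating phase; NPT is given as 60% of
    NGG, as stated in the text."""
    t = np.arange(n_steps, dtype=float)
    t_c, n_c = control_point
    ngg = np.interp(t, [0.0, t_c, 24.0], [0.0, n_c, 9000.0])
    npt = 0.6 * ngg
    return ngg, npt

# A steep early acceleration followed by a flatter approach to 9000 r/min.
ngg, npt = startup_profile((6.0, 7000.0))
```

Moving the control point earlier and higher yields the "steep slope" patterns used to enrich the transient process data set.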
The responses of the gas path parameters of field data, field-input-signal simulation data, and steeply ideal transient process data are displayed in Fig. 5 as examples, where the field data set is acquired from the real operation database, the field-input-signal simulation data set is generated from the physics differential equation-based simulation using the environmental and working condition parameters (NPT, NGG, T1, P1) as inputs and outputting the simulated gas path parameters, and the steeply ideal transient process data set is also acquired from physics differential equation-based simulation while its inputs are set manually, as listed in Table 3. Note that the field data set is denoted as Dfield, the field-input-signal simulation data set is marked as Dsim1, and the transient process simulation data set is Dsim2.
Feature similarity-based transfer learning (FSTL)
Since the above knowledge transferring methods do not take into consideration the adaptation of data or model parameters from the source domain to the target domain, i.e., from the simulation domain to the real-world domain, there is no information selection mechanism to determine what kind of physical knowledge the neural networks really need in order to learn the dynamic behavior better during training, such as a more intense accelerating operation or a flat startup procedure. Moreover, when merely simulating the real operating condition, the different outputs and identical inputs of the real field data set and the simulation data set indeed make it more difficult to transfer knowledge between the two. To tackle this issue, an ideal transient simulation data set, Dsim2, is created, and a feature-based transfer learning method is proposed to conduct self-adaptive transfer learning by evaluating the similarity between the simulation data and the field data and leveraging the learning weights of the simulation data set according to the calculated similarity. Note that this method is performed on the field data set Dfield and the transient process simulation data set Dsim2, since the field-input-signal simulation data set Dsim1 has the same inputs but different outputs as Dfield, which has been proven to cause serious performance degradation.
In the first place, taking into account both the self-correlation of the input variables along the time axis and the mutual correlation across different input variables, a feature mapping strategy based on an LSTM encoder-decoder network is presented to better measure the similarity between the input signals Xfield and Xsim2. As plotted in Fig. 6, both the encoder and the decoder are constructed from LSTM units and have the same structure. The output vector of the last LSTM unit of the encoder is regarded as the encoded feature vector V of the input signal X. To guarantee the feature extraction ability of the encoder, V is then fed into the decoder as the input of all LSTM units to reconstruct the signal X.
The proposed encoder-decoder network is trained by minimizing the reconstruction error, described by the mean square error function formulated in Eq. (16), where W denotes the parameters of the network, n is the size of the training data set, and loss denotes the mean square error operation between the input and target vectors.
In the second place, based on the trained encoder-decoder network, the cosine distance d between the encoded vectors Vfield and Vsim2, extracted from the input signals Xfield and Xsim2, is calculated to measure the similarity between the field data and the transient process simulation data, as depicted in Eq. (17), where the superscript j denotes the j-th time step.
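A minimal sketch of this similarity measurement is given below; the per-time-step superscript j is omitted and the encodings are treated as flat vectors, and the convention that smaller distance means higher similarity is an assumption.

```python
import numpy as np

def cosine_distance(v_field, v_sim):
    """Cosine distance between two encoded feature vectors; smaller
    values indicate the simulation sequence is more similar to the
    field sequence."""
    cos = np.dot(v_field, v_sim) / (np.linalg.norm(v_field) * np.linalg.norm(v_sim))
    return 1.0 - cos

def most_similar(v_field, sim_vectors):
    """Index and distance of the closest simulation-sequence encoding."""
    d = [cosine_distance(v_field, v) for v in sim_vectors]
    return int(np.argmin(d)), min(d)
```

During training, `most_similar` is the kind of lookup that would select which simulation sequence a given field sequence should borrow knowledge from.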
In the last place, the dynamic simulating neural networks are constructed and trained via the proposed transfer learning method. To extract the time-dependent and nonlinear features of the dynamic operating conditions of gas turbines, a feature extraction neural network module with LSTM layers and an MLP (multi-layer perceptron) is first built, and a linear layer is then constructed to perform the final regression of the predicted output vector. Note that the parameters of the above neural network module are shared at each time step to prevent overfitting.
As shown in Fig. 7, the simulation data Dsim2 are used to train the dynamic simulating neural network NETsim, whose feature extraction module outputs a latent vector. Then the simulation data Dsim2 and field data Dfield are mixed to train the target network NETreal, which has the same structure, for simulating the dynamic behavior in real startup operating conditions, while NETsim is kept fixed. Here, a feature similarity-based measurement is proposed and applied during training to make a self-adaptive selection of the simulation knowledge for proper transfer learning: the most similar data between the real-world domain and the simulation domain tend to yield similar latent features from the feature extraction module. The transfer learning training minimizes the objective function expressed in Eq. (18), in which the first term minimizes the predictive error on the output signals and the second performs the feature adaptation alignment that transfers knowledge from the simulation domain. A weighting factor balances the feature-transferring error and the predictive error, and the smallest cosine distance between the field data sequence Xj input to NETreal and all sequences in the simulation data set is defined in Eq. (19). Likewise, the latent feature vector of the simulation data sequence most similar to the input Xj of NETreal, extracted by the latent feature extraction module (the LSTM and MLP layers) of the pre-trained NETsim, is defined in Eq. (20).
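A sketch of the objective of Eq. (18) under stated assumptions: the weighting-factor symbol lost from this excerpt is named `lam` here, and the alignment term is assumed to be a mean square error between the field latent vector and the latent vector of the most similar simulation sequence.

```python
import numpy as np

def fstl_loss(y_pred, y_true, z_real, z_sim_nearest, lam=0.1):
    """FSTL objective as described in the text: predictive MSE on the
    output signals plus a weighted feature-alignment term that pulls
    the latent vector of the field sequence toward the latent vector
    of its most similar simulation sequence (form and lam value are
    assumptions)."""
    predictive = np.mean((y_pred - y_true) ** 2)
    alignment = np.mean((z_real - z_sim_nearest) ** 2)
    return predictive + lam * alignment
```

With `lam = 0` the objective reduces to ordinary supervised training on the field data; larger values push the target network's features toward the physics-based simulation domain.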
Case study
Data preprocessing
Due to the restrictions of real-world measurement deployment, the field data set contains 10 types of measuring points, including the rotating speed of the gas generator NGG, the rotating speed of the power turbine NPT, the gas generator inlet temperature T1 and pressure P1, the combustor inlet temperature T2 and pressure P2, the gas generator outlet temperature T34 and pressure P34, and the power turbine outlet temperature T4. Six startup field thermodynamic data sequences of the gas path of the two-shaft gas turbine, each with a time length of 24 min and a time interval of 1 min between two data points, are used; four of the field sequences are applied to train the neural networks, and the remaining two are used for testing. Note that, to better test the performance of the dynamic simulation model on different patterns of startup operation, two different types of sequence are set as the testing data set. As shown in Fig. 8, the changing pattern of startup testing data 1 is regarded as more "dynamic" due to its steep speed rise and subsequent fluctuations, while the rotation speed of testing data 2 increases gradually and is much flatter than that of testing data 1; testing data 2 is therefore considered a relatively stable startup procedure.
Moreover, data augmentation is executed by splitting each data sequence with sliding windows of fixed size 16 and step size 2, as illustrated in Fig. 9, to take full advantage of the field data and learn the dynamic operating features of the startup condition. Thus, from each field data sequence of length 24, 5 subsequences are extracted as 5 samples to train the neural networks.
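The augmentation step can be sketched directly; the window and step sizes below are the ones stated in the text.

```python
def sliding_windows(sequence, window=16, step=2):
    """Split one startup sequence into overlapping training samples,
    as in Fig. 9 (window size 16, step size 2)."""
    return [sequence[i:i + window]
            for i in range(0, len(sequence) - window + 1, step)]

# A 24-point field sequence yields 5 subsequences, matching the text.
samples = sliding_windows(list(range(24)))
```

Overlapping windows multiply the number of training samples while preserving the within-window temporal dynamics that the LSTM learns from.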
All data sequences are normalized as in Eq. (21):

$$x_{norm}=\frac{x-x_{min}}{x_{max}-x_{min}} \qquad (21)$$

where x_norm is the normalized value, x is the value of a measuring point at a single time point, and x_max and x_min denote the maximum and minimum values of all samples at that measuring point, respectively.
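A per-channel min-max normalization in the sense of Eq. (21) can be sketched as below; the sample values are invented for illustration.

```python
import numpy as np

def min_max_normalize(x, x_min, x_max):
    # Eq. (21): scale each measuring point to [0, 1] using the
    # per-channel extremes over all samples
    return (x - x_min) / (x_max - x_min)

# two hypothetical measuring points over three time steps
data = np.array([[100., 0.2],
                 [300., 0.6],
                 [200., 0.4]])
norm = min_max_normalize(data, data.min(axis=0), data.max(axis=0))
print(norm)  # each column now spans [0, 1]
```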
Generally, the data sets generated and used for follow-up network training include the field data set Dfield with 20 sequences, the field-input-signal simulation data set Dsim1 with 20 sequences, and the transient process simulation data set Dsim2 with 60 sequences, while the number of data sequences for testing is 10. In this paper, the data generated from the simulator are considered to embed the physical knowledge hidden in the differential equations. Therefore, the data sets above are mixed and used to train the LSTM neural networks to perform knowledge transfer learning.
Evaluation metrics
To better evaluate the overall predictive performance of the proposed models, the relative Error, the R2 score, and the MSE (mean square error) are applied in this paper, calculated as

$$Error_j=\frac{\left|\hat{z}_j-z_j\right|}{\left|z_j\right|},\qquad MSE=\frac{1}{N}\sum_{j=1}^{N}\left(\hat{z}_j-z_j\right)^2$$

where z_j denotes the measured ground-truth response and ẑ_j denotes the corresponding model prediction.
The R2 score is at most 1 and can be negative when the model performs worse than simply predicting the mean; a higher R2 score indicates that the overall system behavior is more accurately depicted by the model. The R2 score is calculated as

$$R^2=1-\frac{\sum_{j}\left(z_j-\hat{z}_j\right)^2}{\sum_{j}\left(z_j-\bar{z}\right)^2}$$

where z̄ is the mean of the ground-truth values.
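The three metrics are straightforward to compute; a minimal NumPy sketch (function names are illustrative) is:

```python
import numpy as np

def mse(z, z_hat):
    return np.mean((z - z_hat) ** 2)

def relative_error(z, z_hat):
    # note: can exceed 1 when the ground-truth value is near zero,
    # as observed at the very start of the startup transients
    return np.abs(z_hat - z) / np.abs(z)

def r2_score(z, z_hat):
    ss_res = np.sum((z - z_hat) ** 2)
    ss_tot = np.sum((z - np.mean(z)) ** 2)
    return 1.0 - ss_res / ss_tot  # 1 = perfect; negative = worse than the mean

z = np.array([1.0, 2.0, 3.0])
z_hat = np.array([1.1, 1.9, 3.2])
print(r2_score(z, z_hat), mse(z, z_hat))
```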
Results and discussion
Comparative study
In this paper, four kinds of data-driven simulation methods, described in Section 2.3, are compared: the baseline method, which uses Dfield directly to train the built neural networks, and three modes of transfer learning, namely simulation data-based transfer learning, model parameter-based transfer learning, and the proposed feature similarity-based transfer learning. The details are listed as follows.
(1) The structures of the neural networks of the four methods are the same, all built from a one-layer LSTM with sizes from 4 to 8 and an MLP with sizes of 8, 16, and 6.
(2) The data sets for the 3 transfer learning modes are also the same. Experiments for each transfer learning mode have been conducted on two combinations of the 3 data sets, (Dfield, Dsim1) and (Dfield, Dsim2), respectively. Note that FSTL model training on (Dfield, Dsim1) is actually the same as data mixture transfer learning on (Dfield, Dsim1), since the similarity measurement mechanism does not work when Dfield and Dsim1 share the same input signals, leading to exactly the same training process and results as Method B in Table 4.
(3) The objective function of FSTL differs from that of the other methods. The objective function of the proposed FSTL method is constructed from two parts, the distance to the labels and the distance to the features of the source domain under the similarity measurement, as seen in Eq. (18), while the baseline method and the other transfer learning methods only minimize the distance to the labels.
Here, both the encoder-decoder network and the dynamic simulating neural networks are trained with the Adam optimizer [37], with a learning rate of 0.001 and a mini-batch size of 30. After 4000 epochs of training, the simulation model is tested on the field-testing data set.
As shown in Table 3, the performance of the 6 constructed methods, including the baseline method which only uses the field data to train the built neural networks, is evaluated on the testing data set, where the presented feature-based transfer learning method via similarity measurement attains the best performance on the fast startup testing data and in general. The accuracies of the 6 methods on the 6 predicted output variables are compared, and the proposed FSTL method performs best on all measuring points. However, its accuracy on testing data 2 (0.963) is relatively low compared with the baseline model (0.978); this is considered the sacrifice made because FSTL puts more learning weight on Dsim2 to predict more accurately under the steep startup condition and thus boost the overall predicting performance.
The iterations of R2 and MSE of the 6 methods on the testing data set are shown in Fig. 10. It is clearly observed that the training process of Method E, which applies model parameter-based knowledge transferring from simulation data set Dsim2 to the real-world startup condition, is unstable, caused by the gap between the data distributions of Dsim2 and Dfield. Notice that the encoder-decoder networks are trained on the mixture of Dfield and Dsim2, reaching an overall testing accuracy of 0.967. Figure 11 shows the reconstruction of the testing input signals NGG and NPT in data sets Dfield and Dsim2: testing data 1 and the transient simulation input sequence are both well reconstructed by the encoder-decoder net, proving that the encoder is able to encode the input signals of the simulation model, which is subsequently used to calculate the similarity between samples.
As shown in Fig. 12, the proposed FSTL method attains the highest predictive accuracy on all measurement points; the prediction of T2 is the most accurate, with an R2 value of 0.983, while the predicting accuracy of P34 is the lowest, with an R2 value of 0.609. To fully study the effectiveness of the presented transient simulation data mixture method and the feature-based transfer learning framework, the predicted accuracy on the 2 selected important measuring parameters, P2 and T34, over the two testing data sequences is calculated and analyzed in detail, as shown in Figs. 13 and 14.
As shown in Fig. 13(a), compared with the other knowledge transferring methods, the proposed feature similarity-based transfer learning method (Method F) has the best overall predictive performance on testing data 1, with an average relative Error of 10% and an R2 of 0.762. It is able to capture the intense boosting of P2 at the early stage of the startup operation and then follow the variation after the peak, even though the predicting accuracy slides during the period of 9 to 12 min from the starting point. In contrast, the baseline model (Method A), trained only on the field data set Dfield, achieves an average relative Error of 25.8% and an R2 of -1.607 and can hardly follow the trend during the whole startup operation. Method B and Method C both use data mixture transfer learning but perform differently: Method B is worse, since mixing the field data set Dfield and the simulation data set Dsim1, which share the same input signals but have different outputs, confuses the trained neural networks with conflicting targets. It is also clearly seen that Method D with data sets Dfield and Dsim1 performs better than Method E with data sets Dfield and Dsim2, both using the model parameter transfer learning method. It is considered that the latent pattern of the system behavior under the transient process inputs in Dsim2 is transferred more suitably by the data mixture method, which enhances data diversity, while the pretrain-finetuning mode may have a negative impact on such knowledge transfer learning.
Figure 13(b) shows the predictive performance on T34. Similar to the simulating results on P2, the FSTL method proposed in this paper achieves the highest overall predictive accuracy, with an average relative Error of 4% and an R2 of 0.917, while the baseline model only achieves an R2 of -0.308 with a relative Error of 9%. The other methods can capture the variation but can hardly track the dynamic changes of T34 precisely, showing larger fluctuations in predictive error.
Figure 14 illustrates the simulation results of the 6 methods on testing data 2, whose startup pattern is similar to that of the field training data set Dfield. All 6 methods can generally learn the changing rules better than on testing data 1 (the proposed FSTL method obtains an R2 value of 0.308 with a relative Error of 9% here). A degradation of the predictive performance on T34 can be inspected in Fig. 14(b) during 14-20 min from the startup point, which is considered a trade-off between better simulating rapid startup operations like testing data 1 and flatter accelerating conditions like testing data 2. Note that the absolute error at the early stage of the startup condition is actually very low, while the relative error can be larger than 1, since the ground-truth values of the measuring points may be very small when the gas turbine is not fully accelerated.
To sum up, the feature similarity-based transfer learning method can boost the predictive performance under fast accelerating working conditions and better simulate the dynamic system of the gas turbine in general, raising the R2 from 0.707 to 0.881, despite the fact that for some normal startup conditions the simulation accuracy may be discounted, on average 1.5% lower than that of the baseline model. The method proposed is compared with the other 4 methods, which apply data mixture-based transferring and model parameter-based transferring on different data sets, and proves to have the better overall predictive performance. Moreover, comparing Method B with Method D shows that when the data distribution of the input X is the same as that of the field data, the effective way to transfer knowledge from the simulation data to the real-world domain is the model parameter-based transferring mode, while comparing Method C with Method E shows that data mixture-based transfer learning is more productive when the data distributions of the simulation domain and the real-world domain are totally different.
Effect study
To further inspect the mechanism of the proposed transfer learning method, the impact of the hyperparameters and the operation algorithm is analyzed from three aspects.
(1) Effect of simulation data sample size on R2
The impact of the sample size of Dsim2 is analyzed first. Subsets of 3, 6, and 9 samples are randomly selected from the initial Dsim2 and used to train NETreal with the proposed FSTL method. The results on the two testing data sets are shown in Fig. 15. Note that the sample sizes of 0 and 12 correspond to the baseline method and the full FSTL model, respectively.
As shown in Fig. 15, with the increase in the size of the simulation data set Dsim2, the overall accuracy on the testing data set rises from 0.701 to 0.881. However, there is a small decline at the sample size of 12 compared with 9: the accuracy on testing data 1 keeps boosting as the size of Dsim2 grows from 0 to 9, while the accuracy on testing data 2 degrades over the same range. This variation in predicting accuracy proves that the feature-similarity measurement mechanism is finding a balanced point to reach a more generalized predicting performance across data with different operating patterns. With the sample size increasing, the method proposed can fully exploit the physics knowledge hidden in the simulation data set and has more potential to find the optimum parameters of the built neural networks.
(2) Effect of weighting factor β on R2
The impact of the weighting factor β, which balances the transfer learning term (transferring knowledge from the simulation domain to the real-world domain) against the regression term (predicting accuracy), is studied at levels of 0, 0.2, 0.4, 0.6, 0.8, 1, and 1.2, where β = 0 actually corresponds to the baseline Method A.
As shown in Fig. 16, the method proposed achieves the best performance when β is 0.4. As β increases from 0 to 0.4, the R2 score on both the whole testing data set and testing data 1 gradually rises, reaching peaks of 0.881 and 0.755, respectively, at β = 0.4, while the accuracy on testing data 2 keeps declining. When β ranges from 0.6 to 0.8, the accuracy on the overall testing data and on testing data 1 is much lower than the peak at β = 0.4, while the accuracy on testing data 2 increases. This variation reflects the fact that the weight of the knowledge transferring loss term in Eq. (18) plays an important role in trading off between different patterns of the data. To attain a more general simulating performance adapted to various kinds of dynamic operation, β = 0.4 is used in the method proposed.
(3) Effect of metrics of feature similarity measurement on R2
To further explore the impact of different similarity evaluation methods in the proposed transfer learning method, experiments on 4 typical similarity metrics are performed: Pearson correlation dcorr, Manhattan distance dm, Euclidean distance de, and cosine distance dcos, as described in Table 5.
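The four candidate metrics from Table 5 can be sketched directly in NumPy (function names are illustrative); note that the cosine distance is scale-invariant, which is one plausible reason it suits latent feature comparison.

```python
import numpy as np

def pearson_distance(a, b):
    # 1 minus Pearson correlation coefficient
    return 1.0 - np.corrcoef(a, b)[0, 1]

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def euclidean(a, b):
    return np.linalg.norm(a - b)

def cosine(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # collinear with a, but twice the magnitude
# cosine and Pearson distances vanish for collinear vectors,
# while Manhattan and Euclidean distances do not
print(cosine(a, b), manhattan(a, b))
```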
As shown in Fig. 17, the cosine distance attains the highest testing accuracy on testing data 1 and testing data 2, with R2 equal to 0.755 and 0.963, respectively. Besides, all the distance metrics except the Pearson correlation are able to enhance the overall predicting performance, while the accuracy on testing data 2 is reduced, which has been discussed above and is considered the trade-off behavior of the similarity measurement mechanism. The results further prove the effectiveness of the proposed transfer learning method and show that the feature similarity-based transfer mechanism attempts to reach a general predictive performance adapted to a wider range of gas turbine operating conditions.
Conclusions
This paper proposes a physics knowledge-transfer learning method to compensate for the deficiency of field dynamic data and improve the accuracy of the LSTM neural networks built for the dynamic simulation of a gas turbine. A simulation system based on the physics differential equations, driven by the same input signals as the field, is built to generate simulation data sets with different startup patterns and transient process signals. The 3 typical kinds of transfer learning methods, including the proposed feature similarity-based transfer learning method, are then studied on the created data sets. Finally, the impacts of different hyperparameters and similarity metrics are analyzed to inspect the inner mechanism of the proposed transfer learning method. The following conclusions can be reached.
(1) The proposed feature similarity-based transfer learning framework, using the created transient process simulation data sets with physics knowledge embedded from the differential equations, achieves the best predictive performance on all the output signal dimensions among all the 6 transfer learning methods conducted in this paper; the overall R2 score has been improved by 24.6% and the MSE reduced by 63.6% compared with the baseline model.
(2) In the study of the hyperparameters of the method proposed, the feature similarity measurement mechanism in the training process is able to make a trade-off across different data distributions and achieve a general simulating performance as the sample size and the weighting factor β change. The weighting factor β is optimized at 0.4, and the cosine similarity is proved to be the most proper metric for feature-adaptive transfer learning.
(3) For further research, the weighting between the feature similarity loss and the predictive loss can be constructed to be updated adaptively during the training process rather than fixed as a hyperparameter. Moreover, since the data-based and model parameter-based transfer learning methods are not fully explored for the gas turbine dynamic simulation problem, further research on knowledge transferring from the physics theory domain to real-world applications can be performed from these two perspectives.
[1] Zhou D J, Wei T T, Ma S X. Study on meta-modeling method for performance analysis of digital power plant. Journal of Energy Resources Technology, 2020, 142(4): 042005
[2] International Energy Agency. Electricity statistics. 2018, available at website of International Energy Agency
[3] Wang H L, He J K. China's pre-2020 CO2 emission reduction potential and its influence. Frontiers in Energy, 2019, 13(3): 571–578
[4] Chong Z R, Yang S H B, Babu P. Review of natural gas hydrates as an energy resource: prospects and challenges. Applied Energy, 2016, 162(1): 1633–1652
[5] Zhou D J, Wei T T, Huang D W, et al. A gas path fault diagnostic model of gas turbines based on changes of blade profiles. Engineering Failure Analysis, 2020, 109: 104377
[6] Ling Z, Yang X, Li Z L. Optimal dispatch of multi energy system using power-to-gas technology considering flexible load on user side. Frontiers in Energy, 2018, 12(4): 569–581
[7] Li J, Liu G D, Zhang S. Smoothing ramp events in wind farm based on dynamic programming in energy internet. Frontiers in Energy, 2018, 12(4): 550–559
[8] Zhou D J, Yu Z Q, Zhang H S. A novel grey prognostic model based on Markov process and grey incidence analysis for equipment degradation. Energy, 2016, 109: 420–429
[9] Tsoutsanis E, Meskin N, Benammar M, et al. A dynamic prognosis scheme for flexible operation of gas turbines. Applied Energy, 2016, 164(2): 686–701
[10] Gao D W, Wang Q, Zhang F, et al. Application of AI techniques in monitoring and operation of power systems. Frontiers in Energy, 2019, 13(1): 71–85
[11] Zeng D T, Zhou D J, Tan C Q, et al. Research on model-based fault diagnosis for a gas turbine based on transient performance. Applied Sciences (Basel, Switzerland), 2018, 8(1): 148
[12] Wang C, Li Y G, Yang B Y. Transient performance simulation of aircraft engine integrated with fuel and control systems. Applied Thermal Engineering, 2017, 114: 1029–1037
[13] Chaibakhsh A, Amirkhani S. A simulation model for transient behaviour of heavy-duty gas turbines. Applied Thermal Engineering, 2018, 132(3): 115–127
[14] Xie Z W, Su M, Weng S L. Extensible object model for gas turbine engine simulation. Applied Thermal Engineering, 2001, 21(1): 111–118
[15] Tsoutsanis E, Meskin N, Benammar M. Dynamic performance simulation of an aeroderivative gas turbine using the Matlab Simulink environment. In: ASME 2013 International Mechanical Engineering Congress and Exposition, San Diego, California, USA, 2013: 56246
[16] Wang H, Li X S, Ren X, et al. A thermodynamic-cycle performance analysis method and application on a three-shaft gas turbine. Applied Thermal Engineering, 2017, 127(12): 465–472
[17] Badami M, Ferrero M G, Portoraro A. Dynamic parsimonious model and experimental validation of a gas microturbine at part-load conditions. Applied Thermal Engineering, 2015, 75(1): 14–23
[18] Mehrpanahi A, Payganeh G, Arbabtafti M. Dynamic modeling of an industrial gas turbine in loading and unloading conditions using a gray box method. Energy, 2017, 120(2): 1012–1024
[19] Asgari H, Chen X Q, Morini M, et al. NARX models for simulation of the start-up operation of a single-shaft gas turbine. Applied Thermal Engineering, 2016, 93(1): 368–376
[20] Nikpey H, Assadi M, Breuhaus P. Development of an optimized artificial neural network model for combined heat and power micro gas turbines. Applied Energy, 2013, 108(8): 137–148
[21] Tsoutsanis E, Meskin N. Derivative-driven window-based regression method for gas turbine performance prognostics. Energy, 2017, 128(6): 302–311
[22] Baklacioglu T, Turan O, Aydin H. Dynamic modeling of exergy efficiency of turboprop engine components using hybrid genetic algorithm-artificial neural networks. Energy, 2015, 86(6): 709–721
[23] Weng S L, Gu C H, Weng Y W. Energy internet technology: modeling, optimization and dispatch of integrated energy systems. Frontiers in Energy, 2018, 12(4): 481–483
[24] Zhong S S, Fu S, Lin L. A novel gas turbine fault diagnosis method based on transfer learning with CNN. Measurement, 2019, 137: 435–453
[25] Zhou D J, Yao Q B, Wu H, et al. Fault diagnosis of gas turbine based on partly interpretable convolutional neural networks. Energy, 2020, 200: 117467
[26] Tang S X, Tang H L, Chen M. Transfer-learning based gas path analysis method for gas turbines. Applied Thermal Engineering, 2019, 155: 1–13
[27] Klenk M, Forbus K. Analogical model formulation for transfer learning in AP Physics. Artificial Intelligence, 2009, 173(18): 1615–1638
[28] Jiang Z H, Lee Y M. Deep transfer learning for thermal dynamics modeling in smart buildings. In: 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 2019: 2033–2037
[29] Yao Y, Doretto G. Boosting for transfer learning with multiple sources. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 2010: 1855–1862
[30] Zhou D J, Yu Z Q, Zhang H S. A novel grey prognostic model based on Markov process and grey incidence analysis for equipment degradation. Energy, 2016, 109: 420–429
[31] Ma S X, Sun S N, Wu H, et al. Decoupling optimization of integrated energy system based on energy quality character. Frontiers in Energy, 2018, 12(4): 540–549
[32] Little W A. The existence of persistent states in the brain. Mathematical Biosciences, 1974, 19(1–2): 101–120
[33] Gers F A, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. In: 9th International Conference on Artificial Neural Networks, Technical report, 1999
[34] Pan S J, Yang Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345–1359
[35] Zhuang F Z, Qi Z Y, Duan K Y. A comprehensive survey on transfer learning. arXiv preprint, 2019: 1911.02685
[36] Rumelhart D E, Hinton G E, Williams R J. Learning internal representations by error-propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1986, 1: 318–362
[37] Kingma D P, Ba J. Adam: a method for stochastic optimization. arXiv preprint, 2014: 1412.6980
RIGHTS & PERMISSIONS
Higher Education Press