RESEARCH ARTICLE

Hybrid method integrating machine learning and particle swarm optimization for smart chemical process operations

  • Haoqin Fang 1,
  • Jianzhao Zhou 1,
  • Zhenyu Wang 1,
  • Ziqi Qiu 1,
  • Yihua Sun 2,
  • Yue Lin 1,
  • Ke Chen 1,
  • Xiantai Zhou 1,
  • Ming Pan 1
  • 1. School of Chemical Engineering and Technology, Sun Yat-Sen University, Zhuhai 519082, China
  • 2. School of Mathematics, Sun Yat-Sen University, Zhuhai 519082, China

Received date: 10 Dec 2020

Accepted date: 19 Jan 2021

Published date: 15 Feb 2022

Copyright

2021 Higher Education Press

Abstract

Modeling and optimization are crucial to smart chemical process operations. However, a large number of nonlinearities must be considered in a typical chemical process owing to its complex unit operations, chemical reactions, and separations. This makes it greatly challenging to implement mechanistic models in industrial-scale problems because of the resulting computational complexity. Thus, this paper presents an efficient hybrid framework integrating machine learning and particle swarm optimization to overcome the aforementioned difficulties. An industrial propane dehydrogenation process was used as a case study to demonstrate the validity and efficiency of our method. Firstly, a data set was generated based on a process mechanistic simulation validated by industrial data, which provides sufficient and reasonable samples for model training and testing. Secondly, four well-known machine learning methods, namely, K-nearest neighbors, decision tree, support vector machine, and artificial neural network, were compared and used to obtain prediction models of the process operations. All of these methods achieved highly accurate models by adjusting model parameters on the basis of high-coverage data and properly selected features. Finally, optimal process operations were obtained with the particle swarm optimization approach.

Cite this article

Haoqin Fang, Jianzhao Zhou, Zhenyu Wang, Ziqi Qiu, Yihua Sun, Yue Lin, Ke Chen, Xiantai Zhou, Ming Pan. Hybrid method integrating machine learning and particle swarm optimization for smart chemical process operations[J]. Frontiers of Chemical Science and Engineering, 2022, 16(2): 274–287. DOI: 10.1007/s11705-021-2043-0

1 Introduction

The chemical processing industry, which manufactures products by mixing, separating, forming, and chemical reactions, is greatly important to economic development, providing high-quality products to many sectors, such as building, transportation, construction, and health [1]. It is also the largest energy consumer among industrial sectors, with 1078 million tons of oil equivalent of energy consumed in 2016 [2]. Unlike general artificial intelligence, which builds computerized systems possessing human-like intelligence, industrial intelligence is more concerned with applying technologies to build smart systems that can adapt to changes in the industrial environment. This requires process optimization: adjusting key operating parameters to maximize or minimize one or more process specifications (such as profit/cost and efficiency/consumption) while keeping all others within their constraints. It has been shown that optimal operations can improve the economic and energy efficiencies of a whole process by as much as 24% [3].
Many optimization strategies have been reported for process operations, and they can be categorized mainly as mechanism-based methods and data-driven methods [4]. The mechanism-based methods reflect the real material balance, equilibrium, summation, enthalpy balance, and hydraulic performance in each detailed unit and throughout the whole process. They are generally used for conceptual design, and have been implemented in many commercial simulators, such as Aspen Plus, Aspen Hysys, Pro-II, and gPROMS [5]. Recent studies have focused on linking external optimization algorithms to these commercial simulators. Ibrahim et al. used a genetic algorithm to optimize a crude oil processing system, with the operational variables of the process defined in Aspen Hysys [6]. However, mechanism-based models usually suffer from the computational difficulty of solving the complex nonlinear problems arising from first-principle calculations [7]. To reduce the complexity of fully rigorous models, simplified models have been proposed. In some commercial software, e.g., PIMS (Aspen Tech) and RPMS (Honeywell), highly nonlinear models are simplified using linear or bi-linear constraints, and solved with successive and sequential linear programming [8]. Although simplified models have been widely applied in industry, their assumption of approximate or near linearity may still cause large deviations [9].
Compared with mechanism-based models, data-driven models, which present the relationship between input and output variables as "black boxes", can reduce this complexity. K-nearest neighbors (KNN), support vector machine (SVM), decision tree (DT), and artificial neural network (ANN) are the most commonly used machine learning algorithms for data-driven models [10]. KNN is capable of addressing the issues arising from nonlinearity and insufficient training data, and has been applied in different areas such as pattern recognition [11], data mining [12], and outlier detection [13], to improve process operations [14]. SVM models combined with a genetic algorithm optimizer have been used for controlling production quality [15] and optimizing equipment structure [16], and notably for reducing energy consumption by up to 43% in the carbon fiber industry [17]. Both DT [18] and ANN [19,20] can predict the output measures very well in pyrolysis reaction networks, but DT is much faster to train than ANN, as it is less complicated and can be constructed by practitioners without the deep understanding required for training an ANN [21]. ANN, however, shows prospective application in correlating molecular structures and properties [22–24]. Recently, Su et al. [25] developed an innovative deep neural network architecture combining a tree-structured long short-term memory network and a back-propagation neural network as an intelligent tool to predict properties in the design or synthesis of chemical, pharmaceutical, and biochemical products and processes. Further, for solving optimization problems in chemical engineering effectively and efficiently, Schweidtmann et al. [26] maximized the net power generation of an organic Rankine cycle by deterministic global process optimization based on ANNs, and Chandrasekaran and Tamang [27] employed an ANN and the particle swarm optimization (PSO) technique to minimize the machining time in turning of Al-SiCp metal matrix composites. Besides, random forest and genetic algorithm have been integrated to optimize the operating conditions of chocolate chip cookie production [28].
As discussed above, existing works mostly focus on developing one kind of machine learning method, and rarely apply the four machine learning algorithms to the same chemical process to compare and analyze their performances. It is also difficult to obtain enough data for model training, as a practical chemical process is usually operated continuously and steadily in a small region. Thus, this paper first presents an efficient dataset generation strategy based on industrial data validation and commercial software simulation. A hybrid framework integrating machine learning algorithms and PSO is then proposed to achieve smart process operations. In this hybrid framework, the models of the chemical process are trained using four well-known machine learning algorithms, namely KNN, DT, SVM, and ANN, and the most efficient model can be selected and combined with PSO for further process optimization.

2 Experimental

2.1 Data generation of chemical process operation

Machine learning comprises models that learn from existing data, and these data require preprocessing to identify missing and spurious elements. Since a chemical process must run in a steady state, the data collected from industry cover only a limited operating region, which limits model prediction. Many specialized simulation tools (such as Aspen Plus and Hysys) have been utilized for adjusting chemical process operations [29], which avoids the risk of operating the real process in an unexpected region. In this paper, an industrial propane dehydrogenation (PDH) process is used as the case study: an Aspen Plus simulation of the process is built and validated against the industrial data, and the simulation model is then used to generate the entire data set for model training and testing.

2.1.1 Aspen Plus simulation of PDH process

A PDH process is simulated in Aspen Plus according to the Oleflex process developed by UOP [30]. Figure 1 shows the detailed flowsheet of the PDH process developed in the Aspen Plus simulation environment. The process feed is first pre-processed in Unit 1 so that the propane volume fraction meets the feed requirements of the reaction, and then reacts with H2 in a moving bed reactor with four parallel stages and radial flow in Unit 2. The reaction product passes through rapid cooling, high-pressure dehydration, and cryogenic dehydrogenation in Unit 3. In the refining section (Unit 4), light ends (C1 and C2) are removed in a deethanizer column, and propylene and propane are then separated in a propylene rectification column (RC). Propylene is sold as the final product, while propane is recycled to the reactor as a cyclic feed. The flowrates of propane and propylene in the four moving bed reactors are shown in Table 1.
Fig.1 Flowsheet of the PDH process built in Aspen Plus.


Tab.1 The flowrates of propane and propylene in the moving bed reactors

| Feed stream | Reactor 1 inlet | Reactor 1 outlet | Reactor 2 inlet | Reactor 2 outlet | Reactor 3 inlet | Reactor 3 outlet | Reactor 4 inlet | Reactor 4 outlet |
|---|---|---|---|---|---|---|---|---|
| C3H8/(kg·h−1) | 84461.9 | 74492.6 | 74492.6 | 67199.7 | 67199.7 | 61336.7 | 61336.7 | 56342.1 |
| C3H6/(kg·h−1) | 891.1 | 10166.3 | 10166.3 | 16746.8 | 16746.8 | 21905.0 | 21905.0 | 26296.5 |
It is difficult to compare simulation results with industrial data directly because of commercial confidentiality. In this study, we therefore compare our reaction process (Unit 2) with a model that performed identically to plant data in the recent literature [31]. The details of our simulation and the comparison are shown in Table 2. The two are highly consistent, being based on the same reaction kinetics [32,33], reactors (adiabatic moving bed reactors), and heating method (inter-stage reheating in fired furnaces), which means that our simulation reflects the real process. The propane conversion (C), propylene selectivity (S), and propylene yield (Y) are defined, respectively, as:
$$C = \frac{F_{C_3H_8,\mathrm{in}} - F_{C_3H_8,\mathrm{out}}}{F_{C_3H_8,\mathrm{in}}},$$
$$S = \frac{F_{C_3H_6,\mathrm{out}} - F_{C_3H_6,\mathrm{in}}}{F_{C_3H_8,\mathrm{in}} - F_{C_3H_8,\mathrm{out}}},$$
$$Y = C \times S,$$
where $F_{C_3H_8,\mathrm{in}}$ and $F_{C_3H_6,\mathrm{in}}$ are the input molar flow rates of C3H8 and C3H6, respectively, and $F_{C_3H_8,\mathrm{out}}$ and $F_{C_3H_6,\mathrm{out}}$ are the corresponding output molar flow rates.
Tab.2 The results of the previous model [31] and the Aspen Plus simulation of the reaction process

| Index | Previous model | Aspen Plus simulation | Relative error/% |
|---|---|---|---|
| C/% | 33.2 | 33.29 | 0.271 |
| S/% | 94.8 | 94.68 | −0.127 |
| Y/% | 31.5 | 31.52 | 0.0635 |
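
As a quick consistency check, the overall C, S, and Y can be recomputed from the Table 1 flowrates with the definitions above. A minimal sketch follows, assuming standard molar masses (44.10 and 42.08 kg/kmol, not taken from the paper) to convert the mass flows into molar flows:

```python
# Recompute Table 2 from the Table 1 mass flowrates (kg/h).
M_C3H8, M_C3H6 = 44.10, 42.08                   # molar masses, kg/kmol (assumed)

F_C3H8_in  = 84461.9 / M_C3H8                   # kmol/h, Reactor 1 inlet
F_C3H8_out = 56342.1 / M_C3H8                   # kmol/h, Reactor 4 outlet
F_C3H6_in  = 891.1   / M_C3H6
F_C3H6_out = 26296.5 / M_C3H6

C = (F_C3H8_in - F_C3H8_out) / F_C3H8_in                  # propane conversion
S = (F_C3H6_out - F_C3H6_in) / (F_C3H8_in - F_C3H8_out)   # propylene selectivity
Y = C * S                                                 # propylene yield

print(f"C = {C:.2%}, S = {S:.2%}, Y = {Y:.2%}")
# C = 33.29%, S = 94.68%, Y = 31.52%, matching the Aspen Plus column of Table 2
```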

2.1.2 Data generation for model training and testing

The PDH process is composed of pumps, heat exchangers, moving bed reactors, compressors, and rectifying columns. For data-driven modeling, the total annual profit (y1) and the propylene yield (y2) are set as the outputs, and sensitivity analyses are conducted in Aspen Plus to identify the key parameters affecting these outputs. Firstly, ten candidate parameters are considered, including the flowrate of propane feed, the flow ratio of hydrogen-to-propane, the pressure of the propylene rectifier, the temperature and pressure of the reactor, the reflux rate, the condenser temperature, the condenser pressure, the refining temperature, and the refining pressure. By comparing the magnitude of the output changes against the changes of the operating parameters, the flowrate of feed (x1), the flow ratio of hydrogen-to-propane (x2), the pressure of the propylene rectifier (x3), and the temperature and pressure of the reactor (x4 and x5) are identified as the key parameters. Data for model training and testing are then obtained by varying the five input parameters in the Aspen Plus simulation. The operating ranges of the five parameters are determined according to the practical operating region, and 8750 groups of data are generated for model training by sampling the levels shown in Table 3. Within the known operating region, the average interpolation method is employed for testing, i.e., 1024 groups of data within the training region are selected, with four points sampled for each input parameter (4^5 = 1024). Random generation could also be used to obtain the training and testing data, provided the data cover most of the process operating region. Figure 2 presents the sampling for both model training and testing, which covers most of the process operating region uniformly and densely to avoid over-fitting.
Tab.3 Operating ranges and sampling points of inputs for data-driven modeling

| Input | Operating range | Sampling points for training | Sampling points for testing |
|---|---|---|---|
| x1/(kmol·h−1) | 560−840 | 560, 580, 600, 620, 630, 640, 660, 680, 700, 720, 740, 760, 770, 780, 800, 820, 840 | 595, 665, 735, 805 |
| x2 | 1.6−2.4 | 1.6, 1.8, 2, 2.2, 2.4 | 1.7, 1.9, 2.1, 2.3 |
| x3/bar | 6.5−10 | 6.5, 7, 8, 9, 10 | 6.5, 7.5, 8.5, 9.5 |
| x4/°C | 570−610 | 570, 580, 590, 600, 610 | 575, 585, 595, 605 |
| x5/bar | 1.8−2 | 1.8, 1.85, 1.9, 1.95, 2 | 1.83, 1.88, 1.93, 1.98 |
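
For illustration, the factorial training grid of Table 3 can be assembled as in the sketch below. Here `run_simulation` is a hypothetical stand-in for one evaluation of the validated Aspen Plus model (e.g., driven through an automation interface); it is not part of the paper's code.

```python
from itertools import product

def run_simulation(x1, x2, x3, x4, x5):
    """Hypothetical wrapper: evaluate the validated Aspen Plus model at one
    operating point and return (total annual profit y1, propylene yield y2)."""
    raise NotImplementedError("connect to the Aspen Plus model here")

# Training levels of the five key inputs (Table 3)
levels = {
    "x1": [560, 580, 600, 620, 630, 640, 660, 680, 700,
           720, 740, 760, 770, 780, 800, 820, 840],     # feed flowrate, kmol/h
    "x2": [1.6, 1.8, 2.0, 2.2, 2.4],                    # H2-to-propane ratio
    "x3": [6.5, 7, 8, 9, 10],                           # rectifier pressure, bar
    "x4": [570, 580, 590, 600, 610],                    # reactor temperature, deg C
    "x5": [1.80, 1.85, 1.90, 1.95, 2.00],               # reactor pressure, bar
}

samples, targets = [], []
for point in product(*levels.values()):                 # full factorial grid
    samples.append(point)
    targets.append(run_simulation(*point))              # (y1, y2) per point
```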
Fig.2 Sampling for model training and testing of the two outputs in the PDH process: total annual profit (y1) and propylene yield (y2).


To consider more practical scenarios, uncertain disturbances are also added in the Aspen Plus simulation, i.e., some parameters not selected as model inputs are varied randomly within a small range. In the deethanizer column and the propylene rectification column, the temperature varies randomly within ±2 °C and the pressure within ±1% relative error, representing the minor errors caused by temperature and pressure sensors in the practical process. Therefore, the process outputs may vary within an acceptable range even when the same process inputs are set during the collection of training data.

2.2 Hybrid method for modeling and optimization of chemical processes

In our proposed hybrid method, four machine learning algorithms, KNN, support vector regression (SVR), DT, and ANN, are used to train data-driven models on the data provided in Section 2.1. Once the best model is identified, PSO is applied to find the optimal operating solution of the process.

2.2.1 KNN method for modeling

KNN is a non-parametric method used for classification and regression [34]. In this study, KNN regression is used to predict process performance. It estimates the property value of an object as a weighted average of the values of its k nearest neighbors. The KNN procedure can be described in the following five steps.
Step 1: initialize k to a positive integer value.
Step 2: select the type of distance, and calculate the distances between the testing data and the training data. Several calculations are possible, where the distance between a testing point ($x_i^{\mathrm{test}}$) and a training point ($x_i^{\mathrm{train}}$) is determined over the dimensions of their feature space (m).
Euclidean distance:
$$d = \sqrt{\sum_{i=1}^{m} \left(x_i^{\mathrm{test}} - x_i^{\mathrm{train}}\right)^2}.$$
Manhattan distance:
$$d = \sum_{i=1}^{m} \left|x_i^{\mathrm{test}} - x_i^{\mathrm{train}}\right|.$$
Chebyshev distance:
$$d = \max_i \left(\left|x_i^{\mathrm{test}} - x_i^{\mathrm{train}}\right|\right).$$
Step 3: predict the values of the testing data. The k nearest neighbors are used to predict the output value ($y^{\mathrm{pre}}$):
$$y^{\mathrm{pre}} = \sum_{i=1}^{k} \lambda_i y_i^{\mathrm{train}}.$$
There are two strategies for calculating $\lambda_i$. One is the simple average of the k nearest neighbors:
$$\lambda_i = 1/k.$$
The other weights each neighbor inversely with its distance to the testing point ($d_i$):
$$\lambda_i = \frac{e^{-d_i/\beta}}{\sum_{i=1}^{k} e^{-d_i/\beta}},$$
where $\beta = 1$, $\beta = 2$, $\beta = 50$, or $\beta = \sum_{i=1}^{k} \left(x_i^{\mathrm{test}} - x_i^{\mathrm{train}}\right)^2 / (2k)$.
Step 4: examine the accuracy of the model. R2 indicates the accuracy of the prediction:
$$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}},$$
$$SS_{\mathrm{res}} = \sum_i \left(y_i^{\mathrm{test}} - y_i^{\mathrm{pre}}\right)^2,$$
$$SS_{\mathrm{tot}} = \sum_i \left(y_i^{\mathrm{test}} - \bar{y}^{\mathrm{test}}\right)^2.$$
Step 5: if the model accuracy is acceptable, stop the procedure; otherwise, update the value of k, and execute Steps 2 to 5 iteratively.
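
A minimal sketch of this procedure with scikit-learn is given below, assuming `X_train`/`y_train` and `X_test`/`y_test` hold the generated data of Section 2.1 (random placeholders are used here so the snippet runs stand-alone). Note that scikit-learn's `weights="distance"` option uses 1/d weighting rather than the exponential weighting of the second strategy above; a callable could be passed to implement the latter.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_train, y_train = rng.random((8750, 5)), rng.random(8750)   # placeholder data
X_test, y_test = rng.random((1024, 5)), rng.random(1024)

# Chebyshev distance with distance weighting, the best setting found in Table 4
knn = KNeighborsRegressor(n_neighbors=76, metric="chebyshev", weights="distance")
knn.fit(X_train, y_train)
print("R2 =", r2_score(y_test, knn.predict(X_test)))         # Step 4 check
```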

2.2.2 SVR method for modeling

SVR is developed from SVM [35]. SVM is a supervised learning method originally used for solving classification problems, later generalized to regression problems (called SVR). SVR fits a hyperplane (y = f(x)) through the input data (x), where f(x, ω) = ω·Φ(x) + b, ω is the weight vector, b is a bias parameter, and Φ(x) maps the data into a high-dimensional feature space. The goal of SVR is to minimize:
$$\frac{1}{2}\|\omega\|^2 + c \sum_{i=1}^{m} \left(\xi_i + \xi_i^*\right),$$
subject to:
$$-\varepsilon - \xi_i^* \le y_i - \omega \cdot \Phi(x_i) - b \le \varepsilon + \xi_i,$$
$$\xi_i, \xi_i^* \ge 0,$$
where ε is a parameter representing the radius of the insensitive zone around the hyperplane, $\xi_i$ and $\xi_i^*$ are the slack variables describing how far a point lies outside the insensitive zone, and c determines the trade-off between the training error and model flatness.
The above convex optimization problem (Eqs. (13–15)) can be transformed into a dual problem by introducing Lagrange multipliers:
$$\min \; \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} \left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right) K_{ij} + \sum_{i=1}^{m} \left[\left(\varepsilon - y_i\right)\alpha_i + \left(\varepsilon + y_i\right)\alpha_i^*\right],$$
subject to:
$$\sum_{i=1}^{m} \left(\alpha_i - \alpha_i^*\right) = 0,$$
$$0 \le \alpha_i, \alpha_i^* \le c,$$
where $K_{ij} = \Phi(x_i)\cdot\Phi(x_j)$ is the kernel function and $\alpha_i$, $\alpha_i^*$ are the Lagrange multipliers.
There are three types of kernel function in this study.
Linear function:
$$K_{ij} = x_i \cdot x_j.$$
Polynomial function:
$$K_{ij} = \left(\gamma\, x_i \cdot x_j + r\right)^d,$$
where γ, r, and d are kernel parameters.
Radial basis function:
$$K_{ij} = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) = \exp\left(-\gamma \|x_i - x_j\|^2\right),$$
where σ is the width of the Gaussian kernel and γ is a positive kernel parameter.
The dual problem can be solved by sequential minimal optimization, which yields the values of $\alpha_i$ and $\alpha_i^*$. The weight vector ω is then:
$$\omega = \sum_{i=1}^{m} \left(\alpha_i - \alpha_i^*\right) \Phi(x_i),$$
while the support vectors ($x_s$, $y_s$) are those with:
$$\alpha_s - \alpha_s^* \ne 0,$$
each of which gives a bias estimate:
$$b_s = y_s - \sum_{i=1}^{m} \left(\alpha_i - \alpha_i^*\right) K(x_i, x_s) - \varepsilon \quad \left(0 < \alpha_s < c\right),$$
$$b_s = y_s - \sum_{i=1}^{m} \left(\alpha_i - \alpha_i^*\right) K(x_i, x_s) + \varepsilon \quad \left(0 < \alpha_s^* < c\right).$$
Finally, the parameter b is computed as the average of the $b_s$ over all support vectors:
$$b = \bar{b}_s.$$
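
A corresponding sketch using scikit-learn's SVR (which solves this dual by sequential minimal optimization) is shown below; the `C` and `epsilon` values are illustrative, and inputs are scaled before fitting the RBF kernel.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_train, y_train = rng.random((8750, 5)), rng.random(8750)   # placeholder data
X_test, y_test = rng.random((1024, 5)), rng.random(1024)

# RBF kernel; C is the trade-off parameter c, epsilon the insensitive-zone radius
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=90.0, epsilon=0.01))
svr.fit(X_train, y_train)
print("R2 =", r2_score(y_test, svr.predict(X_test)))
```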

2.2.3 DT method for modeling

DT has been widely used in fault detection and classification, but less in solving process operating problems. In this study, the classification and regression trees (CRT) method is applied to model training [36]. The core of CRT is the linear regression performed in the parent node and child nodes of the tree:
$$\hat{y} = XK,$$
$$K = \left(X^T X\right)^{-1} X^T Y,$$
$$X = \begin{pmatrix} x_{11} & \cdots & x_{1i} \\ \vdots & \ddots & \vdots \\ x_{j1} & \cdots & x_{ji} \end{pmatrix}, \qquad Y = \begin{pmatrix} y_{11} & \cdots & y_{1i} \\ \vdots & \ddots & \vdots \\ y_{j1} & \cdots & y_{ji} \end{pmatrix},$$
where X is the matrix of model inputs, Y is the matrix of model outputs, K is the regression matrix, and $\hat{y}$ is the prediction matrix. Once the prediction $\hat{y}$ is obtained, the total variance (V) is given by:
$$V = \sum_{j=1}^{J} \sum_{k=1}^{K} \left(\hat{y}_{jk} - y_{jk}\right)^2.$$
The above regression is used in the CRT procedure as follows.
Step 1: set a value for the parameter N, which restricts the minimum number of training instances in a single node.
Step 2: set the whole training data set as the parent node (P), and calculate its prediction ($\hat{y}$) and total variance ($V_P$).
Step 3: split the data of the parent node (P) into two parts, the left child node (L) and the right child node (R), based on a splitting value $s_{x_i}$ of an input $x_i$; $x_i$ and its splitting value are selected arbitrarily at first. The data with $x_i \le s_{x_i}$ are assigned to the left child node (L), while the remaining data are assigned to the right child node (R). The amounts of data in L and R must each be at least the minimum number N set in Step 1. The variances of L and R for this split ($V_{L,s}$ and $V_{R,s}$) are obtained according to Eqs. (24–28), and the total variance of the split ($V_s$) is the sum of $V_{L,s}$ and $V_{R,s}$.
Step 4: enumerate all possible inputs $x_i$ and their splitting values $s_{x_i}$, and calculate the corresponding variances ($V_s$). The input and splitting value with the minimum variance among all candidates are selected as the best division of the parent node.
Step 5: if the minimum variance found in Step 4 is less than the variance of the parent node ($V_P$), L and R are set as two new parent nodes, and Steps 3 and 4 are executed iteratively to split the new parent nodes. The whole procedure terminates when no child split improves the variance.
Figure 3 illustrates a CRT structure with two parent nodes and three child nodes, where $\hat{y} = f(X)$ can be formulated as a piecewise linear function.
Fig.3 Illustration of a CRT structure with two parent nodes and three child nodes.

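
The sketch below approximates this procedure with scikit-learn's CART implementation; note that it fits a constant, rather than the linear regression of Eqs. (24–28), in each leaf, with `min_samples_leaf` playing the role of the node-size parameter N.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_train, y_train = rng.random((8750, 5)), rng.random(8750)   # placeholder data
X_test, y_test = rng.random((1024, 5)), rng.random(1024)

# Sweep the minimum node size N, as in the sensitivity study of Section 3.1.3
for N in (2501, 3000, 3875):
    tree = DecisionTreeRegressor(min_samples_leaf=N, random_state=0)
    tree.fit(X_train, y_train)
    print(N, r2_score(y_test, tree.predict(X_test)))
```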

2.2.4 ANN method for modeling

A typical ANN consists of three layers: the input layer (l1), the hidden layer (l2), and the output layer (l3) [37]. Each layer is a vector, and the dimension of the vector is also called the number of nodes in the layer. The input layer receives the input data from the training dataset. Adjacent layers are connected via weight matrices (Wi), bias vectors (bi), and an activation function g:
$$l_{i+1} = g\left(W_i l_i + b_i\right), \quad i = 1, 2.$$
Four types of activation function (ReLU, Softplus, Sigmoid, and Tanh) are considered.
$$\mathrm{ReLU}(x) = \max(x, 0).$$
ReLU is a piecewise linear function that passes the input through only when x is greater than or equal to 0. When x is less than 0, the gradient is zero, so the node is not updated during the training process.
$$\mathrm{Softplus}(x) = \ln\left(1 + e^x\right).$$
Softplus can be regarded as a smoothed ReLU. It overcomes the disadvantage of ReLU but requires more computation.
$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}.$$
Sigmoid is a non-linear function that maps x from $(-\infty, +\infty)$ to (0, 1). However, when $|x|$ is very large (such as $|x| \to +\infty$), $\mathrm{Sigmoid}'(x) \to 0$, which may slow down the training of the model.
$$\mathrm{Tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}.$$
Tanh is a non-linear function that maps x from $(-\infty, +\infty)$ to (−1, 1). Like Sigmoid, it may slow down the training of the model when $|x|$ is too large.
Then, a loss function is set to measure the deviation between the output layer and the validation data. In this work, the loss function is either the mean square error (MSE, Eq. (37)) or the sum of the MSE (Eq. (37)) and an L2-regularization term (Eq. (38)), where k is the regularization rate and $\omega_i$ are the weights of the neural network. The L2-regularization term prevents overfitting while the prediction loss is minimized.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2.$$
$$L_2 = \frac{1}{2} k \sum_i \omega_i^2.$$
$$f = \mathrm{MSE} + L_2.$$
Five optimization algorithms, namely, gradient descent, momentum, adaptive gradient (AdaGrad), root mean square prop (RMSProp), and adaptive moment estimation (Adam), can be used to minimize the loss. Gradient descent follows the gradient in the weight space along the path of steepest descent. Momentum improves gradient descent: it accumulates velocity in the direction where the gradient points consistently across iterations, and thus helps avoid getting stuck in a local minimum. AdaGrad is a modified stochastic gradient descent with a per-parameter learning rate; it increases the learning rate for sparser parameters and decreases it for less sparse ones. RMSProp also adapts the learning rate, scaling the rate of each weight by a running average of the magnitudes of its recent gradients. Adam is the combination of momentum and RMSProp.
The aforementioned activation functions and optimization algorithms are used in the ANN training procedure as follows. Step 1: set the number of nodes in the hidden layer, and randomly generate the initial weights (ωi) and biases (bi) of the neural network. Step 2: calculate the output value based on Eqs. (32) to (35), and evaluate the loss between the output value and the real value with the loss function (Eqs. (37–39)). Step 3: update the weights (ωi) and biases (bi) of the neural network with one of the optimization algorithms to minimize the loss. Step 4: execute Steps 2 and 3 iteratively, until the maximum number of iterations is reached or the loss is acceptable.
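
A minimal sketch with scikit-learn's MLPRegressor follows: one hidden layer of 10 nodes, the Adam optimizer, and an L2 penalty (`alpha` corresponds to the regularization rate k). scikit-learn exposes relu/tanh/logistic activations but not softplus, so ReLU is used here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_train, y_train = rng.random((8750, 5)), rng.random(8750)   # placeholder data
X_test, y_test = rng.random((1024, 5)), rng.random(1024)

# One hidden layer (l2) with 10 nodes, MSE + L2 loss, Adam optimizer
ann = MLPRegressor(hidden_layer_sizes=(10,), activation="relu", solver="adam",
                   alpha=1e-4, max_iter=2000, random_state=0)
ann.fit(X_train, y_train)
print("R2 =", r2_score(y_test, ann.predict(X_test)))
```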

2.2.5 PSO for process optimization

Once the most efficient model of the chemical process is selected by comparing the four machine learning algorithms, the process operation can be optimized with PSO algorithm, a global random search algorithm with simple structure and easy programming [38] based on swarm intelligence [39].
In the PSO algorithm, each particle i is described by an N-dimensional position vector $x_i = (x_{i1}, x_{i2}, ..., x_{iN})$ and a velocity vector $v_i = (v_{i1}, v_{i2}, ..., v_{iN})$ with $v_i \in (-v_{\max}, v_{\max})$. Each particle searches for the position $x_i$ giving its individual optimal objective value, and the swarm then determines the $x_i$ giving the global optimal objective value among all particles.
The PSO procedure requires a series of search iterations. The position and velocity vectors of each particle are selected randomly in the first iteration. In the following iterations, the particle swarm updates the position vector ($x_i$) and velocity vector ($v_i$) according to the inertia, the individual optimal value ($p_i$), and the global optimal value ($p_g$) as follows:
$$v_i^{k+1} = \omega v_i^k + c_1 r_1 \left(p_i - x_i^k\right) + c_2 r_2 \left(p_g - x_i^k\right),$$
$$x_i^{k+1} = x_i^k + v_i^{k+1},$$
where $x_i^k$ and $v_i^k$ denote the position and instantaneous velocity of particle i in iteration k, ω is the inertia coefficient, $c_1$ and $c_2$ are the acceleration factors, and $r_1$ and $r_2$ are random numbers ranging from 0 to 1.
Thus, the objective value found by particle i in iteration k ($obj_i^k$) is calculated from its updated position vector ($x_i^k$). If $obj_i^k$ is better than the particle's previous optimal value, $p_i$ is updated to $x_i^k$; otherwise, $p_i$ is retained. The $obj_i^k$ of all particles are then compared, and the best is selected as the candidate global optimum. If this value is better than the $p_g$ found in the previous iterations, $p_g$ is replaced with the corresponding $x_i^k$; otherwise, $p_g$ is retained. The procedure stops when the maximum number of iterations is reached.
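
A compact NumPy implementation of this procedure might look as follows; it is a minimization sketch of Eqs. (40) and (41) with box bounds, and a velocity clamp at $v_{\max}$ could be added in the update if needed.

```python
import numpy as np

def pso(objective, lb, ub, n_particles=30, n_iter=50, w=0.6, c1=1.5, c2=1.5):
    """Minimize objective(x) over box bounds [lb, ub] with basic PSO."""
    rng = np.random.default_rng(0)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = rng.uniform(lb, ub, (n_particles, lb.size))      # initial positions
    v = np.zeros_like(x)                                 # initial velocities
    p = x.copy()                                         # individual bests p_i
    p_val = np.array([objective(xi) for xi in x])
    g = p[p_val.argmin()].copy()                         # global best p_g
    for _ in range(n_iter):
        r1, r2 = rng.random((2, *x.shape))
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)    # Eq. (40)
        x = np.clip(x + v, lb, ub)                           # Eq. (41), in bounds
        val = np.array([objective(xi) for xi in x])
        better = val < p_val                                 # update p_i
        p[better], p_val[better] = x[better], val[better]
        g = p[p_val.argmin()].copy()                         # update p_g
    return g, p_val.min()
```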

3 Results and discussion

The datasets generated in Section 2.1 are used to train the data-driven models of the process operations. Firstly, the trained models given by KNN, SVR, DT, and ANN are presented individually. The most efficient models are then selected and combined with PSO to search for the optimal operations of the PDH process.

3.1 Data-driven models of the PDH process

As mentioned in Section 2.1, 8750 and 1024 groups of data for model training and testing are generated by changing the values of the five key inputs (Table 3), covering all possible operating regions of the PDH process related to total profit (y1) and propylene yield (y2) (Fig. 2). The following results present the performances of the four machine learning algorithms for process modeling.

3.1.1 KNN for process modeling

It is noted that three options can be adjusted to improve the KNN model, i.e., the value of k nearest neighbors, the type of distance calculation (Euclidean, Manhattan, or Chebyshev distance), and the calculation of λi for prediction (Eqs. (8) and (9)), as described in Section 2.2.1. Figure 4 presents the R2 of the test data for different combinations of k, distance calculations, and predictive equations. R2 increases significantly with k and reaches a maximum at a certain k value; beyond this value, R2 decreases slightly as k grows. When k is very small, the model is very sensitive to the nearest neighbors; when k becomes too large, distinct data with great differences are grouped into the same neighborhood, which introduces prediction error.
Fig.4 R2 of test data with different combinations of k, distance calculations, and predictive equations based on (a) average KNN and (b) distanced-weighted KNN.


Table 4 lists the best KNN models under different combinations of k, distance calculations, and predictive equations; the combination of Chebyshev distance and distance-weighted KNN gives the best results for y1 (k = 76) and y2 (k = 77).
Tab.4 The best KNN models under different combinations of k, distance calculations, and predictive equations

| Weighting | Distance | y1/(M$·year−1): k | y1/(M$·year−1): R2 | y2/%: k | y2/%: R2 |
|---|---|---|---|---|---|
| Average KNN | Chebyshev | 76 | 0.99899 | 77 | 0.99890 |
| Average KNN | Euclidean | 217 | 0.99841 | 217 | 0.99827 |
| Average KNN | Manhattan | 232 | 0.99870 | 232 | 0.99858 |
| Distance-weighted KNN | Chebyshev | 76 | 0.99900 | 77 | 0.99891 |
| Distance-weighted KNN | Euclidean | 217 | 0.99844 | 217 | 0.99830 |
| Distance-weighted KNN | Manhattan | 234 | 0.99872 | 234 | 0.99860 |

3.1.2 SVR for process modeling

As stated in Section 2.2.2, the parameter c determines the trade-off between the training error and the flatness of the SVR model, and the kernel functions are the core of pattern analysis for finding and studying general types of relations. Thus, the three kernel functions (linear, polynomial, and radial basis function) and various values of c are considered for SVR model training. Figure 5 describes the R2 of the test data under different combinations of kernel functions and parameter c. It can be found in Fig. 5 that the radial basis function performs best once c increases to a certain value; the linear function is less affected by c but gives a low R2; and the polynomial function performs similarly to the radial basis function for predicting y2, but suffers from overfitting for predicting y1 when c increases to 100.
Fig.5 R2 of test data under different combinations of kernel functions and parameters c in SVR models.


Table 5 presents the best R2 of the SVR models under specific c values and kernel functions, and indicates that the SVR model with the radial basis function predicts y1 and y2 most accurately at c = 5 and c = 90, respectively.
Tab.5 The best SVR models under different combinations of kernel functions and parameter c

| Output | Linear: c | Linear: R2 | Polynomial: c | Polynomial: R2 | RBF: c | RBF: R2 |
|---|---|---|---|---|---|---|
| y1/(M$·year−1) | 0.0002 | 0.84363 | 3 | 0.81170 | 5 | 0.99050 |
| y2/% | 0.07 | 0.24939 | 0.04 | 0.97768 | 90 | 0.99910 |

3.1.3 DT for process modeling

In the DT algorithm (Section 2.2.3), the minimum number of training instances in a single node (N) is the key control parameter of the DT model. If N is too small, a singular matrix occurs, which makes computing the regression matrix (K) infeasible. However, a larger N results in fewer child nodes and leads to inaccurate prediction. As shown in Fig. 6, when N ranges from 2501 to 3875, the DT models perform very well for predicting y1 (R2 = 0.99482) and y2 (R2 = 0.99239). The predictions of y1 and y2 decline sharply (R2 = 0.08526 and 0.04145) when N exceeds 3876, and no DT model can be built when N is below 2501 because of the singular matrix. Thus, the DT models generated with 2500 < N < 3876 were used for the process prediction in this study.
Fig.6 R2 of test data under different parameter N in DT models.


3.1.4 ANN for process modeling

Four activation functions and five optimization algorithms were introduced for ANN training in Section 2.2.4, so combinations of different activation functions and optimization algorithms are compared. Figure 7 shows these combinations with the best operating parameters under an ANN structure of 10 nodes in the hidden layer (l2). All combinations achieve high accuracy when the training sets are sufficient. The activation functions ReLU and Tanh and the optimization algorithms Adam and RMSProp perform much better than the others. Some methods fluctuate in the early steps, reflecting the process of escaping local optima in search of the global optimum. Table 6 lists the best ANN models under the different combinations of nodes in l2, activation functions, and optimization algorithms, where the combination of Softplus and RMSProp gives the best predictions of y1 and y2.
Fig.7 R2 of test data with different optimization algorithms based on different activation functions: (a) ReLU, (b) softplus, (c) sigmoid, and (d) Tanh.


Tab.6 The best ANN models under different combinations of nodes in l2 (N), activation functions, and optimization algorithms

| Optimizer | Output | ReLU: N | ReLU: R2 | Tanh: N | Tanh: R2 | Softplus: N | Softplus: R2 | Sigmoid: N | Sigmoid: R2 |
|---|---|---|---|---|---|---|---|---|---|
| RMSProp | y1/(M$·year−1) | 12 | 0.9934 | 8 | 0.9938 | 10 | 0.9940 | 8 | 0.9841 |
| RMSProp | y2/% | 12 | 0.9894 | 12 | 0.9897 | 12 | 0.9902 | 8 | 0.9901 |
| Adam | y1/(M$·year−1) | 12 | 0.9934 | 6 | 0.9936 | 6 | 0.9937 | 10 | 0.9847 |
| Adam | y2/% | 12 | 0.9891 | 12 | 0.9901 | 12 | 0.9901 | 10 | 0.9897 |
| AdaGrad | y1/(M$·year−1) | 10 | 0.9850 | 12 | 0.9817 | 8 | 0.9855 | 6 | 0.9805 |
| AdaGrad | y2/% | 12 | 0.9897 | 6 | 0.9910 | 10 | 0.9723 | 8 | 0.9701 |
| Momentum | y1/(M$·year−1) | 6 | 0.9922 | 8 | 0.9833 | 12 | 0.9850 | 12 | 0.9817 |
| Momentum | y2/% | 12 | 0.9903 | 10 | 0.9757 | 8 | 0.9702 | 10 | 0.9696 |
| Gradient descent | y1/(M$·year−1) | 8 | 0.9900 | 12 | 0.9827 | 12 | 0.9848 | 12 | 0.9820 |
| Gradient descent | y2/% | 6 | 0.9899 | 12 | 0.9806 | 8 | 0.9713 | 12 | 0.9697 |

3.2 Optimization of the PDH process operations

For better optimization ability, the impact of the key control parameters in Eq. (40) on the PSO procedure is also examined, including the acceleration factors c1 and c2 and the inertia coefficient ω (Fig. 8). A larger c1 enhances the individual (local) search ability of the particles but shows little influence on the global optimization in Fig. 8, whereas a larger c2 improves the global search ability of the swarm, thereby increasing the probability of finding the global optimum, as described in Fig. 8. A medium range of the inertia coefficient ω (such as 0.45−3.0) is required; within this range, most values of the acceleration factors c1 and c2 find the global optimum efficiently. To balance computational cost and optimization ability, proper parameters were selected for the subsequent optimization.
Fig.8 The impact of key operating parameters (c1, c2) on controlling PSO procedure: (a) ω = 0.05, (b) ω = 0.45, (c) ω = 3.0, and (d) ω = 10.


Since the training data generated from the PDH process cover most of the process operating region (Fig. 2), all four machine learning methods achieve very high prediction accuracy with suitable algorithm parameters. However, for process optimization, the PSO algorithm (described in Section 2.2.5) needs to evaluate the process objective values (y1 or y2) with the data-driven model every time the position vector (xik) is updated in iteration k for particle i. KNN must compute the distances to, and sort, all training data at each prediction, which is slow given the large number of training data (8750) in our case. Thus, owing to the heavy computational load of KNN, it is not recommended for evaluating the process objective values inside the PSO procedure. As demonstrated in the previous section, SVR, DT, and ANN learn knowledge from the training data and give simple, accurate models of the process operations. Among the trained data-driven models, those with the highest R2 are selected for optimization: the DT model (R2 = 0.99482) is used to predict y1, and the SVR model (R2 = 0.99910) is used to predict y2.
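
For illustration, coupling a selected surrogate with the PSO sketch of Section 2.2.5 amounts to wrapping the model's predict call as the objective; in the hedged sketch below, `svr` stands for the trained yield model and `pso` for the earlier function, and since PSO minimizes, the prediction is negated for maximization.

```python
import numpy as np

lb = np.array([560, 1.6, 6.5, 570, 1.8])    # lower bounds of x1..x5 (Table 3)
ub = np.array([840, 2.4, 10.0, 610, 2.0])   # upper bounds

def neg_yield(x):
    """Objective for PSO: negative predicted propylene yield y2."""
    return -svr.predict(x.reshape(1, -1))[0]

x_opt, best = pso(neg_yield, lb, ub)
print("optimal inputs:", x_opt, "predicted y2:", -best)
```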
In the PSO procedure, the particle group reaches a steady state within about 5−10 iterations, showing fast convergence. Table 7 presents the solutions maximizing total profit (y1) and propylene yield (y2) obtained by PSO. These optimal solutions agree closely with the Aspen Plus results under the same input parameters, which validates the efficiency of our hybrid method. Enumeration may seem more straightforward than the data-driven models for this case, since four variables (x1, x2, x4, and x5) reach the boundary of the training set. However, the proposed optimization method is capable of solving general nonlinear problems. In particular, if the product yield (y2) is restricted to a certain range, the data-driven models are more effective than enumeration for maximizing the total annual profit (y1), because there is a trade-off between y1 and y2: improving operating conditions for a higher yield may raise revenue but also increases operating cost, resulting in a lower profit. For example, increasing the temperature within a certain range promotes the endothermic reaction but requires substantial heat, so there is an optimal temperature that gives the maximum profit rather than the maximum yield.
Tab.7 The optimal solutions obtained by PSO a)

| Max | x1/(kmol·h−1) | x2 | x3/bar | x4/°C | x5/bar | Obj | Aspen Plus validation | Error/% |
|---|---|---|---|---|---|---|---|---|
| y1/(M$·year−1) | 840 | 1.6 | 9.646 | 610 | 1.8 | 79438 | 79957 | 0.65 |
| y2/% | 840 | 1.6 | 6.500 | 610 | 1.8 | 83.53 | 83.54 | 0.01 |

a) x1: feed flowrate; x2: hydrogen-to-propane ratio; x3: propylene rectifier pressure; x4: reactor temperature; x5: reactor pressure; y1: total annual profit; y2: propylene yield.
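
The constrained case discussed above can be handled in the same framework with a simple penalty term, as in the sketch below; `dt` and `svr` denote the trained profit and yield surrogates from the previous snippets, and the yield band and penalty weight are illustrative assumptions, not values from the paper.

```python
def neg_profit_with_yield_band(x, y2_lo=80.0, y2_hi=83.0, rho=1e6):
    """Maximize y1 subject to y2_lo <= y2 <= y2_hi via a penalty term."""
    y1 = dt.predict(x.reshape(1, -1))[0]      # predicted total annual profit
    y2 = svr.predict(x.reshape(1, -1))[0]     # predicted propylene yield
    violation = max(0.0, y2_lo - y2) + max(0.0, y2 - y2_hi)
    return -y1 + rho * violation              # PSO minimizes this

x_opt, _ = pso(neg_profit_with_yield_band, lb, ub)
```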

4 Conclusions

Machine learning algorithms may not perform well if the training data are insufficient, and there is a growing concern about the lack of variety in industrial data, as a practical process must run in a relatively steady state. This study presents an industrially validated, mechanistic-simulation strategy to generate sufficient data for model training and testing; the generated data sets cover all possible process operating regions. Based on the proposed data generation strategy, the four machine learning algorithms (KNN, SVR, DT, and ANN) all obtained highly accurate models of the process operations. Moreover, the most efficient models were selected and combined with PSO to find the optimal process operations for maximum profit or product yield. The proposed data collection strategy and the hybrid framework integrating machine learning with PSO can be generalized to a broader class of chemical processes.

Acknowledgements

This work was supported by the “Zhujiang Talent Program” High Talent Project of Guangdong Province (Grant No. 2017GC010614); and the National Natural Science Foundation of China (Grant No. 22078372).
References

1. Jenck J F, Agterberg F, Droescher M J. Products and processes for a sustainable chemical industry: a review of achievements and prospects. Green Chemistry, 2004, 6(11): 544
2. Vooradi R, Anne S B, Tula A K, Eden M R, Gani R. Energy and CO2 management for chemical and related industries: issues, opportunities and challenges. BMC Chemical Engineering, 2019, 1(1): 7
3. Worrell E, Cuelenaere R F A, Blok K, Turkenburg W C. Energy consumption by industrial processes in the European Union. Energy, 1994, 19(11): 1113–1129
4. Ding J, Modares H, Chai T, Lewis F L. Data-based multiobjective plant-wide performance optimization of industrial processes under dynamic environments. IEEE Transactions on Industrial Informatics, 2016, 12(2): 454–465
5. Hammer M. Management Approach for Resource-Productive Operations. Wiesbaden: Springer Gabler, 2018, 11–26
6. Ibrahim D, Jobson M, Guillén-Gosálbez G. Optimization-based design of crude oil distillation units using rigorous simulation models. Industrial & Engineering Chemistry Research, 2017, 56(23): 6728–6740
7. Pattison R C, Gupta A M, Baldea M. Equation-oriented optimization of process flowsheets with dividing-wall columns. AIChE Journal, 2016, 62(3): 704–716
8. Menezes B C, Kelly J D, Grossmann I E. Improved swing-cut modeling for planning and scheduling of oil-refinery distillation units. Industrial & Engineering Chemistry Research, 2013, 52(51): 18324–18333
9. Bo D, Yang K, Xie Q, He C, Zhang B, Chen Q, Qi Z, Ren J, Pan M. A novel approach for detailed modeling and optimization to improve energy saving in multiple effect evaporator systems. Industrial & Engineering Chemistry Research, 2019, 58(16): 6613–6625
10. Butler K T, Davies D W, Cartwright H, Isayev O, Walsh A. Machine learning for molecular and materials science. Nature, 2018, 559(7715): 547–555
11. Hotta S, Kiyasu S, Miyahara S. Pattern recognition using average patterns of categorical k-nearest neighbors. In: Proceedings of the 17th International Conference on Pattern Recognition. Washington, DC: IEEE, 2004
12. Adeniyi D A, Wei Z, Yongquan Y. Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method. Applied Computing and Informatics, 2016, 12(1): 90–108
13. Dang T T, Ngan H Y T, Liu W. Distance-based k-nearest neighbors outlier detection method in large-scale traffic data. In: IEEE International Conference on Digital Signal Processing (DSP). Washington, DC: IEEE, 2015
14. Zhu W, Sun W, Romagnoli J. Adaptive k-nearest-neighbor method for process monitoring. Industrial & Engineering Chemistry Research, 2018, 57(7): 2574–2586
15. Al-Jamimi H A, Bagudu A, Saleh T A. An intelligent approach for the modeling and experimental optimization of molecular hydrodesulfurization over AlMoCoBi catalyst. Journal of Molecular Liquids, 2019, 278: 376–384
16. Yang D, Zhong W, Chen X, Zhan J, Wang G. Structure optimization of vessel seawater desulphurization scrubber based on CFD and SVM-GA methods. Canadian Journal of Chemical Engineering, 2019, 97(11): 2899–2909
17. Golkarnarenji G, Naebe M, Badii K, Milani A S, Jazar R N, Khayyam H. Support vector regression modeling and optimization of energy consumption in carbon fiber production line. Computers & Chemical Engineering, 2018, 109: 276–288
18. Yu Z, Yousaf K, Ahmad M, Yousaf M, Gao Q, Chen K. Efficient pyrolysis of ginkgo biloba leaf residue and pharmaceutical sludge (mixture) with high production of clean energy: process optimization by particle swarm optimization and gradient boosting decision tree algorithm. Bioresource Technology, 2020, 304: 123020
19. Hough B R. Computational approaches and tools for modeling biomass pyrolysis. Dissertation for the Doctoral Degree. Washington: University of Washington, 2016, 78–94
20. Saleem M, Ali I. Machine learning based prediction of pyrolytic conversion for red sea seaweed. In: 7th International Conference on Biological, Chemical & Environmental Sciences. Budapest (Hungary), 2017, 27–31
21. Hough B R, Beck D A, Schwartz D T, Pfaendtner J. Application of machine learning to pyrolysis reaction networks: reducing model solution time to enable process optimization. Computers & Chemical Engineering, 2017, 104: 56–63
22. Mirshahvalad H, Ghasemiasl R, Raoufi N, Malekzadeh Dirin M. A neural network QSPR model for accurate prediction of flash point of pure hydrocarbons. Molecular Informatics, 2019, 38(4): 1800094
23. Wang Z, Su Y, Jin S, Shen W, Ren J, Zhang X, Clark J H. A novel unambiguous strategy of molecular feature extraction in machine learning assisted predictive models for environmental properties. Green Chemistry, 2020, 22(12): 3867–3876
24. Sosa A, Ortega J, Fernández L, Palomar J. Development of a method to model the mixing energy of solutions using COSMO molecular descriptors linked with a semi-empirical model using a combined ANN-QSPR methodology. Chemical Engineering Science, 2020, 224: 115764
25. Su Y, Wang Z, Jin S, Shen W, Ren J, Eden M R. An architecture of deep learning in QSPR modeling for the prediction of critical properties using molecular signatures. AIChE Journal, 2019, 65(9): e16678
26. Schweidtmann A M, Huster W R, Lüthje J T, Mitsos A. Deterministic global process optimization: accurate (single-species) properties via artificial neural networks. Computers & Chemical Engineering, 2019, 121: 67–74
27. Chandrasekaran M, Tamang S. ANN-PSO integrated optimization methodology for intelligent control of MMC machining. Journal of the Institution of Engineers (India): Series C, 2017, 98(4): 395–401
28. Zhang X, Zhou T, Zhang L, Fung K Y, Ng K M. Food product design: a hybrid machine learning and mechanistic modeling approach. Industrial & Engineering Chemistry Research, 2019, 58(36): 16743–16752
29. Zhu Y, Hou Z, Qian F, Du W. Dual RBFNNs-based model-free adaptive control with aspen HYSYS simulation. IEEE Transactions on Industrial Informatics, 2016, 28(3): 759–765
30. Myers D N, Zimmermann J E. US Patent, 20100916969, 2010-11-01
31. Chin S, Radzi S, Maharon I, Shafawi M. Kinetic model and simulation analysis for propane dehydrogenation in an industrial moving bed reactor. World Academy of Science, Engineering and Technology, 2011, 52: 183–189
32. Loc L C, Gaidai N, Kiperman S, Thoang H S. Kinetics of propane and n-butane dehydrogenation over platinum-alumina catalysts in the presence of hydrogen and water vapor. Kinetics and Catalysis, 1996, 37(6): 790–796
33. Røsjorde A, Kjelstrup S, Johannessen E, Hansen R. Minimizing the entropy production in a chemical process for dehydrogenation of propane. Energy, 2007, 32(4): 335–343
34. García-Pedrajas N, del Castillo J A R. A proposal for local k values for k-nearest neighbor rule. IEEE Transactions on Industrial Informatics, 2017, 28(2): 470–475
35. Zhu F, Gao J, Xu C, Yang J, Tao D. On selecting effective patterns for fast support vector regression training. IEEE Transactions on Industrial Informatics, 2018, 29(8): 3610–3622
36. Loh W Y. Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2011, 1(1): 14–23
37. Hsu K, Gupta H V, Sorooshian S. Artificial neural network modeling of the rainfall-runoff process. Water Resources Research, 1995, 31(10): 2517–2530
38. Yang Q, Yang Z, Zhang T, Hu G. A random chemical reaction optimization algorithm based on dual containers strategy for multi-rotor UAV path planning in transmission line inspection. Concurrency and Computation, 2019, 31(12): e4658
39. Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of ICNN'95—International Conference on Neural Networks. Washington, DC: IEEE, 1995
