RESEARCH ARTICLE

An integrated approach for machine-learning-based system identification of dynamical systems under control: application towards the model predictive control of a highly nonlinear reactor system

  • Ewan Chee 1,
  • Wee Chin Wong 2,
  • Xiaonan Wang 1
  • 1. Department of Chemical & Biomolecular Engineering, Faculty of Engineering, National University of Singapore, Singapore 117585, Singapore
  • 2. Chemical Engineering & Food Technology Cluster, Singapore Institute of Technology, Singapore 138683, Singapore

Received date: 02 Jan 2021

Accepted date: 16 Mar 2021

Published date: 15 Feb 2022

Copyright

2021 Higher Education Press

Abstract

Advanced model-based control strategies, e.g., model predictive control, can offer superior control of key process variables for multiple-input multiple-output systems. The quality of the system model is critical to controller performance and should adequately describe the process dynamics across its operating range while remaining amenable to fast optimization. This work articulates an integrated system identification procedure for deriving black-box nonlinear continuous-time multiple-input multiple-output system models for nonlinear model predictive control. To showcase this approach, five candidate models built on polynomial and interaction features of both output and manipulated variables were trained on simulated data and integrated into a nonlinear model predictive controller for a highly nonlinear continuous stirred tank reactor system. This procedure successfully identified system models that enabled effective control in both servo and regulator problems across wider operating ranges. These controllers also had reasonable per-iteration times of ca. 0.1 s. This demonstration of how such system models could be identified for nonlinear model predictive control without prior knowledge of system dynamics opens further possibilities for direct data-driven methodologies for model-based control which, in the face of process uncertainties or modelling limitations, allow rapid and stable control over wider operating ranges.

Cite this article

Ewan Chee, Wee Chin Wong, Xiaonan Wang. An integrated approach for machine-learning-based system identification of dynamical systems under control: application towards the model predictive control of a highly nonlinear reactor system[J]. Frontiers of Chemical Science and Engineering, 2022, 16(2): 237–250. DOI: 10.1007/s11705-021-2058-6

1 Introduction

Against a backdrop of data proliferation and surging enthusiasm for industrial applications of artificial intelligence, there has been growing interest in the community in incorporating the latest results from the machine learning (ML) literature to infer dynamical systems from data more effectively [1]. This inference of dynamical systems is also known as system identification (SysID). The discipline has vital industrial pertinence because it yields representations of the process that, firstly, improve system understanding and, secondly, enable the simulation of complex systems to obtain time-series predictions. While SysID is anchored in statistics and applied mathematics and is thus well-situated to take advantage of the data-driven Industry 4.0 revolution, it remains an art form in practice, not least because of the numerous design decisions that practitioners encounter as well as the large number of parameters and hyperparameters that need to be estimated.

1.1 Model predictive control

A prominent application of SysID can be found in advanced model-based process control strategies such as model predictive control (MPC). MPC incorporates knowledge about the process through a system model and solves a dynamic optimization problem at each time step to yield an optimal control sequence. It then applies the first control action in this sequence to the system before proceeding to re-solve the optimization problem at the next time step [2]. MPC can be especially useful because, unlike common applications of conventional process controllers, it natively supports multiple-input multiple-output (MIMO) formulations. It also allows system constraints, which represent process limits, to be directly included in the problem formulation, thereby ensuring plant safety and reliability in an explicit manner [3]. MPC has seen successful adoption in industry to ensure safe, reliable, and profitable system operation, with key examples present in process manufacturing, the control of complex machinery, as well as other value-adding activities [3].
The system model plays a decisive role in MPC. It should adequately describe the process dynamics across the controller's operating range while remaining simple enough to allow fast optimization [4]. These models can either be constructed from first principles, which might be difficult and costly for complex processes [5], or inferred directly from empirical data [6]. They can also take different forms: linear, nonlinear, hybrid or nonparametric, among others. MPC products have typically relied on linear models to exploit efficient optimization algorithms [7]. However, such linear MPCs might struggle to offer effective control outside a limited operating range [5,8]. Indeed, achieving tight control across a process's entire operating range is generally difficult, with conventional applications of widely-used methods like proportional-integral-derivative controllers also ill-equipped to handle systems that are highly nonlinear around their operating points [9,10]. Nonlinear MPCs (NMPCs) can overcome these limitations by employing nonlinear system models and system constraints, whose more flexible forms permit better representation of the process over a wider operating range. This comes, however, at the expense of needing to solve non-convex optimization problems quickly and precisely [11]. References [3] and [12] describe successful applications of different MPC formulations to a broad class of academic and industrial problems.
Recent advances in computing and statistics have generated novel techniques that could further improve the viability of black-box nonlinear SysID methods for model-based control. Such black-box methods are useful when process understanding is limited to begin with, or when mechanistic models derived from first principles take prohibitively long to evaluate for fast-sampling applications. Reference [13] demonstrated the neural network-based identification of single-input single-output system models for a pH neutralization process and showed that the resulting NMPC was able to track set-point changes better than a linear MPC. References [14] and [15] reported the use of recurrent neural networks and ensemble techniques to generate MIMO system models for NMPC of a continuous stirred tank reactor (CSTR) system, showing that these NMPCs performed better than linear MPCs at rejecting process disturbances.

1.2 Key contributions

Related works in the SysID literature have typically focused on presenting methodologies with special properties. Reference [1] reported a specific methodology for learning efficient models through sparse regression, for instance. The present work is instead designed to be expository and comprehensive in nature, and its main contribution is to articulate an integrated, overarching approach for learning the dynamics of continuous-time systems under control. This work is targeted at interested or starting practitioners in need of a systematic, integrated, and rigorous methodology for applying ML SysID techniques for model-based control purposes.
The key steps of this integrated approach are as follows: (i) the experimental design for data generation, (ii) the regression workflow, which consists of data preprocessing techniques and the critical but often overlooked hyperparameter optimization (HO) phase, and (iii) the integration of the learnt model into an ML-NMPC controller. To formalize the goals at each step of this framework, a rigorous exposition of the ML techniques employed at each step, including feature generation and Bayesian optimization (BO), is provided. While this framework imposes a logical structure on the flow of the data-driven tasks, it remains sufficiently flexible to allow different techniques to be employed at each stage, which future-proofs it in the face of further advances in ML.
This approach is exemplified through the case study of a highly nonlinear MIMO CSTR process. The resulting ML-NMPCs were evaluated on both control performance and solution times, on both servo and regulator problems, to ensure the solution's applicability and practicality in real-world industrial settings. They were finally benchmarked against an NMPC that employs the true system model, which is demonstrated to solve all the control problems effectively.
This work is organized as follows. Section 2 formalizes the proposed integrated approach. Section 3 details the CSTR case study, presents the three control scenarios, and reports practical details in implementing this approach. Section 4 presents results and discussions. Section 5 concludes this piece and offers recommendations for future research.

2 Methodology

In the following exposition, $x \in \mathbb{K}^q$ is a column vector, with $\mathbb{K}$ representing some field and $q \ge 1$. The notation $[a, b]$ represents a row vector when $(a, b) \in \mathbb{K}^2$, and the $T$ superscript represents the vector transpose operation, such that $[a, b]^T$ is a column vector. The symbol $\oplus$ represents column vector concatenation, and it returns another column vector, such that $x_1 \oplus x_2 = [x_1^T, x_2^T]^T$ and $\oplus_{i \in \{1,\dots,n\}} x_i = [x_1^T, \dots, x_n^T]^T$, where $x_i \in \mathbb{K}^{q_i}$. An $n \times p$ matrix with values in $\mathbb{K}$ is denoted $M_{n,p}(\mathbb{K})$, while $n$-order square matrices are $M_n(\mathbb{K})$. The symbol $\oplus$ is also used to represent the vertical concatenation of matrices with the same number of columns, such that $\oplus_{i \in \{1,\dots,n\}} B_i = [B_1^T, \dots, B_n^T]^T$, where $B_i \in M_{q_i,p}(\mathbb{K})$. Integer ranges are represented by $\{1, \dots, k\}$. Closed intervals are defined as $[\alpha; \beta]$, with $\alpha$ and $\beta$ real numbers. Left and right half-open intervals are $]\alpha; \beta]$ and $[\alpha; \beta[$ respectively, with open intervals denoted as $]\alpha; \beta[$. Given $x(t) \in \mathbb{K}^q$ a continuous-time vector-valued quantity, where $t \in \mathbb{R}_+$, $x[k]$ represents its value associated with the discrete time-step $k$, such that $x[k] = x(t_k)$, where $t_k$ is the real time associated with the discrete time-step $k$. $\Delta$ finally represents the difference operator, such that $\Delta x[k] := x[k+1] - x[k]$.
Equations (1) and (2) describe a general continuous-time state representation for a dynamical system.
$\dot{x} = f(x, u)$,
$y = g(x, u) + \epsilon$,
where $x \in \mathbb{R}^a$, $u \in \mathbb{R}^b$ and $y \in \mathbb{R}^c$ are the state, control, and output vectors respectively, with $f: \mathbb{R}^a \times \mathbb{R}^b \to \mathbb{R}^a$ a general nonlinear state equation and $g: \mathbb{R}^a \times \mathbb{R}^b \to \mathbb{R}^c$ a general nonlinear output equation; $\epsilon$ is a random variable corresponding to measurement noise. In what follows, we assume complete state information, such that $y = x + \epsilon$. For a MIMO system, $a$, $b$ and $c$ are integers that are strictly greater than 1.
If f is a highly nonlinear function, conventional applications of widely-used strategies like proportional-integral-derivative control and linear MPC might not offer effective control beyond a limited range. This motivates the study of nonlinear control strategies like NMPC, which the following subsection describes.

2.1 Nonlinear model predictive control

Given a discrete-time controller, Eq. (3) defines the cost function $J$ at any given discrete time-step $k$:
$J(\Delta U) := \mathrm{tr}\big((Y - Y^*)^T Q (Y - Y^*)\big) + \mathrm{tr}\big(\Delta U^T R\, \Delta U\big)$,
where $\Delta U := [\Delta u[k], \Delta u[k+1], \dots, \Delta u[k+m-1]]^T \in M_{m,b}(\mathbb{R})$ is the control sequence over the control horizon $m \in \mathbb{N}^*$, $Y := [y[k+1|k], \dots, y[k+p|k]]^T \in M_{p,c}(\mathbb{R})$ is the output trajectory over the prediction horizon $p \in \mathbb{N}^*$, $Y^* \in M_{p,c}(\mathbb{R})$ is the system set-point over $p$, $Q \in M_c(\mathbb{R}_+)$ and $R \in M_b(\mathbb{R}_+)$ are diagonal non-negative weight matrices whose coefficients reflect the relative importance of the corresponding terms in the cost function, and $\mathrm{tr}(\cdot)$ represents the trace of the matrix. The control actions are applied in a zero-order hold fashion throughout the sampling interval, such that $u(t) = u[k]$, $\forall t \in [t_k; t_{k+1}[$.
Equations (4–8) formulate the NMPC problem, which is solved in a receding horizon fashion:
$\min_{\Delta U} J(\Delta U)$,
s.t. $Y = F(x[k], u[k], \Delta U \oplus 0_{p-m,b})$,
$\Delta u_{\min} \le \Delta u[l] \le \Delta u_{\max}, \quad \forall l \in \{k, \dots, k+m-1\}$,
$u_{\min} \le u[l] \le u_{\max}, \quad \forall l \in \{k, \dots, k+m-1\}$,
$y_{\min} \le y[l] \le y_{\max}, \quad \forall l \in \{k+1, \dots, k+p\}$,
where $F$ is a nonlinear function which generates the discrete-time output trajectory given the system's initial state and the control sequence, and $0_{p-m,b}$ is a zero matrix of dimension $(p-m) \times b$. The concatenation of $0_{p-m,b}$ is equivalent to defining $\Delta u[l] = 0, \forall l \in \{k+m, \dots, k+p-1\}$, then appending these zero values to $\Delta U$ such that the resulting matrix has $p$ rows. Therefore, $u$ remains constant after the control horizon. $F$ can be generated from $f$ as Eq. (9) shows:
$F(x[k], u[k], \Delta U \oplus 0_{p-m,b}) = [G(1), \dots, G(p)]^T$,
where $G(l) := x(t_k) + \int_{t_k}^{t_{k+l}} f(x(\tau), u(\tau))\, d\tau, \ \forall l \in \{1, \dots, p\}$. Equations (6–8) represent constraints on the control actions' rates of change, constraints on their magnitude, and constraints on system outputs, respectively.
This NMPC problem is a constrained nonlinear optimization problem which can be solved using active set or interior point methods. As $F$ is fully determined by $f$ through Eq. (9), the two are equivalent from the controller's perspective; we thus refer to $f$ as the NMPC system model in what follows. The continuous-time SysID approach was pursued here because it may be more intuitive to scientists and engineers [16]. With that said, the proposed approach is potentially also amenable to discrete-time formulations after suitable modifications.
If f or F is not known a priori, it becomes necessary to first identify it from empirical data. The following subsection describes how recent advances in ML could be applied in the context of a broader integrative SysID approach for nonlinear control.

2.2 Black-box direct continuous-time SysID

Figure 1 illustrates the three-step framework for directly identifying f as NMPC system models from empirical data. The data generation procedure involves performing experiments that perturb the system across the controller’s operating region to reveal system dynamics. The regression workflow then proceeds to infer nonlinear continuous-time functions as NMPC system models. These system models are finally integrated into the ML-NMPC controllers, which are subsequently tested to evaluate their closed-loop control performance.
Fig.1 Integrated data-driven approach for identification of continuous-time NMPC system models.


2.2.1 Data generation procedure

The design of the input signal is critical to generating a rich data set, and the signal proposed in this work consists of simultaneous random step changes for all manipulated variables. The experimental sampling time h is also an important parameter as the following discussion will show.
To generate an input signal of time length $t_{sim} \in \mathbb{R}_+^*$ for a single manipulated variable, the interval $[0; t_{sim}]$ is first partitioned by performing a random jump forward in time from the start of the interval, with the jump length in time units (tus) sampled from a discrete uniform distribution $U(\{l_{\min}, l_{\min}+h, \dots, l_{\max}\})$, where $l_{\min}$ and $l_{\max}$ are the minimum and maximum jump lengths in tus respectively and are both integer multiples of $h$, and by repeating these jumps until $t_{sim}$ is reached. The value of the manipulated variable over each subinterval is then sampled from a continuous uniform distribution $U([u_{\min}; u_{\max}])$, where $u_{\min}$ and $u_{\max}$ are the lower and upper bounds of the manipulated variable as specified in the NMPC formulation, respectively. This process is repeated for each manipulated variable.
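As an illustration, the following minimal sketch generates one such signal; the function name and implementation details are our own, with parameter names mirroring the text.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_step_signal(t_sim, h, l_min, l_max, u_min, u_max):
    """Random step input over [0, t_sim]: hold times ~ U({l_min, l_min+h, ..., l_max}),
    levels ~ U([u_min, u_max]); returns one value per sampling instant."""
    n_samples = int(t_sim / h) + 1
    u = np.empty(n_samples)
    i = 0
    while i < n_samples:
        jump = rng.choice(np.arange(l_min, l_max + h, h))  # jump length in tus
        level = rng.uniform(u_min, u_max)                  # value over the subinterval
        n_hold = int(round(jump / h))
        u[i:i + n_hold] = level                            # slice clips at the signal end
        i += n_hold
    return u

# One independent signal per manipulated variable, e.g., q and T in Section 3.3:
q_signal = random_step_signal(t_sim=1000, h=0.1, l_min=1, l_max=10, u_min=0.5, u_max=0.9)
```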
Figure 2 shows an example of an input signal and the corresponding system response. Because the control values are uniformly distributed across the controller's operating region, this signal facilitates the learning of ML models that interpolate well and thus give good dynamical predictions within this region.
Fig.2 Example input signal (green) and system response (red), where $t_{sim} = 10000$, $l_{\min} = 10$, $l_{\max} = 100$, $u_{\min} = 0$, $u_{\max} = 1$.


Given an input signal $(u_i)_{i \in \{0,\dots,t_{sim}/h\}}$ and the system response $(y_i)_{i \in \{0,\dots,t_{sim}/h\}}$, where $u_i$ and $y_i$ are potentially multidimensional, we pose $\tilde{x}_i := y_i \oplus u_i$ and $\tilde{y}_i := (y_{i+1} - y_i)/h$, $\forall i \in \{0, \dots, t_{sim}/h - 1\}$. $\tilde{x}_i$ and $\tilde{y}_i$ represent data points and their corresponding labels, which are alternatively the independent and dependent variables respectively, and this supervised learning task is formalized in Section 2.2.2.
As the $\tilde{y}_i$ contain first-order approximations of the continuous derivatives representing the system dynamics, values for $h$ that are small with respect to the system's characteristic time would be ideal, i.e., at least an order of magnitude smaller. In this work, the initial process values are taken to be the set-point values. Combining data points from multiple experimental runs is also straightforward, and it is indeed desirable to use as many data points as can be collected: in general, model variance decreases as the number of data points increases, such that more precise estimates for the model parameters are achieved.
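A sketch of this dataset construction, under the assumption that the logged outputs and inputs are stored row-wise as NumPy arrays, could look as follows (the helper name build_dataset is ours):

```python
import numpy as np

def build_dataset(y, u, h):
    """y: (n, c) output samples; u: (n, b) input samples; h: experimental sampling time.
    Returns data points x~_i = y_i (+) u_i and labels y~_i = (y_{i+1} - y_i) / h."""
    X = np.hstack([y[:-1], u[:-1]])   # independent variables
    Y = (y[1:] - y[:-1]) / h          # first-order approximations of dy/dt
    return X, Y

# Data points from multiple experimental runs can simply be stacked row-wise:
# X = np.vstack([X_run1, X_run2]); Y = np.vstack([Y_run1, Y_run2])
```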

2.2.2 Regression workflow

We pose $\tilde{X} := [\tilde{x}_i]_{i \in \{1,\dots,N\}}^T$ the design matrix with $\{\tilde{x}_i\}_{i \in \{1,\dots,N\}}$ the data set of size $N \in \mathbb{N}^*$, and $\tilde{Y} := [\tilde{y}_i]_{i \in \{1,\dots,N\}}^T$ the labels matrix with $\{\tilde{y}_i\}_{i \in \{1,\dots,N\}}$ the set of labels associated to $\{\tilde{x}_i\}$.
Before performing the regression task, the data set is split into training, validation, and test sets. The training set contains the data used to fit a given model. The validation set contains the data used to provide an unbiased evaluation of a model fit on the training set while tuning the model hyperparameters. The test set finally contains the data used to provide an unbiased evaluation of the tuned models. The validation and test sets are "held out" from the training set to prevent data leakage, which is when information outside the training set is used to create the model. Data leakage causes the estimates of the final model's general predictive power to be overly optimistic. We denote by $I_{train}$, $I_{val}$ and $I_{test}$ the sets of indices of $\tilde{X}$ corresponding to these sets, with $\mathrm{card}(I_{train}) = N_{train}$, $\mathrm{card}(I_{val}) = N_{val}$ and $\mathrm{card}(I_{test}) = N_{test}$, such that $N_{train} + N_{val} + N_{test} = N$. $\tilde{X}_{train} := [\tilde{x}_i]_{i \in I_{train}}^T$ is the training design matrix with $\tilde{Y}_{train} := [\tilde{y}_i]_{i \in I_{train}}^T$ its labels matrix. $\tilde{X}_{val}$, $\tilde{Y}_{val}$, $\tilde{X}_{test}$ and $\tilde{Y}_{test}$ are defined similarly.
Data preprocessing encompasses a broad set of strategies that prepare the data set for the model training phase in such a way as to facilitate the learning of models that generalize well. It generally involves data cleaning, which removes outliers that might have resulted from sensor faults or data input errors, and feature engineering, which encompasses techniques like feature transformation, feature generation, and feature extraction. In this work, we focus on the generation of polynomial and interaction features, which represent higher-order and coupled nonlinear terms in the system dynamics. We pose $\tilde{X} = [\tilde{X}_j]_{j \in \{1,\dots,N_f\}}$, where the $\tilde{X}_{j \in \{1,\dots,N_f\}}$ are feature vectors with $N_f$ the number of features. Equation (10) formalizes the feature generation procedure of degree $d \in \mathbb{N}^*$:
$\tilde{X}^{(d)} := \Big[\prod_{j=1}^{N_f} \tilde{X}_j^{a_j}\Big]_{a \in \mathbb{N}^{N_f},\ \sum_j a_j \le d}$,
where the products over feature vectors are taken element-wise. To illustrate this procedure, suppose that $\tilde{X} = [\tilde{X}_1, \tilde{X}_2]$. A feature generation of degree 2 yields the following matrix:
$\tilde{X}^{(2)} = [1, \tilde{X}_1, \tilde{X}_2, \tilde{X}_1^2, \tilde{X}_1 \tilde{X}_2, \tilde{X}_2^2]$.
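This degree-2 example can be reproduced with the same scikit-learn transformer used later in Section 3.3; a minimal sketch:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])           # one sample with X~_1 = 2 and X~_2 = 3
poly = PolynomialFeatures(degree=2)  # bias, linear, and degree-2 terms, as in Eq. (11)
print(poly.fit_transform(X))         # [[1. 2. 3. 4. 6. 9.]]
# Feature names follow the order of Eq. (11)
# (get_feature_names was renamed get_feature_names_out in later scikit-learn releases):
print(poly.get_feature_names(["X1", "X2"]))  # ['1', 'X1', 'X2', 'X1^2', 'X1 X2', 'X2^2']
```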
Data standardization is also performed so that all features are considered equally during model training. Equation (12) describes this operation:
$\tilde{X}^{(d)}_{\cdot,std} = \Big[\big(\tilde{X}^{(d)}_{\cdot,j} - \bar{\tilde{X}}^{(d)}_{\star,j}\big) / \hat{\sigma}_{\star,j}\Big]_{j \in \{1,\dots,N_f(d)\}}$,
where $\tilde{X}^{(d)}_{\cdot,j}$ represents the $j$-th column of $\tilde{X}^{(d)}_{\cdot}$, $\bar{\tilde{X}}^{(d)}_{\star,j}$ the sample mean of $\tilde{X}^{(d)}_{\star,j}$, $\hat{\sigma}_{\star,j}$ the population standard deviation of $\tilde{X}^{(d)}_{\star,j}$, and $N_f(d)$ the total number of polynomial and interaction features. $\cdot$ is a placeholder referring either to the training, validation, or test set. If $\cdot$ refers to either $\tilde{X}_{train}$ or $\tilde{X}_{val}$, then $\star$ refers to $\tilde{X}_{train}$. If $\cdot$ refers to $\tilde{X}_{test}$, then $\star$ refers to $\tilde{X}_{train} \oplus \tilde{X}_{val}$. This mapping of $\cdot$ to $\star$ is done to prevent data leakage.
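In scikit-learn terms, this leakage-safe mapping amounts to fitting the scaler on the reference set only; a sketch, assuming the polynomial feature matrices from the split above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# During HO: statistics come from the training set only.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train_poly)  # fit mean/std on training features
X_val_std = scaler.transform(X_val_poly)          # reuse training statistics unchanged

# For the final test evaluation, the scaler is refit on the combined
# training and validation sets, per the mapping described above.
scaler_final = StandardScaler().fit(np.vstack([X_train_poly, X_val_poly]))
X_test_std = scaler_final.transform(X_test_poly)
```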
The model training phase corresponds to solving for a prediction function $f$ such that $\tilde{y} \approx f(\tilde{x})$. Suppose that we have $n$ candidate models, i.e., $n$ different regression forms for $f$. For each $f_{i \in \{1,\dots,n\}}$, we solve the mean squared error (MSE) minimization problem in Eq. (13):
$\hat{\theta}_i = \arg\min_{\theta_i \in \Theta_i} \frac{1}{N_{train}} \sum_{j \in I_{train}} \big(\tilde{y}_j - f_{i,\theta_i}(\tilde{x}_j)\big)^2$,
where $\Theta_i$ represents the parameter space for $f_i$. $f_{i,\hat{\theta}_i}$ finally represents the solution to this regression problem, and this procedure is repeated for each $f_i$. Let $\eta_i \in H_i$ be the set of hyperparameters for $f_i$. HO involves training $f_i$ with different $\eta_i$, then selecting the $\hat{\eta}_i$ which satisfies Eq. (14):
$\hat{\eta}_i = \arg\min_{\eta_i \in H_i} \frac{1}{N_{val}} \sum_{j \in I_{val}} \big(\tilde{y}_j - f_{i,\hat{\theta}_i;\eta_i}(\tilde{x}_j)\big)^2$.
BO is employed in this work for HO, with exhaustive grid searches used instead if the hyperparameter space is countable and sufficiently small. BO is a class of optimization techniques that applies Bayes' theorem to quickly find the global optimum of a general nonlinear multidimensional function [17]. Unlike grid and random searches, BO takes past evaluations into account by using them to build a probabilistic surrogate model of the HO objective function. It uses this model to select the most promising hyperparameters at each iteration. The objective function is then evaluated at those hyperparameters, with the result subsequently used to update the surrogate model. This iterative procedure therefore yields posteriors that better approximate the HO objective function with each evaluation, from which the best hyperparameters can be more easily located [17]. As function evaluations for Eq. (14) are expensive, since each iteration involves training an entire model, such a principled approach which balances exploration of the hyperparameter space against exploitation of the best found values is desirable [18,19].
The formalization of BO consists of five key elements. The first element is $H_i$, the hyperparameter space, and the second is the HO objective function, which was present in Eq. (14) and is explicated here:
$J(\eta_i) := \frac{1}{N_{val}} \sum_{j \in I_{val}} \big(\tilde{y}_j - f_{i,\hat{\theta}_i;\eta_i}(\tilde{x}_j)\big)^2$.
The third element is the surrogate model, which is an alternative probabilistic representation of the objective function that is cheaper to evaluate. The fourth is the acquisition function, which is defined over $H_i$ and computed from the surrogate model, and which quantifies the desirability of sampling a given point in $H_i$ next. The final element is the BO history $\mathcal{D}_{BO,s}$, the set containing the $s$ samples drawn from $J$ so far. The surrogate model is referred to as the posterior distribution when it is conditioned on $\mathcal{D}_{BO,s}$, and is represented by $p_{\mathcal{D}_{BO,s}}$. We also drop the $i$ subscript in an abuse of notation for the remaining technical development of BO.
Common acquisition functions include the probability of improvement, expected improvement (EI), entropy search, and lower confidence bound. At the $(s+1)$-th iteration, these functions have the generic form shown in Eq. (16):
$A_{s+1}(\eta) = \mathbb{E}_{\mathcal{D}_{BO,s}}[u_{s+1}(\eta)]$,
where $u_{s+1}: H \to \mathbb{R}$ is a utility function to maximize. EI is the most common acquisition function and is used in this study, and it has the following utility function at the $(s+1)$-th iteration:
$u_{EI,s+1}(\eta) = \max(0, J_s^* - J(\eta))$,
where $J_s^* := \min_{v \in \{1,\dots,s\}} J(\eta_v)$ is the best value for $J$ found after $s$ iterations. The acquisition function $A_{EI,s+1}(\eta)$ at the $(s+1)$-th evaluation is thus:
$A_{EI,s+1}(\eta) = \int \max(0, J_s^* - J)\, p_{\mathcal{D}_{BO,s}}(J|\eta)\, dJ = \int_{-\infty}^{J_s^*} (J_s^* - J)\, p_{\mathcal{D}_{BO,s}}(J|\eta)\, dJ$.
This finally yields the following maximization problem at the $(s+1)$-th evaluation:
$\eta_{s+1} = \arg\max_{\eta \in H} A_{EI,s+1}(\eta) = \arg\max_{\eta \in H} \int_{-\infty}^{J_s^*} (J_s^* - J)\, p_{\mathcal{D}_{BO,s}}(J|\eta)\, dJ$.
This corresponds to finding the $\eta \in H$ which is expected to improve on $J_s^*$ the most, given the current surrogate model $p_{\mathcal{D}_{BO,s}}(J|\eta)$.
Common forms for the surrogate model include Gaussian processes and tree-structured Parzen estimators (TPEs), with the latter applied in this work. Technical details on the structure of TPEs and on how EI is optimized in the TPE algorithm can be found in ref. [18]. The degree of polynomial feature generation is treated as a hyperparameter for all candidate models, with $d_i \in \{1,\dots,10\}$, $\forall i$.
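A minimal optuna sketch of this TPE-based HO for the SVR candidate model is given below; the search space shown is illustrative only (the actual spaces are reported in Appendix B), and the standardized data matrices are assumed from the sketches above.

```python
import optuna
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

def objective(trial):
    # Illustrative SVR search space; optuna suggests one configuration per trial.
    params = {
        "C": trial.suggest_float("C", 1e-2, 1e2, log=True),
        "epsilon": trial.suggest_float("epsilon", 1e-3, 1e0, log=True),
    }
    model = MultiOutputRegressor(SVR(**params))  # SVR is single-output natively
    model.fit(X_train_std, Y_train)
    return mean_squared_error(Y_val, model.predict(X_val_std))  # Eq. (14) objective

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```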
Each candidate model $f_i$ whose best hyperparameters $\hat{\eta}_i$ have been found is then trained on $\tilde{X}_{train} \oplus \tilde{X}_{val}$ and tested on $\tilde{X}_{test}$ to determine its test MSE, which serves as an indicator of the model's general predictive ability relative to the other candidate models after hyperparameter tuning. Before integration into the ML-NMPC, the models are initialized with their best hyperparameters $\hat{\eta}_i$ and trained on the full data set $\tilde{X}$.
In situations where data is sparse, it would be useful to apply cross-validation techniques to obtain better estimates of the model's general predictive power during HO. $K$-fold cross-validation involves, firstly, extracting $\tilde{X}_{test}$ from $\tilde{X}$, then splitting the remaining data into $K$ equal folds, i.e., $\{\tilde{X}_i\}_{i \in \{1,\dots,K\}}$, where $K \le N - N_{test}$. $K$ models are then trained for each hyperparameter configuration, with each run $l \in \{1,\dots,K\}$ having $\oplus_{i \in \{1,\dots,K\} \setminus \{l\}} \tilde{X}_i$ as its training set and $\tilde{X}_l$ as its validation set. This technique returns $K$ validation MSEs, which can be averaged to obtain a less biased estimate of the model's goodness. $K$ is typically selected to be 5 or 10. Leave-one-out cross-validation involves selecting $K = N - N_{test}$ and, while it is the most computationally expensive, returns the least biased estimate of the model's goodness. As it is assumed that enough data has been generated, cross-validation techniques are not employed in this work.

2.2.3 Integration of ML model into NMPC

In the last step of this framework, the models $f_{i,\hat{\theta}_i;\hat{\eta}_i}$ obtained from the regression workflow are integrated into the ML-NMPC as system models and used to generate predictions for $\dot{x}$, which are themselves used to evolve $x$ in time to generate $Y$ given $\Delta U$. To solve this system of first-order ordinary differential equations, numerical methods like Runge-Kutta methods and linear multistep methods can be used.
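The sketch below illustrates this rollout in the spirit of Eq. (9), assuming a fitted scikit-learn-style model with a predict method; the helper name predict_trajectory and the zero-order-hold bookkeeping are our own simplifications.

```python
import numpy as np
from scipy.integrate import solve_ivp

def predict_trajectory(model, x0, u_sequence, h_nmpc, p):
    """Evolve the state over p steps of length h_nmpc; the control is held
    constant within each sampling interval (zero-order hold)."""
    Y = []
    x = np.asarray(x0, dtype=float)
    for l in range(p):
        u = u_sequence[min(l, len(u_sequence) - 1)]  # u constant after the control horizon
        rhs = lambda t, s: model.predict(np.hstack([s, u])[None, :])[0]  # s is the state
        sol = solve_ivp(rhs, (0.0, h_nmpc), x, method="RK23")
        x = sol.y[:, -1]
        Y.append(x)
    return np.array(Y)  # shape (p, c): the predicted output trajectory Y
```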
Tuning of the ML-NMPC control parameters is done before testing its closed-loop performance on the case study. In this work, this consists of modifying m to minimize the weighted integrated absolute error (WIAE) for a ±5% set-point step experiment:
$\mathrm{WIAE} := \int_{t_0}^{t_f} \big\| W \big(y(t) - y^*(t)\big) \big\|_1 \, dt$,
where $W \in M_c(\mathbb{R}_+)$ is a weight matrix, and $t_0 \in \mathbb{R}_+$ and $t_f \in \mathbb{R}_+$ are the start and end times of the experiment, respectively. $p = 5$ was fixed in this work. The WIAE was selected as the figure of merit for its simple interpretation, though other controller tuning statistics that explicitly include $\Delta U$ or that have more elaborate forms could also be used without any loss of generality of this procedure (see ref. [20] for other statistics that could be used in place of the WIAE). As a rule of thumb, the controller's sampling time $h_{NMPC}$ is typically chosen to be 5 to 10 times shorter than the system's characteristic time [21].
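From a logged closed-loop run, the WIAE defined above can be approximated with the trapezoidal rule; a short sketch:

```python
import numpy as np

def wiae(t, y, y_sp, w):
    """t: (n,) sample times; y, y_sp: (n, c) outputs and set-points;
    w: (c,) diagonal of the weight matrix W."""
    err = np.abs(y - y_sp) @ np.asarray(w)  # ||W (y(t) - y*(t))||_1 at each sample
    return np.trapz(err, t)                 # integral over [t0, tf]
```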

3 Case study

3.1 Plant model

The case study, which was also studied in ref. [22], consists of a single CSTR system which houses a reversible chemical reaction described in Eq. (20), where A is the feed species, R the desired product, and S the undesired byproduct. Non-dimensionalized expressions for the system’s dynamical behavior are shown in Eqs. (21–23):
$A \underset{k_4}{\overset{k_1}{\rightleftharpoons}} R \underset{k_3}{\overset{k_2}{\rightleftharpoons}} S$,
$\frac{dC_A}{dt} = q\,[C_{A0} - C_A] - k_1 C_A + k_4 C_R$,
$\frac{dC_R}{dt} = q\,[1 - C_{A0} - C_R] + k_1 C_A + k_3 [1 - C_A - C_R] - [k_2 + k_4] C_R$,
$k_j = k_{0,j} \exp\Big\{-\Big[\frac{E}{R T_0}\Big]_j \Big[\frac{1}{T} - 1\Big]\Big\}, \quad j \in \{1, 2, 3, 4\}$,
where $C_{i \in \{A,R,S\}} \in \mathbb{R}_+$ is the reactor concentration of species $i$, $C_{A0} \in \mathbb{R}_+$ is the feed concentration of A, $q \in \mathbb{R}_+$ is the feed and exit flow rate, $k_{j \in \{1,\dots,4\}} \in \mathbb{R}_+$ are the reaction rate constants, $k_{0,j \in \{1,\dots,4\}} \in \mathbb{R}_+$ are the Arrhenius pre-exponential constants, $[E/RT_0]_{j \in \{1,\dots,4\}} \in \mathbb{R}_+$ are the normalized activation energies, and $T \in \mathbb{R}_+$ is the system temperature. $q$ and $T$ are the manipulated variables in this study. The values of the model parameters are reported in Appendix A (cf. Electronic Supplementary Material, ESM).
Figure 3 schematizes the CSTR plant and plots the system's steady-state conditions as a function of $T$ when $q$ is at its nominal value of 0.8. To simplify the system, it is assumed that the low-level flow and temperature control loops have negligible dead times and exhibit no steady-state errors. The system's set-point was selected to maximize the ratio of $C_R$ to $C_A$, which also corresponds to maximizing product yield and minimizing downstream separation costs. There are strong nonlinearities in how both $C_A$ and $C_R$ vary with $T$ at the set-point. There is also input multiplicity, since two different values of $T$ can yield the same value of $C_R$, which implies sign changes in the determinant of the steady-state gain matrix within the operating region that render linear controllers with integral action unstable [23]. The control of this system at this set-point is therefore a relevant and interesting problem.
Fig.3 (a) Schematic diagram of CSTR plant, where q* and T* represent set-points for q and T respectively; (b) system steady-state conditions for q=0.8 (“Reactor startup” and “Upset recovery” refer to start points for the control problems in Section 3.2).


To numerically simulate this system, a linear multistep method implemented through the LSODA option in integrate.solve_ivp from the SciPy package (version 1.5) was used [24]. A Gaussian noise $\mathcal{N}(0, 10^{-3})$ was added to every measurement of $C_A$ and $C_R$ to simulate measurement noise. Given the range of values for $C_A$ and $C_R$, this corresponds to a measurement error of ca. 1%.
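A sketch of this plant simulation is shown below. The kinetic parameters here are placeholders for illustration only (the true values are in Appendix A, ESM), and the noise term reads the $\mathcal{N}(0, 10^{-3})$ above as a variance.

```python
import numpy as np
from scipy.integrate import solve_ivp

k0 = np.array([1.0, 0.5, 0.2, 0.1])   # placeholder Arrhenius pre-exponentials
E = np.array([5.0, 10.0, 4.0, 8.0])   # placeholder normalized activation energies
CA0 = 1.0                             # placeholder feed concentration of A

def cstr_rhs(t, c, q, T):
    CA, CR = c
    k = k0 * np.exp(-E * (1.0 / T - 1.0))  # Eq. (23)
    dCA = q * (CA0 - CA) - k[0] * CA + k[3] * CR                   # Eq. (21)
    dCR = (q * (1.0 - CA0 - CR) + k[0] * CA
           + k[2] * (1.0 - CA - CR) - (k[1] + k[3]) * CR)          # Eq. (22)
    return [dCA, dCR]

# One experimental sampling interval with q = 0.8 and T = 1.0:
sol = solve_ivp(cstr_rhs, (0.0, 0.1), [0.4, 0.4], args=(0.8, 1.0), method="LSODA")
y_meas = sol.y[:, -1] + np.random.normal(0.0, np.sqrt(1e-3), size=2)  # noisy measurement
```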

3.2 Control scenarios

Controller performance was evaluated on three scenarios. The first was a servo control problem representing ±5% step changes to the $C_R$ set-point. This step size was selected because the control actions needed to stabilize the system at the new set-points are large enough to pose a meaningfully challenging control problem. Each experimental run began at the set-point and was followed by the +5% step and the −5% step before reverting to the set-point, such that the experiment had the following $C_R$ profile: [0.406, 0.426, 0.386, 0.406]. Each step lasted for 5 tus, for a total experimental length of 20 tus.
The second and third scenarios were regulator control problems corresponding to "reactor startup" and "upset recovery" respectively. These two process disturbances lie on opposite sides of the set-point, with the former having low $T$ and $C_R$ values and the latter an abnormally high $T$ value. A process controller would therefore need to optimize correctly for $T$ values across an extended range of 0.8 to 1.1 to perform well in both scenarios. All experimental runs for these two scenarios lasted for 10 tus.

3.3 Framework application

To infer $f$ for this case study, we pose $\tilde{x}_i := [C_A, C_R, q, T]_i^T$ and $\tilde{y}_i := [dC_A/dt, dC_R/dt]_i^T$. Note that the polynomial and interaction features of $\tilde{x}_j$ up to degree $d$ are appended to this vector before the model training phase, as part of the feature generation procedure described in Section 2.2.2. In this work, $h = 0.1$ tu, $t_{sim} = 1000$, $l_{\min} = 1$, $l_{\max} = 10$, $q_{\min} = 0.5$, $q_{\max} = 0.9$, $T_{\min} = 0.7$ and $T_{\max} = 1.1$. Five runs were performed, and an example of an experimental run is shown in Fig. 4.
Fig.4 Example system response (top) to an input signal (bottom) for the CSTR case study.


In this work, a 60%, 20%, 20% split for the training, validation and test sets was used. The polynomial feature generation and data standardization operations were performed with preprocessing.PolynomialFeatures and preprocessing.StandardScaler from scikit-learn (version 0.23) respectively [25]. The scikit-learn package was also used to train the different candidate models. The optuna package (version 2.3) in Python 3.8 was used to perform BO [26]. The $f_i$ studied in this work are common regression forms and include linear regression, which we henceforth refer to as "polynomial" regression, support vector regression (SVR), decision tree (DT) regression, extra trees (ET) regression and gradient boosted (GB) regression. The model hyperparameters $\eta_i$ for each $f_i$ are reported in Appendix B (cf. ESM).
The NMPC problem was solved using the sequential least squares quadratic programming (SLSQP) method implemented in optimize.minimize from SciPy (version 1.5) [24]. The constraints on the control actions' magnitude and on the system outputs, Eqs. (7) and (8) respectively, were implemented as soft constraints, with the coefficients of their penalty terms in the objective function taken to be 1000. The nonlinear optimizer was initialized with values sampled from $U([-10^{-3}, 10^{-3}])$ at the first time step and was warm-started with the solution from the previous time step in subsequent time steps. In this study, $h_{NMPC} = 0.2$ tu.
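The sketch below outlines one receding-horizon step under these settings, reusing predict_trajectory from the sketch in Section 2.2.3; the quadratic form of the penalty terms and all variable names are our own assumptions rather than the exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

def nmpc_step(model, x0, u_prev, y_sp, du_warm, m, p, h_nmpc,
              q_w, r_w, u_bounds, y_bounds, rho=1000.0):
    """One NMPC solve. q_w, r_w: diagonals of Q and R; u_bounds, y_bounds:
    (lower, upper) pairs; du_warm: warm start from the previous time step."""
    def cost(du_flat):
        dU = du_flat.reshape(m, -1)
        U = u_prev + np.cumsum(dU, axis=0)                        # absolute control values
        u_seq = np.vstack([U, np.repeat(U[-1:], p - m, axis=0)])  # u constant after m
        Y = predict_trajectory(model, x0, u_seq, h_nmpc, p)
        J = np.sum((Y - y_sp) ** 2 @ q_w) + np.sum(dU ** 2 @ r_w)  # Eq. (3), diagonal Q, R
        # Soft constraints: penalize violations of Eqs. (7) and (8).
        J += rho * np.sum(np.clip(u_bounds[0] - U, 0.0, None) ** 2)
        J += rho * np.sum(np.clip(U - u_bounds[1], 0.0, None) ** 2)
        J += rho * np.sum(np.clip(y_bounds[0] - Y, 0.0, None) ** 2)
        J += rho * np.sum(np.clip(Y - y_bounds[1], 0.0, None) ** 2)
        return J
    # Rate-of-change limits (Eq. (6)) could be passed via the bounds= argument.
    res = minimize(cost, du_warm.ravel(), method="SLSQP")
    dU = res.x.reshape(m, -1)
    return u_prev + dU[0], res.x  # apply the first move; warm-start the next solve
```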
To evolve the system in time with the ML models in the ML-NMPC, the RK23 solver in integrate.solve_ivp from SciPy (version 1.5) was used. $W = \mathrm{diag}([1, 3])$ was chosen for the WIAE to punish $C_R$ deviations 3 times more heavily than $C_A$ deviations.

4 Results and discussion

We first report the closed-loop performance of an NMPC controller with the exact system model, which serves as a benchmark for the ML-NMPCs. The performance of the ML models from the SysID procedure is then quantified through their validation and test $R^2$ and MSE values, after which the closed-loop performances of these ML-NMPCs on the control scenarios are presented.
In what follows, we refer to the NMPC with the exact system model as the “exact NMPC”. For the ML-NMPCs, we use the shorthand “polynomial-NMPC” to refer to ML-NMPCs with a system model that is linear in the polynomial and interaction features, “DT-NMPC” to refer to ML-NMPCs with a DT system model, and so on for the other ML models. All calculations were performed in a Python 3.8 environment on a computer with 16 GB of RAM and a 4-core Intel® Core™ i7-4790 CPU running at 3.60 GHz.

4.1 Nonlinear MPC with exact system model

Figure 5 shows the results for the exact NMPC with $p = 5$ and different values of $m$; $m = 1$ offers the lowest WIAE on the step experiments while also taking the least time to solve. Figure 6 shows the performance of the tuned exact NMPC on the three control scenarios. Stable closed-loop performance was achieved in all of them. The servo, startup, and recovery problems took 3.65, 2.24 and 1.82 s to solve respectively for the $m = 1$ case, demonstrating average per-iteration solution times on the order of 10 ms. This validates the use of the exact NMPC as a benchmark for rapid and stable control of this system. Appendix C (cf. ESM) reports the parameters of this exact NMPC.
Fig.5 (a) WIAE for exact NMPC with $p = 5$; (b) exact NMPC solution times with $p = 5$.


Fig.6 Exact NMPC performance for $p = 5$ and $m = 1$: (a) servo problem, (b) startup problem, and (c) upset recovery problem.


While greater values of $m$ are expected to provide tighter control, albeit with longer solution times, $m = 1$ was shown to give lower WIAE values. However, this came with more aggressive control actions, as Fig. 7 shows. The larger the value of $m$, the smoother the control profiles, demonstrating that the changes in the control action constitute an important term in the NMPC objective function. Having selected the WIAE as our figure of merit for the sake of interpretability, we keep $m = 1$ as the tuned value, noting again that other statistics for tuning $m$ which explicitly consider the control actions could also be employed. The ML-NMPCs presented in Section 4.3 also take $m = 1$ as their tuned value.
Fig.7 Exact NMPC performance for $m = 1, 3, 5$ for the servo problem: (a) output profiles, and (b) control profiles.


4.2 HO

Figure 8 plots the validation and test $R^2$ scores, and the test MSE scores, for each candidate model. Figure 9 presents further insights into the BO procedure through optimization history plots and slice plots, taking SVR as an example. Appendix D (cf. ESM) reports the best hyperparameters $\hat{\eta}_i$ found through BO for each $f_i$.
Fig.8 (a) Validation and test $R^2$ scores for each candidate model; (b) test MSE score for each candidate model ("Poly" refers to the polynomial regression model).


Fig.9 (a) Optimization history for BO of SVR; (b) slice plot for an SVR hyperparameter (objective value is validation $R^2$).


The tuned polynomial model gave the highest test $R^2$ value and the lowest test MSE value, suggesting that it would return better estimates of $dC_A/dt$ and $dC_R/dt$ than the other candidate models for this case study. The tuned SVR model gave slightly poorer $R^2$ and MSE scores than the polynomial model, with the tree-based models, i.e., DT, ET, and GB, performing more poorly still. It is worth noting that different candidate models possess different structures that permit them to model some dynamical systems more efficiently than others. The validation and test results of each model relative to the others can therefore be expected to differ from one case study to another.
The optimization history in Fig. 9(a) illustrates how the BO procedure balances exploration of the hyperparameter space with exploitation of the best found values. The first few trials gave models with poor validation scores; as better hyperparameter configurations were encountered, the validation scores of subsequent models settled around similarly good values, suggesting that the sampling strategy began focusing on points close to the best-found hyperparameters. Figure 9(b) reinforces this observation by showing how later trials focused on the region around 0.02 for this hyperparameter, with sporadic evaluations at values outside that region.

4.3 Closed-loop control results

After HO, the tuned candidate models were integrated into ML-NMPC controllers and tested against the three control scenarios. Figure 10 shows the results for ML-NMPC. Figures 11, 12 and 13 show concentration and control paths from the polynomial-NMPC and SVR-NMPC for the servo, startup, and upset recovery problems, respectively. These paths are compared against those from the exact NMPC. Figure 14 shows GB-NMPC performance on all three control problems, with these results being representative of the performances of ML-NMPC employing tree-based system models. The results from the two stable ML-NMPCs, the polynomial-NMPC and SVR-NMPC, are finally benchmarked against the exact NMPC in Tables 1 and 2.
Fig.10 ML-NMPC performance for $p = 5$ and $m = 1$: (a) WIAE, and (b) solution times.


Fig.11 Exact, polynomial- and SVR-NMPC performance for the servo problem: (a) output profiles, and (b) control profiles.


Fig.12 Exact, polynomial- and SVR-NMPC performance for the startup problem: (a) output profiles, and (b) control profiles.


Fig.13 Exact, polynomial- and SVR-NMPC performance for the upset recovery problem: (a) output profiles, and (b) control profiles.


Fig.14 GB-NMPC performance for $p = 5$ and $m = 1$: (a) servo problem, (b) startup problem, and (c) upset recovery problem.


Tab.1 Comparison of WIAEs for polynomial- and SVR-NMPCs against the exact NMPC for $p = 5$ and $m = 1$

Item             Polynomial   SVR     Exact
Servo            0.209        0.200   0.094
Startup          1.090        1.031   0.870
Upset recovery   0.392        0.774   0.326
Tab.2 Comparison of solution times (in seconds) for polynomial- and SVR-NMPCs against the exact NMPC for $p = 5$ and $m = 1$

Item             Polynomial   SVR     Exact
Servo            11.9         69.0    3.50
Startup          6.63         72.2    2.33
Upset recovery   5.88         41.0    1.83

The polynomial- and SVR-NMPCs succeeded in achieving tight control in all three control problems, as Figs. 11–13 show. In the servo problem, $C_A$ and $C_R$ values took slightly longer to settle at their new set-points for the two ML-NMPCs because these controllers applied less aggressive $q$ control. This suggests that the quality of the continuous-time model identified from the data is sensitive to the choice of the experimental sampling time $h$ in the data generation procedure. While $h = 0.1$ tu was sufficient to generate data from which the polynomial and SVR models could learn the right "signs" of the time-derivatives characterizing the dynamical system, smaller $h$ values might offer even better first-order approximations of these time-derivatives, potentially allowing the ML-NMPCs to solve more accurately over smaller time-steps and achieve tighter control. The control paths in the regulator problems in Figs. 12 and 13 also reveal $q$ and $T$ control for the ML-NMPCs that lagged slightly behind that of the exact NMPC.
These two ML-NMPCs required an order of magnitude more time per iteration than the exact NMPC, though they remained fast, on the order of 0.1 s per iteration. The SVR-NMPC required more time than the polynomial-NMPC due to its greater model complexity, taking roughly 6 to 11 times longer (Table 2) while offering no consistent WIAE improvement over the polynomial-NMPC. Taking both control performance and evaluation times into account, the polynomial-NMPC would be preferable to the SVR-NMPC for this case study.
The tree-based ML-NMPCs failed to provide effective control in all three control problems, as Fig. 14 shows, with the exception of the ET-NMPC in upset recovery. Tree-based models might be unsuitable as system models for ML-NMPC in this study because their piece-wise constant nature [27] could prevent optimizers like SLSQP, which depend on local gradient or Hessian approximations, from functioning well. Understanding this observation for tree-based models lies beyond the scope of this work and can be pursued in a future study.
The results above validate the ability of the proposed integrative SysID approach to identify system models for NMPC that allow effective control of a highly nonlinear MIMO CSTR system across a wider operating range. The observations from this case study further reinforce the importance of selecting a suitable $h$ for the data generation procedure, highlight the usefulness of simpler models with shorter ML-NMPC solution times, and hint towards more advanced applications of ML that achieve better bias-variance tradeoffs within this framework. Examples illustrating the latter point include incorporating expert system knowledge to impose a prior structure on the candidate models, thereby reducing model bias; such models are known as grey-box models. Feature selection techniques such as principal component analysis can also be incorporated, along with other model regularization techniques, to reduce model variance.

5 Conclusions

In this work, an integrated SysID procedure for deriving black-box nonlinear continuous-time MIMO system models for NMPC was articulated. This procedure was validated through the successful identification of NMPC system models that enabled effective control of a highly nonlinear MIMO CSTR system across a wider operating range. This framework is sufficiently flexible and allows practitioners to employ different data generation methods, apply different HO methods, as well as test different candidate models. The choices presented here reflect the current state-of-the-art or what is reasonable, based on the authors’ experience.
It is hoped that these results can motivate further research in direct data-driven methodologies for SysID for MPC. Future directions include the following:
1. Applying more advanced ML techniques within this framework, such as feature selection and model regularization. Other candidate models that involve local regression or ensemble methods could also be studied in greater detail. This abundance of ML techniques could be exploited to achieve better bias-variance tradeoffs and yield better system models.
2. Studying the impact of measurement noise, the amount of data available, and the experimental and NMPC controller sampling times on this SysID procedure, to quantify its sensitivity to data quality and other design decisions.
3. Exploring how a discrete-time model could be derived from a well-learnt continuous-time model, which could serve as a fit-for-purpose NMPC system model requiring smaller solution times.

Acknowledgements

The authors thank the MOE AcRF Grant in Singapore for financial support of the projects on Precision Healthcare Development, Manufacturing and Supply Chain Optimization (Grant No. R-279-000-513-133) and Advanced Process Control and Machine Learning Methods for Enhanced Continuous Manufacturing of Pharmaceutical Products (Grant No. R-279-000-541-114).

Code Availability Statement

Access to the GitHub repository containing the source code for this project is available upon request.

Electronic Supplementary Material

Supplementary material is available in the online version of this article at https://dx.doi.org/10.1007/s11705-021-2058-6 and is accessible for authorized users.
References

1. Kaiser E, Kutz J N, Brunton S L. Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2018, 474(2219): 20180335
2. Sommeregger W, Sissolak B, Kandra K, von Stosch M, Mayer M, Striedner G. Quality by control: towards model predictive control of mammalian cell culture bioprocesses. Biotechnology Journal, 2017, 12(7): 1600546
3. Qin S J, Badgwell T A. A survey of industrial model predictive control technology. Control Engineering Practice, 2003, 11(7): 733–764
4. Öner M, Montes F C C, Ståhlberg T, Stocks S M, Bajtner J E, Sin G. Comprehensive evaluation of a data driven control strategy: experimental application to a pharmaceutical crystallization process. Chemical Engineering Research & Design, 2020, 163: 248–261
5. Al Seyab R K, Cao Y. Nonlinear system identification for predictive control using continuous time recurrent neural networks and automatic differentiation. Journal of Process Control, 2008, 18(6): 568–581
6. Ljung L. Perspectives on system identification. Annual Reviews in Control, 2010, 34(1): 1–12
7. Mokhatab S, Poe W A. Handbook of Natural Gas Transmission and Processing. 2nd ed. Boston: Gulf Professional Publishing, 2012, 473–509
8. Venkateswarlu C, Venkat Rao K. Dynamic recurrent radial basis function network model predictive control of unstable nonlinear processes. Chemical Engineering Science, 2005, 60(23): 6718–6732
9. Štampar S, Sokolič S, Karer G, Žnidaršič A, Škrjanc I. Theoretical and fuzzy modelling of a pharmaceutical batch reactor. Mathematical and Computer Modelling, 2011, 53(5–6): 637–645
10. Alanis A Y, Arana-Daniel N, López-Franco C. Artificial Neural Networks for Engineering Applications. Washington: Academic Press, 2019, 55–63
11. Pan Y, Wang J. Model predictive control of unknown nonlinear dynamical systems based on recurrent neural networks. IEEE Transactions on Industrial Electronics, 2012, 59(8): 3089–3101
12. Schoukens J, Ljung L. Nonlinear system identification: a user-oriented road map. IEEE Control Systems Magazine, 2019, 39(6): 28–99
13. Arefi M, Montazeri A, Poshtan J, Jahed-Motlagh M. Nonlinear model predictive control of chemical processes with a Wiener identification approach. In: 2006 IEEE International Conference on Industrial Technology. Mumbai: IEEE, 2006, 1735–1740
14. Wu Z, Tran A, Rincon D, Christofides P D. Machine learning-based predictive control of nonlinear processes. Part I: theory. AIChE Journal, 2019, 65(11): e16729
15. Wu Z, Tran A, Rincon D, Christofides P D. Machine-learning-based predictive control of nonlinear processes. Part II: computational implementation. AIChE Journal, 2019, 65(11): e16734
16. Garnier H. Direct continuous-time approaches to system identification. Overview and benefits for practical applications. European Journal of Control, 2015, 24: 50–62
17. Frazier P I. A tutorial on Bayesian optimization. arXiv:1807.02811 [stat.ML], 2018
18. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 2012, 13: 281–305
19. Berk J, Nguyen V, Gupta S, Rana S, Venkatesh S. Exploration enhanced expected improvement for Bayesian optimization. In: Berlingerio M, Bonchi F, Gärtner T, Hurley N, Ifrim G, eds. Machine Learning and Knowledge Discovery in Databases. Cham: Springer International Publishing, 2019, 621–637
20. Seborg D E, Mellichamp D A, Edgar T F, Doyle F J III. Process Dynamics and Control. 3rd ed. New York: John Wiley & Sons, 2010
21. Binder B J T, Johansen T A, Imsland L. Improved predictions from measured disturbances in linear model predictive control. Journal of Process Control, 2019, 75: 86–106
22. Wong W C, Chee E, Li J, Wang X. Recurrent neural network-based model predictive control for continuous pharmaceutical manufacturing. Mathematics, 2018, 6(11): 242
23. Koppel L B. Input multiplicities in nonlinear, multivariable control systems. AIChE Journal, 1982, 28(6): 935–945
24. Virtanen P, Gommers R, Oliphant T E, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 2020, 17(3): 261–272
25. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 2011, 12: 2825–2830
26. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York: Association for Computing Machinery, 2019, 2623–2631
27. Shi Y, Li J, Li Z. Gradient boosting with piece-wise linear regression trees. arXiv:1802.05640 [cs.LG], 2019
