1 Introduction
The rapid technological progress in areas such as the Internet of Things (IoT), Internet of Services (IoS), Internet of People (IoP), smart sensing, mobile communication, process modelling and simulation, advanced data processing, storage and analytics, artificial intelligence (AI), edge and cloud computing technologies, cybersecurity, advanced robotics, multiagent technologies, and virtual and augmented reality increases the complexity of current industrial processes, which transform into so-called cyber–physical systems (CPSs) by integrating information and communication technology (ICT) with physical process objects.
Terms such as Industry 4.0 or the Fourth Industrial Revolution embody the vision of such production systems, in which intelligent objects, such as machines, process units and robots, can measure and assess their own situation, communicate, make decisions, and dynamically adapt and reconfigure on the basis of local and global information (
Bendul and Blunck, 2019).
Thus, the production system becomes fully automated and eventually integrated into a supply chain in which human intervention is reduced to an indispensable minimum (
Castelo-Branco et al., 2019), and takes advantage of the advanced intelligence provided by the abovementioned technologies to achieve flexibility, reduced complexity, and modularity (
Zhong et al., 2017).
The main implementation principles toward Industry 4.0 are as follows (
Mohamed, 2018):
• Interoperability between CPS, enterprises, and humans connected by IoT and IoS;
• Virtualisation of the physical world, enabling the CPS to monitor physical processes;
• Real time capability to continuously analyse data and react to any changes in the environment;
• Decentralisation, which relates to giving autonomy, resources and responsibility to lower levels of the organisational hierarchy in the event of failures or complex situations;
• Service orientation, in which application components provide services to other components via a communication protocol;
• Security of information and its privacy.
Digitalisation creates a version of the industrial process in which all production operations, simulation and experimental verification are virtualised. The addition of the networking, monitoring and analysis, and decision-making elements of Industry 4.0 transforms the digital process into an intelligent system with a dynamic configuration. However, several challenges stand in the way of this transformation:
• For many processes, the existing infrastructure is not entirely ready to support the digital transformation to Industry 4.0, which aims at horizontal, vertical and end-to-end integration.
• Designing Industry 4.0 systems involves complexity due to the heterogeneous and high-dimensional nature of the elements that are part of the industrial CPS. This condition gives rise to increased uncertainty and risks, multiple feedback cycles and dynamics.
• Scalability, as the number of physical objects connected grows exponentially larger with the size of the system.
• The more elements are connected, the larger the amount of big data from a variety of heterogeneous sources that needs to be acquired, transferred, stored and analysed.
• At present, no common platform can accommodate the variety of communications technologies and applications that should be integrated and interoperable in the intelligent production systems’ network.
• Need for modularised and flexible physical objects that can be connected and work together for distributed decision-making.
• Need to develop global standards and data sharing protocols.
• Legal, data privacy and security issues need to be considered in developing sustainable business models.
The application of Industry 4.0 concepts requires the conversion of regular machines and processes into resilient, self-aware, self-learning and self-adapting systems to improve their overall performance and maintenance management (
Vaidya et al., 2018). Such a system does not currently exist, but many of the components required for its creation are already available.
The rest of this paper is organised as follows. Section 2 introduces the way digitalisation can improve production systems. Section 3 focuses on current process sensing and monitoring technologies. Section 4 presents the way in which the big data obtained can facilitate the development of smart systems with the utilisation of various computational approaches, such as machine learning (ML), to gain understanding of the behaviour of the CPS. Section 5 focuses on fault detection and prediction, and Section 6 introduces optimisation solutions that could enable adaptation and self-regulation of industrial systems. Section 7 focuses on the elements that can contribute to facilitating decision-making in the smart plants of the future: Ontology and multiagent systems (MASs). Section 8 discusses uncertainty, a key characteristic of real systems. Section 9 introduces a novel modular and decentralised framework, enabled by the development of digital twins, for autonomous systems. Section 10 concludes the work.
2 Digitalisation of production systems
The term digitalisation is regarded as a step towards enabling, improving and transforming models, functions, processes and operations by leveraging a multitude of technological advancements, and is perceived as a combination of the following (
Kan et al., 2018;
Bendul and Blunck, 2019;
Gürdür et al., 2019):
• Integration of a multitude of wireless sensors, computing units and machines in a large-scale network that enables “things” to communicate and exchange data;
• Increased amount of heterogeneous data, computational power, and connectivity, including big data, open data and cloud technologies;
• Developments from the field of analytics and AI, such as automation of knowledge based on advanced analytics;
• Convergence between the real and virtual worlds through information and communication technologies;
• Improved human–machine interaction and integration.
The integration between the real and the virtual worlds leads to complexity emerging from the interactions between cyber systems and the uncertain dynamic behaviour of physical systems. At the same time, the components of the CPS usually share limited resources, creating a resource-constrained environment in which complex interactions can lead to serious disruptions that undermine the system’s utility (
Nayak et al., 2016).
The modelling of a CPS requires a multidisciplinary approach that should focus on the separate physical and cyber components, and on their integration and interaction (
Hehenberger et al., 2016). When modelling a CPS, the following characteristics should be considered (
Seiger et al., 2015):
• The CPSs are highly dynamic with respect to the number and availability of their components, devices and services.
• Numerous heterogeneous devices and services are integrated into a so-called system-of-systems.
• Processes can be extremely complex and contain a large number of steps, requiring hierarchical structuring and aggregating.
• Numerous processes can coexist in a CPS and their execution times and cycles can vary considerably.
• Changes in conditions require the generation of various model alternatives in line with the new requirements.
• The execution of the model should be performed in a distributed manner to account for the structure of the CPS.
To handle such complex systems, the Reference Architecture Model Industry 4.0 was developed in Germany (
Willner and Gowtham, 2020), a meta-model that describes the aspects that play an important role in the Industry 4.0 production systems.
Humans are an indispensable component of the smart plants of the future, requiring user interfaces that bridge the human and the components of the CPS (
Dafflon et al., 2021), with the new system defined as a human–cyber–physical system (HCPS). HCPS applications are common in most areas of major infrastructure development, such as smart grids, smart cities, smart transportation, smart education, smart healthcare and medicine, and national defence (
Liu and Wang, 2020). Although not an entirely new concept, their application in the area of smart plants becomes essential due to the continuous evolution and integration of the cyber and physical components in manufacturing.
Intelligent manufacturing based on HCPS requires the human component to have a greater role in the formation of human–machine symbiosis, which brings diverse challenges in the form of (
Zhou et al., 2019):
• The need to develop effective approaches for division of work and cooperation between humans and intelligent machines, which fully utilise the human and machine intelligence;
• Achievement of human–machine hybrid-augmented intelligence;
• Requirements to introduce safety, privacy, ethical and other issues in AI and intelligent manufacturing.
3 Process sensing and monitoring
Efficient sensing and monitoring of the various elements of industrial process systems is key to achieving the goal of a self-adaptive and self-repairing system. The mobile and wireless communication revolution is entering a new phase with the deployment of fifth generation (5G) mobile communication, which aims to deliver a truly wireless world, free from the present limitations of 4G systems. The multitude of physical devices connected through the IoT requires mobile data capacities a thousand times higher, user data rates greater than 1 Gbps, 10 to 100 times more connected devices, longer battery life, and a fivefold reduction in latency (
Kumar and Gupta, 2018).
The declining cost and maturity and adoption of wireless standards, such as Bluetooth, ZigBee, WiFi and radio frequency identification, enable global interoperability between devices and device manufacturers and further stimulate the deployment of ubiquitous, pervasive and wireless applications, including wireless sensor networks (WSNs) (
Steinberg and Steinberg, 2009).
WSNs consist of hundreds of sensor nodes that may be deployed in relatively harsh and complex environments for remote monitoring, control and surveillance purposes (
Alsheikh et al., 2014;
Zhang et al., 2018). The sensors are capable of measuring one or more desired physical quantities. A typical WSN usually consists of a base station (or sink node) for data collection, processing and connection to the external environment. Modern wireless sensor nodes usually have microprocessors for local data processing, networking and control purposes (
Moustapha and Selmic, 2008).
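A minimal Python sketch of this node-and-sink structure is given below; the node and sink classes, the simulated temperature readings and the alarm threshold are illustrative assumptions rather than elements of any deployment discussed in this review. Real nodes would run an embedded runtime over a wireless stack, but the division of labour (local preprocessing on the node, aggregation at the sink) is the same.

```python
import random
import statistics

class SensorNode:
    """Hypothetical wireless node: measures a quantity and does local preprocessing."""
    def __init__(self, node_id, threshold=80.0):
        self.node_id = node_id
        self.threshold = threshold
        self.buffer = []

    def sample(self):
        reading = random.gauss(70.0, 10.0)   # simulated temperature reading
        self.buffer.append(reading)
        return reading

    def report(self):
        # local processing on the node's microprocessor: aggregate and flag anomalies
        mean = statistics.mean(self.buffer)
        alarm = any(r > self.threshold for r in self.buffer)
        self.buffer.clear()
        return {"node": self.node_id, "mean": mean, "alarm": alarm}

class SinkNode:
    """Base station collecting reports from all nodes for further processing."""
    def __init__(self):
        self.log = []

    def collect(self, report):
        self.log.append(report)
        if report["alarm"]:
            print(f"Node {report['node']}: threshold exceeded, mean={report['mean']:.1f}")

nodes = [SensorNode(i) for i in range(5)]
sink = SinkNode()
for _ in range(10):                      # ten sampling cycles
    for node in nodes:
        node.sample()
for node in nodes:
    sink.collect(node.report())
```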
In complex distributed sensory systems, WSNs often operate in potentially hostile and harsh environments, and most of the applications are mission critical (
Moustapha and Selmic, 2008;
Lv et al., 2016). In chemical process engineering applications, WSNs are deployed in industrial plants to monitor or sense various aspects of the ambient or mechanical environment. The states of, and information about, machines, moving objects and chemical reagents can be captured by the WSNs.
In the specific application of chemical process fault detection, accidents caused by system performance degradation and external disturbances result in huge property losses and casualties; intelligent techniques are therefore required to detect and identify faults in large and complicated modern industrial systems (
Lv et al., 2016).
4 Big data and machine learning (ML)
The integration between the cyber and physical worlds facilitated by the Industry 4.0 technologies enables the creation and collection of huge amounts of data from different points in the manufacturing system. These data will have to be stored and fused into online/cloud-based databases to be used for tasks, such as predictive maintenance or operation and business decisions. Inferential sensors predict important variables often difficult or uneconomical to measure online by using available process data (
Chiang et al., 2017).
Under these circumstances, big data refers to large amounts of multisource, heterogeneous data generated through the product lifecycle, which are characterised by the five Vs (
Tao et al., 2018):
a) Volume (i.e., huge quantities of data);
b) Variety (i.e., the data comes in different forms and is generated by diverse sources);
c) Velocity (i.e., the data is generated and renewed at extremely high speed);
d) Veracity (i.e., the data is associated with a level of bias, inconsistency, incompleteness, ambiguities, latency, noise, and approximation);
e) Value (i.e., huge value hidden in the data).
The utility of the data does not hinge solely on the sheer volume of information available but rather on the knowledge that lies hidden in it. A systematic computational analysis of data collected from the chemical or biochemical system can enable more informed decisions, which will enhance the efficiency of the process.
The process of mining data streams acquired from the various heterogeneous monitoring and sensing devices embedded in the physical components plays an essential role in the functionality of the CPS because it enables extraction of insight and knowledge, provides learning and predictive capabilities for decision support and autonomous behaviour, enables feedback from physical and human layers to the cyber counterpart, and facilitates the integration of the three layers (
Fei et al., 2019).
To be of value, data must be available for analysis in a sufficient volume and velocity, covering a sufficiently broad variety of relevant factors, and be trustworthy (
Udugama et al., 2020).
Appropriate techniques are needed for the collection, transmission, storage and processing of all the data and their record keeping in cloud-based portals. Innovative and effective analytic techniques, such as those based on ML and AI, are required to operate continuously and in real time on the data streams and other data sources (
Fei et al., 2019).
An important critique of the big data era is that often “manufactured” patterns and correlations can provide false knowledge, especially in situations when the big data analytics is applied without context and domain knowledge. Given that chemical and biochemical processes are governed by first principles, fundamental modelling approaches must be combined with ML approaches to develop accurate dynamic and nonlinear models.
However, this integration to create enterprise-scale solutions remains an important technical challenge in the area of chemical and biochemical process systems engineering (
Chiang et al., 2017).
The computing architecture has an important influence on the data-driven algorithms, and the integration between solutions, such as edge, cloud and fog computing (
Xu et al., 2020), into decision-making frameworks based on hybrid mechanistic and data-driven models is another benefit that Industry 4.0 brings into the design and operation of smart plants. An important difference between the fog and cloud approaches rests in the number of available resources (
Fei et al., 2019):
• Although the cloud is considered to have virtually unlimited storage and processing capabilities, in the fog such resources are restricted, and their optimal management is crucial;
• Interhost communication in the cloud is fast due to high-speed networks, whereas wireless communication and varying network types lead to delays in the fog. Delays can also be observed in the cloud during the access to remote devices.
ML approaches offer a multitude of solutions for learning underlying patterns from big sets of data and making insightful predictions for difficult tasks in complex scenarios, such as the operation of CPS (
Ruan et al., 2022).
In the following, the focus will be on deep learning techniques or hybrid learning approaches for fault detection and prediction tasks.
Convolutional neural networks exploit translational invariance within their structure by using receptive fields and learning via weight sharing to extract features and usually include two elements (
Fei et al., 2019):
a) The feature extractor, composed of multiple similar stages and layers, which automatically learns features from raw data;
b) The trainable, fully connected multilayer perceptron or other classifiers (e.g., support vector machines (SVMs)), which performs classification based on the learned features from the feature extractor.
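A minimal PyTorch sketch of this two-element structure is shown below. The layer sizes, the single-channel 28 x 28 input and the ten output classes are illustrative assumptions, not a configuration taken from the cited works.

```python
import torch
import torch.nn as nn

class ConvClassifier(nn.Module):
    """Two-element CNN: (a) convolutional feature extractor, (b) fully connected classifier."""
    def __init__(self, n_classes=10):
        super().__init__()
        # (a) feature extractor: stacked convolution + nonlinearity + pooling stages
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # (b) trainable fully connected classifier acting on the learned features
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ConvClassifier()
dummy = torch.randn(8, 1, 28, 28)        # batch of 8 single-channel 28x28 inputs
print(model(dummy).shape)                # torch.Size([8, 10])
```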
A hybrid learning structure can tackle more sophisticated data processing models, whereas deep neural networks (DNNs) usually attempt to find direct solutions from raw data. DNNs usually consist of more than two hidden layers, and some have extremely compact and optimised architectures of interlayer connections.
This type of neural network is often able to process and discover hidden information in extremely large volumes of data for very specific application-driven tasks that are usually unattainable with conventional ML methods. DNNs also circumvent complex data preprocessing procedures, which usually need to be carried out manually by experts.
However, the computational demand is completely different when dealing with massive WSNs that collect data samples from entire industrial plants at least once per second, continuously, for months or even years. This condition produces billions or even trillions of data points in the long run. The volume of data becomes so large that conventional ML techniques either cannot be applied or suffer serious accuracy degradation when attempting to find direct solutions for extremely challenging applications, such as high-accuracy fault detection and highly automated sensor self-management.
With researchers creating new deep learning algorithms and industries producing and collecting unprecedented amounts of data, computational capability (i.e., computing speed and memory) is the key to unlocking insights from data and improving learning efficiency, and efficiency brings direct profits to modern industries.
On the software side, the best that can be done is to optimise algorithms and code to minimise the computational cost as much as possible. However, this approach does not alleviate the computational demand of deep learning due to its extremely large data processing throughput.
An alternative is to connect a local computer to a cloud server. However, such approaches inevitably face limitations, such as per-user availability, and risks such as Internet disruptions, server maintenance, and data recovery difficulties if the server encounters security issues.
Therefore, a better solution is to improve the local hardware (i.e., employ high-performance graphics processing units) and incorporate advanced application programming interfaces and platforms (e.g., Caffe, Theano, Google TensorFlow or Microsoft Azure), a combination that has been shown to create tremendous value and is currently one of the most effective means of greatly improving computational capacity.
Fig.1 shows the simplest DNN, with only two hidden layers, for data classification. The hyperparameters, including the number of layers, the number of neurons in each layer, the choice of activation functions and the optimisation settings, usually need to be tuned properly to deliver the best attainable performance for a specific task.
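The following minimal PyTorch sketch corresponds to such a two-hidden-layer network; the input dimension, hidden widths, learning rate and activation function stand in for the hyperparameters mentioned above and are illustrative assumptions only.

```python
import torch
import torch.nn as nn

def build_dnn(n_inputs, n_outputs, hidden=(64, 32), activation=nn.ReLU):
    """Two-hidden-layer feed-forward network; hidden sizes and activation are
    hyperparameters that would normally be tuned for the task at hand."""
    layers, prev = [], n_inputs
    for width in hidden:
        layers += [nn.Linear(prev, width), activation()]
        prev = width
    layers.append(nn.Linear(prev, n_outputs))
    return nn.Sequential(*layers)

model = build_dnn(n_inputs=20, n_outputs=3)          # e.g., 20 sensor channels, 3 classes
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)   # learning rate: another hyperparameter
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 20)                   # dummy batch of sensor feature vectors
y = torch.randint(0, 3, (16,))            # dummy class labels
loss = loss_fn(model(x), y)
loss.backward()
optimiser.step()
print(float(loss))
```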
5 Fault detection and fault prediction
As mentioned in Section 4, various ML-based methods are available to extract information from data. Fault detection and fault prediction, even for future events within the processing horizon of a production plant or facility, play a key role in modern industry: they are essential for safe operation and the avoidance of industrial accidents, for compliance with environmental requirements and legislation, and for continuous cost minimisation and profit maximisation (process profitability) in real-time operations rather than only in long-term planning. This requires a truly “intelligent” monitoring and optimising control system.
A summary and comparison of using different ML techniques and WSNs for fault detection and prediction is shown in Tab.1.
Considerably fewer fault data are readily available compared with the amount of normal data. A dataset is imbalanced if the classes are not approximately equally represented, and for the majority of industrial processes the fault data cover only a small portion of the possible abnormal process conditions. This condition prevents comprehensive and generalised knowledge of the fault types from being provided to, and used by, most fault detection algorithms.
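As a simple illustration of how such imbalance can be mitigated before training, the sketch below (with hypothetical, synthetic data) applies random oversampling of the minority fault class and, alternatively, computes inverse-frequency class weights; more elaborate schemes such as SMOTE or cost-sensitive learning follow the same logic.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical dataset: 1000 normal samples, only 30 fault samples
X_normal = rng.normal(0.0, 1.0, size=(1000, 5))
X_fault = rng.normal(2.0, 1.0, size=(30, 5))
X = np.vstack([X_normal, X_fault])
y = np.hstack([np.zeros(1000, dtype=int), np.ones(30, dtype=int)])

# simple random oversampling: repeat minority-class samples until classes are balanced
fault_idx = np.where(y == 1)[0]
extra = rng.choice(fault_idx, size=len(y) - 2 * len(fault_idx), replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.hstack([y, y[extra]])
print(np.bincount(y), "->", np.bincount(y_bal))

# alternative: keep the data as-is and weight the classes inversely to their frequency
class_weight = {c: len(y) / (2 * np.sum(y == c)) for c in (0, 1)}
print(class_weight)   # many classifiers (e.g., scikit-learn's SVC) accept such weights
```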
The techniques employed for process monitoring and fault detection are primarily of three types (
Abid et al., 2021;
Arunthavanathan et al., 2021):
• Data-driven techniques, such as statistical model-based (Baklouti et al., 2018; Wang et al., 2018) or AI-based (Jiao et al., 2020; Said et al., 2020) techniques. These tend to dominate the domain; they rely on huge sets of historical process data, often require reduced insight into the system, and can detect data integrity issues due to sensor or process noise (Luo et al., 2021).
• Prior knowledge (or model-) based strategies, which involve the construction of a mathematical representation of a system’s functionality. Mechanistic models are the most detailed, but they may be extremely complex for fault detection applications; in this case, empirical methods can be used to describe the system or parts of it. Prior knowledge approaches show better generalisation capabilities than data-based approaches.
• Hybrid approaches, which combine data- and knowledge-based approaches to overcome the lack of data and increase the accuracy of the detection process.
Incorporating process-specific information enables a more effective use of the data and results in outcomes complying with the operation principles of the units and the fundamental laws of nature (
Reis et al., 2019).
In particular, conventional classification algorithms tend to strongly favour the majority class and detect the minority class at extremely low rates when the class sizes are highly imbalanced (
Kwak et al., 2015). Moustapha and Selmic (
2008) employ a simple multitap delayed recurrent neural network model to perform sensor identification and fault detection based on a dynamic WSN model and compare it with the popular Kalman filter method to show its effectiveness.
A brief overview of using ML techniques for fault detection in the applications of computer system security is introduced in Kaur et al. (
2013). More detailed overviews on the techniques, nature of data, types of anomalies, detection learning modes, window models, and dataset and evaluation metrics to evaluate the performance of the proposed techniques are also available (
Al-Amri et al., 2021;
Nassif et al., 2021).
As one of standard conventional ML classification algorithms, SVM has been adopted with variations by various authors (
Rajasegarar et al., 2010;
Rashid et al., 2014;
Kwak et al., 2015;
Martins et al., 2015;
Ayadi et al., 2017;
Zidi et al., 2018) to perform fault detection tasks in WSN-assembled systems, owing to its superiority in handling moderate-sized, high-dimensional data. With wide feature ranges in the data, the faults can be successfully separated by hyperplanes using either linear or nonlinear kernel functions, assuming the normal and faulty data are distributed in a balanced manner.
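A minimal scikit-learn sketch of this kind of kernel-based fault classifier is given below; the synthetic data, the RBF kernel and the chosen hyperparameters are illustrative assumptions and do not reproduce the setups of the cited studies.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
# synthetic WSN readings: class 0 = normal operation, class 1 = faulty behaviour
X = np.vstack([rng.normal(0.0, 1.0, (500, 8)), rng.normal(1.5, 2.0, (60, 8))])
y = np.hstack([np.zeros(500, dtype=int), np.ones(60, dtype=int)])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# standardise features, then fit an RBF-kernel SVM; class_weight compensates the imbalance
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", C=10.0, gamma="scale", class_weight="balanced"))
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), target_names=["normal", "fault"]))
```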
K-nearest neighbour (KNN) and Gaussian mixture model (GMM) are proposed (
Rajasegarar et al., 2010;
Rashid et al., 2014;
Ayadi et al., 2017) for comparison with the SVM algorithm, which give the best results regarding sensitivity, specificity and accuracy in pipeline leakage detection. Yan et al. (
2016) propose a mixed software and hardware assignment clustering scheme to detect unknown types of faults based on
K-means unsupervised learning and GMMs. The proposed scheme achieves 75% detection accuracy on the new faults.
However, only detecting faults is inadequate in practical scenarios of real applications. By definition, fault detection only occurs after a fault has actually happened, which in some cases is of little use because damage and losses have already occurred and cannot be reverted. To this end, fault prediction and predictive maintenance have become important areas of research. Thus, the WSNs need to be paired with intelligent fault prediction algorithms that go beyond the simple task of fault detection alone.
A recursive least squares (RLS) framework combined with a time-series fault prognosis method, based on variable gradients and forgetting factors applied to the data evolution, is developed in Lu et al. (
2018) for mechanical systems. Wang et al. (
2016) propose a three-level framework backpropagation (TLBP) mechanism that shows satisfactory results in petrochemical industrial leakage point prediction. However, these approaches do not utilise the high-throughput and low-latency advantages of 5G communication-based WSNs.
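To illustrate the kind of building block such prognosis schemes rely on, the sketch below implements a basic recursive least squares update with a forgetting factor for a one-step-ahead autoregressive predictor on a simulated drifting signal; the cited frameworks are considerably more elaborate.

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=0.98):
    """One recursive least squares step with forgetting factor lam."""
    phi = phi.reshape(-1, 1)
    k = P @ phi / (lam + phi.T @ P @ phi)          # gain vector
    err = y - (phi.T @ theta).item()                # prediction error
    theta = theta + k * err                         # parameter update
    P = (P - k @ phi.T @ P) / lam                   # covariance update
    return theta, P, err

rng = np.random.default_rng(2)
# simulated drifting signal (e.g., a slowly degrading health indicator)
t = np.arange(500)
signal = 1.0 - 0.001 * t + 0.02 * rng.standard_normal(500)

order = 3                                 # AR(3) one-step-ahead predictor
theta = np.zeros((order, 1))
P = np.eye(order) * 1e3
for i in range(order, len(signal)):
    phi = signal[i - order:i][::-1]       # most recent samples first
    theta, P, err = rls_update(theta, P, phi, signal[i])
print("final AR coefficients:", theta.ravel())
```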
Although conventional ML techniques are well exploited in the area of fault detection, deep learning network (DLN), as an emerging research area in ML, has drawn increasing attention in various areas of multidisciplinary research, including computer vision, natural language processing, speech processing, event predictions, market price forecasting and biomedical applications.
DLN-based approaches enable time-series multistep prediction and can deal with cumulative errors on different data patterns (
Lv et al., 2016;
Liu et al., 2017). In Ruan et al. (
2022), an effective end-to-end DLN with its own novel learning algorithm based on recursive gradient descent is developed. This DLN shows superior performance compared with other state-of-the-art time-series fault prediction solutions.
6 Optimisation of maintenance scheduling
As mentioned in the previous sections, the addition of Industry 4.0 elements to a processing system increases its complexity and requires integration and multitasking. Thus, the system involves numerous interactions and (inter)dependencies between its individual components, in addition to operating in the highly dynamic conditions characteristic of any industrial environment.
Therefore, the operation of processes whose performance decays over time gives rise to challenging modelling and optimisation issues. As the performance degrades, process shutdowns must be planned to restore it, for example for unit cleaning in reverse osmosis networks (Saif et al., 2019) and heat exchanger networks (Al Ismaili et al., 2018), or for catalyst changeovers in catalytic processes (Adloor and Vassiliadis, 2020).
Parallel processing lines are used in manufacturing to improve the flexibility of the system and to avoid shutdown. In this case, one unit is shut down for cleaning purposes, and the remaining units continue to meet the production targets. Although this maintenance action ameliorates the yield, negative effects, such as loss of production time or increase in energy and labour costs to restore performance, are often encountered (
Adloor and Vassiliadis, 2020).
This condition gives rise to a trade-off that must be addressed for each unit in the system: Frequent cleaning actions maintain high production rates but incur large maintenance costs and production losses during downtime. The trade-off can be optimally managed by developing maintenance schedules that specify which units should be used and the optimal usage time of each unit in the parallel production set-up over a fixed time horizon. The schedule may also be required to fulfil a constraint that no two units can undergo a cleaning action at the same time due to production requirements or labour or equipment availability (
Al Ismaili et al., 2018).
Identifying the optimal operating conditions and ensuring that the resulting maintenance schedule and the process operation are tailored to each other are necessary to produce an adequate product inventory, effectively meeting varying demand across the time horizon whilst avoiding high storage costs. Executing all these decisions in an integrated, optimal manner can greatly minimise the negative effects of the decaying-performance process and maximise the profit (
Adloor and Vassiliadis, 2020).
These medium-term control actions aim at: a) restoring or maintaining productivity levels in dynamic processes with decaying performance, and b) preventive maintenance actions to avoid production breakdowns, which includes ensuring the safe operation of production processes.
The three main performance measures used to characterise equipment from the maintenance perspective are the so-called RAM parameters (
Fumagalli et al., 2017): a) Reliability, the quantification of how long equipment can operate without failure; b) Availability, the proportion of time during which the equipment is able to operate; and c) Maintainability, the ease and rapidity with which a system or equipment can be restored to operational status following a failure.
Maintenance is mainly divided into two categories (
Mazidi et al., 2018): Corrective, denoting remedial actions performed to restore operation back to its previous operating state, and Preventive, referring to actions carried out to maintain operability of an asset at an acceptable level. Interventions can be performed when needed (event-controlled actions) or at regular intervals (time-controlled actions) (
Kong and Frangopol, 2003).
However, the maintenance actions should be conducted proactively to reduce the cost and maintain the operation at the highest possible level. This process requires the transformation of the maintenance strategy from the traditional, fail-and-fix practices to predict-and-prevent methodologies (
Aivaliotis et al., 2019). The goal of predictive maintenance is to reduce the downtime and cost of maintenance under the premise of zero failure manufacturing by monitoring the working condition of equipment and predicting when the failure might occur (
Li et al., 2017).
Predictive maintenance allows the early detection of failures due to the predictive tools based on historical data (e.g., ML techniques), integrity factors (e.g., visual aspects, wear, coloration different from original, etc.), statistical inference methods, and engineering approaches (
Carvalho et al., 2019). Thus, predictive maintenance applications are a major group considerably dependent on big data analytics (
Yan et al., 2017;
Sahal et al., 2020).
Two approaches are commonly employed when dealing with the maintenance scheduling (
Santamaria and Macchietto, 2018):
• Optimal scheduling problem. In this case, binary decision variables are associated with the operating states of the units (cleaning/operating) and with the timing and sequencing of the tasks. The resulting problem is combinatorial in nature and is typically addressed by using (pseudo-)steady-state models (a minimal sketch of this type of formulation follows the list).
• Dynamic optimal scheduling problem. In this case, the problem involves solution of differential and algebraic equations (DAEs). The result is a (mixed-integer) nonlinear programming problem, which offers the flexibility of accommodating various types of models (
Assis et al., 2015).
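A minimal sketch of the first, combinatorial formulation is shown below using the open-source PuLP modelling library and entirely hypothetical data: binary variables assign one cleaning slot to each unit within an allowed window, no two units are cleaned in the same period, and the value of lost production is minimised. Realistic formulations additionally couple these decisions to performance-decay and inventory models.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD

units = ["U1", "U2", "U3"]
periods = range(8)                                   # discrete scheduling horizon
window = {"U1": range(0, 4), "U2": range(2, 6), "U3": range(4, 8)}  # allowed cleaning slots
lost_prod = {"U1": 5.0, "U2": 3.0, "U3": 4.0}        # production lost per cleaning period
price = [1.0, 1.2, 1.5, 1.5, 1.3, 1.0, 0.8, 0.8]     # relative value of production per period

prob = LpProblem("cleaning_schedule", LpMinimize)
clean = {(u, t): LpVariable(f"clean_{u}_{t}", cat=LpBinary)
         for u in units for t in periods}

# objective: minimise the value of production lost during cleaning
prob += lpSum(lost_prod[u] * price[t] * clean[u, t] for u in units for t in periods)

# each unit is cleaned exactly once, inside its allowed window
for u in units:
    prob += lpSum(clean[u, t] for t in window[u]) == 1
    prob += lpSum(clean[u, t] for t in periods if t not in window[u]) == 0

# labour/equipment constraint: at most one unit can be cleaned in any period
for t in periods:
    prob += lpSum(clean[u, t] for u in units) <= 1

prob.solve(PULP_CBC_CMD(msg=False))
for u in units:
    slot = next(t for t in periods if clean[u, t].value() > 0.5)
    print(f"{u}: clean in period {slot}")
```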
A maintenance optimisation model is a mathematical model in which the costs and benefits of maintenance are quantified, and in which an optimum balance between them is obtained whilst considering all types of constraints (
Fumagalli et al., 2017). The maintenance scheduling can be classified as: a) cost-based approaches, where the objective function is the minimisation of the maintenance costs; b) availability-based approaches, where the objective function is the minimisation of downtimes (maximisation of availability); and c) reliability-based approaches, where the objective function is the maximisation of the reliability of the system or the minimisation of maintenance costs whilst respecting constraints regarding the system reliability (
Fumagalli et al., 2017).
The optimal system maintenance policy may (
Sharma et al., 2011): a) minimise system maintenance cost rate; b) maximise the system reliability measure; c) minimise the system maintenance cost rate whilst the system reliability requirements are satisfied; and d) maximise the system reliability measures when the requirements for the system maintenance cost are satisfied.
Although many techniques can be used to schedule such operations (
Lohmer and Lasch, 2021), the ones often used in the industry are based on “greedy approaches”, which have extremely short-term economic horizons (
Khalaf et al., 2010;
Hosseini et al., 2020;
Baykasoğlu and Madenoglu, 2021;
Fadlallah et al., 2021;
Zhou et al., 2021;
Hong et al., 2022). Maintenance is almost invariably disruptive to the production process, with effects ranging from reduced-capacity operation during maintenance to complete shutdown.
Other processes may have to be overloaded to compensate and maintain production levels during maintenance, incurring significantly higher operational costs and a larger long-term economic impact if greedy approaches are used for the scheduling. Improved coordination of the operation of industrial sites can be a source of enormous savings in energy and resources. This motivates the development of methods and software tools for efficiency monitoring, coordinated process control, and optimal planning and production scheduling of factories, industrial plants and parks under dynamically changing market conditions (
Krämer and Engell, 2018).
An effective approach to solving the scheduling problem is to reformulate it as a multistage optimisation (optimal control) model, cast in a form that promotes bang-bang type solutions for the control variables associated with the restorative action periods. The bang-bang behaviour is entirely equivalent to having a Boolean variable (integer, binary) within an otherwise smoothly represented dynamic optimisation model.
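The sketch below illustrates the idea on a single hypothetical unit whose performance decays during operation and is restored while cleaning: the binary clean/operate decision is relaxed to a continuous control in [0, 1], and a penalty term u(1 - u) with increasing weight pushes the relaxed solution towards bang-bang values. It is a toy discretised version of the concept, not the published formulations.

```python
import numpy as np
from scipy.optimize import minimize

N, dt = 40, 1.0                 # number of periods, length of each period
k_decay, k_clean = 0.08, 0.6    # hypothetical decay and restoration rates

def simulate(u):
    """Performance x decays while operating (u=0) and recovers while cleaning (u=1)."""
    x, prod = 1.0, 0.0
    for uk in u:
        prod += (1.0 - uk) * x * dt                     # production only while operating
        x += dt * (-k_decay * x * (1 - uk) + k_clean * (1 - x) * uk)
        x = min(max(x, 0.0), 1.0)
    return prod

def objective(u, rho):
    # negative production + penalty that pushes the relaxed controls towards 0/1
    return -simulate(u) + rho * np.sum(u * (1.0 - u))

u0 = np.full(N, 0.5)
bounds = [(0.0, 1.0)] * N
res = u0
for rho in [0.0, 0.1, 1.0, 10.0]:                       # gradually increase the penalty
    res = minimize(objective, res, args=(rho,), bounds=bounds, method="L-BFGS-B").x
schedule = (res > 0.5).astype(int)
print("cleaning periods:", np.where(schedule == 1)[0])
print("production:", simulate(schedule))
```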
This approach has been successfully applied in solving maintenance scheduling problems for heat exchanger networks (
Al Ismaili et al., 2018) and catalytic reactor networks (
Adloor and Vassiliadis, 2020). It has enabled reliable and realistic inclusion of process uncertainty in the resulting models (
Al Ismaili et al., 2019;
Adloor and Vassiliadis, 2021).
The accuracy of the model used is of paramount importance. Maintenance of this type requires extremely well-defined mechanistic models to predict the evolution of the underlying processes with high accuracy. In their absence, big data are used via ML techniques to substitute for the missing mechanistic description of highly complicated processes.
Rigorous, mechanistic models capture the full representation of the physical phenomena occurring inside the system, but at the same time can be computationally expensive for large-scale scheduling problems. However, inadequately describing the physics of the process may affect the validity of the maintenance schedules obtained, leading to results that are useless for practical applications (
van Horenbeek et al., 2010).
7 Ontology-based multiagent system (MAS)
An ontology serves as a library of knowledge components to efficiently build intelligent systems and as a shared vocabulary for communication between interacting human and/or software agents. It can be defined as a formal representation of a set of concepts within a domain and the relationships between these concepts. Every field creates ontologies to limit complexity and organise information into data and knowledge.
Ontology engineering is a field that studies the methods and methodologies for building ontologies. An ontology language is a formal language used to encode the ontology, and the Web Ontology Language (OWL) is the most commonly used ontology language. OWL was originally intended for better information exchange between Internet agents (
McGuiness and van Harmelen, 2004).
One of the strengths of using an ontology is that hidden relations between things can be inferred by logic reasoners or inference engines. Thus, an ontology is useful for generating new, previously hidden conclusions from existing data due to its mathematical logic foundation. The information obtained from ontologies can be validated because of its structure and semantics.
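The following library-free Python sketch illustrates this inference of hidden relations on a toy equipment hierarchy: only direct subclass assertions are stated, and a simple transitive reasoner derives class memberships that were never asserted explicitly. Production systems would encode the same knowledge in OWL and use a description-logic reasoner instead.

```python
# Explicitly asserted knowledge: (subclass -> superclass) pairs and instance assertions.
subclass_of = {
    "CentrifugalPump": "Pump",
    "Pump": "RotatingEquipment",
    "RotatingEquipment": "Equipment",
    "HeatExchanger": "Equipment",
}
instance_of = {"P-101": "CentrifugalPump", "E-201": "HeatExchanger"}

def superclasses(cls):
    """Infer all (direct and indirect) superclasses by transitive closure."""
    result = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        result.append(cls)
    return result

def is_a(individual, cls):
    """Hidden relation: an individual belongs to every superclass of its asserted class."""
    asserted = instance_of[individual]
    return cls == asserted or cls in superclasses(asserted)

# The fact that P-101 is a piece of RotatingEquipment was never stated explicitly,
# but it follows logically from the asserted hierarchy.
print(is_a("P-101", "RotatingEquipment"))   # True
print(is_a("E-201", "RotatingEquipment"))   # False
```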
Applying domain knowledge in tasks, such as process representation and modelling using an expert system, facilitates the development of a conceptual hierarchy supporting system integration and interoperability of its components in an easily interpretable manner (
Wan et al., 2021).
In the Industry 4.0 concept (
Gilchrist, 2016;
Lu, 2017), a huge amount of data and information regarding different aspects of the potential process member need to be shared among the system’s components. This information should be shared and exchanged autonomously amongst the entities. Ontology technology has received great attention in the past decade as an advanced tool to tackle these challenges (
Batres, 2017;
Ekaputra et al., 2017).
Several ontologies have been developed in the past, paving the way towards emergent ones in the field of industrial process engineering. The ISO 15926 ontology is one example. Its objective is to enable long-term data integration, access and exchange. ISO 15926 supports the evolution of data through time. It belongs to the category of ontologies that define basic classes and relations from which subclasses and relations can be defined (
Batres et al., 2007).
OntoCAPE (
Morbach et al., 2009) is perhaps the most widely used ontology in process systems engineering. This modular ontology is structured into layers so that the general classes and relations are separated from those related to specific domains or applications. The meta-layer describes the OntoCAPE design concepts and explains how to extend the ontology. The upper layer represents general theoretical knowledge about the process. Subsequently, the conceptual layer (the domain layer) covers the engineering classes and their relations for entities, such as unit operations, equipment, materials, physical properties and mathematical models. The application layer extends the ontology to more specific classes, such as specific process units, including chemical reactors.
PetroHAZOP is an ontology built by using concepts from OntoCAPE and ISO 15926. It consists of four modules, namely, the case base, the case-based reasoning (CBR) engine, the knowledge maintenance module, and the graphical user interface module. Within the case base, HAZOP (hazard and operability) analyses are represented as cases organised in a hierarchical structure (
Zhao et al., 2009).
Another example of an ontology, OntoSafe, provides the semantics for process anomaly management (
Natarajan et al., 2012). It integrates the information necessary for forming a judgment of the condition and state of the process. It also captures the hidden links so that changes in the process descriptors are reported consistently. The existing concepts in OntoCAPE have been used for developing OntoSafe in addition to new classes and relations specific to process supervision. The process supervision task is to determine the state or condition of the process, for example, to confirm the presence or absence of a fault.
In the newly developed CPS, new ontologies need to be developed to account for the new layers/modules created by the addition of the cyber level and through the integration between the physical process and the computational elements.
Agent-based technology is becoming a powerful tool for engineering applications, and MASs have received great attention from scholars in various fields (
Kravari and Bassiliades, 2015;
Xie and Liu, 2017;
Dorri et al., 2018). An agent is defined as an entity that senses parameters in the environment, which are used to make decisions in accordance with its goal.
The MAS is a computerised system composed of multiple interacting agents used to solve problems that are difficult or impossible for an individual agent to solve. The distinguishing features of the MAS include efficiency, low cost, flexibility and reliability, making it effective for complex tasks. The MAS efficiency stems from the fact that a complex task is divided into smaller subtasks, each of which is assigned to an agent. Each agent decides on the action to solve its subtask using multiple inputs: A history of actions, interactions with its neighbours, and its task. Agents use these interactions to learn new contexts. Hence, agents use their knowledge to decide on actions to solve their allocated subtasks.
Agents can have several properties as follows (
McArthur et al., 2007;
Dorri et al., 2018):
• Sociability: Agents can share their knowledge or request information from other agents to achieve their tasks;
• Autonomy: Each agent can work independently and execute the appropriate action;
• Proactivity: Each agent uses its own history, sensors’ information, and other agents’ knowledge to predict future actions;
• Connectivity: The performance and functionalities highly rely on the communication layer, especially the connection topology and associated protocols;
• Mobility: Agents can be static or mobile agents.
Intelligent agents are classified into several types with respect to their decision-making mechanisms. Purely reactive agents make decisions using only the present information without referring to historical data, whereas belief-desire-intention agents are built using symbolic representations of the intentions, beliefs and desires of agents, and layered architectures incorporate several software layers. An MAS can also be classified in accordance with its topology, that is, the location and relations of its agents, as having a static or dynamic topology. A comparative review of the existing agent platforms that can be used is presented in Kravari and Bassiliades (
2015) based on a universal comparison and evaluation criteria. This review proposes classifications for helping readers to understand which agent platforms broadly exhibit similar properties and what choices should be made in various situations.
Ontologies and MASs have been used extensively in the past decades to solve problems or improve or add new capabilities for industrial processes and for the chemical processes in particular.
An ontology-based scheme has been used to describe sensors and their features for sensor networks (
Xue et al., 2015). The sensor nodes are deployed to collect information for environmental monitoring, but these sensor networks have management problems and issues in data sharing between sensors. The ontology scheme can help in providing an effective management system for the sensor networks.
Another application is focused on manufacturing process ontologies, which combine formal concept analysis with a set of criteria for characterising classes of processes (
Akmal and Batres, 2013). In an ontology-based manufacturing for flexible production (
Shi et al., 2017), the description of physical entities, such as production processes, equipment and products, and the relationships of operation logic and operation sequences in the manufacturing process are defined. Thus, the system can make automatic adjustments to ensure the completion of the process when changes occur in internal manufacturing requirements or external environment.
An agent-based method has been used for the coordination of tasks in chemical plants, motivated by the growing complexity of current industrial processes (
Nikraz and Bahri, 2005).
A process monitoring and supervisory system is an example of an application in which ontology and intelligent agents play a key role. A multiagent technology-based chemical plant supervisory system, which realises the connection between the chemical equipment, monitors the entire enterprise, and can be integrated with existing systems through an interface agent, is proposed in Wang and Zhang (
2008). ENCORE contains three types of agents that can cooperate with each other: The plant information manager agent, the process supervision agent, and the user interface agent. An offshore oil and gas production process was used to test the effectiveness of the system.
A knowledge-driven approach to constructing ontologies can be used to demonstrate how description logic reasoning can support process supervision, fault detection and prediction without the help of external agents (
Musulin et al., 2013).
Ontology-based methods can be used to enhance maintenance decisions through the knowledge gathered during process monitoring (
Elhdad et al., 2013). The monitoring process is based on signals that are triggered during the plant safety shutdown process. The implemented framework defines the logical structure and operation of the plant with the objective of monitoring the cause and effect of the plant shutdown process.
An agent-based model has been used to evaluate the dynamic behaviour of a global enterprise by considering the system-level performance and the components’ behaviour. Thus, it can be used to predict the effects of local and operational activities on plant performance and improve the tactical and strategic decision-making at the enterprise level (
Behdani et al., 2009).
One of the strengths of ontology-based approaches is that they can integrate heterogeneous data, and ontology-based data integration is recommended to tackle this challenge (
Ekaputra et al., 2017). Reconfigurability is an important feature for the system, especially in abnormal situations. Within the context of Industry 4.0, personalised customisation requires more agile and flexible processes, indicating that the reconfigurable system is crucial for the enterprise to remain competitive. MASs can be introduced to intelligently bring about reconfigurations that restore the system performance back to its original level (
Farid, 2015).
However, most of the existing systems have to be suspended when reconfiguring because the online reconfiguration may lead the system to disorder and uncertainty. IEC 61499 function blocks combined with MASs and ontology can be used to minimise the leading time of reconfiguration whilst ensuring system stability (
Wan et al., 2017).
8 Influence of process uncertainty on smart systems
Uncertainty comes from various sources: the models used are approximations based on assumptions about the physics and chemistry involved, their parameters can be inaccurate, and real scenarios bring unplanned changes, disturbances, breakdowns of system elements, communication failures and other unexpected events. Uncertainty should be considered in the design phase to ensure robust solutions (
Bogle, 2017;
Palacín et al., 2018).
Uncertainty can lead to the system not being able to fulfil its requirements and production quality goals, or even lead to safety hazards for the operators, local communities and the environment (
Bandyszak et al., 2020).
In a CPS, the model predictions may be affected by uncertainty sources from (
Nannapaneni et al., 2020): The computing (cyber) subsystem (resource and communication uncertainty), manufacturing (physical) subsystem (input uncertainty, process variability and modelling errors), and sensors (measurement uncertainty). The interaction between the physical and cyber components further increases the complexity by aggregating and compounding these uncertainty sources over time.
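A minimal Monte Carlo sketch of how such uncertainty sources can be propagated is given below: parameter, input and measurement uncertainties are sampled and pushed through a hypothetical first-order process model, and the resulting spread of predictions is summarised. Formal CPS uncertainty quantification frameworks are far richer, but follow the same principle.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(k, u, noise_std, n_steps=50, dt=0.1):
    """Hypothetical first-order process x' = -k*x + u, observed through a noisy sensor."""
    x, trace = 0.0, []
    for _ in range(n_steps):
        x += dt * (-k * x + u)
        trace.append(x + rng.normal(0.0, noise_std))   # measurement uncertainty
    return np.array(trace)

n_samples = 2000
runs = np.empty((n_samples, 50))
for i in range(n_samples):
    k = rng.normal(0.8, 0.05)          # parameter (modelling) uncertainty
    u = rng.normal(1.0, 0.1)           # input uncertainty from the physical subsystem
    runs[i] = simulate(k, u, noise_std=0.02)

mean = runs.mean(axis=0)
lo, hi = np.percentile(runs, [2.5, 97.5], axis=0)
print(f"final value: {mean[-1]:.3f}  95% band: [{lo[-1]:.3f}, {hi[-1]:.3f}]")
```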
The major challenges in the development of HCPS are as follows (
Zhou et al., 2019;
Liu and Wang, 2020):
• The mismatches of the abstractions among physical, cyber and human systems. In physical systems, state changes are continuous and in real time, resulting in representations such as ordinary differential equations (ODEs), DAEs or partial differential equations. In cyber systems, the changes are discrete, resulting in automata or state machine representations. In the case of humans, even though they are often treated as physical systems, the abstraction of their behaviour is an important source of uncertainty.
• The need to develop abstractions for the interaction, concurrency and synchronisation among humans, humans and machines, and humans and physical systems to analyse and design monitoring and control systems for the human behaviour and to coordinate the behaviour of humans with the ones of the CPS.
Approaches to assess the uncertainty in a CPS require a degree of flexibility to accommodate complexity whilst maintaining a degree of robustness for satisfying key objectives within the specified confidence boundaries (
Grenyer et al., 2021). This process facilitates the mitigation of unknown/unexpected changes, enabling the system to evolve in the presence of such unpredictable challenges to the point of being reconfigurable with high degrees of freedom (
Ahmed et al., 2020).
Identifying the various types of uncertainties present within the system and having suitable methods that can deal with them together with the knowledge on how to apply such methods are important to enable the incorporation of approaches for handling uncertainty in the design of CPS or HCPS (
Al-Ali et al., 2022).
Robustness is the capability of handling a certain degree of uncertainty and dealing with unexpected disruptions without having to modify the production schedule; it has to be embedded into a smart system (
Negri et al., 2021). The scheduling must be able to quickly identify and respond to these disturbances (
Qiao et al., 2021).
The integration of the elements discussed in the previous sections provides solutions to achieve these goals: Big data analytics facilitates the knowledge support; the CPS technologies play a key role in real-time data monitoring and exchange; and optimisation of the maintenance schedules enables the computational solution. Their combination can be used to develop a decision-making framework that transforms chemical or biochemical processes into autonomous systems capable of responding quickly to changes in the environment.
9 Digital twin-based decision-making framework
The connection between the virtual (cyber) world and the real (physical) world provides the ability to create and update real-time virtual representations of physical assets to populate a digital twin that can be manipulated within the cyber world via simulation or optimisation to actuate the physical world supporting greater control of production facilities or individual machines (
Sharpe et al., 2019).
As the number of IoT devices in the industrial environment is constantly increasing, the systems are given certain intelligence by using instruments, such as smart sensors, controllers, meters, machine-to-machine communication, AI and other computational devices using big data analytics for decision-making (
Lee et al., 2011).
The increasing networking and digitisation give rise to increasing complexity, requiring spatial–temporal dynamics, coordination and intelligence, and challenges related to the interaction between human actors and the cyber and physical components of the HCPS, especially for situations in which control is required to be switched between humans and machines (
Liu and Wang, 2020). They also offer the opportunity of using IoT technologies to augment human abilities and to develop novel diagnostics and maintenance methodologies, realising intelligent industrial systems that are able to learn, self-adapt and self-repair.
The structural scale and the dynamic complexity of modern industrial systems make it challenging for operators to infer the conditions in the plant quickly and make timely decisions, especially during abnormal situations. Technology needs to help in preventing human errors and stopping chain reactions that can transform small incidents into catastrophic failures. This can be achieved by combining integrated system modelling, a self-adaptive and self-repairing sensing network, and an autonomous software architecture that exhibits rapid information selection, scene understanding and decision-making capabilities.
The general framework of such an architecture is shown in Fig.2. A decentralised, modular plug-and-play structure is envisaged, with the following main modules: The holistic process system and cooperative control, the massive connectivity resilient communication network, the ML-based fault detection and prediction, the intelligent adaptive decision-making framework, and the virtual reality (VR) system for visualisation and interaction.
The structure considers the two levels of the CPS: The physical layer, with the industrial process itself, the wireless sensors and actuators, the physical controllers, and the inspector robots; and the cyber layer, with the wireless communication network, the centres for data and model storage, the fault detection/prediction algorithms, and the decision-making framework. The human level is also considered, to include the involvement of humans in the decisions or their partial involvement in the operation of the systems through robot–human cooperation. The wireless network works as the connecting element between the two layers.
One of the main parts of this system is the decision-making framework, which aims to achieve the process goals, monitor the system operation, control the inspector robots, reconfigure the operation (when possible) in response to any faults or changes in the process, and report to the operator. Reconfigurability is required when the system needs to cope with changes in hardware or the environment, or with a failure in a subsystem.
The decision-making tasks include:
• Monitoring;
• Adaptation (Reconfiguration);
• Planning (Model selection);
• Online learning.
The design of the resulting systems requires multidisciplinary knowledge and multiple modelling paradigms for the development of the various process stages, such as gathering requirements, architecture design, simulation, and process optimisation and control. Various characteristics of the system, such as the heterogeneity and collaborative behaviour of its components, are a source of inherent uncertainties with varying effects on the overall system behaviour and should be considered (
Al-Ali et al., 2022).
Systematic and robust scheduling solutions, such as the ones discussed in Section 6, are required for modern smart manufacturing systems to facilitate their ability to adapt to changeable manufacturing environments (
Qiao et al., 2021).
Context models can facilitate the analysis of potential runtime situations and consequently aid in the design of a system capable of automatically coping with uncertainties (e.g., able to identify potential uncertainties during runtime and self-adapt to resolve them) (
Bandyszak et al., 2020).
A multilayer structure (Fig.3) is proposed for the decision-making module, based on an MAS approach with two layers (a minimal sketch of this layering is given after the component list below):
a) The abstraction layer — deals with the normal system operations, or in other words, the operations that do not require decisions to be taken. The key characteristic of the abstraction layer is that it is located between the decision-making layer and the rest of the general framework. This abstraction layer provides a continuous to discrete translation, taking streams of data from the subsystems and passing on discrete abstractions of this to the agent itself. Specifically, it aims to identify data of interest to the agent and packages this together into a concise form.
b) The decision-making layer — deals with the decisions. It takes discrete information from the abstraction layer and replies with decisions by using the knowledge represented in the ontology and the inference engine, which can reason over the ontology to answer the rational agent’s enquiries.
Each layer has its own modular substructure, which will be described as follows.
I. The abstraction layer has the following components:
a) The inspection agent — responsible for controlling the monitoring via the sensors and inspector robots, and making the routine inspection plans. These plans can be interrupted by a decision from the rational agent that may require more information or further inspections of a specific section of the process/system. This agent follows the monitoring agents and sends data to the rational agent in case of any anomalies.
b) The control agent — responsible for following the control module operation and keeping track of the system’s behaviour and its relation with the control module. On the basis of the information from the control module, the system model resources and its previous knowledge, this agent decides the optimal connectivity between the controllers. This enables better and safer control system operation.
c) The monitoring agent — responsible for the upper layer of the monitoring and management module. The agent analyses the information gathered from the fault detection and prediction modules and the process behaviour models to detect abnormalities and system behaviour deviations, and reports the failures and recommendations to the rational agent.
d) The learning agent — keeps track of all the operation states that include: The current models, the agent’s actions, the rational agent decisions, and their effect on the system. The agent tries to learn from the history of the system behaviour. To sum up, it aims to learn the optimal process operation from the previous process operation data.
II. The decision-making layer has the following components:
a) The ontology — a formal representation of the knowledge about the system that includes the system’s models, flowcharts, the standard operating procedures, and the general knowledge about process engineering. This formal knowledge is available for the rational agent to infer about process operation.
b) The inference engine — a module that can infer logical consequences from the ontology. When the rational agent needs information about the process, it sends a query to the engine, which reasons over the ontology to find the requested information.
c) The rational agent — makes high-level decisions for the system on what actions to perform given its beliefs, desires and intentions. This agent has explicit reasons for its decisions. It should be aware of the system components and their expected behaviour. The rational agent monitors the overall system performance. One of its main tasks is to reconfigure the system’s structure and models to overcome deficiencies. The rational agent’s decisions, together with their reasoning, are sent to the operators for final confirmation via the VR module.
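The following minimal Python sketch illustrates the layering described above, and only the layering: a stand-in abstraction layer turns continuous data streams into discrete events, and a stand-in rational agent maps those events to high-level decisions through simple rules that take the place of the ontology and inference engine. The names, thresholds and rules are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    source: str      # e.g., "monitoring_agent"
    kind: str        # e.g., "deviation", "fault_predicted"
    detail: dict

class AbstractionLayer:
    """Turns continuous data streams into discrete events of interest to the agent."""
    def __init__(self, limit=0.8):
        self.limit = limit

    def process(self, stream):
        events = []
        for unit, values in stream.items():
            score = max(values)                       # crude health indicator
            if score > self.limit:
                events.append(Event("monitoring_agent", "deviation",
                                    {"unit": unit, "score": score}))
        return events

class RationalAgent:
    """Maps discrete events to high-level decisions using simple stand-in rules."""
    def decide(self, events):
        decisions = []
        for e in events:
            if e.kind == "deviation" and e.detail["score"] > 0.95:
                decisions.append(("reconfigure", e.detail["unit"]))
            elif e.kind == "deviation":
                decisions.append(("request_inspection", e.detail["unit"]))
        return decisions

stream = {"reactor_1": [0.2, 0.4, 0.97], "pump_3": [0.5, 0.85, 0.6]}
events = AbstractionLayer().process(stream)
print(RationalAgent().decide(events))
# [('reconfigure', 'reactor_1'), ('request_inspection', 'pump_3')]
```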
A key characteristic of this structure is that the overall architecture remains distributed (rather than centralised), enabling evolutionary capabilities, such as modulation/adjustment, robustness, adaptation and reconfigurability with respect to changing environments whilst keeping relatively low computational costs.
The traditional manufacturing process becomes a smart factory, which is characterised by self-perception, operation optimisation, dynamic reconfiguration and intelligent decision-making (
Wan et al., 2021).
The modular framework for decision-making enables the implementation of a digital twin of the CPS, a virtual environment centred on the integration of modules at different levels and complexities of representation. With the whole process described as an object, the various models involved in planning, design and operation can be used in an integrated manner, including the modelling and coordination of control, production, management and other levels.
The current situation of the real system can be synchronised to the virtual environment via the IoT elements in a timely manner, and the virtual system calculates the future operating state over a rolling horizon to manage and control the operation of the physical system, realising reliable and real-time system monitoring, risk prediction, smart regulation and operation optimisation.
10 Conclusions and outlook
Human beings acquire information from their surroundings through sensory receptors. The sensory stimulus is converted to electrical signals as nerve impulse data communicated with the brain. At this point, using the mechanism of reasoning, the sensory data are effectively analysed and used to generate a vision of the future.
This paper presents a state-of-the-art review of various technologies facilitated by Industry 4.0 that can enable improved sensing of chemical and biochemical processes, together with enhanced data analysis that can be used to develop a decision-making capability similar to reasoning. The integration of these technologies can promote the creation of autonomous smart systems that can self-adapt and self-regulate, even with limited data, and predict sequences of events over the short and long term.
Thus, a new generation of production systems characterised by smart sensing and intelligent services, connected via ubiquitous sensors, intelligent hardware, control systems, computing facilities and information terminals can be developed via model-based agent intelligent networks of HCPS. Key capabilities include overall location awareness, forecast and early warning, collaborative optimisation and decision-making.
This decentralised, modular and hierarchical model-based approach will also support the industry-wide automation, digitalisation, visualisation, local and global modelling, and interconnection of people, devices, and a wide variety of information resources and knowledge.