Supervised projection with adaptive label assignment for enhanced visualization and chemical process monitoring

Zhi Li; Junfeng Chen; Kaige Xue; Xin Peng

doi:10.1007/s11705-025-2561-2

Front. Chem. Sci. Eng., 2025, Vol. 19, Issue 7: 56. DOI: 10.1007/s11705-025-2561-2

RESEARCH ARTICLE


Abstract

Data-driven process monitoring methods are widely used in industrial tasks, with visual monitoring enabling operators to intuitively understand operational status, which is vital for maximizing industrial safety and production efficiency. However, high-dimensional industrial data often exhibit complex structures, making the traditional 2D visualization methods ineffective at distinguishing different fault types. Thus, a visual process monitoring method that combines supervised uniform manifold approximation and projection with a label assignment strategy is proposed herein. First, the proposed supervised projection method enhances the visualization step by incorporating label information to guide the nonlinear dimensionality reduction process, improving the degrees of class separation and intraclass compactness. Then, to address the lack of label information for online samples, a label assignment strategy is designed. This strategy integrates kernel Fisher discriminant analysis and Bayesian inference, assigning different label types to online samples based on their confidence levels. Finally, upon integrating the label assignment strategy with the proposed supervised projection method, the assigned labels enhance the separability of online projections and enable the visualization of unknown data to some extent. The proposed method is validated on the Tennessee Eastman process and a real continuous catalytic reforming process, demonstrating superior visual fault monitoring and diagnosis performance to that of the state-of-the-art methods, especially in real industrial applications.


Keywords

visual process monitoring / supervised uniform manifold approximation and projection / kernel Fisher discriminant analysis / Bayesian inference

Cite this article


Zhi Li, Junfeng Chen, Kaige Xue, Xin Peng. Supervised projection with adaptive label assignment for enhanced visualization and chemical process monitoring. Front. Chem. Sci. Eng., 2025, 19(7): 56 DOI:10.1007/s11705-025-2561-2


1 Introduction

Real-time monitoring of the operating status of industrial processes is crucial for ensuring efficient, safe, and stable production within enterprises [1]. In recent years, the development of distributed systems and sensor technology has facilitated the acquisition of massive amounts of industrial process data. In this context, data-driven process monitoring methods have garnered extensive research attention because of their simplicity and effectiveness [2,3]. Among these approaches, visual process monitoring maps raw industrial data onto a 2D plane, providing a more intuitive way to monitor industrial operating status [4]. Visual monitoring of actual industrial processes helps engineers identify problems promptly, thereby minimizing downtime and reducing manufacturing costs. Dimensionality reduction techniques are widely employed for visualizing high-dimensional data, and they typically operate by retaining only two principal components. Common linear dimensionality reduction methods include principal component analysis (PCA) [5,6], independent component analysis [7], canonical variable analysis [8], canonical correlation analysis (CCA) [9], and Fisher discriminant analysis (FDA) [10]. The existing nonlinear methods mainly include locally linear embedding [11], isometric mapping [12], Laplacian eigenmaps [13], self-organizing maps (SOM) [14,15], and t-distributed stochastic neighbor embedding (t-SNE) [16,17]. Among these methods, SOM and t-SNE are widely used techniques that reduce high-dimensional data to two dimensions while preserving topological relationships to the greatest extent possible.

Liukkonen et al. [18] developed an advanced monitoring and diagnostic system for industrial processes, utilizing an SOM to address the multivariate and dynamic nature of process data. This approach visualizes the input data through trajectories and color-coded alerts, enabling engineers to intuitively understand process states and detect potential issues in real time. Wang et al. [19] proposed a deep discriminative feature learning method that combines an extended stacked autoencoder and a feedforward neural network. They extracted discriminative features by constraining data points to category centers and used t-SNE for visualization purposes, which significantly improved the accuracy and performance of their fault detection method. Tang and Yan [20] proposed an integrated approach, termed FDA-t-SNE, which combines FDA, t-SNE, and a backpropagation (BP) neural network to attain enhanced visual process monitoring. FDA is a powerful feature extraction method that can capture classification features between different categories, whereas t-SNE effectively preserves the topological structure of the given data. Finally, real-time online data conversion is achieved through the trained BP network. Lu and Yan [21] proposed a new method called variable-weighted FDA (VWFDA), which combines t-SNE with multiple extreme learning machines (ELM). This approach addresses the shortcomings of the FDA-t-SNE method by improving the FDA component to amplify fault information and employing multiple ELM models to achieve more precise data mapping. Chen and Yan [22] noted that CCA can extract underlying fault diagnosis information, and based on the identified relevant components, an SOM can distinguish various types of states on the output map. Their results indicate that the proposed CCA-SOM method is effective for conducting real-time monitoring and fault diagnosis for complex chemical processes. Song et al. [23] proposed a technique that integrates a statistical pattern framework and an SOM to differentiate between various states on output maps and visually monitor abnormal states. Benatia et al. [24] combined an SOM with autoencoders using deep neural networks, further enhancing the resulting diagnostic performance. The above research demonstrates that combining feature extraction methods with visualization techniques can further enhance the performance of monitoring models. For general visualization tasks, visualization methods should strive to preserve the structure of the original data to the greatest extent possible. However, for visual process monitoring, the goal is to maximize the degree of separation between different classes of data while concentrating data belonging to the same class. Unfortunately, both SOM and t-SNE are unsupervised methods that were not specifically designed for classification tasks. Although t-SNE is considered more suitable for visual process monitoring than SOM [25], it lacks the ability to directly map new data. These methods require additional models to be trained for online mapping purposes [4,20,21,25]. Moreover, the objective function of t-SNE focuses on clustering similar data within local neighborhoods while failing to pay attention to the global structure [26].

By using kernel techniques, kernel FDA (KFDA) transforms the input data into a feature space where the originally linearly inseparable data become separable and then extracts the key features using FDA. Compared with FDA, KFDA possesses significantly stronger nonlinear feature extraction capabilities. Zhu and Song [27] utilized KFDA for feature extraction and subsequently applied a Gaussian mixture model and k-nearest neighbors (KNN) to the KFDA subspace for fault detection and isolation. Although discriminative analysis techniques can extract discriminative features to increase interclass distances (ICD) and intraclass cohesion to some extent, most existing visual monitoring methods do not further leverage class information to enhance the monitoring process. Additionally, owing to the limitations of feature extraction or visualization methods, these approaches may fail when confronted with data that differ significantly from the training data. Uniform manifold approximation and projection (UMAP), which is a novel manifold dimensionality reduction technique grounded in solid topology theory, has gained attention because of its fast execution speed and high-quality embeddings [28]. Joswiak et al. [26] reported that UMAP has significant potential applications in the chemical industry. Chang et al. [29] proposed a novel fault detection method based on UMAP and support vector data description. The case study in their paper demonstrated that UMAP could effectively capture subtle variations in the process data and extract nonlinear features. UMAP provides an optimal balance between local and global quality while ensuring excellent interpretability, offering powerful insights and process understanding through visualization [30,31]. Unfortunately, UMAP remains an unsupervised method and is unable to effectively handle new data acquired from entirely novel classes.

Based on the aforementioned insights, the task of visually monitoring high-dimensional industrial process data faces significant challenges such as complex data distributions, overlapping categories, and the presence of unknown samples. These characteristics create significant difficulties for both the feature extraction and low-dimensional visualization phases. The effective use of label information is critical for improving the quality of fault diagnosis and visualization methods. Various methods have been proposed to incorporate label information into visualization techniques. For example, Kuzmanovski et al. [32] proposed a supervised SOM classification method that enhances the constructed sample matrix by integrating label vectors. Similarly, Zheng et al. [33] proposed a discriminative t-SNE (DSNE) method that uses label information to compute the joint probabilities of sample pairs. DSNE minimizes the intraclass Kullback-Leibler (KL) divergence while maximizing the interclass KL divergence. Meng et al. [34] proposed a dimensionality reduction method called class-constrained t-SNE, which combines data features and class probabilities to simultaneously display both the data feature structure and the class probability structure in the output dimensionality reduction results. However, while these methods effectively utilize label information, handling unlabeled online data and unknown categories remains a key challenge. Since online data are typically unlabeled, the above methods cannot be directly applied to online processes. To address these challenges, we propose a novel visual process monitoring method. The proposed method integrates feature extraction, adaptive label assignment, and supervised visualization to achieve a more efficient and accurate visual process monitoring scheme. 
The contributions and details of this work can be summarized as follows: (1) A supervised UMAP (SUMAP) method is proposed; this approach is based on a designed supervised metric and uses label information to guide the embedding generation process. This enhances the interclass separability and intraclass compactness (ICC) of the low-dimensional embeddings, thereby improving the resulting visualization quality. (2) A confidence-based label assignment (CBLA) strategy, which assigns different types of labels to online samples based on their confidence levels, is proposed. This strategy maximizes the utilization of information derived from reliable labels, minimizes the negative impact of low-confidence labels, and labels unknown data. (3) A visual process monitoring method (SUMAP-LA) that combines SUMAP and CBLA is proposed. This method leverages the labels predicted by CBLA to assist SUMAP in performing dimensionality reduction, thereby producing more accurate visualization monitoring results. Its effectiveness is demonstrated in the Tennessee Eastman (TE) process and a real-world continuous catalytic reforming (CCR) process.

The remainder of this paper is organized as follows. Section 2 provides a review of the related methods. Section 3 offers a comprehensive explanation of the framework and principles underlying the proposed visualization-based monitoring method. Section 4 presents an analysis and a discussion of the application results produced by the proposed method on the TE process case and CCR unit data, demonstrating its significant advantages. Finally, Section 5 provides a summary and a future outlook.

2 Preliminaries

2.1 KFDA

KFDA is a widely used supervised feature extraction technique that extends FDA by introducing kernel functions, making it more effective at handling nonlinear and complex data. Consider a data set $X = [x_1, x_2, \ldots, x_N]$, where each $x_i \in \mathbb{R}^m$ represents a training sample, and the data set consists of $C$ distinct classes, with $n_k$ being the number of training samples belonging to class $k$ and satisfying $\sum_{k=1}^{C} n_k = N$. Let $\Phi$ be a nonlinear mapping that maps data from the original space $\mathbb{R}^m$ to a high-dimensional feature space $F$, that is, $\Phi: x \mapsto \Phi(x)$, $\mathbb{R}^m \to F$. Then, in space $F$, the Fisher criterion can be expressed as follows:

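$$J^{\Phi}(\omega) = \arg\max_{\omega} \frac{\omega^{T} S_b^{\Phi}\, \omega}{\omega^{T} S_t^{\Phi}\, \omega}, \quad \omega \neq 0, \tag{1}$$

where $S_b^{\Phi}$ and $S_t^{\Phi}$ are the between-class and total scatter matrices, respectively, and are defined as follows:

$$S_b^{\Phi} = \frac{1}{N} \sum_{k=1}^{C} n_k \left(\mu_k^{\Phi} - \mu_0^{\Phi}\right)\left(\mu_k^{\Phi} - \mu_0^{\Phi}\right)^{T}, \tag{2}$$

$$S_t^{\Phi} = \frac{1}{N} \sum_{i=1}^{N} \left(\Phi(x_i) - \mu_0^{\Phi}\right)\left(\Phi(x_i) - \mu_0^{\Phi}\right)^{T}. \tag{3}$$

Here, $\mu_k^{\Phi}$ and $\mu_0^{\Phi}$ are the mean vectors of the $n_k$ mapped samples belonging to class $k$ and all $N$ mapped samples, respectively.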

Assume that $Q = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_N)]$ is a set of samples obtained after mapping them via the nonlinear function $\Phi$. According to the research of Yang et al. [35], the optimization problem mentioned above can be transformed into a Fisher criterion problem in the kernel PCA (KPCA)-transformed space $\mathbb{R}^{\tilde{d}}$. Therefore, Eq. (1) becomes

$$J^{K}(\beta) = \arg\max_{\beta} \frac{\beta^{T} S_b^{K} \beta}{\beta^{T} S_t^{K} \beta}, \tag{4}$$

where

$$S_b^{K} = \Lambda^{1/2} P^{T} W P \Lambda^{1/2}, \tag{5}$$

$$S_t^{K} = \Lambda. \tag{6}$$

Here, $\Lambda$ and $P$ are the eigenvalues and corresponding eigenvectors of the centered kernel matrix $K$, respectively, and $W = \mathrm{diag}(W_1, W_2, \ldots, W_C)$, where $W_k$ is an $n_k \times n_k$ matrix with its terms all equal to $1/n_k$. It is easy to verify that $S_t^{K}$ is positive definite and that $S_b^{K}$ is positive semidefinite. Thus, Eq. (4) represents a standard generalized Rayleigh quotient. By solving the generalized eigenvalue problem of $(S_t^{K})^{-1} S_b^{K}$, a set of optimal solutions $G = [\beta_1, \beta_2, \ldots, \beta_d]$ $(d \leq C - 1)$ can be obtained. Therefore, given a new sample $x$ and its mapped image $\Phi(x)$, the corresponding discriminant feature vector $z$ can be obtained by using the following KFDA transformation:

$$z = G^{T} \left(Q P \Lambda^{-1/2}\right)^{T} \Phi(x). \tag{7}$$

Equation (7) can be divided into two components:

$$\hat{x} = \left(Q P \Lambda^{-1/2}\right)^{T} \Phi(x), \tag{8}$$

$$z = G^{T} \hat{x}. \tag{9}$$

It is evident that $\hat{x}$ represents the sample $x$ transformed by KPCA, and $G$ denotes the associated Fisher optimal (linear) discriminant vectors. Correspondingly, the transformation in Eq. (7) is essentially the Fisher (linear) discriminant transformation in the KPCA-transformed space.
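The two-step structure of Eqs. (8) and (9) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the Gaussian kernel, the parameter `gamma`, the eigenvalue cutoff `tol`, and the function names `rbf_kernel`, `fit_kfda`, and `transform_kfda` are all assumptions introduced here. Because $S_t^K = \Lambda$ is diagonal, the sketch substitutes $\beta = \Lambda^{-1/2} u$ and solves the equivalent symmetric problem $P^T W P u = \mu u$.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix (the kernel choice is an assumption of this sketch)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kfda(X, labels, gamma=0.5, tol=1e-6):
    """KPCA followed by FDA in the KPCA space, following Eqs. (4)-(6)."""
    N = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    H = np.eye(N) - np.ones((N, N)) / N           # centering matrix
    lam, P = np.linalg.eigh(H @ K @ H)            # eigenpairs of the centered kernel
    keep = lam > tol * lam.max()                  # drop numerically null directions
    lam, P = lam[keep], P[:, keep]
    # W[i, j] = 1/n_k if samples i and j both belong to class k, else 0
    same = labels[:, None] == labels[None, :]
    W = same / same.sum(axis=1, keepdims=True)
    # With beta = Lambda^{-1/2} u, the generalized problem reduces to P^T W P u = mu u
    _, U = np.linalg.eigh(P.T @ W @ P)
    d = len(np.unique(labels)) - 1                # at most C - 1 discriminant vectors
    U = U[:, ::-1][:, :d]                         # eigenvectors of the d largest eigenvalues
    G = U / np.sqrt(lam)[:, None]
    return {"X": X, "K": K, "H": H, "P": P, "lam": lam, "G": G, "gamma": gamma}

def transform_kfda(model, X_new):
    """Eq. (7), evaluated through kernel values only."""
    k = rbf_kernel(X_new, model["X"], model["gamma"])
    k_c = (k - model["K"].mean(axis=0)) @ model["H"]    # center against the training set
    x_hat = k_c @ model["P"] / np.sqrt(model["lam"])    # Eq. (8): KPCA scores
    return x_hat @ model["G"]                           # Eq. (9): Fisher projection
```

On two well-separated classes, `transform_kfda` returns a single discriminant feature per sample (since $d \leq C - 1 = 1$), and the two classes occupy disjoint ranges of that feature.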

2.2 UMAP

UMAP is a graph-based dimensionality reduction technique that is different from the probabilistic t-SNE model. UMAP constructs a weighted KNN graph in a high-dimensional space to represent the structure of the given data, whereas t-SNE relies on probabilistic distributions to express similarities. The graph construction process in UMAP is highly flexible, enabling the use of different distance metrics. UMAP primarily consists of two main components: high-dimensional graph construction and low-dimensional graph layout [36].

Given high-dimensional samples $x_i \in \mathbb{R}^m$ with $i = 1, 2, \ldots, N$, the UMAP algorithm begins by constructing a KNN graph based on a specified distance metric $d$. This graph captures the local structure of the data and serves as the foundation for the subsequent steps. For each sample $x_i$, its $n$ nearest neighbors $\{x_{i_1}, x_{i_2}, \ldots, x_{i_n}\}$ are identified via the chosen metric. The similarity between $x_i$ and its neighbor $x_j$ is calculated using the following formula:

$$\omega(x_i, x_j) = \exp\left(\frac{-\max\left(0,\, d(x_i, x_j) - \rho_i\right)}{\sigma_i}\right), \tag{10}$$

where $\rho_i$ represents the distance to the nearest neighbor of $x_i$. Thus, each sample has at least one connected neighbor, significantly improving the performance of UMAP in cases with high-dimensional data. The term $\sigma_i$ is a local scaling factor that ensures a uniform similarity distribution across the entire data set. The calculated similarities are used to construct an adjacency matrix $A \in \mathbb{R}^{N \times N}$, which is defined as follows:

$$A_{ij} = \begin{cases} \omega(x_i, x_j), & \text{if } x_j \text{ is a neighbor of } x_i, \\ 0, & \text{otherwise}. \end{cases} \tag{11}$$

To create a symmetric representation of the graph, the adjacency matrix $A$ is symmetrized by using the following formula:

$$B = A + A^{T} - A \circ A^{T}, \tag{12}$$

where $A^{T}$ is the transpose of $A$ and $\circ$ is the Hadamard (or pointwise) product. This operation combines directed similarities into a unified, undirected graph representation. The resulting symmetric adjacency matrix $B \in \mathbb{R}^{N \times N}$ serves as a topological representation of the high-dimensional data. It faithfully preserves the geometric and topological structure of the data set, providing a robust basis for the subsequent low-dimensional embedding process.
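The graph construction in Eqs. (10)-(12) can be illustrated with a short NumPy sketch. It assumes the Euclidean metric and takes the scaling factors $\sigma_i$ as a caller-supplied array `sigma` instead of calibrating them; the function name `fuzzy_graph` is hypothetical.

```python
import numpy as np

def fuzzy_graph(X, n_neighbors, sigma):
    """Directed weights A (Eqs. (10)-(11)) and symmetrized graph B (Eq. (12))."""
    N = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                   # a point is not its own neighbor
    A = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(D[i])[:n_neighbors]     # indices of the n nearest neighbors
        rho = D[i, nbrs[0]]                       # distance to the nearest neighbor
        A[i, nbrs] = np.exp(-np.maximum(0.0, D[i, nbrs] - rho) / sigma[i])
    B = A + A.T - A * A.T                         # "*" is the Hadamard product
    return A, B
```

Because the nearest neighbor always satisfies $d(x_i, x_j) = \rho_i$, its weight is exactly 1, which is how every sample keeps at least one fully connected neighbor.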

Next, UMAP initializes the low-dimensional embedding $Y = \{y_1, y_2, \ldots, y_N \mid y_i \in \mathbb{R}^k\}$, $k < m$, through spectral embedding, which helps accelerate the optimization process and improves its stability. For points $y_i$ and $y_j$ in the low-dimensional space, their similarity is defined as follows:

$$\omega_l = \left(1 + a \left\| y_i - y_j \right\|_2^{2b}\right)^{-1}, \tag{13}$$

where $a$ and $b$ are hyperparameters that control the tightness of the low-dimensional embedding space. These parameters can be flexibly adjusted using the min_dist parameter. To ensure that the low-dimensional embedding $Y$ preserves the structure of the high-dimensional data as faithfully as possible, the following optimization objective is defined:

$$\min \sum \omega_0 \log\left(\frac{\omega_0}{\omega_l}\right) + (1 - \omega_0) \log\left(\frac{1 - \omega_0}{1 - \omega_l}\right), \tag{14}$$

where ω₀ is the similarity in the high-dimensional space, as computed from the symmetrized adjacency matrix B. This cost function has two components. The first term emphasizes local cohesion by preserving strong connections (large edge weights) and highlighting natural clusters within the data. The second term prioritizes global separation by ensuring significant distances between points that are weakly connected (small edge weights). Unlike t-SNE, which only includes the first term and requires both high-dimensional and low-dimensional similarities to be normalized, UMAP eliminates the need for such a normalization step, resulting in faster computations and improved scalability. Additionally, the inclusion of the second term in UMAP enables it to better capture the global relationships between clusters.
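To make the two terms of Eq. (14) concrete, the following sketch evaluates the objective for a given graph $B$ and a candidate embedding $Y$. The default values `a=1.0` and `b=1.0`, the clipping constant `tiny`, and the function names are assumptions made for illustration only.

```python
import numpy as np

def low_dim_similarity(Y, a, b):
    """Eq. (13): pairwise similarities of the embedded points."""
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return 1.0 / (1.0 + a * sq ** b)

def umap_objective(B, Y, a=1.0, b=1.0, tiny=1e-12):
    """Eq. (14), summed over unordered pairs i < j."""
    iu = np.triu_indices(len(Y), k=1)
    w0 = np.clip(B[iu], tiny, 1.0 - tiny)         # clipping keeps the logarithms finite
    wl = np.clip(low_dim_similarity(Y, a, b)[iu], tiny, 1.0 - tiny)
    attract = w0 * np.log(w0 / wl)                # first term: local cohesion
    repel = (1.0 - w0) * np.log((1.0 - w0) / (1.0 - wl))  # second term: separation
    return float((attract + repel).sum())
```

An embedding that places strongly connected points close together and unconnected points far apart yields a lower objective value than one that mixes them, which is the behavior the two terms are meant to reward.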

Finally, Eq. (14) is optimized using stochastic gradient descent (SGD). SGD provides several advantages: it is a computationally efficient approach for large data sets, avoids the overhead associated with calculating pairwise similarities for all points, and directly optimizes sparse graph structures without requiring global updates. These characteristics reduce the time complexity of the UMAP algorithm to $O(N^{1.14})$, whereas t-SNE has a time complexity of $O(N^2)$ [28]. The gradients guiding this process are derived from attractive and repulsive forces. The attractive force, which corresponds to strongly connected points, is given by

$$\frac{-2ab \left\| y_i - y_j \right\|_2^{2(b-1)}}{1 + \left\| y_i - y_j \right\|_2^{2}}\, \omega_0(x_i, x_j)\,(y_i - y_j), \tag{15}$$

while the repulsive force, which is applied to weakly connected or disconnected points, is defined as

$$\frac{2b}{\left(\varepsilon + \left\| y_i - y_j \right\|_2^{2}\right)\left(1 + a \left\| y_i - y_j \right\|_2^{2b}\right)}\, \left(1 - \omega_0(x_i, x_j)\right)(y_i - y_j), \tag{16}$$

where $\varepsilon$ is taken to be a small value such that the denominator does not become zero. The combination of these forces ensures that points that are close in the high-dimensional space are drawn together in the embedding, whereas points that are distant are pushed apart. Through the continuous adjustment of these forces, the final low-dimensional embedding $Y$ is obtained, providing a more faithful representation of the original high-dimensional data.
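The direction of the two forces can be checked with a small sketch that implements Eqs. (15) and (16) as written. The function names, the step size in the usage note, and the default `eps` are assumptions; the sketch also assumes $y_i \neq y_j$ so that the power in Eq. (15) is well defined.

```python
import numpy as np

def attractive_force(yi, yj, w0, a, b):
    """Eq. (15): pulls y_i toward y_j in proportion to the edge weight w0."""
    sq = ((yi - yj) ** 2).sum()
    coeff = -2.0 * a * b * sq ** (b - 1.0) / (1.0 + sq)
    return coeff * w0 * (yi - yj)

def repulsive_force(yi, yj, w0, a, b, eps=1e-3):
    """Eq. (16): pushes y_i away from y_j in proportion to 1 - w0."""
    sq = ((yi - yj) ** 2).sum()
    coeff = 2.0 * b / ((eps + sq) * (1.0 + a * sq ** b))
    return coeff * (1.0 - w0) * (yi - yj)
```

Adding a small multiple of `attractive_force` to `yi` shortens the distance to `yj`, while adding a small multiple of `repulsive_force` lengthens it; an SGD pass alternates such updates over sampled edges and sampled non-neighbors.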

3 Proposed approach

This section introduces the proposed SUMAP-LA method in detail. To facilitate understanding, we first explain the offline and online stages of the SUMAP method separately. Then, we introduce the proposed label assignment mechanism, and finally, we integrate the two components into the complete workflow.

3.1 Supervised UMAP

Unlike t-SNE, UMAP incorporates negative sample optimization into its objective function and adjusts the low-dimensional similarity computation. This allows UMAP to produce higher-quality low-dimensional embeddings in less time in most cases. However, UMAP is essentially an unsupervised method. In many scenarios, visualization algorithms can achieve significantly better performance when auxiliary class label information is incorporated. To address this situation, we propose a new distance metric and apply it to the offline and online stages of UMAP, thereby modifying the adjacency matrix computation to improve the quality of the generated low-dimensional embeddings. Detailed descriptions of the offline and online stages of this method are provided below.

3.1.1 Offline stage

The standard implementation of UMAP typically employs the Euclidean distance measure to construct a KNN graph, which does not explicitly incorporate class information. In fact, UMAP supports the use of flexible distance metrics. This is because its similarity metric is still based on the Gaussian kernel function and undergoes a symmetrization process. Therefore, new supervised distances can be constructed for performing similarity calculations. Consider a data set $\bar{X} = \{X, L\}$, where $X = \{x_1, x_2, \ldots, x_N\}$ represents the feature vectors of $N$ samples, with each $x_i \in \mathbb{R}^m$; $L = \{l_1, l_2, \ldots, l_N\}$ is the matrix of one-hot encoded labels, with $l_i \in \mathbb{R}^C$ indicating the label of sample $x_i$; and $C$ is the total number of classes. To better leverage label information, we propose a dynamic distance metric based on class centers, which is defined as follows:

$$d_s(x_i, x_j) = \left\| x_i - x_j \right\|_2 + \sum_{k=1}^{C} \lambda_{k\,c(x_j)} \cdot l_{ik} \left\| \mu_k - \mu_{c(x_j)} \right\|_2, \tag{17}$$

where $\| \cdot \|_2$ represents the standard Euclidean distance measure, $c(x_i)$ denotes the class to which sample $x_i$ belongs, and $\mu_k = \frac{1}{n_k} \sum_{c(x_i) = k} x_i$ denotes the center vector of class $k$. The second term is activated only when $c(x_i) \neq c(x_j)$, effectively increasing the ICD. Additionally, $\lambda_{ij}$ represents the Jensen-Shannon (JS) divergence between classes $i$ and $j$, which dynamically adjusts the influences of class centers on the distance metric. Assuming that the probability distributions of classes $i$ and $j$ are $P_i$ and $P_j$, respectively, the computation of $\lambda_{ij}$ is defined as follows:

$$\lambda_{ij} = \frac{1}{2} \left( \sum_{k} P_{ik} \log_2 \frac{P_{ik}}{M_k} + \sum_{k} P_{jk} \log_2 \frac{P_{jk}}{M_k} \right), \tag{18}$$

where $P_{ik}$ and $P_{jk}$ represent the probabilities of $P_i$ and $P_j$ in the $k$-th dimension, respectively, and $M_k = \frac{1}{2}(P_{ik} + P_{jk})$ is the mixed distribution. By incorporating JS divergence as a dynamic adjustment parameter, the influence of class information can be adaptively modified based on the actual distribution differences between classes. This approach makes the constructed distance metric more realistic, enhancing both interclass separation and ICC. By replacing $d(x_i, x_j)$ in Eq. (10) with $d_s(x_i, x_j)$, the similarity $\omega_{ij}$ becomes

$$\omega_{ij} = \exp\left(\frac{-\max\left(0,\, d_s(x_i, x_j) - \rho_i\right)}{\sigma_i}\right). \tag{19}$$

Subsequently, an exponential decay function is applied to the adjacency matrix $A$ to further reduce the similarities between different classes:

$$A_{ij} = A_{ij} \cdot \exp(-\lambda_{ij}). \tag{20}$$

A symmetrization step, as defined in Eq. (12), is applied to generate the final adjacency matrix $B$, completing the process of modifying the UMAP adjacency matrix. The resulting SUMAP method more effectively captures class information and constructs high-quality low-dimensional embeddings, providing a solid foundation for the online phase. Algorithm 1 provides the complete procedure of the offline phase of SUMAP.
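The supervised distance of Eq. (17) and the JS weight of Eq. (18) can be sketched as follows. The text does not state how the class distributions $P_i$ are estimated, so this sketch simply takes discrete probability vectors as input; the function names and the precomputed matrix `lam` of pairwise JS divergences are likewise assumptions.

```python
import numpy as np

def js_divergence(p, q):
    """Eq. (18): Jensen-Shannon divergence (base 2) of two discrete distributions."""
    m = 0.5 * (p + q)                             # mixed distribution M
    def kl(u, v):
        mask = u > 0                              # terms with u_k = 0 contribute nothing
        return np.sum(u[mask] * np.log2(u[mask] / v[mask]))
    return 0.5 * (kl(p, m) + kl(q, m))

def supervised_distance(xi, xj, li, cj, centers, lam):
    """Eq. (17): Euclidean distance plus a label-weighted class-center penalty.

    li      : one-hot (or soft) label vector of x_i, length C
    cj      : class index of x_j
    centers : (C, m) array of class centers mu_k
    lam     : (C, C) array of pairwise JS divergences (assumed precomputed)
    """
    center_gap = np.linalg.norm(centers - centers[cj], axis=1)
    penalty = np.sum(lam[:, cj] * li * center_gap)
    return np.linalg.norm(xi - xj) + penalty
```

When $x_i$ and $x_j$ share a class, the only active term has $\| \mu_k - \mu_k \|_2 = 0$, so the function reduces to the plain Euclidean distance; for different classes the distance grows with both the center gap and the JS divergence.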

To intuitively illustrate how the proposed distance metric works, consider a data set $X \in \mathbb{R}^{20 \times 18}$ containing two classes, each with 10 samples. The number of neighbors is set to n = 10, and both the original UMAP algorithm and the SUMAP algorithm are applied. Fig.1(a) and 1(b) present heatmaps of the constructed adjacency matrices, where the row and column indices correspond to the samples. Half of the samples acquired from each class are selected to examine their similarity to all other samples. From the heatmaps, it is apparent that in Fig.1(a), there is some similarity between data from different classes, suggesting overlap between the classes. In Fig.1(b), however, the use of Eq. (17) to construct the adjacency graph ensures a lower degree of similarity between samples from different classes than between those derived from the same class. Fig.1(c) and 1(d) show the 2D projections produced after performing dimensionality reduction. In Fig.1(d), the samples obtained from the two classes exhibit greater ICD, and the intraclass samples are more compact. This finding demonstrates that the new distance metric is more effective at distinguishing data from different classes.

3.1.2 Online stage

The original UMAP method finds a suitable initial embedding position for each new sample point and performs optimization based on the manifold structure learned in the offline stage. When addressing complex data, incorporating prior labels to guide the projection process can significantly enhance the quality of the generated low-dimensional embedding. Specifically, during the online phase, SUMAP aims to leverage accurate label information to guide the new sample projection process. These labels are assigned through the CBLA method introduced later. We assume that the predicted labels for the new samples are available and denoted as $\hat{l}_i = [\hat{l}_{i1}, \hat{l}_{i2}, \ldots, \hat{l}_{iC}]$. To obtain a KNN graph based on the predicted label information, a distance metric that is consistent with the offline phase is defined to compute the distance between the new sample $x_i \in \mathbb{R}^m$ and the training samples:

$$d_s(x_i, x_j) = \left\| x_i - x_j \right\|_2 + \sum_{k=1}^{C} \lambda_{k\,c(x_j)} \cdot \hat{l}_{ik} \left\| \mu_k - \mu_{c(x_j)} \right\|_2, \tag{21}$$

where $\hat{l}_{ik}$ is the predicted probability that the sample belongs to category $k$. The other variables are defined in the same way as in Eq. (17). Next, the similarity is calculated using the following formula:

$$\omega_{ij} = \begin{cases} \exp\left(\dfrac{-\hat{d}_s(x_i, x_j)}{\sigma_i}\right), & \text{if } |\hat{l}_i| \neq 0, \\ \varepsilon, & \text{if } |\hat{l}_i| = 0, \end{cases} \tag{22}$$

where $\hat{d}_s$ is the label-guided distance of Eq. (21) and $\varepsilon$ is a value that is close to zero. This means that if $|\hat{l}_i| = 0$, the new sample is judged to be dissimilar to all known samples. For samples with $|\hat{l}_i| = 0$, the gradient degenerates to zero, so these samples do not participate in the subsequent SGD optimization process. This similarity calculation method effectively isolates samples from unknown categories, preventing interference with the distribution of known categories. To better visualize the test samples, particularly those belonging to unknown categories, the improved method introduces a new projection initialization mechanism. The initialization formula for the low-dimensional embedding $y_{new} \in \mathbb{R}^k$ is as follows:

$$y_{new} = \begin{cases} y_j, & \text{if } \exists j,\ |\omega_{ij}| = 1, \\[4pt] \dfrac{\sum_{j=1}^{n} \omega_{ij}\, y_j}{\sum_{j=1}^{n} \omega_{ij}}, & \text{if } \forall j,\ |\omega_{ij}| \in (\varepsilon, 1), \\[4pt] f_N, & \text{if } \forall j,\ |\omega_{ij}| = \varepsilon, \end{cases} \tag{23}$$

where $y_j$ represents the low-dimensional embedding of the training sample $x_j$, $n$ is the number of nearest neighbors, and $f_N$ denotes the random projection region for unknown samples, which follows the Gaussian distribution defined as shown below:

$$f_N = \mathcal{N}(\mu_N, \sigma_N). \tag{24}$$

The parameters in the above equations are manually configured to ensure that the projection region for the unknown samples remains distant from the main training sample distribution, thereby guaranteeing the visualization quality of unknown categories. With the proposed improvements to the distance metric and the projection initialization mechanism, the online SUMAP method is able to make better use of the label information of the test samples. These improvements enhance the visualization performance of online SUMAP and provide more reliable low-dimensional representations for practical applications. Algorithm 2 provides the complete procedure of the online phase of SUMAP.
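The three branches of the initialization rule in Eq. (23) can be written as a small function. This is a sketch under stated assumptions: `sigma_N` is treated as a per-coordinate standard deviation, mixed cases that Eq. (23) does not enumerate fall through to the weighted average, and the name `init_embedding` is hypothetical.

```python
import numpy as np

def init_embedding(w, Y_nbrs, eps, mu_N, sigma_N, rng):
    """Initial position of a new sample from its n neighbor weights (Eq. (23)).

    w      : (n,) similarities to the n nearest training samples
    Y_nbrs : (n, k) low-dimensional embeddings of those training samples
    """
    if np.any(w >= 1.0):                          # an exact match exists: reuse its position
        return Y_nbrs[np.argmax(w)].copy()
    if np.all(w <= eps):                          # unknown sample: draw from f_N (Eq. (24))
        return rng.normal(mu_N, sigma_N)
    return (w[:, None] * Y_nbrs).sum(axis=0) / w.sum()   # similarity-weighted average
```

Choosing `mu_N` far from the training embedding, as the text describes, keeps the randomly placed unknown samples visually separate from the known classes.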

3.2 CBLA mechanism

The previous section detailed the offline and online phases of the proposed SUMAP algorithm. As mentioned earlier, SUMAP aims to incorporate label information to improve low-dimensional embeddings, allowing it to better reflect the actual distribution and class differences of the input data. However, in the online phase, samples are typically unlabeled. To fully exploit the advantages of SUMAP, accurate and reliable label information is essential. Nevertheless, predictions are never fully accurate; therefore, it is necessary to assess the credibility of the predicted results. To address this issue, this paper further proposes a CBLA strategy to ensure that appropriate labels are assigned to new samples in the online phase, thereby guiding the online projection process of SUMAP and enhancing the effectiveness and reliability of the entire visual process monitoring method. The CBLA scheme is primarily based on Bayesian inference and is designed with the following considerations. (1) Bayesian methods provide probabilistic interpretations, enabling the effective assessment of the probabilities that each sample belongs to different categories. (2) By applying Bayes’ rule in the feature space, the resulting classification performance can be further improved. (3) By setting a threshold, low-confidence predictions can be labeled as unknown categories, thereby reducing the impacts of noise and outliers on the model.

Specifically, the proposed CBLA approach involves designing thresholds and label assignment rules in the KPCA space. Based on different thresholds, three types of labels are designed to maximize the use of reliable predicted labels, minimize the interference caused by incorrect labels, and accommodate unknown samples. The details are explained below.

Given a training data set $X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\} \in \mathbb{R}^{N \times (m+1)}$ and $y_i = c(x_i) \in \{1, 2, \ldots, C\}$ with $C$ classes, the posterior probability of a new sample $x$ belonging to class $k$ is expressed as follows according to Bayesian inference:

$$P(y = k \mid x) = \frac{P(x \mid y = k)\, P(y = k)}{\sum_{k'=1}^{C} P(x \mid y = k')\, P(y = k')}, \tag{25}$$

where $P(y = k)$ is the prior probability of class $k$ and $P(x \mid y = k)$ is the conditional probability density. Rewriting this in a logarithmic form results in the following discriminant function:

$$g_k(x) = \ln P(x \mid y = k) + \ln P(y = k). \tag{26}$$

Under the assumption of the FDA model, the samples of class $k$ follow the multivariate Gaussian distribution shown below:

$$P(x \mid y = k) = \frac{1}{(2\pi)^{m/2} |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu_k)^{T} \Sigma_k^{-1} (x - \mu_k)\right), \tag{27}$$

where $\mu_k$ is the mean vector of class $k$ and $\Sigma_k$ is the covariance matrix of class $k$. Upon substituting Eq. (27) into Eq. (26) and dropping the class-independent constant $-\frac{m}{2}\ln(2\pi)$, the discriminant function becomes

$$g_k(x) = \ln P(y = k) - \frac{1}{2} (x - \mu_k)^{T} \Sigma_k^{-1} (x - \mu_k) - \frac{1}{2} \ln |\Sigma_k|. \tag{28}$$

As discussed in Section 2.1, KFDA is equivalent to performing FDA in the KPCA-transformed space. In this transformed space, the samples are mapped to $\hat{x} \in \mathbb{R}^{\tilde{d}}$ using Eq. (8), and the covariance and mean vectors are further projected by $G$. The final discriminant function is given by

(29)

$$g_k(x) = \ln P(y = k) - \frac{1}{2}(\hat{x} - \hat{\mu}_k)^T G \left(G^T \hat{\Sigma}_k G\right)^{-1} G^T (\hat{x} - \hat{\mu}_k) - \frac{1}{2}\ln\left(\left|G^T \hat{\Sigma}_k G\right|\right),$$

where $\hat{\mu}_k$ and $\hat{\Sigma}_k$ are the mean vector and covariance matrix of class k in the KPCA-transformed space, respectively. The Bayesian classifier applies the following rule to classify $x$:

(30)

$$y = \arg\max_{k \in \{1, \ldots, C\}} g_k(x).$$
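The Gaussian discriminant of Eq. (28) and the decision rule of Eq. (30) can be illustrated with a minimal pure-Python sketch. The function names (`solve_and_logdet`, `discriminant`, `classify`) are hypothetical and not taken from the paper, and the sketch works on whatever vectors it is given: to mimic Eq. (29), one would pass the projected quantities $G^T\hat{x}$, $G^T\hat{\mu}_k$, and $G^T\hat{\Sigma}_k G$ instead of the raw ones. Class indices are 0-based here.

```python
import math

def solve_and_logdet(A, b):
    """Solve A z = b and return (z, ln|det A|) via Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy, inputs stay untouched
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is numerically singular")
        M[col], M[pivot] = M[pivot], M[col]
        logdet += math.log(abs(M[col][col]))
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / M[i][i]
    return z, logdet

def discriminant(x, prior, mean, cov):
    """g_k(x) = ln P(y=k) - 0.5 * Mahalanobis^2 - 0.5 * ln|Sigma_k|, as in Eq. (28)."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z, logdet = solve_and_logdet(cov, diff)  # z = Sigma_k^{-1} (x - mu_k)
    maha = sum(d * zi for d, zi in zip(diff, z))
    return math.log(prior) - 0.5 * maha - 0.5 * logdet

def classify(x, priors, means, covs):
    """Return the 0-based index of the class with the largest g_k(x), plus all scores (Eq. (30))."""
    scores = [discriminant(x, p, m, S) for p, m, S in zip(priors, means, covs)]
    return max(range(len(scores)), key=scores.__getitem__), scores
```

For instance, `classify([0.5, 0.2], [0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [identity, identity])` with a 2 × 2 identity covariance selects the first class, since the sample lies much closer to its mean.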

Directly using the predicted label y may introduce errors. Therefore, it is necessary to quantify the confidence of the output prediction. Inspired by the research of Lou et al. [37], we evaluate the reliability of the prediction results using the following statistical threshold:

(31)

$$g_k(x) \ge T_k = \frac{1}{2} L_k + \ln\left(P(y = k)\, \left|G^T \hat{\Sigma}_k G\right|^{-\frac{1}{2}}\right),$$

where L_k is given by [38]:

(32)

$$L_k = \frac{\tilde{d}\,(n_k^2 - 1)}{n_k\,(n_k - \tilde{d})}\, F_\alpha(n_k, n_k - \tilde{d}).$$

Here, $F_\alpha$ denotes the critical value of the Fisher (F) distribution at significance level α, and $\tilde{d}$ represents the dimensionality in the KPCA space. After normalizing Eq. (31) probabilistically, we obtain

(33)

$$\hat{g}_k(x) = \frac{\exp(g_k(x))}{\sum_{c=1}^{C} \exp(g_c(x))} \ge \zeta_k = \frac{\exp(T_k)}{\sum_{c=1}^{C} \exp(g_c(x))}.$$

The threshold $\zeta_k$ is influenced by the significance level α. Given significance levels α_a, α_b (α_a > α_b), we have that $\zeta_k^a < \zeta_k^b$. Let $\hat{g}_{k^*}(x) = \max_k \hat{g}_k(x)$, and define the online label assignment rule as follows:

(34)

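$$\hat{l} = \begin{cases} [0, 0, \ldots, 0], & \text{if } \hat{g}_{k^*}(x) < \zeta_{k^*}^a \\ [\hat{g}_1(x), \hat{g}_2(x), \ldots, \hat{g}_C(x)], & \text{if } \hat{g}_{k^*}(x) \in (\zeta_{k^*}^a, \zeta_{k^*}^b) \\ \vec{e}_{k^*}, & \text{if } \hat{g}_{k^*}(x) > \zeta_{k^*}^b \end{cases}$$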

Here, $\hat{l}$ is a C-dimensional vector, and $\vec{e}_{k^*} = [0, \ldots, 1, \ldots, 0]$ is a one-hot vector with the $k^*$-th element set to 1. This rule assigns appropriate label vectors to online samples based on their confidence levels. For high-confidence predictions, one-hot hard labels are assigned to ensure the accuracy of visualization decisions. For low-confidence predictions, the samples are classified into unknown categories to enhance the ability of the model to handle unknown data. For uncertain predictions, normalized probabilities are used to assign the corresponding labels, preserving the classification tendencies. In practical use cases, it is necessary to consider the impacts of both noise and outliers. Specifically, the larger the value of $\zeta_{k^*}^{s}$ is, the more the model tends to detect outliers, but this also increases the risk of misclassifying noise. Therefore, when determining the threshold, the level of noise in the given data must be considered.
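The assignment rule of Eq. (34) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_label` is hypothetical, the per-class thresholds `T_a` and `T_b` (the values of $T_k$ at α_a and α_b) are taken as precomputed inputs, and a score exactly equal to a threshold is treated as the soft-label case, which Eq. (34) leaves unspecified. Because $\hat{g}_k$ and $\zeta_k$ share the same denominator in Eq. (33), comparing $\hat{g}_{k^*}$ with $\zeta_{k^*}$ is equivalent to comparing $g_{k^*}$ with $T_{k^*}$, which is what the code does.

```python
import math

def assign_label(g, T_a, T_b):
    """Return (label_vector, kind) for one online sample.

    g   : list of discriminant scores g_k(x), one per class
    T_a : per-class thresholds T_k computed at alpha_a (lower bound)
    T_b : per-class thresholds T_k computed at alpha_b (upper bound)
    """
    # Softmax normalization of Eq. (33), shifted by max(g) for numerical stability.
    m = max(g)
    exps = [math.exp(v - m) for v in g]
    denom = sum(exps)
    g_hat = [e / denom for e in exps]
    k_star = max(range(len(g)), key=g_hat.__getitem__)

    # g_hat[k*] < zeta^a  <=>  g[k*] < T_a[k*], since both sides share the denominator.
    if g[k_star] < T_a[k_star]:
        return [0.0] * len(g), "unknown"      # low confidence: all-zero label
    if g[k_star] > T_b[k_star]:
        one_hot = [0.0] * len(g)
        one_hot[k_star] = 1.0
        return one_hot, "hard"                # high confidence: one-hot label
    return g_hat, "soft"                      # uncertain: normalized probabilities
```

With thresholds of -3 and -2 for every class, the score vector `[-1.0, -5.0, -6.0]` receives a one-hot label, `[-2.5, -5.0, -6.0]` a soft label, and `[-4.0, -5.0, -6.0]` the all-zero unknown label.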

3.3 Evaluation metrics

To establish quantitative criteria, this paper employs the convex hull algorithm [39] to determine the envelopes of different regions. Before applying the convex hull algorithm, denoising the low-dimensional embedding $Y \in \mathbb{R}^{N \times 2}$ is essential, as this improves the quality of the final envelopes (refer to Fig. S1, cf. Electronic Supplementary Material, ESM). Hierarchical density-based spatial clustering of applications with noise (HDBSCAN) [40] is an effective outlier detection method that can capture the main portion of the input data by simply configuring the search radius and the minimum number of sample points. Therefore, HDBSCAN is used for denoising before executing the convex hull algorithm, with the parameters uniformly set to eps = 1 and min_samples = 20. The convex hull algorithm is then applied to draw the envelope $[\gamma_1, \ldots, \gamma_C]$, which represents the expected projection area of each data category.
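As a rough illustration of the envelope idea, the sketch below builds a 2D convex hull with Andrew's monotone chain algorithm and tests whether a projected point falls inside it. This is an assumption-laden stand-in: the paper does not state which convex hull implementation it uses, the function names are hypothetical, and the HDBSCAN denoising step is omitted, so the input points are assumed to be already denoised.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, q):
    """True if q lies inside or on the boundary of a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))

def envelope_label(envelopes, y_new):
    """Return the single category whose envelope contains y_new, or None otherwise."""
    hits = [label for label, hull in envelopes.items() if inside_hull(hull, y_new)]
    return hits[0] if len(hits) == 1 else None
```

A point that falls in no envelope, or in several overlapping ones, is returned as `None` here; how such cases are handled in practice is a design choice that the sketch does not settle.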

The F1 score is the harmonic mean of precision (P_r) and recall (R_e). Precision measures the accuracy of samples that are predicted as belonging to the positive class, whereas recall reflects the ability of the model to correctly identify true-positive samples. Detailed definitions of these metrics are provided in Table S1 (cf. ESM). Therein, $T^{+}$ represents the number of samples within category i that fall inside its envelope γ_i, $F^{-}$ indicates the number of samples within category i that fall outside γ_i, and $F^{+}$ refers to the number of samples from other categories that are incorrectly classified as falling inside γ_i. In this study, the above indicators are weighted according to the number of samples contained in each category and used as the metrics for evaluating the fault classification performance of the developed model.
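The weighting described above can be sketched as follows. Since Table S1 is not reproduced here, the sketch assumes the standard definitions P_r = T⁺ / (T⁺ + F⁺) and R_e = T⁺ / (T⁺ + F⁻), and it takes the size of category i to be T⁺ + F⁻; the function name `envelope_scores` is hypothetical.

```python
def envelope_scores(counts):
    """Sample-weighted precision, recall, and F1 from per-category envelope counts.

    counts maps each category to a tuple (T_plus, F_minus, F_plus):
      T_plus  : samples of the category inside its own envelope
      F_minus : samples of the category outside its own envelope
      F_plus  : samples of other categories inside this envelope
    """
    total = sum(tp + fn for tp, fn, _ in counts.values())
    precision = recall = f1 = 0.0
    for tp, fn, fp in counts.values():
        weight = (tp + fn) / total              # share of samples in this category
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1
```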

In addition, a quantitative metric is required to evaluate the visualization results. The ICD and ICC are used to measure the separation between classes and the compactness within a class, respectively. For a given data set $X = \{x_1, x_2, \ldots, x_N\}$ with C classes and n_i samples per category, the ICD and ICC are defined as follows:

(35)

$$\mathrm{ICD} = \frac{1}{C(C-1)} \sum_{i=1}^{C} \sum_{j \ne i} \|\mu_i - \mu_j\|,$$

(36)

$$\mathrm{ICC} = \frac{1}{C} \sum_{i=1}^{C} \frac{1}{n_i} \sum_{c(x) = i} \|x - \mu_i\|,$$

where μ_i is the mean vector of the i-th class. A larger ICD value indicates that the distance between the samples belonging to the evaluated categories is greater, leading to a more distinct class separation effect. Conversely, a smaller ICC value signifies that samples within the same category are more tightly clustered, resulting in better intraclass consistency. The accuracy of the KNN classifier can also serve as a metric for evaluating the performance of visualization methods [25]. These three metrics provide insights into the performance of visualization methods to some extent. Therefore, this study introduces a composite metric μ_M to evaluate the performance of visualization methods:

(37)

$$\mu_M = \frac{\mathrm{ICD} \times \mathrm{ACC}_{KNN}}{\mathrm{ICC}}.$$

Here, $\mathrm{ACC}_{KNN}$ represents the accuracy of the KNN classifier. A higher μ_M value indicates a greater ICD, smaller ICC, and a data distribution that better aligns with the true samples. These three metrics (the ICD, ICC, and ACC) are also presented in the results (Section 4.1.1) to clarify the components of μ_M.
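A small pure-Python sketch of Eqs. (35)–(37) may help make the composite metric concrete. It is an illustration only: the function names are hypothetical, the KNN accuracy uses a majority vote over the nearest neighbors in the 2D embedding, and the sketch assumes that a sample is not counted among its own neighbors, which the text does not specify.

```python
import math
from collections import Counter

def class_groups(Y, labels):
    """Group embedded points by class and compute each class mean."""
    groups = {}
    for y, c in zip(Y, labels):
        groups.setdefault(c, []).append(y)
    means = {c: tuple(sum(v) / len(pts) for v in zip(*pts)) for c, pts in groups.items()}
    return groups, means

def icd(means):
    """Eq. (35): average distance between all ordered pairs of distinct class means."""
    keys = list(means)
    C = len(keys)
    total = sum(math.dist(means[a], means[b]) for a in keys for b in keys if a != b)
    return total / (C * (C - 1))

def icc(groups, means):
    """Eq. (36): per-class mean distance to the class mean, averaged over classes."""
    per_class = [sum(math.dist(x, means[c]) for x in pts) / len(pts) for c, pts in groups.items()]
    return sum(per_class) / len(per_class)

def acc_knn(Y, labels, n_neighbors=20):
    """Share of samples whose label matches the majority label of their nearest neighbors."""
    correct = 0
    for i, y in enumerate(Y):
        others = sorted((j for j in range(len(Y)) if j != i), key=lambda j: math.dist(y, Y[j]))
        votes = Counter(labels[j] for j in others[:n_neighbors])
        correct += votes.most_common(1)[0][0] == labels[i]
    return correct / len(Y)

def mu_m(Y, labels, n_neighbors=20):
    """Eq. (37): composite metric ICD * ACC_KNN / ICC."""
    groups, means = class_groups(Y, labels)
    return icd(means) * acc_knn(Y, labels, n_neighbors) / icc(groups, means)
```

The brute-force neighbor search is quadratic in the number of samples, which is acceptable for a sketch but would likely be replaced by a spatial index for data sets of the size used in the experiments.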

3.4 Proposed visual process monitoring method

This section integrates the methods introduced in the previous sections into a complete framework for visual process monitoring. The flowchart of the proposed SUMAP-LA method, which consists of two main parts, namely, offline training and online monitoring, is shown in Fig.2. The detailed steps are as follows:

1. Offline training

(1) Collect labeled historical industrial data $\bar{X} = \{X, L\} \in \mathbb{R}^{N \times (m + C)}$ with C classes and n_j samples per category, and process X using the z score normalization.

(2) Use Eq. (8) to obtain a mapping $\hat{X} \in \mathbb{R}^{N \times \tilde{d}}$ for X in the KPCA-transformed space. Based on $\hat{X}$, compute the parameters of the discriminant function (29) and the threshold T_k using Eq. (31).

(3) Perform an FDA transformation on $\hat{X}$ according to Eq. (9) to obtain $\bar{Z} = \{Z, L\} \in \mathbb{R}^{N \times (2C - 1)}$.

(4) Obtain the offline training results $Y = \{Y_1, \ldots, Y_C\}$, $Y_i \in \mathbb{R}^{n_i \times 2}$ by executing Algorithm 1, thereby deriving the offline SUMAP model.

(5) Calculate and plot the convex hull envelopes $\gamma = [\gamma_1, \ldots, \gamma_C]$ for each category based on Y.

2. Online monitoring

(1) Assume that $x_{new} \in \mathbb{R}^{m}$ is a new sample obtained from a real-time industrial process.

(2) Use Eq. (8) to map the sample to the KPCA-transformed space as $\hat{x}_{new} \in \mathbb{R}^{\tilde{d}}$ and input it into the trained CBLA model. Predict its label $\hat{l}$ based on Eq. (34).

(3) Obtain the KFDA features $z_{new} \in \mathbb{R}^{C - 1}$ of $x_{new} \in \mathbb{R}^{m}$ using Eq. (7).

(4) Input $\bar{z}_{new} = \{z_{new}, \hat{l}\} \in \mathbb{R}^{2C - 1}$ into Algorithm 2 to derive the final output y_new.

(5) Determine the type of the new sample based on the convex hull region into which y_new falls.
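The online monitoring steps can be summarized as a short orchestration sketch. Every callable passed in below is a hypothetical placeholder for a component defined elsewhere in the paper (Eq. (8), Eq. (34), Eq. (7), Algorithm 2, and the convex hull lookup); the sketch only shows how the pieces are chained and makes no claim about their internals.

```python
def monitor_sample(x_new, kpca_map, assign_label, kfda_features, project, locate):
    """Chain the online steps (2) to (5) for one new sample.

    kpca_map      : maps x_new to the KPCA space (placeholder for Eq. (8))
    assign_label  : returns the C-dimensional label vector (placeholder for Eq. (34))
    kfda_features : returns the (C - 1)-dimensional KFDA features (placeholder for Eq. (7))
    project       : embeds the concatenated vector in 2D (placeholder for Algorithm 2)
    locate        : returns the category whose envelope contains the 2D point, or None
    """
    x_hat = kpca_map(x_new)                 # step (2): KPCA mapping
    l_hat = assign_label(x_hat)             # step (2): CBLA label vector
    z_new = kfda_features(x_new)            # step (3): KFDA features
    z_bar = list(z_new) + list(l_hat)       # step (4): concatenation, length 2C - 1
    y_new = project(z_bar)                  # step (4): online projection
    return y_new, locate(y_new)             # step (5): envelope lookup
```

Passing the components as arguments keeps the sketch independent of any particular KPCA, KFDA, or projection implementation.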

4 Experimental results and analysis

This study selects the TE process and a real-world CCR process as research cases, both of which are representative of important chemical processes. The TE process is rich in data, highly representative, and widely used for evaluating chemical process monitoring methods. Moreover, the CCR process, as a critical chemical industrial process, features a complex workflow, involves multiple stages, and has several potential failure points, providing it with significant practical application value. The selection of these two cases facilitates a comprehensive evaluation and validation of the performance of various monitoring methods across different scenarios. All the experiments are conducted on a computer equipped with an Intel(R) Core(TM) i7-12700 CPU@2.10 GHz and 16.00 GB of RAM using the evaluation metrics described in Section 3.3.

4.1 TE process

TE process data [41], which are derived from simulations of actual chemical processes, have been widely used to evaluate various process monitoring methods [42−45]. Detailed information about the data set can be obtained from the GitHub website. The data set comprises 500 samples for the normal-state training set and 960 samples for the test set. Each fault type contains 480 training samples and 800 test samples. The diversity of the fault types and the abundance of samples provide a robust foundation for validating the practical performance of methods across a wide range of scenarios. While some studies achieve enhanced performance by selecting a subset of representative variables, this approach may overlook some useful variables. In this study, all 52 variables are utilized for training and testing.

4.1.1 Visualization performance comparison among SUMAP, t-SNE, UMAP, and DSNE

Faults 1, 2, and 6 exhibit significant step changes and are combined with the normal data set to evaluate the visualization performance of the proposed method. t-SNE and UMAP are widely used visualization techniques, whereas DSNE [33] is an improved variant of t-SNE. These three methods are included as baselines for comparison. The perplexity of t-SNE is set to 40, with a learning rate of 85. The parameters for UMAP and SUMAP are set to the same values, with n = 100, k = 100, and min_dist = 0.005. The parameters for DSNE remain the same as those employed in the original paper [33]. The μ_M metric proposed in Section 3.3 is used to quantify the visualization performance of different methods. Its $\mathrm{ACC}_{KNN}$ component is computed as follows: for each sample, its n nearest neighbors (with n = 20 in this study) are identified, and the class label that appears most frequently among these neighbors is assigned as the predicted label. A sample is considered correctly visualized if its predicted label matches its true label. $\mathrm{ACC}_{KNN}$ is then computed as the ratio of the number of correctly labeled samples to the total number of samples. The metric calculation results obtained for all four methods are shown in Fig.3. In addition to the μ_M metric comparison, the other previously discussed metrics (the ICD, ICC, and ACC) are also presented to provide a better understanding of the μ_M behavior. The results show that SUMAP achieves the highest μ_M score, primarily because of its superior ICC and ACC values. Although t-SNE attains the highest ICD score, its overall μ_M performance is the lowest because of its poor ICC and ACC results. In terms of ACC, SUMAP achieves nearly 100% accuracy, whereas the differences among the other methods are relatively minor.

Additionally, the 2D visualization results obtained for the four methods are presented in Fig. S2 (cf. ESM). UMAP exhibits a slight improvement over t-SNE, but neither method effectively distinguishes between faults 1 and 7, where the normal data are interfered with. Similarly, DSNE demonstrates a slight improvement over t-SNE, but its overall performance does not show a significant enhancement. A careful examination of the visualizations produced for t-SNE and SUMAP reveals that t-SNE results in a larger data distribution range, which explains why, despite visually appearing more compact between different categories, t-SNE achieves better ICD scores. On the other hand, although SUMAP provides the best visual separation effect, its overall data distribution is more compact, which results in a lower ICD score. This suggests that one should not blindly pursue higher ICD or lower ICC values. For example, an in-depth analysis of specific types of faults might suffer from the loss of detailed information if the ICC is too low. Notably, only SUMAP achieves clear separation between different categories, demonstrating superior interclass differentiation and classification visualization capabilities. Finally, the dimensionality reduction execution efficiency of the algorithm requires further discussion. In this case, 4160 samples are used for visualization purposes. Table S2 (cf. ESM) presents the model fitting times of the four dimensionality reduction methods. It is evident that UMAP demonstrates the best computational efficiency, whereas t-SNE and its variants require nearly twice the computational time. The computational time required by SUMAP slightly increases, which is attributed to the additional calculations required for the new distance metric. Overall, both the quantitative and visual analyses confirm that SUMAP provides the best classification and visualization performance, benefiting significantly from the integration of auxiliary category labels.

4.1.2 Case study concerning faults 1, 2, 4, 5, 6, and 7

The normal state and faults 1 and 2 are easily distinguishable, whereas faults 4 and 5 overlap with the normal state in most studies. Fault 3 is recognized as a challenging fault to detect due to its minimal fault variation [46]. In this study, all step faults except fault 3 are selected for experimentation, resulting in totals of 3380 training samples and 5760 testing samples. Four methods, KFDA-KNN [27], FDA-t-SNE [20], VWFDA [21], and CCA-SOM [22], are employed to compare and validate the effectiveness of the proposed method. The proposed method uses a Laplacian kernel function with a bandwidth of 1/52, and its KPCA dimensionality is set to 100. The significance levels α_a and α_b are set to 0.9 and 0.1, respectively. The number of nearest neighbors is configured as 50, and min_dist = 0.2. The parameters of the remaining methods are configured according to the original papers.

Tab.1 presents the evaluation metrics produced by different methods in this case, including precision, recall, and the F1 score. The results obtained for KFDA-KNN and CCA-SOM are based on the predicted labels generated by the KNN classifier and SOM model, respectively, whereas the results of the other methods are computed using the envelope approach described in Section 3.3. The results show that FDA-t-SNE and CCA-SOM perform poorly, likely because of their reliance on linear feature extraction methods. The performance of KFDA-KNN also suggests that nonlinear feature extraction methods may be better choices. VWFDA achieves good results, as it amplifies the influences of fault features by weighting the key characteristics. The proposed SUMAP-LA method achieves the best performance, with all three metrics approaching 100%. A further examination of the two-dimensional visualization results of these methods facilitates an intuitive comparison. Owing to the higher dimensionality of the KFDA-KNN method, which exceeds 2, direct visualization is not possible. Therefore, UMAP is used to generate a 2D projection. The 2D projection plots produced for the training and testing phases of all four methods are shown in Figs. S3 and S4 (cf. ESM). Among these methods, FDA-t-SNE and VWFDA exhibit some class overlap, particularly FDA-t-SNE. Additionally, their online projection plots show significant degrees of drift for many testing samples, likely owing to their reliance on neural network models for online projection, which results in an unstable visualization process and requires multiple training iterations. KFDA-KNN achieves good separation but still has some misclassified points. In contrast, the proposed method demonstrates excellent fault separation capabilities, enhances the intraclass cohesion effect, and significantly improves the reliability of online projections.

4.1.3 Case study involving multiple fault types

In actual industrial processes, faults often manifest in various forms, making it crucial to evaluate the performance achieved by the SUMAP-LA method when handling multiple types of faults. In this case, a data set comprising normal operations and faults 4, 8, 14, and 17 is constructed to test this ability. The data set includes 2420 training samples and 4160 test samples, covering four types of faults: step, random, sticky, and unknown faults. Considering the slight reduction in the data set size, the number of neighbors is adjusted to 45, and min_dist is set to 0.1 to preserve more local information, while the remaining parameters remain unchanged.

In this case, quantitative metrics are used to evaluate the performance of all methods. The final experimental results show that when encountering different types of faults, all methods experience certain performance degradations, particularly FDA-t-SNE. This decline is primarily because FDA, which is a linear approach, struggles to effectively capture features in complex data. VWFDA and CCA-SOM exhibit better performance, likely because their visualization methods are more stable, especially since VWFDA employs multiple ELM models rather than the BP model used in FDA-t-SNE. The performance of KFDA-KNN is comparable to that of CCA-SOM, which may be due to the inability of the KNN classifier to achieve optimal classification effects. The performance of SUMAP-LA also exhibits some decline, but compared with those of the other methods, this reduction is relatively small, and it remains the best-performing model. The detailed metrics obtained for this case are shown in Table S3 (cf. ESM).

Fig.4 presents a comparison between the normalized confusion matrices obtained for the two best-performing methods. It is clear that VWFDA performs poorly on faults 14 and 17, as it fails to distinguish them correctly, whereas SUMAP-LA demonstrates good classification performance. Upon analyzing the faults, this may be due to fault 17 also involving a sticking fault. The SUMAP-LA method exhibits slightly worse performance for faults 8 and 17, where some faults are incorrectly classified as normal. Step faults are relatively easy to distinguish with both methods. Next, a further comparison between their two-dimensional visualization results is conducted. VWFDA and the proposed method are selected for comparison because of their superior performance. The visualization results of training and testing are shown in Fig. S5 (cf. ESM). During the offline training phase, VWFDA fails to distinguish between faults 14 and 17, with relatively small distances between these faults. In contrast, SUMAP-LA achieves perfect separation. The online results indicate that VWFDA results in misclassifications for faults 8, 14, and 17, with many samples incorrectly classified. SUMAP-LA also yields some misclassifications, with some faults being categorized as normal, possibly due to subtle changes induced during the initial stages of the faults. However, overall, the proposed method demonstrates superior visual separation capabilities. This finding indicates that SUMAP-LA can effectively distinguish between different faults and accurately project online samples into their corresponding fault regions when handling various types of faults.

4.1.4 Case studies involving numerous fault conditions

SUMAP-LA is capable of identifying various types of faults. Additionally, a further analysis is needed to evaluate the performance of this method in the presence of multiple faults. To increase the complexity of fault detection, as many faults as possible are selected from the 21 fault types. Ultimately, ten faults are chosen in addition to the normal state: faults 1, 2, 4, 5, 6, 7, 12, 14, 17, and 19. The training set comprises 5300 samples, whereas the test set contains 8960 samples, covering a wide range of fault types. Owing to the large size of the data set, to balance the efficiency of the tested methods and the final results, the number of neighbors is set to 70. The min_dist parameter is set to 0.02 to obtain a denser embedding. The remaining parameters are kept unchanged.
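The data set sizes quoted for the cases in Sections 4.1.2 to 4.1.4 follow directly from the per-class sample counts given in Section 4.1 (500 and 960 normal samples for training and testing, and 480 and 800 samples per fault). The short check below, with a hypothetical helper name, reproduces that arithmetic.

```python
def te_case_sizes(n_faults, normal_train=500, normal_test=960,
                  fault_train=480, fault_test=800):
    """Return (training size, test size) for a case with the normal state plus n_faults faults."""
    train = normal_train + n_faults * fault_train
    test = normal_test + n_faults * fault_test
    return train, test

# Section 4.1.2: six step faults (1, 2, 4, 5, 6, and 7)
print(te_case_sizes(6))    # (3380, 5760)
# Section 4.1.3: four faults (4, 8, 14, and 17)
print(te_case_sizes(4))    # (2420, 4160)
# Section 4.1.4: ten faults
print(te_case_sizes(10))   # (5300, 8960)
```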

The final performance metrics indicate that, when faced with many different fault types and quantities, the experimental results of all methods decline again. Among the tested approaches, FDA-t-SNE yields the worst performance, with a significant decline, suggesting that as the complexity of the data set increases, this method struggles to distinguish between different faults. This may be due to the linear limitations of the FDA approach and its reliance on a single BP model to learn online mapping relationships. KFDA-KNN outperforms CCA-SOM but still operates at a relatively low level. VWFDA demonstrates good precision but poor recall, indicating that many faults are missed. The proposed method remains the best-performing approach, with only a slight decrease in performance relative to the previous cases. The detailed experimental results can be found in Table S4 (cf. ESM).

A further examination of the visualization results produced by VWFDA and SUMAP-LA is presented. As shown in Fig.5, SUMAP-LA yields more dispersed and distinct training and testing results for different types of faults, with no overlapping regions. In contrast, the VWFDA method results in partial overlap for faults 1, 4, 12, and 17. Its online results are suboptimal at the boundaries, with many points projected outside the envelope. This is because VWFDA relies on ELM models for online projection purposes, which may require multiple iterations to achieve the optimal projection effect. Even with a small convergence loss, its online performance might still be poor. Conversely, the SUMAP-LA method performs online data transformation based on previously learned manifold structures, offering better stability. In the online testing results, some fault 12 samples are misclassified as being fault 5. Upon further examining the causes of the faults on the GitHub website of the data set, it can be observed that both faults are caused by variations in the condenser cooling water inlet temperature, which explains the projection results. Overall, the proposed method delivers superior visualizations and effectively separates different faults.

Overall, in the TE scenario, SUMAP-LA performs exceptionally well across various settings, with an average F1 score exceeding 95%. The visualization results derived from different cases demonstrate that the improved SUMAP approach effectively utilizes label information acquired from the training data, enhancing the low-dimensional projection process and creating well-separated envelopes. Additionally, during online projection, the label allocation mechanism based on CBLA provides preassigned labels for the online samples, guiding their low-dimensional projection and ultimately improving both the visualization quality and fault classification performance of the model.

4.2 CCR process

Although the TE process data set provides abundant experimental data for validating the proposed process monitoring methods and demonstrating their adaptability across different fault scenarios, the data were obtained through simulations implemented under a simplified process structure. In contrast, the CCR process is a key process in the petrochemical industry for producing gasoline and aromatics. It involves a more complex workflow and multiple stages, which present greater challenges in terms of detecting faults and maintaining system stability. Given this, conducting a case study involving the CCR process can further validate the applicability of the proposed method in complex, real-world chemical processes. The simplified naphtha CCR process consists of four continuous reactors, four heaters, one catalyst regeneration unit, one stabilizer, and other utility units (refer to Fig. S6, cf. ESM). The feedstock is a mixture of pretreated materials, crude oil, and heavy naphtha produced by hydrocracking, with boiling points ranging between 30 and 200 °C. The entire reaction process is maintained at approximately 525 °C, and the feedstock is required to pass through multiple reactors and heaters to ensure a complete reaction. Finally, the hydrogen and reformate products are separated via a distillation tower. Throughout the process, the catalyst regeneration unit maintains the activity of the catalyst, ensuring long-term, high-yield system operations [47,48].

In this section, the research data are obtained from the catalytic reforming unit of a refinery, with data sampled once every 30 min. From the collected data, $m = 18$ key variables are selected, forming a data set $X \in \mathbb{R}^{n \times m}$ containing n = 3000 samples. These samples consist of 1000 normal samples and four types of fault samples. Each fault type contains 500 samples, which represent unknown faults, step faults, drift faults, and random faults. A complex coupling relationship is observed between the data variables, which makes actual visual process monitoring highly difficult.

4.2.1 Exploration of the hyperparameters of SUMAP-LA

To gain a deeper understanding of the proposed method, we conduct a detailed study of the hyperparameter λ and the significance level α. λ is calculated according to Eq. (18), with a value range of [0, 1]; it measures the similarity between the data distributions of two categories. This parameter directly affects the calculation of the distance metric d_s in both the offline and online stages, thereby influencing the construction of the adjacency graph. Larger values of λ amplify the influence of class information, ultimately leading to greater separation between data belonging to different classes. When λ is set to zero, the effect of class information is completely ignored. Since the parameter is calculated based on JS divergence, its value dynamically changes according to class distribution differences. When two classes are close to each other, the expected value is smaller, thereby reducing the influence of class labels and decreasing the ICD. This strategy better aligns with the actual distribution of the data. In Fig. S7 (cf. ESM), the variations exhibited by the ICD and ICC metrics produced for the low-dimensional embedding under different λ values are demonstrated using the CCR data set, and these variations are consistent with our expectations.

The parameter α = (α_b, α_a), α_a > α_b is another important factor, as it influences the types of labels assigned by the CBLA by controlling the upper and lower limits of the threshold, thereby affecting the final performance. Specifically, α_a affects the lower limit of the threshold. The smaller the value of α_a is, the better the recognition ability of the model for unknown classes, but it also becomes more sensitive to noise. In practice, α_a can be adjusted based on the amount of noise contained in the input data. On the other hand, α_b determines the upper limit of the threshold. The larger its value is, the smaller the upper limit, resulting in more labels being assigned as one-hot labels. Typically, when the predictions obtained for the given data are sufficiently accurate, this value can be increased. Fig. S8 (cf. ESM) illustrates the changes exhibited by the $\mathrm{ACC}_{KNN}$ metric on the CCR data set under different α combinations. Notably, the best performance is achieved when α = (0.1, 0.9); thus, this setting is used in this case study.

4.2.2 Comparative experiments on the CCR case

To evaluate the ability of the tested methods to handle out-of-sample issues, all data except for the unknown fault samples are divided into training and testing sets at a 1:1 ratio. The unknown fault samples are directly used for online testing to verify the performance of the methods when addressing unknown data. For comparison purposes, the VWFDA method, which performs well in the TE case, is selected, and all data sets are standardized. Each ELM model in VWFDA is trained 20 times, with the best-performing model (mean squared error < 1) chosen. For SUMAP-LA, the Laplacian kernel function is selected, with its bandwidth set to 1/18 and the number of neighbors n = 30. The significance level is set to (0.1, 0.9), and min_dist is set to 0.1. For the VWFDA method, the perplexity of t-SNE is adjusted to 40, the learning rate is set to 50, and the remaining parameters remain unchanged.

The detailed metrics produced by both methods in the CCR case are presented in Table S5 (cf. ESM). The experimental results indicate that the proposed SUMAP-LA method outperforms VWFDA, with all of its metrics exceeding those of the VWFDA method. Figure S9 (cf. ESM) further presents the normalized confusion matrices produced for both methods, providing more detailed information. An examination of the classification results obtained for each category reveals that SUMAP-LA results in only a few misclassifications between normal data and drift faults. In contrast, the VWFDA method performs poorly on three fault types, with many step and drift faults misclassified as random faults. This could be due to the occurrence of variable changes in the random faults that resemble the characteristics of other fault types.

Fig.6(a) and Fig.6(b) show the visualization results produced for the VWFDA method. Despite the ELM model being trained multiple times, the online visualization results are still suboptimal, with many points exhibiting significant degrees of drift. Additionally, the fault regions of the three fault types overlap, resulting in poor visualization performance. The unknown faults (blue sample points) are projected into several different regions, likely due to weighting strategy differences. Some unknown points are projected into the normal region, which can interfere with the normal visualization monitoring procedure. In contrast, Fig.6(c) and Fig.6(d) display the 2D visualization projections produced for the SUMAP-LA method. Both the training and testing results show that different types of samples are effectively classified, forming well-separated envelopes, with the samples of different categories being very compact. Upon a careful examination of the local zoomed-in results obtained for each type of fault, the visualization reflects the characteristic information of the faults. For example, the drift fault exhibits a gradual transition process, whereas the random fault shows a uniform distribution in all directions. However, the step fault shows two clusters, which may be due to its small number of neighbors, leading to the formation of false clusters. Therefore, it is recommended to set n close to the number of cluster samples when performing visualization. However, this results in a significant increase in the model execution time, so a tradeoff needs to be made. In Fig.6(d), the blue sample points represent the projection area produced for unknown faults during the online phase. Since the lower bound of the threshold is set, some samples that differ significantly from the training data are classified as unknown data.
These samples are initialized into unrelated regions via Algorithm 2, which prevents interference with the previously classified regions while alerting the operator to the occurrence of an unknown fault. Finally, a comparison between the offline training times of the two methods is provided. As shown in Table S6 (cf. ESM), SUMAP-LA demonstrates higher efficiency. While the calculation of the kernel functions increases the complexity of the UMAP method, its speed advantage makes SUMAP-LA the more efficient approach among the two methods. Importantly, the training time required for the neural networks contained in the VWFDA method is not included in this comparison. In contrast, SUMAP-LA does not require additional neural network training. Once the offline data training process is complete, new data can be directly embedded into the previously trained space with high stability.

4.2.3 Ablation study of SUMAP-LA in the CCR case

To illustrate the role of each component in the proposed method, an ablation study is conducted based on the CCR process. This analysis gradually removes each component to demonstrate its necessity and effectiveness. The complete SUMAP-LA method serves as the baseline model, and the indicators proposed in Section 3.3 are used for evaluation purposes. Detailed information about the components can be found in Table S7 (cf. ESM).

Based on the aforementioned components, multiple models are constructed and trained on the same CCR data set. The related parameter settings remain consistent with those in Section 4.2.2. Model 1 serves as the baseline model, utilizing the parameters described in Section 4.2.2. Models 2 and 3 are derived from Model 1 with the same parameters but with specific components removed. Model 2 excludes the discriminative feature extraction component. Specifically, the original data are directly applied to the SUMAP method for training, with online labels obtained by inputting the raw data into a Bayesian classifier and assigned according to a predefined rule. Model 3 removes the CBLA component. To ensure the proper functioning of Model 3, the similarity of new sample points is calculated via the Euclidean distance measure. Importantly, unknown fault data are not included during training.

The final experimental results are presented in Tab.2, where $\mu_M$ measures the visualization performance of the models. Model 1, as the baseline model, achieves the best performance. When Component 1 is removed, the performance declines significantly, indicating that extracting FDA features in the kernel space helps improve the final results. Model 3, which does not use predicted labels but instead computes similarity directly on the basis of the Euclidean distance, also exhibits a performance drop, suggesting that incorporating the predicted labels when projecting online samples yields more accurate classification. A comparison between the $\mu_M$ values of Models 1 and 3 demonstrates that Component 3 enhances the quality of the low-dimensional visualization. Overall, each component contributes to the improvement exhibited by the final results.
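To make the label-free fallback used in Model 3 more concrete, the following sketch places a new sample in an existing 2D embedding using only Euclidean distances to the training samples. It is an illustrative assumption rather than the procedure used in the paper: the function name `euclidean_knn_init`, the inverse-distance weighting, and the choice of `k` are all hypothetical.

```python
import math


def euclidean_knn_init(x_new, X_train, Y_train, k=3, eps=1e-12):
    """Place x_new in the embedding as the inverse-distance weighted
    average of the embedded coordinates of its k nearest training samples.

    X_train: high-dimensional training samples (list of lists).
    Y_train: their low-dimensional embedded coordinates (list of lists).
    """
    # Euclidean distance from the new sample to every training sample.
    dists = [math.dist(x_new, x) for x in X_train]
    # Indices of the k closest training samples.
    nearest = sorted(range(len(X_train)), key=lambda i: dists[i])[:k]
    # Closer neighbors receive larger weights; eps avoids division by zero.
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    total = sum(weights)
    dim = len(Y_train[0])
    return [
        sum(w * Y_train[i][d] for w, i in zip(weights, nearest)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    X_train = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
    Y_train = [[0.0, 0.0], [2.0, 0.0], [50.0, 50.0]]
    print(euclidean_knn_init([0.5, 0.0], X_train, Y_train, k=2))
```

Because no class information enters this placement, a sample lying between two classes in the input space lands between their embedded regions, which is consistent with the reduced separation reported for Model 3.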

5 Conclusions

This paper proposes a novel SUMAP-LA visual monitoring method that leverages category information to increase both its accuracy and visual effectiveness. The method consists of two components: SUMAP, which incorporates category information into UMAP to improve the manifold learning and online projection procedure, and CBLA, which combines Bayesian rules and KFDA to flexibly assign labels for handling different types of data. Case studies conducted on TE and CCR data sets demonstrate the superior accuracy and visual separation of the proposed method relative to the compared state-of-the-art approaches, as well as its ability to handle unknown data. Future work will focus on updating the model dynamically through incremental learning, further enhancing its adaptability to unknown data.

References

[1]	Wu P , Yu G , Han Y , Ma B . Cross-domain acoustic diagnosis method of rotating machinery based on vibration and acoustic migration. IEEE Transactions on Reliability, 2025, 1–15

[2]	Choi Y , Bhadriaju B , Cho H , Lim J , Han I S , Moon I , Kwon J S I , Kim J . Data-driven modeling of multimode chemical process: validation with a real-world distillation column. Chemical Engineering Journal, 2023, 457: 141025

[3]	Wang Z , Gong R , Song L , He S , Gao Y . A data-driven monitoring scheme for multivariate multimodal data. Computers & Industrial Engineering, 2024, 192: 110186

[4]	Lu W , Yan X . Visualizing high-dimensional industrial process based on deep reinforced discriminant features and a stacked supervised t-distributed stochastic neighbor embedding network. Expert Systems with Applications, 2021, 186: 115389

[5]	Li Z , Ying Y , Yang M , Zhao L , Zhao L , Du W . Monitoring and path optimization of catalytic reformer in a refinery: principal component analysis and A* algorithm application. Expert Systems with Applications, 2022, 209: 118358

[6]	Zheng J , Ye L , Ge Z . Semi-supervised process monitoring based on self-training PCA model. Process Safety and Environmental Protection, 2024, 187: 1311–1321

[7]	Palla G L P , Pani A K . Independent component analysis application for fault detection in process industries: literature review and an application case study for fault detection in multiphase flow systems. Measurement, 2023, 209: 112504

[8]	Wang B , Pan H , Yang W . Robust bearing degradation assessment method based on improved CVA. IET Science, Measurement & Technology, 2017, 11(5): 637–645

[9]	Jiang Q , Gao F , Yi H , Yan X . Multivariate statistical monitoring of key operation units of batch processes based on time-slice CCA. IEEE Transactions on Control Systems Technology, 2018, 27(3): 1368–1375

[10]	Lu W , Yan X . Balanced multiple weighted linear discriminant analysis and its application to visual process monitoring. Chinese Journal of Chemical Engineering, 2021, 36: 128–137

[11]	Zhang W , Gao S , He X . An improved LLE-based cluster security approach for nonlinear system fault diagnosis. Cluster Computing, 2019, 22(S3): 5663–5673

[12]	Rosman G , Bronstein M M , Bronstein A M , Kimmel R . Nonlinear dimensionality reduction by topologically constrained isometric embedding. International Journal of Computer Vision, 2010, 89(1): 56–68

[13]	Arena P , Patanè L , Spinosa A G . Data-based analysis of Laplacian Eigenmaps for manifold reduction in supervised liquid state classifiers. Information Sciences, 2019, 478: 28–39

[14]	Shirani Faradonbeh R , Shaffiee Haghshenas S , Taheri A , Mikaeil R . Application of self-organizing map and fuzzy c-mean techniques for rockburst clustering in deep underground projects. Neural Computing & Applications, 2020, 32(12): 8545–8559

[15]	Zhu J , Mahalec V , Fan C , Yang M , Qian F . Multiple input self-organizing-map ResNet model for optimization of petroleum refinery conversion units. Frontiers of Chemical Science and Engineering, 2023, 17(6): 759–771

[16]	Wang X , Li J , Zheng Y , Li J . Smart systems engineering contributing to an intelligent carbon-neutral future: opportunities, challenges, and prospects. Frontiers of Chemical Science and Engineering, 2022, 16(6): 1023–1029

[17]	Van der Maaten L , Hinton G . Visualizing data using t-SNE. Journal of Machine Learning Research, 2008, 9(11): 2579–2605

[18]	Liukkonen M , Hiltunen Y , Laakso I . Advanced monitoring and diagnosis of industrial processes. EUROSIM'13: Proceedings of the 2013 8th EUROSIM Congress on Modelling and Simulation. USA: IEEE Computer Society, 2013, 112–117

[19]	Wang W , Yu Z , Ding W , Jing Q . Deep discriminative feature learning based on classification-enhanced neural networks for visual process monitoring. Journal of the Taiwan Institute of Chemical Engineers, 2024, 156: 105384

[20]	Tang J , Yan X . Neural network modeling relationship between inputs and state mapping plane obtained by FDA-t-SNE for visual industrial process monitoring. Applied Soft Computing, 2017, 60: 577–590

[21]	Lu W , Yan X . Variable-weighted FDA combined with t-SNE and multiple extreme learning machines for visual industrial process monitoring. ISA Transactions, 2022, 122: 163–171

[22]	Chen X , Yan X . Using improved self-organizing map for fault diagnosis in chemical industry process. Chemical Engineering Research & Design, 2012, 90(12): 2262–2277

[23]	Song Y , Jiang Q , Yan X . Fault diagnosis and process monitoring using a statistical pattern framework based on a self-organizing map. Journal of Central South University, 2015, 22(2): 601–609

[24]	Benatia M A , Chabane A N , Sahnoun M , Bettayeb B . Fault diagnosis using deep neural networks for industrial alarm sequence clustering. Applied Intelligence, 2025, 55(3): 220

[25]	Lu W , Yan X . Deep double supervised embedding neural network enhancing class separation for visual high-dimensional industrial process monitoring. IEEE Transactions on Industrial Informatics, 2020, 17(9): 6357–6367

[26]	Joswiak M , Peng Y , Castillo I , Chiang L H . Dimensionality reduction for visualizing industrial chemical process data. Control Engineering Practice, 2019, 93: 104189

[27]	Zhu Z B , Song Z H . A novel fault diagnosis system using pattern classification on kernel FDA subspace. Expert Systems with Applications, 2011, 38(6): 6895–6905

[28]	McInnes L , Healy J , Melville J . UMAP: uniform manifold approximation and projection for dimension reduction. 2018, arXiv preprint arXiv:1802.03426

[29]	Chang T , Liu T , Ma X , Wu Q , Wang X , Cheng J , Wei W , Zhang F , Liu H . Fault detection in industrial wastewater treatment processes using manifold learning and support vector data description. Industrial & Engineering Chemistry Research, 2024, 63(35): 15562–15574

[30]	Hozumi Y , Wang R , Yin C , Wei G W . UMAP-assisted K-means clustering of large-scale SARS-CoV-2 mutation datasets. Computers in Biology and Medicine, 2021, 131: 104264

[31]	Becht E , McInnes L , Healy J , Dutertre C A , Kwok I W H , Ng L G , Ginhoux F , Newell E W . Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology, 2019, 37(1): 38–44

[32]	Kuzmanovski I , Dimitrovska-Lazova S , Aleksovska S . Classification of perovskites with supervised self-organizing maps. Analytica Chimica Acta, 2007, 595(1–2): 182–189

[33]	Zheng J W , Qiu H , Jiang Y B , Wang W L . Discriminative stochastic neighbor embedding analysis method. Computer-Aided Design & Computer Graphics, 2012, 24(11): 1477–1484

[34]	Meng L , van den Elzen S , Pezzotti N , Vilanova A . Class-constrained t-SNE: combining data features and class probabilities. IEEE Transactions on Visualization and Computer Graphics, 2023, 30(1): 164–174

[35]	Yang J , Jin Z , Yang J , Zhang D , Frangi A F . Essence of kernel Fisher discriminant: KPCA plus LDA. Pattern Recognition, 2004, 37(10): 2097–2100

[36]	McInnes L , Healy J , Saul N , Großberger L . UMAP: uniform manifold approximation and projection. Journal of Open Source Software, 2018, 3(29): 861

[37]	Lou C , Atoui M A , Li X . Novel online discriminant analysis based schemes to deal with observations from known and new classes: application to industrial systems. Engineering Applications of Artificial Intelligence, 2022, 111: 104811

[38]	Ding S , Zhang P , Ding E , Naik A , Deng P , Gui W . On the application of PCA technique to fault diagnosis. Tsinghua Science and Technology, 2010, 15(2): 138–144

[39]	Gamby A N , Katajainen J . Convex-hull algorithms: implementation, testing, and experimentation. Algorithms, 2018, 11(12): 195

[40]	McInnes L , Healy J , Astels S . hdbscan: Hierarchical density based clustering. Journal of Open Source Software, 2017, 2(11): 205

[41]	Downs J J , Vogel E F . A plant-wide industrial process control problem. Computers & Chemical Engineering, 1993, 17(3): 245–255

[42]	Li T , Han Y , Hu X , Ma B , Geng Z . Twofold weighted-based statistical feature KECA for nonlinear industrial process fault diagnosis. IEEE Transactions on Automation Science and Engineering, 2024, 22: 3901–3910

[43]	Li T , Han Y , Wang Y , Geng Z . A self-attention mechanism integrating adaptive double subspace for fault detection in industrial processes. IEEE Transactions on Systems, Man, and Cybernetics. Systems, 2024, 55(1): 540–549

[44]	Chen Y , Bai H , Li S , Zhou X . Dynamic non-Gaussian and nonlinear industrial process monitoring using deep analysis of hybrid characteristics. IEEE Transactions on Instrumentation and Measurement, 2025, 74: 1–12

[45]	Peng X , Tang Y , Du W , Qian F . Performance monitoring of non-Gaussian chemical processes with modes-switching using globality-locality preserving projection. Frontiers of Chemical Science and Engineering, 2017, 11(3): 429–439

[46]	Wu P , Zhang X , He J , Lou S , Gao J . Locality preserving randomized canonical correlation analysis for real-time nonlinear process monitoring. Process Safety and Environmental Protection, 2021, 147: 1088–1100

[47]	Li Z , Xue K , Chen J , Peng X . A quality-driven multi-attribute channel hybrid neural network for soft sensing in refining processes. Measurement, 2025, 250: 117061

[48]	Wei M , Yang M , Qian F , Du W , He W , Zhong W . Dynamic modeling and economic model predictive control with production mode switching for an industrial catalytic naphtha reforming process. Industrial & Engineering Chemistry Research, 2017, 56(31): 8961–8971