Cationic organobismuth complex as an effective catalyst for conversion of CO2 into cyclic carbonates

Xiaowen ZHANG, Weili DAI, Shuangfeng YIN, Shenglian LUO, Chak-Tong AU

Front. Environ. Sci. Eng. ›› 2009, Vol. 3 ›› Issue (1) : 32-37. DOI: 10.1007/s11783-008-0068-y

Research article


Abstract

In order to achieve high-efficiency conversion of CO2 into valuable chemicals, and to exploit new applications of organobismuth compounds, a cationic organobismuth complex with a 5,6,7,12-tetrahydrodibenz[c,f][1,5]azabismocine framework was examined for the first time as a catalyst for the coupling of CO2 with terminal epoxides into cyclic carbonates, using tetrabutylammonium halides as co-catalysts under solvent-free, mild conditions. The catalyst exhibited high activity and selectivity for the coupling of CO2 with a wide range of terminal epoxides. The selectivity to propylene carbonate reached 100%, and the maximum turnover frequency was 10740 h-1 at 120°C and 3 MPa CO2 pressure when tetrabutylammonium iodide was used as co-catalyst. Moreover, the catalyst is environmentally friendly, resistant to air and water, and can be readily reused and recycled without loss of activity, demonstrating potential for industrial application.

Keywords

cationic organobismuth complex / terminal epoxide / carbon dioxide / coupling / cyclic carbonate

Cite this article

Xiaowen ZHANG, Weili DAI, Shuangfeng YIN, Shenglian LUO, Chak-Tong AU. Cationic organobismuth complex as an effective catalyst for conversion of CO2 into cyclic carbonates. Front Envir Sci Eng Chin, 2009, 3(1): 32‒37 https://doi.org/10.1007/s11783-008-0068-y

1 Introduction

Wastewater treatment and reclamation are essential strategies for alleviating the water crisis caused by water scarcity and pollution. Membrane bioreactor (MBR) technology, a combination of membrane separation and biological treatment (Yamamoto et al., 1989), has been widely used in recent years owing to its advantages of excellent and stable effluent quality, small footprint, and low residual sludge production (Xiao et al., 2014; Krzeminski et al., 2017; Xiao et al., 2019; Qu et al., 2022). However, membrane fouling during MBR operation leads to decreased flux, deteriorated separation efficiency, increased energy consumption, and shortened membrane lifespan, limiting the techno-economic sustainability of the technology (Xiao et al., 2019; Qu et al., 2022).
Membrane fouling is closely related to membrane properties, mixed liquor properties, and operating conditions. The fouling behavior of membrane materials, such as polyvinylidene fluoride, polyethersulfone, polyethylene, and polyacrylonitrile, varies depending on their hydrophilicity/hydrophobicity, pore structure, and surface roughness, all of which influence the membrane-foulant interaction (Yamato et al., 2006; Zhang et al., 2008). Generally, a hydrophobic membrane is more prone to fouling than a hydrophilic one (Choi et al., 2002). Membrane pore size and foulant size play interactive roles in fouling. Foulants of comparable size to the pores can tightly clog them owing to size exclusion (Meireles et al., 1991), whereas smaller foulants can cause adsorptive fouling within the pores (Kawakatsu et al., 1993). A narrower pore size distribution is favorable for maintaining a stable flux (Shimizu et al., 1990; Meireles et al., 1991). The membrane surface roughness influences the fluid dynamics for particle deposition at the micrometer scale and the intermolecular contact for foulant adsorption at the nanometer scale (Xu et al., 2020). Empirically, a smoother membrane surface tends to impede cake layer formation (Vatanpour et al., 2011; Sadeghi et al., 2013; Panda et al., 2015).
Mixed liquor suspended solids (MLSS), extracellular polymeric substances (EPS), and soluble microbial products (SMP) all contribute to membrane fouling in MBRs. MLSS is responsible for overall fouling, especially at concentrations above 10 g/L. EPS and SMP, on the other hand, are critical contributors to physically irreversible fouling. MLSS, EPS, and SMP are closely related to the operating conditions of the MBR, such as hydraulic retention time (HRT), sludge retention time (SRT), and food-to-microorganism ratio (Huang et al., 2011). For instance, an excessively short HRT or long SRT can lead to high concentrations of MLSS and SMP, which can exacerbate fouling. Nonetheless, unusually low MLSS concentrations can also exacerbate fouling, probably due to the excessive release of EPS (Yoon, 2015).
Fouling control strategies can be mainly categorized into three groups: mixed liquor conditioning, adjustment of filtration conditions, and physical/chemical cleaning of the fouled membranes (Meng et al., 2017). Mixed liquor conditioning involves the addition of materials such as suspended carriers, particles, coagulants, ozone, and other chemicals (Wu and Huang, 2008; Kurita et al., 2014, 2015; Juntawang et al., 2017; Zhang et al., 2017; Zhang et al., 2022). These additives can tailor the properties of SMP and EPS and affect the floc structure (Wu et al., 2006; Wu and Huang, 2008; Juntawang et al., 2017; Zhang et al., 2017). The overall fouling rate and cake/gel layer reversibility are influenced by aeration intensity and filtration/relaxation intervals (Liu et al., 2020b). Aeration or air scouring removes foulants from the membrane surface by increasing the cross-flow shear. Physical cleaning (e.g., tap water rinsing, membrane effluent rinsing, staggered flow rinsing, backwash, and ultrasonic cleaning) can mitigate reversible fouling, whereas chemical cleaning (using acids, alkalis, oxidants, chelating agents, and surfactants) can further alleviate irreversible fouling. However, these fouling control measures are mainly taken a posteriori, based on the observation of fouling phenomena that have already occurred (such as an increase in transmembrane pressure, TMP), resulting in a certain lag between fouling and fouling control. Incorrect control measures can reduce effectiveness, waste energy and chemicals, and lead to membrane damage. Therefore, it is essential to apply suitable control measures with accurate timing and dosage. To achieve early warning and timely control of membrane fouling, it is crucial to develop a model that predicts the fouling tendency in its infancy.
Data-driven prediction and regulation models provide a new approach for precise control of membrane fouling. Researchers have made progress in fouling prediction using conventional statistical models. For instance, Zhang et al. (2012) established a partial least squares model (R2 = 0.84) to predict membrane flux from mixed liquor characteristics, including MLSS/EPS/SMP concentrations, relative hydrophobicity, average particle size, and osmotic pressure. Chen et al. (2022) developed a time series model taking temperature as a covariate and online cleaning events as switching variables to predict the trend of TMP in an anaerobic MBR (AnMBR) across seasons (R2 = 0.91). Conventional statistical models can well reflect the relationships between input and output variables. However, these models rely on a priori knowledge of these relationships and thus have the following shortcomings: (a) insufficient fitting accuracy, weak generalization ability, and poor adaptability to fresh samples, owing to underestimation of the complexity of the real relationships; and (b) low computational efficiency for big data analysis and slow convergence when dealing with multiple variables with complex interactions, because of the limitations of the model structure. The diversity and complexity of multiple interactive variables are inherent in actual membrane fouling systems, making the use of conventional statistical models challenging.
Machine learning is a more recent statistical development that complements conventional statistical models and has gained increasing attention in environmental engineering as a general artificial intelligence technology (Zhong et al., 2021). Born out of, but different from, statistical models, machine learning focuses on accurately estimating complex functions using a computer rather than providing statistical confidence intervals for these functions. Machine learning algorithms automatically "learn" from experience (data) to improve system performance (Bishop, 2006). As black-box models, they offer strong fitting ability, good adaptability, and high prediction accuracy when dealing with problems involving unknown response functions, complex variable relationships, and large amounts of data. These advantages provide good prospects for fouling prediction in MBR systems. In recent years, machine learning has gradually been applied to predicting membrane filtration performance, such as flux, resistance, and permeability, lending new quantitative support to the analysis of fouling mechanisms (Niu et al., 2022).
This study first introduces four common machine learning models, then reviews the application of machine learning models in predicting pollutant removal and fouling performance, and finally analyzes the limitations of existing models to help future researchers develop new machine learning models for improved fouling prediction in MBRs.

2 Principles and methods of machine learning

Machine learning constructs a model from "training data" to make predictions or decisions without explicit a priori assumptions or programming. Fig.1(a) and 1(b) depict the general and detailed steps of machine learning, respectively. Machine learning methods fall into supervised learning, unsupervised learning, and reinforcement learning (RL), as illustrated in Fig.2. To select an appropriate model, it is necessary to consider several factors, including model principles, problem type (classification, regression, time series, etc.), data volume, feature dimensions, and interpretation requirements. When several candidate models are available, the optimal one can be selected by comparing model performance, computational cost, and interpretability. Stemming from artificial intelligence, machine learning has found extensive application in environmental science and engineering, enabling the construction of predictive models from diverse data sets, the assessment of feature importance via model interpretation, anomaly detection through comparison with historical data, and the development of new materials (Zhong et al., 2022).
Fig.1 Machine learning process: (a) general procedure; (b) detailed steps.


Fig.2 Classification of machine learning methods.


2.1 Machine learning methods

This section mainly introduces three common machine learning methods: support vector machine (SVM), artificial neural network (ANN), and decision tree. Additionally, a concise overview of the k-nearest neighbor (KNN) algorithm is provided. These four methods have demonstrated their efficacy in addressing classification problems with discrete variables and regression problems with continuous variables. Tab.1 summarizes the four methods and their respective scopes of application.
Tab.1 Summary and comparison of machine learning methods
SVM (supervised learning)
• Advantages: clear mathematical basis; robust performance with limited sample size; rapid prediction; strong interpretability.
• Disadvantages: susceptible to missing data; high computational complexity; inappropriate for large data sets.
• Scope of environmental application: classification or regression problems where the sample size is not particularly large.
• Potential applications in MBRs: prediction of TMP and resistance (Liu et al., 2020a).

ANN (supervised learning)
• Advantages: diversified variants and innovations; wide range of applicable sample sizes; suitable for a variety of input data forms; robust nonlinear mapping capability.
• Disadvantages: complex parameter setting; difficulty of determining the optimal network structure; gradient vanishing and exploding issues.
• Scope of environmental application: classification, regression, time series, and feature self-extraction for various sample sizes.
• Potential applications in MBRs: prediction of effluent quality and biogas yield (Li et al., 2022); prediction of flux and flux recovery rate (Zhao et al., 2020); identification of fouling type (Shi et al., 2022).

Tree-based model (supervised learning)
• Advantages: if-then logical basis; strong interpretability.
• Disadvantages: susceptible to noise or missing data; difficulty of dealing with exclusive-OR logic; prone to overfitting.
• Scope of environmental application: interpretable classification or regression problems where the sample size is not particularly large.
• Potential applications in MBRs: prediction of effluent quality (Zhuang et al., 2021); prediction of flux (Li et al., 2020).

KNN (supervised learning)
• Advantages: insensitivity to outliers; suitable for multimodal and multi-label classification problems.
• Disadvantages: high computational complexity; susceptible to sample imbalance; low interpretability.
• Scope of environmental application: primarily applicable to numerical and nominal data.
• Potential applications in MBRs: outlier identification and screening.

2.1.1 Support vector machine

SVM is a binary classification method rooted in statistical learning theory, implementing a structural risk minimization strategy. It maps data from a low-dimensional space to a high-dimensional space by constructing a kernel function. SVM can be utilized for both classification (support vector classification, SVC) and regression (support vector regression, SVR) by finding the optimal dividing hyperplane that separates the sample data into different categories with the widest "margin" at the border. The sample points closest to the border determine the position of the hyperplane and are thus called support vectors.
SVM is a suitable method for solving nonlinear mapping problems in high-dimensional space with limited samples, and it has a certain generalization ability. The input to SVM typically comprises a series of data points containing multiple features; the number of features is typically ≥ 2, depending on the modeling problem. When dealing with a small sample size (e.g., a training sample size of less than 125 (Qian et al., 2015)), the SVM model generally exhibits superior generalization performance and prediction accuracy. This advantage arises from its relatively low complexity and solid statistical theoretical foundation. However, computational efficiency decreases when the sample size is large, requiring longer computation times. SVM is also susceptible to missing data; it is therefore imperative to screen the data set for outliers and to address missing values prior to SVM modeling. Furthermore, it is important to consider the interpretability of the high-dimensional mapping.
SVM has a variety of applications in water engineering and water environments, including leakage detection in water supply systems (McMillan et al., 2024), structural analysis or characterization of pollutants (Zhong and Guan, 2023), spectral analysis of water quality (Mallet et al., 2022), monitoring of reactor operation (Vasilaki et al., 2020), ecological monitoring and control in watersheds (Kim et al., 2021a), and the early warning of water quality pollution events (Oliker and Ostfeld, 2014). SVM has also been used to monitor and assess air quality (Li et al., 2017) and to rapidly estimate soil organic carbon (Li et al., 2015).
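The kernel construction described above can be illustrated with a short sketch; the radial basis function (RBF) kernel and the gamma value below are a common, generic choice, not taken from any cited study:

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF (Gaussian) kernel: k(x, z) = exp(-gamma * ||x - z||^2).
    It implicitly maps the inputs into a high-dimensional feature space."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

# A kernel SVM never computes the high-dimensional mapping explicitly;
# it only evaluates pairwise kernel values (the Gram matrix).
samples = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]
gram = [[rbf_kernel(a, b) for b in samples] for a in samples]
```

The Gram matrix is symmetric with ones on its diagonal, which is exactly the quantity the SVM optimization works on instead of the mapped coordinates themselves.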

2.1.2 Artificial neural network

An ANN comprises a multitude of fundamental neurons that mimic the learning process of the human brain for solving diverse practical problems. As a representative example of connectionist learning, the ANN is one of the most frequently used models in machine learning. An ANN typically contains input, hidden, and output layers. A layer can have a few to millions of neurons, and the number of neurons determines the complexity of the hidden relationships among the data to be learned. The input layer receives the training data, which are nonlinearly transformed by one or more hidden layers, and the output layer delivers the transformed result, thereby realizing input-to-output mapping. An ANN's ability to deal with complex nonlinear problems can be increased by adding "depth" (more layers), adding "width" (more neurons in a single layer), and optimizing the activation function. The ANN can be employed to address classification, regression, and time-series problems, and through its many variants it is suitable for problems with widely varying sample sizes. The input data are multivariate and can be continuous, discrete, or matrix/image data.
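The layer-wise input-to-output mapping described above can be sketched as a minimal forward pass, assuming one hidden layer with sigmoid activation and purely illustrative weights:

```python
import math

def sigmoid(x):
    """Classic S-type activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(inputs, w_hidden, w_out):
    """Forward pass of a minimal one-hidden-layer network.
    w_hidden[j]: weights from all inputs to hidden neuron j (last entry = bias).
    w_out: weights from each hidden neuron to the single output (last = bias)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
              for row in w_hidden]
    return sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1]

# Two inputs -> two hidden neurons -> one output (weights are illustrative).
w_hidden = [[0.5, -0.3, 0.1], [0.8, 0.2, -0.4]]
w_out = [1.0, -1.0, 0.05]
y = mlp_forward([0.6, 0.9], w_hidden, w_out)
```

Training would adjust `w_hidden` and `w_out` by back-propagating the prediction error; only the forward mapping is shown here.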
Multilayer perceptron (MLP, also known as the back-propagation neural network (BPNN)) (Fig.3(a)) and the radial basis function neural network (RBFNN) (Fig.3(b)) are the simplest neural networks, typically employed for regression problems; they can also address classification and time series problems. Compared to MLP and RBFNN, deep neural networks (DNNs) (e.g., the convolutional neural network (CNN) (Fig.3(c)), recurrent neural network (RNN) (Fig.3(d)), and graph neural network (GNN)) possess more intricate network architectures and are capable of autonomously extracting features, which reduces the necessity for human intervention and enhances the quality of feature extraction (Zhang et al., 2018). CNNs frequently utilize spatial or image data as inputs to achieve image recognition or spatial feature extraction through convolutional computation (Zeiler and Fergus, 2014; Gao et al., 2019; Kiranyaz et al., 2021). There have been many innovations in CNNs, such as GoogLeNet (Szegedy et al., 2015), depth-based CNNs (Szegedy et al., 2016), DenseNet (Huang et al., 2017), wide residual networks (Zagoruyko and Komodakis, 2016), and the dual-channel CNN (Ma et al., 2023). RNNs, in particular long short-term memory (LSTM) networks (Greff et al., 2017), demonstrate certain advantages when the input data exhibit temporal characteristics. It should be noted that DNNs present a challenge in terms of increased model complexity and reduced interpretability.
Fig.3 Schematic diagrams of artificial neural networks: (a) MLP, (b) RBFNN, (c) CNN, (d) RNN.


It should be noted that different neural networks may apply different hidden layer activation functions; Table S1 in Appendix A summarizes commonly employed ones. In contrast to the conventional S-type (sigmoid) activation function, the wavelet neural network (WNN) employs a wavelet function as the hidden layer activation function (Alexandridis and Zapranis, 2013). Compared to the traditional MLP, the WNN offers several benefits, such as faster network convergence, avoidance of local optima, and the ability to conduct local time-frequency analysis.
The application of ANN is pervasive in the environmental field, encompassing a multitude of domains. These include the operation or optimization of membrane process systems (Wang et al., 2024b), wastewater treatment (Al-Ghazawi and Alawneh, 2021), prediction of novel pollutants such as disinfection by-products (Kulkarni and Chellam, 2010), prediction of surface or groundwater quality, biogas production (Liu et al., 2021), and environmental adsorption.

2.1.3 Decision tree and ensemble learning

The decision tree is a machine learning method characterized by a tree-like structure. Three common decision tree algorithms (Table S2 in Appendix A), namely ID3, C4.5 (Quinlan, 1993), and the classification and regression tree (CART) (Breiman et al., 1984), use different metrics to split samples. Single decision tree algorithms are typically applied to classification problems, whereas the CART method can also solve regression problems. A decision tree can be viewed as a set of if-then rules or as a conditional probability distribution defined on the feature and class spaces. This results in low model complexity and good interpretability. However, overfitting may occur, potentially leading to weak generalization; techniques such as pruning, cross-validation (CV), and ensemble learning can be employed to mitigate it.
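The if-then view of a decision tree can be made concrete with a hypothetical two-rule tree for fouling risk; the variable names and thresholds are illustrative only, not calibrated values from the literature:

```python
def fouling_risk(tmp_kpa, mlss_g_per_l):
    """A hand-written two-level decision tree, read as nested if-then rules.
    tmp_kpa: transmembrane pressure in kPa; mlss_g_per_l: MLSS in g/L.
    Thresholds are illustrative, not calibrated values."""
    if tmp_kpa > 30:          # rule 1: high TMP dominates the decision
        return "high"
    if mlss_g_per_l > 12:     # rule 2: elevated MLSS at moderate TMP
        return "medium"
    return "low"              # default leaf
```

A learned tree differs only in that the split variables and thresholds are chosen automatically from data (by information gain, Gini index, etc.) rather than written by hand.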
Ensemble learning is a machine learning approach that combines a group of base learners (e.g., decision trees, BPNNs) according to a specific strategy to improve their generalization performance. Bagging and boosting are two typical ensemble learning implementation algorithms. Random forest (RF) (Breiman, 2001) is an extension of the bagging method. Adaptive boosting, gradient boosting decision tree (GBDT), and eXtreme gradient boosting (XGBoost) (Chen and Guestrin, 2016) are extensions of the boosting method. Ensemble learning offers several advantages over single base learners, including high accuracy, good generalization performance, fast training speed, good robustness, minimal feature engineering, and a wide range of application scenarios.
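The bagging idea can be sketched in a few lines: bootstrap resampling of the training data followed by aggregation of the base learners' outputs. For brevity, the base learner here is simply the sample mean, a stand-in for a decision tree:

```python
import random

def bagging_predict(train_y, n_learners=50, seed=0):
    """Bagging sketch: each base learner is the mean of a bootstrap resample
    of the training targets; the ensemble averages the learners' outputs."""
    rng = random.Random(seed)
    n = len(train_y)
    learners = []
    for _ in range(n_learners):
        sample = [train_y[rng.randrange(n)] for _ in range(n)]  # bootstrap resample
        learners.append(sum(sample) / n)                        # "train" base learner
    return sum(learners) / n_learners                           # aggregate by averaging

y = [1.0, 2.0, 3.0, 4.0, 5.0]
ensemble_estimate = bagging_predict(y)
```

Averaging many learners trained on perturbed copies of the data reduces variance, which is the mechanism RF exploits with trees as base learners.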
Similar to ANN and SVM, decision trees and their ensemble learning methods are widely used, for example in water quality monitoring or classification (Xu et al., 2021), nano-plastics identification (Xie et al., 2023), pollutant degradation (Zhang et al., 2023), air quality prediction (Chen et al., 2020), groundwater contamination prediction (Bindal and Singh, 2019), soil contamination source identification (Zhou and Li, 2024), and ecological and environmental evaluation (Espel et al., 2020). Owing to their better interpretability, tree models are employed not only for environmental prediction but also for environmental management and decision-making (Jiang et al., 2021).

2.1.4 k-nearest neighbors

KNN is a supervised machine learning model that uses the distance between feature vectors as the basis for classification or regression. For classification, the label of a query sample is decided by majority vote among its k nearest neighbors; for regression, the prediction is typically the average of the neighbors' values. KNN has obvious advantages, such as high precision, insensitivity to outliers, and freedom from distributional assumptions about the samples. However, it also has certain drawbacks, including high computational and space complexity. Hence, KNN is primarily applicable for handling numerical and nominal data.
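A minimal KNN classifier by majority vote, with hypothetical feature vectors and labels, might look like:

```python
import math
from collections import Counter

def knn_classify(query, data, k=3):
    """k-nearest-neighbor classification by majority vote.
    data: list of (feature_vector, label) pairs."""
    # Sort training samples by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(data, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature samples labeled "clean" or "fouled".
data = [((0.1, 0.2), "clean"), ((0.2, 0.1), "clean"),
        ((0.9, 0.8), "fouled"), ((0.8, 0.9), "fouled"), ((0.85, 0.85), "fouled")]
```

Note that all distance computations happen at prediction time, which is the source of the high computational cost mentioned above.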
KNN has a more circumscribed range of applications in the environmental domain than SVM, ANN, and tree models. Nevertheless, it can still be employed to solve a wide range of complex environmental problems, including water quality monitoring (Uddin et al., 2023), wastewater treatment control (Xu et al., 2022), adsorption evaluation (Nguyen et al., 2022), and air quality prediction (Tella et al., 2021).

2.1.5 Other methods

In addition to supervised and unsupervised learning, RL is another class of machine learning in which an agent learns an optimal policy for mapping states to actions through interaction with its environment, with the objective of maximizing the cumulative reward (Byeon, 2023). The Markov decision process is the fundamental RL framework. RL is commonly employed for addressing decision-making and control problems. Several studies have employed RL to optimize wastewater treatment control, including the removal of phosphorus (Mohammadi et al., 2024) and the reduction of energy consumption.
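The state-to-action mapping and cumulative-reward objective can be sketched with tabular Q-learning on a toy two-state Markov decision process; the environment, rewards, and hyperparameters are invented for illustration:

```python
import random

def q_learning(n_episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=1):
    """Tabular Q-learning on a toy 2-state MDP: taking action a moves the
    agent to state a, and only (state=1, action=1) yields a reward of 1."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(n_episodes):
        s = rng.choice((0, 1))
        for _ in range(10):
            # epsilon-greedy action selection: explore sometimes, else exploit.
            a = rng.choice((0, 1)) if rng.random() < eps else max((0, 1), key=lambda x: q[(s, x)])
            r = 1.0 if (s, a) == (1, 1) else 0.0
            s_next = a
            # temporal-difference update toward reward + discounted future value.
            q[(s, a)] += alpha * (r + gamma * max(q[(s_next, 0)], q[(s_next, 1)]) - q[(s, a)])
            s = s_next
    return q

q = q_learning()
```

After training, the learned values rank "stay in state 1 and take action 1" highest, i.e., the greedy policy extracted from the Q-table is the optimal one for this toy environment.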
The development of large models (also known as foundation models) represents a significant advancement in the field of artificial intelligence. These models are commonly employed in the domains of natural language processing, computer vision, and multimodal problems. Large models are distinguished by their large parameter scale, complex computational structure, multitask learning, and emergent capabilities. ChatGPT is currently one of the most prominent examples (Ahmed et al., 2024). At present, large models are employed primarily for addressing large-scale or long-term environmental problems, such as weather forecasting (Bi et al., 2023) and global methane emissions (Rouet-Leduc and Hulbert, 2024), given the necessity for a substantial number of samples to serve as a foundation for training.
Another rapidly evolving field is automated machine learning (AutoML), which aims to automate the process of building machine learning models. A series of automated procedures, including data processing, feature engineering, model/algorithm selection, hyperparameter optimization, and model evaluation, have been established to minimize the need for intervention by model developers and to enhance model quality (Salehin et al., 2024). AutoML is a valuable tool in computer vision and natural language processing and has been employed in the environmental sector to predict potential energy surfaces (Abbott et al., 2019) and water quality (Senthil Kumar et al., 2024).

2.1.6 Related algorithms

Fuzzy logic and Monte Carlo methods are frequently utilized algorithms in machine learning. Fuzzy logic is a multi-valued logic approach that addresses uncertainty and imprecise information by simulating human reasoning (Zadeh, 2023), and it finds extensive application in control systems (Castillo et al., 2008). The combination of fuzzy logic and neural networks gives rise to the fuzzy neural network, which leverages pre-existing data to generate an expert knowledge base and predicts outcomes through fuzzy logic inference (de Campos Souza, 2020). The Monte Carlo method is a numerical calculation method grounded in probability theory (Raychaudhuri, 2008); relying on the law of large numbers, it achieves a stochastic approximation by repeatedly sampling a data set and performing randomized tests. Monte Carlo methods have been employed in machine learning, particularly in reinforcement learning: the famous AlphaGo applied Monte Carlo tree search in reinforcement learning to improve the decision-making ability of its neural networks (Silver et al., 2016). The integration of these two algorithms with machine learning enables the resolution of complex and uncertain problems in the context of big data.
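The repeated-sampling idea behind the Monte Carlo method can be sketched with the classic estimation of pi from random points in the unit square:

```python
import random

def estimate_pi(n_samples=100_000, seed=42):
    """Monte Carlo estimate of pi: the fraction of random points in the unit
    square that fall inside the quarter circle, multiplied by 4."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n_samples)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n_samples

pi_estimate = estimate_pi()
```

By the law of large numbers, the estimate converges to pi as the number of samples grows, with the error shrinking roughly as 1/sqrt(n).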

2.2 Model optimization

Machine learning involves the optimization of multiple parameters during the learning process. A straightforward method for model optimization is CV (including k-fold CV, where there is no fixed rule for choosing k), in which the training data are first split into k subsets that are then assigned to two parts (Browne, 2000): the training part (k-1 subsets) to train the model and the validation part (the remaining subset) to check the error. These two parts are used to obtain the model with the least generalization error. CV requires a substantial amount of time owing to repeated re-training and re-validation.
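The k-fold splitting described above can be sketched as a plain index-splitting routine, independent of any particular library:

```python
def kfold_indices(n_samples, k=5):
    """Split sample indices into k folds. Each fold serves once as the
    validation part while the remaining k-1 folds form the training part."""
    indices = list(range(n_samples))
    # Distribute any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds

splits = kfold_indices(10, k=5)
```

Training and scoring the model once per (train, val) pair and averaging the k validation errors yields the cross-validated estimate of generalization error.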
Heuristic intelligent optimization algorithms can significantly improve model performance and convergence rate during the learning process. Intelligent optimization algorithms usually draw inspiration from natural, biological, or physical phenomena. Common intelligent optimization algorithms include genetic algorithms (GA) (Fig.4(a)) (Katoch et al., 2021), particle swarm optimization (PSO) (Fig.4(b)), simulated annealing (SA) (Fig.4(c)) (Suman and Kumar, 2006), artificial bee colony (Karaboga and Basturk, 2007), ant colony optimization (Dorigo et al., 2006), the firefly algorithm (FFA) (Yang, 2009), the bat algorithm (BA) (Yang, 2010), and the gray wolf optimizer (GWO) (Mirjalili et al., 2014). These algorithms have been widely used in various fields to solve optimization problems with high efficiency and accuracy. It is important to note that each algorithm has its own strengths and weaknesses.
Fig.4 Intelligent optimization algorithms: (a) genetic algorithms (GA), (b) particle swarm optimization (PSO), (c) simulated annealing (SA).

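As one example of these heuristics, simulated annealing can be sketched in a few lines: worse candidate solutions are accepted with a probability that shrinks as the "temperature" cools, which helps the search escape local optima. The objective function and cooling schedule here are illustrative:

```python
import math
import random

def simulated_annealing(f, x0=5.0, t0=1.0, cooling=0.995, steps=2000, seed=0):
    """Minimize f over one variable. A worse move (delta > 0) is accepted
    with probability exp(-delta / T); T follows a geometric cooling schedule."""
    rng = random.Random(seed)
    x, fx, t = x0, f(x0), t0
    for _ in range(steps):
        candidate = x + rng.uniform(-0.5, 0.5)   # random local perturbation
        delta = f(candidate) - fx
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x, fx = candidate, f(candidate)      # accept the move
        t *= cooling                             # cool down
    return x, fx

best_x, best_f = simulated_annealing(lambda x: (x - 2.0) ** 2)
```

At high temperature the search behaves almost randomly; as the temperature decays it degenerates into greedy hill climbing near the best region found.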

2.3 Assessment of model performance

A statistical model's performance can usually be evaluated through indicators such as the coefficient of determination (R2), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE) (Niu et al., 2022). The receiver operating characteristic (ROC) curve, area under the curve (AUC), accuracy, precision, and recall are frequently employed to assess the efficacy of a binary classifier. All of these indicators evaluate model performance, but they are not designed to account for model complexity (related to the number of unknown parameters).
Information criteria, such as the Akaike information criterion (AIC) (Akaike, 1974), Bayesian information criterion (BIC) (Gideon, 1978), and Hannan-Quinn criterion (HQC) (Hannan and Quinn, 1979), can be employed to achieve optimal model selection by balancing model complexity against model performance. For a large sample size, the parameter penalties imposed by the three criteria increase in the order AIC < HQC < BIC (Tu and Xu, 2012). The stronger the penalty, the more the criterion favors a low-dimensional model. In general, smaller AIC, BIC, or HQC values indicate better model fitting and greater accuracy.
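These metrics and criteria can be computed directly. The AIC/BIC forms below assume Gaussian residuals, so they are expressed through the residual sum of squares (RSS); this is one common formulation, not the only one:

```python
import math

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def aic_bic(y_true, y_pred, n_params):
    """Gaussian-likelihood forms: AIC = n ln(RSS/n) + 2p, BIC = n ln(RSS/n) + p ln(n).
    BIC penalizes parameters more heavily once ln(n) > 2 (i.e., n > ~7)."""
    n = len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    base = n * math.log(rss / n)
    return base + 2 * n_params, base + n_params * math.log(n)
```

Comparing two candidate models then amounts to fitting both and preferring the one with the smaller AIC (or BIC) value.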

2.4 Assessment of model interpretation

Variable importance metrics aid researchers in better understanding the data generation process by evaluating the relative importance of independent variables with respect to a dependent variable (Kruskal, 1987).
Common tree model interpretation methods include the Shapley value (Samek, 2020) and the TreeExplainer method (Lundberg et al., 2020). However, the interpretability of a model decreases when decision-making involves multiple trees; therefore, advanced tree models fall into the category of black boxes (Samek, 2020). For tree ensembles, variable importance can be measured by the change in out-of-bag prediction accuracy (e.g., MSE or accuracy) when a variable is permuted, or by the decrease in the Gini index attributable to that variable (Grömping, 2015).
In models like ANN and SVM, the relative importance of input features can be assessed using the simple R2 (Hosseinzadeh et al., 2020). Variable importance can be evaluated through sensitivity analysis (Cortez and Embrechts, 2013). For a neural network, it is possible to calculate the variable importance of input variables to the output using methods based on the connection weights of neurons. Several methods, such as those of Garson (1991), Goh (1995), Gedeon (1997) and Olden (Olden et al., 2004), allow for this assessment. Notably, Gedeon’s and Olden’s methods use the weights of all connections, with Gedeon's method being tailored specially for deep learning.
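Olden's connection-weight method, for instance, can be sketched for a one-hidden-layer network: the importance of each input is the sum, over hidden neurons, of the products of its input-hidden weight and the corresponding hidden-output weight. The weights below are hypothetical:

```python
def olden_importance(w_hidden, w_out):
    """Olden's connection-weight importance for a one-hidden-layer network.
    w_hidden[j][i]: weight from input i to hidden neuron j.
    w_out[j]: weight from hidden neuron j to the output.
    Importance of input i = sum_j w_hidden[j][i] * w_out[j] (sign is kept)."""
    n_inputs = len(w_hidden[0])
    return [sum(w_hidden[j][i] * w_out[j] for j in range(len(w_out)))
            for i in range(n_inputs)]

# Hypothetical trained weights: 3 inputs, 2 hidden neurons, 1 output.
w_hidden = [[0.8, -0.1, 0.3], [0.4, 0.05, -0.6]]
w_out = [1.2, -0.9]
importance = olden_importance(w_hidden, w_out)
```

Unlike Garson's method, the raw (signed) products are kept, so the result indicates both the magnitude and the direction of each input's influence.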
Recently, explainable artificial intelligence (XAI) has seen application in various scientific fields, including water quality prediction (Madni et al., 2023) and understanding the MBR process (Chang et al., 2022). Partial dependence plots, individual conditional expectation, local interpretable model-agnostic explanations (LIME), and Shapley additive explanations (SHAP) are suitable for model interpretation (Molnar, 2019). In particular, SHAP has the desirable properties of local accuracy, missingness, and consistency, and thus can be used to interpret a model from global, local, and feature-interaction perspectives (Aldrees et al., 2024b). These methods can be used to interpret or explain all kinds of machine learning models, including deep learning models.

2.5 Model selection guide

The selection of an appropriate modeling method is of paramount importance. Unsupervised learning methods are employed when the data set lacks output labels, whereas supervised learning methods are used when the data set contains output labels. The first step is to determine the type of problem. For problems with numerical output labels (i.e., regression or time-series problems), models such as SVR, RF, and ANN can be selected, whereas for problems with discrete or nominal output data (i.e., classification problems), SVC, RF, and ANN are suitable. In cases involving multiple decision-making processes, reinforcement learning may prove an advantageous solution. Subsequently, the magnitude of the data set must be taken into consideration. When the sample size is limited, SVM, RF, and MLP models with relatively simple structures can achieve adequately robust results, whereas the performance of DNNs is best exploited when the sample size is large. The characteristics of the input data are also of great importance: LSTM is appropriate for time-series data, CNN is suitable for matrix data with image characteristics, and GNN is suitable for directly processing graph-structured data.

3 Application of machine learning in MBRs

3.1 Overview of application

In recent years, machine learning, an artificial intelligence technology, has gained widespread usage for predicting MBR performances. Since 2015, there has been a noticeable surge in the number of papers on machine learning models in MBRs, with a sharp rise after 2018 (Fig. S1 in Appendix A). A statistical analysis of the models, features, and optimization methods used provides a comprehensive picture of the research progress in MBR performance prediction. Among the reviewed papers, 34.3% focused on predicting pollutant removal and 67.1% on predicting membrane fouling.
The characteristics/parameters of the membrane process can be categorized into six main groups: time parameter (t), conventional concentration indices (CCI) (including chemical oxygen demand (COD), total nitrogen (TN), and total phosphorus (TP) in the influent and effluent, total suspended solids (TSS) in the influent, MLSS in the mixed liquor, and other parameters), membrane filtration indices (MFI) (including TMP, membrane flux, filtration resistance, ΔTMP/Δt, membrane permeability, and other parameters), environment indices (EI) (including temperature (T), dissolved oxygen (DO), pH, oxidation reduction potential (ORP), organic loading rate (OLR), and other parameters), operation indices (OI) (including SRT, HRT, filtration-relaxation ratio, aeration intensity, backwash strength/time, and other parameters), and characteristic foulant indices (CFI) (including concentrations of SMP, loosely-bound EPS, and tightly-bound EPS, among others). These characteristics can serve as model inputs, as can other parameters such as sludge particle size, viscosity, zeta potential, blocking coefficient, and membrane pore size. Spectroscopic measurement results, such as spectral data and spectral grayscale maps, may also be used as model inputs to enable automatic extraction of spectral features. With the aforementioned inputs, pollutant removal and membrane fouling performances are typically obtained as the model outputs.
Fig.5(a) shows the models for MBR fouling prediction. For this purpose, CCI (60.4%) and MFI (68.8%) were used as the majority of input characteristics, and membrane flux (52.7%) was selected as the primary indicator of membrane fouling performance. Of the established models, ANN models accounted for 72.9%, with a focus on relatively simple model structures, including MLP at 39.6% and RBFNN at 18.8%. Additionally, DNN and SVM models were utilized, accounting for 8.3% and 18.8%, respectively. For model tuning, over half (56.3%) of the models did not employ any optimization algorithms, 33.3% chose intelligent optimization algorithms, and 8.3% used simple CV to optimize the parameters. Among the intelligent optimization algorithms, GA was predominant at 56.5%, followed by PSO at 18.9%.
Fig.5 Distribution of different machine learning models used for MBR research: (a) membrane fouling prediction models, (b) pollutant removal prediction models.


The prediction models for MBR pollutant removal are illustrated in Fig.5(b). None of the model inputs employed CFI. Instead, the majority of the models utilized CCI (75.2%), OI (62.5%), and EI (54.2%) as input characteristics. Only 20.8% of the models incorporated MFI into the input characteristics. Regarding the model outputs, 79.2% of the models aimed to predict the crucial carbon-related indices (such as effluent COD and COD removal rate). The nitrogen-related indices (such as the removal rates and effluent concentrations of TN, ammonium nitrogen (NH3-N), and nitrate nitrogen (NO3-N)) followed at 58.3%, while a lower percentage of models (16.7%) were targeted at the prediction of trace organic pollutants. In terms of model structure, a significant proportion of the models employed ANN models, with MLP models being the most favored option at 66.7%. Compared with membrane fouling prediction models, pollutant removal prediction models are usually simpler, with fewer models using optimization algorithms.
Figure S2 in Appendix A shows the relationship between R2 and the number of parameters of different ANN models, with model capacity denoted by the parameter count. As illustrated in Fig. S2, the stability of the MLP model is relatively poor, which is potentially influenced by the data set or the model parameter settings. The introduction of optimization algorithms, such as GA, leads to an overall improvement in model performance, potentially resulting from the algorithmic optimization or the refinement of the parameter training process. Conversely, the WNN model shows commendable results with a modest parameter count, which is likely attributable to its more complex activation function.
In general, machine learning is a prevalent tool in MBR research with satisfactory performance. Nevertheless, it is important to recognize the limitations of an incomplete indicator system, the lack of full-scale engineering validation, the difficulty of real-time prediction, and the limited contribution to process understanding.

3.2 Machine learning models to predict pollutant removal performances

The application of machine learning models to predict MBRs' pollutant removal performance is summarized in Table S3 in Appendix A. These applications are mainly based on ANNs. The MLP model serves as the foundation for many of them, with efforts focused on optimizing the hierarchical structure and activation functions. Based on MLP, Kim et al. (2021b) used near-infrared spectroscopy to predict the concentrations of effluent pollutants (COD, TN, NH3-N, nitrite nitrogen, NO3-N, and phosphate) as well as SMP and EPS in the mixed liquor with R2 > 0.97. The MLP topologies for predicting these three types of pollutants/foulants were 5-11-6, 5-9-1, and 5-9-2, respectively.
Some MLP models were specifically focused on predicting the removal of trace organic pollutants, in addition to the detection of routine water quality (Wolf et al., 2001; Wolf et al., 2003). Researchers have attempted to optimize the activation function of the hidden layer using radial basis function (RBF) and wavelet functions. Mirbagheri et al. (2015b) established an RBFNN model with a topology of 5-5-1 to evaluate the performance of a submerged MBR in the treatment of a combined urban and industrial wastewater, using the influent concentrations (biochemical oxygen demand (BOD), COD, NH3-N, TP), influent total dissolved solids, HRT, volatile MLSS (MLVSS), and mixed liquor pH as input characteristics to predict the effluent concentrations (BOD, COD, NH3-N, and TP) with R2 > 0.98. Cai et al. (2019b) established a WNN model with a topology of 3-2-1 to predict the effluent quality (COD: R2 = 0.999; NH3-N: R2 = 0.997) with the influent COD, NH3-N, and salinity as input characteristics. They also found that the WNN model had better performance than the MLP model in predicting effluent COD and TN (Cai et al., 2019a).
Researchers have also optimized the model hierarchy with structures that are more complex than MLPs, such as CNN, DenseNet, and LSTM. Li et al. (2022) established three DNN models, including fully connected network (FCN), CNN, and DenseNet, to predict the effluent pH, effluent COD, COD removal rate, biogas (CH4, N2, and CO2) yield, and redox potential of an AnMBR by considering the ambient temperature, influent water temperature, influent pH, influent COD, mixed liquor temperature, and membrane flux as input characteristics. The prediction accuracy of the DenseNet model reached 97.4%, whereas those of FCN and CNN were 92.6% and 91.8%, respectively. Yaqub et al. (2020) established an LSTM to predict the removal of TN, TP, and NH3-N by an anaerobic/anoxic/aerobic-MBR process using the influent water quality (total organic carbon (TOC), TN, TP, COD, NH3-N, and suspended solids) and operating parameters (DO, ORP, and MLSS) as input characteristics, and the results showed that the model performed the best in predicting NH3-N removal rate with MSE = 0.0047.
Based on the above analysis, previous studies have predominantly used MLP models to predict pollutant removal performances. More complex ANN models, such as WNN, CNN, and LSTM, are occasionally utilized. In addition to enhancing the model structure, some researchers have applied intelligent optimization algorithms (e.g., FFA, PSO, and GWO) to improve model accuracy and achieve more effective solutions to complex problems (Aldrees et al., 2024a). However, some models did not include MFI as input variables and thus may have underestimated the role of membranes in pollutant removal. Furthermore, with regard to output variables, previous research primarily concentrated on the removal of C, N, and P, with insufficient attention paid to trace pollutants. ANNs seem to be less effective in predicting trace organic pollutants owing to their diversified behavior during degradation. Optimizing the activation function can improve the performance of ANNs, and tree models may outperform ANNs. It is worth noting that complicating the model structure may not straightforwardly improve performance, possibly because of an insufficient amount of data, errors in the measured data, or complex relationships between variables.

3.3 Machine learning models to predict membrane fouling

Membrane fouling often involves organic, inorganic, biological, and composite fouling. Organic compounds, such as polysaccharides, proteins, and humic substances, contribute to various stages of membrane fouling, resulting in reversible or irreversible fouling (Lin et al., 2014; Xu et al., 2020). Pollutant removal in MBRs involves a combination of biological processes and membrane retention, implying that the formation of membrane fouling is inherently linked to pollutant removal. Compared with pollutant removal, membrane fouling is a more complex process with higher nonlinearity among parameters. Such complexity offers enormous potential for the implementation of machine learning models.

3.3.1 ANN for membrane fouling prediction

ANN is a favorable machine learning model with strong nonlinear fitting capabilities and has demonstrated good performance in predicting membrane fouling. Membrane fouling prediction can be classified into two categories: filtration state prediction (e.g., flux, TMP, and permeability) and membrane fouling analysis (e.g., fouling type, flux recovery rate, and membrane interfacial energy). Tab.2 summarizes the ANN models used to predict MBR filtration state. To improve prediction, some researchers have modified the MLP model by introducing optimization or training algorithms, changing the hidden layer activation function, and adjusting the model hierarchy. Sensitivity factor analysis has been used to interpret the models and identify significant parameters.
Tab.2 Examples of ANN models to predict membrane filtration state in MBRs
Model Optimization Hidden layer activation function Structural features Input parameter Output parameter Training algorithm Fitting performance Ref.
ENN 9-55-1 T, SRT, TSS, ODR, TMP, dTMP/dt, Filtration and backwash time Flux AD = 2.7% Geissler et al., 2005
MLP 3-5-1 Backwash time, Operation time, Flux Flux LM R2 = 0.99 Aidan et al., 2008
MLP GA log-sigmoid MLSS, TMP, Resistance Flux LM MAPE = 0.0331 Li et al., 2014
MLP GA tan-sigmoid 5-10-1 Time, MLSS, COD, SRT, TSS TMP, Permeability LM R2 = 0.98 / R2 = 0.98 Mirbagheri et al., 2015b
RBFNN GA RBF 5-5-1 Time, MLSS, COD, SRT, TSS TMP, Permeability LM R2 = 0.98 / R2 = 0.99 Mirbagheri et al., 2015b
MLP GA tan-sigmoid 6-8-1 Flux, Aeration ratio, Concentration of SMP and EPS, initial TMP, Running time TMP Bayesian rule Relative MSE = 0.024 Wang and Wu, 2015
RBFNN CV RBF 2-2-1 Aeration volume, TMP Flux R2 = 0.80 2017
MLP log-sigmoid 6-5-1 Influent (TN, NO3-N, TP), Effluent (TN, NO3-N, TP) TMP LM R2 = 0.85 Schmitt et al., 2018
Fuzzy-RBFNN PSO log-sigmoid 2-14-49-1 Flux, Membrane flux variation Flux MAPE = 0.0287 Tao and Li, 2018
MLP PSO Temperature, Flux, TMP, MLSS Resistance LM R2 = 0.97 Hamedi et al., 2019
MLP tan-sigmoid 4-8-1 MLSS, EC, DO, Time Flux LM R2 = 0.98 Hosseinzadeh et al., 2020
RBFNN RBF 1-3-1 Permeate pump pressure Flux, TMP LM R2 > 0.90 Abdul Wahab et al.,
MLP tan-sigmoid 1-5(7)-1 Permeate pump pressure Flux, TMP LM R2 > 0.88 Abdul Wahab et al.,
RNN EC, Flux EC, Flux RMSE = 18 mS/cm / RMSE = 1.1 LMH Viet et al., 2021
MLP tan-sigmoid 4-30-30-14-5-5-5-5-5-5-1 pH, EC, influent TN and NH3-N Flux, Resistance LM Flux: R2 = 0.88 / Resistance: R2 = 0.86 Viet and Jang, 2021
WNN BA Bandelet function 5-12-2 MLSS, Sludge particle size, EPS, SMP, Sludge viscosity, RH, Zeta potential Flux, Membrane flux recovery rate Gradient descent method MAPE = 0.032 Zhao et al., 2020
MLP 3-17-2 MLSS, HRT, Time Flux, COD removal rate LM R2 = 0.9996 Hazrati et al., 2017
ANFIS OLR, Effluent pH, MLSS, MLVSS TMP LM R2 = 0.98 Taheri et al., 2021
MLP log-sigmoid 6-9-1 Time, Flux, influent COD, pH, MLSS, TMP rate of change Permeability R2 = 0.9985 Yao et al., 2022
MLP tan-sigmoid 3-9-1 Disc rotational speed, Membrane to disc gap, OLR Permeability LM R2 = 0.999 Irfan et al., 2022
MLP CV tan-sigmoid 6-6-1 Sludge filterability, MLVSS, pH, influent COD, T, Cleaning cycle Permeability BFGS R2 = 0.93 Alkmim et al., 2020

Note: ENN = Elman neural network; ODR = oxygen decay rate; EC = electrical conductivity; ANFIS = adaptive network-based fuzzy inference system; BFGS = Broyden-Fletcher-Goldfarb-Shanno; RH = relative hydrophobicity.

GA and fuzzy logic have been used in the modified MLP models. Levenberg-Marquardt (LM) is a widely used training algorithm; some researchers have also used the Bayesian rule, the gradient descent algorithm, or the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm for model training. Wang and Wu (2015) predicted TMP by inputting flux, aeration ratio, initial TMP, running time, and the concentrations of characteristic foulants (SMP and EPS) to capture the jump point of TMP with a relative MSE of 0.024. They established an MLP model with a topology of 6-8-1, optimized the weights and biases of the model by the GA algorithm, and trained the model by the Bayesian rule. They also showed that the performance of MLP with small sample sizes was less stable than that of traditional mathematical models. Alkmim et al. (2020) established an MLP model with a topology of 6-6-1, trained it with the BFGS algorithm, and optimized the parameters by CV. They considered sludge filterability, MLVSS, pH, influent COD, temperature, and cleaning cycle as input characteristics to predict membrane permeability with R2 = 0.93. Optimization and training algorithms thus seem useful for improving the models. Meanwhile, the impact of sample size on machine learning performance in addressing membrane fouling remains a concern.
RBF and wavelet functions have been used to optimize the hidden layer activation function in fouling prediction models. Mirbagheri et al. (2015a) used an RBFNN model optimized by the GA algorithm to predict TMP and membrane permeability from five input indicators, including operating time, TSS, COD, SRT, and MLSS, with R2 > 0.98. Sensitivity analysis highlighted operating time and mixed liquor MLSS as important influencing factors. Zhao et al. (2020) used the Bandelet function (a variant of the wavelet function) as the hidden layer activation function to establish a Bandelet neural network, trained the network using the gradient descent method, and introduced the BA to optimize the parameters. They predicted membrane flux and membrane flux recovery rate from mixed liquor properties such as MLSS, sludge particle size, EPS, SMP, sludge viscosity, relative hydrophobicity, and zeta potential with a relative error of 3.2% on the entire data set.
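The distinguishing feature of an RBFNN is its hidden layer: each unit responds with a Gaussian bump centered on a prototype input vector, and the output layer combines these responses linearly. A minimal Python sketch of one forward pass (illustrative only; centers, widths, and weights below are arbitrary placeholders, not fitted values from any cited study):

```python
import math

def rbf_forward(x, centers, sigmas, weights, bias=0.0):
    """One forward pass of a minimal RBF network: Gaussian hidden units
    exp(-||x - c||^2 / (2*sigma^2)) followed by a linear output layer."""
    hidden = []
    for c, s in zip(centers, sigmas):
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        hidden.append(math.exp(-dist2 / (2.0 * s ** 2)))
    return bias + sum(w * h for w, h in zip(weights, hidden))

# Two hidden units; when x coincides with a center, that unit outputs 1.
y = rbf_forward(x=[1.0, 2.0],
                centers=[[1.0, 2.0], [3.0, 0.0]],
                sigmas=[1.0, 1.0],
                weights=[0.7, 0.3])
```

In training, the centers and widths are typically fixed first (e.g., by clustering) and only the linear output weights are fitted, which is one reason RBFNNs can be fast to train.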
RNNs are suitable for predicting time-varying membrane fouling because of their advantage in processing sequential data. The Elman neural network (ENN) is a rudimentary form of RNN with local memory cells and local feedback connections, and was used for membrane fouling prediction in early studies. Geissler et al. (2005) established an ENN model with a network topology of 9-55-1 to predict membrane flux with an average deviation of 2.7%. Through sensitivity analysis, they revealed that the best membrane backwash condition was high-pressure backwash at short intervals. More recently, Viet et al. (2021) established an RNN model to predict the mixed liquor conductivity and membrane flux of an osmosis membrane bioreactor over 40 d, achieving acceptable RMSEs of 18 mS/cm and 1.1 LMH, respectively. These studies show that accounting for time-series dynamics can help to better predict membrane fouling, while model interpretation can help to design more optimal operating conditions or process flows. Such findings may contribute to the refined operation of MBRs.
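The recurrence that gives the ENN its memory can be shown with a single-unit sketch: the hidden state from the previous time step feeds back into the current one, so each output depends on the whole input history. The weights and the short flux-like input sequence below are hypothetical, chosen only to make the behavior visible.

```python
import math

def elman_step(x_t, h_prev, w_in, w_rec, w_out, b=0.0):
    """One step of a single-unit Elman cell: the hidden state h feeds
    back into itself, giving a short-term memory of past inputs."""
    h_t = math.tanh(w_in * x_t + w_rec * h_prev + b)
    y_t = w_out * h_t
    return h_t, y_t

# Feed a short rising sequence; each output reflects both the current
# input and the accumulated hidden state (hypothetical weights).
h = 0.0
outputs = []
for x in [0.2, 0.4, 0.6]:
    h, y = elman_step(x, h, w_in=1.0, w_rec=0.5, w_out=2.0)
    outputs.append(y)
```

Full RNN/LSTM layers apply the same idea with weight matrices and gating, which is what makes them suited to TMP or flux trajectories.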
Many researchers have used ANNs, such as MLP, RBFNN, RNN, and CNN, for membrane fouling analysis. Chen et al. (2012) developed an MLP model that used the bioprocess aeration capacity, membrane aeration capacity, mixed liquor recirculation flowrate, and membrane flux as input variables to predict the energy consumption per unit water production in a full-scale MBR with R2 exceeding 0.55. Zhao et al. (2019) established an RBFNN model to quantify the MBR membrane interfacial energy by considering the contact angles of water/glycerol/diiodomethane on the sludge/membrane surface, the zeta potentials of the sludge/membrane surface, and the distance between particulate sludge and the membrane surface as input variables. The computation time required by the RBFNN was only approximately 1/50 of that of the analytically extended Derjaguin-Landau-Verwey-Overbeek method. Shi et al. (2022) established a CNN model based on the attention mechanism, which used a processed grayscale image set as input to classify membrane fouling with a diagnostic accuracy of 98%. Besides supervised learning methods, unsupervised learning methods have also been used to analyze membrane fouling for operational process optimization (Woo et al., 2022). Although online monitoring is not available for most model features in the above applications, these models still possess strong analytical capabilities. They offer new insights for analyzing the mechanisms of membrane fouling and provide a modeling basis for its targeted control.

3.3.2 Other model applications

Other models, such as SVM (including SVR), least squares SVM (LSSVM), RF, extreme gradient boosting (XGBoost), and GBDT, have also been applied for membrane fouling prediction in MBRs.
Table S4 in Appendix A presents examples of using SVM or tree-based models to predict membrane fouling. Although these models are not as widespread as ANNs, they still demonstrate good predictive abilities in various scenarios. Hamedi et al. (2019) established an LSSVM, which reduces training to solving a linear equation system. MLSS, TMP, flux, and temperature were selected as input parameters to predict the filtration resistance. The LSSVM model achieved an R2 of 0.99, outperforming both the PSO-MLP model (R2 = 0.96) and the gene expression programming model (R2 = 0.98). Li et al. (2020) developed an RF model with 300 trees and 2 node variables. MLSS, TMP, and membrane resistance were selected as the main input features, as pre-evaluated via principal component analysis (PCA), to predict the membrane flux. The results showed that the RF model (R2 = 0.95) outperformed the SVM model (R2 = 0.92) and the MLP model (R2 = 0.89). As seen from this example, RF models may be more suitable than SVM and MLP models for predicting membrane fouling. The use of PCA for feature selection in these applications indicates possible multicollinearity among the factors that affect membrane fouling. Although these factors are complex, a few representative indicators can be selected for online monitoring and prediction.
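The multicollinearity that motivates such pre-screening can be checked with a simple pairwise Pearson correlation before modeling. The sketch below uses stdlib Python and hypothetical toy columns (TMP rising with resistance, flux falling); it illustrates the screening idea, not the PCA procedure used in the cited study.

```python
import math

def pearson(a, b):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# Hypothetical toy columns: TMP and resistance move together,
# flux moves in the opposite direction.
tmp        = [10, 12, 15, 18, 22]
resistance = [1.1, 1.3, 1.6, 1.9, 2.4]
flux       = [30, 28, 25, 21, 16]

r_pos = pearson(tmp, resistance)   # near +1: redundant pair
r_neg = pearson(tmp, flux)         # near -1: also strongly collinear
```

Feature pairs with |r| close to 1 carry largely redundant information, so one representative of each such pair can be kept for online monitoring.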

4 Tutorial example

4.1 Method

According to the above literature review, BPNN, SVM, and RF have emerged as prevalent machine learning models, among which the BPNN and SVM models have been most widely used in MBRs. The GA is the predominant algorithm for model optimization. In addition, the long short-term memory mechanism introduced in LSTM (a variant of RNN) has shown significant promise in addressing time-dependent problems. Given the complex nature of fouling formation compared with pollutant removal, the real-time prediction of membrane fouling is crucial, particularly considering the dynamic and time-sensitive interactions between foulants and membranes.
This tutorial applies five typical machine learning methods (SVM, RF, BPNN, LSTM, and GA-BP) to predict membrane fouling in MBR systems. A virtual data set (see Appendix B for the datasheet) was created for the machine learning practice of this example, which is conducted on the MATLAB platform.

4.1.1 Data preprocessing

One should first specify the independent variables (as input characteristics) and the dependent variables (as output characteristics) from the raw data set. In this example, TMP was selected as the target output to characterize the fouling state. The potentially important influencing factors for fouling were selected as input characteristics according to common understanding and preliminary investigations; here, they comprised temperature, MLSS, pH, DO in the aerobic zone, influent water quality (COD, TN, and TP), and membrane flux. To eliminate the adverse effects of singular data and expedite convergence, the "mapminmax" function in MATLAB was applied to normalize the input and output data within the range of [0,1]. Subsequently, 70% of the sample data was randomly selected as the training set, and the remaining 30% as the test set.
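The two preprocessing steps above (min-max normalization and a random 70/30 split) are simple enough to sketch outside MATLAB. The Python version below is illustrative only; the "mapminmax"-style scaling is hand-rolled, and the ten flux values are hypothetical.

```python
import random

def minmax_scale(column):
    """Scale a list to [0, 1], analogous to MATLAB's mapminmax with
    an output range of [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def train_test_split(samples, train_frac=0.7, seed=42):
    """Randomly assign samples to training and test sets (70/30 here)."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * len(samples)))
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

# Hypothetical flux readings; after scaling, min -> 0.0 and max -> 1.0.
flux = [12.0, 15.0, 18.0, 21.0, 24.0, 27.0, 30.0, 33.0, 36.0, 42.0]
scaled = minmax_scale(flux)
train, test = train_test_split(scaled)
```

Note that in practice the scaling parameters (min/max) should be computed on the training set only and then applied to the test set, to avoid information leakage.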

4.1.2 Selection of model and policy

The MATLAB software was used for the modeling and programming (see Appendix C for the exemplary codes). For the SVM modeling, the LibSVM toolkit was employed for regression prediction, with the RBF kernel chosen as the kernel function. The parameters C (the penalty coefficient, representing the tolerance for error) and G (a parameter of the RBF kernel, calculated as G = 1/(2σ²) where σ is the kernel width, which implicitly determines the distribution of the data mapped to the new feature space) were optimized through CV. To construct the BPNN, LSTM, and RF models, the neural network toolbox and random forest toolbox were selected from the MATLAB packages. The backpropagation training algorithm was employed for neural network model development. For GA-BP, a hybrid model of the GA and BPNN algorithms was established, which involves a three-step process of selection, crossover, and mutation to globally optimize the weights and biases of the neural network.
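The three GA steps named above (selection, crossover, mutation) can be sketched in a few lines. The Python example below is a minimal illustration of the GA loop only: instead of optimizing actual BPNN weights, it minimizes a hypothetical quadratic "training error" over a two-element weight vector, and all hyperparameters (population size, mutation rate, etc.) are assumed placeholders.

```python
import random

def ga_minimize(loss, dim, pop_size=30, generations=100, seed=1):
    """Minimal GA: tournament selection, one-point crossover, and
    Gaussian mutation over real-valued 'weight' vectors."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]

    def tournament():
        # Selection: the better of two randomly chosen individuals survives.
        a, b = rng.choice(pop), rng.choice(pop)
        return a if loss(a) < loss(b) else b

    for _ in range(generations):
        nxt = [min(pop, key=loss)]               # elitism: keep the best
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, dim) if dim > 1 else 0
            child = p1[:cut] + p2[cut:]          # one-point crossover
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.2 else g
                     for g in child]             # Gaussian mutation
            nxt.append(child)
        pop = nxt
    return min(pop, key=loss)

# Hypothetical error surface with its optimum at weights (1, -2).
best = ga_minimize(lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2, dim=2)
```

In GA-BP, the same loop would evaluate `loss` as the BPNN training error for a candidate weight/bias vector, after which conventional backpropagation refines the best individual.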

4.1.3 Model evaluation

The reliability and accuracy of each model were evaluated by three statistical parameters: R2, RMSE, and MAPE. In general, MAPE and RMSE values closer to 0 (or R2 closer to 1) indicate more accurate prediction and better model performance.
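The three metrics have standard definitions, sketched below in Python for reference (the four-point example series is hypothetical). Note that MAPE is reported here as a fraction, and is undefined when a true value is zero.

```python
import math

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (y_true must be nonzero)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical TMP series (kPa) with a constant absolute error of 0.5.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [10.5, 11.5, 14.5, 15.5]
print(r2(y_true, y_pred), rmse(y_true, y_pred))  # 0.95 0.5
```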

4.2 Results and discussion

4.2.1 Presentation of raw data

Table S5 displays the mean, standard deviation, and maximum and minimum values of the input and output data. The input variables consisted of T, MLSS, pH, aerobic-zone DO, influent COD, influent TN, influent TP, and flux. The output variable was TMP. Machine learning models establish a mapping relationship between input and output variables to predict fouling. The raw data set comprises 2000 samples, with 70% randomly selected for training and the remaining 30% for testing.

4.2.2 Model performance

The set of model parameters is presented in Section 1 of Appendix A. Tab.3 compares the prediction results of the SVM, RF, BPNN, LSTM, and GA-BP methods. All five models demonstrated good fitting ability for the full data set. The SVM, BPNN, LSTM, and GA-BP models performed similarly on the training and testing sets, whereas RF showed good fitting ability but weaker generalized predictability. Although RFs are often considered resistant to overfitting (Peter et al., 1998), their lack of generalization ability here may originate from the correlation and redundancy among the independent variables caused by randomly drawn features (Wu et al., 2012). The generalization ability of the model can be improved by increasing the number of trees, performing feature selection, and optimizing the pruning algorithm (Yang et al., 2012). The LSTM had slightly better predictive ability than the BPNN, and GA-BP improved the prediction compared with the BPNN.
Tab.3 Performance of different machine learning models for the tutorial example
Metric Data set SVM RF BPNN LSTM GA-BP
R2 Training 0.8208 0.9017 0.8199 0.8206 0.8201
Testing 0.8124 0.7344 0.8096 0.8175 0.8128
All Data 0.8184 0.8537 0.8170 0.8197 0.8180
RMSE Training 1.4075 1.0424 1.4107 1.4080 1.4100
Testing 1.3955 1.6605 1.4058 1.3765 1.3940
All Data 1.4039 1.2601 1.4092 1.3987 1.4052
MAPE Training 0.0616 0.0463 0.0629 0.0630 0.0628
Testing 0.0624 0.0737 0.0636 0.0619 0.0625
All Data 0.0618 0.0545 0.0631 0.0626 0.0627

5 Summary and prospect

Machine learning models, including ANNs, SVMs, and decision trees, have been used to predict membrane filtration performances in MBRs. Several models have been reported to exhibit fairly good fitting performance and generalization ability. The model parameters can be optimized using intelligent optimization algorithms such as GA and PSO to enhance predictability. The application of SHAP to the interpretation of MBR prediction models has developed rapidly (Aldrees et al., 2024a; Niu et al., 2024). However, the present models face challenges in terms of: (a) inadequate input features and monitoring indices, (b) limited interpretability and generalizability of model predictions, and (c) lack of utilization in automated process control. This calls for further advancements in modeling techniques and deeper integration of modeling into the physical world (e.g., monitoring and control systems). Therefore, we recommend exploring the constraints and enhancements of machine learning in MBR engineering validation at four levels: constitution of model features, online monitoring of model features, application toward process control, and data sharing and generalization.
(1) The model input parameters can be extended to achieve a more accurate and comprehensive description of key features. Compared with the rough indices currently used to describe overall membrane fouling, more specific indicators can be refined to describe fouling behavior (e.g., fouling potential at specific fouling stages) and foulant properties (e.g., polysaccharide/protein/humus concentrations and molecular characteristics) more accurately. Another requirement is to extend the current index system toward a more complete coverage of possible factors. Previous models for pollutant removal efficiency have paid less attention to the effects of the membrane operating state (TMP, flux, permeability, MFI, etc.) and the resultant pollutant interception by the membrane. In fouling prediction models, MFI and CCI have been frequently utilized as input parameters, whereas OI, which is related to membrane backwash and scouring, has not been adequately included. Traditional concentration parameters can be integrated with membrane status and operating conditions in ensuing models. The inclusion of these more accurate and complete monitoring indices as input features would thus facilitate machine learning for MBR processes.
(2) Online monitoring of the key parameters is crucial for developing "smart" models that respond in a timely manner to the real-time operating status of MBRs. The conventional online monitoring items, including temperature, pH, DO, turbidity, and COD, are inadequate to provide accurate details of pollutants/foulants for modeling. For instance, COD can only reflect the overall concentration of organic matter, without revealing details of chemical composition and molecular structure. Elaborate measurements of these properties are often laborious, time-consuming, and unsuitable for online monitoring. Alternatively, spectroscopic methods (such as ultraviolet, visible, and fluorescence spectroscopy) offer new possibilities for real-time reflection of molecular details. These techniques are fast, sensitive, and informative for the exploration of molecular fingerprints, and are promising supplements to the online monitoring system. Combining online spectral indicators with membrane operating status and control conditions is beneficial for developing models that are truly implementable for process control.
(3) Feedforward models are required to support the early warning and proactive control of MBR processes. For optimized operation of MBRs in terms of pollutant removal and fouling mitigation, preventive actions should be taken in advance rather than reacting to adverse events. However, current MBRs lack feedforward control. Although most models have demonstrated fairly good performance in simulating existing facts, their capability to predict future tendencies remains insufficient. Owing to incomplete input features and the lack of prediction models, intelligent feedforward fouling control has not yet been realized. In subsequent modeling, attention should be paid to using tendency indices (e.g., fouling potential) as the model output, or to incorporating time-series concepts to forecast future performance from the current operating status. Moreover, the model results should be coupled with an automatic control system to make them practical.
(4) A widely shared database can be constructed to enhance the interpretability and generalizability of the models. The interpretability of "black box" models remains a longstanding issue, and the complexity of the physical/chemical/biological processes in MBR systems poses new requirements and obstacles for the interpretability of predictive models. Currently, the application of model interpretation and deep learning models in MBR prediction remains limited. Various methods, such as decision rule analysis, correlation coefficient comparison, sensitivity factor analysis, and SHAP, can help explain the models but have not been sufficiently used. The incompleteness of input features makes interpreting a model toward its real physical meaning even harder. To support robust, widely generalizable deep learning models with sufficient input features, a large amount of representative data is critical, which requires a significant workload for data collection and preprocessing. To address this challenge, an open database similar to ImageNet would make great sense, allowing different researchers to share data and develop predictive models with better generalization and broader application by using large and diverse experimental or engineering data.
In summary, the future trajectory of machine learning in MBRs encompasses a multitude of dimensions, including monitoring multidimensional features with online potential, extracting spectra-driven molecular details, integrating intelligent control systems based on real-time warning models, and establishing open and shared data systems. Additionally, model innovation is also noteworthy. Cutting-edge AI methods, such as explainable reinforcement learning (Yu et al., 2023) and large models based on the Transformer framework or multimodality (Vasu et al., 2023), have demonstrated considerable potential in certain fields, including remote sensing (Sun et al., 2023), spatio-temporal early prediction (Wei et al., 2024), and 3D object detection (Li et al., 2024b). It is also worthwhile for researchers to consider simple cutting-edge models such as KAN (Kolmogorov-Arnold Network, with learnable activation functions on weights rather than fixed activation functions on neurons) (Liu et al., 2024) and xLSTM (extended LSTM with exponential gating and enhanced memory structures) (Beck et al., 2024). The incorporation of physics-informed, physics-aware, or data-knowledge driven methodologies can also improve model performance and/or interpretability (Nguyen et al., 2023; Li et al., 2024a; Wang et al., 2024a). However, these new models and methods remain significantly underutilized in the field of water management and wastewater engineering, and MBR researchers are encouraged to direct their attention to these cutting-edge directions. Machine learning (including deep learning) has extensive applications in materials, biology, medicine, and remote sensing; the field of water treatment may benefit from such interdisciplinary and cross-disciplinary inspiration, which has the potential to enhance the existing approaches to water treatment.
Profiling data types, solution goals, and application requirements facilitates model selection and optimization. Further exploration of the prospective applications of intelligent optimization algorithms (also known as metaheuristic algorithms), AutoML, and XAI in MBR processes may advance the development of effective models for process prediction and explanation.
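As a minimal sketch of how a metaheuristic of the kind mentioned above could search MBR operating conditions, the following simulated-annealing loop minimizes a toy surrogate cost. The cost function, variable bounds, and cooling schedule are illustrative assumptions; in practice the objective would be a trained predictive model of energy use or fouling.

```python
# Minimal simulated-annealing sketch for MBR operating-point optimization.
# The surrogate cost is a toy: energy cost grows with aeration, while a
# fouling penalty grows when flux is high relative to aeration.
import math
import random

def cost(aeration, flux):
    return 0.1 * aeration + (flux ** 2) / (aeration + 1.0)

def simulated_annealing(steps=5000, t0=1.0, seed=42):
    rng = random.Random(seed)
    x = [10.0, 15.0]                        # initial [aeration, flux]
    best, best_cost = list(x), cost(*x)
    for k in range(steps):
        t = t0 * (1 - k / steps) + 1e-9     # linear cooling schedule
        # Gaussian perturbation, clipped to hypothetical operating bounds
        cand = [max(0.1, x[0] + rng.gauss(0, 0.5)),
                min(30.0, max(1.0, x[1] + rng.gauss(0, 0.5)))]
        delta = cost(*cand) - cost(*x)
        # Accept improvements always; accept worse moves with probability
        # exp(-delta / t), which shrinks as the temperature cools.
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = cand
            if cost(*x) < best_cost:
                best, best_cost = list(x), cost(*x)
    return best, best_cost
```

The lower flux bound (1.0) stands in for a treatment-throughput constraint; without it the optimizer would trivially drive flux to zero.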

6 Conclusions

This paper summarized recent advances in machine learning for predicting pollutant removal and membrane fouling in MBRs. Based on the literature review, a range of machine learning methods, including ANN, SVM, and decision trees, have been utilized in this scope. Ordinary ANNs have dominated the prediction models, and studies on deep learning models remain limited. This paper not only reviewed the basic principles of machine learning but also presented a tutorial example for readers to practice five models for the prediction of TMP. Alongside the great potential of machine learning in MBR research, further development and application of the models in full-scale engineering are challenged by inadequate input features and monitoring metrics, insufficient model interpretability and generalizability, and a lack of practicability in automated process control. A more complete input index system reinforced by online monitoring would be crucial for developing real-time responsive or feedforward models that bridge the gap between model prediction and practical control. An open, shared MBR operation database would greatly benefit advancements in model generalizability and interpretability. The application of deep learning and intelligent optimization algorithms may enhance model performance, and the integration of AutoML and XAI may facilitate the deployment of models in practical engineering. The information presented in this paper is expected to provide implications for future research toward machine learning-based intelligent operation and maintenance of MBR processes.
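In the spirit of the tutorial example mentioned above, a compact comparison of three of the reviewed model families (ANN, SVR, decision tree) on TMP data might look as follows. The data-generating process, feature meanings, and hyperparameters are illustrative assumptions, not the tutorial's actual setup.

```python
# Hypothetical mini-benchmark: three regressor families predicting TMP
# from synthetic operating data (real inputs would come from MBR monitoring).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
n = 600
X = rng.uniform(0, 1, size=(n, 3))  # e.g. scaled MLSS, flux, elapsed run time
# Synthetic TMP (kPa): grows with run time, plus an MLSS-flux interaction
y = 5 * X[:, 2] + 2 * X[:, 0] * X[:, 1] + rng.normal(0, 0.2, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "ANN (MLP)": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)),
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "Decision tree": DecisionTreeRegressor(max_depth=6, random_state=0),
}
rmse = {}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    rmse[name] = mean_squared_error(y_te, m.predict(X_te)) ** 0.5
    print(f"{name}: RMSE = {rmse[name]:.3f}")
```

Scaling inside a pipeline matters for the MLP and SVR but not for the tree, which is one reason tree ensembles are popular as low-preprocessing baselines in this literature.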

7 Abbreviations

Abbreviation Description
AIC Akaike information criterion
ANFIS Adaptive network-based fuzzy inference system
AnMBR Anaerobic membrane bioreactor
ANN Artificial neural networks
AUC Area under curve
AutoML Automated machine learning
BA Bat algorithm
BFGS Broyden-Fletcher-Goldfarb-Shanno
BIC Bayesian information criterion
BOD Biochemical oxygen demand
BPNN Back propagation neural network
CART Classification and regression tree
CCI Conventional concentration indices
CFI Characteristic foulant indices
CNN Convolutional neural network
COD Chemical oxygen demand
CV Cross-validation
DNN Deep neural network
DO Dissolved oxygen
EC Electrical conductivity
EI Environment indices
ENN Elman neural network
EPS Extracellular polymeric substances
FFA Firefly algorithm
FCN Fully connected network
GA Genetic algorithms
GA-BP Genetic algorithm-back propagation
GBDT Gradient boosting decision tree
GNN Graph neural network
GWO Grey wolf optimizer
HQC Hannan-Quinn criterion
HRT Hydraulic retention time
KAN Kolmogorov-Arnold network
KNN K-nearest neighbors
LM Levenberg-Marquardt
LSSVM Least-squares support vector machine
LSTM Long short-term memory
MAPE Mean absolute percentage error
MBR Membrane bioreactor
MFI Membrane filtration indices
MLP Multilayer perceptron
MLSS Mixed liquor suspended solids
MLVSS Volatile MLSS
MSE Mean square error
NH3-N Ammonium nitrogen
NO3-N Nitrate nitrogen
ODR Oxygen decay rate
OI Operation indices
OLR Organic loading rate
ORP Oxidation reduction potential
PCA Principal component analysis
PSO Particle swarm optimization
RBF Radial basis function
RBFNN Radial basis function neural network
RF Random forest
RH Relative hydrophobicity
RL Reinforcement learning
RMSE Root mean square error
RNN Recurrent neural network
ROC Receiver operating characteristic
SA Simulated annealing
SHAP Shapley additive explanations
SMP Soluble microbial products
SRT Sludge retention time
SVC Support vector classification
SVM Support vector machines
SVR Support vector regression
t Time parameter
T Temperature
TMP Transmembrane pressure
TN Total nitrogen
TOC Total organic carbon
TP Total phosphorus
TSS Total suspended solids
WNN Wavelet neural network
XAI Explainable artificial intelligence


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 20507005) and Outstanding Young Research Award of National Natural Science Foundation of China (Grant No. E50725825). Prof. Yin also shows sincere thanks to Dr. Shimada of National Institute of Advanced Industrial Science and Technology (AIST, Japan) for his kind help on organobismuth chemistry.

RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg