Comparison of data driven and data-mechanism hybrid driven methods for key variables prediction based on data sets with different sample sizes and noises

Qihang Tan, Chao Wang, Wange Li, Jinghao Sun, Jun Zhao

ENG. Chem. Eng., 2026, Vol. 20, Issue 2: 11
DOI: 10.1007/s11705-026-2632-z
RESEARCH ARTICLE

Abstract

Soft sensing based on data-driven models is an important method for predicting key variables in the process industry, owing to low-latency requirements and economic cost considerations. However, data-driven models cannot provide accurate predictions on noisy data sets with small numbers of samples. To address the challenge of noisy data and scarce samples, several data-mechanism hybrid driven methods are proposed to improve key-variable prediction performance, built on three data-driven models: random forest, extreme gradient boosting, and artificial neural network. The effectiveness of the proposed hybrid driven methods is validated on two cases, benzene-toluene-xylene distillation and steam methane reforming, whose data sets differ in sample size and noise intensity. The comparison results show that the hybrid driven methods can improve prediction accuracy to a certain extent, and that the degree of improvement depends on the noise intensity, the sample size, and the data-driven model selected. For the noise intensities (10%–20%) and sample sizes (100–400) considered in this work, adopting the hybrid driven methods improves the coefficient of determination of random forest, extreme gradient boosting, and artificial neural network by 0.3%–5.2%, 0.6%–17.7%, and 0.1%–36.2%, respectively, compared with the corresponding data-driven models.
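The abstract does not say how the hybrid methods couple the mechanism and data-driven parts. As a minimal sketch, assuming one common structure (a residual-correction hybrid, in which a simplified mechanism model supplies the main trend and a data-driven model learns only the mismatch), the block below compares such a hybrid with a purely data-driven model using the coefficient of determination R². Everything in it is hypothetical and is not the authors' method: the `true_process` and `mechanism_model` functions, the multiplicative Gaussian noise used to stand in for "noise intensity", and the k-nearest-neighbours learner, which replaces random forest, XGBoost, or an ANN only to keep the example dependency-free.

```python
import math
import random


def true_process(x):
    # Hypothetical "plant": a linear trend plus a nonlinearity the mechanism model misses.
    return 3.0 * x + 0.8 * math.sin(2.0 * x)


def mechanism_model(x):
    # Hypothetical simplified first-principles model: it captures only the linear trend.
    return 3.0 * x


def knn_predict(train_x, train_y, x, k=5):
    # Stand-in data-driven learner (k-nearest neighbours); NOT random forest, XGBoost, or an ANN.
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    # Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_improvement(r2_data, r2_hybrid):
    # Assumption: improvement is reported as a relative change in R^2, in percent.
    return (r2_hybrid - r2_data) / r2_data * 100.0


def make_data(n, noise_level, lo, hi, rng, process=true_process):
    # Assumed noise model: Gaussian noise with standard deviation noise_level * |y|.
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    ys = [process(x) + rng.gauss(0.0, noise_level * abs(process(x))) for x in xs]
    return xs, ys


def fit_and_score(train, test, mechanism=mechanism_model, k=5):
    (tx, ty), (vx, vy) = train, test
    # Purely data-driven: learn y directly from x.
    pred_data = [knn_predict(tx, ty, x, k) for x in vx]
    # Residual-correction hybrid: learn y - mechanism(x), then add the mechanism output back.
    residuals = [y - mechanism(x) for x, y in zip(tx, ty)]
    pred_hybrid = [mechanism(x) + knn_predict(tx, residuals, x, k) for x in vx]
    return r2_score(vy, pred_data), r2_score(vy, pred_hybrid)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 400):  # sample sizes spanning the abstract's range
        for noise in (0.10, 0.20):  # noise intensities spanning the abstract's range
            train = make_data(n, noise, 0.0, 5.0, rng)
            test = make_data(200, noise, 0.0, 5.0, rng)
            r2_d, r2_h = fit_and_score(train, test)
            change = relative_improvement(r2_d, r2_h)
            print(f"n={n} noise={noise:.0%}: R2 data={r2_d:.3f} "
                  f"hybrid={r2_h:.3f} change={change:+.1f}%")
```

In this structure the data-driven part only has to learn a small, bounded residual, which is one plausible reason a hybrid can help more when samples are few or noisy. It also keeps predictions tied to the mechanism trend outside the training range, where a purely data-driven k-NN model returns a constant. The `relative_improvement` helper assumes the abstract's percentages are relative changes in R², which the abstract does not state.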

Keywords

data-mechanism hybrid driven methods / different sample sizes / noise dataset / machine learning / process industry

Cite this article

Qihang Tan, Chao Wang, Wange Li, Jinghao Sun, Jun Zhao. Comparison of data driven and data-mechanism hybrid driven methods for key variables prediction based on data sets with different sample sizes and noises. ENG. Chem. Eng., 2026, 20(2): 11. DOI: 10.1007/s11705-026-2632-z

RIGHTS & PERMISSIONS

Higher Education Press