RESEARCH ARTICLE

A neural network-based production process modeling and variable importance analysis approach in corn to sugar factory

  • Yi Tong 1
  • Mou Shu 2
  • Mingxin Li 3
  • Yingwei Liu 2
  • Ran Tao 3
  • Congcong Zhou 3
  • You Zhao 1
  • Guoxing Zhao 1
  • Yi Li 1
  • Yachao Dong 3
  • Lei Zhang 3
  • Linlin Liu 3
  • Jian Du 3
  • 1. COFCO Biotechnology Co., Ltd., Beijing 100005, China
  • 2. COFCO Nutrition and Health Research Institute Co., Ltd., Beijing 102209, China
  • 3. Institute of Process Systems Engineering, School of Chemical Engineering, Dalian University of Technology, Dalian 116024, China
Li-yi1@cofco.com
dujian@dlut.edu.cn

Received date: 05 Apr 2022

Accepted date: 16 May 2022

Published date: 15 Mar 2023

Copyright

2022 Higher Education Press

Abstract

The corn-to-sugar process has long suffered from high energy consumption and thin profit margins. However, upgrading or optimizing the process with mechanistic unit-operation models is difficult because the processes involved are highly complex. Big data technology offers a promising alternative because it can turn large amounts of data into insights for operational decisions. In this paper, a neural network-based production process modeling and variable importance analysis approach is proposed for corn-to-sugar processes. The approach comprises data preprocessing, dimensionality reduction, modeling with a multilayer perceptron (MLP), convolutional neural network (CNN) or recurrent neural network (RNN), and an extended weight connection method. In the established model, the dextrose equivalent (DE) value is chosen as the output, and 654 sites from the distributed control system (DCS) are chosen as the inputs. LASSO analysis first reduces the input dimension to 155, and genetic algorithm optimization then reduces it further to 50. Finally, variable importance analysis is carried out with the extended weight connection method, and the 20 most important sites are selected for each neural network. The results indicate that the MLP and RNN models achieve relative errors below 0.1%, giving better predictions than the other models, and that the 20 selected sites offer better interpretability. These contributions support high-accuracy process simulation, as well as process optimization based on the selected most important sites, to maintain high-quality and stable production in corn-to-sugar processes.
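
The abstract states that LASSO analysis cuts the 654 DCS sites down to 155 before the genetic algorithm step. The following is a minimal sketch of LASSO screening by cyclic coordinate descent, written in plain Python for illustration only; it is not the authors' implementation. It assumes the site columns of X have already been centered (for example z-scored during preprocessing) and that y (the DE value) is centered, so no intercept is fitted. The function names and the penalty weight alpha are hypothetical choices.

```python
from typing import List


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam (the proximal step of the L1 penalty)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float], alpha: float,
                             max_iter: int = 500, tol: float = 1e-10) -> List[float]:
    """Minimize (1/(2n))*||y - X b||^2 + alpha*||b||_1 over b.

    Assumes X columns and y are centered, so no intercept term is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    # Per-column scale (1/n) * sum_i x_ij^2; equals 1 for z-scored columns.
    scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if scale[j] == 0.0:
                continue  # a constant site carries no information
            # Correlation of site j with the partial residual that excludes site j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha) / scale[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no coefficient moved noticeably during this sweep
    return beta


def select_sites(beta: List[float], eps: float = 0.0) -> List[int]:
    """Indices of sites whose LASSO coefficient survived the L1 shrinkage."""
    return [j for j, b in enumerate(beta) if abs(b) > eps]
```

Increasing alpha zeroes out more coefficients, so in practice alpha would be tuned until roughly the desired number of sites (such as the 155 reported above) remain. For hundreds of sites, a vectorized library routine such as scikit-learn's `Lasso`, which minimizes the same objective, would normally replace this loop.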

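The abstract names an extended weight connection method for ranking sites but does not define it. As an illustration only, the sketch below uses the classic connection weight idea for dense networks: the signed importance of input i is the sum, over every path to the output, of the products of the connection weights along that path (as in Olden and Jackson's approach). It extends this to several hidden layers by multiplying the layer weight matrices. That multi-layer extension, the function names, and the single-output assumption are assumptions and may differ from the authors' method. Biases and activation functions are ignored, as in the original connection weight approach, and CNN or RNN weights would need separate handling.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def connection_weight_importance(weights: List[Matrix]) -> List[float]:
    """Signed importance of each input for a single-output dense network.

    weights[l] has shape (units in layer l, units in layer l + 1).
    Chaining the matrix products sums the weight products over every
    input-to-output path; biases and activation functions are ignored.
    """
    product = weights[0]
    for w in weights[1:]:
        product = matmul(product, w)
    if len(product[0]) != 1:
        raise ValueError("this sketch assumes a single output, such as the DE value")
    return [row[0] for row in product]


def top_k_sites(importance: List[float], k: int) -> List[int]:
    """Indices of the k inputs with the largest absolute importance."""
    order = sorted(range(len(importance)), key=lambda i: abs(importance[i]), reverse=True)
    return order[:k]
```

For example, with input-to-hidden weights `[[0.5, -1.0], [2.0, 0.5], [0.1, 0.1]]` and hidden-to-output weights `[[1.0], [2.0]]`, input 1 ranks first and input 0 second. Calling `top_k_sites(importance, 20)` on a trained network's weights would mirror the selection of 20 sites per network described in the abstract.
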
Cite this article

Yi Tong, Mou Shu, Mingxin Li, Yingwei Liu, Ran Tao, Congcong Zhou, You Zhao, Guoxing Zhao, Yi Li, Yachao Dong, Lei Zhang, Linlin Liu, Jian Du. A neural network-based production process modeling and variable importance analysis approach in corn to sugar factory[J]. Frontiers of Chemical Science and Engineering, 2023, 17(3): 358–371. DOI: 10.1007/s11705-022-2190-y

Acknowledgement

The authors gratefully acknowledge financial support from the Special Foundation for State Major Basic Research Program of China (Grant No. 2021YFD2101000).

Electronic Supplementary Material

Supplementary material is available in the online version of this article at https://dx.doi.org/10.1007/s11705-022-2190-y and is accessible to authorized users.
