Machine learning for small-data in aquatic environments: a review of challenges, methods, and optimization approaches

Yulin Chen, Lin Liu, Dawen Gao

ENG. Environ., 2026, Vol. 20, Issue (6): 86. DOI: 10.1007/s11783-026-2186-9
REVIEW ARTICLE

Abstract

Studies of aquatic environmental systems are often constrained by low monitoring frequency, limited spatial coverage, and high experimental costs, which give the resulting datasets small-data characteristics such as limited sample size, high dimensionality, and structural heterogeneity. These characteristics significantly limit the performance and generalizability of machine learning models. This review examines the challenges of applying machine learning to modeling under small-data conditions in aquatic environments. Building on the structural features of representative datasets, it systematically evaluates current mainstream approaches and compares their adaptability and robustness across application scenarios. Drawing on cross-disciplinary experience, the review proposes a modeling framework tailored to aquatic systems and emphasizes the coordinated optimization of data preparation, model construction, and performance evaluation. The analysis identifies data incompleteness and non-stationarity as the primary obstacles in small-data modeling and highlights that problem-oriented modeling workflows are crucial for enhancing predictive reliability and the robustness of results. Taken together, these efforts provide theoretical and methodological guidance for intelligent environmental modeling and scientific decision-making under small-data conditions.

Keywords

Aquatic environment / Small-data machine learning / Data characteristics / Modeling workflow

Highlights

● Small-data challenges in aquatic machine learning were reviewed.

● Data characteristics and small-data patterns in aquatic studies were summarized.

● A diagnostic approach and a practical workflow were outlined and discussed.

Cite this article

Yulin Chen, Lin Liu, Dawen Gao. Machine learning for small-data in aquatic environments: a review of challenges, methods, and optimization approaches. ENG. Environ., 2026, 20(6): 86. DOI: 10.1007/s11783-026-2186-9

References

[1]

Aalizadeh R , Nika M C , Thomaidis N S . (2019). Development and application of retention time prediction models in the suspect and non-target screening of emerging contaminants. Journal of Hazardous Materials, 363: 277–285

[2]

Achar S K , Keith J A . (2024). Small data machine learning approaches in molecular and materials science. Chemical Reviews, 124(24): 13571–13573

[3]

Ahn J M , Kim J , Kim H , Kim K . (2023). Harmful cyanobacterial blooms forecasting based on improved CNN-transformer and temporal fusion transformer. Environmental Technology & Innovation, 32: 103314

[4]

Ai J QHan X XChen L JHe H QLi X LTan Y BXie T CTang X T (2025). Deep neural network and transfer learning for annual wetland vegetation mapping using sentinel-2 time-series data in the heterogeneous lake floodplain environment. International Journal of Remote Sensing, 46: 1–24

[5]

Alagialoglou L , Manakos I , Papadopoulou S , Chadoulis R T , Kita A . (2023). Mapping underwater aquatic vegetation using foundation models with air- and space-borne images: the case of Polyphytos Lake. Remote Sensing, 15(16): 4001

[6]

Andaryani S , Afkhaminia A . (2024). Real-time prediction of river ice breakup phenomena: a jittered genetic programming model and wavelet analysis integrating remotely sensed imagery and machine learning. Journal of Hydrology, 644: 132097

[7]

Azzam A , Zhang W C , Akhtar F , Shaheen Z , Elbeltagi A . (2022). Estimation of green and blue water evapotranspiration using machine learning algorithms with limited meteorological data: a case study in Amu Darya River Basin, Central Asia. Computers and Electronics in Agriculture, 202: 107403

[8]

Babakhani P , Bridge J , Doong R A , Phenrat T . (2017). Parameterization and prediction of nanoparticle transport in porous media: a reanalysis using artificial neural network. Water Resources Research, 53(6): 4564–4585

[9]

Babakhani P , Dale A W , Woulds C , Moore O W , Xiao K Q , Curti L , Peacock C L . (2025). Preservation of organic carbon in marine sediments sustained by sorption and transformation processes. Nature Geoscience, 18(1): 78–83

[10]

Banda K , Pietersen K , Hamutoko J , Kanyerere T , Munamati M , Kujinga K , Nyambe I . (2024). Using a comparative of DRASTIC and Bayesian weights of evidence approach to assess transboundary aquifer vulnerability in a data scarcity region: Tuli-Karoo aquifer. Journal of Hydrology: Regional Studies, 55: 101930

[11]

Banda K , Shilengwe C , Nyambe I . (2025). Integrating environmental and LULC drivers of groundwater droughts in groundwater-dependent ecosystems: a machine learning (XGBoost)-SEM analysis with ecosystem implications. Ecological Processes, 14(1): 64

[12]

Beal M R W , Özdoğan M , Block P J . (2024). A machine learning and remote sensing-based model for algae pigment and dissolved oxygen retrieval on a small inland lake. Water Resources Research, 60(3): e2023WR035744

[13]

Belina Y , Kebede A , Masinde M . (2024). Comparative analysis of HEC-HMS and machine learning models for rainfall-runoff prediction in the upper Baro watershed, Ethiopia. Hydrology Research, 55(9): 873–889

[14]

Bozeman III J F . (2024). Bolstering integrity in environmental data science and machine learning requires understanding socioecological inequity. Frontiers of Environmental Science & Engineering, 18(5): 65

[15]

Cai W C , Ye C , Ao F Y , Xu Z X , Chu W H . (2025). Emerging applications of fluorescence excitation-emission matrix with machine learning for water quality monitoring: a systematic review. Water Research, 277: 123281

[16]

Cao H L , Xie X J , Shi J B , Jiang G B , Wang Y X . (2022). Siamese network-based transfer learning model to predict geogenic contaminated groundwaters. Environmental Science & Technology, 56(15): 11071–11079

[17]

Cao H L , Xie X J , Xiao Z Y , Liu W J . (2024). Transferability of machine learning models for geogenic contaminated groundwaters. Environmental Science & Technology, 58(20): 8783–8791

[18]

Chang S Y , Schwenk J , Solander K C . (2025). Deep learning advances Arctic river water temperature predictions. Water Resources Research, 61(6): e2024WR039053

[19]

Chen L , Hu J , Wang H , He Y Y , Deng Q Y , Wu F F . (2024). Predicting Cd(II) adsorption capacity of biochar materials using typical machine learning models for effective remediation of aquatic environments. Science of the Total Environment, 944: 173955

[20]

Cheng Y S , Zhang K , Huang K , Zhang H C . (2024). Meta-analysis and machine learning models for anaerobic biodegradation rates of organic contaminants in sediments and sludge. Environmental Science & Technology, 58(29): 12976–12988

[21]

Deng L L , Qian L X , Hong M , Li D Y , Hu Y J . (2025). A coupled model of nonlinear dynamical and deep learning for monthly precipitation prediction with small samples. Stochastic Environmental Research and Risk Assessment, 39(5): 1877–1898

[22]

Ding J F , Sun C J , He C F , Li J X , Ju P , Li F M . (2021). Microplastics in four bivalve species and basis for using bivalves as bioindicators of microplastic pollution. Science of the Total Environment, 782: 146830

[23]

Dodangeh E , Choubin B , Eigdir A N , Nabipour N , Panahi M , Shamshirband S , Mosavi A . (2020). Integrated machine learning methods with resampling algorithms for flood susceptibility prediction. Science of the Total Environment, 705: 135983

[24]

Donnelly J , Daneshkhah A , Abolfathi S . (2024). Physics-informed neural networks as surrogate models of hydrodynamic simulators. Science of the Total Environment, 912: 168814

[25]

Dou B Z , Zhu Z L , Merkurjev E , Ke L , Chen L , Jiang J , Zhu Y Y , Liu J , Zhang B G , Wei G W . (2023). Machine learning methods for small data challenges in molecular science. Chemical Reviews, 123(13): 8736–8780

[26]

Duan H T , Xiao Q T , Qi T C , Hu C , Zhang M , Shen M , Hu Z H , Wang W , Xiao W , Qiu Y G . et al. (2023). Quantification of diffusive methane emissions from a large eutrophic lake with satellite imagery. Environmental Science & Technology, 57(36): 13520–13529

[27]

El Bilali A , Lamane H , Taleb A , Nafii A . (2022). A framework based on multivariate distribution-based virtual sample generation and DNN for predicting water quality with small data. Journal of Cleaner Production, 368: 133227

[28]

El Bilali A , Taleb A , Bahlaoui M A , Brouziyne Y . (2021). An integrated approach based on Gaussian noises-based data augmentation method and AdaBoost model to predict faecal coliforms in rivers with small dataset. Journal of Hydrology, 599: 126510

[29]

Fang S B , Deitch M J , Gebremicael T G . (2025). Evaluating the reliability of data interpolation and machine learning methods for water quality management: a SWAT model comparison. Environmental Earth Sciences, 84(10): 274

[30]

Faraway J J , Augustin N H . (2018). When small data beats big data. Statistics & Probability Letters, 136: 142–145

[31]

Fildes S G , Clark I F , Bruce D , Raimondo T . (2025). An ensemble model of knowledge- and data-driven geospatial methods for mapping groundwater potential in a data-scarce, semi-arid fractured rock region. Applied Water Science, 15(4): 86

[32]

Fooladi M , Nikoo M R , Mirghafari R , Madramootoo C A , Al-Rawas G , Nazari R . (2024). Robust clustering-based hybrid technique enabling reliable reservoir water quality prediction with uncertainty quantification and spatial analysis. Journal of Environmental Management, 362: 121259

[33]

Fu G T , Jin Y W , Sun S A , Yuan Z G , Butler D . (2022). The role of deep learning in urban water management: a critical review. Water Research, 223: 118973

[34]

Gad M , Khomami N T S , Krieg R , Schor J , Philippe A , Lechtenfeld O J . (2025). Environmental drivers of dissolved organic matter composition across central European aquatic systems: a novel correlation-based machine learning and FT-ICR MS approach. Water Research, 273: 123018

[35]

Gajewicz-Skretna A , Furuhama A , Yamamoto H , Suzuki N . (2021). Generating accurate in silico predictions of acute aquatic toxicity for a range of organic chemicals: towards similarity-based machine learning methods. Chemosphere, 280: 130681

[36]

Garzón A , Kapelan Z , Langeveld J , Taormina R . (2022). Machine learning-based surrogate modeling for urban water networks: review and future research directions. Water Resources Research, 58(5): e2021WR031808

[37]

Ghozatlou O , Datcu M , Focsa A , Conde M H , Ullo S L . (2024). A review and a perspective of deep active learning for remote sensing image analysis: enhanced adaptation to user conjecture. IEEE Geoscience and Remote Sensing Magazine, 12(3): 125–148

[38]

Godasiaei S H . (2025). Predictive modeling of microplastic adsorption in aquatic environments using advanced machine learning models. Science of the Total Environment, 958: 178015

[39]

Gong M G , Li J Z , Zhang Y R , Wu Y , Zhang M Y . (2022). Two-path aggregation attention network with quad-patch data augmentation for few-shot scene classification. IEEE Transactions on Geoscience and Remote Sensing, 60: 4511616

[40]

Gordon-Rodriguez EQuinn T PCunninghham J P (2022). Data augmentation for compositional data: advancing predictive models of the microbiome. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 1494

[41]

Gulshin I , Kuzina O . (2024). Machine learning methods for the prediction of wastewater treatment efficiency and anomaly classification with lack of historical data. Applied Sciences, 14(22): 10689

[42]

Guo Y X , Peng H R , Wang Q R , Wang J Q , Wu Z Y , Shao B , Xing G D , Huang Z , Zhao F , Cui H Y . et al. (2025). Unveiling the global dynamics of dissolved organic carbon in aquatic ecosystems: climatic and anthropogenic impact, and future predictions. Science of the Total Environment, 958: 178109

[43]

Gupta S , Aga D , Pruden A , Zhang L Q , Vikesland P . (2021). Data analytics for environmental science and engineering research. Environmental Science & Technology, 55(16): 10895–10907

[44]

Han J W , Kim T , Lee S , Kang T , Im J K . (2024). Machine learning and explainable AI for chlorophyll-a prediction in Namhan River watershed, South Korea. Ecological Indicators, 166: 112361

[45]

Harrell F E (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer

[46]

Heacock M L , Lopez A R , Amolegbe S M , Carlin D J , Henry H F , Trottier B A , Velasco M L , Suk W A . (2022). Enhancing data integration, interoperability, and reuse to address complex and emerging environmental health problems. Environmental Science & Technology, 56(12): 7544–7552

[47]

Hollmann N , Müller S , Purucker L , Krishnakumar A , Körfer M , Hoo S B , Schirrmeister R T , Hutter F . (2025). Accurate predictions on small data with a tabular foundation model. Nature, 637(8045): 319–326

[48]

Hong S M , Morgan B J , Stocker M D , Smith J E , Kim M S , Cho K H , Pachepsky Y A . (2024). Using machine learning models to estimate Escherichia coli concentration in an irrigation pond from water quality and drone-based RGB imagery data. Water Research, 260: 121861

[49]

Hospedales T , Antoniou A , Micaelli P , Storkey A . (2022). Meta-learning in neural networks: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9): 5149–5169

[50]

Hosseini F S , Choubin B , Mosavi A , Nabipour N , Shamshirband S , Darabi H , Haghighi A T . (2020). Flash-flood hazard assessment using ensembles and Bayesian-based machine learning models: application of the simulated annealing feature selection method. Science of the Total Environment, 711: 135161

[51]

Hou F , Liu S , Yin W X , Gan L L , Pang H T , Lv J Q , Liu Y , Wang H C . (2024). Machine learning for high-precision simulation of dissolved organic matter in sewer: overcoming data restrictions with generative adversarial networks. Science of the Total Environment, 947: 174469

[52]

Huang F , Qian B , Ochoa C G . (2023). Long-term river water temperature reconstruction and investigation: a case study of the Dongting Lake Basin, China. Journal of Hydrology, 616: 128857

[53]

Huang S , Xia J , Wang Y L , Lei J R , Wang G S . (2024). Water quality prediction based on sparse dataset using enhanced machine learning. Environmental Science and Ecotechnology, 20: 100402

[54]

Jablonka K M , Ongari D , Moosavi S M , Smit B . (2020). Big-data science in porous materials: materials genomics and machine learning. Chemical Reviews, 120(16): 8066–8129

[55]

Jain A KChandrasekaran B (1982). Dimensionality and sample size considerations in pattern recognition practice. In: Krishnaiah P R, Kanal L N, eds. Handbook of Statistics, Volume 2: Classification, Pattern Recognition and Reduction of Dimensionality. Amsterdam: Elsevier, 835–855

[56]

Jayaprakash V , You J B , Kanike C , Liu J F , McCallum C , Zhang X H . (2024). Determination of trace organic contaminant concentration via machine classification of surface-enhanced Raman spectra. Environmental Science & Technology, 58(35): 15619–15628

[57]

Jiang Y Y , Song Y Y , Liu J L , Liu H S , Zang X Z , Ji Z L . (2025). Machine learning assisted precise prediction of algae bloom in large-scale water diversion engineering. Desalination, 610: 118880

[58]

Kangi G , Dondeyne S , Kleinschroth F , Van Orshoven J . (2024). Variation of the Omo Delta between 1990 and 2018: what remote sensing data reveal and models explain. Land Degradation & Development, 35(2): 867–883

[59]

Karimzadeh M , Meybodi M A , Shams M , Goli-Malekabadi Z . (2025). Microfluidic droplet detection using synthetic data generated by a Generative Adversarial Network. Engineering Applications of Artificial Intelligence, 156: 111032

[60]

Karniadakis G E , Kevrekidis I G , Lu L , Perdikaris P , Wang S F , Yang L . (2021). Physics-informed machine learning. Nature Reviews Physics, 3(6): 422–440

[61]

Khalid S I , Massaad E , Roy J M , Thomson K , Mirpuri P , Kiapour A , Shin J H . (2025). An appraisal of the quality of development and reporting of predictive models in neurosurgery: a systematic review. Neurosurgery, 96(2): 269–275

[62]

Koch J , Kim H , Tirado-Conde J , Hansen B , Møller I , Thorling L , Troldborg L , Voutchkova D , Højberg A L . (2024). Modeling groundwater redox conditions at national scale through integration of sediment color and water chemistry in a machine learning framework. Science of the Total Environment, 947: 174533

[63]

Kokol PKokol MZagoranski S (2022). Machine learning on small size samples: a synthetic knowledge synthesis. Science Progress, 105(1): 368504211029777

[64]

Kozyatnyk I , Yacout D M M , Van Caneghem J , Jansson S . (2020). Comparative environmental assessment of end-of-life carbonaceous water treatment adsorbents. Bioresource Technology, 302: 122866

[65]

Lai L , Liu Y C , Zhang Y C , Cao Z , Yin Y P , Chen X , Jin J L , Wu S M . (2024). Long-term spatiotemporal mapping in lacustrine environment by remote sensing: review with case study, challenges, and future directions. Water Research, 267: 122457

[66]

Li B W , Liu L , Ma R Y , Guo L F , Jiang J W , Li K X , Li X J . (2024a). Siamese based few-shot learning lightweight transformer model for coagulant and disinfectant dosage simultaneous regulation. Chemical Engineering Journal, 499: 156025

[67]

Li D C , Wen I H . (2014). A genetic algorithm-based virtual sample generation technique to improve small data set learning. Neurocomputing, 143: 222–230

[68]

Li F FFergus RPerona P (2003). A Bayesian approach to unsupervised one-shot learning of object categories. In: Proceedings of the 9th IEEE International Conference on Computer Vision. Nice: IEEE, 1134–1141

[69]

Li S J , Song K S , Wang S , Liu G , Wen Z D , Shang Y X , Lyu L , Chen F F , Xu S Q , Tao H . et al. (2021). Quantification of chlorophyll-a in typical lakes across China using Sentinel-2 MSI imagery with machine learning algorithm. Science of the Total Environment, 778: 146271

[70]

Li Z L , Liu H X , Zhang C , Fu G T . (2023). Generative adversarial networks for detecting contamination events in water distribution systems using multi-parameter, multi-site water quality monitoring. Environmental Science and Ecotechnology, 14: 100231

[71]

Li Z L , Liu H X , Zhang C , Fu G T . (2024b). Gated graph neural networks for identifying contamination sources in water distribution systems. Journal of Environmental Management, 351: 119806

[72]

Li Z L , Liu H X , Zhang C , Fu G T . (2024c). Real-time water quality prediction in water distribution networks using graph neural networks with sparse monitoring data. Water Research, 250: 121018

[73]

Liang Y C , Ding F Y , Liu L , Yin F , Hao M M , Kang T T , Zhao C P , Wang Z T , Jiang D . (2025). Monitoring water quality parameters in urban rivers using multi-source data and machine learning approach. Journal of Hydrology, 648: 132394

[74]

Liao Z L , Wang X , Tian W C , Xie W Y . (2025). Enhancing surface water quality prediction in data-scarce sites using transfer learning and neural networks. Journal of Water Process Engineering, 75: 107923

[75]

Liao Z T , Lu Y , Wei D B , Ding R , Wu Y H , Gao H N , Liao A R , Tang Y C , Xu H W , Chen Z . et al. (2024). Tailor-made ammonia nitrogen risk management with machine learning models for aquatic environments in the Mainland of China. Journal of Hazardous Materials, 479: 135726

[76]

Liu L , Bao Z S , Liang Y , Deng H X , Zhang X L , Cao T , Zhou C C , Zhang Z Y . (2025). Unsupervised learning for lake underwater vegetation classification: constructing high-precision, large-scale aquatic ecological datasets. Science of the Total Environment, 958: 177895

[77]

Liu W J , Chen J W , Wang H B , Fu Z Q , Peijnenburg W J G M , Hong H X . (2024). Perspectives on advancing multimodal learning in environmental science and engineering studies. Environmental Science & Technology, 58(38): 16690–16703

[78]

Liu X , Lu D W , Zhang A Q , Liu Q , Jiang G B . (2022). Data-driven machine learning in environmental pollution: gains and problems. Environmental Science & Technology, 56(4): 2124–2133

[79]

Lv J Q , Yin W X , Xu J M , Cheng H Y , Li Z L , Yang J X , Wang A J , Wang H C . (2025). Augmented machine learning for sewage quality assessment with limited data. Environmental Science and Ecotechnology, 23: 100512

[80]

Ly Q V , Tong N A , Lee B M , Nguyen M H , Trung H T , Le Nguyen P , Hoang T H T , Hwang Y , Hur J . (2023). Improving algal bloom detection using spectroscopic analysis and machine learning: a case study in a large artificial reservoir, South Korea. Science of the Total Environment, 901: 166467

[81]

Lyu H , Xu Z M , Zhong J , Gao W H , Liu J X , Duan M . (2024). Machine learning-driven prediction of phosphorus adsorption capacity of biochar: insights for adsorbent design and process optimization. Journal of Environmental Management, 369: 122405

[82]

Ma C M , Jiao H Y , Hao Y H , Yeh T C J , Zhu J F , Hao H Q , Lu J H , Dong J K . (2025). Simulation of spring discharge using deep learning, considering the spatiotemporal variability of precipitation. Water Resources Research, 61(4): e2024WR037449

[83]

Mangkhaseum S , Bhattarai Y , Duwal S , Hanazawa A . (2024). Flood susceptibility mapping leveraging open-source remote-sensing data and machine learning approaches in Nam Ngum River Basin (NNRB), Lao PDR. Geomatics, Natural Hazards and Risk, 15(1): 2357650

[84]

Marsland S (2011). Machine Learning: An Algorithmic Perspective. New York: Chapman and Hall/CRC

[85]

Miao D H , Gu W Q , Li W H , Liu J , Hu W T , Feng J P , Shao D G . (2024). A research on multi-index intelligent integrated prediction model of catchment pollutant load under data scarcity. Water, 16(8): 1132

[86]

Minh D , Wang H X , Li Y F , Nguyen T N . (2022). Explainable artificial intelligence: a comprehensive review. Artificial Intelligence Review, 55(5): 3503–3568

[87]

Mu T WDuan F YNing B KZhou BLiu J YHuang M H (2025). ST-GPINN: a spatio-temporal graph physics-informed neural network for enhanced water quality prediction in water distribution systems. npj Clean Water, 8(1): 74

[88]

Mugume S N , Murungi J , Nyenje P M , Sempewo J I , Okedi J , Sörensen J . (2024). Development and application of a hybrid artificial neural network model for simulating future stream flows in catchments with limited in situ observed data. Journal of Hydroinformatics, 26(8): 1944–1969

[89]

Nguyen A D , Vu V H , Hoang D V , Nguyen T D , Nguyen K , Nguyen P L , Ji Y S . (2023). Attentional ensemble model for accurate discharge and water level prediction with training data enhancement. Engineering Applications of Artificial Intelligence, 126: 107073

[90]

Pratap A RAnnamalai S (2025). Sensor technologies for environmental data collection. In: Suresh A, Devi T, Deepa N, Bashir A K, eds. Environmental Monitoring Using Artificial Intelligence. Beverly: Scrivener Publishing, 133–166

[91]

Pu F L , Ding C J , Chao Z Y , Yu Y , Xu X . (2019). Water-quality classification of inland lakes using Landsat8 images by convolutional neural networks. Remote Sensing, 11(14): 1674

[92]

Qian L X , Hu W , Zhao Y , Hong M , Fan L L . (2025). A coupled model of nonlinear dynamical system and deep learning for multi-step ahead daily runoff prediction for data scarce regions. Journal of Hydrology, 653: 132640

[93]

Rajapaksha P , Crespi N . (2024). Explainable attention pruning: a metalearning-based approach. IEEE Transactions on Artificial Intelligence, 5(6): 2505–2516

[94]

Rathnayake N , Rathnayake U , Chathuranika I , Dang T L , Hoshino Y . (2023). Cascaded-ANFIS to simulate nonlinear rainfall–runoff relationship. Applied Soft Computing, 147: 110722

[95]

Raudys S J , Jain A K . (1991). Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(3): 252–264

[96]

Redondo-Tilano S A , Boucher M A , Lacey J . (2025). Emerging strategies for addressing flood-damage modeling issues: a review. International Journal of Disaster Risk Reduction, 116: 105058

[97]

Rosales-Estrella D L , Portilla-Cabrera C V , Guzmán-Alvis Á I , Enterline C , Díaz Gil A , Rueda M , Benavides-Martínez I F . (2025). Machine learning based mapping of physicochemical attributes in the Colombian Pacific seafloor. Earth Science Informatics, 18(3): 430

[98]

Sagan V , Peterson K T , Maimaitijiang M , Sidike P , Sloan J , Greeling B A , Maalouf S , Adams C . (2020). Monitoring inland water quality using remote sensing: potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud computing. Earth-Science Reviews, 205: 103187

[99]

Saleem R , Yuan B , Kurugollu F , Anjum A , Liu L . (2022). Explaining deep neural networks: a survey on the global interpretation methods. Neurocomputing, 513: 165–180

[100]

Sampurno J , Putra M G E , Faryuni I D , Adriat R . (2024). Flood impact assessment in remote areas using machine learning, SAR, and GIS: a case study of Ngabang District, Indonesia. Journal of Hydroinformatics, 26(11): 2928–2938

[101]

Sarker A R , Chowdhury A H , Haque T , Rahman M M , Meftaul I M , Jubayer F . (2025). From data to decision: leveraging machine learning and water quality index for groundwater quality evaluation. Sustainable Water Resources Management, 11(5): 102

[102]

Schafer J L , Graham J W . (2002). Missing data: our view of the state of the art. Psychological Methods, 7(2): 147–177

[103]

Schlender T , Viljanen M , Van Rijn J N , Mohr F , Peijnenburg W J G M , Hoos H H , Rorije E , Wong A . (2023). The bigger fish: a comparison of meta-learning QSAR models on low-resourced aquatic toxicity regression tasks. Environmental Science & Technology, 57(46): 17818–17830

[104]

Sheik A G , Malla M A , Srungavarapu C S , Patan A K , Kumari S , Bux F . (2024). Prediction of wastewater quality parameters using adaptive and machine learning models: a South African case study. Journal of Water Process Engineering, 67: 106185

[105]

Sivakumar J , Ramamurthy K , Radhakrishnan M , Won D . (2022). Synthetic sampling from small datasets: a modified mega-trend diffusion approach using k-nearest neighbors. Knowledge-Based Systems, 236: 107687

[106]

Sivakumar J , Ramamurthy K , Radhakrishnan M , Won D . (2023). GenerativeMTD: a deep synthetic data generation framework for small datasets. Knowledge-Based Systems, 280: 110956

[107]

Soonthornrangsan J T , Bakker M , Vossepoel F C . (2025). Linked data-driven, physics-based modeling of pumping-induced subsidence with application to Bangkok, Thailand. Groundwater, 63(2): 145–159

[108]

Sun Z J , Li J Z , Meng J , Li J L . (2025). Small-data-trained model for predicting nitrate accumulation in one-stage partial nitritation-anammox processes controlled by oxygen supply rate. Water Research, 269: 122798

[109]

Tian X , Beén F , Sun Y Q , Van Thienen P , Bäuerlein P S . (2023). Identification of polymers with a small data set of mid-infrared spectra: a comparison between machine learning and deep learning models. Environmental Science & Technology Letters, 10(11): 1030–1035

[110]

Todman L C , Bush A , Hood A S C . (2023). ‘Small data’ for big insights in ecology. Trends in Ecology & Evolution, 38(7): 615–622

[111]

Vymazal J , Zhao Y Q , Mander Ü . (2021). Recent research challenges in constructed wetlands for wastewater treatment: a review. Ecological Engineering, 169: 106318

[112]

Wan H , Xiang L , Cai Y P , Xie Y L , Xu R . (2025). Temporal and spatial feature extraction using graph neural networks for multi-point water quality prediction in river network areas. Water Research, 281: 123561

[113]

Wang L , Han M , Li X J , Zhang N , Cheng H D . (2021). Review of classification methods on unbalanced data sets. IEEE Access, 9: 64606–64628

[114]

Wang L , Shan K , Yi Y , Yang H , Zhang Y Y , Xie M J , Zhou Q C , Shang M S . (2024a). Employing hybrid deep learning for near-real-time forecasts of sensor-based algal parameters in a Microcystis bloom-dominated lake. Science of the Total Environment, 922: 171009

[115]

Wang S N , Gao M , Wu H , Luo F J , Jiang F , Tao L . (2024b). Many-to-many: domain adaptation for water quality prediction. Applied Soft Computing, 167: 112381

[116]

Wang X N , Chang J B , Jin H , Zhao Z F , Zhu X P , Cai W J . (2024c). Research on annual runoff prediction model based on adaptive particle swarm optimization–long short-term memory with coupled variational mode decomposition and spectral clustering reconstruction. Water, 16(8): 1179

[117]

Wang Y , Dong J C , Zhou Y C , Cheng Y H , Zhao X L , Peijnenburg W J G M , Vijver M G , Leung K M Y , Fan W H , Wu F C . (2025). Addressing the data scarcity problem in ecotoxicology via small data machine learning methods. Environmental Science & Technology, 59(12): 5867–5871

[118]

Wang Z T , Yue C , Wang J P . (2024d). An optimization framework with dimensionality reduction using Markov Chain Monte Carlo and genetic algorithms for groundwater potential assessment. Applied Soft Computing, 164: 111991

[119]

Wu A Q , Li K L , Song Z Y , Lou X H , Hu P F , Yang W J , Wang R F . (2025a). Deep learning for sustainable aquaculture: opportunities and challenges. Sustainability, 17(11): 5084

[120]

Wu Y D , Xian B , Xiang X W , Fang F , Chu F H , Deng X K , Hu Q , Sun X Q , Tang W , Bao S P . et al. (2025b). Identification of key feature variables and prediction of harmful algal blooms in a water diversion lake based on interpretable machine learning. Environmental Research, 276: 121491

[121]

Xiao L Z , Liu H Y , Liu X L , Zhou H R , Zhang Y , Liu B W , Wu J , Lin J . (2025). Discovering key factors determining perovskite bandgap under data scarcity inspired by knowledge distillation. Journal of Colloid and Interface Science, 695: 137827

[122]

Xu H L , Yang X , Hu Y H , Wang D Q , Liang Z Y , Mu H , Wang Y Y , Shi L , Gao H Q , Song D Q . et al. (2024). Trusted artificial intelligence for environmental assessments: an explainable high-precision model with multi-source big data. Environmental Science and Ecotechnology, 22: 100479

[123]

Xu P CJi X BLi M JLu W C (2023). Small data machine learning in materials science. npj Computational Materials, 9(1): 42

[124]

Xu S Y , Peng Z W , Zheng Q S . (2025). Research on data-driven prediction of inrush probability in coal mines under the mechanism of feature reconstruction in information interconnectivity. Water, 17(6): 843

[125]

Xue C , Cannizzaro J P , Hu C M , Barnes B B , Xie Y Y , Qi L , Armstrong C , Chen Z Q , Jones P R . (2025). Long-term changes of chlorophyll-a in Lake Okeechobee: combining the strengths of in situ observations, multi-sensor remote sensing, and machine learning. IEEE Transactions on Geoscience and Remote Sensing, 63: 4206415

[126]

Yalezo N , Musee N . (2023). Meta-analysis of engineered nanoparticles dynamic aggregation in freshwater-like systems using machine learning techniques. Journal of Environmental Management, 337: 117739

[127]

Yan H N , Zheng Q , Zeng L Z . (2024). Conditional generative adversarial networks for groundwater contamination characterization and source identification. Journal of Hydrology, 632: 130900

[128]

Yang D , Peng X , Jiang C , Wu X L , Ding S X , Zhong W M . (2024a). Transferable deep slow feature network with target feature attention for few-shot time-series prediction. IEEE Transactions on Industrial Informatics, 20(5): 7292–7302

[129]

Yang F , Xiong X . (2024). Carbon emissions, wastewater treatment and aquatic ecosystems. Science of the Total Environment, 921: 171138

[130]

Yang H J , Chen C Q , Xue X H . (2025). Coupled convolutional neural network with long short-term memory network for predicting lake water temperature. Journal of Hydrology, 655: 132878

[131]

Yang M H , Yang Q L , Shao J M , Wang G Q , Zhang W . (2023a). A new few-shot learning model for runoff prediction: demonstration in two data scarce regions. Environmental Modelling & Software, 162: 105659

[132]

Yang W L , Fu B L , Li S Z , Lao Z N , Deng T F , He W , He H C , Chen Z K . (2023b). Monitoring multi-water quality of internationally important karst wetland through deep learning, multi-sensor and multi-platform remote sensing images: a case study of Guilin, China. Ecological Indicators, 154: 110755

[133]

Yang Y , Shan C , Pan B C . (2024b). Machine learning modeling of fluorescence spectral data for prediction of trace organic contaminant removal during UV/H2O2 treatment of wastewater. Water Research, 255: 121484

[134]

Yu Y , Zhao S X , Han L T , Peng L , Xu Y M , Tian Q J , Chen S , Yang Z G , Li Q D , Hu Z Q . (2025). Cycleift: a deep transfer learning model based on informer with cycle fine-tuning for water quality prediction. Stochastic Environmental Research and Risk Assessment, 39(7): 2873–2885

[135]

Yusoff N H M , Chew W J , Chong C H , Wan Y K . (2024). Artificial intelligence in color classification of 3D-printed enhanced adsorbent in textile wastewater. Journal of Water Process Engineering, 65: 105776

[136]

Zanoni M G , Majone B , Bellin A . (2022). A catchment-scale model of river water quality by Machine Learning. Science of the Total Environment, 838(Pt 3): 156377

[137]

Zevallos J , Chávarri-Velarde E , Gutierrez R R , Lavado-Casimiro W . (2025). Bayesian calibration of a 2D hydraulic model using a convolutional neural network emulator. Environmental Modelling & Software, 193: 106621

[138]

Zhang H R , Georgescu A B , Yerramilli S , Karpovich C , Apley D W , Olivetti E A , Rondinelli J M , Chen W . (2025a). Emerging microelectronic materials by design: navigating combinatorial design space with scarce and dispersed data. Accounts of Materials Research, 6(6): 730–741

[139]

Zhang X Y , Zhu Q X , Ke W , He Y L , Zhang M Q , Xu Y . (2025b). Regression loss-assisted conditional style generative adversarial network for virtual sample generation with small data in soft sensing. Engineering Applications of Artificial Intelligence, 147: 110306

[140]

Zhao Y B , Chen M , He J Y , Ma Y P . (2025). Monitoring water quality parameters using multi-source data-driven machine learning models. Engineering Applications of Computational Fluid Mechanics, 19(1): 2509658

[141]

Zhao Y H , Hu Z , Xie H J , Wu H M , Wang Y C , Xu H , Liang S , Zhang J . (2023). Size-dependent promotion of micro(nano)plastics on the horizontal gene transfer of antibiotic resistance genes in constructed wetlands. Water Research, 244: 120520

[142]

Zhao Y H , Zhao Q , Liu D X , Xie H J , Zhang J , Zheng Y , Xu X Y , Wu H M , Hu Z . (2024). Antibiotic resistomes and ecological risk elimination in field-scale constructed wetland revealed by integrated metagenomics and metatranscriptomics. Journal of Hazardous Materials, 480: 136045

[143]

Zheng A , Casari A . (2018). Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. Sebastopol: O’Reilly Media, Inc.

[144]

Zheng X , Meng H Y , Zhao Z X , Liu X Y , Zhou L , Grieneisen M L , Zhang H , Zhan Y , Yang F M . (2025). Deep transfer learning for spatiotemporal mapping of PM2.5 nitrate across China: addressing small data challenges in environmental machine learning. Journal of Hazardous Materials, 492: 138206

[145]

Zhi W , Appling A P , Golden H E , Podgorski J , Li L . (2024). Deep learning for water quality. Nature Water, 2(3): 228–241

[146]

Zhong D , Zhang J N , Gan Y L , Ma W C , Zhou Z Y , Feng W N . (2025). Climate-responsive assessment of surface water quality in Songhua River using ensemble learning and multivariate analysis. Journal of Water Process Engineering, 77: 108342

[147]

Zhong S F , Zhang K , Bagheri M , Burken J G , Gu A , Li B K , Ma X M , Marrone B L , Ren Z J , Schrier J . et al. (2021). Machine learning: new ideas and tools in environmental science and engineering. Environmental Science & Technology, 55(19): 12741–12754

[148]

Zhou J L , Ji D X , Zhao J , Zhu S M , Peng Z Q , Lu G X , Ye Z Y . (2023a). Leveraging the feature distribution calibration and data augmentation for few-shot classification in fish counting. Computers and Electronics in Agriculture, 212: 108151

[149]

Zhou Q H , Chen X Y , Wang J L . (2025). Machine learning assisted material discovery: a small data approach. Accounts of Materials Research, 6(6): 685–694

[150]

Zhou Y C , Wang Y , Peijnenburg W , Vijver M G , Balraadjsing S , Fan W H . (2023b). Using machine learning to predict adverse effects of metallic nanomaterials to various aquatic organisms. Environmental Science & Technology, 57(46): 17786–17795

[151]

Zhu J J , Yang M Q , Ren Z J . (2023). Machine learning in environmental research: common pitfalls and best practices. Environmental Science & Technology, 57(46): 17671–17689

[152]

Zhu M Y , Wang J W , Yang X , Zhang Y , Zhang L Y , Ren H Q , Wu B , Ye L . (2022). A review of the application of machine learning in water quality evaluation. Eco-Environment & Health, 1(2): 107–116

RIGHTS & PERMISSIONS

Higher Education Press 2026
