Transfer learning enables predictions in soil-borne diseases

Lei Xin , Penghao Xie , Tao Wen , Guoqing Niu , Jun Yuan

Soil Ecology Letters ›› 2024, Vol. 6 ›› Issue (4) : 240258

DOI: 10.1007/s42832-024-0258-y
RESEARCH ARTICLE


Abstract

● The Transformer model precisely predicts soil health status from high-throughput sequencing data.

● The SMOTE algorithm addresses data imbalance issues, improving model accuracy.

● Transfer learning validates the model on small samples, strengthening its generalization capabilities.

Inhibiting the occurrence of soil-borne diseases is considered the most favorable approach to promoting sustainable agricultural development, and soil disease prediction models can support precision agriculture. However, meta-framework analyses often yield contradictory results, so the features that machine learning models identify as important are inconsistent. It is therefore necessary to compare the classification accuracy of various machine learning models and to further optimize their features to improve that accuracy. Here, we compared eight common machine learning algorithms (XGBoost, CatBoost, Decision Tree, LGBM, Naïve Bayes, Perceptron, Logistic Regression, and Random Forest) at the family, genus, and class levels. Important features were then extracted based on the differences in model accuracy and in the features each model selected, and were interpreted through feature-importance analysis. Subsequently, the data were resampled using the SMOTE algorithm. The resulting SMOTE-Transformer model performed well, surpassing the voting and stacking strategies, with an accuracy reaching 90%. We also deployed the SMOTE-Transformer model on sequencing data, where it achieved an accuracy of over 80%. The SMOTE-Transformer model offers a new approach to soil microbial data analysis by greatly improving the accuracy and robustness of soil microbial data processing tools.
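The resampling step mentioned in the abstract can be illustrated with a minimal SMOTE-style sketch. This is not the authors' pipeline: the function name smote, the parameters k and seed, and the toy "diseased" and "healthy" samples are all assumptions made for illustration, and the base sample is drawn at random rather than cycling through every minority sample as in the original algorithm (Chawla et al., 2002). The core idea is the same: each synthetic point is interpolated between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class samples by interpolation (SMOTE-style).

    minority: list of equal-length feature vectors (e.g. relative abundances).
    n_synthetic: number of synthetic samples to create.
    k: number of nearest minority neighbours considered per base sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 3 "diseased" vs 6 "healthy" samples, 2 features each.
diseased = [[0.10, 0.80], [0.20, 0.70], [0.15, 0.90]]
healthy = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.9, 0.3], [0.8, 0.1], [0.6, 0.2]]

# Oversample the minority class until both classes have the same size.
balanced_diseased = diseased + smote(diseased, len(healthy) - len(diseased), k=2)
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the original minority samples span. In practice, resampling would normally be applied to the training split only, so that synthetic points do not leak into the evaluation data.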

Keywords

soil disease / feature importance / heterogeneous integration strategy / transfer learning

Cite this article

Lei Xin, Penghao Xie, Tao Wen, Guoqing Niu, Jun Yuan. Transfer learning enables predictions in soil-borne diseases. Soil Ecology Letters, 2024, 6(4): 240258. DOI: 10.1007/s42832-024-0258-y



RIGHTS & PERMISSIONS

Higher Education Press
