Transfer learning enables predictions in soil-borne diseases

Lei Xin, Penghao Xie, Tao Wen, Guoqing Niu, Jun Yuan

Soil Ecology Letters ›› 2024, Vol. 6 ›› Issue (4) : 240258. DOI: 10.1007/s42832-024-0258-y
RESEARCH ARTICLE


Highlights

● The Transformer model precisely predicts soil health status from high-throughput sequencing data.

● The SMOTE algorithm addresses data imbalance issues, improving model accuracy.

● Transfer learning validates the model on small samples, strengthening its generalization capabilities.
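
The resampling idea in the second highlight can be illustrated with a minimal, standard-library sketch of SMOTE-style oversampling. It is not the authors' implementation: the function name `smote`, the toy abundance values, and the choice to pick base samples at random are all assumptions made for illustration. Each synthetic point is placed on the line segment between a minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical relative-abundance profiles: 6 healthy soils, 2 diseased soils.
healthy = [[0.50, 0.30, 0.20], [0.48, 0.32, 0.20], [0.52, 0.28, 0.20],
           [0.49, 0.31, 0.20], [0.51, 0.30, 0.19], [0.50, 0.29, 0.21]]
diseased = [[0.20, 0.30, 0.50], [0.25, 0.25, 0.50]]

# Oversample the minority class until both classes have the same size.
new_diseased = smote(diseased, n_synthetic=len(healthy) - len(diseased))
balanced_diseased = diseased + new_diseased
print(len(healthy), len(balanced_diseased))
```

In practice the resampling should be applied to the training split only, so that synthetic points never leak into the data used to report accuracy.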

Abstract

Inhibiting the occurrence of soil-borne diseases is considered the most favorable approach for promoting sustainable agricultural development, and soil disease prediction models can support precision agriculture. However, the analysis results of the meta-framework often contradict each other, causing inconsistency in the important features identified by machine learning. It is therefore necessary to compare the classification accuracy of various machine learning models and to further optimize their features to improve that accuracy. Here, we compared eight common machine learning algorithms (XGBoost, CatBoost, Decision Tree, LGBM, Naïve Bayes, Perceptron, Logistic Regression, and Random Forest) at the family, genus, and class levels. Important features were extracted based on the differences in model accuracy and in the features each model ranked as important, and were then interpreted using feature importance. Subsequently, the data were resampled using the SMOTE algorithm. The results show that the SMOTE-Transformer model performs well, surpassing the voting and stacking strategies, with an accuracy reaching 90%. We also deployed the SMOTE-Transformer model on sequencing data, where it achieved an accuracy of over 80%. The SMOTE-Transformer model provides a new approach to soil microbial data analysis by greatly improving the accuracy and robustness of soil microbial data processing tools.
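
For readers unfamiliar with the voting strategy used as a baseline in the abstract, the following is a minimal sketch of hard (majority) voting across several classifiers. The three model outputs and the labels are hypothetical and do not come from the study; the helper names `hard_vote` and `accuracy` are likewise assumptions. Stacking differs in that a second-level model is trained on these per-model predictions instead of counting votes.

```python
from collections import Counter


def hard_vote(per_model_predictions):
    """Return the majority label for each sample across all models."""
    n_samples = len(per_model_predictions[0])
    if any(len(p) != n_samples for p in per_model_predictions):
        raise ValueError("every model must predict the same number of samples")
    voted = []
    # zip(*...) regroups the predictions sample by sample.
    for sample_votes in zip(*per_model_predictions):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical soil health labels and predictions from three base classifiers.
y_true = ["healthy", "diseased", "diseased", "healthy", "diseased"]
model_a = ["healthy", "diseased", "healthy", "healthy", "diseased"]
model_b = ["healthy", "healthy", "diseased", "healthy", "diseased"]
model_c = ["diseased", "diseased", "diseased", "healthy", "healthy"]

ensemble = hard_vote([model_a, model_b, model_c])
for name, pred in [("A", model_a), ("B", model_b), ("C", model_c),
                   ("vote", ensemble)]:
    print(name, accuracy(y_true, pred))
```

An odd number of base models is used here so that a two-class vote cannot tie; with an even number, a tie-breaking rule would have to be chosen explicitly.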

Keywords

soil disease / feature importance / heterogeneous integration strategy / transfer learning

Cite this article

Lei Xin, Penghao Xie, Tao Wen, Guoqing Niu, Jun Yuan. Transfer learning enables predictions in soil-borne diseases. Soil Ecology Letters, 2024, 6(4): 240258 https://doi.org/10.1007/s42832-024-0258-y


Acknowledgements

This study was financially supported by the Natural Science Foundation of China (Grant No. 42322708), the Natural Science Foundation of Jiangsu Province (Grant No. BK20211577), and the Jiangsu Agricultural Science and Technology Innovation Fund (Grant No. CX(23)3112). J.Y. was supported by the Qing Lan Project of Jiangsu Province.

Software availability

The code has been uploaded to GitHub Packages.

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

RIGHTS & PERMISSIONS

2024 Higher Education Press