MicroGraphBERT: soil microbial gene sequence classification via fusing taxonomic hierarchies and DNABERT-based contextual embeddings
Han Yang , Di Wang , Wenjie Pan , Chaoying Jiang , Weichang Gao , Xiaoji Luo , Zugui Tu
Intelligence & Robotics, 2025, Vol. 5, Issue (3): 541-61.
Soil microbial communities are crucial for essential ecosystem functions such as nitrogen cycling and organic matter decomposition. However, accurately classifying their gene sequences remains challenging due to overlooked taxonomic hierarchies, environmental variability, and insufficient modeling of structural dynamics. Current methods predominantly focus on intra-sequence nucleotide features while neglecting the community’s hierarchical taxonomy. To address these gaps, we analyzed soil samples collected from the loess regions of Guizhou and investigated dynamic changes in microbial community composition across plant growth stages. We propose MicroGraphBERT, a deep learning framework that combines DNABERT’s context-aware embeddings with taxonomy-aware priors via a graph attention network, enabling joint modeling of sequence and ecological features for microbial classification. Trained on high-throughput sequencing data from the Guizhou loess regions, MicroGraphBERT integrates nucleotide-level contextual semantics from DNABERT with cross-species relational learning from the graph attention network to capture both sequence features and taxonomic hierarchies. This approach identifies complex microbial patterns under varying soil conditions and achieves a classification accuracy of 98.72%. Our work advances precision microbiome analytics by providing a scalable solution for soil health monitoring, intelligent fertilizer optimization, and sustainable agroecosystem management.
Soil microbiome / taxonomic hierarchies / DNABERT / graph attention network / high-throughput sequencing
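To make the fusion idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes three things that the abstract does not specify: sequences are split into overlapping k-mers (the tokenization DNABERT is known for), a nucleotide-frequency vector stands in for a real DNABERT embedding, and a hand-written toy graph links sequences that share a taxonomic parent. The attention step is a single-head graph attention update with the learned projection matrix omitted, and the names `toy_embedding`, `graph_attention`, `a_src`, and `a_dst` are hypothetical.

```python
import math

def kmer_tokens(seq, k=6):
    """Split a DNA sequence into overlapping k-mers (DNABERT-style tokens)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def toy_embedding(seq):
    """Placeholder for a DNABERT embedding: A/C/G/T frequencies (assumption)."""
    seq = seq.upper()
    return [seq.count(base) / len(seq) for base in "ACGT"]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def graph_attention(features, neighbors, a_src, a_dst):
    """Single-head graph attention update (learned projection W omitted).

    For node i and each j in its neighborhood (including i itself):
        e_ij = LeakyReLU(a_src . h_i + a_dst . h_j)
        alpha_ij = softmax over j of e_ij
        h_i' = sum_j alpha_ij * h_j
    """
    updated = {}
    for node, h_i in features.items():
        nbrs = [node] + neighbors.get(node, [])  # self-loop plus graph neighbors
        scores = [leaky_relu(dot(a_src, h_i) + dot(a_dst, features[j])) for j in nbrs]
        alphas = softmax(scores)
        updated[node] = [
            sum(alpha * features[j][d] for alpha, j in zip(alphas, nbrs))
            for d in range(len(h_i))
        ]
    return updated

# Toy data: seq_a and seq_b are assumed to share a taxonomic parent; seq_c is isolated.
sequences = {"seq_a": "AAAACGTT", "seq_b": "ACGTACGT", "seq_c": "GGGGCCCC"}
features = {name: toy_embedding(seq) for name, seq in sequences.items()}
neighbors = {"seq_a": ["seq_b"], "seq_b": ["seq_a"], "seq_c": []}

# Fixed attention vectors stand in for parameters that would normally be learned.
a_src = [0.5, -0.5, 0.5, -0.5]
a_dst = [0.1, 0.2, 0.3, 0.4]

context = graph_attention(features, neighbors, a_src, a_dst)

# Fuse by concatenating the sequence embedding with its taxonomy-aware context.
fused = {name: features[name] + context[name] for name in features}
```

In a real pipeline the fused vector would feed a classifier head, and both the embedding model and the attention parameters would be trained; concatenation is only one of several plausible fusion choices.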