1 Introduction
Nitrogen (N) is a crucial nutrient for sustaining terrestrial ecosystem functions, including plant productivity and soil carbon dynamics (Pilon-Smits et al., 2009; Yahaya et al., 2023). Anthropogenic N inputs, driven largely by synthetic fertilizers that have boosted agricultural yields in recent decades, have played an indispensable role in supporting population growth (Nasar et al., 2023). However, excessive N application not only causes soil acidification (Steffen et al., 2015) but also leads to substantial N losses from surface soils through gaseous emissions of nitrogenous compounds (e.g., N2O), hydrological leaching, and erosion, triggering cascading impacts on both human and ecosystem health. Maintaining agricultural productivity while minimizing these environmental consequences therefore requires a better understanding of soil N dynamics, which is essential for designing sustainable agricultural intensification practices and science-based policies to mitigate N pollution.
The retention of soil inorganic N, such as ammonium and nitrate, is important for sustaining crop productivity but remains underappreciated in intensively managed agricultural systems. Most crops depend entirely on soil-derived inorganic N (Oldroyd and Leyser, 2020), whereas leguminous crops can also use N fixed from the atmosphere by their rhizobial symbionts. Conventional farming practices apply large amounts of fertilizer to improve yields, yet only approximately 40% of applied N is assimilated by crops, with the remainder escaping into the environment (Yin et al., 2021). Despite decades of research on fertilizer optimization and yield improvement (Chen et al., 2011), the dynamics of soil N retention, especially of inorganic N, have been largely overlooked. Precise quantification of soil inorganic N dynamics (Wei et al., 2019) could transform the management of soil N reservoirs, enabling simultaneous improvements in agronomic productivity and sustainable nutrient stewardship.
Nitrous oxide (N2O) produced by nitrification and denitrification contributes significantly to global warming: N2O is a potent greenhouse gas with a global warming potential about 300 times that of carbon dioxide over a 100-year time horizon, and it accounts for approximately 7% of total anthropogenic radiative forcing (Butterbach-Bahl and Wolf, 2017; Govindasamy et al., 2023). Beyond the climate impacts of N2O, other reactive N emissions, notably nitrogen oxides together with ammonia, promote the formation of fine particles that increase the incidence of respiratory disease (Townsend et al., 2003). Furthermore, reactive N has been epidemiologically linked to elevated risks of certain cancers and adverse reproductive outcomes (Ward et al., 2005). Despite recognition of these multidimensional threats, fundamental gaps persist in our ability to quantify and predict N2O emissions, particularly at landscape to regional scales. Accurately capturing N2O dynamics is therefore important for anticipating climate change and protecting public health.
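To make the "about 300 times" comparison concrete, the arithmetic for expressing an N2O flux in CO2 equivalents can be sketched as follows. This is a minimal sketch under stated assumptions: fluxes are assumed to be reported as N2O-N (mass of N), so they are first converted to N2O mass with the molar-mass ratio 44/28, and the default GWP of 300 is the rounded figure quoted above rather than a value from a specific IPCC assessment. The function name `n2o_n_to_co2e` and the example flux are hypothetical.

```python
# Molar-mass ratio for converting N2O-N (mass of N) to N2O: 44 g/mol / 28 g/mol
N2O_PER_N2O_N = 44.0 / 28.0


def n2o_n_to_co2e(n2o_n_kg: float, gwp: float = 300.0) -> float:
    """Convert an N2O-N mass (kg N) to a CO2-equivalent mass (kg CO2e).

    `gwp` defaults to the rounded ~300 quoted in the text (assumption);
    substitute the 100-year GWP from the assessment report a study follows.
    """
    n2o_kg = n2o_n_kg * N2O_PER_N2O_N  # kg N2O-N -> kg N2O
    return n2o_kg * gwp                # kg N2O -> kg CO2e


# Hypothetical flux: 1 kg N2O-N per hectare per year
print(f"{n2o_n_to_co2e(1.0):.1f} kg CO2e")  # prints "471.4 kg CO2e"
```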
Contemporary N cycling research spans disciplines from microbial ecology to satellite-based remote sensing, yet a critical challenge persists: how to reconcile microscale mechanistic insights with macroscale predictive frameworks (Kuypers et al., 2018; Van Damme et al., 2018; Zhou et al., 2024). Bibliometric analysis reveals that N cycling studies over the past three decades fall into three thematic clusters (Fig. 1(a)): 1) soil N content, centering on “Soil”, “Organic matter”, and “Total N”; 2) N transformations, centering on “N cycling”, “Denitrification”, and “Mineralization”; and 3) organismal interactions, centering on “Plant”, “Bacterium”, and “Biomass”. These clusters reflect persistent research priorities: quantifying soil N reservoirs, elucidating coupled N processes, and resolving biotic controls on N cycling. Notably, microscale studies dominate the literature (Fig. 1(b)), outnumbering macroscale studies by an order of magnitude. Although macroscale N publications have accelerated since 2015, they still lag far behind. One feasible approach to bridge this scale divide and improve macroscale predictions of N2O emissions and soil inorganic N is to integrate microscale process understanding into ecosystem-level models.
Scaling up is a crucial step to reconcile microscale mechanistic insights with macroscale predictive frameworks. While microscale studies enhance mechanistic understanding, they are conducted under specific geographic and climatic conditions and therefore often cannot account for spatial heterogeneity. The goal of scaling up is to integrate microscale data while accounting for this heterogeneity, thereby improving predictions of N cycling at macroscales. For instance, accurately capturing the dynamics of N2O and soil N at macroscales requires integrating microbial traits into a universal framework. However, scaling up faces two major challenges: (i) the lack of a systematic research framework and (ii) insufficient computational capacity.
2 A paradigm for studying N cycling at macroscales – a hierarchical framework combining environmental and microbial factors
The integration of microbial traits (microbial biomass, stoichiometric ratios in microbial biomass, and functional microorganisms) into empirical models provides a promising approach to enhance the accuracy of N cycling projections at macroscales (Fig. 2). Microorganisms directly participate in N transformation processes and play crucial roles across various biogeochemical cycles, and their sensitivity to environmental changes makes them key indicators of ecosystem functioning. Models that integrate interactions among microbial traits, climate factors, soil properties, and substrates can therefore improve the predictive accuracy of N cycling at macroscales. At microscales, microbiological research has substantially advanced our understanding of N transformations. Studies have demonstrated that both bacteria and fungi mediate soil heterotrophic nitrification, with their relative abundances explaining substantial variation in heterotrophic nitrification rates (Lang and Jagnow, 1986; Brierley and Wood, 2001). Similarly, ammonia-oxidizing bacteria and archaea have been identified as key drivers of autotrophic nitrification (Martens-Habbena et al., 2009). Mechanistic models that incorporate microbial traits and enzyme kinetics effectively simulate N transformations in soil incubations (Yan et al., 2024). Nonetheless, directly extrapolating such mechanistic models to field conditions remains challenging because of the complex effects of heterogeneous environmental conditions (Colman and Schimel, 2013). Instead, refining empirical models through systematic incorporation of microbial traits, substrates, and climatic variables may provide a more robust framework for macroscale projections. Some studies have focused solely on abiotic factors; for example, soil pH can explain 38%–41% of the variability in soil N immobilization rates (Wang et al., 2019), and mean annual temperature accounts for 23% of the spatial variation in N turnover rates at continental scales (Li et al., 2020b). Substrate quality and availability also influence N processes (Baldock and Skjemstad, 2000), with substrates bound to clay surfaces being less accessible to microbial decomposition (Giardina and Ryan, 2000). However, such studies overlook critical biological drivers. When microbial traits are integrated, soil microorganisms can account for 19% of the variation in soil N mineralization rates (Li et al., 2019), highlighting the essential role of microorganisms in modeling N processes.
Li et al. (2019) proposed a hierarchical driving paradigm that integrates microbial traits and environmental factors to improve the prediction of N processes across large spatial scales. This paradigm demonstrates that although climatic variables, soil properties, and substrate availability might directly influence N processes, their predominant influence occurs indirectly through changes in soil microbial biomass. This unified and concise model serves as an effective scaling tool based primarily on microbial traits and can predict the rates of N processes at the continental scale. Its advantages include: 1) incorporating microscale mechanistic understanding; 2) integrating critical microbial factors; and 3) maintaining structural simplicity for accurate, efficient predictions (Li et al., 2019). However, this model is not suitable for predictions at small scales, where site-specific environmental conditions dominate. Subsequent work has successfully implemented this hierarchical driving paradigm to parameterize several N processes (Li et al., 2021, 2022), demonstrating its utility for scaling microbial mechanisms to ecosystem-level predictions. Moreover, this paradigm may be particularly valuable for modeling N2O emissions and soil inorganic N dynamics, because both N2O production and inorganic N transformation are governed by several tightly coupled N processes. As illustrated in Fig. 2, simultaneous consideration of these interacting N processes is essential for accurate predictions. While Li et al. (2020b) characterized continental-scale soil N turnover patterns under steady-state assumptions, such approaches fail to capture the transient dynamics of N2O emissions and inorganic N. The hierarchical framework addresses this limitation by explicitly coupling N processes (Fig. 2), thereby enabling more robust projections of both N2O emissions and soil inorganic N dynamics across spatial and temporal scales.
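The central claim of the hierarchical driving paradigm, that climate, soil, and substrate factors act on N processes largely through microbial biomass, can be illustrated with a simple path decomposition. The sketch below is a hypothetical linear illustration, not the model of Li et al. (2019): it uses synthetic sites, a single environmental driver (temperature), and made-up coefficients, and it splits the total effect of temperature on an N process rate into a direct path and an indirect path through microbial biomass N (indirect = a × b).

```python
import random


def ols(X, y):
    """Least-squares coefficients via the normal equations; X must include an intercept column."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]  # X'X
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]              # X'y
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            A[r] = [a_rc - f * a_cc for a_rc, a_cc in zip(A[r], A[col])]
            b[r] -= f * b[col]
    beta = [0.0] * p
    for r in reversed(range(p)):  # back substitution
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return beta


random.seed(1)
n = 500
# Synthetic sites; all coefficients below are hypothetical
temp = [random.uniform(0.0, 25.0) for _ in range(n)]            # mean annual temperature
mbn = [20.0 + 4.0 * t + random.gauss(0.0, 5.0) for t in temp]    # microbial biomass N (path a = 4.0)
rate = [0.5 + 0.05 * m + 0.02 * t + random.gauss(0.0, 0.5)       # N process rate (b = 0.05, direct = 0.02)
        for m, t in zip(mbn, temp)]

a_path = ols([[1.0, t] for t in temp], mbn)[1]                          # temperature -> biomass
_, b_path, direct = ols([[1.0, m, t] for m, t in zip(mbn, temp)], rate)  # biomass + temperature -> rate
total = ols([[1.0, t] for t in temp], rate)[1]                          # temperature -> rate, biomass ignored
indirect = a_path * b_path
print(f"total {total:.3f} = direct {direct:.3f} + indirect via biomass {indirect:.3f}")
```

Because the synthetic data were generated with a large indirect path (a × b = 0.2) and a small direct path (0.02), the fitted decomposition should attribute most of the temperature effect to microbial biomass. For ordinary least squares the identity total = direct + a × b holds exactly, which is a useful sanity check.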
2.1 Hierarchical driving paradigm facilitates scaling up individual soil nitrogen processes
Effectively integrating N processes across scales remains a fundamental challenge, despite significant advances in understanding N cycling at both microscales and macroscales. Three barriers hinder this integration. First, environmental heterogeneity complicates the direct extrapolation of microscale models to macroscales, because localized processes become obscured by landscape variability. Second, some microbial traits that are highly sensitive indicators of microscale N processes, such as the activity of specific microbial taxa and/or enzymes, may lose their explanatory power at macroscales, because their influence often averages out (Li et al., 2022). Third, synthesizing diverse data, e.g., field experiments and model outputs, into a unified framework generally confronts methodological hurdles. Nevertheless, with the advent of big data and artificial intelligence, emerging opportunities exist to address these challenges through the hierarchical driving paradigm (Niu et al., 2020; Wang et al., 2023).
Current macroscale projections of N processes predominantly rely on empirical models incorporating abiotic drivers such as temperature, pH, and organic matter content (Cheng et al., 2017; Dawes et al., 2017; Liu et al., 2017; Dai et al., 2020), but they often lack mechanistic grounding in microbial physiology (Luo et al., 2014). Many macroscale models fail to integrate cutting-edge mechanistic findings at microscales to characterize critical biotic controls on soil N processes (Fujita et al., 2014). Limitations in experimental data and algorithmic frameworks impede this integration in empirical models. Addressing these gaps requires systematic efforts to parameterize N processes using available data sets and advanced computational methods. Within the hierarchical framework, parameterization is the critical step for linking microscale processes with macroscale predictions. Integrating mechanisms (e.g., enzyme kinetics, substrate limitation) with big data provides a robust foundation for model parameterization: parameters are estimated with a weighted mixed-effects model fitted to training data sets, and their predictive performance is validated with independent data sets. This approach generates a universal parameter framework suitable for macroscale N process predictions. Incorporating microbial biomass within the hierarchical driving paradigm (Fig. 2) has already improved predictions for five N processes, viz. N mineralization (Li et al., 2019), immobilization (Li et al., 2021), nitrification (Li et al., 2020a), denitrification (Li et al., 2022), and anammox (Yao et al., 2023). Nevertheless, some key N processes, such as biological N fixation, remain poorly represented in this framework, limiting our ability to forecast ecosystem-level N fluxes accurately.
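The parameterization-and-validation workflow described above can be sketched in a few lines. This is a simplified, hypothetical illustration: it fits a weighted least-squares relationship between microbial biomass N and an N process rate on a training subset of synthetic synthesis records, weighting each record by its number of replicates (a stand-in for inverse-variance weighting), and then scores the fitted relationship on held-out records. The random effects of a full mixed-effects model (e.g., study- or site-level intercepts) are deliberately omitted; a dedicated mixed-model library would normally be used for those.

```python
import math
import random


def weighted_fit(x, y, w):
    """Weighted least-squares intercept and slope (fixed effects only; no random effects)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ym - slope * xm, slope


def validate(x, y, intercept, slope):
    """R² and RMSE of the fitted relationship on an independent data set."""
    pred = [intercept + slope * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ym = sum(y) / len(y)
    ss_tot = sum((yi - ym) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


random.seed(42)
# Hypothetical synthesis records: (microbial biomass N, N process rate, replicates).
# Better-replicated studies get smaller simulated errors, so weighting by
# replicate number approximates inverse-variance weighting.
records = []
for _ in range(300):
    mbn = random.uniform(10.0, 150.0)
    reps = random.randint(3, 12)
    rate = 0.5 + 0.03 * mbn + random.gauss(0.0, 2.0 / math.sqrt(reps))
    records.append((mbn, rate, reps))

random.shuffle(records)
train, test = records[:200], records[200:]   # test records are never used for fitting
x_tr, y_tr, w_tr = zip(*train)
intercept, slope = weighted_fit(x_tr, y_tr, w_tr)
x_te, y_te, _ = zip(*test)
r2, rmse = validate(x_te, y_te, intercept, slope)
print(f"slope = {slope:.4f}; independent validation: R² = {r2:.2f}, RMSE = {rmse:.2f}")
```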
2.2 Coupling soil N processes within the hierarchical driving paradigm can better project N2O emissions
Soil N2O originates from multiple N transformation processes, with nitrification and denitrification being the dominant sources in soils (Braker and Conrad, 2011). During nitrification, ammonium is oxidized to nitrate with N2O as a byproduct; during denitrification, nitrate can be reduced to N2O under anaerobic conditions. N2O generation from nitrification and denitrification is primarily carried out by soil microorganisms; e.g., ammonia oxidation by ammonia-oxidizing microorganisms (AOM) is the rate-limiting step in nitrification (Yao et al., 2011). The hierarchical driving paradigm offers a feasible framework by explicitly incorporating the functional microbial groups in nitrification and denitrification that govern N2O production pathways. Because nitrification and denitrification typically co-occur in soils, both processes must be considered jointly to accurately simulate soil N2O emissions (Butterbach-Bahl et al., 2013).
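Why the two pathways must be considered jointly can be shown with a toy daily time step in which denitrification draws on nitrate supplied by nitrification. The sketch assumes first-order kinetics, and every rate constant, N2O yield, and the `anaerobic_frac` parameter (share of soil volume that is anaerobic) is a hypothetical placeholder rather than a value from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class SoilN:
    nh4: float  # ammonium pool (hypothetical units, e.g. mg N per kg soil)
    no3: float  # nitrate pool


def daily_step(soil, anaerobic_frac, k_nit=0.10, k_den=0.05, y_nit=0.01, y_den=0.05):
    """Advance the pools by one day; return (N2O-N from nitrification,
    N2O-N from denitrification, total N denitrified).

    All rate constants (k_*) and N2O yields (y_*) are hypothetical placeholders.
    """
    nitrified = k_nit * soil.nh4                      # NH4+ oxidized; a small share escapes as N2O
    soil.nh4 -= nitrified
    soil.no3 += nitrified * (1.0 - y_nit)
    denitrified = k_den * anaerobic_frac * soil.no3   # NO3- reduced in anaerobic microsites
    soil.no3 -= denitrified                           # leaves the soil as N2O or N2
    return y_nit * nitrified, y_den * denitrified, denitrified


def simulate(days, anaerobic_frac, nh4=50.0, no3=10.0):
    soil = SoilN(nh4, no3)
    n2o_nit = n2o_den = lost_den = 0.0
    for _ in range(days):
        from_nit, from_den, lost = daily_step(soil, anaerobic_frac)
        n2o_nit += from_nit
        n2o_den += from_den
        lost_den += lost
    return soil, n2o_nit, n2o_den, lost_den


for frac in (0.0, 0.3):
    _, nit, den, _ = simulate(30, frac)
    print(f"anaerobic fraction {frac}: N2O-N from nitrification {nit:.3f}, "
          f"from denitrification {den:.3f}")
```

Changing only the anaerobic fraction leaves nitrification-derived N2O unchanged but switches the denitrification pathway on, and supplying more ammonium raises denitrification-derived N2O through the nitrate it produces, a coupling that a model treating the two pathways independently would miss.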
While microbial controls on N2O production, e.g., community composition and enzyme kinetics, have been extensively characterized at microscales (Baggs, 2011; Hu et al., 2015), a knowledge gap persists in translating these mechanistic insights into macroscale predictions. The hierarchical driving paradigm addresses this challenge by combining functional microorganisms with environmental covariates, enabling cross-scale linkages between microbial processes and ecosystem N2O fluxes. Current process-based models such as NOE and DAYCENT, though capable of simulating N2O emissions (Parton et al., 2001; Hénault et al., 2005), lack explicit parameterizations of microbial traits, limiting their capacity to resolve spatiotemporal variability in N2O emissions. A recent study highlighted that the relative abundance of functional microorganisms involved in N2O production through nitrification and denitrification is a key controlling factor for N2O emissions (Han et al., 2024). Future efforts should prioritize integrating functional microorganisms within the hierarchical driving paradigm to refine predictions of N2O emissions under heterogeneous environmental conditions.
2.3 Coupling soil N processes within the hierarchical driving paradigm can better project the dynamics of soil N
Soil N exists in multiple chemical forms, including organic N, ammonium, nitrate, and nitrite. Although plants can absorb low-molecular-weight organic N under specific environmental conditions (Näsholm et al., 1998, 2009; McKane et al., 2002), in most cases inorganic N is more readily used by plants. Soil organic N is estimated at approximately 140 petagrams in the upper 1 m of soils (Batjes, 2014); however, the spatiotemporal distribution and turnover of inorganic N in surface soils remain poorly understood. This knowledge gap fundamentally limits our ability to predict inorganic N uptake by plant roots and ecosystem-level N cycling efficiencies.
To model soil N dynamics at macroscales, researchers conventionally partition soil N into discrete pools, such as inorganic N, organic N, microbial N, and plant N (Bai et al., 2013). However, such frameworks often oversimplify inter-pool transformations by employing static transfer coefficients, introducing significant uncertainties into projections of N transformations across heterogeneous landscapes. Next-generation empirical models that mechanistically couple N transformation processes, such as mineralization, nitrification, and denitrification, show promise for resolving these limitations and enabling spatially explicit predictions of inorganic N concentrations across regional to global scales. These advances in coupling N processes can provide an important foundation for refining the N cycling modules of Earth system models.
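The difference between static transfer coefficients and process rates that respond to microbial controls can be sketched with a minimal three-pool model (organic N, ammonium, nitrate) plus a gaseous loss term. The coefficients, the reference biomass `MBN_REF`, and the linear scaling of every rate by microbial biomass N are hypothetical simplifications chosen only to make the contrast visible; they are not parameterizations from the cited work.

```python
MBN_REF = 50.0  # hypothetical reference microbial biomass N at which coefficients apply unscaled


def run_pools(days, mbn=None, org=1000.0, nh4=20.0, no3=30.0):
    """Daily explicit-Euler update of organic N -> NH4+ -> NO3- -> gaseous loss.

    With mbn=None the transfer coefficients are static; otherwise every
    coefficient is scaled by mbn / MBN_REF, a deliberately simple, hypothetical
    stand-in for microbial control of the transformation rates.
    """
    k_min, k_nit, k_den = 0.001, 0.05, 0.02  # hypothetical daily coefficients
    scale = 1.0 if mbn is None else mbn / MBN_REF
    lost = 0.0
    for _ in range(days):
        # All fluxes use start-of-day pool sizes, then the pools are updated together
        mineralized = k_min * scale * org
        nitrified = k_nit * scale * nh4
        denitrified = k_den * scale * no3
        org -= mineralized
        nh4 += mineralized - nitrified
        no3 += nitrified - denitrified
        lost += denitrified
    return org, nh4, no3, lost


for label, mbn in (("static", None), ("low MBN", 25.0), ("high MBN", 100.0)):
    org, nh4, no3, lost = run_pools(90, mbn)
    print(f"{label:>8}: inorganic N = {nh4 + no3:6.1f}, denitrified = {lost:5.1f}")
```

With static coefficients every site behaves identically; letting the rates respond to microbial biomass makes inorganic N stocks and losses diverge among sites while the total N budget stays closed.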
3 Prospects on nitrogen studies at macroscales with the advent of new approaches
Emerging methods, such as Bayesian approaches, deep learning, convergent cross-mapping (CCM), and digital twins, may create new opportunities to improve predictions of N cycling at macroscales (Fig. 3). Data synthesis is a powerful approach for revealing the patterns of target variables, such as N mineralization, through systematic data mining and integrative analysis (Ghajarzadeh, 2025). Data synthesis is distinct from conventional meta-analysis (Zhang, 2018) because it can directly parameterize models by extracting quantitative relationships from aggregated data sets. For instance, Li et al. (2019) demonstrated through a comprehensive data synthesis that soil microbial biomass accounts for the majority of spatial variation in N mineralization rates. Incorporating soil microbial biomass improved the accuracy of net N mineralization predictions by 19%, a breakthrough that facilitated the integration of microbial traits into Earth system models (Huang et al., 2021). However, current data synthesis based on frequentist statistics faces several limitations. First, applying uniform weighting schemes across heterogeneous landscapes fails to reflect reality. For example, precipitation thresholds for N leaching differ among regions, yet syntheses apply constant weighting factors, introducing uncertainties into N loss estimates (Vegas-Vilarrúbia et al., 2012). Second, methodological inconsistencies across studies introduce systematic biases that remain to be reconciled. Atmospheric ammonia measurements exemplify this issue: satellite-derived concentrations show significantly greater uncertainty than ground-based observations owing to high spatiotemporal variability in ammonia emissions (Nair and Yu, 2020). Bayesian statistical frameworks offer a promising solution to these challenges by weighting information probabilistically according to its uncertainty (Van de Schoot et al., 2021). This approach has proven successful in related studies; e.g., Tao et al. (2020) achieved a 34% increase in R² and a 28% reduction in RMSE when predicting soil organic carbon distributions with Bayesian methods. Therefore, Bayesian approaches may overcome some shortcomings of conventional data synthesis by dynamically adjusting parameter weights and explicitly quantifying uncertainty, particularly when scaling N processes from microscales to macroscales.
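The idea of weighting information by its uncertainty can be illustrated with the simplest Bayesian case, a conjugate normal update of a single parameter. The sketch assumes known standard errors and one common true value across studies (no between-study heterogeneity, which a hierarchical Bayesian model would add); the prior and the three study estimates are hypothetical.

```python
import math


def normal_update(prior_mean, prior_sd, estimates, std_errors):
    """Conjugate normal update of one parameter from several study estimates.

    Assumes known standard errors and a single common true value (no
    between-study heterogeneity); each source is weighted by its precision 1/sd².
    """
    precision = 1.0 / prior_sd ** 2
    weighted_sum = prior_mean * precision
    for est, se in zip(estimates, std_errors):
        precision += 1.0 / se ** 2
        weighted_sum += est / se ** 2
    post_mean = weighted_sum / precision
    post_sd = 1.0 / math.sqrt(precision)
    return post_mean, post_sd


# Hypothetical estimates of one model parameter from three studies
mean, sd = normal_update(prior_mean=1.0, prior_sd=1.0,
                         estimates=[0.8, 1.2, 1.0], std_errors=[0.1, 0.4, 0.2])
print(f"posterior mean {mean:.3f}, sd {sd:.3f}, "
      f"95% interval [{mean - 1.96 * sd:.3f}, {mean + 1.96 * sd:.3f}]")
# prints "posterior mean 0.858, sd 0.087, 95% interval [0.688, 1.029]"
```

The posterior mean is pulled toward the most precise estimate (0.8, standard error 0.1), and the posterior standard deviation is smaller than that of any single source.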
The Random Forest algorithm, a widely used machine learning approach, has substantially improved predictive accuracy across ecological studies (Roy and Larocque, 2012). For example, Random Forest models achieved 43% greater accuracy than multiple linear regression in predicting crop yields (wheat, maize, and potato) (Jeong et al., 2016; Basha et al., 2020). A Random Forest model achieved high predictive power (R² = 0.80, RMSE = 0.08) in estimating the spatial distribution of total soil N by integrating physical predictors (original spectral bands and spectral indices) and environmental factors (Zhang et al., 2019). The strong performance of Random Forest stems from its ensemble structure, which builds multiple decision trees to handle nonlinear relationships in the data (Breiman, 2001). Despite these advances, some shortcomings persist in applying Random Forest to N cycling research. Random Forest can be susceptible to overfitting with high-dimensional data sets or limited training data, because it may interpret stochastic noise as meaningful signal, although high-dimensional data per se do not necessarily lead to overfitting (Probst et al., 2019; Simon et al., 2023). Furthermore, Random Forest cannot learn feature representations from the data and may therefore overlook complex hierarchical relationships within data sets. In contrast, deep learning architectures excel at capturing latent patterns in high-dimensional data through multilayer neural networks that iteratively optimize feature representations (LeCun et al., 2015). For instance, deep learning has outperformed conventional machine learning in ecological predictions, yielding higher accuracy for soil microbial-derived carbon stocks (R² = 0.76 vs. 0.67; Hu et al., 2024) and global carbon storage (R² = 0.68 vs. 0.43; Tao et al., 2023). Deep learning has successfully addressed diverse ecological questions, including biodiversity assessment, species distribution prediction, and community development (Chen et al., 2017; Salamon et al., 2017; Villon et al., 2018; Christin et al., 2019). Recently, deep learning has also been applied to refine model construction, quantify driving factors, and improve the resolution of spatiotemporal heterogeneity in carbon cycling (Laffitte et al., 2025); however, it remains underutilized in N cycling research. In the coming years, deep learning promises transformative advances in macroscale N research, particularly for predicting N2O emissions and soil N transformations, by capturing complex and/or nonlinear interactions across spatial and temporal scales.
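The ensemble idea behind Random Forest, and the out-of-bag (OOB) error described by Breiman (2001) as an internal check against overfitting, can be sketched with a deliberately tiny forest. For brevity each tree is a single split (a regression stump) grown on a bootstrap sample using a randomly chosen feature; the synthetic data, in which the target jumps at x0 = 0.5 and x1 is pure noise, are hypothetical.

```python
import math
import random
import statistics


def fit_stump(X, y, feature):
    """Best single split on one feature: (sse, feature, threshold, left mean, right mean)."""
    mean_y = statistics.fmean(y)
    best = (sum((v - mean_y) ** 2 for v in y), feature, math.inf, mean_y, mean_y)  # no split
    values = sorted({row[feature] for row in X})
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [yi for row, yi in zip(X, y) if row[feature] <= thr]
        right = [yi for row, yi in zip(X, y) if row[feature] > thr]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if sse < best[0]:
            best = (sse, feature, thr, ml, mr)
    return best


def predict_stump(stump, row):
    _, feature, thr, ml, mr = stump
    return ml if row[feature] <= thr else mr


def fit_forest(X, y, n_trees=30, max_features=1, seed=0):
    """Bagged stumps, each grown on a bootstrap sample with a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(p), max_features)   # random subset of candidate features
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        stump = min((fit_stump(Xb, yb, f) for f in feats), key=lambda s: s[0])
        forest.append((stump, set(idx)))             # keep in-bag indices for the OOB error
    return forest


def predict(forest, row):
    return statistics.fmean(predict_stump(s, row) for s, _ in forest)


def oob_rmse(forest, X, y):
    """Out-of-bag RMSE: each sample is predicted only by trees that never saw it."""
    errors = []
    for i, (row, yi) in enumerate(zip(X, y)):
        preds = [predict_stump(s, row) for s, in_bag in forest if i not in in_bag]
        if preds:
            errors.append((statistics.fmean(preds) - yi) ** 2)
    return math.sqrt(statistics.fmean(errors))


data_rng = random.Random(1)
# Hypothetical data: the target jumps at x0 = 0.5; x1 is pure noise
X = [[data_rng.random(), data_rng.random()] for _ in range(150)]
y = [(2.0 if x0 > 0.5 else 0.0) + data_rng.gauss(0.0, 0.3) for x0, _ in X]

forest = fit_forest(X, y)
print(f"OOB RMSE {oob_rmse(forest, X, y):.2f} vs. SD of y {statistics.pstdev(y):.2f}")
print(f"prediction at x0=0.9: {predict(forest, [0.9, 0.5]):.2f}; "
      f"at x0=0.1: {predict(forest, [0.1, 0.5]):.2f}")
```

Stumps that happen to draw the noise feature dilute the ensemble, which shows up in the OOB error. In practice, a library implementation with full-depth trees, such as scikit-learn's `RandomForestRegressor` with `oob_score=True`, would be used instead.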
Correlation analysis, a fundamental statistical tool, can quantify linear relationships in N cycling. It has been widely applied to examine the relationships between soil nitrate accumulation and fertilizer application rates, as well as between soil N2O emissions and rainfall intensity in agricultural systems (Sogbedji et al., 2000; Zhang et al., 2023). However, conventional correlation methods have several limitations; e.g., they fail to capture complex, nonlinear interactions and are particularly inadequate for time-series data in which biological responses are lagged. For instance, correlating microbial biomass changes with N mineralization rates immediately after rainfall events may obscure their true relationship, because microbial responses to environmental perturbations are often delayed (Yu et al., 2022). More fundamentally, correlation analyses cannot determine the causal direction between variables. CCM, an approach grounded in dynamical systems theory, can overcome some of these limitations by detecting causal relationships in time-series data (Sugihara et al., 2012). This method has proven valuable in ecological research; e.g., McGowan et al. (2017) employed CCM to identify the drivers of algal blooms among multiple environmental factors. CCM has also been applied to questions such as interspecies synchrony and biological adaptation to environmental change (Matin and Bourque, 2015; Kawatsu et al., 2020; Wang et al., 2021). Despite these applications, CCM remains notably absent from macroscale N cycling research. Implementing CCM could significantly advance our understanding of N processes, including disentangling the complex drivers of inorganic N transformations and identifying time-lagged causal relationships in N cycling. Realizing this potential will require continued refinement of CCM methodologies and the development of specialized statistical software tailored to N studies.
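The core of CCM can be sketched in a few dozen lines. Following the logic of Sugihara et al. (2012), the shadow manifold of one variable is reconstructed from its time-delay embedding, and the other variable is estimated from the E + 1 nearest neighbors on that manifold; if X drives Y, the history of Y encodes X, so X can be cross-mapped from Y's manifold with high skill. The test system uses coupled logistic maps with growth rates 3.8 and 3.5, adapted from that study, with the Y-to-X coupling set to zero so that only X drives Y. The embedding dimension E = 2, lag tau = 1, and series length are assumptions, and a full analysis would also check that skill converges as the library length increases.

```python
import heapq
import math


def embed(series, E, tau):
    """Time-delay embedding: index t -> (s[t], s[t - tau], ..., s[t - (E - 1) * tau])."""
    start = (E - 1) * tau
    return [(t, tuple(series[t - k * tau] for k in range(E)))
            for t in range(start, len(series))]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def cross_map_skill(source, target, E=2, tau=1):
    """Estimate `target` from the shadow manifold of `source`; return Pearson rho.

    High skill suggests that `target` causally influences `source`.
    """
    manifold = embed(source, E, tau)
    observed, predicted = [], []
    for t, point in manifold:
        # E + 1 nearest neighbors on the manifold, excluding the point itself
        nearest = heapq.nsmallest(
            E + 1, ((math.dist(point, other), s) for s, other in manifold if s != t))
        d_min = max(nearest[0][0], 1e-12)  # avoid division by zero
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        estimate = sum(w * target[s] for w, (_, s) in zip(weights, nearest)) / sum(weights)
        predicted.append(estimate)
        observed.append(target[t])
    return pearson(observed, predicted)


def coupled_logistic(n, beta_yx=0.1, x0=0.4, y0=0.2):
    """Logistic maps in which X forces Y but Y does not feed back on X."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (3.8 - 3.8 * x))                 # X evolves on its own
        ys.append(y * (3.5 - 3.5 * y - beta_yx * x))   # X forces Y
    return xs, ys


xs, ys = coupled_logistic(800)
rho_x = cross_map_skill(source=ys, target=xs)  # does X drive Y?
rho_y = cross_map_skill(source=xs, target=ys)  # does Y drive X?
print(f"X cross-mapped from M_Y: rho = {rho_x:.2f}; Y cross-mapped from M_X: rho = {rho_y:.2f}")
```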
Regression analysis has been widely applied in N cycling research, with advanced implementations including generalized linear models, ridge regression, and generalized additive models. These approaches have proven particularly valuable for quantifying environmental controls on N transformations. For instance, linear mixed-effects models have been effectively employed to assess the drivers of soil denitrification rates across diverse ecosystems (Li et al., 2022). However, conventional regression methods face growing limitations in the era of big data, often struggling to capture the dynamic and/or nonlinear interactions characteristic of N cycling. Digital twins have emerged as a transformative solution to these challenges, creating bidirectional linkages between physical ecosystems and their virtual counterparts to enable real-time simulation and projection (Grieves and Vickers, 2017). For example, a digital twin can monitor and simulate growth parameters of winter wheat in real time at different growth stages, enabling intelligent decision-making systems to optimize crop management (Skobelev et al., 2024). Currently, the application of digital twins in N cycling research remains nascent (Pylianidis et al., 2021; Purcell and Neubauer, 2023). Integrating digital twins into N cycling research could transform the field by 1) shifting from ‘static correlation analyses’ to ‘dynamic system simulation’ and 2) enabling real-time data assimilation and model updating. These new approaches will be particularly valuable for addressing longstanding challenges in predicting N2O emissions and soil inorganic N dynamics.
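Real-time data assimilation, one of the capabilities attributed to digital twins above, can be illustrated with a scalar Kalman filter that repeatedly forecasts a soil nitrate pool with a simple process model and corrects the forecast whenever a sensor reading arrives. The Kalman filter is one standard assimilation technique chosen here for illustration, not a method specified by the cited studies; the process model, noise variances, and synthetic sensor record (with every tenth reading missing) are hypothetical.

```python
import math
import random


def assimilate(observations, x0, p0, decay=0.05, inputs=2.0, q=1.0, r=9.0):
    """Scalar Kalman filter for a soil nitrate pool; all parameters are hypothetical.

    Process model: x[t] = (1 - decay) * x[t-1] + inputs + process noise (variance q)
    Observation:   z[t] = x[t] + sensor noise (variance r); None = missing reading
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        x = (1.0 - decay) * x + inputs          # forecast with the process model
        p = (1.0 - decay) ** 2 * p + q          # forecast uncertainty grows
        if z is not None:                       # assimilate the new reading
            gain = p / (p + r)
            x += gain * (z - x)
            p *= 1.0 - gain
        estimates.append(x)
    return estimates


def rmse(a, b):
    pairs = [(u, v) for u, v in zip(a, b) if u is not None]
    return math.sqrt(sum((u - v) ** 2 for u, v in pairs) / len(pairs))


random.seed(7)
truth, obs, state = [], [], 40.0
for t in range(200):                            # synthetic "physical" system
    state = 0.95 * state + 2.0 + random.gauss(0.0, 1.0)
    truth.append(state)
    obs.append(None if t % 10 == 9 else state + random.gauss(0.0, 3.0))  # sensor gaps

est = assimilate(obs, x0=30.0, p0=100.0)
print(f"RMSE raw sensor {rmse(obs, truth):.2f}, assimilated {rmse(est, truth):.2f}")
```

The filtered estimate should track the synthetic truth more closely than the raw sensor readings, and it keeps producing forecasts when readings are missing.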
4 Conclusions
In summary, a hierarchical framework integrating environmental factors and microbial traits is proposed as a paradigm for scaling soil N processes up across spatial scales. The explicit incorporation of microbial traits into empirical models represents a crucial advance for refining their parameterization. Although this approach has been applied to several N processes (e.g., mineralization, nitrification, and denitrification), important gaps remain, particularly for biological N fixation, which has yet to be integrated within this framework. Coupling interacting N processes is urgently needed to accurately predict soil N2O emissions and inorganic N dynamics at macroscales, because both are governed by complex interactions among multiple N processes. Such integration demands both big data and advanced computational power. Emerging analytical methods, such as Bayesian approaches, deep learning, convergent cross-mapping, and digital twins, offer transformative potential for advancing macroscale N cycling studies. Next-generation empirical models that systematically incorporate microscale mechanistic findings in N cycling within the hierarchical framework may capture soil N dynamics more precisely. The conceptual framework proposed in this study, combining the hierarchical framework with process coupling, provides a scientific foundation for developing effective N management strategies that simultaneously enhance agricultural sustainability and contribute to climate change mitigation.