The Necessity of Systems Thinking in the Era of Agentic Artificial Intelligence for Healthcare

Narjes Shojaati

Intelligent Healthcare: 1-8.
DOI: 10.15302/IH.2026.000005
Perspective

Abstract

The emergence of agentic artificial intelligence (AI) systems—frameworks characterized by autonomous perception, reasoning, action execution, and iterative learning—requires rigorous oversight, particularly as they are integrated into healthcare domains. A growing body of literature has demonstrated a persistent gap between measured performance and real-world effectiveness in conventional healthcare AI, often resulting in limited reliability and poor contextual generalization. The integration of autonomy, adaptation, and feedback loops by agentic AI into clinical structures may further magnify these vulnerabilities, potentially eroding patient trust, amplifying cascading errors, worsening health inequities, expanding accountability gaps, and creating self-reinforcing decline. This perspective manuscript argues that systems-thinking methodologies are essential for the prospective evaluation of agentic healthcare AI, as they provide a holistic perspective that captures emergent downstream impacts within contextual complexity over time, thereby enabling the assessment of a decision-making architecture that underlies autonomy, action, and adaptation.

Keywords

artificial intelligence / healthcare AI / agentic AI systems / systems thinking / prospective evaluation framework

Introduction: nascency of agentic AI systems in healthcare

The integration of artificial intelligence (AI) into clinical practice meaningfully affects healthcare delivery and may enhance medical practice[1]. AI in healthcare has expanded beyond traditional machine learning (ML) approaches and now includes deep learning, reinforcement learning, neural networks, and large language models (LLMs) for clinical pattern recognition and decision support[2]. This expansion has been driven primarily by substantial financial investment, the increasing availability of large-scale healthcare data, and growing evidence in the literature of AI’s clinical utility[3]. At present, the global AI community is observing the rapid growth of an emerging framework called agentic AI, which provides autonomous perception, reasoning, action execution, and continuous learning over time[4].

In healthcare, agentic AI systems refer to autonomous systems that perceive their clinical environments, reason over multimodal patient data, plan and execute multi-step actions, and adjust based on received feedback. Although conventional AI systems in healthcare perform a range of specialized functions, they remain more limited than agentic AI systems. Basic workflow automation follows rigid rules without reasoning, LLM-based tools generate text responses without real-world actions, and decision support systems provide static recommendations without iterative adaptation. Agentic AI systems, by contrast, enable goal-directed reasoning, dynamic action execution, and adaptive problem-solving within clinical workflows[5,6].

Recent literature on agentic AI in healthcare emphasizes that its ability to operate autonomously toward clinical goals may reduce physician workload, lower documentation burden, and support faster diagnostic processes[7]. Beyond clinical workflows, agentic AI is claimed to be a key enabler of personalized and preventive medicine, supporting adaptive treatment plans and chronic disease management for elderly individuals and people with disabilities[8]. At a systems level, researchers project broader impacts from agentic AI in healthcare such as improved resource utilization, enhanced health policy decision-making, and increased economic value for healthcare systems[9,10].

Despite these favorable prospects, the real-world deployment of agentic AI in healthcare may face critical challenges at each core layer (data sensing, brain, action, and learning)[11]. These challenges arise from the human-centered nature of medical interventions, the ethical importance of care delivery, and the evolving nature of health outcomes[12]. In healthcare settings, trust is central to the delivery of effective care[13]. Any health data integration within the data sensing layer of agentic AI systems requires ethical approval and adherence to privacy regulations[14]. Reliance on patient consent in AI-assisted clinical practices is a concern, as consent may be shaped by reluctance to engage with additional procedures or by insufficient understanding of the underlying process. In addition, patient consent alone has not traditionally been regarded as a primary legal basis for all uses of patient data within healthcare[15]. Transparency, responsibility, and fairness may also be challenging to achieve in agentic healthcare AI[16], particularly given the autonomy introduced through the brain, action, and learning layers. The brain layer may limit transparency because of the black-box nature of its internal reasoning processes, making it difficult to give patients clear information about their treatment decisions. Meanwhile, the action layer may take over tasks currently performed by humans, which are already constrained by liability concerns, and its AI-generated recommendations could introduce additional tasks for which no party is clearly accountable. Finally, continuous data integration into feedback loops for learning raises the risk of overfitting to recently observed data and may ultimately alter system behavior in ways that lead to undesirable outcomes[17].

The current literature on healthcare AI employs a Strengths, Weaknesses, Opportunities, and Threats (SWOT) framework to assess its implications for healthcare systems[6,18-25], providing an informative foundation for a more structured perspective on the challenges of agentic healthcare AI. The current perspective manuscript is motivated by the growing interest in deploying agentic AI in healthcare, despite unresolved challenges in its comprehensive evaluation. To help bridge this gap, this manuscript details the inadequacies of current evaluation frameworks for agentic AI in healthcare, presents structured examples of these challenges, and emphasizes systems thinking as a critical foundation for the responsible deployment of agentic AI in healthcare.

Problem statement: the inadequacy of current evaluation frameworks for agentic healthcare AI systems

Three main stages of deployment assessment are evident in the current healthcare AI evaluation literature. The first stage focuses on technical performance evaluation, in which data scientists assess how effectively healthcare AI systems work using a set of predefined computational metrics. The second stage evaluates the usability and acceptability of AI systems by investigating whether physicians and patients can effectively employ healthcare AI systems within real-world clinical workflows. The third stage assesses whether AI systems lead to substantial improvements in health outcomes, through randomized or non-randomized studies conducted in real-world settings[26].

However, this evaluation paradigm frequently results in significant performance gaps across these stages[27], implying that high technical performance at early stages does not always result in a reliable AI system in later stages for real-world scenarios. Several studies in the literature indicate that some AI and ML healthcare models that achieved high performance on internal tests significantly underperform in real-world cases[28-31]. For instance, a systematic review[28] assessing the generalizability of AI in radiology demonstrates that radiology AI models could underperform in real-world settings, even when they achieve area under the curve (AUC) values above 0.90 on internal datasets. The study suggests that this observed underperformance is likely due to differences in hospital equipment and patient populations, which introduce inconsistencies across datasets.
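
To make the gap between internal and external performance concrete, the following minimal Python sketch computes AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The labels and scores are invented toy values chosen only for illustration; they are not data from the cited review[28], and the function name auc is an assumption of this sketch.

```python
def auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores: the same model evaluated on the development
# site ("internal") and on a site with different equipment and case mix
# ("external"). These numbers are illustrative assumptions only.
labels = [1, 1, 1, 0, 0, 0]
internal_scores = [0.90, 0.80, 0.55, 0.60, 0.30, 0.20]
external_scores = [0.70, 0.50, 0.35, 0.60, 0.40, 0.30]

internal_auc = auc(labels, internal_scores)
external_auc = auc(labels, external_scores)
print(f"internal AUC = {internal_auc:.3f}, external AUC = {external_auc:.3f}")
```

In this toy case the ranking quality drops once the score distribution shifts, even though the model itself is unchanged, which is the kind of discrepancy a single internal metric cannot reveal.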

This discrepancy underscores that current evaluation frameworks are insufficient, as they evaluate the performance of healthcare AI systems in isolation rather than their functioning as part of broader healthcare systems[32]. These frameworks rely on static performance metrics and may overlook downstream system-level impacts in dynamic clinical environments. This is particularly important for agentic healthcare AI systems. The autonomy, adaptation, and feedback loops inherent in agentic AI can give rise to evolving system behavior over time, potentially generating emergent consequences that challenge static evaluation frameworks. Thus, current evaluation frameworks are likely to become critically insufficient when applied to agentic AI and require substantial revision.

Illustrative examples: projecting the consequences of constrained evaluation for agentic healthcare AI systems

The following examples present forward-looking scenarios illustrating possible future risks associated with the deployment of agentic AI systems in clinical contexts.

Trade-off between patient trust and automation

Extensive data collection and unclear privacy protections in agentic AI systems can raise patient concerns about the safety of their data. The need for personal medical data to generate AI-driven insights could increase the likelihood of breaches of patient confidentiality and the misuse of sensitive information, for example through sharing data with third-party AI providers or the need to link data back to identifiable individuals. As a result, patients may become less willing to disclose sensitive information and may not fully engage in their care. Consequently, clinicians may be forced to make decisions based on incomplete information provided by patients, which can hinder accurate diagnosis and effective treatment. Over time, this may lead to more complex cases and repeated patient visits. Therefore, although agentic AI systems may increase the number of patients seen, they risk reducing the overall quality of care because autonomy can diminish patient trust.

Error magnitude in cascading autonomous action

Profound differences in the ethical gravity of clinical errors already exist within evaluation frameworks for AI systems. For example, a 5% false-negative rate in influenza screening among healthy adults carries different mortality implications than a 5% false-negative rate in detecting ventricular arrhythmias in intensive care patients, where missed cases in the latter scenario may result in a rapid progression to fatality. These disparities can be further amplified in autonomous contexts, as agentic healthcare AI systems can transform a single misclassification into a cascade of downstream automated actions. For instance, an incorrect diagnosis may also lead to automated treatment implementation and prescription adjustments that collectively compound the initial error. Therefore, evaluation metrics that are limited to isolated classification performance are insufficient to capture the cascading risks inherent in autonomous clinical workflows.
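
The compounding effect described above can be sketched with simple arithmetic. The following Python example is a hypothetical illustration, not a validated risk model: it assumes that a misclassification triggers a fixed chain of automated downstream actions and that, before each action, an independent check (for example a clinician review) catches the error with some probability. The function name and all parameter values are assumptions made for this sketch.

```python
def expected_erroneous_actions(p_misclass, n_downstream_steps, p_intercept):
    """Expected number of erroneous automated actions per patient.

    A misclassification (probability p_misclass) starts a chain of
    n_downstream_steps automated actions. Before each action the error is
    caught with probability p_intercept, which stops the chain.
    """
    expected = 0.0
    p_still_undetected = p_misclass
    for _ in range(n_downstream_steps):
        p_still_undetected *= 1.0 - p_intercept
        expected += p_still_undetected
    return expected


# Hypothetical settings: a 5% misclassification rate followed by three
# automated actions (e.g. treatment order, prescription change, scheduling).
fully_autonomous = expected_erroneous_actions(0.05, 3, p_intercept=0.0)
with_review = expected_erroneous_actions(0.05, 3, p_intercept=0.5)
print(f"no interception: {fully_autonomous:.4f} erroneous actions per patient")
print(f"50% interception per step: {with_review:.4f} erroneous actions per patient")
```

Under these assumptions the same 5% classification error produces roughly three times as many erroneous actions when nothing intercepts the chain, which is why a metric confined to the initial classification understates the risk.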

Population inequities amplified by adaptive actions

The uneven consequences of errors across distinct subpopulations are currently obscured by aggregate evaluation metrics in AI systems, making these metrics unrepresentative of real-world conditions. Conventional AI systems with similar sensitivity and specificity may produce different real-world health outcomes depending on variations in healthcare access across populations. In well-resourced populations, false positives may cause minor inconvenience and false negatives may be corrected through follow-up, whereas in disadvantaged communities, false positives can impose disproportionate financial burdens and false negatives may lead to significant health burdens. Similarly, in an agentic AI system responsible for scheduling clinical visits, a misdiagnosis of a less privileged patient, combined with their inability to attend the appointment, may lead to reduced prioritization of future care. Furthermore, because of the system’s adaptive features, the missed visit may be interpreted as indicating reduced need for care, thereby further lowering the patient’s future priority. As illustrated above, in the adaptive environments enabled by agentic healthcare AI systems, existing inequities may be amplified over time, turning small statistical biases into structural inequities.
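
The scheduling scenario above can be expressed as a small reinforcing feedback loop. The Python sketch below is a deliberately simplified, hypothetical model: the attendance rule, the learning rate, and the access values are all assumptions chosen for illustration and do not describe any real scheduling system. It assumes that lower priority yields less convenient appointments, that attendance depends on both priority and the patient's access to care, and that the system updates priority toward observed attendance.

```python
def simulate_priority(access, steps=52, learning_rate=0.2, initial_priority=0.8):
    """Track a patient's scheduling priority under an adaptive update rule.

    access: hypothetical probability (0..1) that the patient can attend a
            conveniently scheduled visit.
    Lower priority is assumed to mean less convenient slots, so expected
    attendance is access * (0.5 + 0.5 * priority). The system then moves
    priority toward observed attendance, closing the feedback loop.
    """
    priority = initial_priority
    history = [priority]
    for _ in range(steps):
        attend_prob = access * (0.5 + 0.5 * priority)
        priority += learning_rate * (attend_prob - priority)
        history.append(priority)
    return history


# Two hypothetical patients with the same clinical need and starting priority.
well_resourced = simulate_priority(access=0.9)
disadvantaged = simulate_priority(access=0.5)
print(f"final priority, well-resourced: {well_resourced[-1]:.3f}")
print(f"final priority, disadvantaged:  {disadvantaged[-1]:.3f}")
```

In this toy model the ratio of final priorities is larger than the ratio of access levels, showing how an adaptive rule can widen an initial disparity even though both patients started with identical priority.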

System capacity strain and accountability gaps

Clinical system infrastructure and staff operate under finite operational, temporal, and cognitive resources; thus, liability frameworks are established to ensure accountability for decisions and their consequences. An agentic healthcare AI system operating without these constraints may generate fallacious reasoning requiring expert-level judgment to detect, potentially overloading healthcare resources in the absence of clear accountability. For example, an agentic AI system for hospital staff scheduling may misestimate workload or overlook staff constraints. In the absence of clear accountability mechanisms that enable staff to raise concerns, system decisions may be perceived as reliable even when they produce harmful outcomes. Potential responses, such as implementing temporal auditability of system actions, recognize that agentic AI systems are inherently dynamic and therefore cannot be fully assessed through static evaluation frameworks alone.

Temporal feedback loops and self-reinforcing degradation

Agentic healthcare AI systems challenge assumptions about stability and independence in healthcare by integrating feedback loops and self-learning; however, these effects may only become apparent after initial risks emerge in real-world deployment. For example, an agentic antibiotic‑prescribing AI system can estimate resistance patterns, but once deployed, its prescribing decisions can recursively influence its own perceived validity by shaping the development of new antibiotic resistance, which in turn alters the effectiveness of subsequent recommendations. In this setting, the system is only informed of resistance patterns after they have already been shaped by its own prior actions. Therefore, agentic healthcare AI systems cannot rely solely on retrospective performance feedback after new problems have already been created and require prospective assessment of the long-term consequences of recursive decision-making.
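
The recursive loop in the antibiotic example can be sketched as a one-stock model with delayed observation. The Python code below is an illustrative toy, not an epidemiological model: the selection and decay rates, the reporting lag, and the rule that prescribing intensity equals perceived effectiveness are all assumptions made for this sketch.

```python
def simulate_resistance(steps=60, lag=6, selection=0.05, decay=0.01, r0=0.05):
    """Simulate a resistance fraction driven by the system's own prescribing.

    The system only observes resistance with a delay of `lag` steps, so its
    perceived drug effectiveness is based on outdated data. Prescribing
    intensity is assumed to equal perceived effectiveness, and prescribing
    selects for resistance among the still-susceptible fraction.
    """
    resistance = [r0]
    perceived_eff, actual_eff = [], []
    for t in range(steps):
        r_now = resistance[-1]
        r_seen = resistance[max(0, t - lag)]  # lagged surveillance data
        prescribing = 1.0 - r_seen
        perceived_eff.append(1.0 - r_seen)
        actual_eff.append(1.0 - r_now)
        r_next = r_now + selection * prescribing * (1.0 - r_now) - decay * r_now
        resistance.append(min(1.0, max(0.0, r_next)))
    return resistance, perceived_eff, actual_eff


resistance, perceived, actual = simulate_resistance()
largest_gap = max(p - a for p, a in zip(perceived, actual))
print(f"resistance: start {resistance[0]:.2f}, end {resistance[-1]:.2f}")
print(f"largest overestimate of effectiveness: {largest_gap:.2f}")
```

While resistance is rising, the lagged view makes the drug look more effective than it currently is, so retrospective feedback arrives only after the system's own actions have changed the environment. A prospective simulation of this kind is one way to examine that effect before deployment.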

Considering the points discussed above, the introduction of autonomy, adaptive execution, and feedback learning poses a significant risk to the generalizability of agentic healthcare AI systems and may lead to a loss of patient trust, the generation of cascading errors, the exacerbation of population inequities, the occupation of clinical resources without proper accountability, and self-reinforcing degradation. Consequently, evaluation metrics limited to snapshot assessments are insufficient to capture the dynamic interactions that agentic AI systems may exhibit across diverse clinical environments over time. Overall, this mismatch between static evaluation approaches and the adaptive, context-dependent nature of agentic AI systems highlights the need for continuous, system-aware evaluation paradigms.

Proposed solution: systems thinking as a solid evaluation framework for agentic healthcare AI systems

Systems thinking[33] is a holistic approach for studying complex systems that enables the mapping of interactions among different system components and the examination of how these interactions evolve over time within their context. Systems thinking approaches, including causal loop diagrams[34], system dynamics[35], agent-based modeling[36], and discrete event simulation[37], can be applied to enable a comprehensive evaluation of complex systems by capturing temporal and spatial dynamics, feedback mechanisms, population heterogeneity, workflow processes, resource utilization, and emergent behavior[38,39]. Table 1 compares these methods by outlining their purpose, assumptions, core components, and implications for evaluating agentic AI in healthcare.

These systems thinking methodologies are well established in public health for a range of research applications, such as epidemiological modeling, health systems analysis, policy evaluation, and resource planning[40], and can be adapted to evaluate agentic healthcare AI systems by facilitating comprehensive assessment and prospective insights into their real-world operation. Causal loop diagrams serve as the conceptual starting point by mapping causal relationships among system components, making reinforcing and balancing feedback structures explicit, and highlighting potential unintended consequences, such as policy resistance, that may remain invisible during the algorithmic phase. System dynamics simulations formalize these relationships into stock-and-flow structures to examine nonlinear feedback processes and long-term cumulative trends. Agent-based modeling captures emergent phenomena arising from interactions among heterogeneous agents and their environment, representing adaptive behavior, nonlinear interactions, and system-level patterns that emerge from individual, localized rules. Finally, discrete event simulation examines operational-level processes by modeling the flow of entities through predefined system pathways over time, enabling the evaluation of clinical workflows, waiting times, and resource utilization. Figure 1 illustrates the integrated architecture of these methods for assessing the efficiency of agentic AI systems. A hybrid configuration of these approaches leverages the complementary strengths of different systems thinking paradigms by integrating conceptual, dynamic, behavioral, and operational perspectives, thereby supporting a more comprehensive assessment of agentic AI in healthcare.
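
Of the methods described above, discrete event simulation is perhaps the easiest to sketch in a few lines. The following Python example is a minimal, hypothetical illustration using only the standard library: a single clinician serves patients in arrival order, and an assumed extra workload per patient stands in for additional tasks generated by an agentic AI system. The arrival pattern, service times, and function name are assumptions for this sketch, not values from the article.

```python
import heapq
from collections import deque

ARRIVAL, DEPARTURE = 0, 1


def simulate_clinic(n_patients, interarrival, service_time, extra_ai_tasks=0.0):
    """Single-clinician FIFO queue; returns the mean waiting time in minutes.

    Patients arrive every `interarrival` minutes. Each visit occupies the
    clinician for service_time + extra_ai_tasks minutes.
    """
    # Event list ordered by (time, event kind, patient id).
    events = [(i * interarrival, ARRIVAL, i) for i in range(n_patients)]
    heapq.heapify(events)
    queue = deque()
    arrival_time = {}
    waits = []
    busy = False
    per_patient = service_time + extra_ai_tasks
    while events:
        now, kind, pid = heapq.heappop(events)
        if kind == ARRIVAL:
            arrival_time[pid] = now
            queue.append(pid)
        else:
            busy = False  # clinician finished the current visit
        if not busy and queue:
            nxt = queue.popleft()
            waits.append(now - arrival_time[nxt])
            busy = True
            heapq.heappush(events, (now + per_patient, DEPARTURE, nxt))
    return sum(waits) / len(waits)


baseline_wait = simulate_clinic(5, interarrival=10, service_time=8)
overloaded_wait = simulate_clinic(5, interarrival=10, service_time=8, extra_ai_tasks=4)
print(f"mean wait without extra tasks: {baseline_wait} min")
print(f"mean wait with extra AI-generated tasks: {overloaded_wait} min")
```

Once the per-patient workload exceeds the arrival interval, waiting time grows with every additional patient, which is the kind of capacity effect that a static accuracy metric does not capture.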

Because the real-world deployment of agentic AI systems inevitably involves shifts in key parameters and assumptions across time and settings, systematic sensitivity analyses[41] on these parameters can provide insight into the range of possible outcomes under diverse conditions. Such analyses can help reduce uncertainty and identify context-dependent vulnerabilities.
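
A one-at-a-time parameter sweep is among the simplest forms of sensitivity analysis and can be sketched as follows. The Python example below is a hypothetical illustration: the toy outcome model (clinician utilization), its parameter names, and the baseline values are assumptions made for this sketch, and more thorough global methods may be preferable in practice.

```python
def clinician_utilization(p):
    """Toy outcome: fraction of clinician time consumed per arrival interval.

    Each visit takes service_min minutes, plus review_min minutes for the
    share of visits (ai_flag_rate) in which the AI system flags extra work.
    """
    workload = p["service_min"] + p["ai_flag_rate"] * p["review_min"]
    return workload / p["interarrival_min"]


def one_at_a_time(model, baseline, rel_change=0.2):
    """Vary each parameter by +/- rel_change while holding the others fixed.

    Returns {parameter name: (lowest output, highest output)}.
    """
    results = {}
    for name in baseline:
        outputs = []
        for factor in (1.0 - rel_change, 1.0 + rel_change):
            perturbed = dict(baseline)
            perturbed[name] = baseline[name] * factor
            outputs.append(model(perturbed))
        results[name] = (min(outputs), max(outputs))
    return results


# Hypothetical baseline values, chosen only for illustration.
baseline = {
    "service_min": 8.0,
    "ai_flag_rate": 0.5,
    "review_min": 4.0,
    "interarrival_min": 10.0,
}
ranges = one_at_a_time(clinician_utilization, baseline)
most_sensitive = max(ranges, key=lambda k: ranges[k][1] - ranges[k][0])
for name, (low, high) in ranges.items():
    print(f"{name}: utilization between {low:.3f} and {high:.3f}")
print(f"most influential parameter in this toy model: {most_sensitive}")
```

Ranking parameters by the width of the resulting output range indicates which assumptions deserve the closest scrutiny when a system is moved to a new setting.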

Collectively, systems thinking evaluation frameworks extend beyond static evaluation paradigms by supporting the explanation of emergent downstream impacts. They enable adaptive system configurations and facilitate the anticipation of unintended consequences of agentic AI systems over time. Ongoing collaboration among modeling specialists, subject-matter experts, ethics experts, and healthcare economists remains essential to establish multidisciplinary review panels that can comprehensively assess results. These partnerships should be maintained through continuous monitoring of agentic healthcare AI systems and their real-world implications, enabling modification or discontinuation when predefined equity, safety, or performance standards are not met.

Conclusion: toward systematic evaluation of agentic healthcare AI systems

The evaluation of agentic AI systems in healthcare presents significant challenges, which are likely to become more pronounced with large-scale adoption. These systems, with autonomous capabilities in perception, reasoning, action execution, and iterative learning, create additional evaluation demands because current evaluation methods fail to account for the autonomous, action-oriented, and feedback-rich nature of agentic AI. These challenges highlight the need for a comprehensive evaluation paradigm that moves beyond static performance metrics to capture downstream impacts and long-term contextual shifts. A systems-thinking approach offers a more robust framework for assessing agentic healthcare AI in real-world settings, as it can explicitly model population heterogeneity, workflow structures, resource limitations, feedback mechanisms, and temporal–spatial dynamics. Without a holistic evaluation approach, the deployment of agentic AI systems in healthcare may drift toward provider-centered optimization unless it is carefully balanced with ethical care considerations, privacy safeguards, and accountability mechanisms.

References

[1]

Karalis VD. The integration of artificial intelligence into clinical practice. Appl Biosci. 2024;3(1):14-44.

[2]

Thomas KS, Edpuganti S, Puthooran DM, Thomas A, Joy A, Latheef S. Artificial intelligence in modern clinical practice (Review). Med Int (Lond). 2026;6(1):5.

[3]

Kolasa K, Admassu B, Hołownia-Voloskova M, Kędzior KJ, Poirrier JE, Perni S. Systematic reviews of machine learning in healthcare: a literature review. Expert Rev Pharmacoecon Outcomes Res. 2024;24(1):63-115.

[4]

Arunkumar V, Gangadharan GR, Buyya R. Agentic artificial intelligence (AI): architectures, taxonomies, and evaluation of large language model agents. arXiv. Preprint posted online January 28, 2026. arXiv: 2601.12560.

[5]

Njei B, Al-Ajlouni YA, Sidney Kanmounye U, et al. Artificial intelligence agents in healthcare research: a scoping review. PLoS One. 2026;21(2):e0342182.

[6]

Srinivasu PN, Aruna Kumari GL, Ahmed S, Alhumam A. Exploring agentic AI in healthcare: a study on its working mechanism. Front Med (Lausanne). 2025;12:1753443.

[7]

Hinostroza Fuentes VG, Karim HA, Tan MJT, AlDahoul N. AI with agency: a vision for adaptive, efficient, and ethical healthcare. Front Digit Health. 2025;7:1600216.

[8]

Begum S, Singh TM, Khan FA, Doss S. AI in personalized medicine: redefining healthcare through agentic intelligence. In: The Power of Agentic AI: Redefining Human Life and Decision-Making. Springer Nature Switzerland; 2025:41-57.

[9]

Karunanayake N. Next-generation agentic AI for transforming healthcare. Inform Health. 2025;2:73-83.

[10]

Collaco BG, Haider SA, Prabha S, et al. The role of agentic artificial intelligence in healthcare: a scoping review. NPJ Digit Med. 2026;9(1):345.

[11]

Cajas Ordóñez SA, Castro R, Celi LA, et al. Beyond overconfidence: Embedding curiosity and humility for ethical medical AI. PLOS Digit Health. 2026;5(1):e0001013.

[12]

Egbunu AS, Okedoye AM. Harnessing artificial intelligence for early disease detection: opportunities and challenges in modern healthcare. J Comput Theor Appl. 2026;3:384-401.

[13]

Goktas P, Grzybowski A. Shaping the future of healthcare: ethical clinical challenges and pathways to trustworthy AI. J Clin Med. 2025;14(5):1605.

[14]

Spector-Bagdady K. Generative-AI-generated challenges for health data research. Am J Bioeth. 2023;23(10):1-5.

[15]

Kramcsak PT. Can legitimate interest be an appropriate lawful basis for processing Artificial Intelligence training datasets? Comput Law Secur Rev. 2023;48:105765.

[16]

Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17(1):195.

[17]

Bellogín A, Giudici P, Larsson S, et al. Systemic risks associated with agentic AI: a policy brief. ACM Europe TPC-Autonomous Systems Subcommittee. ACM. 2025.

[18]

Mumuni AN, Hasford F, Udeme NI, Dada MO, Awojoyogbe BO. A SWOT analysis of artificial intelligence in diagnostic imaging in the developing world: making a case for a paradigm shift. Phys Sci Rev. 2024;9:443-476.

[19]

Martin T. AI and Healthcare: SWOT Analysis. December 03, 2025.

[20]

Mohammad Talebi H. Analyzing the strengths, weaknesses, opportunities, and threats (SWOT) of chatbots in emergency nursing: a narrative review of literature. Creat Nurs. 2025;31(4):451-460.

[21]

Torkay G, Fadlallah N, Karagöz A, et al. Artificial intelligence in cancer: a SWOT analysis. J AI. 2024;8(1):107-137.

[22]

Sallam M, Snygg J, Allam D, Kassem R, Damani M. Artificial intelligence in clinical medicine: a SWOT analysis of AI progress in diagnostics, therapeutics, and safety. J Innov Med Res. 2025;4:1-20.

[23]

Attoh-Mensah E, Boujut A, Desmons M, Perrochon A. Artificial intelligence in personalized rehabilitation: current applications and a SWOT analysis. Front Digit Health. 2025;7:1606088.

[24]

Zaman M. ChatGPT for healthcare sector: SWOT analysis. Int J Res Ind Eng. 2023;12:221-233.

[25]

Ghaffari Jolfayi A, Sarikhani M, Aarabi A, Ghasemirad H, Dehghanian Z, Mozafarybazargany M. SWOT analysis of the integration of MRI in cardiology: strengths, weaknesses, opportunities, and threats. In: Ghaffari Jolfayi A, eds. Navigating Cardiology’s Future. Singapore: Springer; 2025:115-128.

[26]

Park Y, Jackson GP, Foreman MA, Gruen D, Hu J, Das AK. Evaluating artificial intelligence in medicine: phases of clinical research. JAMIA Open. 2020;3(3):326-331.

[27]

El Arab RA, Abu-Mahfouz MS, Abuadas FH, et al. Bridging the Gap: From AI success in clinical trials to real-world healthcare implementation—a narrative review. Healthcare. 2025;13:701.

[28]

Suleman MU, Mursaleen M, Khalil U, et al. Assessing the generalizability of artificial intelligence in radiology: a systematic review of performance across different clinical settings. Ann Med Surg (Lond). 2025;87(12):8803-8811.

[29]

Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Confounding variables can degrade generalization performance of radiological deep learning models. arXiv. Preprint posted online July 1, 2018. arXiv:1807.00431.

[30]

Yasaka K, Abe O. Deep learning and artificial intelligence in radiology: current applications and future directions. PLoS Med. 2018;15(11):e1002707.

[31]

Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 2018;15(11):e1002683.

[32]

Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44-56.

[33]

Cabrera D, Cabrera L. What is systems thinking? In: Learning, Design, and Technology: An International Compendium of Theory, Research, Practice, and Policy. Springer; 2023:1495-1522.

[34]

Barbrook-Johnson P, Penn AS. Causal loop diagrams. In: Systems Mapping: How to Build and Use Causal Models of Systems. Springer; 2022:47-59.

[35]

Homer JB, Hirsch GB. System dynamics modeling for public health: background and opportunities. Am J Public Health. 2006;96(3):452-458.

[36]

Tracy M, Cerdá M, Keyes KM. Agent-based modeling in public health: current applications and future directions. Annu Rev Public Health. 2018;39:77-94.

[37]

Vázquez-Serrano JI, Peimbert-García RE, Cárdenas-Barrón LE. Discrete-event simulation modeling in healthcare: a comprehensive review. Int J Environ Res Public Health. 2021;18:12262.

[38]

Schünemann C, Johanning S, Reger E, Herold H, Bruckner T. Complex system policy modelling approaches for policy advice-comparing systems thinking, system dynamics and agent-based modelling. Public Policy Model Appl. 2024;3(1):1-18.

[39]

Katina PF, Tolk A, Keating CB, Joiner KF. Modelling and simulation in complex system governance. Int J Syst Syst Eng. 2020;10:262-292.

[40]

Thelen J, Sant Fruchtman C, Bilal M, et al. Development of the systems thinking for health actions framework: a literature review and a case study. BMJ Glob Health. 2023;8(3):e010191.

[41]

Sysoev A. Sensitivity analysis of mathematical models. Computation. 2023;11:159.

RIGHTS & PERMISSIONS

The Author(s) 2026. This article is published by Higher Education Press at journal.hep.com.cn.
