Introduction
Dermatology stands on the threshold of a profound transformation driven by artificial intelligence (AI). The fundamental catalyst for this evolution is an intensifying global paradox: the substantial and escalating burden of skin diseases versus the critical shortage and inequitable distribution of specialized dermatological resources. According to the latest data from the Global Burden of Disease (GBD) study, skin conditions have emerged as the fourth leading cause of non-fatal disease burden worldwide, affecting nearly one-third of the global population. In 2021, the number of skin disease cases reached 4.7 billion globally, representing a staggering 65% increase in disease burden since 1990[1].
In stark contrast to the vast patient population, global dermatological resources remain severely inadequate. The average global density of dermatologists is a mere 2.66 per 100,000 people, with a disparity exceeding 13-fold between high-income countries (5.05 per 100,000) and low-income nations (0.37 per 100,000)[2]. This scarcity directly results in prolonged wait times for consultations. In the United States, the average wait time for a dermatology appointment exceeds 30 days, with a 2025 study further indicating that 55% of patients wait more than 3 months[3]. This substantial supply-demand gap constitutes a primary barrier to equitable access to global skin health and provides an extensive landscape for the application of AI.
Significant clinical demand, coupled with technological breakthroughs, has catalyzed exponential market growth. The global AI skin analysis market size reached USD 1.54 billion in 2024 and is projected to expand to USD 7.11 billion by 2034. Similarly, the scientific research landscape has witnessed explosive growth; the volume of publications regarding AI in dermatology increased sharply after 2016. Public interest has also intensified significantly since 2022, with search interest growth rates peaking at 143.6% in 2023[4,5].
Against this backdrop, the application of AI in dermatology is undergoing a fundamental transition from perceptive intelligence toward cognitive and actionable intelligence, signaling the emergence of “AI Dermatology 2.0”. While the core of “AI 1.0” centered on deep learning-based pattern recognition—aiming to match or exceed human performance in specific tasks such as skin cancer classification—AI 2.0 is dedicated to constructing intelligent systems equipped with causal inference, dynamic prediction, and autonomous decision-making capabilities. This review aims to systematically explore this profound paradigm shift. We first analyze the epistemological revolution moving from correlative prediction to causal understanding. Subsequently, we delineate the leap from population-based statistics to individualized prediction through the construction of high-fidelity skin digital twins. Building upon this, we discuss how predictive intervention shifts the clinical focus from “reactive management” to “risk interception”. Finally, we envision how distributed intelligence networks driven by autonomous AI agents will reshape physician-patient collaboration and clinical roles, ultimately advancing the field from “symptom-based treatment” toward the holistic vision of “life-cycle skin homeostasis management”.
The epistemological revolution: from correlation to causality
The application of AI in dermatology is undergoing a profound paradigm shift, reshaping our cognitive models of disease. The core of this revolution lies in the transition from correlative predictions—satisfying the question of “what”—to a causal understanding that explores “why”. If the hallmark of “AI Dermatology 1.0” was the machine’s ability to match or exceed human performance in pattern recognition, the cornerstone of “AI Dermatology 2.0” is the endowment of machines with the capacity for logical reasoning and causal exploration. This represents not merely an algorithmic iteration, but a fundamental driver for deepening our understanding of complex dermatological conditions and effectively addressing them.
Beyond “explainability”: the cognitive ceiling of correlative models
In the era of AI Dermatology 1.0, image recognition technologies centered on deep learning achieved significant success[6]. To mitigate the “black-box” effect, explainable artificial intelligence (XAI) was considered essential for building trust[7]. However, techniques such as heatmaps and class activation mapping (CAM) essentially remain limited to “visual attribution”; they fail to reveal underlying biological mechanisms and may even induce false confidence in erroneous decision-making pathways. A classic case demonstrated that if malignant lesion images in a training set frequently feature a ruler, the AI model might incorrectly identify the “ruler” as a criterion for malignancy[8].
More importantly, models relying solely on correlation possess an inherent cognitive ceiling. A systematic review and meta-analysis published in a Nature portfolio journal in 2024 conducted a comprehensive comparison of diagnostic performance between AI and clinicians in skin cancer tasks. The results indicated that the pooled sensitivity (87.0%) and specificity (77.1%) of AI algorithms were statistically superior to the average levels of all clinicians (sensitivity 79.8%, specificity 73.6%). Notably, however, when compared with experienced dermatologists, the diagnostic performance of both was clinically comparable (AI: Sn 86.3%, Sp 78.4%; Experts: Sn 84.2%, Sp 74.4%)[9]. This underscores that once AI 1.0 reaches expert-level proficiency, substantial breakthroughs through mere model iteration become increasingly difficult to achieve. Such models can only quantify the co-occurrence frequency of features but cannot resolve the underlying causal logic. This limitation hinders their ability to manage rare diseases or atypical cases and fails to drive the cognitive evolution of disease mechanisms[10]. Consequently, transitioning from pure statistical correlation to an AI 2.0 architecture equipped with causal inference capabilities has become an inevitable choice for the intelligent transformation of dermatology.
The rise of causal inference: a new language for understanding disease
To break through the cognitive ceiling of correlative models, we must introduce a novel theoretical framework—causal inference. This paradigm shift from correlation to causality represents a fundamental leap in the cognitive depth of AI.
As illustrated in Figure 1, the essence of causal inference lies in its capacity to endow AI with reasoning abilities. Causal science, particularly the Structural Causal Model (SCM) developed by Judea Pearl and colleagues, provides a rigorous mathematical language to describe causal relationships between variables. This framework enables us to perform not only predictions but also interventions (How would Y change if I actively altered X?) and counterfactual reasoning (What would Y have been if X had not occurred?)[11].
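This ladder from prediction to intervention to counterfactuals can be made concrete with a deliberately toy SCM. Everything below (variables, coefficients, thresholds) is an invented illustration with no clinical validity; it only demonstrates the abduction-action-prediction recipe behind counterfactual queries.

```python
# Toy structural causal model (all coefficients invented for illustration):
#   UV  := N_uv                       (exogenous sun-exposure noise)
#   SPF := 1 if UV > 5 else 0         (sunscreen applied on high-UV days)
#   DMG := 0.8*UV - 3.0*SPF + N_dmg   (skin-damage index)
def scm(n_uv, n_dmg, do_spf=None):
    uv = n_uv
    # do(SPF = s) severs the UV -> SPF edge, unlike merely conditioning on SPF
    spf = do_spf if do_spf is not None else (1 if uv > 5 else 0)
    dmg = 0.8 * uv - 3.0 * spf + n_dmg
    return uv, spf, dmg

# Factual world: high exposure, sunscreen used.
uv, spf, dmg = scm(n_uv=8.0, n_dmg=0.2)

# Counterfactual ("what if sunscreen had NOT been used for this patient?"):
# abduction (reuse this patient's noise), action do(SPF=0), prediction.
_, _, dmg_cf = scm(n_uv=8.0, n_dmg=0.2, do_spf=0)
print(round(dmg, 2), round(dmg_cf, 2))  # 3.6 6.6
```

The key point is that the counterfactual reuses the individual's own exogenous noise terms, which is what distinguishes it from a population-level "what if" query.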
This is not a purely philosophical concept; it has already demonstrated substantial practical value in the field of medical diagnosis. A pioneering study published in Nat Commun redefined the diagnostic task from a simple classification problem into a counterfactual inference task that more closely aligns with clinical reasoning. In a test set comprising 1671 clinical cases, the causal algorithm achieved an accuracy of 77.26%, reaching expert-level performance within the top 25% of clinicians, whereas the traditional correlative algorithm achieved only 72.52%. More importantly, this advantage was further amplified in the diagnosis of rare and ultra-rare diseases, where the performance of the causal algorithm improved by 29.2% and 32.9%, respectively, compared to the correlative algorithm[12]. These results indicate that causal reasoning is a critical attribute necessary for AI to evolve into a truly trustworthy clinical partner when addressing complex medical challenges.
From biomarkers to pathogenic pathways: the rise of causal multi-omics
The integration of causal AI with multi-omics data marks a paradigm shift in scientific research, moving from data-driven correlation discovery toward model-driven causal validation. Previously, the application of multi-omics data was largely confined to biomarker identification—for instance, utilizing Genome-Wide Association Studies (GWAS) to identify loci associated with psoriasis susceptibility[13]. However, such approaches failed to elucidate the specific underlying mechanisms of action.
Currently, researchers are beginning to employ causal inference methodologies—such as Causal Bayesian Networks and Mendelian Randomization (MR)—to analyze multi-omics data. These methods enable the simulation of how factors like ultraviolet (UV) exposure and genetic background synergistically drive the skin aging process[14], or the investigation of whether a genuine causal relationship exists between specific skin microbiota and atopic dermatitis (AD)[15].
To further ground these methodological advancements in specific dermatopathophysiological mechanisms, recent studies have utilized MR to unravel complex disease pathways. For instance, a two-sample MR approach was employed to reveal a causal link between specific plasma metabolites—such as (S)-α-amino-ω-caprolactam and glycochenodeoxycholate 3-sulfate—and an increased risk of scar formation, offering novel metabolic targets for intervention[16]. Similarly, another study constructed a sophisticated two-step MR framework to elucidate the epigenetic and immunological drivers of keloids, demonstrating that specific microRNAs (e.g., miR-6887-5p) causally mediate keloid formation via B-cell activating factor receptors on specific immune cell subpopulations[17]. These concrete discoveries exemplify how causal inference transcends theoretical biomarker identification, directly mapping multi-omics data onto actionable pathogenic pathways.
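The arithmetic behind such two-sample MR analyses is compact: each genetic instrument yields a Wald ratio estimate of the causal effect, and the per-SNP estimates are pooled by inverse-variance weighting (IVW). The summary statistics below are fabricated purely to show the computation, not taken from any study.

```python
# Two-sample MR sketch: per-SNP Wald ratios pooled by inverse-variance
# weighting. All summary statistics below are invented for illustration.
snps = [
    # (beta_exposure, beta_outcome, se_outcome) for three genetic instruments
    (0.12, 0.030, 0.010),
    (0.08, 0.018, 0.008),
    (0.15, 0.039, 0.012),
]

ratios, weights = [], []
for bx, by, se in snps:
    ratios.append(by / bx)          # Wald ratio: causal effect per instrument
    weights.append((bx / se) ** 2)  # 1 / var(ratio) by the delta method

ivw = sum(r * w for r, w in zip(ratios, weights)) / sum(weights)
print(round(ivw, 3))  # 0.248
```

Real MR pipelines add instrument-validity diagnostics (pleiotropy tests, MR-Egger, weighted median) on top of this core estimator.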
This conceptual evolution is fostering a novel research paradigm: the “AI hypothesis-experimental validation” human-machine collaborative loop. In this model, AI transitions from a passive data tool to an active discovery engine[18,19]. It constructs disease pathway hypotheses based on massive omics datasets and precisely recommends critical targets (such as specific gene knockouts) for prioritized experimental validation. Once scientists complete the experiments, the new data are fed back into the AI to refine the causal map. This intelligent dialogue and efficient iteration allow scientific resources to be concentrated on the most pivotal validation steps, significantly accelerating breakthroughs in understanding complex disease mechanisms.
Skin digital twin: constructing a high-fidelity personalized predictive engine
Following the elucidation of disease mechanisms through causal inference, the frontier of clinical practice has shifted toward prediction. The skin digital twin (DT) is pivotal to achieving this objective, driving a transformation in dermatology from population-level statistics to individualized precision intervention[20] (Figure 2). Its core value lies not in static replication, but in the dynamic simulation of individual responses to internal and external perturbations, thereby genuinely realizing personalized predictive medicine.
Core technical characteristics of skin digital twins
A high-fidelity skin DT is not a mere geometric model or a simple data compilation; rather, it is a highly complex dynamic computational system characterized by three core technical dimensions[21,22]. First is the deep integration of multi-scale data. Cutaneous pathological processes span multiple biological scales, from macroscopic clinical imaging and microscopic histopathology to molecular omics. By integrating cellular heterogeneity captured through single-cell RNA sequencing (scRNA-seq) with in situ geographical coordinates provided by spatial transcriptomics[23,24], the DT enhances data “granularity” and elucidates complex cell-cell interaction patterns within lesional areas. This hierarchical architecture, which synthesizes high-resolution omics with lifestyle and environmental exposure data, establishes the systemic foundation of an individual health model (Figure 3).
Second is the capacity for dynamic sensing and real-time updates. Because the skin is a highly sensitive organ, its physiological state fluctuates in real time with the internal and external environment. Utilizing non-invasive wearable biosensors—such as intelligent patches monitoring pH, temperature, and inflammatory biomarkers—the DT enables continuous acquisition of physiological parameters, transcending the limitations of static snapshots[25]. This capability for “here-and-now” synchronous updating provides critical support for real-time clinical decision-making and early warning systems.

Finally, the defining essence of this technology, which distinguishes it from traditional predictive models, is its causal-based counterfactual simulation capability. Unlike traditional machine learning, which relies solely on historical data for correlative analysis, a DT embedded with an SCM can address hypothetical questions—such as “what would happen if a specific intervention were implemented”—through virtual experimentation[26]. This ability to rehearse intervention outcomes within a digital environment is the core driver for optimizing individualized treatment regimens and achieving precision medicine.
Applications of digital twins in drug response simulation
Although the clinical validation of the skin DT remains in its infancy, its successful application in fields such as cardiology and oncology has yielded preliminary results, providing critical methodological references for the digital transformation of dermatology. In cardiovascular medicine, DT-guided intervention strategies have been demonstrated to reduce arrhythmia recurrence rates by over 13% and improve patient prognosis by 25%[27,28]. In precision oncology, Camacho-Gómez et al. utilized physics-informed machine learning to construct a prostate cancer DT model; by integrating prostate-specific antigen (PSA) kinetic data with biomechanical models of tumor growth, the model successfully reconstructed tumor evolution trajectories spanning 2.5 years, with relative errors in volume prediction ranging only between 0.8% and 12.28%[29].
Drawing upon these interdisciplinary success paradigms, the prospects for DT in predicting dermatological drug responses and individualized therapy are becoming increasingly clear. Taking clinical decision-making for moderate-to-severe psoriasis as an example, clinicians frequently face the challenge of selecting from various high-cost biologics with differing targets, such as interleukin-17 (IL-17), interleukin-23 (IL-23), and tumor necrosis factor-alpha (TNF-α). Traditional decision-making models rely heavily on physician experience and population-average evidence from large-scale clinical trials, which often fail to account for individual heterogeneity. The introduction of a DT framework initiates a paradigm shift in this process: by constructing patient-specific “virtual mirrors”, physicians can conduct “virtual clinical trials” prior to actual drug administration. By integrating pharmacokinetic/pharmacodynamic (PK/PD) models of specific drugs into the DT system, it is possible to simulate, in parallel, the evolution curves of disease activity (e.g., Psoriasis Area and Severity Index (PASI) scores), the probabilities of potential adverse reactions, and genetically influenced variations in drug sensitivity over the subsequent years[30,31].
The simulation results provide high-confidence decision support for clinical practice. For instance, if the system indicates that an IL-23 targeted regimen, while slower in onset, significantly outperforms other options in long-term remission rates and safety, the clinician and patient can collaboratively establish an optimal individualized strategy. This process transforms clinical decision-making from a probability-based “trial-and-error” model into an individualized numerical simulation-based “regimen rehearsal”, which is expected to enhance therapeutic efficacy while significantly reducing medical expenditures and side effects associated with ineffective administration.
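The kind of PK/PD-driven PASI simulation described above can be sketched in a few lines. The model below is a toy: a one-compartment PK model with first-order elimination drives an Emax effect that suppresses PASI via a simple turnover equation. Every parameter is invented for illustration and bears no relation to any real biologic.

```python
import math

# Toy "virtual trial" engine (all parameters invented, not clinical values).
def simulate_pasi(dose, interval_h, ke, ec50, emax, pasi0, weeks):
    conc, pasi, traj = 0.0, pasi0, []
    for hour in range(weeks * 7 * 24):
        if hour % interval_h == 0:
            conc += dose                      # instantaneous bolus (simplified)
        conc *= math.exp(-ke)                 # first-order elimination per hour
        effect = emax * conc / (ec50 + conc)  # Emax drug effect
        # disease drifts back toward baseline; drug pushes it down
        pasi += 0.001 * (pasi0 - pasi) - 0.002 * effect * pasi
        if hour % (7 * 24) == 0:
            traj.append(round(pasi, 2))       # weekly PASI sample
    return traj

traj = simulate_pasi(dose=10, interval_h=14 * 24, ke=0.01, ec50=5.0,
                     emax=1.0, pasi0=20.0, weeks=12)
print(traj[0], traj[-1])  # near-baseline vs week-12 PASI
```

Running several such simulations with drug-specific PK/PD parameters, and comparing the resulting trajectories, is the computational core of the "regimen rehearsal" idea.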
Key bottlenecks in constructing high-fidelity models
While the skin DT demonstrates significant clinical potential, its evolution from a theoretical concept to a reliable clinical tool necessitates overcoming a series of core technical obstacles. These challenges are primarily concentrated in three dimensions: data integration, model construction, and clinical translation, which represent the focal points of current academic discourse[20].
First, the fusion and interoperability of multi-source heterogeneous data constitute the underlying technical bottleneck. As illustrated in Figure 3, a high-fidelity DT relies on the deep integration of structured omics data, semi-structured electronic health records (EHR), and unstructured clinical images and textual notes. However, data from different sources currently follow disparate standards and formats, lacking a unified semantic framework. This “data silo” phenomenon severely restricts the possibility of constructing full-scale individual models[32]. Consequently, the promotion of common data models, such as the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), and the development of standardized data interfaces are essential prerequisites for achieving data synergy and model generalization[33].
To address this challenge, the academic and industrial communities have been actively promoting the construction of open-access, shared datasets, aiming to break down data silos and provide high-quality, standardized foundations for model development and validation. These datasets span multiple modalities—from macroscopic clinical photographs and dermoscopic images to microscopic whole-slide histopathological images—and encompass a rich diversity of disease types, patient populations, and skin tones, providing invaluable resources for research into algorithmic fairness and generalizability. Table 1 systematically summarizes the major open-access datasets currently available in the field of dermatology, illustrating the community’s collective efforts and incremental progress in data sharing.
Second, validating the robustness and fidelity of computational models in non-ideal clinical scenarios remains a central difficulty. As the predictive engine of the DT, deep learning-based “black-box” models, despite their excellence in specific phenotypic recognition, exhibit an opacity in decision logic that fails to satisfy the rigorous safety requirements of high-stakes clinical decision-making[74]. Real-world data (RWD) in dermatology are frequently characterized by noisy images, uneven lighting, or missing clinical indicators, imposing demanding requirements on algorithmic robustness[75]. To enhance predictive fidelity, the application of Physics-Informed Neural Networks (PINNs) extends beyond mere simulation; by incorporating biophysical constraints—such as skin moisture diffusion equations or PK models—into the loss function, PINNs force the model to adhere to biological principles even in the presence of noisy data, thereby correcting biases inherent in purely data-driven approaches[76]. Regarding missing values, SCMs demonstrate a unique advantage: unlike traditional interpolation methods, SCMs utilize preset causal graphs for counterfactual attribution, inferring missing states through the causal logic between variables. This ensures that even in non-ideal data environments, the model outputs decision paths that are logically transparent and consistent with medical knowledge. Nevertheless, reverse-engineering complex biological networks from limited and noisy clinical data remains a frontier scientific challenge[9]. More critically, the industry still lacks a unified evaluation framework and “gold standard” to quantify the consistency between virtual simulation results and actual biological responses[32].
To systematically address the “black-box” challenge inherent in purely data-driven deep learning architectures, the development of hybrid modeling approaches has become paramount for the clinical adoption of skin digital twins. While traditional neural networks excel at pattern recognition, their opaque decision logic poses significant barriers to regulatory approval and clinician trust. Recent advances in PINNs and mechanistic models offer a robust pathway to overcome this limitation by integrating data-driven learning with established biophysical laws. For instance, by embedding governing principles—such as Fick’s laws for skin diffusion or multi-compartment PK models—directly into the neural network’s loss function, these hybrid models constrain the solution space to physically plausible outcomes[77]. This architecture not only enhances predictive fidelity under noisy or limited data conditions but also transforms the DT into an interpretable “grey-box” system. Such mechanistic transparency allows clinicians to audit the underlying biological logic of a simulation, which is increasingly recognized by regulatory bodies (e.g., FDA and EMA) as a critical requirement for the approval and safe deployment of AI-driven clinical decision support tools[78].
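The composite-loss idea behind PINNs can be shown in miniature. In the sketch below, a parametric candidate profile stands in for a neural network, and the loss adds the finite-difference residual of a one-dimensional diffusion equation (a stand-in for Fick's second law of moisture transport) to the ordinary data misfit. All constants are invented; this illustrates the loss construction, not a trained model.

```python
import math

D = 0.1  # assumed diffusion coefficient (illustrative)

def u(x, t, k):
    # candidate solution family; k plays the role of the trainable parameter
    return math.exp(-k * t) * math.sin(math.pi * x)

def pinn_loss(k, data, h=1e-3):
    # (1) data misfit on observed points
    data_loss = sum((u(x, t, k) - y) ** 2 for x, t, y in data) / len(data)
    # (2) physics residual of du/dt = D * d2u/dx2 at collocation points,
    #     evaluated by central finite differences
    pts = [(x / 10, t / 10) for x in range(1, 10) for t in range(1, 10)]
    phys = 0.0
    for x, t in pts:
        du_dt = (u(x, t + h, k) - u(x, t - h, k)) / (2 * h)
        d2u_dx2 = (u(x + h, t, k) - 2 * u(x, t, k) + u(x - h, t, k)) / h ** 2
        phys += (du_dt - D * d2u_dx2) ** 2
    return data_loss + phys / len(pts)

# The exact solution of this diffusion problem has k = D * pi^2, so the
# physics term should favor that value over an arbitrary one.
k_true = D * math.pi ** 2
data = [(0.5, 0.2, u(0.5, 0.2, k_true))]
print(pinn_loss(k_true, data) < pinn_loss(0.5, data))  # True
```

A real PINN replaces `u` with a neural network and minimizes this same composite loss by gradient descent; the physics term is what keeps predictions plausible when data are sparse or noisy.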
Finally, the translational gap between laboratory research and actual clinical deployment cannot be ignored. The high computational overhead of complex multi-scale models often conflicts with the requirement for real-time clinical decision-making. Furthermore, the system integration process must ensure that the DT platform can be seamlessly embedded into existing hospital information systems (HIS)[20]. Designing intuitive interfaces that align with physician intuition and ensuring system robustness within dynamically fluctuating real-world environments represent the pivotal engineering challenges for the large-scale implementation of skin digital twins[32].
Predictive intervention: ushering in a new paradigm for proactive dermatological management
If the DT provides a forward-looking predictive dimension for clinical decision-making, then predictive intervention is the critical path for translating this predictive capability into precise clinical practice. The core value of AI Dermatology 2.0 lies in shifting the clinical focus from “diagnosing established diseases” to “predicting potential risks”, thereby initiating a new era of preemptive intervention. The essence of this transition is to move medical intervention points from the downstream of disease evolution (the symptomatic stage) to the early upstream of biological changes (the risk accumulation stage). This drives a fundamental transformation of clinical medicine from a reactive response model to a proactive prevention paradigm.
Chronic disease management: from reactive response to proactive defense
Taking AD as an example, traditional management models are predominantly characterized by reactive response mechanisms: patients typically seek medical intervention only after the acute exacerbation of symptoms such as pruritus and erythema, following which clinicians implement high-dose reactive treatments. This model not only diminishes patient quality of life but also escalates the systemic risks associated with long-term medication.
Under the AI 2.0 framework, chronic disease management has transitioned from population-level statistical probability prediction to individualized, causal simulation-based decision-making. Existing machine learning models can now dynamically assess the risk of flare-ups within specific windows—such as 72 h or one week—by integrating multi-source heterogeneous data, including Patient-Reported Outcomes (PROs), environmental exposure monitoring, physiological signals from wearable devices, and biomarkers. A 2025 study involving 878 patients with AD confirmed that previous flare frequency, duration, and severity are critical predictors of future disease trajectories, validating the feasibility of risk early-warning systems based on historical flare patterns[79,80]. Furthermore, integrated models combining smartphone-acquired lesion images with environmental factors have demonstrated exceptional discriminative power (area under the curve [AUC] > 0.90) in predicting disease flares within the subsequent 24–72 h[81,82].
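The AUC figures reported for such flare predictors have a simple rank interpretation: the probability that a randomly chosen flare case receives a higher risk score than a randomly chosen non-flare case (the Mann-Whitney formulation), which can be computed directly. The scores and labels below are synthetic, chosen only to show the computation.

```python
# AUC via the Mann-Whitney formulation (ties count as half a win).
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic 72-h flare-risk scores from a hypothetical image+environment model
scores = [0.91, 0.85, 0.78, 0.64, 0.40, 0.33, 0.22, 0.15]
labels = [1,    1,    1,    0,    1,    0,    0,    0]
print(auc(scores, labels))  # 0.9375
```

An AUC above 0.90, as cited for the integrated models, means a flare case outranks a non-flare case in more than nine out of ten such pairwise comparisons.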
This precise predictive capability provides a scientific foundation for proactive intervention. When the predicted risk exceeds an individualized threshold, digital health systems can automatically alert patients to initiate low-dose, non-pharmacological preventive interventions, such as intensive moisturizing or the avoidance of environmental triggers. This “minimal effective intervention” strategy aims to suppress the disease in a subclinical state, thereby reducing the frequency of acute flares without increasing the pharmacological burden. Currently, mobile app-based digital health tools have exhibited high reliability and feasibility in the remote monitoring of AD[83], establishing the technical foundation for a closed-loop management system that spans from “risk prediction” to “precision intervention”.
Dermato-oncology: from population screening to individualized risk trajectory prediction
In the field of dermato-oncology, predictive intervention aims to achieve a paradigmatic leap from generalized population screening to precise individualized risk management. Current melanoma screening relies primarily on static risk factors—such as age, skin phenotype, and family history—a model that often leads to over-examination of low-risk individuals and under-diagnosis of atypical high-risk patients due to a lack of dynamic specificity[84–86]. AI Dermatology 2.0 is dedicated to constructing dynamic, longitudinal risk prediction models for individuals. By integrating static genetic backgrounds (e.g., high-penetrance mutations in genes such as CDKN2A, which account for approximately 40% of familial cases[87]), dynamic environmental exposures (e.g., cumulative ultraviolet radiation doses monitored in real-time via wearable sensors)[88,89], and dynamic phenotypic evolution (e.g., automated monitoring of new or morphologically changing lesions using AI-based 3D total-body photography systems)[90], these models generate individualized cancer risk curves that fluctuate over time.
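One way such individualized trajectories could be assembled is a proportional-hazards-style accumulation, in which a static genetic relative risk scales a baseline hazard and the time-varying UV dose enters as an exponential multiplier. The functional form and every number below are illustrative assumptions, not a validated melanoma model.

```python
import math

# Sketch of an individualized cumulative-risk curve (invented parameters).
def risk_curve(genetic_rr, daily_uv, base_hazard=1e-5, uv_beta=0.05):
    cum_hazard, curve = 0.0, []
    for uv in daily_uv:
        # daily hazard: genetic multiplier x exponential UV-dose term
        hazard = base_hazard * genetic_rr * math.exp(uv_beta * uv)
        cum_hazard += hazard
        curve.append(1 - math.exp(-cum_hazard))  # cumulative risk to date
    return curve

high = risk_curve(genetic_rr=5.0, daily_uv=[8] * 365)  # high-risk profile
low = risk_curve(genetic_rr=1.0, daily_uv=[2] * 365)   # low-risk profile
print(round(high[-1], 4), round(low[-1], 4))
```

In this framing, the "marginal impact" of an exposure mentioned above corresponds to the derivative of the curve with respect to the UV term, and screening frequency can be keyed to the slope of an individual's curve.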
This dynamic predictive capability is poised to revolutionize traditional screening guidelines. These models can quantify the marginal impact of specific environmental exposures, such as ultraviolet dosage, on an individual’s cancer risk, thereby providing highly quantitative decision support. Depending on whether an individual’s risk trajectory rises gradually or steeply, clinicians can precisely adjust screening frequencies, achieving a rational bifurcation between high-frequency intensive monitoring and low-frequency follow-up. Recent studies indicate that AI-assisted decision-making has already significantly reduced the misdiagnosis rate of malignant lesions by non-specialists from nearly 60% to below 5%[90], providing a solid technical foundation for the implementation of this strategy.
Ultimately, predictive intervention will drive dermatology from a focus on “post-disease treatment” toward “health maintenance”. Its core objective is no longer limited to the elimination of established lesions but rather to maintaining long-term physiological homeostasis of the skin through precise risk interception before pathological morphology manifests. This preemptive intervention model will not only significantly enhance patient prognostic quality but will also fundamentally innovate the medical philosophy and practical pathways of skin health management.
Autonomous intelligence and distributed collaboration: from reorganizing diagnostic systems to full-lifecycle management
In the era of AI Dermatology 2.0, the ultimate objective of technological evolution is to reshape the organizational structure and collaborative models of healthcare services. This transformation is realized through two parallel pathways: at the macro level, the construction of multi-stakeholder, AI-driven distributed intelligence networks to optimize diagnostic workflows and resource allocation; and at the micro level, the deployment of highly autonomous AI agents to achieve full-lifecycle, closed-loop health management for individuals. These two pathways complement each other, collectively driving the evolution of the dermatologist’s role toward a higher-order dimension.
Distributed intelligence networks: reshaping diagnostic workflows and enhancing clinical efficiency
Traditional Clinical Decision Support Systems (CDSS) inherently follow a “physician-centric” auxiliary model, which has failed to fundamentally transform linear and inefficient diagnostic workflows. In contrast, distributed intelligence networks driven by AI 2.0 evolve AI from a simple decision-support tool into a collaborative hub that integrates multi-source information, reshaping diagnostic processes from reactive responses to proactive guidance. The value of this network is most concentrated in primary care settings, where primary care physicians—as the first line of dermatological defense—have long faced severe challenges regarding diagnostic accuracy. A 2024 systematic review indicated that diagnostic accuracy among non-dermatologists ranges from only 24% to 70%[91]. This “diagnostic gap” directly leads to substantial ineffective referrals and treatment delays. The intervention of AI is narrowing this gap with unprecedented efficacy. Multiple studies have confirmed that with AI assistance, the diagnostic consistency of primary care physicians and nurse practitioners significantly improved by 10% and 12%[92], respectively. A recent clinical study conducted in Spain using the Legit.Health tool further validated this trend: the overall accuracy of primary care physicians increased from 72.96% to 82.22%, with accuracy improvements exceeding 24% in specific conditions such as urticaria and hidradenitis suppurativa (HS)[93].

The case of HS is particularly illustrative of a significant unmet need that AI is poised to address. While prevalent in Western countries at rates of 0.1%–1.0%, its prevalence in China is considerably lower, estimated at approximately 0.034% (33.49 per 100,000), leading to its inclusion in the national list of rare diseases[94]. This relative rarity contributes to a staggering diagnostic delay, which averages 10.2 years in China—even longer than the already unacceptable 7–10 year delay reported globally[95]. Such a prolonged period between symptom onset and diagnosis often results in patients progressing to severe disease stages, underscoring a critical gap in the current healthcare pathway. This challenge, however, aligns directly with the core thesis of AI 2.0. The deployment of AI-powered diagnostic aids in primary care, as discussed, holds the potential to significantly shorten this delay, functioning as an intelligent triage system that empowers non-specialists to identify suspected HS cases earlier and facilitate timely referrals, thereby mitigating disease progression and improving patient outcomes.
Beyond accuracy gains, AI demonstrates immense potential in optimizing referral pathways and enhancing cost-effectiveness. A cost-minimization analysis revealed that while the average cost of a traditional in-person dermatology outpatient visit is $324.90, teledermatology consultations cost only $44.25; the implementation of remote triaging allows for an average saving of $140.12 per newly referred patient[
96]. These economic benefits stem from the structural remodeling of diagnostic workflows. Research has found that with AI assistance, the proportion of cases requiring no referral to a specialist increased by 17% (with a total of 49% of cases managed within primary care), and up to 60.74% of cases could be managed entirely via remote models[
96]. As illustrated in Figure 4, distributed intelligence networks reconstruct traditional linear processes into efficient closed-loop management systems through intelligent triaging and pathway optimization, significantly enhancing the accessibility and operational efficiency of healthcare services.
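The cost logic above reduces to a simple expected-value calculation. The sketch below uses the unit costs reported here ($324.90 in-person, $44.25 teleconsultation), but the remote-resolution rate and the single-step triage pathway are simplifying assumptions for illustration; the cited study's own pathway mix is what yields its reported $140.12 saving.

```python
# Illustrative cost-minimization sketch for a teledermatology triage pathway.
# Unit costs are taken from the text; the resolution proportion is ASSUMED
# and does not reproduce the cited study's exact model.

IN_PERSON_COST = 324.90   # average cost of a traditional outpatient visit (USD)
TELE_COST = 44.25         # average cost of a teledermatology consultation (USD)

def expected_cost_per_patient(p_resolved_remotely: float) -> float:
    """Expected cost when every new referral is first triaged remotely.

    Patients resolved remotely incur only the teleconsultation cost; the
    remainder incur the teleconsultation plus a subsequent in-person visit.
    """
    return TELE_COST + (1.0 - p_resolved_remotely) * IN_PERSON_COST

baseline = IN_PERSON_COST                      # everyone seen in person
with_triage = expected_cost_per_patient(0.60)  # assume ~60% resolved remotely
saving = baseline - with_triage
print(f"Expected saving per referred patient: ${saving:.2f}")
```

Even under these rough assumptions, the structural point survives: savings scale with the fraction of cases that remote triage can close out.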
Autonomous AI agents: core capabilities, regulation, and ethics
Building upon the macro-level restructuring of workflows by distributed networks, the ultimate manifestation of AI 2.0 focuses on highly autonomous management at the individual level. The core vehicle for this is the autonomous AI agent—an intelligent entity authorized to execute decisions, interact with patients in real time, and dynamically optimize management plans within a preset framework. This evolution marks a qualitative leap for AI from a “decision-support tool” to an “autonomous actor”.

Mature autonomous AI agents rely on two core capabilities: high-fidelity conversational patient support and closed-loop autonomous monitoring and intervention. Regarding conversational support, agents based on large language models (LLMs) have demonstrated exceptional potential as tools for clinical education and preliminary consultation. Research indicates that while ChatGPT-4.0 achieved an accuracy score of 4.9/5.0 in answering questions related to AD, its output exhibited readability barriers—tending toward an academic tone—and showed relatively lower accuracy regarding specific treatment regimens. These findings define clear boundaries for its application: its role should be that of an efficient knowledge provider rather than a final medical decision-maker, and it must incorporate “red-flag” mechanisms to automatically escalate to human physicians when treatment adjustments are involved.
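A minimal version of such a “red-flag” mechanism is a routing gate placed in front of the LLM. The keyword lists and function names below are hypothetical illustrations, not a validated clinical rule set; a production system would pair a gate like this with classifier-based intent detection rather than substring matching alone.

```python
# Hypothetical "red-flag" gate for an LLM patient-support agent: queries that
# touch treatment adjustments or alarming symptoms are routed to a human
# physician instead of being answered autonomously. Keyword lists are
# illustrative placeholders only.

RED_FLAGS = (
    "dose", "dosage", "stop taking", "switch", "prescription",   # treatment changes
    "bleeding", "spreading fast", "fever", "oozing",             # alarming symptoms
)

def route_query(query: str) -> str:
    q = query.lower()
    if any(flag in q for flag in RED_FLAGS):
        return "escalate_to_physician"
    return "answer_with_llm"

print(route_query("What moisturizer helps with eczema?"))   # safe education query
print(route_query("Can I double my dose of dupilumab?"))    # treatment change
```

The design point is that the boundary between “knowledge provider” and “medical decision-maker” is enforced structurally, before the model generates an answer, rather than relying on the model to police itself.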
In terms of closed-loop actions, autonomous agents are achieving breakthroughs from “verbal guidance” to “physical intervention”. For instance, researchers have developed a self-powered closed-loop skin patch for AD treatment that monitors skin status in real time and autonomously triggers a microneedle module for drug release, achieving full “monitoring-decision-execution” automation. More importantly, a new generation of multimodal foundation models is bringing this vision to fruition[
97]. The PanDerm model, released in 2025 and pre-trained on a dataset of over 2 million real-world multimodal dermatological images, outperformed human clinicians by 10.2% in the longitudinal monitoring of early melanoma across three independent reader studies. Furthermore, it improved the diagnostic accuracy of clinicians for skin cancer on dermoscopic images by 11% and enhanced the differential diagnostic capability of non-dermatologists for 128 skin conditions by 16.5%[
98]. This signifies that AI is no longer merely a single-task auxiliary tool but possesses the cognitive capacity to integrate multi-source information and perform complex reasoning—a pivotal step toward autonomous intelligence.
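A toy version of the “monitoring-decision-execution” loop described above, with the sensor and actuator stubbed out, might look like the following. The threshold, daily cap, and simulated impedance signal are illustrative assumptions, not parameters of the cited device.

```python
# Toy closed-loop control step in the spirit of the self-powered AD patch:
# read a skin-state signal, decide against a threshold, trigger microneedle
# release. Sensor readings and the actuator are simulated stand-ins.

def control_step(impedance_reading, flare_threshold=0.7,
                 released_today=0, daily_cap=3):
    """One loop iteration: returns (release_drug, new_release_count).

    A hard daily cap is included as a guardrail against the over-delivery
    failure mode that autonomous actuation makes possible.
    """
    flare_detected = impedance_reading >= flare_threshold
    if flare_detected and released_today < daily_cap:
        return True, released_today + 1
    return False, released_today

count = 0
for reading in [0.2, 0.8, 0.9, 0.75, 0.95]:   # simulated sensor stream
    fired, count = control_step(reading, released_today=count)
print(f"Doses released: {count}")
```

Note that the final high reading does not trigger a release: the cap, not the model, has the last word, which is exactly the kind of hard boundary a “preset framework” for autonomous agents implies.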
The clinical entry of autonomous AI agents is predicated on a rigorous and technologically adaptive regulatory and ethical framework. In January 2024, the U.S. FDA approved DermaSensor, the first AI skin cancer detection device for primary care, providing a significant reference for the regulatory pathways of future autonomous agents. The approval process for DermaSensor revealed a shift in regulatory logic: its authorization via the De Novo pathway marks a transition from static validation toward dynamic management through the “Predetermined Change Control Plan (PCCP)”. This establishes a precedent for subsequent self-iterating autonomous agents to enter the market[
99]. AI agents capable of autonomously adjusting treatment regimens will face more stringent scrutiny, with core challenges residing in the definition and validation of autonomy boundaries, as well as the legal transition from traditional product liability to shared collaborative responsibility. Table 2 provides a comprehensive overview of representative AI dermatology products that have received approval from major global regulatory bodies or entered the clinical validation stage, illustrating the technological maturity and commercial progress of the field.
Beyond the macroscopic definition of legal liability, the ethical scrutiny of autonomous AI agents must extend to the nascent “algorithmic side effects”. This encompasses not only physical injuries directly resulting from hardware malfunctions—such as sensor failure or actuator anomalies leading to skin irritation or systemic toxicity via excessive drug delivery in closed-loop patches—but also profound risks derived from the algorithmic decision-making process itself. First, algorithmic bias may exacerbate health inequalities: if training datasets are skewed across ethnicities, skin tones, or socioeconomic backgrounds, the performance of AI agents may significantly degrade for specific populations, leading to erroneous intervention decisions and widening rather than bridging existing health disparities. Second, psychological side effects arising from human-computer interaction cannot be overlooked; continuous health monitoring and automated interventions may induce “digital hypochondria” or persistent anxiety, where unnecessary alerts not only increase the psychological burden but also erode patient trust in the system. Finally, at the clinician level, there is a risk of entrenching “automation bias”, wherein over-reliance on AI recommendations may compromise independent critical thinking, potentially leading to the misdiagnosis of rare or atypical cases. Therefore, future regulatory frameworks must not only validate the intrinsic safety of AI systems but also evaluate their potential impact on the decision-making behavior of both physicians and patients, while establishing corresponding risk-mitigation and educational mechanisms. These propositions transcend technical, legal, and ethical boundaries and will require a broad multidisciplinary consensus.
The concern over algorithmic bias demands a more granular and structurally grounded analysis, particularly in the context of AI 2.0’s expanded autonomy. Empirical evidence has established that the performance disparities of dermatological AI are not merely theoretical; a recent systematic review and meta-analysis revealed that AI models achieved an AUROC of 0.89 for lighter skin tones (Fitzpatrick types I–III) but only 0.82 for darker skin tones (Fitzpatrick types IV–VI), a gap attributable to the chronic underrepresentation of darker-skinned populations in training datasets[
100]. This disparity is critically amplified in the context of AI-driven mobile health (mHealth) applications deployed in low- and middle-income countries (LMICs), where these tools are simultaneously most needed and most likely to fail. As digital dermatology expands its reach, the risk of a “digital health divide” emerges: without deliberate equity-focused design and governance, AI innovations may deepen the very disparities they seek to close, delivering inferior diagnostic accuracy to the populations with the greatest unmet need[
101]. To systematically address this structural challenge, the concept of “equity audits”—analogous to financial audits but applied to algorithmic performance stratified by skin tone, geography, and socioeconomic status—must become a standard prerequisite for clinical deployment. This imperative is increasingly recognized at the highest levels of research funding. For instance, the National Institutes of Health (NIH) recently funded projects aimed at building reliable dermatological visual language assistants through participatory multimodal AI and modeling uncertainty, allowing clinicians to correct errors and modify AI behavior to address underrepresented patient groups. This NIH funding direction signals a pivotal regulatory shift: the future benchmark for AI approval in dermatology will extend beyond mere diagnostic efficacy to encompass robustness across diverse populations and demonstrable algorithmic fairness.
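At its core, an “equity audit” of this kind reduces to computing performance metrics stratified by the protected attribute. Below is a minimal self-contained sketch using a rank-based AUROC; the records are synthetic placeholders, not data from any real cohort, and in practice the scores would come from the deployed model and the group labels from curated metadata.

```python
# Minimal "equity audit" sketch: AUROC stratified by Fitzpatrick skin-tone
# group. Data are synthetic placeholders for illustration only.

from collections import defaultdict

def auroc(labels, scores):
    """Rank-based AUROC: probability a positive case outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def equity_audit(records):
    """records: iterable of (group, true_label, model_score) triples."""
    by_group = defaultdict(lambda: ([], []))
    for group, y, s in records:
        by_group[group][0].append(y)
        by_group[group][1].append(s)
    return {g: auroc(ys, ss) for g, (ys, ss) in by_group.items()}

# Synthetic illustration: the model separates classes better for types I-III.
records = [
    ("I-III", 1, 0.9), ("I-III", 1, 0.8), ("I-III", 0, 0.2), ("I-III", 0, 0.3),
    ("IV-VI", 1, 0.7), ("IV-VI", 1, 0.4), ("IV-VI", 0, 0.5), ("IV-VI", 0, 0.1),
]
print(equity_audit(records))
```

The audit itself is the easy part; the hard prerequisite, as the meta-analysis above implies, is collecting enough labeled cases in the underrepresented strata for the per-group estimates to be statistically meaningful.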
To fundamentally mitigate the risk of “automation bias” and ensure safe deployment in high-stakes clinical settings, such as primary care triage, future AI 2.0 systems must evolve beyond mere predictive accuracy to incorporate robust uncertainty quantification (UQ). In dermatological decisions, it is critical that autonomous agents not only provide diagnoses but also “know when they are uncertain”. Advanced methodologies, such as Bayesian deep learning (e.g., Monte Carlo dropout) and evidential reasoning, are being integrated to estimate predictive uncertainty. By quantifying confidence levels, these models can effectively identify and reject out-of-distribution (OOD) samples—such as rare dermatoses, atypical presentations, or artifacts not represented in the training data—rather than forcing a potentially erroneous prediction. When an autonomous system flags a low-confidence prediction or detects an OOD sample, it triggers a mechanism that seamlessly escalates the case to a human expert. This uncertainty-aware architecture not only serves as a crucial safety guardrail but also calibrates the physician’s reliance on the system, thereby reducing automation bias and fostering a trustworthy human-AI collaborative environment.
Evolution of the physician’s role: from executor to system commander
The rise of autonomous AI agents does not portend the obsolescence of dermatologists; rather, it deconstructs high-frequency, redundant routine tasks, driving an essential evolution of their role from “reactive responders” to “designers and managers of intelligent systems”[
102]. In the AI 2.0 era, the core value of the physician will undergo a profound shift toward higher-order dimensions.
First, physicians will transition into roles as system architects and macro-regulators. They will be deeply involved in constructing the underlying logic of autonomous agents, utilizing clinical insights to define treatment pathways, set individualized intervention thresholds, and establish “red-flag” alert parameters[
103]. The focus of their function will shift from managing specific cases to monitoring the global performance of systems, a role akin to a captain overseeing an automated flight system based on navigational data[
104].
Second, physicians will assume the critical responsibility of final arbiters for complex and atypical cases. While AI agents excel at processing routine tasks aligned with statistical distributions, they remain highly dependent on the heuristic thinking and interdisciplinary insights of human experts when confronting OOD scenarios—such as rare diseases, multimorbidity, or atypical phenotypes—that transcend model cognition[
75,
102,
105].
More importantly, technological advancements will free up more time for physicians to return to the humanistic core of medicine: providing psychological support, explaining complex prognoses, and collaboratively establishing long-term health goals with patients[
103]. This empathy-based interaction remains the irreplaceable core value of healthcare. In summary, future medical education and professional development should be restructured around these role transitions, aiming to cultivate a new generation of clinical experts capable of navigating highly intelligent systems and possessing cross-domain collaborative capabilities[
106,
107].
Conclusion
The application of AI in dermatology is undergoing a historic leap from “perceptive intelligence” to “cognitive and actionable intelligence”, transitioning into the era of AI Dermatology 2.0. This review has systematically explored the four core pillars of this paradigm shift: causal inference endows AI with logical reasoning capabilities beyond superficial correlations, improving performance in complex diagnostic scenarios (such as rare diseases) by over 30% compared to traditional models; skin digital twins provide a prospective simulation dimension for individualized disease evolution and drug response by integrating multi-source heterogeneous data; predictive intervention shifts the clinical focus from “reactive treatment” to preemptive “risk interception”, achieving flare prediction accuracy with AUC > 0.90 in chronic disease management (e.g., AD); and distributed intelligence networks driven by autonomous AI agents reshape the organizational structure of healthcare, enabling full-lifecycle, continuous management.
The realization of this vision still faces severe multi-dimensional challenges, including interoperability bottlenecks for multi-source heterogeneous data, the lack of validation standards for model robustness and fidelity, and the systemic restructuring of ethico-legal frameworks regarding algorithmic fairness, data sovereignty, and liability. However, AI 2.0 is not intended to replace human physicians. Instead, it automates redundant tasks, driving the transition of physicians from “reactive responders” into commanders of intelligent systems and final decision-makers for complex challenges. AI Dermatology 2.0 will no longer be a mere superposition of technologies but a profound revolution integrating digital technology, biological insights, and ethical governance, ultimately advancing dermatology from “symptom-based treatment” toward the holistic vision of “life-cycle skin homeostasis management”.
© The Author(s) 2026. This article is published by Higher Education Press on behalf of People’s Medical Publishing House.