15 December 2026, Volume 20, Issue 12

  • RESEARCH ARTICLE
    Qian TAO, Xiyuan WANG, Muhan ZHANG, Shuxian HU, Wenyuan YU, Jingren ZHOU

    Graph neural networks (GNNs) have become a prevalent framework for graph tasks. Many recent studies apply graph convolutions over the numerous subgraphs of each graph, an approach known as subgraph GNNs. Despite their impressive performance, subgraph GNNs suffer from storage and computational inefficiencies due to the vast number and large size of subgraphs. To address this problem, this paper introduces Ego-Nets-Fit-All (ENFA), a model that uniformly takes small ego nets as subgraphs, thereby providing greater storage and computational efficiency while guaranteeing outputs identical to those of the original subgraph GNNs. Experiments show that ENFA can reduce storage space by 29.0% to 84.5% and improve training efficiency by up to 1.66×.
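    The ego-net idea in the abstract can be illustrated with a minimal, self-contained sketch: extracting the k-hop ego net around each node by breadth-first search. This is generic background, not the authors' ENFA implementation; the adjacency-dict representation and function names are assumptions for illustration.

```python
from collections import deque

def ego_net(adj, root, radius=1):
    """Return the node set of the `radius`-hop ego net around `root`.

    `adj` maps each node to an iterable of its neighbors. Subgraph GNNs
    run message passing over one subgraph per node; keeping the radius
    small keeps every subgraph small, which is the storage saving the
    ENFA abstract describes.
    """
    seen = {root: 0}          # node -> hop distance from root
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if seen[u] == radius:  # do not expand past the radius
            continue
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return set(seen)

# Toy example: a 6-node cycle 0-1-2-3-4-5-0.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
nets = {v: ego_net(cycle, v, radius=1) for v in cycle}
# Each 1-hop ego net of a cycle node contains the node and its two neighbors.
```

    In a real subgraph GNN these node sets would induce subgraphs that are each encoded by a GNN; the point of the sketch is only that ego nets are small, local, and cheap to enumerate.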

  • RESEARCH ARTICLE
    Hao CHEN, Junbo ZHAO

    Distributional shift between domains poses great challenges to modern machine learning algorithms. Domain generalization (DG) is a popular line of work targeting this issue, in which methods aim to uncover universal patterns across disparate distributions. Notably, a crucial challenge in DG is the presence of irrelevant domain-specific features, which most prior works overlook. Motivated by this, we propose a novel Contrastive-based Disentanglement method for Domain Generalization (CDDG) that exploits these overlooked domain-specific features through disentanglement, thereby facilitating the extraction of the desired cross-domain category features for DG tasks. Specifically, CDDG learns to decouple inherently mutually exclusive features by leveraging them in the latent space, making the learned features discriminative. Extensive experiments on various benchmark datasets, including PACS, VLCS, Office-Home, TerraIncognita, and DomainNet, demonstrate the superiority of our method over other state-of-the-art approaches. Furthermore, visualization evaluations confirm the potential of our method to achieve effective feature disentanglement.
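    As background for the contrastive mechanism the abstract relies on, here is a minimal InfoNCE-style loss on unit-norm embeddings: it pulls an anchor toward a positive and pushes it from negatives. This is the generic contrastive objective, not CDDG's exact disentanglement loss; the function name and temperature value are illustrative assumptions.

```python
import math

def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE-style contrastive loss for one anchor.

    All inputs are unit-norm embedding vectors (lists of floats).
    A low loss means the anchor is much closer to the positive than
    to any negative, which is how contrastive learning makes latent
    features discriminative.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    pos = math.exp(dot(anchor, positive) / tau)
    neg = sum(math.exp(dot(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

# A well-aligned pair (anchor == positive, orthogonal negative) yields
# a much smaller loss than a misaligned one.
a = [1.0, 0.0]
aligned = info_nce(a, [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce(a, [0.0, 1.0], [[1.0, 0.0]])
```

    In a disentanglement setting, one would apply such a loss separately to category features (contrasted across domains) and domain-specific features, so that the two factors are driven apart in the latent space.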

  • REVIEW ARTICLE
    Cong JIN, Jingru FAN, Jinfa HUANG, Jinyuan FU, Tao MEI, Li YUAN, Jiebo LUO

    Multimodal Foundation Models (MFMs), including diffusion models and multimodal large language models, have attracted significant interest owing to their scalable capabilities in tasks involving vision as well as vision-language understanding and generation. Despite the growing body of research on MFMs' advancements, a comprehensive review of their applications in the text-to-media domain remains limited. This review aims to bridge that gap by offering an exhaustive overview of MFMs' development within the text-to-media landscape. We focus on four popular text prompt-based AI-generated content (AIGC) tasks: text-to-image, text-to-video, text-to-music, and text-to-motion generation. We delve into fundamental concepts, model architectures, training strategies, and dataset settings for each task. This work serves as a crucial resource for researchers aiming to develop text-to-media models using MFMs tailored to specific requirements. Moreover, we trace the evolution from traditional AIGC to Next-Gen AIGC by discussing the adaptation of advanced MFMs for text-to-media innovations. We identify existing challenges and outline future directions to help researchers gain deeper insights into the future trajectory of AIGC development and deployment. In summary, our review provides a comprehensive understanding of advances in MFMs for text-to-media applications, which may have a far-reaching impact on the community.

  • RESEARCH ARTICLE
    Guohao LI, Hongyu YANG, Di HUANG, Yunhong WANG

    Recent advances in 3D head avatar generation combine 3D Gaussian Splatting (3DGS) with 3D Morphable Models (3DMM) to reconstruct animatable avatars from monocular video inputs. However, existing approaches exhibit two critical limitations: prohibitive storage requirements from per-primitive animation parameters and spherical harmonics (SH) coefficients, and compromised facial fidelity due to insufficient dynamic detail modeling. To address these challenges, we propose PBR-GAvatar, a novel framework featuring two key innovations. First, we develop hierarchical parametric adaptation that combines coarse 3DMM basis refinement via Low-Rank Adaptation (LoRA) with a lightweight Dynamic Detail Generator (DDG) producing expression-conditioned details. Second, we introduce a material decomposition paradigm that replaces SH coefficients with compact Physically Based Rendering (PBR) textures. Our framework jointly optimizes geometry, dynamics, and material properties through differentiable rendering. The proposed framework achieves a 20× size reduction (under 10 MB) compared with state-of-the-art methods, while demonstrating superior reconstruction fidelity on the INSTA and GBS benchmarks. The PBR material system not only reduces storage demands but also supports photorealistic relighting under arbitrary illumination conditions. Our implementation will be made publicly available at liguohao96.github.io/PBR-GAvatar/.
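    The LoRA mechanism the abstract invokes can be sketched in a few lines: the frozen base weight W is augmented by a learned low-rank product B·A, and the forward pass routes through the small rank-r bottleneck instead of materializing the full update. This is the generic LoRA formulation, not PBR-GAvatar's 3DMM-specific variant; matrix shapes and names here are illustrative assumptions.

```python
def lora_forward(x, W, A, B, scale=1.0):
    """Compute y = (W + scale * B @ A) @ x without forming B @ A.

    W is the frozen d_out x d_in base weight; A (r x d_in) and
    B (d_out x r) are the only trainable factors, so the adapted
    parameter count drops from d_out*d_in to r*(d_in + d_out).
    """
    def matvec(M, v):
        return [sum(m * c for m, c in zip(row, v)) for row in M]

    base = matvec(W, x)
    low = matvec(B, matvec(A, x))   # pass through the r-dim bottleneck
    return [b + scale * l for b, l in zip(base, low)]

# Rank-1 update of a 2x2 identity weight.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]            # 1 x 2
B = [[0.5], [0.5]]          # 2 x 1
y = lora_forward([2.0, 0.0], W, A, B)
```

    The same parameter-count argument is what makes LoRA attractive for refining a 3DMM basis: the coarse basis stays fixed while only the low-rank correction is stored per avatar.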

  • LETTER
    Donglin ZHOU, Weike PAN, Zhong MING
  • LETTER
    Junyi LI, Chenweinan JIANG, Daixin WANG, Guo YE, Libang ZHANG, Huimei HE, Binbin HU, Zhiqiang ZHANG, Fuzhen ZHUANG
  • LETTER
    Chen CHEN, Yunchun LI, Mingyuan XIA, Wei LI
  • REVIEW ARTICLE
    Zheng LI, Wentai ZHU, Haohui HUANG, Yu WANG, Linzhang WANG

    Automated driving systems (ADSs) have made significant strides in recent years through the combined efforts of academia and industry. A typical ADS is composed of various complex modules, including perception, planning, and control. As emerging and complex computer programs, ADSs inevitably contain flaws, making it crucial to ensure their safety since any unsafe behavior can result in catastrophic outcomes. Testing is widely recognized as a key approach to ensuring ADS safety by uncovering unsafe behaviors. However, designing effective testing techniques for ADSs is exceptionally challenging due to the high complexity and multidisciplinary nature of these systems. Although an extensive body of literature focuses on ADS testing and several surveys summarizing technical advancements have been published, most concentrate on system-level testing performed within software simulators. Consequently, they often overlook the distinct characteristics, testing requirements, and datasets associated with various ADS modules. In this paper, we present a comprehensive survey of existing ADS testing literature. We begin by investigating the testing infrastructure for ADSs, including available datasets and tools, detailing their capabilities and characteristics. We then survey testing techniques for individual ADS modules (e.g., AI-based modules and firmware) and the integrated system, highlighting technical differences between validation layers. Finally, based on our findings, we identify key challenges and outline potential research opportunities in the field.

  • LETTER
    Shaorong XIE, Han ZHANG, Xiangfeng LUO, Zhenyu ZHANG, Mengke WANG, Hang YU
  • RESEARCH ARTICLE
    Li SUN, Philip S. YU

    Graphs are ubiquitous, and graph learning has long been a fundamental topic in machine learning. While Graph Neural Networks (GNNs) have achieved remarkable results, they are typically designed for specific tasks or graphs, requiring retraining to adapt to the diversity of real-world graph data. Recently, foundation models, such as Large Language Models (LLMs), have driven revolutionary progress in the language domain through universal pretraining. Their success has sparked growing interest in designing Graph Foundation Models (GFMs), a novel family of graph neural networks pre-trained on large-scale, diverse graph data, which are capable of supporting a wide range of downstream tasks on different graphs. Early efforts in the literature often adapt LLMs by transforming graph data into sequential representations. However, unlike word sequences in natural language, graphs are inherently non-Euclidean structures that encapsulate complex intercorrelations among entities. Existing GFMs often trivialize the structural diversity and complexity inherent in graph data. To address this gap, we propose to study graph foundation models from the perspective of Riemannian geometry, and design a novel Curvature-guided Riemannian Graph Foundation Model (CRGFM). To the best of our knowledge, CRGFM is the first GFM to introduce a curvature-based graph description along with geometric standardization. Specifically, to capture structural diversity, the input graph is represented using a mixture of geometric experts. A novel geometric standardization is then introduced via an augmented Lorentz transformation. To model structural complexity, we design a Riemannian graph transformer within a standardized product bundle that disentangles graph structure from node attributes. Finally, we introduce graph prompt learning on the manifold to bridge contrastive learning with downstream tasks. Extensive experiments on a diverse set of real-world graphs demonstrate the superiority of CRGFM.
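    As geometric background for the Lorentz-model machinery the abstract mentions, here is a minimal sketch of the standard hyperboloid (Lorentz) model of hyperbolic space: lifting Euclidean points onto the hyperboloid and computing geodesic distance via the Lorentz inner product. This illustrates textbook hyperbolic geometry, not CRGFM's augmented Lorentz transformation; function names are assumptions.

```python
import math

def lorentz_inner(x, y):
    """Lorentz (Minkowski) inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift(v):
    """Lift a Euclidean point v onto the curvature -1 hyperboloid.

    Points x on the hyperboloid satisfy <x, x>_L = -1 with x0 > 0;
    x0 is chosen to enforce that constraint.
    """
    x0 = math.sqrt(1.0 + sum(a * a for a in v))
    return [x0] + list(v)

def lorentz_distance(x, y):
    """Geodesic distance on the hyperboloid: arccosh(-<x, y>_L).

    The max(...) clamp guards against -<x,y>_L dipping below 1
    through floating-point rounding.
    """
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

p = lift([0.3, 0.4])
q = lift([0.0, 0.0])   # the hyperboloid's "origin" (1, 0, 0)
d = lorentz_distance(p, q)
```

    Hyperbolic (negatively curved) spaces embed tree-like graph structure with low distortion, which is the motivation for curvature-guided representations: different regions of a graph can be matched to geometries of different curvature.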

Publishing model
Monthly

ISSN 2095-2228 (Print)
ISSN 2095-2236 (Online)
CN 10-1014/TP