Frontiers of Computer Science

LETTER

ECOSight: an explainable graph AI tool for automated decision-making in timing ECO

Huiqing YOU, Xiaowei HE, Wencheng JIANG, Bo HU, Peiyun BIAN, Zexiang CHENG, Chaochao FENG, Daheng LE, Pengcheng HUANG, Chiyuan MA, Zhenyu ZHAO

2027, 21 (7): 2107207. https://doi.org/10.1007/s11704-026-51891-6

RESEARCH ARTICLE

Consistent-point: consistent pseudo-points for semi-supervised crowd counting and localization

Yuda ZOU, Zelong LIU, Yuliang GU, Bo DU, Yongchao XU

2027, 21 (7): 2107339. https://doi.org/10.1007/s11704-026-51063-6

Crowd counting and localization are critical for applications such as public security and traffic management. While existing methods have achieved impressive results, they rely heavily on extensive manual annotations. This paper proposes a novel point-localization-based semi-supervised crowd counting and localization method termed Consistent-Point. We identify and address two key inconsistencies of pseudo-points that have not been adequately explored. To enhance their position consistency, we aggregate the positions of neighboring auxiliary proposal-points, while an instance-wise uncertainty calibration is proposed to improve the class consistency of pseudo-points. By generating higher-quality pseudo-points with enhanced consistency, Consistent-Point provides more stable and effective supervision during training, yielding superior crowd counting and localization performance. Extensive experiments across five widely used datasets and three different labeled ratio settings demonstrate that our method achieves state-of-the-art performance in crowd localization while also attaining impressive crowd counting results.
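
As a rough illustration of the position-aggregation idea, the sketch below refines one pseudo-point by taking a confidence-weighted mean of nearby proposal-points. The abstract does not give the actual aggregation rule, so the function name `refine_pseudo_point`, the `radius` parameter, and the confidence weighting are all assumptions made for illustration only.

```python
from math import hypot


def refine_pseudo_point(anchor, proposals, radius=8.0):
    """Confidence-weighted mean of the proposals within `radius` of `anchor`.

    anchor:    (x, y) of the initial pseudo-point.
    proposals: iterable of (x, y, confidence) auxiliary proposal-points.
    radius:    hypothetical neighborhood size in pixels (an assumption).
    Falls back to the anchor itself when no proposal is close enough.
    """
    total_w = sum_x = sum_y = 0.0
    for x, y, conf in proposals:
        # Keep only proposals that lie inside the neighborhood of the anchor.
        if hypot(x - anchor[0], y - anchor[1]) <= radius:
            total_w += conf
            sum_x += conf * x
            sum_y += conf * y
    if total_w == 0.0:
        return anchor
    return (sum_x / total_w, sum_y / total_w)
```

Averaging over several neighboring proposals, rather than trusting a single one, is one simple way to damp the position jitter of pseudo-points between training iterations.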

RESEARCH ARTICLE

Perception-guided accuracy estimation: a universal framework for robust model evaluation

Hao SUN, Zhongyi HAN, Yilong YIN

2027, 21 (7): 2107340. https://doi.org/10.1007/s11704-026-51697-6

Model evaluation is crucial for ensuring machine learning models meet performance standards, becoming especially vital under distribution shifts where reliable deployment in dynamic, non-stationary environments requires robust evaluation strategies. Given the inaccessibility of supervised information about target domains, existing methods utilize statistical metrics or common patterns based on restrictive mathematical assumptions. Consequently, these conventional approaches often lead to task-specific overfitting and sub-optimal evaluation performance. To overcome these limitations, we propose Human-like Perception Training (HPT), a novel and universal framework that approaches model evaluation from a human-like visual perspective by focusing on feature-level insights. This approach offers strong universality and robustness while minimizing the reliance on strict mathematical assumptions. Specifically, HPT incorporates two novel modules: 1) The Human-like Perception Representing (HPR) module quantifies a given model’s representational capability by mimicking human visual perception, offering a distinct evaluation perspective. 2) Building on this perception representation, the Human-like Perception Mentoring (HPM) module guides the regression model to emulate human-like decisions through the incorporation of local perception priors and a novel coherent contrastive learning loss. Extensive experiments on standard benchmarks demonstrate that HPT achieves a strong correlation with true model accuracy, precisely estimates model performance, and significantly outperforms prior state-of-the-art methods.

RESEARCH ARTICLE

Label driven contrastive fusion for multi-view multi-label learning

Zun LI, Yi SHAN, Xiangning ZENG, Songxuan SHI, Gengyu LYU

2027, 21 (7): 2107342. https://doi.org/10.1007/s11704-026-52113-9

Multi-view multi-label learning (MVML) aims to leverage heterogeneous features and semantic labels for robust model learning. While existing methods often integrate cross-view consensus and view-specific information at the data level, they typically overlook the rich semantics of labels and the inherent view-label correspondences, leading to sub-optimal performance. In this paper, we propose a novel Label-Driven Contrastive Fusion (LDCF) method, which explicitly embeds multi-label semantic correlations into a contrastive fusion scheme that exploits cross-view commonalities and extracts view-specific individualities. Specifically, LDCF first disentangles consensus and view-specific features through an orthogonal constraint. Then, a label-driven feature selector constructs contrastive sample pairs based on inter-instance label similarities. Both intra-view and inter-view contrastive learning are applied to these pairs, pulling semantically similar pairs closer while pushing dissimilar ones apart, thus enhancing feature discriminability. Finally, LDCF jointly optimizes the orthogonal constraint, the multi-view contrastive learning objective, and the multi-label binary cross-entropy (BCE) loss through a multi-head collaborative classification framework. Extensive experiments on multiple datasets demonstrate the superior performance of the proposed method against state-of-the-art approaches.
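
To make the pair-construction step concrete, the sketch below splits instance pairs into positives and negatives by the similarity of their multi-label vectors. The abstract does not say which similarity measure or selection rule LDCF uses, so Jaccard similarity, the `pos_thresh` cutoff, and the names `jaccard` and `build_pairs` are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two binary label vectors of equal length."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def build_pairs(labels, pos_thresh=0.5):
    """Split all instance pairs (i, j), i < j, by label similarity.

    labels:     list of binary label vectors, one per instance.
    pos_thresh: hypothetical cutoff; pairs at or above it count as positives.
    Returns (positives, negatives) as lists of index pairs.
    """
    positives, negatives = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(labels[i], labels[j]) >= pos_thresh:
                positives.append((i, j))
            else:
                negatives.append((i, j))
    return positives, negatives
```

A contrastive loss would then pull the features of each positive pair together and push those of each negative pair apart, within a view or across views.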

RESEARCH ARTICLE

Proxy robustness in vision language models is effortlessly transferable

Xiaowei FU, Fuxiang HUANG, Lei ZHANG

2027, 21 (7): 2107343. https://doi.org/10.1007/s11704-026-50951-1

As a pivotal technique for improving the defense of deep models, adversarial robustness transfer via distillation has demonstrated remarkable success in conventional image classification tasks. However, this paradigm encounters a critical challenge when applied to vision-language models (VLMs) such as CLIP: constructing an adversarially robust teacher for large-scale multi-modal models demands prohibitively high computational resources. We bridge this gap by revealing an interesting phenomenon: vanilla CLIP (without adversarial training) exhibits intrinsic defensive capabilities against adversarial examples generated by another CLIP with a different architecture. We formally define this as proxy adversarial robustness and propose a Heterogeneous Proxy Transfer (HPT) framework that establishes cross-architectural robustness distillation channels between CLIP variants, effortlessly enabling VLM robustness transfer from proxy to target models. Yet such a proxy transfer paradigm easily induces severe overfitting, leading to a sharp degradation in zero-shot natural generalization. To resolve this, we design Generalization-Pivot Decoupling (GPD), which leverages the difference in learning rate scheduling. GPD decouples the proxy transfer process into a generalization-anchored warm-up that maintains generalization and a generalization-pulled HPT that promotes adversarial robustness, achieving an equilibrium between natural generalization and adversarial robustness. Extensive experiments on 15 zero-shot datasets demonstrate the effectiveness of our HPT-GPD method. The code is available at github.com/fxw13/HPT-GPD.

LETTER

Online multi-label streaming feature selection by label causal relationships and multigranulation fuzzy implication information

Jianhua DAI, Qiuyu WU

2027, 21 (7): 2107344. https://doi.org/10.1007/s11704-026-52218-1

LETTER

Trajectory alignment via diffusion models in cross-domain offline reinforcement learning

Yujia ZHANG, Lin LI, Jianguo WU, Ting GUO, Wei WEI, Jiye LIANG

2027, 21 (7): 2107345. https://doi.org/10.1007/s11704-026-52191-9

LETTER

PREP: input-aware expert pruning for efficient MoE deployment

Chaoran ZHANG, Lixin ZOU, Xixun LIN, Wen ZOU

2027, 21 (7): 2107346. https://doi.org/10.1007/s11704-026-52030-x

RESEARCH ARTICLE

PVDD: a practical benchmark dataset and network for video denoising

Xiaogang XU, Yitong YU, Nianjuan JIANG, Jiafei WU, Bei YU, Jiangbo LU, Jiaya JIA

2027, 21 (7): 2107707. https://doi.org/10.1007/s11704-025-50966-0

To facilitate video denoising research, we have developed a comprehensive dataset, the “Practical Video Denoising Dataset” (PVDD), which comprises 200 pairs of noisy-clean dynamic videos in both sRGB and RAW formats. Compared with existing datasets, which are characterized by limited motion information, PVDD encompasses dynamic scenes that exhibit diverse and natural motion patterns. Unlike datasets that primarily use Gaussian or Poisson distributions to synthesize noise in the sRGB domain, PVDD synthesizes realistic noise in the RAW domain with a physically meaningful sensor noise model followed by ISP processing. We also propose a new video denoising framework, the Recurrent Video Denoising Transformer (RVDT), which achieves SOTA performance on PVDD and other current video denoising benchmarks. RVDT consists of both spatial and temporal transformer blocks, which perform denoising with long-range operations on the spatial dimension and long-term propagation on the temporal dimension. In particular, RVDT exploits the attention mechanism to implement bi-directional feature propagation with both implicit and explicit temporal modeling. Extensive experiments demonstrate that 1) models trained on PVDD achieve superior denoising performance on many challenging real-world videos, outperforming models trained on other existing datasets; and 2) when models are trained on the same dataset, our proposed RVDT exhibits superior denoising capabilities compared with alternative network architectures.

RESEARCH ARTICLE

RFG-GS: renderability field guided high-fidelity reconstruction for 3D buildings

Yongwei MIAO, Lingtao CHEN, Jianfei GE, Zhenghui HU, Jiangjian XIAO

2027, 21 (7): 2107708. https://doi.org/10.1007/s11704-026-51733-5

Reconstruction of large-scale 3D scenes using drone scanning is a significant research focus in computer graphics and 3D vision. To address limitations such as inadequate data coverage, low modeling accuracy, and insufficient rendering details during horizontal flight, we propose RFG-GS, a novel “structural scanning – fine reconstruction” framework based on 3D Gaussian Splatting (3D-GS). This approach guides drone scanning viewpoint planning and facilitates high-fidelity active 3D reconstruction of unknown complex building scenes, achieving full coverage through renderability fields. First, RGB images from initial drone oblique photography are used to extract color features of the underlying 3D buildings. After segmenting and clustering the structures, candidate viewpoints for a subsequent close-range scanning pass are generated at safe distances. A renderability field for this viewpoint set is then established, incorporating resolution, angular consistency, and geometric reliability metrics. Second, the renderability values of candidate viewpoints are computed to determine an optimal viewpoint set, which guides the secondary data collection. Finally, a KNN algorithm based on a KD-tree optimizes adaptive density control during Gaussian sphere splitting, significantly improving splitting accuracy. Experimental results demonstrate that RFG-GS increases the structural similarity index (SSIM) of novel view synthesis images by approximately 3.0% compared with state-of-the-art (SOTA) methods. Our renderability-guided scheme provides an accurate, efficient, and robust solution for 3D building modeling.
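
For readers unfamiliar with the KD-tree-based KNN step, the following is a minimal, generic sketch of a k-nearest-neighbor query over 3D points such as Gaussian centers. It is not the RFG-GS implementation: the abstract does not describe how neighbors feed into density control, and the names `build_kdtree` and `knn` are hypothetical.

```python
import heapq


def build_kdtree(points, depth=0):
    """Recursively build a KD-tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through x, y, z
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                    # median point splits the space
    return (pts[mid],
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))


def knn(tree, query, k):
    """Return the k nearest points as (squared_distance, point), nearest first."""
    heap = []  # max-heap of the best k so far, stored as (-squared_distance, point)

    def visit(node, depth):
        if node is None:
            return
        point, left, right = node
        d2 = sum((a - b) ** 2 for a, b in zip(point, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, point))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, point))
        axis = depth % len(query)
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near, depth + 1)
        # Search the far side only if the splitting plane is closer than the
        # current k-th best distance, or if fewer than k points were found.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far, depth + 1)

    visit(tree, 0)
    return sorted((-neg_d2, p) for neg_d2, p in heap)
```

A query such as `knn(build_kdtree(centers), center, 3)` returns the three closest centers without scanning every point, which is what makes neighbor lookups affordable when a scene contains a very large number of Gaussians.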
