Jan 2025, Volume 19 Issue 1
    

  • Architecture
  • RESEARCH ARTICLE
    Chenhao ZHANG, Liang WANG, Jing SHANG, Zhiwen XIAO, Limin XIAO, Meng HAN, Bing WEI, Runnan SHEN, Jinquan WANG

    The rapid growth in the storage scale of wide-area distributed file systems (DFS) calls for fast and scalable metadata management. Metadata replication is a widely used technique for improving the performance and scalability of metadata management. Because file systems must meet POSIX requirements, many existing metadata management techniques adopt a costly design for the sake of metadata consistency, leading to unacceptable performance overhead. We propose a new metadata consistency maintenance method (ICCG), which includes incremental consistency guaranteed directory tree synchronization (ICGDT) and causal consistency guaranteed replica index synchronization (CCGRI), to ensure system performance without sacrificing metadata consistency. ICGDT uses a flexible consistency scheme based on the state of files and directories, maintained through a conflict state tree, to provide incremental consistency for metadata, satisfying both metadata consistency and performance requirements. CCGRI ensures low-latency and consistent access to data by establishing causal consistency for replica indexes through multi-version extent trees and logical time. Experimental results demonstrate the effectiveness of our methods. Compared with the strong consistency policies widely used in modern DFSes, our methods significantly improve system performance; for example, in file creation, ICCG improves the performance of directory tree operations by at least 36.4 times.
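
    Purely as an illustration of the causal-consistency ingredients named in the abstract (multi-version state plus logical time), the Python sketch below, with hypothetical names throughout, serves reads only from index versions that are causally no newer than the reader's logical clock. It is not the authors' implementation.

    import bisect

    class MultiVersionExtent:
        """Keeps every version of one extent mapping, ordered by logical time."""
        def __init__(self):
            self.versions = []          # (logical_time, mapping), in time order

        def write(self, logical_time, mapping):
            self.versions.append((logical_time, mapping))

        def read_at(self, logical_time):
            # Newest version not exceeding the reader's clock, so a reader
            # never observes an effect before its causal cause.
            times = [t for t, _ in self.versions]
            i = bisect.bisect_right(times, logical_time)
            return self.versions[i - 1][1] if i else None

    clock = 0                           # simple Lamport-style logical clock
    def tick(observed=0):
        global clock
        clock = max(clock, observed) + 1
        return clock

    extent = MultiVersionExtent()
    extent.write(tick(), {"block": 7, "node": "A"})
    t_read = clock                      # reader's causal cutoff
    extent.write(tick(), {"block": 7, "node": "B"})   # later concurrent write
    print(extent.read_at(t_read))       # -> {'block': 7, 'node': 'A'}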

  • RESEARCH ARTICLE
    Libo HUANG, Jing ZHANG, Ling YANG, Sheng MA, Yongwen WANG, Yuanhu CHENG

    The rapid development of ISAs has brought the issue of software compatibility to the forefront in the embedded field. One promising solution to this challenge is the adoption of a multiple-ISA processor that supports several different ISAs. However, due to constraints in cost and performance, the architecture of a multiple-ISA processor must be carefully optimized to meet the specific requirements of embedded systems. By exploring the RISC-V and ARM Thumb ISAs, this paper proposes RVAM16, an optimized multiple-ISA processor microarchitecture for embedded devices based on a hardware binary translation technique. The results show that, when running non-native ARM Thumb programs, RVAM16 achieves a speedup of over 2.73× with less area and energy consumption compared to using hardware binary translation alone, reaching more than 70% of the performance of native RISC-V programs.
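
    The abstract gives no code; as a toy software model of what binary translation from Thumb to RISC-V does (the function name and the tiny instruction subset are assumptions, and condition-flag updates are ignored), the sketch below decodes two 16-bit Thumb immediate forms and emits equivalent RISC-V instructions. RVAM16 performs this kind of translation in hardware.

    def translate_thumb16(instr: int) -> str:
        op = (instr >> 11) & 0x1F        # top 5 bits select the Thumb format
        rd = (instr >> 8) & 0x7
        imm8 = instr & 0xFF
        if op == 0b00100:                # MOVS Rd, #imm8  (format 3, op=00)
            return f"addi x{rd}, x0, {imm8}"     # flag updates omitted here
        if op == 0b00110:                # ADDS Rd, #imm8  (format 3, op=10)
            return f"addi x{rd}, x{rd}, {imm8}"
        raise NotImplementedError(f"unhandled encoding {instr:#06x}")

    print(translate_thumb16(0x2105))     # MOVS r1, #5  -> addi x1, x0, 5
    print(translate_thumb16(0x3103))     # ADDS r1, #3  -> addi x1, x1, 3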

  • RESEARCH ARTICLE
    Runzhe CHEN, Guandong LU, Yakai WANG, Rui ZHANG, Zheng HU, Yanming MIAO, Zhifang CAI, Jingwen LENG, Minyi GUO

    As deep neural networks (DNNs) have been successfully adopted in various domains, training these large-scale models has become increasingly difficult and is often deployed on compute clusters composed of many devices such as GPUs. However, as the size of the cluster increases, so does the possibility of failures during training. Currently, faults are mainly handled by recording checkpoints and recovering from them, but this approach incurs large overhead and affects training efficiency even when no error occurs: a low checkpointing frequency leads to a large loss of training time on recovery, while a high recording frequency hurts training efficiency. To resolve this contradiction, we propose BAFT, a bubble-aware fault-tolerant framework for hybrid parallel distributed training. BAFT automatically analyzes parallel strategies, profiles runtime information, and schedules checkpointing tasks at the granularity of a pipeline stage depending on the bubble distribution in the training schedule. It achieves high checkpoint efficiency and introduces less than 1% time overhead, which allows checkpoints to be recorded at high frequency, thereby reducing the time lost to error recovery and avoiding the impact of fault tolerance on training.
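
    A minimal sketch of the bubble-aware scheduling idea, assuming a greedy first-fit policy and simplified inputs (both are assumptions; the paper's profiler and scheduler are more elaborate): each stage's checkpoint work is placed into an idle pipeline bubble long enough to hide it.

    def schedule_checkpoints(bubbles, ckpt_cost):
        """bubbles: {stage: [(start, end), ...]} idle intervals per stage
        ckpt_cost: {stage: seconds needed to snapshot that stage's state}
        Returns {stage: start_time}, or raises if a stage has no fitting bubble."""
        placement = {}
        for stage, intervals in bubbles.items():
            need = ckpt_cost[stage]
            for start, end in sorted(intervals):
                if end - start >= need:      # checkpoint hides inside the bubble
                    placement[stage] = start
                    break
            else:
                raise RuntimeError(f"stage {stage}: no bubble >= {need}s")
        return placement

    bubbles = {0: [(0.10, 0.12), (0.40, 0.55)], 1: [(0.20, 0.33)]}
    print(schedule_checkpoints(bubbles, ckpt_cost={0: 0.10, 1: 0.08}))
    # -> {0: 0.4, 1: 0.2}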

  • RESEARCH ARTICLE
    Mizhipeng ZHANG, Chentao WU, Jie LI, Minyi GUO

    Blockchain, as a decentralized storage technology, is widely used in many fields. It has extremely strict requirements for reliability because there are many potentially malicious nodes. Generally, a blockchain is a chain storage structure formed by interconnecting blocks, where each block consists of a block header storing the metadata and a block body storing the data. Blocks are stored by the full replication method, in which each node stores a replica of all blocks and data consistency is maintained by the consensus protocol. To decrease the storage overhead, previous approaches such as BFT-Store and Partition Chain store blocks via erasure codes. However, existing erasure-coding-based methods use a static encoding scheme to tolerate f malicious nodes, whereas in typical cases the number of malicious nodes is much smaller than f, as reported in previous literature. Using redundant parities to tolerate excessive malicious nodes introduces unnecessary storage overhead.

    To solve the above problem, we propose Dynamic-EC, a dynamic erasure coding method for permissioned blockchain systems. The key idea of Dynamic-EC is to reduce the storage overhead by dynamically adjusting the total number of parities according to the risk level of the whole system, which is determined by the number of perceived malicious nodes, while ensuring system reliability. To demonstrate the effectiveness of Dynamic-EC, we conduct several experiments on the open-source blockchain software Tendermint. The results show that, compared to state-of-the-art erasure coding methods, Dynamic-EC reduces the storage overhead by up to 42% and decreases the average write latency of blocks by up to 25%.
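
    As a hedged illustration of the parity-adjustment idea (the policy and all names below are hypothetical, not the paper's algorithm), this sketch sizes the parity count to the perceived number of malicious nodes instead of the provisioned worst case f.

    def choose_parities(n_nodes, f_worst_case, f_perceived, slack=1):
        """Return (k, m): k data blocks and m parities per stripe.
        m tracks the perceived threat (plus slack) but never exceeds the
        worst-case bound the system was provisioned for."""
        m = min(f_worst_case, f_perceived + slack)   # parities follow the threat
        k = n_nodes - m                              # rest of the stripe is data
        return k, m

    # With f = 5 provisioned but only 1 suspected malicious node, the stripe
    # carries 2 parities instead of 5, cutting parity storage by 60%.
    print(choose_parities(n_nodes=16, f_worst_case=5, f_perceived=1))  # (14, 2)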

  • Software
  • RESEARCH ARTICLE
    Dawei YUAN, Xiao PENG, Zijie CHEN, Tao ZHANG, Ruijia LEI

    Code review is a critical process in software development, contributing to the overall quality of the product by identifying errors early. A key aspect of this process is the selection of appropriate reviewers to scrutinize changes made to the source code. However, in large-scale open-source projects, selecting the most suitable reviewers for a specific change can be a challenging task. To address this, we introduce Code Context Based Reviewer Recommendation (CCB-RR), a model that leverages information from changesets to recommend the most suitable reviewers. The model takes into consideration the paths of modified files and the context derived from the changesets, including their titles and descriptions. Additionally, CCB-RR employs KeyBERT to extract the most relevant keywords and compare the semantic similarity across changesets. The model integrates the paths of modified files, keyword information, and the context of code changes to form a comprehensive picture of each changeset. We conducted extensive experiments on four open-source projects, demonstrating the effectiveness of CCB-RR. The model achieved a Top-1 accuracy of 60%, 55%, 51%, and 45% on the Android, OpenStack, QT, and LibreOffice projects, respectively. For Mean Reciprocal Rank (MRR), CCB-RR achieved 71%, 62%, 52%, and 68% on the same projects, respectively, highlighting its potential for practical application in code reviewer recommendation.
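
    Only the KeyBERT and sentence-transformers calls below are real library APIs; the wiring, example strings, and model choice are assumptions meant to illustrate how the keyword-extraction and semantic-similarity steps of a CCB-RR-style pipeline could look.

    from keybert import KeyBERT
    from sentence_transformers import SentenceTransformer, util

    kw_model = KeyBERT()
    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    new_change = "Fix race condition in network buffer pool during reconnects"
    past_change = "Resolve deadlock in connection pool when peers reconnect"

    # Most relevant keywords of the new changeset's title/description.
    keywords = kw_model.extract_keywords(new_change,
                                         keyphrase_ngram_range=(1, 2), top_n=3)
    print(keywords)

    # Semantic similarity to a past changeset a candidate reviewer handled;
    # a high score suggests that reviewer is a good fit.
    sim = util.cos_sim(embedder.encode(new_change), embedder.encode(past_change))
    print(float(sim))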

  • REVIEW ARTICLE
    Rui WANG, Xiangbo TIAN, Shi YING

    SaaS (Software-as-a-Service) is a service model provided by cloud computing. It places high requirements on QoS (Quality of Service) due to the way it delivers software services. However, manual identification and diagnosis of performance issues is typically expensive and laborious because of the complexity of the application software and the dynamic nature of the deployment environment. Recently, substantial research efforts have been devoted to automatically identifying and diagnosing performance issues of SaaS software. In this survey, we comprehensively review the methods for automatically identifying and diagnosing performance issues of SaaS software, dividing them into three steps according to their function: performance log generation, performance issue identification, and performance issue diagnosis. We then review these methods along their development history. Meanwhile, we present our proposed solution for each step. Finally, the effectiveness of our proposed methods is demonstrated by experiments.

  • Artificial Intelligence
  • LETTER
    Beier ZHU, Hanwang ZHANG
  • LETTER
    Mingjia LI, Hong QIAN, Jinglan LV, Mengliang HE, Wei ZHANG, Aimin ZHOU
  • RESEARCH ARTICLE
    Shangwei WU, Yingtong XIONG, Hui LIANG, Chuliang WENG

    Classic Graph Convolutional Networks (GCNs) often learn node representations holistically, ignoring the distinct impacts of different neighbors when aggregating their features to update a node’s representation. Disentangled GCNs have been proposed to divide each node’s representation into several feature units. However, current disentangling methods do not try to determine how many inherent factors the model should assign to extract the best representation of each node. This paper therefore proposes D2-GCN, which provides dynamic disentanglement in GCNs and presents the most appropriate factorization of each node’s mixed features. The convergence of the proposed method is proved both theoretically and experimentally. Experiments on real-world datasets show that D2-GCN outperforms the baseline models on node classification in both single- and multi-label tasks.
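
    A minimal numpy sketch of disentangled aggregation with a fixed number of feature units K: features are split into K units and neighbors are aggregated with unit-specific softmax weights. Fixing K is exactly the limitation D2-GCN addresses by choosing the disentanglement dynamically; the code is illustrative, not the authors'.

    import numpy as np

    def disentangled_aggregate(X, adj, K):
        """X: (N, d) node features, adj: (N, N) 0/1 adjacency, K: feature units."""
        N, d = X.shape
        units = X.reshape(N, K, d // K)          # split each node into K units
        out = np.empty_like(units)
        for k in range(K):
            # Per-unit affinity: neighbors similar in unit k weigh more in unit k.
            scores = units[:, k] @ units[:, k].T           # (N, N) similarities
            scores = np.where(adj > 0, scores, -np.inf)    # mask non-neighbors
            w = np.exp(scores - scores.max(axis=1, keepdims=True))
            w = w / w.sum(axis=1, keepdims=True)           # softmax over neighbors
            out[:, k] = w @ units[:, k]
        return out.reshape(N, d)

    X = np.random.rand(4, 8)
    adj = np.array([[0,1,1,0],[1,0,0,1],[1,0,0,1],[0,1,1,0]], dtype=float)
    print(disentangled_aggregate(X, adj, K=2).shape)       # (4, 8)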

  • RESEARCH ARTICLE
    Yi SHI, Hanjia YE, Dongliang MAN, Xiaoxu HAN, Dechuan ZHAN, Yuan JIANG

    Real-world objects exhibit intricate semantic properties that can be characterized from a multitude of perspectives, which necessitates the development of a model capable of discerning multiple patterns within data while concurrently predicting several Labeling Dimensions (LDs) — a task known as Multi-dimensional Classification (MDC). While the class imbalance issue has been extensively investigated within the multi-class paradigm, its study in the MDC context has been limited due to the imbalance shift phenomenon: a sample’s classification as a minor or major class instance becomes ambiguous when it belongs to a minor class in one LD and a major class in another. Previous MDC methodologies predominantly emphasized instance-wise criteria, neglecting prediction capabilities from the dimension aspect, i.e., the average classification performance across LDs. We assert the significance of dimension-wise metrics in real-world MDC applications and introduce two such metrics. Furthermore, we observe imbalanced class distributions within each LD and propose a novel Imbalance-Aware fusion Model (IMAM) for addressing the MDC problem. Specifically, we first decompose the task into multiple multi-class classification problems, creating imbalance-aware deep models for each LD separately. This straightforward method performs well across LDs without sacrificing performance in instance-wise criteria. Subsequently, we employ the LD-wise models as multiple teachers and transfer their knowledge across all LDs to a unified student model. Experimental results on several real-world datasets demonstrate that our IMAM approach excels in both instance-wise evaluations and the proposed dimension-wise metrics.
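
    A sketch of the first, decomposition stage under stated assumptions (synthetic data and logistic-regression stand-ins for the deep per-LD models): one imbalance-aware classifier is trained per labeling dimension, and the distillation stage into a unified student is only indicated.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.random.rand(200, 10)
    Y = np.stack([np.random.choice(3, 200, p=[0.7, 0.2, 0.1]),   # LD 1, skewed
                  np.random.choice(2, 200, p=[0.9, 0.1])], 1)    # LD 2, skewed

    teachers = []
    for ld in range(Y.shape[1]):
        # class_weight="balanced" counters each LD's own class imbalance.
        clf = LogisticRegression(class_weight="balanced", max_iter=1000)
        clf.fit(X, Y[:, ld])             # imbalance-aware model for this LD only
        teachers.append(clf)
    # Stage 2 (not shown) would distill all per-LD teachers into one student.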

  • RESEARCH ARTICLE
    Xinyan LIANG, Yuhua QIAN, Qian GUO, Keyin ZHENG

    Association between features has been demonstrated to improve the representation ability of data. However, the original association data reconstruction method may face two issues: the dimension of the reconstructed data is inevitably higher than that of the original data, and the adopted association measure does not balance effectiveness and efficiency well. To address these two issues, this paper proposes a novel association-based representation improvement method, named AssoRep. AssoRep first obtains the association between features via the distance correlation method, which has some advantages over Pearson’s correlation coefficient. Then an enhancement matrix is formed by stacking the association values of all feature pairs. Next, an improved feature representation is obtained by aggregating the original features with the enhancement matrix. Finally, the improved feature representation is mapped to a low-dimensional space via principal component analysis. The effectiveness of AssoRep is validated on 120 datasets, and the results further perfect our previous work on association data reconstruction.
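
    The pipeline the abstract describes maps naturally to a few lines of numpy: (1) pairwise distance correlation between features, (2) stacking those values into an enhancement matrix, (3) aggregation with the original features, (4) PCA. The aggregation form below (multiplying by the enhancement matrix and concatenating) is an assumption, since the abstract does not spell it out.

    import numpy as np
    from sklearn.decomposition import PCA

    def dist_corr(x, y):
        """Distance correlation of two 1-D samples (0 = none, 1 = strong)."""
        def centered(v):
            D = np.abs(v[:, None] - v[None, :])          # pairwise distances
            return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
        A, B = centered(x), centered(y)
        dcov2 = (A * B).mean()
        denom = np.sqrt((A * A).mean() * (B * B).mean())
        return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

    X = np.random.rand(100, 6)                           # 100 samples, 6 features
    d = X.shape[1]
    assoc = np.array([[dist_corr(X[:, i], X[:, j]) for j in range(d)]
                      for i in range(d)])                # enhancement matrix
    X_improved = np.hstack([X, X @ assoc])               # aggregation (assumed)
    X_low = PCA(n_components=4).fit_transform(X_improved)
    print(X_low.shape)                                   # (100, 4)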

  • RESEARCH ARTICLE
    Shaoyuan LI, Yuxiang ZHENG, Ye SHI, Shengjun HUANG, Songcan CHEN

    Recently, crowdsourcing has established itself as an efficient labeling solution by distributing tasks to crowd workers. As workers with diverse expertise can make mistakes, one core learning task is to estimate each worker's expertise and aggregate over the workers to infer the latent true labels. In this paper, we show that, as one of the major research directions, noise transition matrix based worker expertise modeling methods commonly overfit the annotation noise, either due to oversimplified noise assumptions or inaccurate estimation. To solve this problem, we propose a knowledge distillation framework (KD-Crowd) that combines the complementary strengths of noise-model-free robust learning techniques and transition matrix based worker expertise modeling. The framework consists of two stages: in Stage 1, a noise-model-free robust student model is trained by treating the predictions of a transition matrix based crowdsourcing teacher model as noisy labels, aiming at correcting the teacher's mistakes and obtaining better true label predictions; in Stage 2, we switch their roles, retraining a better crowdsourcing model on the crowds' annotations, supervised by the refined true label predictions from Stage 1. Additionally, we propose an f-mutual information gain (MIG^f) based knowledge distillation loss, which finds the maximum information intersection between the student's and teacher's predictions. Our experiments show that MIG^f achieves clear improvements over the regular KL-divergence knowledge distillation loss, which tends to force the student to memorize all information in the teacher's prediction, including its errors. We conduct extensive experiments showing that, as a universal framework, KD-Crowd substantially improves previous crowdsourcing methods on true label prediction and worker expertise estimation.
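
    The abstract does not define MIG^f, so it is not reproduced here; for reference, the baseline it is compared against, standard KL-divergence distillation on temperature-softened predictions, looks as follows (a generic textbook form, not code from the paper).

    import torch
    import torch.nn.functional as F

    def kl_distill_loss(student_logits, teacher_logits, T=2.0):
        """Standard KD: KL(teacher || student) on temperature-softened outputs."""
        p_teacher = F.softmax(teacher_logits / T, dim=-1)
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

    s = torch.randn(8, 4)                # student logits: 8 samples, 4 classes
    t = torch.randn(8, 4)                # noisy teacher (crowdsourcing model)
    print(kl_distill_loss(s, t).item())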

  • RESEARCH ARTICLE
    Mahsa Soheil SHAMAEE, Sajad Fathi HAFSHEJANI, Zeinab SAEIDIAN

    In this paper, we propose a novel warm restart technique using a new logarithmic step size for the stochastic gradient descent (SGD) approach. For smooth and non-convex functions, we establish an O(1/√T) convergence rate for SGD. We conduct a comprehensive set of experiments to demonstrate the efficiency of the newly proposed step size on the FashionMNIST, CIFAR10, and CIFAR100 datasets. Moreover, we compare our results with nine other existing approaches and demonstrate that the new logarithmic step size improves test accuracy by 0.9% on the CIFAR100 dataset when a convolutional neural network (CNN) model is used.
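
    The closed form of the logarithmic step size is not given in the abstract, so the decay below is a hypothetical stand-in purely to show the warm-restart mechanics: within each cycle the step size decays logarithmically and then resets.

    import math

    def log_warm_restart_lr(t, eta0=0.1, cycle=1000):
        """Step size at global iteration t (0-indexed), restarting every cycle."""
        i = t % cycle                              # position within current cycle
        return eta0 * (1 - math.log(i + 1) / math.log(cycle + 1))

    for t in [0, 500, 999, 1000]:                  # decays, then restarts at 1000
        print(t, round(log_warm_restart_lr(t), 4))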

  • Information Systems
  • LETTER
    Yexuan SHI, Wei YU, Yuanyuan ZHANG, Chunbo XUE, Yuxiang ZENG, Zimu ZHOU, Manxue GUO, Lun XIN, Wenjing NIE
  • LETTER
    Ming LIU, Jianan PAN, Zifei YAN, Wangmeng ZUO, Lei ZHANG