Autonomous agents have long been a research focus in both the academic and industry communities. Previous research often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes and makes it hard for the agents to achieve human-like decisions. Recently, through the acquisition of vast amounts of Web knowledge, large language models (LLMs) have shown the potential to attain human-level intelligence, leading to a surge in research on LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of LLM-based autonomous agents from a holistic perspective. We first discuss the construction of LLM-based autonomous agents, proposing a unified framework that encompasses much of the previous work. Then, we present an overview of the diverse applications of LLM-based autonomous agents in social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on the previous studies, we also present several challenges and future directions in this field.
Model-based reinforcement learning is a promising direction for improving the sample efficiency of reinforcement learning by learning a model of the environment. Previous model learning methods aim at fitting the transition data and commonly employ a supervised learning approach to minimize the distance between the predicted state and the real state. These supervised model learning methods, however, diverge from the ultimate goal of model learning, i.e., optimizing the policy learned within the model. In this work, we investigate how model learning and policy learning can share the same objective of maximizing the expected return in the real environment. We find that model learning towards this objective yields a target of increasing the similarity between the policy gradient computed on model-generated data and the gradient computed on real data. We thus derive the gradient of the model from this target and propose the Model Gradient algorithm (MG) to integrate this novel model learning approach with policy-gradient-based policy optimization. We conduct experiments on multiple locomotion control tasks and find that MG not only achieves high sample efficiency but also leads to better convergence performance compared to traditional model-based reinforcement learning approaches.
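As a rough illustration (the notation below is ours, not necessarily the paper's), such a gradient-similarity target can be written as
\[
\min_{\hat{M}} \; d\!\left( \nabla_{\theta} J_{\text{env}}(\pi_{\theta}),\; \nabla_{\theta} J_{\hat{M}}(\pi_{\theta}) \right),
\]
where $J_{\text{env}}$ is the expected return estimated on real transitions, $J_{\hat{M}}$ is the expected return estimated on data generated by the learned model $\hat{M}$, and $d(\cdot,\cdot)$ is some discrepancy measure between the two policy-gradient estimates (for example, a squared Euclidean distance).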
Graphs, which model real-world entities as vertices and the relationships among entities as edges, have proven to be a powerful tool for describing real-world problems in applications. In most real-world scenarios, entities and their relationships are subject to constant change. Graphs that record such changes are called dynamic graphs. In recent years, the widespread application scenarios of dynamic graphs have stimulated extensive research on dynamic graph processing systems that continuously ingest graph updates and produce up-to-date graph analytics results. As the scale of dynamic graphs grows, higher performance requirements are demanded of dynamic graph processing systems. With their massive parallel processing power and high memory bandwidth, GPUs have become mainstream vehicles for accelerating dynamic graph processing tasks. GPU-based dynamic graph processing systems mainly address two challenges: maintaining the graph data when updates occur (i.e., graph updating) and producing analytics results in time (i.e., graph computing). In this paper, we survey GPU-based dynamic graph processing systems and review their methods for addressing both graph updating and graph computing. To comprehensively discuss existing dynamic graph processing systems on GPUs, we first introduce the terminology of dynamic graph processing and then develop a taxonomy to describe the methods employed for graph updating and graph computing. In addition, we discuss the challenges and future research directions of dynamic graph processing on GPUs.
Significant progress has been made in machine learning with large amounts of clean labels and static data. However, in many real-world applications, the data often changes over time and it is difficult to obtain massive clean annotations; that is, noisy labels and time series are faced simultaneously. For example, in product-buyer evaluation, each sample records the daily behavior of users over time, but the long transaction period complicates analysis, and salespeople often erroneously annotate the users' purchase behavior. Such a novel setting, to the best of our knowledge, has not been thoroughly studied yet, and there is still a lack of effective machine learning methods. In this paper, we present RTS, a systematic approach studied both theoretically and empirically, consisting of two components: Noise-Tolerant Time Series Representation and Purified Oversampling Learning. Specifically, we propose to reduce the destructive impact of label noise in order to obtain robust feature representations and potential clean samples. Then, a novel learning method based on the purified data and time series oversampling is adopted to train an unbiased model. Theoretical analysis proves that our proposal can improve the quality of the noisy data set. Empirical experiments on diverse tasks, such as the house-buyer evaluation task from real-world applications and various benchmark tasks, clearly demonstrate that our new algorithm robustly outperforms many competitive methods.
Drug side effects have become a paramount concern in drug safety research, ranking as the fourth leading cause of mortality after cardiovascular diseases, cancer, and infectious diseases. Simultaneously, the widespread use of multiple prescription and over-the-counter medications by many patients in their daily lives has heightened the occurrence of side effects resulting from Drug-Drug Interactions (DDIs). Traditionally, assessments of drug side effects relied on resource-intensive and time-consuming laboratory experiments. However, recent advancements in bioinformatics and the rapid evolution of artificial intelligence technology have led to the accumulation of extensive biomedical data. On this foundation, researchers have developed diverse machine learning methods for discovering and detecting drug side effects. This paper provides a comprehensive overview of recent advancements in predicting drug side effects, encompassing the entire spectrum from biological data acquisition to the development of sophisticated machine learning models. The review commences by elucidating widely recognized datasets and Web servers relevant to the field of drug side effect prediction. Subsequently, the study delves into machine learning methods customized for binary, multi-class, and multi-label classification tasks associated with drug side effects. These methods are applied to a variety of representative computational models designed for identifying side effects induced by single drugs and DDIs. Finally, the review outlines the challenges encountered in predicting drug side effects using machine learning approaches and concludes by illuminating important future research directions in this dynamic field.
In this paper, we propose a novel warm restart technique using a new logarithmic step size for the stochastic gradient descent (SGD) approach. For smooth and non-convex functions, we establish an
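As an illustrative sketch only (the paper's exact step-size formula is not reproduced here, so the decay form below is an assumption), a warm-restart SGD schedule with a logarithmically decaying step size could look like the following:

```python
import math

def log_step_size(t, cycle_len, eta0=0.1, eps=1e-8):
    """Illustrative logarithmic decay within one restart cycle.

    t: iteration index within the cycle (0-based); cycle_len: cycle length.
    This specific functional form is an assumption for illustration only.
    """
    # decays from roughly eta0 toward ~0 as t approaches cycle_len
    return eta0 * (1.0 - math.log(1 + t) / math.log(1 + cycle_len) + eps)

def warm_restart_schedule(num_cycles, cycle_len, eta0=0.1):
    """Reset the step size to eta0 at the start of every restart cycle."""
    for _ in range(num_cycles):
        for t in range(cycle_len):
            yield log_step_size(t, cycle_len, eta0)

# usage: feed the yielded step sizes to an SGD loop
for lr in warm_restart_schedule(num_cycles=3, cycle_len=100):
    pass  # sgd_update(params, grads, lr)
```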
The rapid development of instruction set architectures (ISAs) has brought the issue of software compatibility to the forefront in the embedded field. To address this challenge, one promising solution is the adoption of a multiple-ISA processor that supports several different ISAs. However, due to constraints in cost and performance, the architecture of a multiple-ISA processor must be carefully optimized to meet the specific requirements of embedded systems. By exploring the RISC-V and ARM Thumb ISAs, this paper proposes RVAM16, an optimized multiple-ISA processor microarchitecture for embedded devices based on a hardware binary translation technique. The results show that, when running non-native ARM Thumb programs, RVAM16 achieves a significant speedup of over 2.73× with less area and energy consumption compared to using hardware binary translation alone, reaching more than 70% of the performance of native RISC-V programs.
Recent advancements in deep learning models have significantly facilitated the development of sequential recommender systems (SRS). However, current deep model structures are limited in their ability to learn high-quality embeddings with insufficient data. Meanwhile, highly skewed long-tail distributions are very common in recommender systems. Therefore, in this paper, we focus on enhancing the representation of tail items to improve sequential recommendation performance. Through empirical studies on benchmarks, we surprisingly observe that both the ranking performance and the training procedure are greatly hindered by poorly optimized tail item embeddings. To address this issue, we propose a sequential recommendation framework named TailRec that enables the contextual information of tail items to be well leveraged and greatly improves their representations. Given the characteristics of the sequential recommendation task, the surrounding interaction records of each tail item are regarded as contextual information, without leveraging any additional side information. This approach allows contextual information to be mined from cross-sequence behaviors to boost the performance of sequential recommendations. Such a light contextual filtering component is plug-and-play for a series of SRS models. To verify the effectiveness of the proposed TailRec, we conduct extensive experiments over several popular benchmark recommenders. The experimental results demonstrate that TailRec can greatly improve the recommendation results and speed up the training process. Our code is available at github.com/Mingyue-Cheng/TailRec.
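A minimal sketch of the underlying idea, enriching a tail item's embedding with its surrounding interactions in the same sequence; the window size and mixing weight below are illustrative assumptions, not TailRec's actual component:

```python
import numpy as np

def enrich_tail_embeddings(seq, emb, tail_items, window=2, alpha=0.5):
    """Mix each tail item's embedding with the mean embedding of its
    surrounding items in the interaction sequence (contextual information).

    seq: list of item ids in interaction order
    emb: dict item_id -> np.ndarray embedding
    tail_items: set of item ids considered long-tail
    """
    enriched = {i: v.copy() for i, v in emb.items()}
    for pos, item in enumerate(seq):
        if item not in tail_items:
            continue
        left, right = max(0, pos - window), min(len(seq), pos + window + 1)
        context = [emb[seq[j]] for j in range(left, right) if j != pos]
        if context:
            enriched[item] = alpha * emb[item] + (1 - alpha) * np.mean(context, axis=0)
    return enriched
```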
Recent years have seen the wide application of natural language processing (NLP) models in crucial areas such as finance, medical treatment, and news media, raising concerns about model robustness and vulnerabilities. We find that the prompt paradigm can probe special robustness defects of pre-trained language models. Malicious prompt texts are first constructed as inputs, and a pre-trained language model can then generate adversarial examples for victim models via mask-filling. Experimental results show that the prompt paradigm can efficiently generate adversarial examples that are more diverse than those obtained by synonym substitution. We then propose a novel robust training approach based on the prompt paradigm, which incorporates prompt texts as alternatives to adversarial examples and enhances robustness under a lightweight minimax-style optimization framework. Experiments on three real-world tasks and two deep neural models show that our approach can significantly improve the robustness of models against adversarial attacks.
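A rough sketch of the mask-filling idea using a generic fill-mask model from the transformers library; the prompt construction and candidate filtering here are simplified assumptions, not the paper's exact procedure:

```python
from transformers import pipeline

# a generic masked language model used to fill in masked tokens
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def candidate_adversarial_examples(sentence, word, top_k=5):
    """Mask one word of the input and let the MLM propose replacements,
    yielding candidate adversarial examples to test against a victim model."""
    masked = sentence.replace(word, fill_mask.tokenizer.mask_token, 1)
    return [pred["sequence"] for pred in fill_mask(masked, top_k=top_k)]

# usage: candidates whose victim-model prediction flips count as adversarial examples
print(candidate_adversarial_examples("the movie was surprisingly good", "good"))
```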
Deterministic databases are able to reduce coordination costs in replication. This property has fostered significant interest in the design of efficient deterministic concurrency control protocols. However, the state-of-the-art deterministic concurrency control protocol, Aria, has three issues. First, it is impractical to configure a suitable batch size when the read-write set is unknown. Second, Aria running in low-concurrency scenarios, e.g., a single-thread scenario, suffers from the same conflicts as in high-concurrency scenarios. Third, the single-version schema brings write-after-write conflicts.
To address these issues, we propose Gria, an efficient deterministic concurrency control protocol. Gria has the following properties. First, the batch size of Gria is auto-scaling. Second, Gria's conflict probability in low-concurrency scenarios is lower than that in high-concurrency scenarios. Third, Gria has no write-after-write conflicts, thanks to its multi-version structure. To further reduce conflicts, we propose two optimizations: a reordering mechanism and a rechecking strategy. Evaluation results on two popular benchmarks show that Gria outperforms Aria by 13x.
Graph convolutional networks (GCNs) have become prevalent in recommender systems (RS) due to their superiority in modeling collaborative patterns. Although they improve overall accuracy, GCNs unfortunately amplify popularity bias: tail items are less likely to be recommended. This effect prevents GCN-based RS from making precise and fair recommendations and decreases the effectiveness of recommender systems in the long run.
In this paper, we investigate how graph convolutions amplify popularity bias in RS. Through theoretical analyses, we identify two fundamental factors: (1) with graph convolution (i.e., neighborhood aggregation), popular items exert a larger influence than tail items on neighboring users, pulling user representations towards popular items in the representation space; (2) after multiple rounds of graph convolution, popular items affect more high-order neighbors and become more influential. These two factors move popular items closer to almost all users, so they are recommended more frequently. To rectify this, we propose to estimate the amplified effect of popular nodes on each node's representation and to intervene in this effect after each graph convolution. Specifically, we adopt clustering to discover highly influential nodes and estimate the amplification effect of each node, then remove the effect from the node embeddings at each graph convolution layer. Our method is simple and generic: it can be used in the inference stage to correct existing models rather than training a new model from scratch, and it can be applied to various GCN models. We demonstrate our method on two representative GCN backbones, LightGCN and UltraGCN, verifying its ability to improve the recommendations of tail items without sacrificing the performance of popular items. Our code is open-sourced at github.com/MEICRS/DAP.
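A minimal sketch of the intervention idea: cluster node embeddings, treat each cluster centroid as the direction of popular-node influence, and remove part of that component after each convolution layer. The way the amplification strength is estimated below is an assumption for illustration, not the paper's exact estimator:

```python
import numpy as np
from sklearn.cluster import KMeans

def remove_amplified_effect(embeddings, n_clusters=10, strength=0.5):
    """Subtract part of each node's projection onto its cluster centroid,
    approximating removal of the amplified influence of popular nodes.

    embeddings: (num_nodes, dim) array of node embeddings after one GCN layer.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(embeddings)
    centers = km.cluster_centers_[km.labels_]                 # centroid of each node's cluster
    unit = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + 1e-12)
    proj = np.sum(embeddings * unit, axis=1, keepdims=True) * unit
    return embeddings - strength * proj                       # intervene after this layer
```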
Deep learning has achieved superior accuracy on Euclidean-structured data with neural networks. Non-Euclidean-structured data, such as graph data, carries more sophisticated structural information and can also be processed with neural networks to address more complex and practical problems. However, real graph data follows a power-law distribution, so the adjacency matrix of a graph is random and sparse. Graph processing accelerators (GPAs) were designed to handle these problems, but traditional graph computing processes only one-dimensional data, whereas in graph neural networks (GNNs) the graph data is multi-dimensional. Consequently, GNNs combine the execution processes of traditional graph processing and neural networks, which exhibit irregular memory access and regular computation, respectively. To capture more information from graph data and obtain better model generalization, GNNs are built with deeper layers, so the overhead of memory access and computation is considerable. GNN accelerators have therefore been designed to deal with this issue. In this paper, we conduct a systematic survey of the design and implementation of GNN accelerators. Specifically, we review the challenges faced by GNN accelerators and, in detail, the existing works that address them. Finally, we evaluate previous works and propose future directions in this booming field.
Information Extraction (IE) aims to extract structural knowledge from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. As a result, numerous works have been proposed to integrate LLMs into IE tasks based on a generative paradigm. To conduct a comprehensive and systematic review of LLM-based efforts for IE tasks, in this study we survey the most recent advancements in this field. We first present an extensive overview by categorizing these works in terms of various IE subtasks and techniques, and then we empirically analyze the most advanced methods and identify the emerging trends of IE with LLMs. Based on this thorough review, we identify several technical insights and promising research directions that deserve further exploration in future studies. We maintain a public repository and consistently update related works and resources on GitHub (LLM4IE repository).
Learning modality-fused representations and processing unaligned multimodal sequences are meaningful and challenging problems in multimodal emotion recognition. Existing approaches use directional pairwise attention or a message hub to fuse language, visual, and audio modalities. However, these fusion methods are often quadratic in complexity with respect to the modality sequence length, introduce redundant information, and are not efficient. In this paper, we propose an efficient neural network for learning modality-fused representations with a CB-Transformer (LMR-CBT) for multimodal emotion recognition from unaligned multimodal sequences. Specifically, we first perform feature extraction for the three modalities separately to obtain the local structure of the sequences. Then, we design an innovative asymmetric transformer with cross-modal blocks (CB-Transformer) that enables complementary learning of different modalities, mainly divided into local temporal learning, cross-modal feature fusion, and global self-attention representations. In addition, we concatenate the fused features with the original features to classify the emotions of the sequences. Finally, we conduct word-aligned and unaligned experiments on three challenging datasets: IEMOCAP, CMU-MOSI, and CMU-MOSEI. The experimental results show the superiority and efficiency of our proposed method in both settings. Compared with mainstream methods, our approach reaches the state of the art with a minimal number of parameters.
Entity alignment (EA) is an important technique that aims to find entities in two different knowledge graphs (KGs) that refer to the same real-world object. Current methods typically learn entity embeddings for EA from the structure of the KGs. Most EA models are designed for rich-resource languages and require sufficient resources such as a parallel corpus and pre-trained language models. However, KGs in low-resource languages have received less attention, and current models demonstrate poor performance on such KGs. Recently, researchers have fused relation information and attributes into entity representations to enhance entity alignment performance, but the relation semantics are often ignored. To address these issues, we propose a novel Semantic-aware Graph Neural Network (SGNN) for entity alignment. First, we generate pseudo sentences according to the relation triples and produce representations using pre-trained models. Second, our approach explores semantic information from the connected relations with a graph neural network. Our model thus captures expanded feature information from KGs. Experimental results on three low-resource languages demonstrate that our proposed SGNN approach outperforms state-of-the-art alignment methods on three proposed datasets and three public datasets.
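A small sketch of the first step, turning relation triples into pseudo sentences before feeding them to a pre-trained encoder; the sentence template is an assumption for illustration:

```python
def triples_to_pseudo_sentences(triples):
    """Convert (head, relation, tail) triples into plain-text pseudo
    sentences that a pre-trained language model can encode."""
    return [f"{head} {relation.replace('_', ' ')} {tail}"
            for head, relation, tail in triples]

sentences = triples_to_pseudo_sentences([
    ("Paris", "capital_of", "France"),
    ("Seine", "flows_through", "Paris"),
])
# -> ["Paris capital of France", "Seine flows through Paris"]
```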
Using all samples in the optimization process does not produce robust results on datasets with label noise, because the gradients calculated from the losses of noisy samples push the optimization in the wrong direction. In this paper, we recommend using only the samples whose loss is below a threshold determined during optimization, instead of using all samples in the mini-batch. Our proposed method, Adaptive-k, aims to exclude label-noise samples from the optimization process and make the process robust. On noisy datasets, we found that a threshold-based approach such as Adaptive-k produces better results than using all samples or a fixed number of low-loss samples in the mini-batch. On the basis of our theoretical analysis and experimental results, we show that the Adaptive-k method comes closest to the performance of the Oracle, in which noisy samples are entirely removed from the dataset. Adaptive-k is a simple but effective method. It does not require prior knowledge of the noise ratio of the dataset, does not require additional model training, and does not increase training time significantly. In the experiments, we also show that Adaptive-k is compatible with different optimizers such as SGD, SGDM, and Adam. The code for Adaptive-k is available at GitHub.
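A minimal PyTorch-style sketch of the thresholding step, where only mini-batch samples with loss below an adaptively estimated threshold contribute to the gradient; the running-mean threshold below is an illustrative choice, not necessarily the paper's exact rule:

```python
import torch
import torch.nn.functional as F

def adaptive_threshold_step(model, optimizer, x, y, state, momentum=0.9):
    """One optimization step that excludes suspected noisy samples.

    state: dict holding a running estimate of the loss threshold.
    """
    losses = F.cross_entropy(model(x), y, reduction="none")   # per-sample losses
    # adaptively track a loss threshold (illustrative: running mean of batch losses)
    batch_mean = losses.detach().mean()
    state["threshold"] = momentum * state.get("threshold", batch_mean) \
        + (1 - momentum) * batch_mean
    mask = losses < state["threshold"]                         # keep low-loss samples only
    if mask.any():
        optimizer.zero_grad()
        losses[mask].mean().backward()
        optimizer.step()
    return state
```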
With the recent demonstration of quantum computers, interest in the field of reversible logic synthesis and optimization has taken a different turn. As every quantum operation is inherently reversible, there is immense motivation for exploring reversible circuit design and optimization. When it comes to faults in circuits, the parity-preserving feature contributes to the detection of permanent and temporary faults. In the context of reversible circuits, the parity-preserving property ensures that the input and output parities are equal. In this paper, we propose six parity-preserving reversible blocks (Z, F, A, T, S, and L) with improved quantum cost. The reversible blocks are synthesized using an existing synthesis method that generates a netlist of multiple-control Toffoli (MCT) gates. Various optimization rules are applied at the reversible circuit level, followed by transformation into a netlist of elementary quantum gates from the NCV library. Designs of a full adder and of unsigned and signed multipliers are proposed using the functional blocks that possess parity-preserving properties. The proposed designs are compared with state-of-the-art methods and found to be better in terms of the cost of realization. Compared to recent works, average savings of 25.04%, 20.89%, 21.17%, and 51.03% for the 4-bit unsigned multiplier, and of 18.59%, 13.82%, 13.82%, and 27.65% for the 5-bit signed multiplier, are observed in terms of quantum cost, garbage output, constant input, and gate count, respectively.
In crowdsourcing scenarios, we can obtain multiple noisy labels for each instance from different crowd workers and then infer its integrated label via label aggregation. Despite the effectiveness of label aggregation methods, a certain level of noise still remains in the integrated labels. Thus, some noise correction methods have been proposed in recent years to reduce the impact of this noise. However, to the best of our knowledge, existing methods rarely consider information from both an instance's features and its multiple noisy labels simultaneously when identifying a noise instance. In this study, we argue that the more distinguishable an instance's features are but the noisier its multiple noisy labels, the more likely it is to be a noise instance. Based on this premise, we propose a label distribution similarity-based noise correction (LDSNC) method. To measure whether an instance's features are distinguishable, we obtain each instance's predicted label distribution by building multiple classifiers using instances' features and their integrated labels. To measure whether an instance's multiple noisy labels are noisy, we obtain each instance's multiple noisy label distribution from its multiple noisy labels. Then, we use the Kullback-Leibler (KL) divergence to calculate the similarity between the predicted label distribution and the multiple noisy label distribution and define the instances with the lowest similarity as noise instances. Extensive experimental results on 34 simulated and four real-world crowdsourced datasets validate the effectiveness of our method.
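A compact sketch of the similarity computation: compare each instance's classifier-predicted label distribution with the empirical distribution of its crowd labels via KL divergence, and flag the least similar instances. The quantile-based thresholding rule below is an illustrative assumption:

```python
import numpy as np

def noisy_label_distribution(noisy_labels, n_classes, smoothing=1e-6):
    """Empirical class distribution of an instance's multiple noisy labels."""
    counts = np.bincount(noisy_labels, minlength=n_classes).astype(float) + smoothing
    return counts / counts.sum()

def kl_divergence(p, q, eps=1e-12):
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def flag_noise_instances(pred_dists, crowd_labels, n_classes, quantile=0.9):
    """Instances whose predicted and noisy-label distributions disagree most
    (largest KL, i.e., lowest similarity) are flagged as noise instances."""
    kls = np.array([
        kl_divergence(p, noisy_label_distribution(labels, n_classes))
        for p, labels in zip(pred_dists, crowd_labels)
    ])
    return kls > np.quantile(kls, quantile)
```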
Exploration strategy design is a challenging problem in reinforcement learning (RL), especially when the environment contains a large state space or sparse rewards. During exploration, the agent tries to discover unexplored (novel) areas or high-reward (quality) areas. Most existing methods perform exploration by utilizing only the novelty of states. The novelty and quality in the neighboring area of the current state have not been well utilized to simultaneously guide the agent's exploration. To address this problem, this paper proposes a novel RL framework, called clustered reinforcement learning (CRL), for efficient exploration in RL. CRL adopts clustering to divide the collected states into several clusters, based on which a bonus reward reflecting both novelty and quality in the neighboring area (cluster) of the current state is given to the agent. CRL leverages these bonus rewards to guide the agent to perform efficient exploration. Moreover, CRL can be combined with existing exploration strategies to improve their performance, as the bonus rewards employed by these existing strategies solely capture the novelty of states. Experiments on four continuous control tasks and six hard-exploration Atari-2600 games show that our method outperforms other state-of-the-art methods and achieves the best performance.
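A small sketch of a cluster-based bonus that mixes novelty (how rarely the state's cluster has been visited) with quality (the average reward observed in it); the clustering choice and mixing weight are illustrative assumptions, not CRL's exact formulation:

```python
import numpy as np
from sklearn.cluster import KMeans

class ClusterBonus:
    """Bonus reward combining the novelty and quality of a state's cluster."""

    def __init__(self, states, rewards, n_clusters=20, beta=0.5):
        rewards = np.asarray(rewards, dtype=float)
        self.km = KMeans(n_clusters=n_clusters, n_init=10).fit(states)
        labels = self.km.predict(states)
        self.visits = np.bincount(labels, minlength=n_clusters).astype(float)
        self.quality = np.array([
            rewards[labels == c].mean() if (labels == c).any() else 0.0
            for c in range(n_clusters)
        ])
        self.beta = beta

    def bonus(self, state):
        c = int(self.km.predict(np.asarray(state).reshape(1, -1))[0])
        novelty = 1.0 / np.sqrt(self.visits[c] + 1.0)   # rarely visited -> larger bonus
        return self.beta * novelty + (1.0 - self.beta) * self.quality[c]
```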
A large body of research effort has been dedicated to automated issue classification for Issue Tracking Systems (ITSs). Although existing approaches have shown promising performance, the different design choices they adopt, including the textual fields, feature representation methods, and machine learning algorithms, have not been comprehensively compared and analyzed. To fill this gap, we perform the first extensive study of automated issue classification covering 9 state-of-the-art issue classification approaches. Our experimental results on a widely studied dataset reveal multiple practical guidelines for automated issue classification, including: (1) training separate models for the issue titles and descriptions and then combining these two models tends to achieve better performance for issue classification; (2) word embedding with Long Short-Term Memory (LSTM) can better extract features from the textual fields in the issues and hence leads to better issue classification models; (3) there exist certain terms in the textual fields that are helpful for building classifiers that better discriminate between bug and non-bug issues; (4) the performance of the issue classification model is not sensitive to the choice of ML algorithm. Based on our study outcomes, we further propose an advanced issue classification approach, DEEPLABEL, which achieves better performance than the existing issue classification approaches.
IP geolocation is essential for the territorial analysis of sensitive network entities, location-based services (LBS), and network fraud detection. It has important theoretical significance and application value. Measurement-based IP geolocation is a hot research topic. However, existing IP geolocation algorithms cannot effectively utilize the distance information carried by delay measurements or the connection relations among nodes, resulting in high geolocation error. It is challenging to obtain the mapping between delay, node connection relations, and geographical location. Based on the idea of network representation learning, we propose a representation learning model for IP nodes (IP2vec for short) and apply it to street-level IP geolocation. The IP2vec model vectorizes nodes according to the connection relations and delays between nodes, so that the IP vectors reflect the distance and topological proximity between IP nodes. The steps of the street-level IP geolocation algorithm based on the IP2vec model are as follows: First, we measure landmarks and the target IP to obtain delay and path information and construct the network topology. Second, we use the IP2vec model to obtain IP vectors from the network topology. Third, we train a neural network to fit the mapping relation between the vectors and the locations of landmarks. Finally, the vector of the target IP is fed into the neural network to obtain the geographical location of the target IP. The algorithm can accurately infer the geographical locations of target IPs based on the delay and topological proximity embedded in the IP vectors. Cross-validation experiments on 10023 target IPs in New York, Beijing, Hong Kong, and Zhengzhou demonstrate that the proposed algorithm achieves street-level geolocation. Compared with existing algorithms such as Hop-Hot, IP-geolocater, and SLG, the mean geolocation error of the proposed algorithm is reduced by 33%, 39%, and 51%, respectively.
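A high-level sketch of the last two steps only: fit a regressor from landmark IP vectors to their known coordinates, then predict the target IP's location from its vector. How the IP vectors themselves are learned from delay and connection relations is omitted, and the regressor choice below is an assumption:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def geolocate_target(landmark_vectors, landmark_coords, target_vector):
    """Map IP2vec-style node vectors to (latitude, longitude).

    landmark_vectors: (n_landmarks, dim) vectors of landmark IPs
    landmark_coords:  (n_landmarks, 2) known latitude/longitude pairs
    target_vector:    (dim,) vector of the target IP
    """
    reg = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
    reg.fit(landmark_vectors, landmark_coords)
    return reg.predict(np.asarray(target_vector).reshape(1, -1))[0]
```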
Learning-outcome prediction (LOP) is a long-standing and critical problem in education. Many studies have contributed effective models, but they often suffer from data shortage and poor generalization across institutions due to privacy-protection issues. To this end, this study proposes a distributed grade prediction model, dubbed FecMap, built on the federated learning (FL) framework, which preserves the private data of local clients and communicates with others through a globally shared model. FecMap considers local subspace learning (LSL), which explicitly learns local features against the global features, and multi-layer privacy protection (MPP), which hierarchically protects private features, including model-shareable features and features that must not be shared, to achieve client-specific classifiers with high LOP performance per institution. FecMap is trained iteratively with all datasets distributed across clients by training, on each client, a local neural network composed of a global part, a local part, and a classification head, and by averaging the global parts from clients on the server. To evaluate the FecMap model, we collected three higher-education datasets of student academic records from engineering majors. Experimental results show that FecMap benefits from the proposed LSL and MPP and achieves steady performance on the LOP task compared with state-of-the-art models. This study makes a fresh attempt at using federated learning for learning-analytics tasks, potentially paving the way for personalized education with privacy protection.
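A simplified sketch of the communication pattern: each client trains a model split into a global part, a local part, and a classification head, and the server averages only the global parts; the parameter layout below is an assumption for illustration:

```python
import numpy as np

def server_average_global_parts(client_models):
    """Average only the shared ('global') parameters across clients;
    local parts and classification heads never leave the clients.

    client_models: list of dicts like {"global": {...}, "local": {...}, "head": {...}}
    with numpy arrays as parameter values.
    """
    keys = client_models[0]["global"].keys()
    return {k: np.mean([m["global"][k] for m in client_models], axis=0) for k in keys}

def client_sync(model, averaged_global):
    """Each client overwrites its global part with the server average and
    keeps its private local part and classification head untouched."""
    model["global"] = {k: v.copy() for k, v in averaged_global.items()}
    return model
```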
Random sample partition (RSP) is a newly developed big data representation and management model for big data approximate computation problems. Academic research and practical applications have confirmed that RSP is an efficient solution for big data processing and analysis. However, a challenge in implementing RSP is determining an appropriate sample size for RSP data blocks. While a large sample size increases the burden of big data computation, a small size leads to insufficient distribution information in the RSP data blocks. To address this problem, this paper presents a novel density-estimation-based method (DEM) to determine the optimal sample size for RSP data blocks. First, a theoretical sample size is calculated based on the multivariate Dvoretzky-Kiefer-Wolfowitz (DKW) inequality using the fixed-point iteration (FPI) method. Second, a practical sample size is determined by minimizing the validation error of a kernel density estimator (KDE) constructed on RSP data blocks with increasing sample sizes. Finally, a series of experiments is conducted to validate the feasibility, rationality, and effectiveness of DEM. Experimental results show that (1) the iteration function of the FPI method is convergent for calculating the theoretical sample size from the multivariate DKW inequality; (2) the KDE constructed on RSP data blocks with the sample size determined by DEM yields a good approximation of the probability density function (p.d.f.); and (3) DEM provides more accurate sample sizes than the existing sample size determination methods from the perspective of p.d.f. estimation. This demonstrates that DEM is a viable approach to the sample size determination problem in big data RSP implementations.
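A small sketch of the fixed-point iteration for the theoretical sample size; the DKW-style bound used below, with a dimension-dependent factor, is an illustrative assumption rather than the exact multivariate inequality used in the paper:

```python
import math

def theoretical_sample_size(eps, alpha, dim, max_iter=100, tol=1e-6):
    """Fixed-point iteration for n = ln(dim * (n + 1) / alpha) / (2 * eps**2),
    derived from an assumed DKW-type bound
    P(sup |F_n - F| > eps) <= dim * (n + 1) * exp(-2 * n * eps**2) <= alpha.
    """
    n = 1.0
    for _ in range(max_iter):
        n_next = math.log(dim * (n + 1.0) / alpha) / (2.0 * eps ** 2)
        if abs(n_next - n) < tol:
            break
        n = n_next
    return math.ceil(n)

# e.g., tolerance eps = 0.05, confidence level alpha = 0.05, dimension 3
print(theoretical_sample_size(0.05, 0.05, 3))
```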
Accurately predicting the Remaining Useful Life (RUL) of lithium-ion batteries is crucial for battery management systems. Deep learning-based methods have been shown to be effective in predicting RUL by leveraging battery capacity time series data. However, the representation learning of features such as long-distance sequence dependencies and mutations in capacity time series still needs to be improved. To address this challenge, this paper proposes a novel deep learning model, the MLP-Mixer and Mixture of Experts (MMMe) model, for RUL prediction. The MMMe model leverages a Gated Recurrent Unit and a Multi-Head Attention mechanism to encode the sequential battery capacity data and capture temporal features, and a re-zero MLP-Mixer model to capture high-level features. Additionally, we devise an ensemble predictor based on a Mixture-of-Experts (MoE) architecture to generate reliable RUL predictions. Experimental results on public datasets demonstrate that our proposed model significantly outperforms existing methods, providing more reliable and precise RUL predictions while also accurately tracking the capacity degradation process. Our code and dataset are available on GitHub.
Federated Learning (FL) has emerged as a powerful technology designed for collaborative training between multiple clients and a server while maintaining the data privacy of clients. To enhance privacy in FL, Differentially Private Federated Learning (DPFL) has gradually become one of the most effective approaches. Because DPFL operates in distributed settings, there exist potential malicious adversaries who manipulate some clients and the aggregation server to produce malicious parameters and disturb the learning model. However, existing aggregation protocols for DPFL consider either the existence of some corrupted clients (Byzantines) or a corrupted server. Such protocols cannot eliminate the effects of corrupted clients and a corrupted server when both exist simultaneously, due to the complicated threat model. In this paper, we elaborate this adversarial threat model and propose BVDFed. To the best of our knowledge, it is the first Byzantine-resilient and Verifiable aggregation for Differentially private FEDerated learning. Specifically, we propose the Differentially Private Federated Averaging algorithm (DPFA) as the primary workflow of BVDFed, which is more lightweight and more easily portable than traditional DPFL algorithms. We then introduce the Loss Score to indicate the trustworthiness of disguised gradients in DPFL. Based on the Loss Score, we propose an aggregation rule, DPLoss, to eliminate faulty gradients from Byzantine clients during server aggregation while preserving the privacy of clients' data. Additionally, we design a secure verification scheme, DPVeri, compatible with DPFA and DPLoss, to support honest clients in verifying the integrity of received aggregated results. DPVeri also provides resistance to collusion attacks with no more than t participants in our aggregation. Theoretical analysis and experimental results demonstrate that our aggregation is feasible and effective in practice.
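A rough sketch of the aggregation idea only: score each client's (already noised) gradient, drop the least trustworthy ones, and average the rest. The scoring convention and the fixed Byzantine budget are illustrative assumptions, and the verification scheme (DPVeri) is not shown:

```python
import numpy as np

def robust_dp_aggregate(client_grads, loss_scores, byzantine_budget):
    """Discard the `byzantine_budget` clients with the worst (highest) loss
    scores and average the remaining differentially private gradients.

    client_grads: (n_clients, dim) gradients already perturbed for DP
    loss_scores:  (n_clients,) trustworthiness scores (lower is better)
    """
    client_grads = np.asarray(client_grads)
    loss_scores = np.asarray(loss_scores)
    keep = np.argsort(loss_scores)[: len(loss_scores) - byzantine_budget]
    return np.mean(client_grads[keep], axis=0)
```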