As various types of data grow explosively, large-scale data storage, backup, and transmission become challenging, which motivates many researchers to propose efficient universal compression algorithms for multi-source data. In recent years, the emergence of hardware acceleration devices such as GPUs, TPUs, DPUs, and FPGAs has removed the performance bottleneck of neural networks (NNs), making NN-based compression algorithms increasingly practical and popular. However, no survey of NN-based universal lossless compressors has been conducted yet, and unified evaluation metrics are also lacking. To address these problems, in this paper we present a holistic survey together with benchmark evaluations. Specifically, i) we thoroughly investigate NN-based lossless universal compression algorithms for multi-source data and classify them into three types: static pre-training, adaptive, and semi-adaptive; ii) we unify 19 evaluation metrics to comprehensively assess the compression performance, resource consumption, and model performance of compressors; iii) we conduct more than 4,600 CPU/GPU hours of experiments to evaluate 17 state-of-the-art compressors on 28 real-world datasets spanning text, images, videos, audio, and other data types; iv) we also summarize the strengths and drawbacks of NN-based lossless data compressors and discuss promising research directions. We summarize the results as the NN-based Lossless Compressors Benchmark (NNLCB, see the fahaihi.github.io/NNLCB website), which will be updated and maintained continuously in the future.
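To make the idea behind such compressors concrete, here is a minimal sketch of the adaptive compression loop they share: a predictor supplies a conditional distribution over the next symbol, and an entropy coder emits roughly its negative log-probability in bits. A simple counting model stands in for the neural network here; the function and variable names are illustrative, not from any surveyed system.

```python
import math
from collections import defaultdict

def adaptive_code_length(data: bytes) -> float:
    """Ideal compressed size (in bits) for an adaptive compressor driven by
    a simple order-0 frequency model. NN-based compressors follow the same
    loop but replace this counting model with a network's conditional
    distribution P(x_t | x_<t); arithmetic coding then emits approximately
    -log2 P(x_t) bits per symbol."""
    counts = defaultdict(lambda: 1)  # Laplace-smoothed symbol counts
    total = 256                      # one pseudo-count per byte value
    bits = 0.0
    for b in data:
        bits += -math.log2(counts[b] / total)  # code length for this symbol
        counts[b] += 1                         # adapt the model online
        total += 1
    return bits

text = b"abracadabra" * 100
print(f"compression ratio: {adaptive_code_length(text) / (8 * len(text)):.3f}")
```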
Compiling model-based diagnosis to maximum satisfiability (MaxSAT) is currently a popular approach because it can compute diagnoses directly. Although encoding based on dominator components can reduce the difficulty of the problem, solving complexity still grows as the system size increases. In this paper, we propose an efficient encoding method to address this problem. The method makes several significant contributions. First, our strategy significantly reduces the size of the encoding required for constructing MaxSAT formulations in the offline phase, without the need for additional observations. Second, it significantly decreases the number of clauses and variables by exploiting system observations, even when dealing with components that have uncertain output values. Last, our algorithm is applicable to both single- and multiple-observation diagnosis problems without sacrificing the completeness of the solution set. Experimental results on the ISCAS-85 benchmarks show that our algorithm outperforms state-of-the-art algorithms on both single- and multiple-observation problems.
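For readers unfamiliar with the compilation, here is a minimal sketch of the standard health-variable MaxSAT encoding for diagnosing a two-gate circuit (the generic textbook encoding, not this paper's reduced one), assuming the python-sat package is available. Falsified soft clauses in the optimal solution correspond to abnormal components.

```python
# Standard health-variable MaxSAT encoding for model-based diagnosis,
# using python-sat (pip install python-sat). Variable numbering is ours.
from pysat.formula import WCNF
from pysat.examples.rc2 import RC2

a, b, c, d, h1, h2 = 1, 2, 3, 4, 5, 6   # signals and gate-health variables

wcnf = WCNF()
# Hard: if G1 (c = a AND b) is healthy, it behaves normally.
wcnf.append([-h1, -a, -b, c])
wcnf.append([-h1, a, -c])
wcnf.append([-h1, b, -c])
# Hard: if G2 (d = NOT c) is healthy, it behaves normally.
wcnf.append([-h2, -c, -d])
wcnf.append([-h2, c, d])
# Hard: observation a=1, b=1, d=1 (inconsistent with both gates healthy).
for lit in (a, b, d):
    wcnf.append([lit])
# Soft: prefer every component to be healthy.
wcnf.append([h1], weight=1)
wcnf.append([h2], weight=1)

with RC2(wcnf) as solver:
    model = solver.compute()
    diagnosis = [g for g, h in (("G1", h1), ("G2", h2)) if -h in model]
    print(f"cost={solver.cost}, abnormal components: {diagnosis}")
```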
Federated Learning (FL) is a machine learning paradigm in which multiple data owners collaboratively train a model under the coordination of a central server while keeping all data decentralized. Such a paradigm allows models to be trained effectively while avoiding data privacy leakage. However, federated learning is vulnerable to various kinds of failures resulting from both intentional (malicious) attacks and unintentional (non-malicious) faults. Existing studies on attacks in federated learning are mostly dedicated to automatic defense against malicious attacks (e.g., data poisoning attacks). Comparatively less attention has been given to handling non-malicious failures (e.g., failures caused by non-independent and identically distributed data), which are actually more common and harder to handle in a federated learning setting. In this paper, we propose FedCare, a real-time visual diagnosis approach for handling failures in federated learning systems. The functionality of FedCare includes the identification of failures, the assessment of their nature (malicious or non-malicious), the study of their impact, and the recommendation of adequate defense strategies. Our design is multi-faceted, giving perspectives from the angles of model performance, anomaly/contribution assessment of clients, feature maps, group activities, and client impact. We demonstrate the effectiveness of our approach through two case studies, a quantitative experiment, and an expert interview.
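As one illustration of the kind of client-anomaly signal such a diagnosis dashboard can surface, the sketch below scores each client's update by its cosine distance from the mean update. This generic heuristic is only an assumption for illustration; FedCare's actual multi-faceted scoring is not specified at this level in the abstract.

```python
# Illustrative client-anomaly signal: cosine distance from the mean update.
import numpy as np

def anomaly_scores(client_updates: np.ndarray) -> np.ndarray:
    """client_updates: (n_clients, n_params) flattened model deltas.
    Returns each client's cosine distance from the mean update; large values
    flag clients whose update direction disagrees with the crowd (possible
    poisoning, or merely non-IID data -- hence the need for human-in-the-loop
    inspection rather than automatic rejection)."""
    mean = client_updates.mean(axis=0)
    num = client_updates @ mean
    den = np.linalg.norm(client_updates, axis=1) * np.linalg.norm(mean) + 1e-12
    return 1.0 - num / den

updates = np.random.randn(10, 1000)
updates[3] *= -5.0          # client 3 pushes in the opposite direction
print(anomaly_scores(updates).round(2))
```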
Multi-view learning is an emerging field that aims to enhance learning performance by leveraging multiple views or sources of data across various domains. By integrating information from diverse perspectives, multi-view learning methods effectively enhance accuracy, robustness, and generalization capability. In this survey, existing research on multi-view learning is broadly categorized into four groups based on the tasks it addresses: multi-view classification approaches, multi-view semi-supervised classification approaches, multi-view clustering approaches, and multi-view semi-supervised clustering approaches. Despite its potential advantages, multi-view learning poses several challenges, including view inconsistency, view complementarity, optimal view fusion, the curse of dimensionality, scalability, limited labels, and generalization across domains. Nevertheless, these challenges have not discouraged researchers from exploring the potential of multi-view learning, and it remains an active and promising research area capable of effectively addressing complex real-world problems.
Video-Grounded Dialogue Systems (VGDSs), which generate reasonable responses based on multi-turn dialogue contexts and a given video, have received intensive attention recently. The key to building a superior VGDS lies in efficiently reasoning over visual and textual concepts of various granularities and achieving comprehensive visual-textual multi-modality alignment. Despite remarkable research progress, existing studies struggle to identify context-relevant video parts and disregard the impact of redundant information in long-form, content-dynamic videos. Further, current methods usually align all semantics in different modalities uniformly using a one-time cross-attention scheme, which neglects the sophisticated correspondence between various granularities of visual and textual concepts (e.g., still objects with nouns, dynamic events with verbs). To this end, we propose a novel system, the Cascade cOntext-oriented Spatio-Temporal Attention Network (COSTA), to generate reasonable responses efficiently and accurately. Specifically, COSTA first adopts a cascade attention network to localize only the most relevant video clips and regions in a coarse-to-fine manner, effectively filtering out irrelevant visual semantics. Second, we design a memory distillation-inspired iterative visual-textual cross-attention strategy to progressively integrate visual semantics with dialogue contexts across varying granularities, facilitating extensive multi-modal alignment. Experiments on several benchmarks demonstrate that our model significantly improves over state-of-the-art methods across various metrics.
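To ground the contrast with one-time alignment, here is a minimal PyTorch sketch of a single visual-textual cross-attention step of the kind an iterative scheme like COSTA's would repeat across granularities. Dimensions and tensor names are our assumptions, not the paper's.

```python
# One visual-textual cross-attention step (illustrative, not COSTA's code).
import torch
import torch.nn as nn

d_model, n_heads = 256, 8
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

text = torch.randn(2, 20, d_model)    # dialogue-context tokens (queries)
video = torch.randn(2, 50, d_model)   # localized clip/region features

# Text tokens query the video memory; the attention weights expose which
# visual concepts each token aligns to. An iterative strategy would refine
# `text` with this output and repeat at a finer visual granularity, rather
# than aligning all semantics in one pass.
aligned, weights = cross_attn(query=text, key=video, value=video)
print(aligned.shape, weights.shape)   # (2, 20, 256), (2, 20, 50)
```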
Knowledge graph completion (KGC) aims to fill in missing entities and relations within knowledge graphs (KGs) to address their incompleteness. Most existing KGC models suffer from limited knowledge coverage because they are designed to operate within a single KG. In contrast, multilingual KGC (MKGC) leverages seed pairs from KGs in different languages to facilitate knowledge transfer and enhance the completion of the target KG. Previous studies on MKGC based on graph neural networks (GNNs) have primarily focused on using relation-aware GNNs to capture the combined features of neighboring entities and relations. However, these studies still have some shortcomings, particularly in the MKGC setting. First, each language's specific semantics, structures, and expressions increase the heterogeneity of the KGs; MKGC therefore necessitates a thorough consideration of KG heterogeneity and the effective integration of heterogeneous features. Second, multilingual KGs typically have a large graph scale because they store and manage information from multiple languages, yet current relation-aware GNNs often inherit complex GNN operations, resulting in unnecessary complexity. It is therefore necessary to simplify GNN operations. To address these limitations, we propose a Simplified Multi-view Graph Neural Network (SM-GNN) for MKGC. SM-GNN incorporates two simplified multi-view GNNs as components: one learns multi-view graph features to complete the KG, while the other generates new alignment pairs, facilitating knowledge transfer between different views of the KG. We simplify the two multi-view GNNs by retaining feature propagation while discarding linear transformation and nonlinear activation, reducing unnecessary complexity and effectively leveraging graph contextual information. Extensive experiments demonstrate that our proposed model outperforms competing baselines. The code and dataset are available at github.com/dbbice/SM-GNN.
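The simplification the abstract describes (keep propagation, drop per-layer transformations and activations) follows the pattern popularized by SGC, and reduces to repeated multiplication by a normalized adjacency matrix. The sketch below illustrates that pattern; the symmetric normalization and all names are our assumptions, not SM-GNN's exact formulation.

```python
# Parameter-free feature propagation: k hops of normalized-adjacency
# smoothing with no linear transformation and no nonlinear activation.
import numpy as np

def propagate(adj: np.ndarray, x: np.ndarray, k: int = 2) -> np.ndarray:
    """adj: (n, n) adjacency matrix; x: (n, d) node features."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    for _ in range(k):                              # parameter-free hops
        x = a_norm @ x
    return x  # downstream scoring/alignment consumes these features

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.random.randn(3, 8)
print(propagate(adj, x).shape)  # (3, 8)
```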
Low-Rank Adaptation (LoRA), which updates the dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. It also has significant advantages in cross-task generalization and privacy preservation. Hence, LoRA has gained much attention recently, and the related literature has grown exponentially, making a comprehensive overview of current progress necessary. This survey categorizes and reviews the progress from the perspectives of (1) downstream-adaptation variants that improve LoRA's performance on downstream tasks; (2) cross-task generalization methods that mix multiple LoRA plugins to achieve cross-task generalization; (3) efficiency-improving methods that boost the computational efficiency of LoRA; (4) data privacy-preserving methods that use LoRA in federated learning; and (5) applications. This survey also discusses future directions in this field.
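For context, the core LoRA update keeps the pretrained dense weight W frozen and adds a trainable low-rank product BA scaled by alpha/r, with B initialized to zero so fine-tuning starts from the pretrained model. Below is a minimal sketch; the hyperparameter names follow the original LoRA paper, while the wrapper module itself is ours.

```python
# Minimal LoRA wrapper: W x + (alpha/r) * B A x, training only A and B.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # freeze pretrained W (and bias)
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # BA = 0 at init
        self.scale = alpha / r

    def forward(self, x):
        # Low-rank update is pluggable: drop A, B to recover the base model.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(4, 768)).shape)  # (4, 768)
```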
With the advancement of the multimedia internet, visual characteristics increasingly influence users' decisions to click or not in the online retail industry. Incorporating visual features is thus a promising direction for further improving click-through rate (CTR) prediction. However, experiments on our production system revealed that simply injecting image embeddings trained with established pre-training methods yields only marginal improvements. We believe that the main advantage of existing image-feature pre-training methods lies in their effectiveness for cross-modal prediction, which differs significantly from the task of CTR prediction in recommendation systems. In recommendation systems, information from other modalities (such as text) can be used directly as features in downstream models, so even when cross-modal prediction performance is excellent, it is challenging to provide significant information gain for the downstream models. We argue that a visual-feature pre-training method tailored for recommendation is necessary for improvements beyond existing modality features. To this end, we propose an effective user intention reconstruction module that mines visual features related to user interests from behavior histories, constructing a many-to-one correspondence. We conduct extensive experimental evaluations on public datasets and our production system to verify that our method can learn users' visual interests. Our method achieves a 0.46% improvement in offline AUC and a 0.88% improvement in Taobao GMV (Gross Merchandise Volume) with p-value < 0.01.
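One plausible reading of the many-to-one reconstruction objective is sketched below: pool the image embeddings of a user's behavior history into a single interest vector and train it to reconstruct the clicked item's embedding. The module, its attention-pooling design, and all names are our assumptions for illustration; the paper's actual architecture may differ.

```python
# Hedged sketch of a user-intention reconstruction objective (illustrative).
import torch
import torch.nn as nn

class IntentionReconstructor(nn.Module):
    def __init__(self, d: int = 128):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, d))  # learned interest query
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, history_emb):                 # (batch, seq, d)
        q = self.query.expand(history_emb.size(0), -1, -1)
        pooled, _ = self.attn(q, history_emb, history_emb)  # many-to-one pooling
        return pooled.squeeze(1)                    # (batch, d) interest vector

model = IntentionReconstructor()
history = torch.randn(32, 20, 128)                  # behavior-history image embeddings
target = torch.randn(32, 128)                       # clicked item's image embedding
loss = nn.functional.mse_loss(model(history), target)
loss.backward()
```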
The shadow tomography problem introduced by [