Existing job shop scheduling methods often neglect job mobility and the spatial distribution of machines. This paper addresses the flexible job shop scheduling problem under spatial constraints. Specifically, it incorporates both job movement time and the potential collision risks caused by local job density. The paper defines a spatially constrained scheduling environment with non-sequential machine distribution. The spatial constraints are then refined into moving-distance constraints and local-density constraints. Additionally, a reward function is designed that includes penalties for both movement and density. The paper employs a multi-agent reinforcement learning method that combines dual attention and counterfactual baselines to solve the scheduling problem. Experimental results show that our approach effectively balances temporal and spatial factors, reducing job movement costs and collision risks while achieving the shortest completion time.
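A reward with movement and density penalties, as described above, could be sketched as follows. This is a minimal illustration only: the weights, the density cap, and all names are hypothetical, since the abstract does not give the paper's actual formulation.

```python
def spatial_reward(makespan_gain, move_dist, local_density,
                   w_move=0.1, w_density=0.2, density_cap=3):
    """Illustrative reward: a scheduling gain minus penalties for job
    movement distance and for exceeding a local job-density threshold.
    All weights and the cap are hypothetical placeholder values."""
    move_penalty = w_move * move_dist
    density_penalty = w_density * max(0, local_density - density_cap)
    return makespan_gain - move_penalty - density_penalty
```

An agent that moves a job a long distance into an already crowded area would thus receive a lower reward than one achieving the same makespan gain with a short, uncongested move.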
Large language models (LLMs) have demonstrated tremendous potential in game playing, yet little attention has been paid to their ethical implications in those contexts. This work investigates and analyses the ethical considerations of applying LLMs in game playing, using Werewolf, also known as Mafia, as a case study. Gender bias, which affects game fairness and player experience, has been observed in the behaviour of LLMs. Some roles, such as the Guard and the Werewolf, are more sensitive than others to gender information, manifested as a higher degree of behavioural change. We further examine scenarios in which gender information is implicitly conveyed through names, revealing that LLMs still exhibit discriminatory tendencies even in the absence of explicit gender labels. This research showcases the importance of developing fair and ethical LLMs. Beyond our research findings, we discuss the challenges and opportunities that lie ahead in this field, emphasising the need to dive deeper into the ethical implications of LLMs in gaming and other interactive domains.
Document-level relation extraction (RE) aims to identify the relations between entities across multiple sentences. In real life, new relations constantly emerge in new texts, raising the challenge of continually learning new relations while avoiding forgetting those already learned. Previous continual RE works have primarily focused on the continual learning of sentence-level RE, where each entity pair is associated with a single sentence and annotated with one relation. However, emerging relations may exist between entity pairs spanning multiple sentences or between entity pairs with pre-existing relations, necessitating the application of continual learning to document-level RE. To this end, we consider continual document-level RE and propose a novel model named CDRE to alleviate the partial labeling problem that severely degrades the performance of RE models. Specifically, we propose multi-binary knowledge distillation to transfer the knowledge of learned relations from the previously trained model to the current model. We introduce asymmetric training to coordinate the influence of positive samples and samples with learned yet unannotated relations. Furthermore, we explore the correlation between relations to augment label generation for re-annotating the learned and newly emerging relations in current and memorized samples, respectively. To simulate real-world scenarios, we construct two benchmark datasets derived from two widely-used document-level RE datasets. Experimental results on these datasets validate the superiority of our model CDRE in coping with continual document-level RE.
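The multi-binary knowledge distillation mentioned above can be pictured as treating each relation as an independent binary decision and matching the current (student) model's probabilities to the frozen previous (teacher) model's probabilities on already-learned relations. The sketch below is an assumption-laden illustration of that general idea, not the paper's exact loss; all names are placeholders.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multi_binary_distill(student_logits, teacher_logits, learned_mask):
    """Per-relation binary cross-entropy between the student's
    probabilities and the teacher's soft targets, averaged over the
    relations flagged as already learned (illustrative sketch)."""
    total, count = 0.0, 0
    for s, t, m in zip(student_logits, teacher_logits, learned_mask):
        if not m:
            continue  # distill only on previously learned relations
        p_t = sigmoid(t)  # teacher's soft target, kept fixed
        p_s = sigmoid(s)
        total += -(p_t * math.log(p_s) + (1 - p_t) * math.log(1 - p_s))
        count += 1
    return total / max(count, 1)
```

Masking out new relations lets the distillation term preserve old knowledge without interfering with learning the newly annotated ones.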
With the rapid development of Large Language Models (LLMs), fine-tuning LLMs with downstream data for better capability transfer has become the mainstream of LLM applications, where Parameter-Efficient Fine-Tuning (PEFT) methods play the most important role. Considering the core architecture of LLMs, the transformer block, existing PEFT methods focus on using limited data to fine-tune a small number of parameters of only key components, such as self-attention and feed-forward networks. They have achieved impressive performance, with representative works being the Low-Rank Adapter (LoRA) and its variants (e.g., AdaLoRA, GLoRA). However, existing PEFT methods still suffer from severe shortcomings: sensitivity to the selection of hyper-parameters (e.g., ranks, scales, etc.) and sensitivity to the initialization of low-rank factors. Inappropriate settings lead to overfitting or underfitting problems when tuning LLMs, resulting in unstable fine-tuning performance. Meanwhile, searching for the optimal hyper-parameters is resource-intensive and experience-dependent. To this end, in this paper, we propose a novel PEFT method, SpecAdapt, which can adapt to various scenarios without sophisticated hyper-parameter tuning. Specifically, to tackle the hyper-parameter sensitivity problem, we design a Singular-guided Weight Decay strategy to control the complexity of fine-tuned parameters. For stable fine-tuning of LLMs, we develop a simple but effective Gradient Normalization module to improve tuning stability. Extensive experiments on multiple transformer-based pre-trained large models across various benchmarks (i.e., two image benchmarks and one language benchmark) demonstrate the superiority of our proposed SpecAdapt (achieving 75.6% average accuracy and outperforming the state-of-the-art methods with fixed hyper-parameters across 19 datasets). We also release the code to support the community.
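Of the two components above, gradient normalization is the simplest to illustrate: rescale a gradient to unit L2 norm so that update magnitudes stay stable regardless of scale. The sketch below is a generic version of this common technique under assumed names; the abstract does not specify SpecAdapt's actual module.

```python
import math

def normalize_gradient(grad, eps=1e-8):
    """Illustrative gradient normalization: rescale a gradient vector
    to (approximately) unit L2 norm so update magnitudes are stable.
    `grad` is a plain list of floats; names are hypothetical."""
    norm = math.sqrt(sum(g * g for g in grad))
    return [g / (norm + eps) for g in grad]
```

In practice such a step would be applied per-parameter-group inside the optimizer loop, decoupling the update direction from the raw gradient scale.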
Knowledge graph completion aims to predict missing factual triples in knowledge graphs, thereby enhancing their completeness. Recent studies have significantly improved the performance of knowledge graph completion by integrating multi-modal information into knowledge graph representation learning. However, two major challenges remain: first, how to effectively align and integrate embeddings from structural, visual, and textual modalities to improve the quality of entity representations; second, how to strengthen the connections among head entities, relations, and tail entities in correct triples, making their associations more cohesive and thereby more clearly distinguishing correct from incorrect triples. To address these challenges, we propose a Dual-level Contrastive Learning model (DualCL) for multi-modal knowledge graph completion. Specifically, our model consists of two levels of contrastive learning. (1) At the entity level, we employ a multi-modal contrastive representation method to align the structural, visual, and textual information of the same entity into a shared embedding space, ensuring semantic consistency across modalities for more effective multi-modal information integration. (2) At the triple level, we enhance the semantic associations among head entities, relations, and tail entities in correct triples through contrastive learning, while optimizing the model’s ability to distinguish between different “entity-relation-entity” combinations. Experimental results demonstrate that our method outperforms recent strong baseline models on multiple link prediction datasets, thereby validating its effectiveness and advantages in knowledge graph completion.
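The entity-level alignment described above is typically realized with an InfoNCE-style contrastive loss: embeddings of the same entity from different modalities are pulled together, others pushed apart. The following is a minimal sketch of that standard loss under assumed inputs, not DualCL's published objective.

```python
import math

def info_nce(sim_row, pos_index, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor embedding.
    `sim_row` holds similarities to candidate embeddings; `pos_index`
    marks the matching modality embedding of the same entity.
    Temperature and names are illustrative assumptions."""
    logits = [s / temperature for s in sim_row]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[pos_index] / sum(exps))
```

Minimizing this over anchors from each modality drives the structural, visual, and textual views of an entity toward a shared embedding space.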
Traditional source separation methods rely on coarse-grained categorical labels, labeling all vocals collectively without distinguishing individual voices in an audio mixture, which inherently limits the ability to isolate single tracks. While fine-grained annotations could partially address this issue, they demand substantial resources and face challenges in extracting tracks from raw signals. To overcome these limitations, we propose to extract each track by decomposing the patterns of data generation. Specifically, we propose Variational Stochastic Dirichlet Process-VAE, a variational autoencoder framework that replaces the standard variational distribution with a variational stochastic Dirichlet process (VSDP). Within our proposed framework, the encoder, leveraging stick-breaking constructions, adaptively partitions the latent space into clusters, while the decoder, designed to recover each component, achieves implicit signal separation. Its advantage is that the reconstruction target can be shifted from the raw input to its individual components. Experiments demonstrate our method’s efficacy in two scenarios: (1) under coarse-grained source definitions, it reaches near-state-of-the-art performance (SDR=10.3); (2) for fine-grained track separation, it identifies 83% of individual vocal tracks with an average SDR of 7.8, a result other SOTA methods cannot obtain without the help of annotations.
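The stick-breaking construction mentioned above is a standard way to turn a sequence of Beta-distributed fractions into mixture weights: each fraction takes a piece of the remaining "stick". The sketch below shows only this generic construction, not the paper's full VSDP encoder.

```python
def stick_breaking_weights(vs):
    """Convert fractions v_k in (0, 1) into mixture weights via stick
    breaking: pi_k = v_k * prod_{j<k} (1 - v_j). In a VSDP-style
    encoder the v_k would be sampled from learned Beta distributions;
    here they are plain floats for illustration."""
    weights, remaining = [], 1.0
    for v in vs:
        weights.append(v * remaining)  # break off a piece of the stick
        remaining *= (1.0 - v)
    weights.append(remaining)  # mass left on the stick (tail cluster)
    return weights
```

Because each break depends on what remains, the weights always sum to one, and the number of effectively used clusters adapts to the data.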
Real-world data often exhibit a long-tail class distribution, where a small subset of classes dominates the majority of the training samples, while the remaining classes suffer from severe data scarcity. Long-tail learning (LTL) aims to tackle this extreme data imbalance problem and improve generalization across both head and tail classes. Although re-sampling offers a straightforward solution to mitigate class imbalance, prior research has empirically shown its limited effectiveness in modern long-tail learning tasks. To overcome this limitation, we propose Context-Aware RE-sampling (CARE), a novel framework that leverages large pre-trained models to suppress irrelevant contexts as well as enrich the diversity of the training data. Specifically, CARE introduces multiple practical implementations: CARE-DS, which integrates DINO and SAM to segment and transplant objects across images, generating diverse samples while preserving semantic consistency, and CARE-DM, which utilizes diffusion models to synthesize contextually diverse samples conditioned on original images and textual prompts. Extensive experiments demonstrate that CARE effectively mitigates performance deterioration for both head and tail classes, achieving significant generalization improvements over conventional re-sampling methods.
With the advancement of machine learning, domain adaptation has become increasingly important. Traditional research in domain adaptation has primarily focused on Unsupervised Domain Adaptation (UDA) and Semi-Supervised Domain Adaptation (SSDA). However, in many practical applications, it is common to encounter scenarios where both domains have labeled and unlabeled samples, which complicates the handling of domain adaptation. The scarcity of solutions to these scenarios further underscores the necessity of developing new methods to effectively exploit the labeled and unlabeled samples. This paper proposes the problem of Bi-directional Semi-Supervised Domain Adaptation (BiSSDA) and a method of Gradient discrepancy minimization and labeled Class Centroid Align (GCCA) to address this problem. In GCCA, labeled and unlabeled samples from both domains are passed through a generator and two classifiers; the generator is trained adversarially against the two classifiers, and the two domains are aligned via gradient and class centroid alignment. Extensive experiments on three widely used datasets demonstrate that GCCA significantly outperforms CGDM and several previous SSDA methods in exploiting the labeled and unlabeled samples in both domains, and significantly reduces the reliance on labeled data in bi-directional domain adaptation through cooperation between the two domains. The code of the proposed method is available at gitee.com/ymw12345/gcca.
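A common way to measure the gradient discrepancy that GCCA-style methods minimize is the cosine distance between the two classifiers' gradients: when the distance is zero, both classifiers agree on the update direction. The sketch below illustrates that generic measure under assumed inputs; it is not the paper's exact formulation.

```python
import math

def gradient_discrepancy(g1, g2, eps=1e-12):
    """Cosine distance between two gradient vectors (plain lists of
    floats). Minimizing this aligns the classifiers' update directions;
    all names here are illustrative."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    return 1.0 - dot / (n1 * n2 + eps)
```

The distance is 0 for parallel gradients, 1 for orthogonal ones, and 2 for opposing ones, making it a convenient alignment penalty.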
Improving the general capabilities of Large Language Models (LLMs) is an active research topic. As a common data structure in many real-world domains, graph data must be understood as a crucial part of advancing general intelligence. To this end, we propose a dynamic benchmark named GraphInstruct in this paper, which comprehensively includes 21 classical graph reasoning tasks, providing diverse graph generation pipelines and detailed intermediate reasoning steps for each sample. Based on GraphInstruct, we develop GraphSolver via efficient instruction-tuning, which demonstrates prominent graph understanding capability compared to other open-sourced LLMs. To further endow LLMs with multi-step graph reasoning capability, we propose a label-mask training strategy and build GraphSolver+, which leverages masked supervision on intermediate reasoning tokens to emphasize crucial node-identification signals. As one of the pioneering efforts to enhance the graph understanding and reasoning abilities of LLMs, our work is supported by extensive experiments demonstrating the superiority of GraphSolver and GraphSolver+ over other LLMs. We sincerely hope GraphInstruct will facilitate further research on applying LLMs to graph-structured data. Our code and data are released publicly at github.com/CGCL-codes/GraphInstruct.
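The label-mask training strategy above can be pictured as computing the language-modeling loss only over positions flagged as crucial (e.g., node-identification tokens) while masking out the rest. The sketch below shows that generic masking pattern on plain per-token losses; the names and granularity are assumptions, not GraphSolver+'s actual implementation.

```python
def masked_token_loss(token_losses, mask):
    """Average per-token losses only over positions whose mask entry is
    truthy (e.g., crucial node-identification tokens); all other
    positions contribute nothing. Purely illustrative."""
    kept = [l for l, m in zip(token_losses, mask) if m]
    return sum(kept) / max(len(kept), 1)
```

Concentrating supervision this way lets the model allocate learning signal to the reasoning steps that matter most, rather than spreading it uniformly over every token.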