The boom in artificial intelligence (AI) agents has brought promising business scenarios for sixth-generation (6G) mobile networks, while simultaneously posing significant challenges to network functionalities and infrastructure. These AI agents can be deployed on end devices (e.g., intelligent robots and intelligent cars) or as digital entities (e.g., personal AI assistants). As novel service entities with autonomous decision-making and task execution capabilities, AI agents introduce potential risks of uncontrollable actions and privacy disclosure. AI agents also require new 6G capabilities beyond traditional communication, including multimodal information interaction (e.g., AI models and tokens) and support for new service requirements (e.g., computing and sensing of data). In this article, we introduce the concept of the AI-agent communication network (ACN), a new paradigm that enables global information interaction and on-demand capability provisioning for single or multiple AI agents. We first introduce the vision and architectural framework of ACN. Then, key technologies and future research directions related to ACN are discussed. Furthermore, we provide potential use cases to elaborate on how ACN can expand the service capabilities of 6G networks.
Inflight broadband connectivity (commonly termed inflight connectivity) can be considered one of the remaining milestones for ubiquitous Internet provision; therefore, several enabling technologies are being investigated to provide high-capacity, reliable, and affordable Internet access. Multiple-input multiple-output (MIMO), based on the space-time processing (STP) concept, is one of the dominant technologies that consistently appear on the list of inflight connectivity (IFC) enablers. STP shows the potential to significantly increase user throughput, improve spectral/energy efficiencies, and increase the capacity as well as reliability of airborne networks through spatial multiplexing/diversity techniques. This article presents the preliminary outcomes of substantial research on STP techniques for enabling IFC, as the exploratory study on this topic is still in its early stages. We explore the theoretical principles behind different STP techniques and their implementation in airborne networks in direct air-to-ground (A2G) scenarios for the provision of reliable and high-speed IFC. We also analyze the current technologies and techniques used for IFC and highlight their benefits and limitations. We present a comprehensive review that compares different STP techniques using metrics such as bit error rate (BER), spectral efficiency (SE), and capacity. Last but not least, we discuss the substantial research challenges encountered and the prospective future research avenues that require special attention for enhancing the deployment of STP systems in forthcoming airborne networks, particularly for enabling IFC. Overall, this research study contributes to the body of knowledge by providing insights into the use of STP techniques in airborne networks for enabling IFC. It emphasizes the theoretical foundations, presents a literature review, discusses challenges and limitations, identifies potential areas for future research, and provides a performance analysis.
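As a minimal illustration of why spatial multiplexing raises spectral efficiency, the sketch below compares an idealized two-stream MIMO link against a single stream carrying the same total power. The function name and the assumption of perfectly decoupled, equal-SNR eigen-streams are ours, not the article's.

```python
import math

def mimo_capacity_bps_hz(per_stream_snrs):
    """Sum spectral efficiency (bit/s/Hz) of ideal parallel spatial
    streams, C = sum_i log2(1 + SNR_i).  Assumes the MIMO channel has
    been decomposed into independent eigen-streams (no cross-talk)."""
    return sum(math.log2(1.0 + snr) for snr in per_stream_snrs)

# A 2x2 spatial-multiplexing link with 10 dB SNR (linear 10.0) per stream
snr_linear = 10.0
two_stream = mimo_capacity_bps_hz([snr_linear, snr_linear])

# A SISO link with the same total power concentrated on one stream
one_stream = mimo_capacity_bps_hz([2.0 * snr_linear])

print(f"2-stream: {two_stream:.2f} bit/s/Hz, 1-stream: {one_stream:.2f} bit/s/Hz")
```

At moderate-to-high SNR the two-stream link wins because capacity grows only logarithmically in power but linearly in stream count.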
This paper investigates the privacy protection problem of multi-agent systems under cooperative-competitive networks. A node decomposition strategy is used to protect the privacy of the initial node values, in which a node v_i is split into n_i subnodes. By designing inter-node weights, the initial value of each node is protected from honest-but-curious nodes and eavesdroppers without relying on external algorithms. The goal is to design a privacy-preserving consensus algorithm such that privacy is guaranteed by the node decomposition strategy while bipartite consensus is achieved for the cooperative-competitive multi-agent systems. Two numerical simulations are given to validate the effectiveness of the proposed privacy-preserving bipartite consensus algorithm.
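The node decomposition idea can be sketched as follows: each initial value is split into random shares that sum to it, and plain average consensus runs over the subnodes, so no single share reveals the original value. The signed cooperative-competitive weights of the actual algorithm are omitted, and all names and parameters here are illustrative.

```python
import random

def decompose(value, n_sub, rng):
    """Split one node's initial value into n_sub random shares that sum
    to the original value, so no single share reveals it."""
    offsets = [rng.uniform(-10, 10) for _ in range(n_sub - 1)]
    return offsets + [value - sum(offsets)]

def average_consensus(values, steps=200, eps=0.05):
    """Plain average consensus on a complete graph (the paper's signed
    cooperative-competitive weights are omitted for simplicity)."""
    x = list(values)
    n = len(x)
    for _ in range(steps):
        x = [xi + eps * (sum(x) / n - xi) for xi in x]
    return x

rng = random.Random(0)
initial = [3.0, -1.0, 4.0]
# Each of the 3 nodes is split into 3 subnodes (9 shares in total)
shares = [s for v in initial for s in decompose(v, 3, rng)]
final = average_consensus(shares)
recovered = final[0] * 3          # consensus value x n_sub recovers the average
true_mean = sum(initial) / len(initial)
```

The consensus value over the subnodes, scaled by the number of shares per node, recovers the average of the original values, while each individual share is a random number.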
Load frequency control (LFC) is usually managed by traditional proportional-integral-derivative (PID) controllers. Recently, deep reinforcement learning (DRL)-based adaptive controllers have been widely studied for their superior performance. However, DRL-based adaptive controllers exhibit inherent vulnerability to adversarial attacks. To develop more robust control systems, this study conducts a deep analysis of DRL-based adaptive controller vulnerability under adversarial attacks. First, an adaptive controller is developed based on the DRL algorithm. Subsequently, considering the limited capability of attackers, the DRL-based LFC is evaluated under adversarial attacks crafted using the zeroth-order optimization (ZOO) method. Finally, we use adversarial training to enhance the robustness of DRL-based adaptive controllers. Extensive simulations are conducted to evaluate the performance of the DRL-based PID controller with and without adversarial attacks.
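A minimal sketch of the zeroth-order gradient estimation underlying ZOO-style black-box attacks, assuming symmetric finite differences over an opaque loss function (the actual attack adds coordinate sampling and many other refinements):

```python
def zoo_gradient(f, x, h=1e-4):
    """Zeroth-order (finite-difference) gradient estimate of a black-box
    loss f at point x: g_i ~ (f(x + h*e_i) - f(x - h*e_i)) / (2h).
    Only function evaluations are needed, no internal model access."""
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

# Black-box quadratic: f(x) = x0^2 + 3*x1^2, true gradient (2*x0, 6*x1)
f = lambda x: x[0] ** 2 + 3 * x[1] ** 2
g = zoo_gradient(f, [1.0, -2.0])
```

The estimate matches the analytic gradient to numerical precision, which is exactly what lets an attacker perturb inputs without white-box access to the controller.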
Medical image segmentation is critical for clinical diagnosis, but the scarcity of annotated data limits robust model training, making few-shot learning indispensable. Existing methods often suffer from two issues: performance degradation due to significant inter-class variations in pathological structures, and overreliance on attention mechanisms with high computational complexity (O(n²)), which hinders the efficient modeling of long-range dependencies. In contrast, the state space model (SSM) offers linear complexity (O(n)) and superior efficiency, making it a key solution. To address these challenges, we propose PPFFR (parallel prototype filter and feature refinement) for few-shot medical image segmentation. The proposed framework comprises three key modules. First, we propose the prototype refinement (PR) module to construct refined class subgraphs from encoder-extracted features of both support and query images, which generates support prototypes with minimized inter-class variation. We then propose the parallel prototype filter (PPF) module to suppress background interference and enhance the correlation between support and query prototypes. Finally, we implement the feature refinement (FR) module to further enhance segmentation accuracy and accelerate model convergence with SSM's robust long-range dependency modeling capability, integrated with multi-head attention (MHA) to preserve spatial details. Experimental results on the Abd-MRI dataset demonstrate that FR with MHA outperforms FR alone in segmenting the left kidney, right kidney, liver, and spleen, and in terms of mean accuracy, confirming MHA's role in improving precision. In extensive experiments conducted on three public datasets under the 1-way 1-shot setting, PPFFR achieves Dice scores of 87.62%, 86.74%, and 79.71%, respectively, consistently surpassing state-of-the-art few-shot medical image segmentation methods. As the critical component, SSM ensures that PPFFR balances performance with efficiency.
Ablation studies validate the effectiveness of the PR, PPF, and FR modules. The results indicate that explicit inter-class variation reduction and SSM-based feature refinement can enhance accuracy without heavy computational overhead. In conclusion, PPFFR effectively enhances inter-class consistency and computational efficiency for few-shot medical image segmentation. This work provides insights for few-shot learning in medical imaging and inspires lightweight architecture designs for clinical deployment.
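The linear-time advantage of an SSM over quadratic attention comes from a simple sequential recurrence: each step touches the sequence once. A scalar toy version, with illustrative coefficients of our choosing:

```python
def ssm_scan(xs, a=0.9, b=0.5, c=1.0):
    """Scalar state-space recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
    One pass over the sequence: O(n) time and O(1) state, in contrast to
    the O(n^2) pairwise interactions of self-attention."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

# An impulse input decays geometrically through the state
ys = ssm_scan([1.0, 0.0, 0.0])
```

Real SSM layers use learned (often diagonal) matrices and parallel scan tricks, but the per-token cost structure is the same as in this sketch.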
Turbulence, a complex multi-scale phenomenon inherent in fluid flow systems, presents critical challenges and opportunities for understanding physical mechanisms across scientific and engineering domains. Although high-resolution (HR) turbulence data remain indispensable for advancing both theoretical insights and engineering solutions, their acquisition is severely limited by prohibitively high computational costs. While deep learning architectures show transformative potential in reconstructing high-fidelity flow representations from sparse measurements, current methodologies suffer from two inherent constraints: strict reliance on perfectly paired training data and inability to perform multi-scale reconstruction within a unified framework. To address these challenges, we propose HADF, a hash-adaptive dynamic fusion implicit network for turbulence reconstruction. Specifically, we develop a low-resolution (LR) consistency loss that facilitates effective model training under conditions of missing paired data, eliminating the conventional requirement for fully matched LR and HR datasets. We further employ hash-adaptive spatial encoding and dynamic feature fusion to extract turbulence features, mapping them with implicit neural representations for reconstruction at arbitrary resolutions. Experimental results demonstrate that HADF achieves superior performance in global reconstruction accuracy and local physical properties compared to state-of-the-art models. It precisely recovers fine turbulence details for partially unpaired data conditions and diverse resolutions by training only once while maintaining robustness against noise.
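The LR consistency idea can be sketched as comparing the downsampled HR prediction against the available LR field, so no paired HR ground truth is needed. The average-pooling operator and MSE form below are our simplifying assumptions, not the paper's exact loss.

```python
def downsample(field, k):
    """k x k average pooling of a square 2D field (list of rows)."""
    n = len(field)
    return [[sum(field[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
             for j in range(0, n, k)]
            for i in range(0, n, k)]

def lr_consistency_loss(hr_pred, lr_obs, k):
    """MSE between the downsampled HR prediction and the observed LR
    field -- trainable without any paired HR ground truth."""
    ds = downsample(hr_pred, k)
    n = len(lr_obs)
    return sum((ds[i][j] - lr_obs[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

hr = [[1.0] * 4 for _ in range(4)]   # a constant 4x4 "reconstruction"
lr = [[1.0] * 2 for _ in range(2)]   # the matching 2x2 observation
loss = lr_consistency_loss(hr, lr, 2)
```

A reconstruction consistent with the LR observation incurs zero loss; any HR output whose coarse average drifts from the measurement is penalized.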
Large language models (LLMs) excel in multilingual translation tasks, yet often struggle with culturally and semantically rich Chinese texts. This study introduces the framework of back-translation (BT) powered by LLMs, or LLM-BT, to evaluate Chinese → intermediate language → Chinese translation quality across five LLMs and three traditional systems. We construct a diverse corpus containing scientific abstracts, historical paradoxes, and literary metaphors, reflecting the complexity of Chinese at the lexical and semantic levels. Using our modular NLPMetrics system, including bilingual evaluation understudy (BLEU), character F-score (CHRF), translation edit rate (TER), and semantic similarity (SS), we find that LLMs outperform traditional tools in cultural and literary tasks. However, the results of this study uncover a high-dimensional behavioral phenomenon, the paradox of poetic intent, where surface fluency is preserved, but metaphorical or emotional depth is lost. Additionally, some models exhibit verbatim BT, suggesting a form of data-driven quasi-self-awareness, particularly under repeated or cross-model evaluation. To address BLEU's limitations for Chinese, we propose a Jieba-segmentation BLEU variant that incorporates word-frequency and n-gram weighting, improving sensitivity to lexical segmentation and term consistency. Supplementary tests show that in certain semantic dimensions, LLM outputs approach the fidelity of human poetic translations, despite lacking a deeper metaphorical intent. Overall, this study reframes traditional fidelity vs. fluency evaluation into a richer, multi-layered analysis of LLM behavior, offering a transparent framework that contributes to explainable artificial intelligence and identifies new research pathways in cultural natural language processing and multilingual LLM alignment.
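A minimal sentence-level BLEU over pre-segmented token lists (as Jieba would produce for Chinese) illustrates the clipped n-gram precisions and brevity penalty that the proposed variant builds on; the paper's word-frequency weighting is not reproduced here, and uniform n-gram weights are assumed.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU on token lists: geometric mean of clipped
    n-gram precisions times a brevity penalty.  A tiny floor keeps the
    log defined when an n-gram order has zero matches."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Identical segmented sentences score 1.0; a truncated one scores lower
score = bleu(["我", "爱", "自然", "语言"], ["我", "爱", "自然", "语言"])
```

Because BLEU operates on whatever tokens it is given, the segmentation step (here assumed done by Jieba) directly determines its sensitivity for Chinese.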
Video large language models (video-LLMs) have demonstrated impressive capabilities in multimodal understanding, but their potential as zero-shot evaluators for temporal consistency in video captions remains underexplored. Existing methods notably underperform in detecting critical temporal errors, such as missing, hallucinated, or misordered actions. To address this gap, we introduce two key contributions. (1) TimeJudge: a novel zero-shot framework that recasts temporal error detection as answering calibrated binary question pairs. It incorporates modality-sensitive confidence calibration and uses consistency-weighted voting for robust prediction aggregation. (2) TEDBench: a rigorously constructed benchmark featuring videos across four distinct complexity levels, specifically designed with fine-grained temporal error annotations to evaluate video-LLM performance on this task. Through a comprehensive evaluation of multiple state-of-the-art video-LLMs on TEDBench, we demonstrate that TimeJudge consistently yields substantial gains in terms of recall and F1-score without requiring any task-specific fine-tuning. Our approach provides a generalizable, scalable, and training-free solution for enhancing the temporal error detection capabilities of video-LLMs.
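The consistency-weighted voting step can be sketched as a confidence-weighted sign aggregation over repeated binary answers; the calibration procedure itself is not modeled here, and the numbers are illustrative.

```python
def consistency_weighted_vote(answers):
    """Aggregate (prediction, confidence) pairs from repeated binary
    queries: each vote contributes its calibrated confidence with a
    sign, and the sign of the weighted sum is the final yes/no."""
    score = sum(conf if pred else -conf for pred, conf in answers)
    return score > 0

# Three probes of "does action A precede action B?": two confident
# yeses outweigh one hesitant no.
votes = [(True, 0.9), (True, 0.8), (False, 0.4)]
decision = consistency_weighted_vote(votes)
```

Weighting by confidence rather than counting raw votes lets a single well-calibrated, high-confidence answer override several uncertain ones.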
AutoDock Vina (Vina) is a widely adopted molecular docking tool, often regarded as a standard or used as a baseline in numerous studies. However, its computational process is highly time-consuming. The pioneering field-programmable gate array (FPGA)-based accelerator of Vina, known as Vina-FPGA, offers an energy-efficient approach to speeding up the docking process. However, the computation modules in the Vina-FPGA design are not efficiently used. This is because Vina exhibits irregular behaviors in the form of nested loops with changing upper bounds and differing control flows. Fortunately, Vina employs the Monte Carlo iterative search method, which requires independent computations for different random initial inputs. This characteristic provides an opportunity to implement further parallel computation designs. To this end, this paper proposes Vina-FPGA2, an inter-module pipeline design for further accelerating Vina-FPGA. First, we exploit the independence of individual computational tasks (Tasks) by sequentially filling Tasks into computation modules. Then, we implement an inter-module pipeline parallel design via a Tag Checker module and architectural modifications, named Vina-FPGA2-Baseline. Next, to achieve a resource-efficient hardware implementation, we cast it as an optimization problem and develop a reinforcement learning-based solver. Targeting the Xilinx UltraScale XCKU060 platform, this solver yields a more efficient implementation, named Vina-FPGA2-Enhanced. Finally, experiments show that Vina-FPGA2-Enhanced achieves an average 12.6× performance improvement over the central processing unit (CPU) and a 3.3× improvement over Vina-FPGA. Compared to Vina-GPU, Vina-FPGA2 achieves a 7.2× enhancement in energy efficiency.
Track-to-track association (T2TA), which aims at unifying track batch numbers and reducing track redundancy, serves as a precondition and foundation for track fusion and situation awareness. The current problems of T2TA come mainly from two sources: track data and association methods. Ubiquitous problems include errors and inconsistent update periods in track data, as well as suboptimal association results and dependencies on prior information and assumed motion models for association methods. Focusing on these two aspects, we propose a multiple-hypothesis algorithm for multi-sensor T2TA with an intelligent track score (MH-T2TA). A spatial-temporal registration module is designed based on self-attention and a contrastive learning architecture to eliminate errors and unify the distributions of asynchronous tracks. A multiple-hypothesis algorithm is combined with deep learning to estimate the association score of a pair of tracks without relying on prior information or assumed motion models, and the optimal association pairs can be obtained. With three kinds of loss functions, tracks coming from the same targets become closer, tracks coming from different targets become more distant, and the estimated track scores are very similar to the real ones. Experimental results demonstrate that the proposed MH-T2TA can associate tracks in complex scenarios and outperform other T2TA methods.
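As a toy version of track-pair scoring and assignment, the sketch below greedily accepts the lowest-distance track pairs under a gate. The paper's learned, multiple-hypothesis scoring is far richer; all data and the gate value here are made up.

```python
def associate_tracks(tracks_a, tracks_b, gate=5.0):
    """Greedy T2TA sketch: score each cross-sensor track pair by mean
    point-wise distance, then accept the best-scoring pairs one at a
    time, never reusing a track, and only if the score passes the gate."""
    def score(ta, tb):
        return sum(abs(x - y) for x, y in zip(ta, tb)) / len(ta)

    pairs = sorted((score(a, b), i, j)
                   for i, a in enumerate(tracks_a)
                   for j, b in enumerate(tracks_b))
    used_a, used_b, assoc = set(), set(), []
    for s, i, j in pairs:
        if s < gate and i not in used_a and j not in used_b:
            used_a.add(i); used_b.add(j)
            assoc.append((i, j))
    return sorted(assoc)

# Two sensors report the same two targets in swapped order
a = [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
b = [[10.2, 11.1, 12.0], [0.1, 0.9, 2.1]]
pairs = associate_tracks(a, b)
```

Greedy gating like this is a common baseline; the multiple-hypothesis formulation instead keeps competing assignments alive and scores them jointly.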
Robotic navigation in unknown environments is challenging due to the lack of high-definition maps. Building maps in real time requires significant computational resources. Nevertheless, sensor data can provide sufficient environmental context for robot navigation. This paper presents an interpretable and mapless navigation method using only two-dimensional (2D) light detection and ranging (LiDAR), mimicking human strategies for escaping dead ends. Unlike traditional planners, which depend on global paths, or vision-based and learning-based methods, which require heavy data and hardware, our approach is lightweight and robust, and it requires no prior map. It effectively suppresses oscillations and enables autonomous recovery from local minimum traps. Experiments across diverse environments and routes, including ablation studies and comparisons with existing frameworks, show that the proposed method achieves map-like performance without a map, reducing the average path length by 50.51% compared to the classical mapless Bug2 algorithm and increasing it by only 17.57% compared to map-based navigation.
The field of artificial intelligence (AI) in quantitative investment has seen significant advancements, yet it lacks a standardized benchmark aligned with industry practices. This gap hinders research progress and limits the practical application of academic innovations. We present QuantBench, an industrial-grade benchmark platform designed to address this critical need. QuantBench offers three key strengths: (1) standardization that aligns with quantitative investment industry practices; (2) flexibility to integrate various AI algorithms; (3) full-pipeline coverage of the entire quantitative investment process. Our empirical studies using QuantBench reveal some critical research directions, including the need for continual learning to address distribution shifts, improved methods for modeling relational financial data, and more robust approaches to mitigate overfitting in low signal-to-noise environments. By providing a common ground for evaluation and fostering collaboration between researchers and practitioners, QuantBench aims to accelerate progress in AI for quantitative investment, similar to the impact of benchmark platforms in computer vision and natural language processing. The code is open-sourced on GitHub at https://github.com/SaizhuoWang/quantbench.
Large language models (LLMs) have been applied across various domains due to their superior natural language processing and generation capabilities. Nonetheless, LLMs occasionally generate content that contradicts real-world facts, known as hallucinations, posing significant challenges for real-world applications. To enhance the reliability of LLMs, it is imperative to detect hallucinations within LLM generations. Approaches that retrieve external knowledge or inspect the internal states of the model are frequently used to detect hallucinations; however, this requires either white-box access to the LLM or reliable expert knowledge resources, raising a high barrier for end-users. To address these challenges, we propose a black-box zero-resource approach for detecting LLM hallucinations, which primarily leverages multi-perspective consistency checking. The proposed approach mitigates the LLM overconfidence phenomenon by integrating multi-perspective consistency scores from both queries and responses. In comparison to the single-perspective detection approach, our proposed approach demonstrates superior performance in detecting hallucinations across multiple datasets and LLMs. Notably, in one experiment, where the hallucination rate reaches 94.7%, our approach improves the balanced accuracy (B-ACC) by 2.3 percentage points compared with the single consistency approach and achieves an area under the curve (AUC) of 0.832, all without depending on any external resources.
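Balanced accuracy is the natural metric under such extreme class skew, where plain accuracy is dominated by the majority class. A minimal computation, with hypothetical confusion counts chosen only to mimic a set with a 94.7% hallucination rate:

```python
def balanced_accuracy(tp, fn, tn, fp):
    """B-ACC = (sensitivity + specificity) / 2.  Unlike raw accuracy,
    it is not inflated when one class (here, hallucinations) dominates."""
    sensitivity = tp / (tp + fn)   # hallucinations correctly flagged
    specificity = tn / (tn + fp)   # faithful answers correctly passed
    return (sensitivity + specificity) / 2

# Hypothetical counts on a skewed set: 947 hallucinated / 53 faithful
bacc = balanced_accuracy(tp=850, fn=97, tn=40, fp=13)
```

A detector that blindly labels everything as hallucinated would reach 94.7% raw accuracy on such a set but only 0.5 B-ACC, which is why the paper reports the latter.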
Recently, audio-visual speech recognition (AVSR) has attracted increasing attention. However, most existing works simplify the complex challenges of real-world applications and only focus on scenarios with two speakers and perfectly aligned audio-video clips. In this work, we study the effect of speaker number and modal misalignment on the AVSR task, and propose an end-to-end AVSR framework for this more realistic condition. Specifically, we propose a speaker-number-aware mixture-of-experts (SA-MoE) mechanism to explicitly model the characteristic differences across scenarios with different speaker numbers, and a cross-modal realignment (CMR) module for robust handling of asynchronous inputs. We also exploit the underlying differences in sample difficulty and introduce a new training strategy named challenge-based curriculum learning (CBCL), which forces the model to focus on challenging data instead of simple data to improve training efficiency.
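Challenge-biased sampling can be sketched as re-weighting training examples by a power of their difficulty score, so harder samples are drawn more often. The exponent and scores below are illustrative; the actual CBCL schedule is more elaborate.

```python
def curriculum_weights(difficulties, focus=2.0):
    """Challenge-biased sampling sketch: raise per-sample difficulty
    scores to a power and normalize into sampling probabilities.
    Larger `focus` concentrates training on the hardest samples."""
    raised = [d ** focus for d in difficulties]
    total = sum(raised)
    return [r / total for r in raised]

# Three samples of increasing difficulty
w = curriculum_weights([0.1, 0.5, 0.9])
```

With `focus=2.0`, the hardest sample receives roughly three quarters of the sampling mass, mimicking the "focus on challenging data" objective.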
Information-theoretic principles provide a rigorous foundation for adaptive radar waveform design in contested and dynamically varying environments. This paper addresses the joint optimization of constant modulus waveforms to enhance both target detection and parameter estimation concurrently. A unified design framework is developed by maximizing a mutual information upper bound (MIUB), which intrinsically reconciles the tradeoff between detection sensitivity and estimation accuracy without heuristic weighting. Realistic, potentially non-Gaussian statistics of target and clutter returns are modeled using Gaussian mixture distributions (GMDs), enabling tractable closed-form approximations of the MIUB's Kullback-Leibler divergence and mutual information components. To tackle the ensuing non-convex optimization, a tailored metaheuristic phase-coded dream optimization algorithm (PC-DOA) is proposed, incorporating hybrid initialization and adaptive exploration-exploitation mechanisms for efficient phase-space search. Numerical results substantiate the proposed approach's superiority in achieving favorable detection-estimation trade-offs over existing benchmarks.
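The GMD-based approximations build on the closed-form KL divergence between individual Gaussian components. A minimal univariate version of that closed form:

```python
import math

def kl_gaussian(mu0, var0, mu1, var1):
    """Closed-form KL divergence D(N(mu0,var0) || N(mu1,var1)) =
    0.5 * (var0/var1 + (mu1-mu0)^2/var1 - 1 + ln(var1/var0)).
    Mixture-level KL has no closed form; GMD designs approximate it
    from per-component terms like this one."""
    return 0.5 * (var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0
                  + math.log(var1 / var0))

d_same = kl_gaussian(0.0, 1.0, 0.0, 1.0)   # identical components: 0
d_diff = kl_gaussian(0.0, 1.0, 2.0, 1.0)   # mean shift of 2: 0.5 * 4
```

The divergence vanishes for identical components and grows quadratically with the mean separation, which is what makes it a usable detection-sensitivity surrogate.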
The coexistence of ultra-reliable low-latency communication (URLLC) and enhanced mobile broadband (eMBB) services in 5G-based industrial wireless networks (IWNs) poses significant resource slicing challenges due to their inherent performance requirement conflicts. To address this challenge, this paper proposes a puncturing method that uses a model-aided deep reinforcement learning (DRL) algorithm for URLLC over eMBB services in uplink 5G networks. First, a puncturing-based optimization problem is formulated to maximize the eMBB accumulated rate under strict URLLC latency and reliability constraints. Next, we design a random repetition coding-based contention (RRCC) scheme for sporadic URLLC traffic and derive its analytical reliability model. To jointly optimize the scheduling parameters of URLLC and eMBB, a DRL solution based on the reliability model is developed, which is capable of dynamically adapting to changing environments. The accelerated convergence of the model-aided DRL algorithm is demonstrated using simulations, and the superiority in resource efficiency of the proposed method over existing approaches is validated.
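The benefit of random repetition can be sketched with the simplest reliability model: if each transmitted copy fails independently with probability p, the packet is lost only when all k copies fail. Independence across copies is an idealization of the paper's full contention analysis.

```python
def repetition_reliability(p_single_fail, k):
    """With k independent random repetitions, the packet is lost only if
    every copy fails: R = 1 - p^k.  (Copy-failure independence is an
    idealization of the RRCC contention model.)"""
    return 1.0 - p_single_fail ** k

r1 = repetition_reliability(0.1, 1)   # single shot: 90% reliable
r4 = repetition_reliability(0.1, 4)   # four copies: "four nines"
```

This exponential improvement per repetition is why a modest repetition factor can meet URLLC reliability targets, at the cost of extra punctured eMBB resources.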
This study addresses the challenges of near-field interference suppression and resource allocation in extremely large-scale multiple-input multiple-output (XL-MIMO) communication systems, particularly under dense-user scenarios. We propose a quality-of-service (QoS)-aware joint user scheduling and power control scheme. Leveraging the spherical wave (SW) characteristics of near-field channels, a dual-domain interference suppression strategy is developed by analyzing the spatial correlation of beam focusing vectors in terms of both angular separation and distance constraints. Based on this, a spatial correlation-based scheduling (SCS) algorithm is designed. By integrating this user selection strategy with a dynamic power allocation mechanism, the proposed approach optimizes the sum spectral efficiency while ensuring the user QoS. This framework is further extended to modular XL-MIMO systems. We show how modular deployment can enhance spatial resolution and develop an adapted QoS-aware user scheduling algorithm, called modular SCS (SCS-mod), for this architecture. Simulation results validate that the proposed algorithms significantly outperform existing schemes in terms of sum spectral efficiency and the number of scheduled users, especially under high user density and high transmission power conditions.
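The correlation-gated admission at the heart of SCS can be sketched as a greedy pass that admits a user only when its channel is weakly correlated with every already-scheduled user. The threshold and toy real-valued channel vectors below are our assumptions; actual beam focusing vectors are complex and distance-dependent.

```python
def correlation(u, v):
    """Normalized inner-product magnitude of two real channel vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return abs(dot) / norm

def schedule_users(channels, threshold=0.5):
    """Greedy scheduling sketch: admit a user only if its channel is
    weakly correlated (below threshold) with every already-scheduled
    user, suppressing mutual interference among the selected set."""
    selected = []
    for idx, h in enumerate(channels):
        if all(correlation(h, channels[j]) < threshold for j in selected):
            selected.append(idx)
    return selected

# Users 0 and 1 are nearly collinear; user 2 is orthogonal to user 0
chans = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
picked = schedule_users(chans)
```

User 1 is rejected because its channel is almost parallel to user 0's, which is exactly the interference condition the dual-domain strategy screens out.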
Compared with general multi-target tracking filters, this paper focuses on multi-target trajectories in scenarios where the detection probability of the sensor is unknown. In this paper, two trajectory Poisson multi-Bernoulli (TPMB) filters with unknown detection probability are proposed: one for alive trajectories and the other for all trajectories. First, an augmented trajectory state with detection probability is constructed, and then two new state transition models and a new measurement model are proposed. Then, this paper derives the recursion of TPMB filters with unknown detection probability. Furthermore, the detailed beta-Gaussian implementations of TPMB filters for alive trajectories and all trajectories are presented. Finally, simulation results demonstrate that the proposed TPMB filters with unknown detection probability can achieve robust tracking performance and effectively estimate multi-target trajectories.
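The beta part of a beta-Gaussian implementation can be sketched as a conjugate update for the unknown detection probability: detections and missed detections increment the two Beta parameters, and their ratio gives the posterior-mean estimate. The prior and observation sequence below are illustrative.

```python
def beta_update(s, t, detected):
    """Conjugate Beta(s, t) update for an unknown detection probability:
    a detection increments s, a missed detection increments t; the
    posterior mean of p_D is s / (s + t)."""
    return (s + 1, t) if detected else (s, t + 1)

s, t = 1.0, 1.0                 # Beta(1,1) = uniform prior on p_D
for hit in [True, True, True, False]:
    s, t = beta_update(s, t, hit)
p_d_est = s / (s + t)           # posterior mean after 3 hits, 1 miss
```

In the full filter, this Beta factor rides along with the Gaussian kinematic state of each Bernoulli component, so p_D is learned online rather than assumed.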
We present a dynamically reconfigurable spoof surface plasmon polariton (SSPP) waveguide capable of bidirectional switching between perfect absorption and perfect transmission through active control. Nonlinear varactor diodes are integrated into the waveguide, enabling degenerate phase matching between pump and signal waves via voltage-tuned dispersion engineering. Three-wave mixing processes are established, allowing bidirectional phase-controlled transitions from destructive to constructive interference. The proposed SSPP waveguide overcomes traditional pumping constraints with its bidirectional configuration, supporting both forward- and backward-propagating pump-signal configurations and permitting signal amplitude modulations at both the transmitter and receiver ends. Experimental characterization demonstrates remarkable signal gain tunability: the forward pumping configuration achieves a dynamic range spanning from -69.50 to +1.04 dB, while the backward configuration spans from -70.49 to +1.45 dB. This work provides new design paradigms for microwave coherent systems and advances the development of reconfigurable electromagnetic devices for adaptive energy harvesting and high-speed signal processing applications.