2026-04-15 2026, Volume 35 Issue 2

  • This paper presents a robust multitask diffusion average bias compensation least mean square (RM-DABC-LMS) algorithm for distributed estimation with noisy inputs and communication link noise. The algorithm uses a robust cost function based on the maximum Versoria criterion, incorporates bias compensation, and applies adaptive combination coefficients to reduce the impact of noise. Theoretical analysis demonstrates the stability of the algorithm and provides closed-form expressions for the steady-state mean square deviation (MSD). A compression diffusion strategy is introduced to reduce the communication cost of the RM-DABC-LMS algorithm while preserving fast convergence and accurate estimation. Simulation results indicate that the proposed algorithm outperforms existing methods in noisy environments, achieving faster convergence and lower steady-state error.
  • In this work, we investigate the problem of multi-task learning (MTL) in ultra-high voltage direct current (UHVDC) monitoring systems, where the measurements are affected by wireless channel impairments, typically characterized by block fading and link noise. Such channel imperfections significantly degrade the performance of distributed estimation in real-world power system environments. Based on the graph signal processing method, we propose the multi-task robust decoupled diffusion least mean square (MT-RDDLMS) algorithm. Specifically, a decoupled adapt-then-combine strategy is introduced to reduce the influence of wireless channels on data exchange among measurement units. Moreover, an average estimation method with an adaptive smoothing factor is developed to further suppress link noise and enhance estimation accuracy. Simulation results confirm the robustness and effectiveness of the proposed algorithm under realistic wireless channel conditions.
  • In the field of multi-view three-dimensional (3D) human pose estimation, there are primarily two approaches: heatmap-based and regression-based models. Regression-based models require less computational effort than heatmap-based models but are less accurate. This study proposes a regression-based model called multi-view 3D human pose estimation based on regression with multivariate joint distribution (MRM), which achieves accuracy comparable to heatmap-based models while using lower computational resources in multi-view 3D human pose estimation. Specifically, this model employs a flow-based method to learn the multivariate joint distribution of human pose data, enabling the regression-based model to capture nonlinear dependencies across different perspectives. Experimental results on two public datasets validate the accuracy and efficiency of the proposed model. Compared with heatmap-based methods, MRM reduces multiply-add operations by 32.3% while maintaining comparable prediction accuracy.
  • Unmanned aerial vehicle (UAV) aerial images often feature rapidly changing perspectives, extremely small target scales, and significant occlusions and background interference, which challenge both the accuracy and the stability of real-time detection. To address these issues, this paper proposes frequency-affine-lightweight detection (FALDet) within the you only look once version 8 (YOLOv8) framework, improving detection in three ways. First, it replaces spatial pyramid pooling-fast (SPPF) with intra-scale feature interaction-high-low frequency attention (AIFI-HiLo) to model high-frequency local and low-frequency global attention in parallel, balancing edge details and long-range semantics while maintaining low computational overhead. Second, it replaces part of C2f with local affine deformable convolution (LA-DCN), introducing a unified local affine sampling grid to reduce degrees of freedom and enhance stability against rotational, scaling, and translational deformations. Third, it introduces a lightweight cross-scale dynamic detection head (LiteX-DyHead), which improves recall and localization consistency for dense small objects through lightweight preprocessing, dynamic/deformable alignment, and multi-scale fusion. Using VisDrone2019 as the primary evaluation dataset, ablation and comparative experiments were conducted under unified training strategies and input resolutions. Results demonstrate that FALDet achieves stable improvements over YOLOv8s in both mAP@0.5 and mAP@0.5:0.95 while maintaining high frames per second, validating the effectiveness and practicality of the proposed method. The method’s effectiveness was further validated on the SIMD dataset.
  • In fault diagnosis, feature extraction from vibration signals is widely recognized as the most critical step, as it directly influences the accuracy and reliability of the diagnostic outcomes. To address the limited capability of single-view feature extraction in complex vibration signals and the high economic cost associated with multi-source information fusion, this paper proposes a novel fault diagnosis method based on the Gramian angular field (GAF) and a self-feedback spiking neural network (SF-SNN). The method not only streamlines the network architecture but also enhances the biological plausibility of the SNN model. Initially, GAF is employed to transform the one-dimensional vibration signals collected by sensors into two-dimensional images, effectively preserving the temporal dependencies inherent in the signals. Subsequently, conventional spiking neurons are replaced with self-feedback neurons to enable faster and more precise feature recognition, thereby improving diagnostic performance and classification accuracy. This method inherits the low power consumption and high bionic properties of SNNs while enhancing the efficiency, performance, and robustness of SNN models. It achieved high accuracies of 99.92% and 99.77% on the Case Western Reserve University bearing dataset and the Jiangnan University bearing dataset, respectively. Simultaneously, under noisy conditions (signal-to-noise ratio of –4 dB), it attained accuracies of 80.35% and 80.12%, significantly outperforming other methods. These results fully demonstrate the high accuracy and robust performance of the proposed method on bearing fault diagnosis.
  • An adaptive path planning algorithm is proposed that integrates an improved A* algorithm with the dynamic window approach (DWA). It addresses the slow search speed, unsmooth paths, and poor dynamic obstacle avoidance of traditional A*. The algorithm combines an “8+5” neighborhood screening, a 16-neighborhood evaluation function, and a second-order then third-order Bézier curve optimization process, and is deployed on a Jetson Nano + ROS (Robot Operating System) platform to meet the requirements of efficient and safe navigation for fire inspection robots in complex environments. The results show that, compared with the original algorithm, the proposed algorithm reduces the average number of traversed nodes by 49.23%, reduces the number of turns in the optimized path by approximately 28.82%, decreases curvature by 66.6%, and eliminates path tangency with obstacles. With DWA integrated, it also supports real-time obstacle avoidance and outperforms traditional methods.
  • Infrared and visible image fusion aims to combine the complementary information from both modalities into a single image that simultaneously retains salient thermal targets and rich texture details. However, current fusion approaches mainly emphasize the visual quality of the fused images, overlooking compatibility with downstream tasks. To address this issue, this paper proposes a foreground-guided fusion framework that adaptively enhances target regions while preserving global contextual information. Specifically, we design a two-branch network in which the fusion branch reconstructs high-quality fused images, while the foreground extraction branch captures semantic representations of salient objects to guide the fusion process toward target-related regions. To validate the effectiveness of the proposed framework, we build an aircraft key-point dataset named VIRcraft. The fused images are also applied to semantic segmentation and object detection to verify the generalization of the proposed framework. The experimental results on different tasks demonstrate the superiority and generalization of the proposed fusion framework.
  • In an unmanned aerial vehicle (UAV) strike scenario, vision-based tracking control for a gimbal with a monocular camera is studied in this paper. First, a vision-based localization method is proposed. Image coordinates are converted into geodetic coordinates, and a multi-frame transformation model is established to account for UAV attitude, gimbal angles, and camera parameters. Then, to address the inaccuracy of single-frame observations, a data-driven adaptive covariance Kalman filter (DAC-KF) is introduced to achieve accurate and robust estimation of target positions. After that, a feedforward-proportional velocity gimbal controller based on damped Jacobian inversion is proposed to ensure the camera remains locked on the moving target. Furthermore, a physics-based trajectory model is established to guide the release of unguided projectiles, facilitating accurate strikes on moving targets. Finally, field experiments show that the proposed method effectively locks and strikes moving targets, with a final deviation within one meter.
  • Dusty, foggy, and otherwise complex construction site environments cause visible light imaging to fail and make small targets difficult to detect, while high resource consumption hinders model deployment. To address these challenges, an enhanced and lightweight algorithm is proposed. This algorithm employs a hybrid architecture, integrating red-green-blue (RGB, visible light) and thermal infrared multi-modal (RGBT) images through a fusion framework based on you only look once (YOLO) version 8 and Mamba-Transformer (MT). We refer to this integrated model as YOLOv8-RGBT-MT. In terms of network improvements, a frequency enhancement module is first employed to enhance visible light and infrared images. Then, a module integrating Mamba and Transformer components is designed to replace base convolutional blocks in the backbone network, thereby expanding the receptive field of the model and improving feature extraction in complex backgrounds. Finally, a multi-modal feature fusion mechanism is introduced, which integrates complementary information from visible and infrared images via an adaptive weighting strategy, enhancing both detection accuracy and robustness for small targets. Experimental results demonstrate that, compared to YOLOv8-RGBT, the enhanced algorithm achieves an improvement of 18.7% in mAP50 while reducing inference time by 79.7%.
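One of the abstracts above describes using the Gramian angular field (GAF) to turn one-dimensional vibration signals into two-dimensional images. As an illustration only, the following is a minimal sketch of the standard summation variant of the transform; the abstract does not say which GAF variant or preprocessing the authors use, so the function name, the min-max scaling to [-1, 1], and the synthetic test signal are all assumptions made for this example.

```python
import numpy as np


def gramian_angular_summation_field(signal):
    """Map a 1-D signal to a 2-D image with entries cos(phi_i + phi_j).

    Hypothetical helper for illustration; not taken from the paper.
    """
    x = np.asarray(signal, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("signal must not be constant")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against rounding error
    # Polar encoding: each sample becomes an angle.
    phi = np.arccos(scaled)
    # Pairwise angular sums give an image that keeps temporal ordering
    # along both axes.
    return np.cos(phi[:, None] + phi[None, :])


# Synthetic stand-in for a vibration signal (assumed, not real data).
t = np.linspace(0.0, 1.0, 64)
vibration = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
image = gramian_angular_summation_field(vibration)
print(image.shape)
```

The resulting matrix is symmetric, and its main diagonal depends only on the rescaled sample values, which is one reason the encoding is said to preserve temporal dependencies. A real pipeline would likely segment and resize the signal before feeding images to a network.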