2025-10-15, Volume 34, Issue 5

  • Gaze estimation, a crucial non-verbal communication cue, has achieved remarkable progress through convolutional neural networks. However, accurate gaze prediction in unconstrained environments, particularly under extreme head poses, partial occlusions, and abnormal lighting, remains challenging. Existing models often struggle to focus on discriminative ocular features, leading to suboptimal performance. To address these limitations, this paper proposes dual-branch gaze estimation with Gaussian mixture distribution heatmaps and dynamic adaptive loss function (DMGDL), a novel dual-branch gaze estimation algorithm. By introducing Gaussian mixture distribution heatmaps centered on pupil positions as spatial attention guides, the model is guided to prioritize ocular regions. Additionally, a dual-branch network architecture is designed to extract features for the yaw and pitch angles separately, enhancing flexibility and mitigating cross-angle interference. A dynamic adaptive loss function is further formulated to address discontinuities in angle estimation, improving robustness and convergence stability. Experimental evaluations on three benchmark datasets demonstrate that DMGDL outperforms state-of-the-art methods, achieving a mean angular error of 3.98° on the Max-Planck Institute for Informatics face gaze (MPIIFaceGaze) dataset, 10.21° on the physically unconstrained gaze estimation in the wild (Gaze360) dataset, and 6.14° on the real-time eye gaze estimation in natural environments (RT-Gene) dataset, exhibiting superior generalization and robustness.
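A minimal sketch of the heatmap-as-attention idea described in this abstract: a Gaussian mixture heatmap is rendered from the two pupil coordinates and multiplied onto a backbone feature map as a spatial attention mask. The sigma value, feature-map size, and simple element-wise weighting below are illustrative assumptions, not DMGDL's exact formulation.

```python
import torch

def gaussian_mixture_heatmap(pupils, h, w, sigma=8.0):
    """Render a heatmap as a mixture of Gaussians centered on pupil (x, y) pixels.

    pupils: tensor of shape (K, 2) holding K pupil centers in image coordinates.
    Returns an (h, w) map normalized to [0, 1].
    """
    ys = torch.arange(h, dtype=torch.float32).view(h, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, w)
    heatmap = torch.zeros(h, w)
    for cx, cy in pupils:
        heatmap += torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return heatmap / heatmap.max().clamp(min=1e-6)

# Example: weight a CNN feature map by the heatmap so ocular regions dominate.
pupils = torch.tensor([[30.0, 40.0], [80.0, 42.0]])    # hypothetical pupil positions
attn = gaussian_mixture_heatmap(pupils, h=112, w=112)  # (112, 112)
features = torch.randn(1, 64, 112, 112)                # placeholder backbone output
attended = features * attn.unsqueeze(0).unsqueeze(0)   # broadcast over channels
```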
  • Non-cooperative communication detection is a key technology for locating radio interference sources and conducting reconnaissance on adversary radiation sources. To meet the requirements of wide-area monitoring, a single interception channel often contains mixed multi-source signals and interference, resulting in a generally low signal-to-noise ratio (SNR) for the received signals; meanwhile, improving detection quality urgently requires either high frequency resolution or high time resolution, which poses severe challenges to detection techniques based on time-frequency representations (TFR). To address this issue, this paper proposes a fixed-frame-structure signal detection algorithm that integrates image enhancement and multi-scale template matching: first, the Otsu-Sauvola hybrid thresholding algorithm is employed to enhance TFR features, suppress noise interference, and extract time-frequency parameters of potential target signals (such as bandwidth and occurrence time); then, by exploiting the inherent time-frequency characteristics of the fixed-frame structure, the signal is subjected to multi-scale transformation (with either high frequency resolution or high time resolution), and accurate detection is achieved through the corresponding multi-scale template matching. Experimental results demonstrate that under 0 dB SNR conditions, the proposed algorithm achieves a detection rate greater than 87%, representing a significant improvement over traditional methods.
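A rough sketch of the two stages this abstract describes, applied to a spectrogram image with OpenCV and scikit-image: a hybrid of a global Otsu threshold and a local Sauvola threshold binarizes the TFR, and template matching is then run over several rescaled copies of a frame template. The fusion rule (logical AND), window size, scale set, and score threshold are assumptions for illustration, not the published algorithm.

```python
import cv2
import numpy as np
from skimage.filters import threshold_otsu, threshold_sauvola

def hybrid_binarize(tfr):
    """Binarize a time-frequency representation with a global+local hybrid rule."""
    t_otsu = threshold_otsu(tfr)                     # global threshold
    t_sauv = threshold_sauvola(tfr, window_size=25)  # local threshold map
    # Keep pixels that pass both tests (an assumed fusion rule).
    return ((tfr > t_otsu) & (tfr > t_sauv)).astype(np.uint8) * 255

def multiscale_match(tfr_bin, template, scales=(0.8, 1.0, 1.25), thresh=0.7):
    """Search a binarized TFR (uint8) for a fixed-frame template (uint8) at several scales."""
    hits = []
    for s in scales:
        tpl = cv2.resize(template, None, fx=s, fy=s, interpolation=cv2.INTER_NEAREST)
        if tpl.shape[0] > tfr_bin.shape[0] or tpl.shape[1] > tfr_bin.shape[1]:
            continue
        score = cv2.matchTemplate(tfr_bin, tpl, cv2.TM_CCOEFF_NORMED)
        ys, xs = np.where(score >= thresh)
        hits += [(x, y, s, score[y, x]) for y, x in zip(ys, xs)]
    return hits  # candidate (time index, frequency index, scale, score) detections
```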
  • This study introduces a novel method for reconstructing the 3D model of aluminum foam using cross-sectional sequence images. Combining precision milling and image acquisition, high-quality cross-sectional images are obtained. Pore structures are segmented by a U-shaped neural network (U-Net) integrated with the Canny edge detection operator, ensuring accurate pore delineation and edge extraction. The trained U-Net achieves 98.55% accuracy. The 2D data are superimposed and processed into 3D point clouds, enabling reconstruction of the pore structure and aluminum skeleton. Analysis of pore 01 shows that the cross-sectional area initially increases and then decreases with milling depth, with a uniform distribution of 40 points per layer. The reconstructed model exhibits a porosity of 77.5%, with section overlap rates between the 2D pore segmentation and the reconstructed model exceeding 96%, confirming high fidelity. Equivalent sphere diameters decrease with size, averaging 1.95 mm. Compression simulations reveal that the stress-strain curve of the 3D reconstruction model of aluminum foam exhibits fluctuations, and the stresses in the reconstruction model concentrate on thin cell walls, leading to localized deformations. This method accurately restores the aluminum foam’s complex internal structure, improving reconstruction precision and simulation reliability. The approach offers a cost-efficient, high-precision technique for optimizing material performance in engineering applications.
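A small sketch of the stacking step described above: each milled layer's binary pore mask is converted to (x, y, z) points using the pixel pitch and the milling step, and the stack is concatenated into one point cloud. The pixel size, layer spacing, and mask source (random placeholders here) are assumptions; in the paper the masks would come from the U-Net + Canny segmentation pipeline.

```python
import numpy as np

def masks_to_point_cloud(masks, pixel_mm=0.05, layer_mm=0.1):
    """Stack per-layer binary pore masks (list of HxW uint8 arrays) into an
    Nx3 point cloud in millimetres, with z taken from the milling depth."""
    points = []
    for k, mask in enumerate(masks):
        ys, xs = np.nonzero(mask)                           # pore pixels in this section
        z = np.full_like(xs, k, dtype=np.float64) * layer_mm
        points.append(np.column_stack([xs * pixel_mm, ys * pixel_mm, z]))
    return np.vstack(points) if points else np.empty((0, 3))

# Placeholder usage with random masks standing in for U-Net pore segmentations.
masks = [(np.random.rand(256, 256) > 0.8).astype(np.uint8) for _ in range(10)]
cloud = masks_to_point_cloud(masks)
porosity = sum(m.sum() for m in masks) / (len(masks) * 256 * 256)  # pore area fraction
```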
  • Rapid and high-precision speed bump detection is critical for autonomous driving and road safety, yet it faces challenges from non-standard appearances and complex environments. To address this issue, this study proposes a you only look once (YOLO) algorithm for speed bump detection (SPD-YOLO), a lightweight model based on YOLO11s that integrates three core innovative modules to balance detection precision and computational efficiency: it replaces YOLO11s’ original backbone with StarNet, which uses ‘star operations’ to map features into high-dimensional nonlinear spaces for enhanced feature representation while maintaining computational efficiency; its neck incorporates context feature calibration (CFC) and spatial feature calibration (SFC) to improve detection performance without significant computational overhead; and its detection head adopts a lightweight shared convolutional detection (LSCD) structure combined with GroupNorm, minimizing computational complexity while preserving multi-scale feature fusion efficacy. Experiments on a custom speed bump dataset show SPD-YOLO achieves a mean average precision (mAP) of 79.9%, surpassing YOLO11s by 1.3% and YOLO12s by 1.2% while reducing parameters by 26.3% and floating-point operations (FLOPs) by 29.5%, enabling real-time deployment on resource-constrained platforms.
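The ‘star operation’ this abstract refers to can be sketched as two parallel pointwise projections whose outputs are multiplied element-wise, implicitly lifting features into a high-dimensional nonlinear space. The channel widths, activation, and depthwise convolution placement below follow common StarNet-style blocks but are assumptions rather than SPD-YOLO's exact layers.

```python
import torch
import torch.nn as nn

class StarBlock(nn.Module):
    """Sketch of a star-operation block: x -> dwconv -> (f1(x) * act(f2(x))) -> project."""
    def __init__(self, dim, expand=4):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # depthwise mixing
        self.f1 = nn.Conv2d(dim, dim * expand, 1)                # branch 1
        self.f2 = nn.Conv2d(dim, dim * expand, 1)                # branch 2
        self.act = nn.ReLU6()
        self.proj = nn.Conv2d(dim * expand, dim, 1)              # back to input width

    def forward(self, x):
        y = self.dw(x)
        y = self.f1(y) * self.act(self.f2(y))  # the 'star': element-wise product
        return x + self.proj(y)                # residual connection

feat = torch.randn(1, 64, 80, 80)  # hypothetical backbone feature map
out = StarBlock(64)(feat)          # same shape, richer implicit features
```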
  • Although significant progress has been made in micro-expression recognition, effectively modeling the intricate spatial-temporal dynamics remains a persistent challenge owing to their brief duration and complex facial dynamics. Furthermore, existing methods often suffer from limited generalization, as they primarily focus on single-dataset tasks with small sample sizes. To address these two issues, this paper proposes the cross-domain spatial-temporal graph convolutional network (CDST-GCN) model, which comprises two primary components: a siamese attention spatial-temporal branch (SASTB) and a global-aware dynamic spatial-temporal branch (GDSTB). Specifically, SASTB utilizes a contrastive learning strategy to project macro- and micro-expressions into a shared, aligned feature space, actively addressing cross-domain discrepancies. Additionally, it integrates an attention-gated mechanism that generates adaptive adjacency matrices to flexibly model collaborative patterns among facial landmarks. While largely preserving the structural paradigm of SASTB, GDSTB enhances the feature representation by integrating global context extracted from a pretrained model. Through this dual-branch architecture, CDST-GCN successfully models both the global and local spatial-temporal features. The experimental results on the CASME II and SAMM datasets demonstrate that the proposed model achieves competitive performance. In particular, on the more challenging 5-class task, the model's accuracy on the CASME II dataset reaches 80.5%.
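One way to read the “attention-gated mechanism that generates adaptive adjacency matrices” is sketched below: landmark features produce pairwise attention scores that are gated onto a learnable base adjacency before a standard graph-convolution propagation. The gating form, normalization, and feature dimensions are illustrative assumptions, not CDST-GCN's published layer definitions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGCNLayer(nn.Module):
    """GCN layer whose adjacency is a gated mix of a learned base graph and
    data-dependent attention over facial landmarks."""
    def __init__(self, in_dim, out_dim, num_nodes):
        super().__init__()
        self.base_adj = nn.Parameter(torch.eye(num_nodes))  # learnable base graph
        self.q = nn.Linear(in_dim, out_dim)
        self.k = nn.Linear(in_dim, out_dim)
        self.w = nn.Linear(in_dim, out_dim)
        self.gate = nn.Parameter(torch.tensor(0.5))          # attention gate

    def forward(self, x):  # x: (batch, nodes, in_dim)
        attn = torch.softmax(self.q(x) @ self.k(x).transpose(1, 2)
                             / x.shape[-1] ** 0.5, dim=-1)   # (batch, nodes, nodes)
        adj = self.base_adj + torch.sigmoid(self.gate) * attn
        return F.relu(adj @ self.w(x))                       # propagate landmark features

landmarks = torch.randn(2, 68, 32)  # hypothetical 68 landmarks with 32-d features
out = AdaptiveGCNLayer(32, 64, 68)(landmarks)
```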
  • In hyperspectral image classification (HSIC), accurately extracting spatial and spectral information from hyperspectral images (HSI) is crucial for achieving precise classification. However, due to low spatial resolution and complex category boundaries, mixed pixels containing features from multiple classes are inevitable in HSIs. Additionally, the spectral similarity among different classes makes it challenging to extract the distinctive spectral features essential for HSIC. To address the impact of mixed pixels and spectral similarity on HSIC, we propose a central-pixel guiding sub-pixel and sub-channel convolution network (CP-SPSC) to extract more precise spatial and spectral features. First, we design spatial attention (CP-SPA) and spectral attention (CP-SPE) informed by the central pixel to effectively reduce the spectral interference of irrelevant categories within the same patch. Furthermore, we use CP-SPA to guide 2D sub-pixel convolution (SPConv2d) to capture spatial features finer than the pixel level. Meanwhile, CP-SPE is also utilized to guide 1D sub-channel convolution (SCConv1d) in selecting more precise spectral channels. To fuse spatial and spectral information at the feature level, the spectral feature extension transformation module (SFET) adopts mirror-padding and snake permutation to transform the 1D spectral information of the center pixel into 2D spectral features. Experiments on three popular datasets demonstrate that our method outperforms several state-of-the-art methods in accuracy.
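A minimal sketch of the central-pixel guidance idea from this abstract: the spectrum of the patch's center pixel is compared (here via cosine similarity) against every pixel in the patch, and the resulting map down-weights pixels whose spectra belong to other classes before further feature extraction. The similarity measure and the simple multiplicative use of the map are assumptions, not CP-SPSC's exact attention design.

```python
import torch
import torch.nn.functional as F

def center_pixel_spatial_attention(patch):
    """patch: (batch, bands, h, w) hyperspectral patch.
    Returns a (batch, 1, h, w) map scoring how spectrally similar each pixel
    is to the patch's central pixel."""
    b, c, h, w = patch.shape
    center = patch[:, :, h // 2, w // 2].view(b, c, 1, 1)     # central spectrum
    sim = F.cosine_similarity(patch, center, dim=1, eps=1e-6)  # (b, h, w)
    return sim.clamp(min=0).unsqueeze(1)                       # suppress dissimilar pixels

patch = torch.randn(4, 103, 9, 9)            # hypothetical 103-band 9x9 patches
attn = center_pixel_spatial_attention(patch)
guided = patch * attn                        # attenuate mixed / irrelevant pixels
```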
  • When a fire breaks out in a high-rise building, the occlusion caused by smoke and obstacles results in a dearth of crucial information concerning people in distress, thereby making their detection challenging. Given the restricted sensing range of a single unmanned aerial vehicle (UAV) camera, enhancing the target recognition rate becomes difficult without target information. To tackle this issue, this paper proposes a multi-agent autonomous collaborative detection method for multiple targets in complex fire environments. The objective is to fuse multi-angle visual information, effectively increasing the target’s information dimension and ultimately addressing the problem of low target recognition rates caused by the lack of target information. The method proceeds as follows: first, the you only look once version 5 (YOLOv5) detector is used to detect targets in the image; second, the detected targets are tracked to monitor their movements and trajectories; third, the person re-identification (ReID) model is employed to extract the appearance features of the targets; finally, by fusing the visual information from multi-angle cameras, the method achieves multi-agent autonomous collaborative detection. The experimental results show that the method effectively combines the visual information from multi-angle cameras, resulting in improved detection efficiency for people in distress.
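The final fusion step in this pipeline could look roughly like the sketch below: detections from each UAV carry a ReID embedding, embeddings are matched across cameras by cosine similarity, and matched groups are treated as one target with a multi-view descriptor. The threshold and the greedy matching rule are illustrative assumptions; the detection and tracking stages (YOLOv5 plus a tracker) are only represented here by placeholder inputs.

```python
import numpy as np

def fuse_cross_camera(detections, sim_thresh=0.6):
    """detections: list over cameras, each a list of dicts
    {'id': int, 'feat': np.ndarray} with L2-normalized ReID embeddings.
    Greedily groups detections whose embeddings agree across cameras and
    averages each group's features into one multi-view descriptor."""
    groups = []
    for cam_idx, cam_dets in enumerate(detections):
        for det in cam_dets:
            for g in groups:
                if float(det['feat'] @ g['feat']) >= sim_thresh:
                    g['members'].append((cam_idx, det['id']))
                    g['feat'] = g['feat'] + det['feat']
                    g['feat'] /= np.linalg.norm(g['feat'])   # keep unit norm
                    break
            else:
                groups.append({'feat': det['feat'].copy(),
                               'members': [(cam_idx, det['id'])]})
    return groups  # each group = one person seen from one or more UAVs

# Hypothetical embeddings for two UAVs observing people in the same scene.
rng = np.random.default_rng(0)
f = lambda: (lambda v: v / np.linalg.norm(v))(rng.normal(size=128))
detections = [[{'id': 0, 'feat': f()}, {'id': 1, 'feat': f()}],
              [{'id': 0, 'feat': f()}]]
print(len(fuse_cross_camera(detections)), 'fused targets')
```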