2025-02-05 2025, Volume 5 Issue 1

  • Research Article
    Zuoxun Wang, Changkun Guo, Jinxue Sui, Chuanyu Cui

    The deployment of wireless sensor networks (WSNs) in extreme environments such as nuclear fusion devices and the aerospace industry is crucial for real-time monitoring of critical parameters. However, it faces many challenges. In this paper, we propose the desert golden mole optimization algorithm (DGMOA), a novel algorithm inspired by the survival strategy of the desert golden mole and combined with the Dingo optimization algorithm (DOA). DGMOA addresses these challenges through two core mechanisms: the sand swimming strategy enhances the global search capability, and the hiding strategy is used for fine-grained local optimization. Through simulation tests, DGMOA shows excellent performance. It can quickly explore a large range of solution space in the initial search phase and adjust the position of individuals to avoid local optimal traps, resulting in a more uniform sensor layout and higher coverage. In convergence speed, it outperforms existing algorithms with faster convergence. Regarding energy consumption, the reasonable node layout reduces unnecessary waste and prolongs the service life of the sensor network. The results show that DGMOA is a highly effective solution for sensor layout in complex and extreme environments, with significant improvements in performance and energy consumption over traditional methods.
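The coverage figure that a layout optimizer of this kind tries to maximize can be illustrated with a small sketch. It assumes a binary disk sensing model (a point is covered if it lies within `radius` of any sensor) and samples the area on a regular grid. The function name, parameters, and sensing model are illustrative assumptions, not taken from the paper.

```python
import math

def coverage_rate(sensors, radius, width, height, resolution=1.0):
    """Fraction of grid-cell centers lying within `radius` of at least one sensor.

    Assumption: binary disk sensing model on a width x height rectangle.
    """
    nx = int(width / resolution)
    ny = int(height / resolution)
    covered = 0
    for i in range(nx):
        for j in range(ny):
            # Sample the center of each grid cell.
            px = (i + 0.5) * resolution
            py = (j + 0.5) * resolution
            if any(math.hypot(px - sx, py - sy) <= radius for sx, sy in sensors):
                covered += 1
    return covered / (nx * ny)
```

A metaheuristic such as DGMOA would treat the sensor coordinates as the decision variables and a score like this one as the fitness to maximize.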

  • Research Article
    Xinqiang Chen, Chen Chen, Huafeng Wu, Octavian Postolache, Yuzheng Wu

    As global maritime transport rapidly advances, the demands for intelligent, safe, and efficient automated container ports have significantly increased. In this evolving landscape, multi-automated guided vehicle (AGV) systems have emerged as a critical element of port automation. Within automated container terminals, quay cranes, AGVs, and yard cranes are the primary equipment for loading and unloading operations on ships. However, the complexity of simultaneously considering numerous practical factors and the intricate relationships among them has made optimization modeling in this area a challenging task. To tackle this challenge, we have developed a path optimization model for multi-AGV systems in port environments, based on an enhanced artificial potential field (APF) algorithm. This algorithm utilizes the initial states of AGVs, target locations, and obstacle information as inputs. It creates attractive forces near the target locations and repulsive forces around static obstacles. Moreover, a minimum safety distance between AGVs is established; when AGVs approach closer than this threshold, the algorithm introduces repulsive forces between them to prevent collisions. The algorithm dynamically recalculates the repulsive potential field in response to real-time feedback and changes in the environment, enabling continuous adjustment to the AGV paths and action plans. This iterative process continues until all AGVs reach their designated targets. The effectiveness of this algorithm has been validated through port environment simulations, demonstrating clear advantages in enhancing the safety and smoothness of multi-AGV path planning.
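The force model described in this abstract can be sketched with the classic (unenhanced) APF formulation: a linear attraction toward each target, plus a repulsion that activates only inside an influence range. This is a minimal sketch, not the authors' enhanced algorithm. The function name `apf_step`, the gains `k_att` and `k_rep`, and all default values are hypothetical.

```python
import math

def apf_step(positions, goals, obstacles, k_att=1.0, k_rep=1.0,
             obstacle_range=2.0, safety_distance=1.5, step=0.1):
    """Move every AGV one fixed-length step along its net potential-field force.

    positions, goals, obstacles: lists of (x, y) tuples; goals[i] belongs to AGV i.
    All AGVs are updated synchronously from the same snapshot of positions.
    """
    new_positions = []
    for i, (x, y) in enumerate(positions):
        gx, gy = goals[i]
        # Attractive force pulls the AGV toward its own target.
        fx, fy = k_att * (gx - x), k_att * (gy - y)
        # Repulsion sources: static obstacles, plus the other AGVs, which only
        # repel once they are closer than the minimum safety distance.
        sources = [(o, obstacle_range) for o in obstacles]
        sources += [(p, safety_distance) for j, p in enumerate(positions) if j != i]
        for (sx, sy), influence in sources:
            dx, dy = x - sx, y - sy
            d = math.hypot(dx, dy)
            if 0.0 < d < influence:
                # Classic repulsive magnitude: grows sharply as d approaches 0.
                magnitude = k_rep * (1.0 / d - 1.0 / influence) / d ** 2
                fx += magnitude * dx / d
                fy += magnitude * dy / d
        norm = math.hypot(fx, fy)
        if norm == 0.0:
            new_positions.append((x, y))  # Forces cancel: stay in place.
        else:
            new_positions.append((x + step * fx / norm, y + step * fy / norm))
    return new_positions
```

Calling `apf_step` repeatedly until every AGV is within a tolerance of its goal mirrors the iterative process in the abstract. Because the forces are recomputed from current positions at each call, moving obstacles or AGVs are picked up automatically.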

  • Research Article
    Qian Jiang, Tao Zhou, Youwei He, Wenjun Ma, Jingyu Hou, Ahmad Shahrizan Abdul Ghani, Shengfa Miao, Xin Jin

    Thermal infrared (TIR) images remain unaffected by variations in light and atmospheric conditions, which makes them extensively utilized in diverse nocturnal traffic scenarios. However, challenges pertaining to low contrast and absence of chromatic information persist. The technique of image colorization emerges as a pivotal solution aimed at ameliorating the fidelity of TIR images. This enhancement is conducive to facilitating human interpretation and downstream analytical tasks. Because of the blurred and intricate features of TIR images, extracting and processing their feature information accurately through image-based approaches alone becomes challenging for networks. Hence, we propose a multi-modal model that integrates text features from TIR images with image features to jointly perform TIR image colorization. A vision transformer (ViT) model will be employed to extract features from the original TIR images. Concurrently, we manually observe and summarize the textual descriptions of the images, and then input these descriptions into a pretrained contrastive language-image pretraining (CLIP) model to capture text-based features. These two sets of features will then be fed into a cross-modal interaction (CI) module to establish the relationship between text and image. Subsequently, the text-enhanced image features will be processed through a U-Net network to generate the final colorized images. Additionally, we utilize a comprehensive loss function to ensure the network's ability to generate high-quality colorized images. The effectiveness of the methodology put forward in this study is evaluated using the KAIST datasets. The experimental results vividly showcase the superior performance of our CMMF-Net method in comparison to other methodologies for the task of TIR image colorization.

  • Research Article
    Yujun Chen, Xiuli Zhu, Peng Wang, Kuangrong Hao, Kairui Sheng

    Due to the strong noise, high dimensionality and time-varying characteristics of industrial process data, data-driven modeling faces challenges in feature extraction and model interpretability. To address these issues, this paper proposes a new prediction model based on adaptive variational empirical mode decomposition-guided (AVEMDG) graph convolutional networks (GCNs). First, each sensor signal is decomposed into high-frequency and low-frequency features using empirical mode decomposition (EMD) to effectively capture multi-band information. Second, the weights of these features are adaptively updated through variational inference (VI) combined with Bayesian reasoning to handle the importance and uncertainty of features. Next, the GCN is used to model the spatiotemporal dependencies in the sensor network and is trained using the reweighted feature data. Last, the proposed method is applied to the prediction of the melt viscosity index (MVI), a key performance indicator (KPI) of the actual polyester fiber polymerization process. An ablation study and comparative experiments are conducted to evaluate the contribution of each component and the generality of the proposed model. Experimental results show that this method can effectively improve the model prediction accuracy, thereby enhancing the interpretability of the soft sensor model and providing guidance for the production of industrial processes.

  • Research Article
    Haibo Duan, Fanrong Shi, Bo Gao, Yingyue Zhou, Qiushi Cui

    The live-line operation of 10 kV distribution networks is critical for ensuring uninterrupted and high-quality power supply. However, operational sites face challenges such as insufficient intelligent monitoring and suboptimal real-time performance. To address these issues, this study proposes the FEM-YOLOv8 algorithm, specifically designed for protective equipment detection in live-line operation scenarios. The proposed algorithm is deployed on edge devices compatible with unmanned aerial vehicles (UAVs), enabling remote, autonomous, and intelligent monitoring. Key improvements include the introduction of an enhanced FAST-C2f module, replacing the original C2f module in the Backbone to improve feature extraction efficiency while reducing model complexity. Additionally, a lightweight efficient channel attention (ECA) mechanism is incorporated into the Backbone and Neck to enhance target feature detection and representation capabilities. The bounding box regression loss function is replaced with metric preserving distance intersection over union (MPDIoU) to further boost detection accuracy and robustness. The FEM-YOLOv8 model is implemented on the Atlas 200I DK A2 edge device, which is suitable for UAV deployment. Experimental results demonstrate that the improved FEM-YOLOv8 model achieves 93.1% precision (P), 85.9% recall (R), and 92.3% mean average precision (mAP), surpassing the baseline model by 2.8, 3.2, and 2.2 percentage points, respectively. With a detection speed of 83 frames per second (FPS) and a power consumption of only 10.2 W, the model satisfies real-time performance and detection accuracy requirements, providing significant contributions to grid intelligence and power operation safety.

  • Research Article
    Yizhen Meng, Chun Liu, Jing Zhao, Jing Huang, Guanbo Jing

    In the navigation of unmanned surface vessels (USVs), external disturbances, particularly ocean waves, frequently induce deviations from the desired trajectory. To mitigate these challenges, we propose a novel disturbance rejection control strategy based on Stackelberg game theory, designed to address unmodeled system dynamics, complex environmental conditions, and other external perturbations. This approach incorporates several key innovations. First, we introduce a velocity error dynamic system coupled with a non-cooperative Stackelberg game model, where the USV's control inputs (as the leader) and external disturbances (as the follower) interact within an alternating update framework. This leader-follower interaction facilitates the joint optimization of both the disturbance rejection and performance-optimal control strategies, enhancing the USV's tracking accuracy while maximizing its disturbance rejection capacity. Second, we rigorously verify the existence of a cooperative optimal solution through an analysis of the Nash equilibrium under sequential decision-making between the leader and follower. Building on this, integral reinforcement learning and neural networks are employed to approximate the optimal Stackelberg solution. The boundedness and convergence of the proposed approach are validated using Lyapunov functions, ensuring stability and optimal performance under dynamic operating conditions. Finally, simulation results confirm the efficacy of the proposed strategy, demonstrating its ability to concurrently optimize control robustness and performance - such as minimizing tracking error and energy consumption - when confronted with unmodeled dynamics and external disturbances.

  • Research Article
    Yudi Ruan, Di Wang, Yijing Yuan, Shixin Jiang, Xianyi Yang

    As the demands for ensuring bridge safety continue to rise, crack detection technology has become more crucial than ever. In this context, deep learning methods have been widely applied in the field of intelligent crack detection for bridges. However, existing methods are often constrained by complex backgrounds and computational limitations, struggling with issues such as weak crack continuity and insufficient detail representation. Inspired by biological mechanisms, a dynamic snake convolution (DSC) with tubular offsets is incorporated to tackle these challenges effectively. Additionally, a channel-wise self-attention (CWSA) mechanism is introduced to efficiently fuse multi-scale features in U-Net, significantly enhancing the ability of the model to capture fine details. In the classification head, the traditional linear layer is replaced with a Kolmogorov-Arnold network (KAN) structure, which strengthens the robustness and generalization capacity of the model. Experimental results demonstrate that the proposed model improves detection accuracy, achieving a mean intersection over union (mIoU) of 0.877, while maintaining almost the same number of parameters, showcasing exceptional performance and practical applicability. Our project is released at https://github.com/ruanyudi/KanSeg-Bi.
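For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how it is commonly computed for segmentation masks. It assumes flattened per-pixel class labels and averages over the classes that appear in either mask. The paper's exact evaluation protocol is not stated here, so treat the averaging convention as an assumption.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes.

    pred, target: equal-length sequences of integer class labels (one per pixel).
    Classes absent from both pred and target are skipped (assumed convention).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For a binary crack/background task, `num_classes` would be 2, with the score averaging the IoU of crack pixels and background pixels.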

  • Review
    Yongcheng Cui, Ying Zhang, Cui-Hua Zhang, Simon X. Yang

    With the rapid development of artificial intelligence and robotics, service robots are increasingly becoming a part of our daily lives to provide domestic services. For robots to complete such services intelligently and with high quality, the prerequisite is that they can recognize and plan tasks to discover task requirements and generate executable action sequences. In this context, this paper systematically reviews the latest research progress in task cognition and planning for domestic service robots, covering key technologies such as command text parsing, active task cognition (ATC), multimodal perception, and action sequence generation. Initially, the challenges traditional rule-based command parsing methods face are analyzed, and the enhancement of robots’ understanding of complex instructions through deep learning methods is explored. Subsequently, the research trends in ATC are introduced, discussing the ability of robots to autonomously discover tasks by perceiving the surrounding environment through visual and semantic features. The discussion then moves to the current typical methods in task planning, comparing and analyzing four common approaches to highlight their advantages and disadvantages in this field. Finally, the paper summarizes the challenges of existing research and the future directions for development, providing references for further enhancing the task execution capabilities of domestic service robots in complex home environments.

  • Review
    Guina Wang, Zhen Li, Guirong Weng, Yiyang Chen

    Image segmentation plays a vital role in artificial intelligence and computer vision with major applications such as industrial picking, defect detection, scene understanding and video surveillance. As parallel computing technologies develop, numerous deep learning (DL)-based segmentation algorithms have demonstrated practical performance with increased efficiency and accuracy. With the concept of DL image segmentation, a comprehensive review on recent literature is introduced in detail, including traditional image segmentation algorithms, DL schemes and the fusion of the former two algorithms. The seminal efforts of DL in image segmentation are elaborated in accordance with the quantity and quality of annotated labels, covering supervised, weakly-supervised, and unsupervised frameworks. Numerous methods on industrial benchmark datasets are compared and analyzed in standard evaluation indicators. Finally, the challenges and opportunities of DL image segmentation are discussed for further research.

  • Research Article
    Xu Tian, Lyuwen Huang, Mengqun Zhai, Mengyi Zhang, Pengju Hu, Mingjun Li, Liehong Ren

    The physical and biochemical indices of apple fruit serve as crucial phenotypic parameters in genomic cultivation. Among them, the soluble solids content (SSC), titratable acid content (TAC), and firmness are the three most important parameters that directly reflect the inner quality of apples. To achieve a more accurate prediction of the internal physicochemical indicators, a novel non-destructive detection approach that fuses nonlinear and multi-feature information using a multilayer autoencoder (MAE) was proposed. For non-destructive detection of internal physicochemical indicators, a dielectric spectrum device was employed to gather the electrical parameters of 300 Fuji genomic sample apples. These measurements were taken at nine distinct frequencies, spanning from 0.158 to 3,980 kHz. For the control group used in validation, dedicated physicochemical analysis instruments were used to measure firmness, SSC, and TAC. To predict key genomic parameters such as firmness and SSC/TAC, three classical regression models were implemented and subjected to comprehensive analysis. The experimental results reveal that the nonlinear feature variable selection based on MAE and multilayer perceptron (MLP) achieved the best prediction performance. Specifically, the correlation coefficients (R²) for predicting firmness and SSC/TAC reached up to 0.88 and 0.82, respectively, with root mean square errors (RMSEs) of 0.66 and 2.08. Regarding state-of-the-art dimensionality reduction, MAE can be validated as a nonlinear feature extraction methodology for complex electrical parameters. It demonstrates robust applicability in predicting a diverse array of other genomic parameters.
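The two regression scores quoted in this abstract can be computed as in the sketch below. It assumes that the reported R² is the coefficient of determination, 1 - SS_res / SS_tot. The abstract calls it a correlation coefficient, so the authors may instead mean the squared Pearson correlation. The function names are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE is expressed in the units of the predicted quantity, so the firmness and SSC/TAC values (0.66 and 2.08) are not directly comparable with each other, whereas R² is dimensionless.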

  • Review
    Junfei Li, Simon X. Yang

    Embodied artificial intelligence (AI) is reshaping the landscape of intelligent robotic systems, particularly by providing many realistic solutions to execute actions in complex and dynamic environments. However, Embodied AI requires large volumes of generated data for training and evaluation to ensure safe interaction with physical environments. Therefore, it is necessary to build a cost-effective simulated environment that can provide enough data for training and optimization, drawing on physical characteristics, object properties, and interactions. Digital twins (DTs), a key topic in Industry 5.0, enable real-time monitoring, simulation, and optimization of physical processes by mirroring the state and action of their real-world counterparts. This review explores how integrating DTs with Embodied AI can bridge the sim-to-real gap by transforming virtual environments into dynamic and data-rich platforms. The integration of DTs offers real-time monitoring and virtual simulations, enabling Embodied AI agents to train and adapt in virtual environments before deployment in real-world scenarios. In this review, the main challenges and a novel perspective on the future development of integrating DTs and Embodied AI are discussed. To the best of our knowledge, this is the first work to comprehensively review the synergies between DTs and Embodied AI.

  • Research Article
    Wasif Feroze, Muhammad Shahid, Shaohuan Cheng, Elias Lemuye Jimale, Yi Yang, Hong Qu, Yulin Wang

    Understanding and capturing temporal relationships between time-related events expressed in text is a crucial aspect of natural language understanding (NLU). Although transformer-based pre-trained language models such as bidirectional encoder representations from transformers (BERT) have achieved significant success in various natural language processing (NLP) tasks, they are still believed to underperform in temporal commonsense tasks due to the limitation of vanilla self-attention. This paper proposes a methodology for developing language models that perform temporal commonsense reasoning better across several tasks. The proposed framework integrates a multi-data hybrid curation approach for dataset preparation, a collaborative synthetic dataset generation process involving chat agents and human domain experts, and a multi-stage fine-tuning strategy that leverages curated, intermediate, and target datasets to enhance temporal commonsense reasoning capabilities. The models we use in our proposed methodology are superior due to the use of an advanced attention mechanism and effective utilization of our framework. These models utilize disentangled attention, which relies on relative position encoding and proved crucial for temporal commonsense by capturing temporal cues and indicators efficiently. Our extensive experiments show that models built with our proposed methodology enhance results on several temporal commonsense categories. Our results show that we achieved better performance than previously published work by utilizing a disentangled attention mechanism and hybrid data framework. Most impressively, our approach has demonstrated state-of-the-art (SOTA) results, surpassing all previous studies on temporal commonsense for the MC-TACO dataset.