Intelligence & Robotics

2025-04-11 2025, Volume 5 Issue 2

Research Article

Retrieve-then-compare mitigates visual hallucination in multi-modal large language models

Dingchen Yang, Bowen Cao, Sanqing Qu, Fan Lu, Shangding Gu, Guang Chen

2025, 5(2): 248-75. https://doi.org/10.20517/ir.2025.13

Multi-modal large language models (MLLMs) demonstrate remarkable success in a range of vision-language tasks. However, they are prone to visual hallucinations, where their textual responses diverge from the provided image. Inaccurate visual understanding poses risks to the practical applications of MLLMs. Are MLLMs oblivious to accurate visual cues when they hallucinate? Our investigation indicates that the visual branch of MLLMs may advocate both erroneous and accurate content equally, highlighting a high level of uncertainty. To address this issue, we propose retrieval contrastive decoding (RCD), a training-free method that leverages analogous visual hallucinations, which are induced by images sharing common semantic and appearance characteristics, to mitigate visual hallucinations. Specifically, RCD retrieves relevant images to serve as references for MLLMs, and compares their visual content with the test image through confidence score subtraction. Additionally, RCD coordinates the correction of hallucinations from both the visual and textual branches of MLLMs by adaptively scaling the subtracted scores. Experiments on public hallucination benchmarks demonstrate the efficacy of RCD in mitigating visual hallucinations for three state-of-the-art MLLMs, surpassing other advanced decoding strategies. Furthermore, we validate the effectiveness of RCD in enhancing the capability of MLLMs to comprehend complex and potentially hazardous situations in real-world traffic scenarios. RCD enhances the accuracy of MLLMs in understanding real-world scenes and improves their capability for reasoning, thereby enhancing the reliability of MLLMs in real-world applications.
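The confidence score subtraction described in this abstract can be illustrated with a minimal sketch. It uses the generic contrastive-decoding form `(1 + alpha) * log p_test - alpha * log p_ref`, which is an assumption and not necessarily the formulation used in the paper. The names `contrastive_scores`, `pick_token`, and the fixed scale `alpha` are hypothetical; the abstract states that RCD scales the subtracted scores adaptively, which this sketch does not attempt.

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities (numerically stable)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def contrastive_scores(test_logits, ref_logits, alpha=1.0):
    """Subtract reference-image confidence from test-image confidence.

    alpha is a hypothetical fixed scale (assumption); the paper describes
    an adaptive scaling that is not reproduced here.
    """
    lp_test = log_softmax(test_logits)
    lp_ref = log_softmax(ref_logits)
    return [(1 + alpha) * t - alpha * r for t, r in zip(lp_test, lp_ref)]

def pick_token(test_logits, ref_logits, alpha=1.0):
    """Greedy choice of the token index with the highest contrastive score."""
    scores = contrastive_scores(test_logits, ref_logits, alpha)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy vocabulary of three tokens. With the test image, tokens 0 and 1 are
# equally confident; the retrieved reference image strongly favors token 0,
# so the subtraction demotes token 0 and token 1 wins.
test_logits = [2.0, 2.0, 0.0]
ref_logits = [3.0, 0.5, 0.0]
chosen = pick_token(test_logits, ref_logits, alpha=1.0)
```

In this toy case the test-image scores alone cannot separate the two leading tokens, which mirrors the uncertainty the abstract describes; the reference scores break the tie.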

Research Article

An effective fault-tolerant control with slime mold algorithm for unmanned underwater vehicle

Tianhong Zeng, Daqi Zhu, Chunhua Gu, Simon X. Yang

2025, 5(2): 276-91. https://doi.org/10.20517/ir.2025.14

The proposed fault-tolerant control strategy based on the slime mold algorithm (FTC-SMA) enhances the resilience of multi-thruster unmanned underwater vehicles against thrust loss. When the propulsion system fails, the strategy enables rapid thrust redistribution to restore the original torque and sailing direction, even after a catastrophic thruster failure. This strategy respects the physical limits of the thrusters and can effectively solve the over-actuated problem. The effectiveness, efficiency, and stability of FTC-SMA are confirmed through simulation experiments under various fault conditions, demonstrating significant improvements over other algorithms such as particle swarm optimization and the grasshopper optimization algorithm.

Research Article

CAM-MR-MS based gesture recognition method using sEMG

Lina Tong, Yunbo Li, Yixia Liang, Chen Wang

2025, 5(2): 292-312. https://doi.org/10.20517/ir.2025.15

With growing attention to the needs of disabled and elderly people, intelligent prosthetics and service robots have been widely applied. This paper presents a method for gesture recognition using forearm surface electromyography (sEMG), including an adaptive channel selection method to simplify sEMG measurement. Surface skin areas are divided according to the forearm muscle groups corresponding to different movements, and the Myo bracelet is used to collect sEMG signals from these areas. A model combining a channel attention module, a multi-channel relationship feature extraction module, and a multi-scale skip connection module is built to adaptively select the signals from certain skin areas and recognize the seven gestures in the experiment. The comparative experimental results indicate that this method can adaptively extract the optimal channel combination and achieve effective recognition results. It improves the practicality of sEMG-based gesture recognition.

Research Article

MSAFNet: a novel approach to facial expression recognition in embodied AI systems

Huifang He, Runbin Liao, Yating Li

2025, 5(2): 313-32. https://doi.org/10.20517/ir.2025.16

In embodied artificial intelligence (EAI), accurately recognizing human facial expressions is crucial for intuitive and effective human-robot interactions. We introduce the multi-scale attention and convolution-transformer fusion network (MSAFNet), a deep learning framework tailored for EAI, designed to dynamically detect and process facial expressions, facilitating adaptive interactions based on the user's emotional state. The proposed network comprises three distinct components: a local feature extraction module that utilizes attention mechanisms to focus on key facial regions, a global feature extraction module that employs Transformer-based architectures to capture comprehensive global information, and a global-local feature fusion module that integrates these insights to enhance facial expression recognition accuracy. Our experimental results on prominent datasets such as FER2013 and RAF-DB indicate that our data-driven approach consistently outperforms existing state-of-the-art methods.

Research Article

Disturbance observer-based terminal sliding mode control for the training safety improvement in robot-assisted rehabilitation

Yaqi Zhang, Weiyi Xie, Renjie Ma

2025, 5(2): 333-54. https://doi.org/10.20517/ir.2025.17

Existing control methods for exoskeletons often face challenges in adapting to individual differences, ensuring robustness in dynamic environments, and achieving real-time performance. For instance, certain approaches fail to balance rehabilitation efficacy with wearer comfort, while others suffer from issues such as chattering and limited disturbance rejection capabilities. This paper proposes a control framework for exoskeleton rehabilitation robots, emphasizing the strict implementation of safety protocols to guarantee that patients in the early stages of rehabilitation can accurately follow normal human activity postures. Firstly, an interpolating polynomial is optimized to generate the desired trajectory, with consideration of the minimum jerk principle. Secondly, a motion-dependent function is proposed for smooth switching between two modes of normal training and safe stopping. Thirdly, a non-singular fast terminal sliding mode method based on a nonlinear disturbance observer is proposed to accurately track the desired joint angles, with the objective of achieving a tracking error that tends to zero in a finite time. Furthermore, the stability of the closed-loop system is demonstrated through the application of the Lyapunov method. Ultimately, the simulation results demonstrate the efficacy and resilience of the proposed control framework.
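The minimum jerk principle mentioned in this abstract has a well-known closed form for a rest-to-rest motion: a quintic polynomial in normalized time. The sketch below shows only that classical profile; the paper optimizes its own interpolating polynomial, which may differ. The function name `min_jerk` and the example joint angles are hypothetical.

```python
def min_jerk(q0, qf, duration, t):
    """Classical minimum-jerk position at time t for a rest-to-rest move.

    q0, qf   : start and end joint angle (any consistent unit)
    duration : total motion time, must be positive
    t        : query time, clamped to [0, duration]
    """
    s = min(max(t / duration, 0.0), 1.0)  # normalized time in [0, 1]
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    return q0 + (qf - q0) * blend

def sample_trajectory(q0, qf, duration, steps):
    """Sample the profile at steps + 1 evenly spaced instants."""
    return [min_jerk(q0, qf, duration, duration * k / steps)
            for k in range(steps + 1)]

# Hypothetical example: move a joint from 0.0 rad to 1.0 rad in 2 s.
trajectory = sample_trajectory(0.0, 1.0, 2.0, 10)
```

The blend polynomial has zero velocity and zero acceleration at both ends, which is why it is a common starting point for rehabilitation trajectories that must start and stop gently.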

Research Article

Intelligent augmented reality application for personalised rhinoplasty using machine learning

Mohammad Saeid Heydari, Mahyar Kolivand, Montadar Al-Azzawi, Hoshang Kolivand

2025, 5(2): 355-77. https://doi.org/10.20517/ir.2025.18

Rhinoplasty, a common yet complex cosmetic surgery, often results in patient dissatisfaction due to the reliance on subjective surgeon evaluations. This study introduces an intelligent augmented reality (AR) application for personalised nose surgery, integrating three core innovations: (1) Preoperative 3D Modelling; (2) Machine Learning (ML) Analysis; and (3) AR Visualisation. The system employs advanced computer vision algorithms to extract precise facial measurements from high-resolution 3D scans or photographs. These measurements are analysed using ML techniques to calculate key facial ratios and recommend optimal nose shapes tailored to individual facial structures. AR further enhances the surgical process by providing real-time visualisations and guidance, enabling surgeons to implement data-driven decisions with greater precision. This novel approach addresses key challenges in rhinoplasty by automating critical steps of the surgical planning process, reducing subjectivity, and significantly improving surgical accuracy. The application’s contribution extends beyond the operating room, offering surgeons a powerful educational tool with real-time feedback and interactive visualisations to support continuous skill development. This study represents a transformative step in leveraging AR and ML for enhanced precision, patient satisfaction, and surgical outcomes in cosmetic surgery.

Research Article

Intelligent bridge monitoring system operational status assessment using analytic network-aided triangular intuitionistic fuzzy comprehensive model

Chen Wang, Qizhi Tang, Bo Wu, Yan Jiang, Jingzhou Xin

2025, 5(2): 378-403. https://doi.org/10.20517/ir.2025.19

The extensive construction of bridge health monitoring (BHM) systems has made it challenging for the authorities to manage them centrally. The reliable operational status of BHM systems is vital to obtaining accurate monitoring data and evaluating the condition of bridges. To evaluate the operational status of these systems, this study established an assessment model that integrates the triangular intuitionistic fuzzy analytic network process (TIFANP) and the triangular intuitionistic fuzzy comprehensive evaluation (TIFCE) method. Firstly, an evaluation index system was established for the operational status of a BHM system. Factors such as system stability, data reliability, system maintenance, early warning, and human-computer interaction were comprehensively considered. Secondly, the evaluation indicator weights were assigned using TIFANP. The system evaluation rating levels were divided into four grades, and the membership and non-membership functions of the evaluation indicators for these rating levels were constructed based on TIFCE. Finally, the effectiveness of the proposed method was verified through a case study. This is the first time that an operational status assessment method suitable for in-service BHM systems has been proposed. The results show that TIFANP better accounts for the non-independence of, and interactions among, the evaluation indicators. Hesitations in the decision-making process were quantified, making the weight allocations more accurate. The proposed method outperforms the comparison methods and can be used to evaluate the operational status of BHM systems in a more scientific and objective manner.

Research Article

A phase search-enhanced Bi-RRT path planning algorithm for mobile robots

Yuhao Sun, Huazhong Zhu, Zhaocheng Liang, Andong Liu, Hongjie Ni, Ye Wang

2025, 5(2): 404-18. https://doi.org/10.20517/ir.2025.20

The proposed improvement to the Rapidly-exploring Random Tree (RRT) path planning algorithm introduces a phase search approach to address the slow convergence caused by boundary information in the original algorithm. First, a three-stage search strategy is employed to generate sampling points that are specifically oriented toward the real-time sampling failure rate, thereby significantly reducing the number of redundant nodes. Simultaneously, a balanced exploration strategy is introduced, enhancing the algorithm's convergence speed by constructing two randomly growing trees for searching. Secondly, a path-pruning strategy is implemented, effectively reducing the path length. Finally, the bidirectional exploration technique from the improved algorithm is applied to the traditional RRT algorithm based on boundary information, and comparative experiments are conducted. The experimental results demonstrate that, compared to the traditional boundary-based RRT method, the proposed improved algorithm reduces the running time by 13.4% and decreases the path length by 9.51%.
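A path-pruning step of the kind mentioned in this abstract is often implemented as greedy shortcutting: from each kept waypoint, jump to the farthest later waypoint that can be reached in a straight collision-free line. The sketch below assumes a 2D workspace with circular obstacles; these assumptions, and the names `prune_path`, `collision_free`, and `segment_hits_circle`, are illustrative and not taken from the paper.

```python
import math

def segment_hits_circle(p, q, center, radius):
    """True if segment p-q comes within `radius` of `center`."""
    (px, py), (qx, qy), (cx, cy) = p, q, center
    dx, dy = qx - px, qy - py
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        t = 0.0
    else:
        # Parameter of the closest point on the segment, clamped to [0, 1].
        t = ((cx - px) * dx + (cy - py) * dy) / seg_len2
        t = min(1.0, max(0.0, t))
    nx, ny = px + t * dx, py + t * dy
    return math.hypot(cx - nx, cy - ny) <= radius

def collision_free(p, q, obstacles):
    """obstacles is a list of (center, radius) pairs (assumed circular)."""
    return not any(segment_hits_circle(p, q, c, r) for c, r in obstacles)

def prune_path(path, obstacles):
    """Greedy shortcutting: keep only waypoints needed to avoid obstacles."""
    pruned = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        # Walk back from the goal until a directly reachable waypoint is found.
        # The immediate successor i + 1 is assumed reachable (original edge).
        while j > i + 1 and not collision_free(path[i], path[j], obstacles):
            j -= 1
        pruned.append(path[j])
        i = j
    return pruned

def path_length(path):
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

# Hypothetical example: a detour over one circular obstacle.
raw = [(0, 0), (1, 2), (2, 2), (3, 2), (4, 0)]
obstacles = [((2, 0), 1.0)]
short = prune_path(raw, obstacles)
```

Here the two intermediate waypoints at (1, 2) and (2, 2) are dropped because (0, 0) connects directly to (3, 2) without touching the obstacle.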

Research Article

Extended fault-pair Boolean table based test points selection for robotic systems

Xiuli Wang, Dongdong Xie, Yang Li, Jun Tian, Kai Li

2025, 5(2): 419-32. https://doi.org/10.20517/ir.2025.21

Analog circuit fault isolation is crucial for ensuring the reliability and performance of robotic systems. Test point selection plays a key role in enabling effective fault isolation, yet traditional methods often struggle to balance the number of test points with fault isolation accuracy. This paper proposes a novel test point selection method by extending the fault-pair Boolean table into a distributional framework. The approach enhances test point selection by employing the Bhattacharyya Coefficient to quantify distributional overlap and using kernel density estimation (KDE) to model circuit response distributions without assuming normality. To further improve estimation accuracy, the Grey Wolf optimization algorithm is applied for optimal KDE bandwidth selection. Experimental results on a negative feedback circuit show that the proposed method successfully isolates all 11 faults, demonstrating strong isolation capability. Further validation on an active filter circuit confirms its effectiveness, achieving successful isolation of 16 out of 20 faults. Compared to other methods, the proposed approach consistently yields higher fault isolation across various thresholds.
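The overlap measure named in this abstract can be sketched as follows: estimate each fault's response density with a Gaussian KDE, then integrate the square root of the product of the two densities to obtain the Bhattacharyya Coefficient (1 for identical distributions, near 0 for well-separated ones). The fixed `bandwidth`, the grid integration, and all function names are assumptions for illustration; the paper selects the bandwidth with the Grey Wolf optimization algorithm, which is not reproduced here.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return pdf

def bhattacharyya_coefficient(a, b, bandwidth=0.1, grid_points=2001):
    """Integrate sqrt(p(x) * q(x)) over a grid covering both samples.

    bandwidth is a hypothetical fixed value (assumption), not an
    optimized one.
    """
    lo = min(min(a), min(b)) - 5.0 * bandwidth
    hi = max(max(a), max(b)) + 5.0 * bandwidth
    step = (hi - lo) / (grid_points - 1)
    p, q = gaussian_kde(a, bandwidth), gaussian_kde(b, bandwidth)
    total = 0.0
    for k in range(grid_points):
        x = lo + k * step
        weight = 0.5 if k in (0, grid_points - 1) else 1.0  # trapezoid rule
        total += weight * math.sqrt(p(x) * q(x))
    return total * step

# Hypothetical test-point voltages under two fault conditions.
fault_a = [1.0, 1.1, 0.9]
fault_b = [5.0, 5.1, 4.9]
overlap = bhattacharyya_coefficient(fault_a, fault_b)
```

A test point whose responses give a small coefficient for a fault pair separates that pair well, which is the kind of information a fault-pair table thresholded on overlap would record.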

Review

Research and application progress of electronic ear tags as infrastructure for precision livestock industry: a review

Wei Peng, Zhengxu Liu, Jiazhu Cai, Yunxiang Zhao

2025, 5(2): 433-49. https://doi.org/10.20517/ir.2025.22

With the rapid development of modern livestock farming, animal electronic ear tags (AEET), as animal identification and tracking tools based on radio frequency identification technology, are playing an increasingly important role in precision livestock management. This review summarizes the latest technological advancements in AEET, including material innovations, design improvements, and manufacturing process enhancements, and explores their wide-ranging applications in production management, food traceability, breeding optimization, behavior recognition, and disease monitoring. Additionally, the article highlights the main challenges faced by AEET, such as durability in harsh environments, data security, and cost-effectiveness. Furthermore, it looks ahead to future development trends, including the integration of Internet of Things and blockchain technologies to further enhance the precision and sustainability of livestock farming. By reviewing the current status and future directions of AEET, this review provides a reference for researchers and practitioners aiming to improve the efficiency and sustainability of the modern livestock industry.

Research Article

Smooth and efficient motion planning of large-scale and cooperative multi-arm tunnel drilling robot

Yuming Cui, Jiajun Pu, Ningning Hu, Menghao Cui

2025, 5(2): 450-73. https://doi.org/10.20517/ir.2025.23

To address the motion planning challenges in multi-arm cooperative operations of tunnel rock drilling robots, we establish forward/inverse kinematics models for drilling arms using an enhanced Denavit-Hartenberg method combined with radial basis function neural networks. An improved genetic algorithm (IGA) is developed, integrating heuristic crossover operators, adaptive mutation operations, and local neighborhood search mechanisms to optimize multi-arm trajectories with the objective of minimizing end-effector travel distance. A joint-space collision avoidance strategy is proposed using an improved artificial potential field (IAPF) method that incorporates both attractive potential fields and repulsive potential functions. Simultaneously, quintic B-spline-based trajectory planning ensures smooth motion continuity during collaborative drilling operations. Experimental validation demonstrates that the IGA-IAPF integration achieves a 37.2% reduction in collision probability compared to conventional methods, while maintaining joint angular accelerations below 0.25 rad/s² for all manipulators.
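The combination of attractive and repulsive potentials mentioned in this abstract can be sketched with the classical artificial potential field formulation. This is the textbook version, not the improved IAPF of the paper, and it treats obstacles as points in configuration space, which is a simplifying assumption. The gains `k_att`, `k_rep`, the influence radius `rho0`, and the function names are hypothetical.

```python
import math

def apf_force(q, goal, obstacles, k_att=1.0, k_rep=1.0, rho0=1.0):
    """Combined attractive and repulsive force at configuration q.

    q, goal   : sequences of equal length (e.g. joint angles)
    obstacles : list of point obstacles in the same space (assumption)
    rho0      : distance beyond which an obstacle exerts no force
    """
    # Attractive term: negative gradient of 0.5 * k_att * |q - goal|^2.
    force = [k_att * (g - x) for x, g in zip(q, goal)]
    for obs in obstacles:
        diff = [x - o for x, o in zip(q, obs)]
        rho = math.sqrt(sum(d * d for d in diff))
        if 0.0 < rho < rho0:
            # Repulsive term: negative gradient of
            # 0.5 * k_rep * (1/rho - 1/rho0)^2, pointing away from the obstacle.
            mag = k_rep * (1.0 / rho - 1.0 / rho0) / (rho * rho)
            force = [f + mag * d / rho for f, d in zip(force, diff)]
    return force

def step(q, goal, obstacles, gain=0.1):
    """One gradient-descent step along the combined force."""
    f = apf_force(q, goal, obstacles)
    return [x + gain * fi for x, fi in zip(q, f)]

# Hypothetical two-joint example with one obstacle close to the start.
q_next = step([0.0, 0.0], [1.0, 0.0], [[-0.5, 0.0]])
```

The classical form is known to get stuck in local minima where the two terms cancel, which is one common motivation for improved variants.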

Research Article

An intelligent fault detection algorithm for power transmission lines based on multi-scale fusion

Tianyi Wu, Liming Wang, Xiangyi Xu, Lei Su, Wenjing He, Xinting Wang

2025, 5(2): 474-87. https://doi.org/10.20517/ir.2025.24

With the rapid expansion of modern power grids, automated defect detection in high-voltage transmission lines has become a critical engineering challenge for preventing catastrophic failures and ensuring reliable electricity supply. While automated inspection has revolutionized power infrastructure maintenance, current vision-based methods still face three practical limitations in field applications: (1) susceptibility to complex background interference; (2) insufficient recognition accuracy for small-sized components; and (3) delayed response in real-time inspection scenarios. To address these industry pain points, this study proposes an intelligent power transmission line defect detection algorithm based on multi-scale fusion, specifically optimized for power transmission line components. The algorithm introduces Coordinate Convolution, an optimized decoupled detection head, and an improved loss function to solve the problems of low precision, poor robustness, and slow detection speed faced by defect detection in power transmission network scenarios, laying a necessary theoretical foundation for subsequent practical applications.
