Multimodal image registration is a crucial prerequisite for making interventional surgical robots automated and intelligent. In endovascular aneurysm repair, limitations of the imaging principle and hemodynamic effects mean that a single-frame DSA image often fails to represent the complete vascular structure. This is particularly true for vessels that run parallel to the X-ray beam, which are difficult to visualize in DSA images. To address this issue, this study proposes HDCAR, an abdominal aortic vessel registration network that registers preoperative CTA 3D vascular models to intraoperative DSA images, with the aim of improving the completeness and spatial consistency of intraoperative vascular imaging. HDCAR integrates several optimization modules to improve registration accuracy and robustness. First, the K-Sample module filters DSA images, making intravascular intensity more uniform and increasing the contrast between vessels and surrounding tissue. Second, depth information is incorporated to strengthen cross-dimensional spatial feature fusion, improving the alignment between the preoperative 3D model and the intraoperative 2D image. Third, a cross-attention mechanism based on dual rectangular windows and the RankC module strengthen both global contextual relationships and local feature representations. An atrous spatial pyramid pooling (ASPP) module further extracts multi-scale features, improving the model’s ability to capture vascular structures. Finally, a two-stage hybrid loss function is used to optimize the network parameters, ensuring precise and stable registration. Experimental results demonstrate that HDCAR achieves high-precision vascular registration across multimodal images, significantly improving the completeness and accuracy of intraoperative vascular imaging. This provides more precise imaging support for endovascular aneurysm repair and holds great potential for clinical application.
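The multi-scale idea behind ASPP can be illustrated with a minimal NumPy sketch: the same small kernel is applied at several dilation (atrous) rates, so each branch sees a wider receptive field at no extra parameter cost, and the branch outputs are stacked as channels. This is a single-channel toy with fixed kernels and hypothetical function names (`dilated_conv2d`, `aspp`); HDCAR’s actual module is not specified in the abstract, and a practical ASPP would use learned multi-channel convolutions plus an image-level pooling branch.

```python
import numpy as np

def dilated_conv2d(img, kernel, rate):
    """'Same'-size 2-D dilated (atrous) cross-correlation with zero padding.

    Assumes an odd-sized kernel so that the padding is symmetric.
    """
    kh, kw = kernel.shape
    pad_h, pad_w = (kh - 1) * rate // 2, (kw - 1) * rate // 2
    padded = np.pad(img, ((pad_h, pad_h), (pad_w, pad_w)))
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Kernel tap (i, j) reads pixels offset by multiples of the rate.
            out += kernel[i, j] * padded[i * rate:i * rate + h, j * rate:j * rate + w]
    return out

def aspp(img, kernel, rates=(1, 2, 4)):
    """Parallel dilated branches stacked as channels: shape (len(rates), H, W)."""
    return np.stack([dilated_conv2d(img, kernel, r) for r in rates])

# Usage: a single bright pixel shows how the receptive field widens with the rate.
impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0
features = aspp(impulse, np.ones((3, 3)))
print(features.shape)  # (3, 9, 9)
```

With rate 4, the 3×3 kernel samples pixels four apart, so each output sees a 9×9 neighbourhood while still using only nine weights.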
In fiber optic communication, as networks expand, precise detection and alignment of fiber optic adapters are crucial for system stability and transmission quality. Traditional object detection algorithms face two main problems in fiber optic adapter detection: they cannot handle arbitrarily oriented targets, and they are difficult to deploy efficiently on embedded devices. To tackle these problems, this paper introduces YOLO-FOA, a lightweight rotated object detection algorithm for fiber optic communication scenarios. Built on the YOLO model, YOLO-FOA reduces the model’s computation and parameter count by introducing Dynamic Head and Dynamic ATSS, while the C2f_MViTBv3 and C2f_GhostBlockv2 modules and an Angle DFL Loss are designed to improve detection accuracy. In addition, a dynamic alignment correction mechanism can be applied to intelligent calibration and real-time deviation correction in fiber optic communication networks. Experiments show that YOLO-FOA achieves 97.1% detection accuracy on a self-constructed dataset, outperforming the baseline model by 1.3% while reducing parameters by 4.5% and computation by 7.2%. Because of its high accuracy and low resource demands, YOLO-FOA is suitable for embedded devices and offers a new approach to improving the stability and transmission quality of fiber optic communication systems.
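Distribution Focal Loss (DFL), introduced for box regression in Generalized Focal Loss, treats a continuous target as a distribution over discrete bins and pushes probability mass onto the two bins that bracket the target. The NumPy sketch below shows how this general technique could be applied to an orientation angle. It assumes angles in [0, π/2] mapped onto evenly spaced bins, and the names `angle_dfl_loss` and `decode_angle` are hypothetical. The abstract does not give the exact Angle DFL formulation, so this illustrates the idea only.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax of a 1-D array."""
    m = logits.max()
    return logits - m - np.log(np.exp(logits - m).sum())

def angle_dfl_loss(logits, theta, theta_max=np.pi / 2):
    """DFL for one angle target in [0, theta_max], discretised into len(logits) bins."""
    logits = np.asarray(logits, dtype=float)
    n_bins = len(logits)
    y = theta / theta_max * (n_bins - 1)       # continuous bin coordinate
    left = min(int(np.floor(y)), n_bins - 2)   # keep a valid right neighbour
    right = left + 1
    logp = log_softmax(logits)
    # Cross-entropy towards the two bins that bracket y, weighted by proximity.
    return -((right - y) * logp[left] + (y - left) * logp[right])

def decode_angle(logits, theta_max=np.pi / 2):
    """Expected angle under the softmax distribution over evenly spaced bins."""
    probs = np.exp(log_softmax(np.asarray(logits, dtype=float)))
    return float(probs @ np.linspace(0.0, theta_max, len(probs)))
```

Decoding by expectation rather than argmax gives sub-bin angular resolution, which matters for thin, rotated targets such as adapter ports.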
With the rapid advancement of large language models (LLMs) and robotics, service robots are becoming an integral part of daily life, offering a wide range of services in complex environments. To deliver these services intelligently and efficiently, robust and accurate task planning capabilities are essential. This paper presents a comprehensive overview of the integration of LLMs into service robotics, with a particular focus on their role in enhancing robotic task planning. First, the development and foundational techniques of LLMs, including pre-training, fine-tuning, retrieval-augmented generation (RAG), and prompt engineering, are reviewed. We then explore the application of LLMs as the cognitive core, or “brain,” of service robots, discussing how LLMs contribute to improved autonomy and decision-making. Furthermore, recent advancements in LLM-driven task planning are analyzed across input modalities, including text, visual, audio, and multimodal inputs. Finally, we summarize key challenges and limitations in current research and propose future directions for advancing the task planning capabilities of service robots in complex, unstructured domestic environments. This review aims to serve as a valuable reference for researchers and practitioners in artificial intelligence and robotics.
Autonomous driving systems face challenges from perception degradation and kinematic coupling in adverse weather. This paper introduces an end-to-end trajectory prediction framework that integrates multi-weather continual learning with kinematic constraint optimization. Traditional weather-specific models suffer from fragmented experience and catastrophic forgetting, which degrades control in low-visibility, high-curvature scenarios. We propose a multi-weather adaptive replay mechanism (MWARM) that uses entropy-weighted sampling to transfer knowledge across weather conditions, paired with a bird’s eye view (BEV)-based perception-planning architecture in which multi-objective model predictive control (MO-MPC) adjusts its weights according to real-time curvature and weather data. Evaluated in CARLA on a multi-weather dataset, the framework provides a robust solution for complex conditions.
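One plausible reading of entropy-weighted sampling is that stored samples from each weather condition are replayed with probability proportional to the model’s predictive entropy on them, so uncertain (poorly retained) experience is revisited more often. The sketch below assumes that reading. The buffer layout, the `eps` floor and the function name `entropy_weighted_replay` are hypothetical, and MWARM’s actual weighting may differ.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy (nats) of each row of predicted class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def entropy_weighted_replay(buffers, batch_size, rng, eps=1e-3):
    """Draw (weather, index) pairs with probability proportional to entropy.

    buffers maps a weather label to an (N, C) array holding the model's current
    predicted probabilities for the samples stored under that weather.
    """
    labels, indices, weights = [], [], []
    for weather, probs in buffers.items():
        # eps keeps every stored sample replayable, even confident ones.
        h = predictive_entropy(np.asarray(probs, dtype=float)) + eps
        labels += [weather] * len(h)
        indices += list(range(len(h)))
        weights.append(h)
    w = np.concatenate(weights)
    picks = rng.choice(len(w), size=batch_size, p=w / w.sum())
    return [(labels[i], indices[i]) for i in picks]

# Usage: confident "clear" predictions vs. uncertain "fog" predictions.
rng = np.random.default_rng(0)
buffers = {"clear": np.tile([0.99, 0.01], (50, 1)),
           "fog": np.tile([0.5, 0.5], (50, 1))}
batch = entropy_weighted_replay(buffers, 2000, rng)
```

Here the fog samples carry roughly twelve times the entropy of the clear ones, so they dominate the replay batch.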
Continuum robots are widely used in fields such as medical surgery, industrial manufacturing, and aerospace because of their flexibility and compliance. However, their high structural compliance also makes precise control challenging. Although many existing continuum robots have multiple degrees of freedom (DOFs) and complex control systems, such sophistication is often unnecessary for simple, repetitive applications, where a structure tailored to the task is more efficient. To address this issue, this paper proposes an automated design framework based on parametric optimization that generates structural models of multi-section, 1-DOF, flexure-joint-based continuum robots capable of reaching any two predefined end-effector poses. The bending of the continuum robot is simulated under a constant curvature assumption. MATLAB is used to optimize and solve for the structural parameters, and 3D-printable models are then generated with the Solid Geometry Library Toolbox. Experimental results demonstrate that, under certain geometric boundary conditions on the structural parameters, the robot’s end-effector can reach any two predefined poses with high accuracy. This approach significantly reduces the structural and control complexity of task-specific continuum robots, lowers manufacturing costs, and expands their range of applications.
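Under the constant curvature assumption, each section bends as a circular arc, so its tip pose follows in closed form from the arc length L and curvature κ: x = sin(κL)/κ, y = (1 − cos(κL))/κ, heading κL. A minimal planar sketch in Python/NumPy chains such sections with SE(2) homogeneous transforms. The planar restriction and the function names are assumptions for illustration; this is not the paper’s MATLAB implementation.

```python
import numpy as np

def cc_section(kappa, length):
    """SE(2) tip transform of one constant-curvature arc (base tangent = +x)."""
    phi = kappa * length  # total bending angle of the arc
    if abs(kappa) < 1e-9:
        x, y = length, 0.0  # straight-section limit
    else:
        x = np.sin(phi) / kappa
        y = (1.0 - np.cos(phi)) / kappa
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])

def forward_kinematics(kappas, lengths):
    """Chain sections base-to-tip; returns the tip (x, y, heading) in the base frame."""
    T = np.eye(3)
    for k, L in zip(kappas, lengths):
        T = T @ cc_section(k, L)  # each section is expressed in the previous tip frame
    return T[0, 2], T[1, 2], np.arctan2(T[1, 0], T[0, 0])
```

A design optimizer in this spirit would search over section lengths and flexure geometry until the two actuation states map onto the two required tip poses.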
Large language models (LLMs) have been widely adopted in robotic applications in recent years, but task planning for long-horizon, complex tasks remains challenging for them. In this work, we present a gradual learning method to address this challenge and explore its usability in surgical training tasks that require a high level of reasoning, such as peg transfer and a sliding puzzle task. Experiments were conducted in a simulation environment and on the da Vinci Research Kit (dVRK), with environment feedback triggering follow-up prompts to the LLM when necessary. Results showed that for complex tasks, the gradual learning method outperformed the direct approach in the LLM’s task and motion planning: it required fewer follow-up prompts and led to higher success rates with faster execution. This suggests that for complex pseudo-surgical tasks, it is more efficient to have the LLM solve simpler versions of the task while incrementally increasing complexity than to tackle the full task at once. The approach shows promise for robot-assisted surgery, where tasks are complex, long-horizon, and demand strong reasoning abilities.
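The gradual learning loop described above can be sketched as a curriculum over task variants of increasing difficulty, where each failed execution turns environment feedback into a follow-up prompt and the conversation history carries lessons forward. In the sketch, `query_llm`, `execute` and the task strings are hypothetical stand-ins, because the abstract does not specify prompt formats or interfaces. The stubbed usage only demonstrates the control flow.

```python
from typing import Callable, List, Tuple

def gradual_learning(
    tasks: List[str],                                  # simplest variant first
    query_llm: Callable[[List[str]], str],             # hypothetical: history -> plan
    execute: Callable[[str, str], Tuple[bool, str]],   # (task, plan) -> (ok, feedback)
    max_followups: int = 3,
) -> Tuple[bool, int]:
    """Solve task variants in order, keeping earlier solutions in the context."""
    history: List[str] = []
    followups_used = 0
    for task in tasks:
        history.append(f"Task: {task}")
        for _ in range(max_followups + 1):
            plan = query_llm(history)
            history.append(f"Plan: {plan}")
            ok, feedback = execute(task, plan)
            if ok:
                break
            # Environment feedback becomes a follow-up prompt.
            followups_used += 1
            history.append(f"Feedback: {feedback}")
        else:
            return False, followups_used  # this curriculum stage was never solved
    return True, followups_used

# Stub LLM that produces a working plan only after it has seen feedback once.
def fake_llm(history: List[str]) -> str:
    return "plan-v2" if any(h.startswith("Feedback") for h in history) else "plan-v1"

def fake_execute(task: str, plan: str) -> Tuple[bool, str]:
    return plan == "plan-v2", "peg dropped"

result = gradual_learning(["1 peg", "2 pegs", "3 pegs"], fake_llm, fake_execute)
```

In the stubbed run, the single correction learned on the easiest variant carries over to the harder ones, so only one follow-up prompt is needed overall.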
Tendon-sheath mechanisms (TSMs) are widely used for position transmission in robotic systems that require compactness and adaptability to complex environments. However, friction-induced tendon elongation disrupts the alignment between input and output positions, preventing the robotic end-effector from accurately following motion commands. Because tendon elongation depends on the configuration of the transmission route, resolving position transmission misalignment in TSMs is even more challenging. Building on the tendon-elongation compensator developed in the author’s recent work, this technical note aims to align the actual output position with the desired position. The improved compensator requires no distal sensory feedback, thereby preserving the compactness of the system. Notably, it is applicable to TSMs with arbitrary and time-varying transmission routes in three-dimensional (3-D) space, fulfilling the adaptability requirement. Preliminary experimental results demonstrate the potential of the presented technique, which achieves 96.44%–97.56% accuracy in distal position tracking. By tackling a long-standing challenge in TSM research, this study lays a technical foundation for future advancements in the field.
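A common textbook way to model the route dependence mentioned above is the capstan friction model: tension decays exponentially with the accumulated bending angle along the sheath, and elongation is the integral of tension divided by the axial stiffness EA. The sketch below uses that model on a discretized 3-D route, with a simple feedforward compensation (command = desired output + predicted elongation). The friction coefficient, stiffness and function names are assumptions. The sketch ignores direction-dependent hysteresis and is not necessarily the compensator used in the note.

```python
import numpy as np

def route_geometry(points):
    """Segment lengths and turning angles of a 3-D polyline route."""
    d = np.diff(np.asarray(points, dtype=float), axis=0)
    lengths = np.linalg.norm(d, axis=1)
    u = d / lengths[:, None]
    cosang = np.clip((u[:-1] * u[1:]).sum(axis=1), -1.0, 1.0)
    return lengths, np.arccos(cosang)  # one turning angle per interior vertex

def tendon_elongation(points, t_in, mu, ea):
    """Capstan-model elongation for proximal tension t_in (N).

    mu: tendon-sheath friction coefficient; ea: axial stiffness E*A (N).
    """
    lengths, angles = route_geometry(points)
    # Accumulated wrap angle seen by each segment (zero for the first one).
    wrap = np.concatenate([[0.0], np.cumsum(angles)])
    tension = t_in * np.exp(-mu * wrap)
    return float((tension * lengths).sum() / ea)

def compensated_command(desired_output, points, t_in, mu, ea):
    """Feedforward compensation: pull further by the predicted elongation."""
    return desired_output + tendon_elongation(points, t_in, mu, ea)

straight = [(0, 0, 0), (0, 0, 0.5), (0, 0, 1.0)]
bent = [(0, 0, 0), (0, 0, 0.5), (0.5, 0, 0.5)]  # one 90-degree bend
```

Because the route enters only through `points`, the same functions can be re-evaluated whenever the route changes, which is the property a time-varying-route compensator needs.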
Deformity correction plays an important role in reconstructing limb function. To reduce physicians’ workload and raise the level of intelligence in deformity correction, a wearable orthopedic robot based on traditional deformity correction devices is proposed, and a methodology for deformity correction is studied. The inverse kinematics of the robot is derived using homogeneous coordinate transformations. A PointNet++ model trained with mini-batch gradient descent achieves effective segmentation of the point cloud of the robotic system. A mirrored registration strategy based on the healthy contralateral bone is adopted, with SAC-IA (Sample Consensus Initial Alignment) for coarse registration and ICP (Iterative Closest Point) for fine registration, to measure the deformity parameters of the bone accurately. A physical prototype of the orthopedic robot is built, the relevant experimental parameters are obtained with optical measurement equipment, and the robot is driven to perform the deformity correction task according to the inverse kinematics solution. Experimental results confirm the clinical viability of the proposed orthopedic robot.
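The mirrored fine-registration step can be illustrated with point-to-point ICP. First, the healthy contralateral bone is reflected across an assumed sagittal plane (here x = 0). Then the method alternates nearest-neighbour matching with a closed-form SVD (Kabsch) rigid fit. This NumPy sketch uses brute-force matching and omits SAC-IA, assuming the coarse step has already brought the clouds close. The mirror plane and function names are illustrative assumptions.

```python
import numpy as np

def mirror_x(points):
    """Reflect a point cloud across the plane x = 0 (assumed sagittal plane)."""
    return points * np.array([-1.0, 1.0, 1.0])

def best_rigid_transform(src, dst):
    """Least-squares R, t with R @ src_i + t ~= dst_i (Kabsch / SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    # Sign correction so that R is a proper rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=30):
    """Point-to-point ICP; returns the aligned source and the mean residual per iteration."""
    cur = src.copy()
    residuals = []
    for _ in range(iters):
        # Brute-force nearest neighbour in dst for every current source point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        residuals.append(np.sqrt(d2[np.arange(len(cur)), nn]).mean())
        R, t = best_rigid_transform(cur, dst[nn])
        cur = cur @ R.T + t
    return cur, residuals

# Usage: a synthetic "healthy" bone cloud, mirrored, vs. a slightly displaced observation.
g = np.linspace(0.0, 0.75, 4)
healthy = np.array(np.meshgrid(g, g, g)).reshape(3, -1).T
a = np.deg2rad(2.0)
R0 = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
t0 = np.array([0.01, -0.02, 0.01])
template = mirror_x(healthy)
observed = template @ R0.T + t0
aligned, res = icp(template, observed, iters=10)
```

In a real deformity measurement, the residual transform (or local misfit) between the mirrored healthy bone and the affected bone is what encodes the deformity parameters.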
Autonomously completing contact-rich tasks involving multiple manipulation objects remains a challenging problem for robots. To achieve this goal, learning from demonstration has emerged as an efficient method for transferring human-like skills to robots. Existing works mainly focus on trajectory or impedance learning to design force-impedance controllers for specific tasks, which requires precise force sensing. However, visual perception plays a critical role in enabling humans to perform dexterous manipulation. To bridge the gap between vision and learning in the control loop, this work proposes a vision-based humanoid compliant skill transfer (VHCST) framework. To address the lack of a vision-to-impedance mapping, a hybrid tree is introduced as a planning bridge that encodes skill parameters across multiple objects. To simplify skill transfer, an observation-wearable demonstration method is employed to capture the position and stiffness of the human arm. The decoupled learning model incorporates the geometric properties of stiffness ellipsoids, which reside on Riemannian manifolds. Finally, the proposed approach is validated through robotic cutting experiments involving multiple objects, and comparative experimental results demonstrate the effectiveness of the proposed framework.
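Stiffness ellipsoids are symmetric positive-definite (SPD) matrices. The affine-invariant Riemannian metric is one standard way to interpolate between them without leaving the SPD manifold, which ordinary element-wise averaging does not guarantee in general. The NumPy sketch computes the geodesic K(t) = K1^(1/2) (K1^(-1/2) K2 K1^(-1/2))^t K1^(1/2). The choice of this particular metric is an assumption, since the abstract does not say which Riemannian structure VHCST uses.

```python
import numpy as np

def spd_power(A, p):
    """A**p for a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T  # V @ diag(w**p) @ V.T

def spd_geodesic(K1, K2, t):
    """Point at parameter t on the affine-invariant geodesic from K1 to K2."""
    S = spd_power(K1, 0.5)
    S_inv = spd_power(K1, -0.5)
    M = S_inv @ K2 @ S_inv
    M = 0.5 * (M + M.T)  # symmetrise to suppress round-off
    return S @ spd_power(M, t) @ S

# Usage: halfway between two diagonal stiffness matrices (N/m).
K1 = np.diag([100.0, 400.0])
K2 = np.diag([400.0, 100.0])
K_mid = spd_geodesic(K1, K2, 0.5)
```

For commuting (here diagonal) matrices, the midpoint is the element-wise geometric mean, diag(200, 200), rather than the arithmetic mean diag(250, 250), which reflects the metric’s scale invariance.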
Snake-like robots use their slender bodies to navigate confined spaces by coordinating multiple actuated joints, which enables effective movement through constrained pathways. However, the many degrees of freedom of fully actuated designs make it difficult to reduce energy consumption. To address this challenge, this paper draws on the muscle functions of biological snakes and investigates the integration of compliant passive joints into snake-like robots to enhance locomotion efficiency. Passive joints equipped with torsional springs are actuated indirectly through energy storage and release. On this basis, we propose a dynamic model to investigate the influence of passive joints on locomotion performance. Simulations are used to analyze the effects of spring stiffness over a wider range than is experimentally feasible. To facilitate systematic validation, a modular snake-like robot is designed that allows flexible joint configurations, reassembly, and adjustable joint placement. In addition, the passive joint mechanism is refined to eliminate the need for motor gear reconfiguration, improving experimental adaptability. The proposed model is evaluated through simulations and experiments to investigate the effect of joint stiffness on locomotion speed, while energy efficiency is analyzed experimentally. The results show that appropriate stiffness parameters significantly enhance motion efficiency. Moreover, the placement of passive joints plays a key role in the robot’s motion performance. Among all configurations, a compliant passive tail joint with an appropriate spring setup performs best, increasing motion speed by 26.8% and reducing energy consumption by 52.2%. These findings provide insight into the role of passive joints in snake-like robots and may inform future designs with improved locomotion efficiency and adaptability.
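To make the active/passive split concrete, the sketch generates serpenoid joint references, a standard lateral-undulation gait φ_i = α sin(ωt + iβ), for the active joints only. A passive joint is modelled as a torsional spring-damper whose stored energy is ½kθ². The gait parameters, the damping term and the choice of the last joint as the passive tail are illustrative assumptions, not values from the paper.

```python
import numpy as np

def serpenoid_references(t, n_joints, passive, alpha=0.5, omega=2.0, beta=0.6):
    """Active-joint angle commands (rad); NaN marks unactuated passive joints."""
    i = np.arange(n_joints)
    ref = alpha * np.sin(omega * t + i * beta)
    ref[list(passive)] = np.nan  # passive joints receive no motor command
    return ref

def passive_joint_torque(theta, theta_dot, k, c=0.0):
    """Restoring torque of a torsional spring (stiffness k) with viscous damping c."""
    return -k * theta - c * theta_dot

def spring_energy(theta, k):
    """Elastic energy stored in the torsional spring at deflection theta."""
    return 0.5 * k * theta**2

# Usage: an 8-joint body whose last joint (the tail) is passive.
refs = serpenoid_references(0.0, 8, passive=[7])
```

In a full simulation, the passive joint angle would come from integrating the body dynamics under this spring torque, so its motion (and the energy it stores and releases) depends on k. This dependence is why stiffness tuning changes speed and efficiency.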
This paper presents a single-drive, bio-inspired intestine robot that leverages the coupled rotation-contraction dynamics of Kresling origami structures, together with unidirectional valves, to replicate intestinal peristalsis for directional transport. The minimalist design, comprising a servo motor, antagonistic chiral Kresling units, and check valves, enables continuous peristaltic wave propagation from reciprocating torsional input. Experiments demonstrate strong performance: transport speeds of up to 20.91 mm/s, a load capacity exceeding 97 g, and adaptability to objects 32–46 mm in diameter at inclinations from 0° to 90°. Key innovations include: (1) biomechanical mimicry through antagonistic chiral units that convert rotation into radial contraction, replicating segmented intestinal propulsion; (2) gains in speed and payload, enabled by efficient energy transfer from torsional kinematics; and (3) valve-enabled directionality that ensures net forward displacement. Theoretical analysis establishes geometric constraints for valve-mediated transport and explains the optimal operating range in terms of valve aperture dynamics and material compliance. This work advances gastrointestinal robotics by addressing critical limitations of existing simulators (complex actuation, slow transport, and directional instability) and provides a robust platform for medical applications such as segmental artificial intestine replacement.
The Probabilistic Roadmap (PRM) algorithm is widely used in robotic manipulator path planning because of its rapid exploration capability, particularly in high-dimensional configuration spaces with complex kinematic and environmental constraints. However, the efficiency of PRM is inherently limited by the distribution of sampling points. In scenarios with narrow passages, sparse sampling within these regions can significantly increase the likelihood of planning failure. This paper therefore proposes an improved PRM algorithm that is suited to narrow passages with obstacles and can significantly improve path planning efficiency. First, a non-uniform partitioning strategy based on obstacle density dynamically divides the sampling area to reduce redundant edge connections. Second, to address the sampling failure often encountered in narrow passages because of insufficient sample points, a weighted sampling adjustment strategy adaptively modifies the sampling density between narrow and open regions based on a comprehensive distance metric. Third, an adaptive variable step-size strategy dynamically adjusts the connection step between obstacle boundaries and open areas, further enhancing roadmap connectivity. The improved PRM algorithm, integrating these strategies, was applied in both two-dimensional and three-dimensional environments. Simulation results demonstrate that the method can find feasible paths in complex scenarios. Compared with Lazy PRM and OBPRM, the proposed approach reduces path length by approximately 8.77% and 7.44% and planning time by 9.00% and 5.74%, respectively. Finally, its effectiveness and superiority in manipulator path planning were further validated on a 7-DOF manipulator.
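The weighted sampling idea can be sketched with a basic 2D PRM. Candidate samples are accepted with a probability that is higher near obstacle boundaries (where narrow passages lie) and lower in open space. k-nearest-neighbour connection and a Dijkstra search follow. The circular obstacles, the exponential acceptance rule and the parameters `scale`, `k` and `n_samples` are illustrative assumptions. The paper’s density-based partitioning, comprehensive distance metric and variable step size are not reproduced here.

```python
import heapq
import numpy as np

def clearance(p, obstacles):
    """Distance from p to the nearest circular obstacle boundary (negative inside)."""
    return min(np.hypot(*(p - c)) - r for c, r in obstacles)

def edge_free(a, b, obstacles, step=0.01):
    """Collision-check a straight edge by dense interpolation."""
    n = max(2, int(np.ceil(np.hypot(*(b - a)) / step)) + 1)
    return all(clearance(a + s * (b - a), obstacles) > 0 for s in np.linspace(0, 1, n))

def weighted_samples(obstacles, n_samples, rng, scale=0.1):
    """Rejection sampling in the unit square, favouring points near obstacles."""
    pts = []
    while len(pts) < n_samples:
        p = rng.uniform(0.0, 1.0, size=2)
        d = clearance(p, obstacles)
        # Acceptance decays with clearance; the 0.1 floor keeps open space covered.
        if d > 0 and rng.uniform() < max(0.1, np.exp(-d / scale)):
            pts.append(p)
    return pts

def prm(start, goal, obstacles, rng, n_samples=200, k=10):
    """Build a k-NN roadmap and return the shortest start-goal path (or None)."""
    nodes = [np.asarray(start, float), np.asarray(goal, float)]
    nodes += weighted_samples(obstacles, n_samples, rng)
    X = np.array(nodes)
    adj = {i: [] for i in range(len(X))}
    for i in range(len(X)):
        dist = np.hypot(*(X - X[i]).T)
        for j in np.argsort(dist)[1:k + 1]:  # index 0 is the node itself
            if edge_free(X[i], X[j], obstacles):
                adj[i].append((int(j), dist[j]))
                adj[int(j)].append((i, dist[j]))
    # Dijkstra from node 0 (start) to node 1 (goal).
    best, prev, heap = {0: 0.0}, {}, [(0.0, 0)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == 1:
            break
        if d > best.get(u, np.inf):
            continue
        for v, w in adj[u]:
            if d + w < best.get(v, np.inf):
                best[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if 1 not in best:
        return None
    path, u = [X[1]], 1
    while u != 0:
        u = prev[u]
        path.append(X[u])
    return path[::-1]

# Usage: one disc obstacle blocking the straight line from start to goal.
rng = np.random.default_rng(1)
obstacles = [(np.array([0.5, 0.5]), 0.2)]
path = prm((0.1, 0.5), (0.9, 0.5), obstacles, rng, n_samples=300, k=15)
```

Biasing the samples only changes where roadmap nodes land; connection and search are unchanged. This is why sampling strategies like the one in the paper can be combined with other PRM refinements.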
Jumping is a critical capability for quadruped robots, especially for navigating obstacles and gaps in complex environments. A successful jump requires accurate trajectory tracking and robust feedback, because cumulative deviations from the desired jumping trajectory can lead to instability or landing failure. Existing controllers often rely on fixed joint-level PD control or simplified inverse dynamics, which often fall short in tracking accuracy and robustness. In this paper, we propose a phase-aware iterative Linear Quadratic Regulator (iLQR) framework tailored to dynamic quadruped jumping tasks. By segmenting the jumping motion into distinct phases, we define phase-wise optimal control problems that respect the unique characteristics and requirements of each stage. Moreover, by using the planar full-body dynamics of the quadruped in each iLQR sub-problem, we derive a tracking controller consisting of time-varying, full-state feedback gains, which outperforms traditional baseline controllers in tracking accuracy and disturbance rejection. Extensive simulation and hardware experiments on the Deeprobotics Lite3 quadruped validate the effectiveness and reliability of the proposed method in a number of dynamic jumping scenarios.
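The time-varying feedback gains mentioned above come from the backward Riccati recursion that iLQR runs around the current nominal trajectory: K_t = (R + BᵀP B)⁻¹ BᵀP A and P ← Q_t + AᵀP(A − B K_t). The sketch applies that recursion to a linearized system with phase-dependent state weights. The 1-D double-integrator model, the phase split and the weight values are placeholder assumptions standing in for the paper’s planar full-body dynamics.

```python
import numpy as np

def tv_lqr_gains(A, B, Qs, R, Qf):
    """Backward Riccati pass giving time-varying gains K_t (u_t = -K_t @ dx_t).

    Qs holds one state-weight matrix per time step, so each phase can use its own.
    """
    P = Qf
    gains = []
    for Q in reversed(Qs):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

# Placeholder model: 1-D double integrator (position, velocity), dt = 0.01 s.
dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
R = np.array([[0.01]])
# Two hypothetical phases with different tracking priorities.
Qs = [np.diag([10.0, 1.0])] * 60 + [np.diag([1.0, 0.1])] * 40
gains = tv_lqr_gains(A, B, Qs, R, Qf=np.diag([100.0, 10.0]))

# Closed-loop rollout of a tracking error dx under the feedback law.
dx = np.array([0.1, 0.0])
for K in gains:
    dx = A @ dx - B @ (K @ dx)
```

Because each phase has its own weights, the gains change at phase boundaries, which is the behaviour a single fixed PD gain cannot reproduce.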
Robot learning has seen many successful trials in recent years. For contact-rich robotic tasks, however, it is challenging to learn coordinated motor skills through reinforcement learning. Imitation learning addresses this problem by using a mimic reward that encourages the robot to track a given reference trajectory. However, imitation learning is not very efficient and may constrain the learned motion. In this paper, we propose instruction learning, which is inspired by the human learning process and is highly efficient, flexible, and versatile for robot motion learning. Instead of using a reference signal in the reward, instruction learning applies the reference signal directly as a feedforward action, which is combined with a feedback action learned by reinforcement learning to control the robot. In addition, we propose an action bounding technique and remove the mimic reward, both of which are shown to be crucial for efficient and flexible learning. Comparisons with imitation learning show that instruction learning can greatly speed up training and ensure that the desired motion is learned correctly. The effectiveness of instruction learning is validated through a range of motion learning examples on a biped robot and a quadruped robot, where skills can typically be learned within several million steps. We also conduct sim-to-real transfer and online learning experiments on a real quadruped robot. Instruction learning shows great merits and potential, making it a promising alternative to imitation learning.
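The core action composition of instruction learning can be written in a few lines: the reference is applied as a feedforward term, and the policy’s feedback output is bounded before being added. In this sketch, the bound is a fixed per-joint box, and `reference_action` and `stub_policy` are hypothetical placeholders. The abstract does not describe the actual bounding rule or network, so both are assumptions.

```python
import numpy as np

def reference_action(t, amp=0.3, freq=1.5):
    """Hypothetical periodic reference for two joints (rad), e.g. a gait pattern."""
    phase = 2.0 * np.pi * freq * t
    return amp * np.array([np.sin(phase), np.cos(phase)])

def instruction_action(ref_action, feedback_action, bound):
    """Feedforward reference plus a feedback correction clipped to +/- bound."""
    return ref_action + np.clip(feedback_action, -bound, bound)  # action bounding

def stub_policy(obs):
    """Placeholder for the feedback action produced by the RL policy."""
    return 0.5 * np.tanh(obs)

t = 0.2
obs = np.array([0.4, -1.0])
a = instruction_action(reference_action(t), stub_policy(obs), bound=0.1)
```

The bound sets the trade-off: with `bound=0` the robot simply replays the reference, and as the bound grows the controller approaches unconstrained reinforcement learning. A moderate bound keeps exploration near the instructed motion without needing a mimic reward.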
Semantic segmentation methods based on RGB images have notable limitations in complex industrial scenarios, particularly under interference such as dynamic lighting variations and polymorphic weld seam morphologies, which lead to insufficient feature extraction and reduced segmentation accuracy and robustness. To address these limitations, this study proposes a polymorphic weld seam semantic segmentation model (PWSM) based on multi-level feature fusion, which integrates the complementary information in RGB and depth images to enhance perception in complex environments. The proposed model introduces a Dual-Stream Dual-modal Fusion (DSDF) module that uses channel selection and spatial selection strategies to extract and enhance complementary features from RGB and depth images. In addition, a Multi-Level Feature Fusion Module (ML-FFM) progressively integrates low-level and high-level semantic information through a multi-scale mechanism, refining boundary features while preserving the integrity of the feature representation. Experimental results demonstrate that the model achieves superior segmentation performance on a complex multi-form weld seam dataset, with enhanced accuracy and robustness in challenging scenarios involving occlusion and illumination variation. Compared with existing single-modal and multi-modal models, the proposed model achieves performance improvements of 1.52% and 0.65%, respectively, providing effective technical support for the intelligent perception of polymorphic weld seams.
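Channel selection and spatial selection can be illustrated with a gating scheme in the spirit of squeeze-and-excitation and spatial attention. A per-channel weight decides how much each modality contributes to each channel, and a per-pixel weight then rescales the fused map. This NumPy sketch uses a fixed projection matrix `proj` in place of learned layers. The function names and gating formulas are assumptions; this is not the DSDF module itself.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_select(rgb, depth, proj):
    """Per-channel soft choice between modalities; features are (C, H, W).

    proj: (C, 2C) matrix standing in for a learned excitation layer.
    """
    # Squeeze: global average pooling of both modalities, concatenated -> (2C,).
    z = np.concatenate([rgb.mean(axis=(1, 2)), depth.mean(axis=(1, 2))])
    a = sigmoid(proj @ z)[:, None, None]  # (C, 1, 1) weight for RGB; 1 - a for depth
    return a * rgb + (1.0 - a) * depth

def spatial_select(fused):
    """Per-pixel gate computed from channel-wise mean and max statistics."""
    s = sigmoid(fused.mean(axis=0) + fused.max(axis=0))  # (H, W)
    return fused * s[None, :, :]

# Usage with random stand-in feature maps (4 channels, 8x8 pixels).
rng = np.random.default_rng(0)
rgb, depth = rng.normal(size=(4, 8, 8)), rng.normal(size=(4, 8, 8))
proj = rng.normal(scale=0.1, size=(4, 8))
out = spatial_select(channel_select(rgb, depth, proj))
```

Under glare, for example, a learned gate of this kind could shift weight toward depth channels, while the spatial gate emphasizes pixels where either modality responds strongly.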