Sun coral (Tubastraea spp.) is an invasive species that poses a considerable threat to coastal ecosystems. Therefore, early detection is essential for effective monitoring and mitigation of its negative impacts on marine biodiversity. This study presents a novel computer vision approach for automated early detection of invasive Tubastraea species in underwater images. We used the YOLOv8 object detection model, which was trained and validated on a manually annotated dataset augmented with synthetic images. The data augmentation addressed the challenge of limited training data that is prevalent in underwater environments. The model achieved performance metrics (precision, accuracy, recall, mAP50, and F1 score) above 90% and detected both open and closed coral stage classes. Test phase results were compared with expert validation, demonstrating the model's effectiveness in rapid detection (16 ms) and its limitations in areas densely covered by Tubastraea. This study demonstrates the potential of deep learning with data augmentation to facilitate the rapid assessment of large image datasets in monitoring sun coral bioinvasion. This approach has the potential to assist managers, taxonomists, and other professionals in the control of invasive alien species.
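The detection metrics reported above can be illustrated with a minimal sketch of IoU-based matching; the boxes, the 0.5 IoU threshold, and the greedy matching rule are illustrative assumptions, not details taken from the study.

```python
# Sketch of precision/recall/F1 computation for object detection,
# as reported for the sun coral model; thresholds are illustrative.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def detection_metrics(preds, truths, thr=0.5):
    """Greedy one-to-one matching of predictions to ground truth at a
    fixed IoU threshold; returns (precision, recall, F1)."""
    matched, tp = set(), 0
    for p in preds:
        for i, t in enumerate(truths):
            if i not in matched and iou(p, t) >= thr:
                matched.add(i)
                tp += 1
                break
    fp = len(preds) - tp
    fn = len(truths) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

mAP50 extends this idea by averaging precision over recall levels at IoU ≥ 0.5, per class.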
Detailed mapping of seafloor topography is essential for understanding seafloor evolution, ensuring navigational safety, and discovering mineral resources. As global environmental conditions continue to deteriorate, various international and regional initiatives have been launched to accelerate seafloor topography mapping, yielding valuable data. Currently, only about a quarter of the seafloor has been directly mapped, observed, and explored due to limitations in traditional detection techniques. However, artificial intelligence, particularly machine learning, is progressively overcoming these constraints with its advanced data processing and analysis capabilities. In recent years, machine learning has increasingly emerged as an alternative to traditional methods, particularly for mapping both open-ocean and shallow-sea topography. This paper first introduces traditional seafloor topography detection techniques and the global topography models developed using them. It then examines the application of machine learning in seafloor mapping before concluding with the challenges and future prospects of intelligent seafloor mapping, along with relevant recommendations.
Underwater object detection faces significant challenges, including ambiguity and occlusion, which greatly undermine the accuracy of traditional algorithms. To address these issues, we propose Bi2F-YOLO, an algorithm specifically designed for underwater environments. Bi2F-YOLO integrates the BiFormer module into the YOLOv7 backbone, utilizing bi-level routing attention (BRA) to focus on key features such as object edges and textures. This effectively addresses the problem of object ambiguity. In the detection head, we replace the conventional ELAN component with the FasterNet module. This change enhances detection efficiency and accuracy through the use of partial convolution (PConv), which redistributes the convolution kernel weights based on the sparsity of the input feature map. By doing so, it prevents the dilution of critical underwater object features caused by interference from irrelevant data. This effectively resolves the occlusion problem in underwater target detection while simultaneously reducing model parameters and computational costs. The experimental results show that Bi2F-YOLO achieves 87.3%
Sea surface temperature (SST) is a critical parameter in understanding Arctic amplification of climate change. In this study, SST in the Arctic was estimated based on data from the Medium Resolution Spectral Imager (MERSI) II on board the Fengyun-3D (FY-3D) satellite and in-situ measurements. To improve the quality of the MERSI thermal data, an optimization model for stripe noise removal based on the alternating direction multiplier method was employed. Clear-sky SST was estimated based on the nonlinear SST (NLSST) algorithm and the triple NLSST algorithm. When compared with the SST product retrieved from the Visible Infrared Imaging Radiometer Suite (VIIRS) in September 2019, the mean difference between VIIRS SST and MERSI II SST is −0.21°C with a standard deviation of 0.29°C in the daytime, while the mean difference is −0.15°C with a standard deviation of 0.34°C at nighttime. Results indicate that the accuracy of MERSI II SST meets the requirements for high-accuracy SST retrieval. Furthermore, these algorithms demonstrate the potential for long-term SST estimation in the Arctic using the FY-3D/MERSI II data.
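The NLSST family of algorithms has the general split-window regression form sketched below; the coefficients `a0`–`a3` here are placeholders, not the regression values fitted for FY-3D/MERSI II.

```python
import math

# Illustrative NLSST-form split-window retrieval. The coefficients are
# hypothetical; operational values are fitted against match-up data.
def nlsst(t11, t12, sst_guess, sat_zenith_deg, a=(1.0, 0.95, 0.08, 0.9)):
    """Nonlinear SST estimate in degrees C.
    t11, t12: 11/12-micron brightness temperatures (degrees C);
    sst_guess: first-guess SST (degrees C);
    sat_zenith_deg: satellite zenith angle in degrees."""
    a0, a1, a2, a3 = a
    dt = t11 - t12                                      # split-window difference
    sec_term = 1.0 / math.cos(math.radians(sat_zenith_deg)) - 1.0
    return a0 + a1 * t11 + a2 * sst_guess * dt + a3 * sec_term * dt
```

The first-guess SST makes the water-vapour correction term state-dependent, which is what distinguishes NLSST from the linear multichannel form.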
Because of complex marine environments and scarce data, underwater acoustic target classification (UATC) is challenging. To improve model generalization ability, data augmentation methods, particularly data synthesis methods based on generative adversarial networks (GANs), are widely adopted. However, the training of GANs is usually slow and unstable. To address these issues, this paper proposes the adaptive stable deep convolutional GAN (AS-DCGAN). We introduce an adaptive controller that controls the learning progress based on the network training performance, thereby avoiding redundant training and accelerating the process. Additionally, we propose a progressive learning strategy that forces the network to gradually learn from low to high frequencies, stabilizing the training. We evaluated AS-DCGAN on two public datasets. The results show that our proposed method achieves state-of-the-art performance, with an accuracy of 81.14% on the DeepShip dataset and 86.11% on the ShipsEar dataset. These results indicate that GAN-based data augmentation can generate higher-quality data and effectively enhance the performance of UATC models.
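The low-to-high frequency progression can be pictured as a band-limiting curriculum over the spectrum; the linear schedule and the starting fraction below are assumptions for illustration, not the schedule used by AS-DCGAN.

```python
# Sketch of a low-to-high frequency curriculum: early epochs see only the
# lowest spectrum bins, and the passband widens as training progresses.
# The linear schedule and start_frac value are illustrative assumptions.

def cutoff_bin(epoch, total_epochs, n_bins, start_frac=0.1):
    """Highest spectrum bin kept at a given epoch; grows linearly from
    start_frac of the bins to all of them."""
    frac = start_frac + (1.0 - start_frac) * min(epoch / max(total_epochs - 1, 1), 1.0)
    return max(1, round(frac * n_bins))

def band_limit(spectrum, epoch, total_epochs):
    """Zero out bins above the current cutoff (low-pass curriculum)."""
    k = cutoff_bin(epoch, total_epochs, len(spectrum))
    return [v if i < k else 0.0 for i, v in enumerate(spectrum)]
```

Presenting coarse spectral structure first gives the generator a stable target before fine, noisy high-frequency detail is introduced.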
The increasing research on sailing simulators has facilitated the advancement and widespread adoption of land-based sailing training. Somatosensory interaction offers benefits such as ease of deployment, low cost, and natural interactivity. This paper introduces a sailing simulator based on somatosensory interaction, wherein a standard camera and 3D human pose estimation (3DHPE) technology are utilized for sailing simulation. The aim of the study is to strike a balance between interactivity and deployment complexity. Despite the recent advances in 3DHPE, its use in real-time deployment scenarios has not been fully explored. Using 3DHPE to estimate the poses of dynamic users is particularly difficult, as it requires balancing real-time efficiency and accuracy. To address this, we propose a 3D pose estimation approach that integrates a graph-guided state-space model (STGJMamer). By using the lightweight transformer model PoseformerV2 as a baseline, the proposed method yields good real-time efficiency while integrating spatiotemporal extended graph convolutions and hierarchical joint enhancement Mamba. Furthermore, the model can efficiently capture both global and local features, and this ultimately enhances the pose estimation accuracy for dynamic users. The experimental results demonstrate that our approach achieves a frame-wise inference speed of 52 FPS, satisfying real-time constraints. Furthermore, it achieves an average mean per-joint position error (MPJPE) of 29.5 mm on the MPI-INF-3DHP dataset, outperforming most existing methods. Finally, we deploy the STGJMamer in a somatosensory interaction-based sailing simulator system and study its feasibility in real-world applications. The code is available at https://gitee.com/chemingmei/stgjmamer.
Marine biomass composition analysis traditionally requires time-consuming processes and domain expertise. This study demonstrates the effectiveness of rapid evaporative ionization mass spectrometry (REIMS) combined with advanced machine learning (ML) techniques for accurate marine biomass composition determination. Using fish species and body parts as model systems representing diverse biochemical profiles, we investigate various ML methods, including unsupervised pretraining strategies for transformers. The deep learning approaches consistently outperformed traditional machine learning across all tasks. For fish species classification, the pretrained transformer achieved 99.62% accuracy, and for fish body parts classification, the transformer achieved 84.06% accuracy. We further explored the explainability of the best-performing and predominantly black box models using local interpretable model-agnostic explanations and gradient-weighted class activation mapping to identify the important features driving the decisions behind each of the best performing classifiers. REIMS analysis with ML can be an accurate and potentially explainable technique for automated marine biomass composition analysis. Thus, REIMS analysis with ML has potential applications in quality control, product optimization, and food safety monitoring in marine-based industries.
Underwater visual object tracking (UVOT) is of great importance to marine applications; however, it remains understudied within mainstream computer vision research. Existing approaches that leverage prompt information to enhance single object tracking primarily rely on auxiliary modal data; however, semantic misalignment persists across modalities, along with unavoidable feature redundancy and cross-modality noise. To address these issues, we propose a self-prompt single target tracking network, namely, SPTrack, built on top of intrinsic image cues. The proposed network extracts global features from raw images as scene-aware prompts and is coupled with a feature-pruning mechanism to eliminate multiscale feature redundancy. Ultimately, the perception capability of the tracker in dynamic scenarios is improved. The experimental results derived from a recent underwater object tracking data set demonstrated that the proposed SPTrack achieved area under the curve (AUC) values of 0.545, with a real-time inference speed of 38.5 FPS. We also performed experiments on two open-air object tracking data sets, and remarkable performance was also obtained. These promising results are attributed to our proposed solution for object tracking in complex underwater scenarios, which specifically addresses challenges (such as occlusion and light scattering) through scene-adaptive feature learning.
In underwater environments, transparent organisms with low visibility and minimal visual features, lacking distinctive shadows or silhouettes, can blend seamlessly into their surroundings. Existing deep learning methods for detecting such organisms have shown unsatisfactory performance. This study proposes a multimodal fusion network, UTNet, which combines event-based and red-green-blue (RGB)-based vision for the underwater transparent camouflaged organism detection task. UTNet introduces a two-stage enhanced representation aggregation module comprising a multi-feature aggregation component (MFAC) and a deep fusion component (DFC) to facilitate the synergy between frame-based and event-based vision. First, MFAC aggregates the high dynamic range features from events with the static details from RGB images. Then, the edge information from the edge clue search module is used to guide the fusion process, reducing background interference. Next, DFC further extracts depth information from the MFAC output using five parallel branches. Additionally, a submanifold sparse convolution-modified ResNet50 backbone network is employed to extract features from event frames, preserving event sparsity and improving computational efficiency. Extensive experiments on our custom underwater transparent organism dataset, captured using the DAVIS346 event camera, demonstrate the effectiveness of UTNet. The results show that UTNet achieves 75.2% accuracy and 37.8 frames per second, providing the best trade-off between speed and accuracy compared to other detectors.
The deep-sea mobile platform system represents a new-generation universal deep-sea cable-controlled integrated geological survey equipment independently developed by the Qingdao Institute of Marine Geology. This system meets diverse marine geological survey requirements, including seabed core drilling, high-precision geological sampling, precise deployment of seabed equipment, and in-situ detection near the seabed. It integrates multiple functions such as visual control, precise sampling, accurate deployment, seabed salvage, and multiparameter in-situ detection. Compared with traditional deep-sea remotely operated vehicles, it offers important advantages, including compact size, lightweight design, low requirements for carrying vessels, and flexible operation. The system has a maximum operating depth of 6000 m, which makes it suitable for various applications, such as seabed resource exploration, marine environmental investigation, and research on marine geological scientific issues. This article offers a comprehensive introduction to the overall design, key technologies, and main functions of this equipment, demonstrates its excellent performance in practical applications, and proposes specific optimization directions for future intelligent upgrades.
Remotely operated vehicles (ROVs) are playing indispensable roles in the ongoing exploration and utilization of ocean resources as they offer flexibility and efficiency. Deep reinforcement learning (DRL) algorithms have been widely used to enhance ROV autonomy, reduce operator workload, and minimize human errors in operations. However, traditional DRL methods rely on a well-crafted reward function specific to the task, which is often challenging to design precisely. Learning from demonstration offers an alternative way, as it enables agents to imitate expert trajectories and refine their policies without relying on reward functions. However, although most existing studies assume that detailed action or control information is available from expert demonstrations, such data are typically hard to obtain in practice. To overcome this limitation, we propose and implement an imitation learning from the observation method for ROV path tracking. In our approach, policy learning is derived solely from observed expert trajectories without the need for explicit action data. We evaluated our method on both straight-line and sinusoidal tracking tasks, and compared the results to those of proximal policy optimization (PPO), a traditional DRL algorithm, using predefined rewards. The experimental results demonstrate that our approach achieves a performance comparable to that of PPO, while offering a faster learning rate and enhanced adaptability to different tasks.
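The core idea of learning from observation alone can be sketched as a state-only imitation signal: the agent is scored by how close its state stays to the observed expert trajectory, with no expert actions required. This nearest-state proxy is an illustrative assumption, not the paper's actual learning rule.

```python
import math

# Sketch of a state-only imitation signal for path tracking: reward is the
# negative distance from the agent's current state to the nearest expert
# state. No expert action labels are needed -- only observed trajectories.
# This proxy reward is an illustrative assumption, not the paper's method.

def imitation_reward(state, expert_states):
    """Negative Euclidean distance to the closest observed expert state."""
    return -min(math.dist(state, e) for e in expert_states)
```

A policy optimizer (e.g., PPO-style updates) can then maximize this signal in place of a hand-crafted tracking reward.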
Sea ice, as a complex and highly variable medium covering the ocean surface, exhibits intricate internal structural variations that significantly affect the ocean’s thermodynamic, dynamic, and acoustic properties. However, the extreme conditions of the polar environment pose major challenges for conducting in-situ surface experiments, resulting in limited research on the acoustic structure of Arctic sea ice. This paper presents the results obtained from cross-hole acoustic measurements conducted during the 13th Chinese National Arctic Research Expedition. A traveltime tomography method is employed to invert the acoustic velocity distribution within sea ice. The Dijkstra algorithm is used for ray tracing, and damped least-squares inversion is applied to enhance solution stability and control ray path curvature to model acoustic wave propagation paths accurately. Additionally, the attenuation characteristics of sea ice are analyzed by monitoring changes in signal amplitude. This study also investigates the influence of temperature on sound speed and amplitude. The findings of this work contribute valuable insights to future studies on sea ice-climate interactions and acoustic wave propagation in polar ocean environments.
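The damped least-squares step described above solves m = (GᵀG + λI)⁻¹Gᵀd, where G maps slowness perturbations to traveltimes and λ is the damping factor. A minimal dense sketch follows; real tomography works with a large sparse G from the ray tracer.

```python
# Damped least-squares inversion m = (G^T G + lam*I)^-1 G^T d, the
# stabilization used in traveltime tomography. Tiny dense illustration;
# a real inversion uses a large sparse ray-path matrix G.

def damped_lsq(G, d, lam):
    n = len(G[0])
    # Normal matrix A = G^T G + lam*I and right-hand side b = G^T d
    A = [[sum(G[k][i] * G[k][j] for k in range(len(G))) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(G[k][i] * d[k] for k in range(len(G))) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back-substitution
    m = [0.0] * n
    for i in range(n - 1, -1, -1):
        m[i] = (b[i] - sum(A[i][j] * m[j] for j in range(i + 1, n))) / A[i][i]
    return m
```

With λ = 0 this reduces to ordinary least squares; increasing λ shrinks the model update, which is what stabilizes the inversion when ray coverage is poor.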
Accurate depth estimation is essential for unmanned underwater vehicles to effectively perceive their environment during target tracking tasks. Therefore, we propose a self-supervised monocular depth estimation framework tailored for underwater scenes, incorporating multi-attention mechanisms and the unique optical characteristics of underwater imagery. To address issues such as color distortion in underwater images, primarily caused by light attenuation in underwater scenes, we design an adaptive underwater light attenuation loss function to improve the model’s adaptability and generalization across diverse underwater scenes. The inherent blurriness of underwater images poses considerable challenges for feature extraction and semantic interpretation. We use dilated convolutions and linear space reduction attention (CDC Joint Linear SRA) to capture local and global features of underwater images, which are then integrated through feature map fusion. Subsequently, we use a multi-attention feature enhancement module to further enhance the spatial and semantic information of the extracted features. To address fusion interference arising from discrepancies in semantic information between feature maps, we introduce a progressive fusion module that balances cross-module features using a two-step feature refinement strategy. Comparative, ablation, and generalization experiments were conducted on the FLSea dataset to verify the superiority of the proposed model.
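The light-attenuation loss builds on the Beer–Lambert behaviour of light in water, where each colour channel fades at its own rate with distance. The sketch below illustrates that physical model; the attenuation coefficients and the absolute-difference penalty are illustrative assumptions, not the paper's learned loss.

```python
import math

# Beer-Lambert-style underwater attenuation model of the kind an
# attenuation-aware depth loss can build on. The per-channel coefficients
# are hypothetical placeholders (red attenuates fastest in water).
BETA = {"r": 0.60, "g": 0.12, "b": 0.08}  # per metre, illustrative

def attenuate(intensity, channel, depth_m):
    """Intensity observed after travelling depth_m through water."""
    return intensity * math.exp(-BETA[channel] * depth_m)

def attenuation_loss(pred_depth, i0, observed, channel):
    """Penalize disagreement between the predicted depth and the fading
    actually observed in this channel."""
    return abs(attenuate(i0, channel, pred_depth) - observed)
```

Because the per-channel fading encodes distance, such a term gives the self-supervised depth network a physics-based training signal that ordinary photometric losses lack.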
The partial derivatives obtained through the difference approximation in the finite-difference method for solving the scalar acoustic wave equation may give rise to computational errors, which have the potential to induce numerical dispersion. Typically, a higher-order temporal or spatial difference format is employed, but a mismatch in difference order between the computational region and the perfectly matched layer (PML) boundaries can result in boundary reflections. In this study, we derive the acoustic wave equation and its PML boundary conditions in a finite difference format of fourth order in time and 2Nth order in space, based on the Lax-Wendroff method. Subsequently, the stability conditions of the two finite difference formats are presented and analyzed under different parameters. This effectively addresses the issue of temporal dispersion. Furthermore, the high-order temporal PML boundary conditions effectively suppress the boundary reflections generated by the mismatch in difference order between the computational region and the PML boundaries. Moreover, the time-space dispersion relation of the acoustic wave equation is employed to globally optimize the difference coefficients via the least-squares method, thereby suppressing the spatial dispersion. The numerical solution experiments of the acoustic wave equation for the horizontal laminar model and the Marmousi model demonstrate the efficacy of the presented algorithm.
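For orientation, the simplest member of this scheme family is the 1-D second-order explicit update below; it is a simplified illustration only (the paper's scheme is fourth order in time and 2Nth order in space, with PML boundaries rather than the fixed ends used here).

```python
# Minimal 1-D explicit finite-difference step for the scalar wave equation
# u_tt = c^2 u_xx, second order in both time and space. A simplified
# illustration of the scheme family discussed, with fixed (Dirichlet) ends.

def fd_step(u_prev, u, c, dt, dx):
    """Advance the field one time step."""
    r2 = (c * dt / dx) ** 2          # stability requires the CFL condition c*dt/dx <= 1
    nxt = u[:]                       # boundary samples stay fixed
    for i in range(1, len(u) - 1):
        nxt[i] = 2 * u[i] - u_prev[i] + r2 * (u[i + 1] - 2 * u[i] + u[i - 1])
    return nxt
```

Higher-order formats replace the three-point Laplacian with a wider optimized stencil (the 2Nth-order spatial part) and add Lax-Wendroff correction terms in time, which is precisely what reduces the dispersion errors discussed above.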
Monitoring water quality in turbid coastal waters is critical for ecological security, yet traditional sampling methods are often inefficient and spatially limited. Remote sensing, with its multiscale coverage and real-time capabilities, has emerged as a pivotal tool. However, the complex optical properties of turbid coastal waters, such as the strong absorption-scattering coupling effects of suspended particulate matter (SPM), chlorophyll-a (
This study presents an Eddy LSIN (LSTM-Informer) framework for the accurate prediction of mesoscale eddy trajectories in the northeastern South China Sea. This comprehensive framework is designed to improve mesoscale eddy trajectory prediction, and it employs a multimodal spatiotemporal eddy representation (STER) dataset, a temporal Informer (T-Informer) network for long-sequence modeling, and a geographically aware combined mean squared error (CMSE) loss function. Mesoscale eddies are critical dynamic processes in oceans, and accurately predicting their trajectories is of great importance in oceanographic research. Leveraging the physical characteristics of mesoscale eddies, the framework is based on the construction of a high-resolution dataset that integrates sea level anomalies, relative vorticity, and kinetic energy features. An autoencoder is employed for feature extraction, resulting in the STER dataset, which reduces prediction errors by approximately 70% (mean squared error) and 82% (mean absolute error) compared to the unprocessed sla-t-u-v dataset. The framework incorporates a long short-term memory model for short-term prediction and a temporally encoded T-Informer model for long-term prediction. To balance numerical accuracy and spatial precision, CMSE is introduced by combining the mean squared error with the Haversine distance-based geographic error. Experimental results show that the framework achieves average trajectory center errors of approximately 8 km for 1-day predictions, 18 km for 7-day predictions, and 39 km for 20-day predictions. Compared with conventional models, the T-Informer improves the long-sequence prediction accuracy by around 40% and effectively mitigates cumulative errors. Overall, the Eddy LSIN framework demonstrates high accuracy and robustness in short- and long-term mesoscale eddy trajectory prediction and offers a promising approach for advancing predictive oceanography.
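A CMSE-style loss can be sketched as an MSE term blended with the Haversine distance between predicted and true eddy centres; the blending weight `alpha` below is an assumption, not the paper's setting.

```python
import math

# Sketch of a combined loss in the spirit of CMSE: numerical MSE on the
# (lat, lon) outputs plus a Haversine-distance geographic term. The
# weighting alpha is an illustrative assumption.
R_EARTH_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * R_EARTH_KM * math.asin(math.sqrt(a))

def cmse(pred, true, alpha=0.5):
    """pred/true: (lat, lon). Blend of squared-degree error and km error."""
    mse = ((pred[0] - true[0]) ** 2 + (pred[1] - true[1]) ** 2) / 2
    geo = haversine_km(pred[0], pred[1], true[0], true[1])
    return alpha * mse + (1 - alpha) * geo
```

The geographic term keeps the loss meaningful on the sphere, where a fixed longitude error corresponds to a smaller physical distance at high latitudes.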
Vision serves as a crucial information source for underwater observation and operations; however, the quality of underwater imaging is often compromised by remarkable color distortion and detail loss, which are further exacerbated under nonuniform lighting conditions. The existing traditional nonlearning solutions often struggle to adapt to diverse underwater degradation, while purely data-driven learning strategies are often limited by scarce and low-quality samples, making it difficult to achieve satisfactory results. In contrast to existing joint learning frameworks, we propose a unified yet decoupled framework for effectively addressing the challenges of color correction and illumination enhancement in underwater images. Our proposed method employs distinct prediction and learning strategies to tackle these two key issues individually, thereby overcoming the limitations associated with the reference of learning samples that neglect lighting conditions. Consequently, the proposed approach yields enhanced overall visual effects for underwater image enhancement. Comparative experiments and ablation experiments on publicly available datasets have validated the effectiveness of the proposed self-attention-driven adaptive luminance transfer and multiple color space feature encoding. The source code and pretrained models are available on the project home page: https://github.com/OUCVisionGroup/MCAL-Net.
Altimeters can be used to observe global ocean dynamic environment parameters, including sea surface 10-m wind speed (U10) and significant wave height (SWH). Many altimeters operate in two frequency bands and thus provide two independent sets of U10 and SWH observations. Their accompanying nadir-viewing microwave radiometers can also retrieve U10. Merging these independent retrievals can help achieve high accuracy. In this study, the U10 and SWH data from the Jason-1 satellite were calibrated against the buoy observations from the National Data Buoy Center (NDBC). A deep learning technique was used to merge the data from the Ku and C bands and the Jason microwave radiometer. This algorithm employs multiple altimeter-observed parameters, including SWH, backscatter cross-section, and brightness temperature, as inputs to effectively enhance the retrieval accuracy. The overall root mean square error was reduced from 1.42 m/s to 1.18 m/s for U10 and from 0.31 m to 0.26 m for SWH. A pronounced improvement was observed in wind speed data accuracy under rainfall conditions. The principles underlying this model can be further applied to other altimeter satellites, thereby enhancing the precision of their wind speed and SWH retrievals.
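A classical baseline for the kind of fusion the deep model learns is inverse-error-variance weighting of the independent retrievals; the error values below are illustrative, not the calibrated Jason-1 uncertainties.

```python
# Sketch of merging independent retrievals (e.g., Ku band, C band,
# radiometer) by inverse-error-variance weighting -- the minimum-variance
# linear combination, a classical baseline for this fusion problem.

def merge_retrievals(values, errors):
    """Combine independent estimates, weighting each by 1/error^2."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total
```

A learned merger generalizes this by letting the effective weights depend on conditions such as SWH, backscatter, and brightness temperature, which is how it gains accuracy under rain.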
Marine ecosystems and coastal economies are seriously threatened by marine heatwaves (MHWs), which are defined as extended periods of abnormally high sea surface temperatures (SSTs). Accurate and early MHW forecasting has become essential because climate change has increased the frequency and severity of such phenomena. In this review, we examine the application of traditional machine learning (ML) and deep learning (DL) methods for MHW detection and prediction. Specifically, we investigate the algorithms (neural networks, ensemble methods, and hybrid architectures) as well as the input variables, datasets, and evaluation metrics employed. Additionally, we review previous studies conducted on different ocean basins to highlight regional patterns and model transferability. Furthermore, we identify the emerging trends in DL, such as the use of explainable artificial intelligence and physics-guided learning for MHW prediction, and outline key challenges and limitations. Finally, we discuss future directions for improving the accuracy, generalization, and interpretability of MHW forecasting systems.
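The MHW definition underlying most of the reviewed work (Hobday et al.) can be sketched directly: days whose SST exceeds a climatological 90th-percentile threshold, persisting for at least five days. The per-day climatology structure below is a simplifying assumption.

```python
from statistics import quantiles

# Sketch of the standard MHW definition: SST above the day's climatological
# 90th-percentile threshold for at least min_len consecutive days.
# climatology[i] is assumed to hold the historical SSTs for day i.

def mhw_days(sst, climatology, min_len=5, pct=90):
    """Return (start, end) index pairs (inclusive) of detected MHW events."""
    events, start = [], None
    for i, temp in enumerate(sst):
        thr = quantiles(climatology[i], n=100)[pct - 1]   # 90th-percentile cut
        hot = temp > thr
        if hot and start is None:
            start = i
        if (not hot or i == len(sst) - 1) and start is not None:
            end = i if hot else i - 1
            if end - start + 1 >= min_len:
                events.append((start, end))
            start = None
    return events
```

ML and DL forecasters reviewed here typically predict the SST field first and then apply exactly this kind of threshold rule, or learn to predict the exceedance events directly.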
Monitoring coastal ecosystems is essential for mitigating pollution, preserving biodiversity, and understanding the impacts of climate change. However, existing approaches, such as fully convolutional network (FCN) and Transformer-based models, often struggle with challenges such as low interclass variance, difficulty in detecting small targets, and loss of boundary information. To handle large variations in target scales, we propose a semantic segmentation framework, SDFSeg, which integrates three key modules: the scale aware conv, dynamic deformable sample, and fusion perceiver. The scale aware conv is designed to improve multiscale feature extraction by incorporating convolutional layers with varying dilation rates; the dynamic deformable sample precisely aligns target boundaries, focuses on small features, and enables adaptive dynamic sampling for improved small target detection and boundary segmentation; and the fusion perceiver effectively fuses local and global information. Extensive experiments on benchmark datasets demonstrate that our method achieves a superior performance while reducing the computational overhead, confirming its practical applicability.
Accurate prediction of significant wave height is crucial for maritime safety, offshore engineering, and disaster mitigation. Recent advancements in deep learning have improved wave height prediction by utilizing historical data. However, ocean waves exhibit highly complex and distinct dynamic characteristics across various temporal and spatial scales, resulting in intricate nonlinear couplings between spatial and frequency components. Since conventional methods primarily rely on feature extraction in the space-time domains, they cannot capture multiscale physical variations in wave dynamics accurately, ultimately limiting predictive accuracy. To address this limitation, we propose an effective wave height prediction method based on multiscale physical space-time-frequency (PSTF) fusion (MPST). The proposed method integrates two key modules: the multiscale feature extraction (MSFE) module, which utilizes atrous spatial pyramid pooling to enhance local-global wave characteristic extraction, and the PSTF information fusion module, which combines a Fourier neural operator and a Transformer to fuse frequency and spatial features dynamically. A gated fusion mechanism further optimizes feature weighting, improving accuracy and robustness. Experimental results on the ERA5 dataset show that MPST reduces overall prediction error by 7.43% compared to state-of-the-art methods. Notably, in high-wave regions (above 2, 4, and 8 m), prediction accuracy improves by 5.86%, 3.46%, and 8.84%, respectively. These findings highlight better stability and adaptability of the MPST in complex ocean environments. Thus, MPST offers a robust solution for precise wave height prediction and supports maritime safety and sustainable ocean engineering.
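The gated fusion mechanism can be illustrated in its simplest form: a sigmoid gate blends frequency-domain and spatial features element-wise. In the model the gate is learned from the features themselves; the scalar logits below are an illustrative assumption.

```python
import math

# Sketch of a gated fusion mechanism: a sigmoid gate g blends
# frequency-domain and spatial features, out = g*freq + (1-g)*spatial.
# Scalar per-element logits here stand in for a learned gating network.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(freq_feat, spat_feat, gate_logits):
    """Element-wise blend: out_i = g_i*freq_i + (1 - g_i)*spat_i."""
    return [sigmoid(z) * f + (1 - sigmoid(z)) * s
            for f, s, z in zip(freq_feat, spat_feat, gate_logits)]
```

Letting the data drive the gate is what allows the model to lean on spectral structure in regular swell and on spatial structure in complex high-wave regions.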
Domain-adaptive zero-shot learning is an emerging and challenging task that extends zero-shot learning to scenarios where the source and target domains have different distributions. To address this issue, existing methods typically rely on category prototypes learned from the semantic embeddings of class labels and leverage pairwise semantic relationships to align the source and target samples in a shared feature space. While promising, these methods still face two key limitations: (1) they focus solely on pairwise relationships, limiting their ability to capture the complex, high-order structural correlations among category prototypes; and (2) they align samples only with the category prototypes, overlooking the intrinsic correlations among samples, which undermines alignment efficacy. To tackle these issues, we propose a new method, the neighbourhood clustering enhanced (NCE) learning framework, which fully exploits correlations among both category prototypes and samples. For category prototypes, we adopt a hypergraph-based approach to capture high-order correlations that go beyond simple pairwise relationships. During the alignment process, we incorporate both intraclass and interclass correlations among samples. Experimental results on the I2AwA and I2WebV datasets demonstrate that our method significantly outperforms state-of-the-art methods. Furthermore, to validate the effectiveness of our method in more challenging scenarios, we apply it to underwater image recognition. Experimental results show that our method significantly improves the accuracy and robustness of underwater image recognition.