Abstract
Real-time analysis of fish feeding behavior is the premise of, and key to, accurate feeding guidance. Identifying fish behavior from a single information source is susceptible to various interfering factors. To overcome these problems, this paper proposes an adaptive deep modular co-attention unified multi-modal transformer (DMCA-UMT). By fusing video, audio and water quality parameters, the whole process of fish feeding behavior can be identified. First, features are extracted from the input video, audio and water quality parameters to obtain feature vectors for each modality. Second, deep modular co-attention (DMCA) is introduced on top of the original cross-modal encoder, with adaptive learnable weights added; the joint video-audio feature representation is learned automatically according to each modality's contribution to the fusion. Finally, the fused visual-audio information and text features are used to generate clip-level moment queries. The query decoder decodes the input features, and a prediction head produces the final joint moment retrieval result, i.e., the start and end times of fish feeding. The results show that the proposed algorithm reaches an average mAP of 75.3%, which is 37.8% higher than that of the unified multi-modal transformers (UMT) algorithm.
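The fusion step described above — cross-attending video and audio features, then combining the two directions with adaptive learnable weights — can be illustrated with a minimal NumPy sketch. This is not the paper's DMCA-UMT implementation; the function names, feature shapes, and the scalar-weight formulation are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv):
    # scaled dot-product attention: queries from one modality,
    # keys/values from the other modality
    d = q.shape[-1]
    scores = q @ kv.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ kv

def adaptive_fusion(video, audio, alpha_logits):
    # cross-attend each modality to the other
    v2a = cross_attention(video, audio)  # video queries attend to audio
    a2v = cross_attention(audio, video)  # audio queries attend to video
    # adaptive weights: in training these logits would be learnable
    # parameters; here they are fixed values standing in for learned ones
    w = softmax(alpha_logits)
    # weighted joint representation (assumes equal token counts/dims)
    return w[0] * v2a + w[1] * a2v

rng = np.random.default_rng(0)
video = rng.standard_normal((4, 8))  # 4 clip tokens, 8-dim features
audio = rng.standard_normal((4, 8))
joint = adaptive_fusion(video, audio, np.array([0.2, -0.1]))
print(joint.shape)  # (4, 8)
```

In the full model, the joint representation would then be combined with text features to form the clip-level moment queries that the decoder turns into start-end time predictions.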
Keywords
aquaculture / multi-modal fusion / deep modular co-attention (DMCA) / unified multi-modal transformers (UMT) / video moment retrieval
Cite this article
Fish Feeding Behavior Recognition Using Adaptive DMCA-UMT Algorithm.
Journal of Beijing Institute of Technology, 2023, 32(3): 285-297. DOI: 10.15918/j.jbit1004-0579.2023.008