A detection-regression based framework for fish keypoints detection
Junyu Dong, Xinyu Shangguan, Kaiming Zhou, Yanhai Gan, Hao Fan, Long Chen
Intelligent Marine Technology and Systems, 2023, Vol. 1, Issue 1: 9
Applying computer vision in aquaculture can improve the efficiency of fish detection and health monitoring and help optimize aquaculture management and profit. Keypoints on fish bodies are important biological indicators that can be used to estimate individual size and mass and to analyze behavior. However, only a few relevant studies have been conducted, and they mainly focus on detecting keypoints for stereo matching. Traditional keypoint detection methods exhibit low efficiency, poor accuracy, and weak robustness in underwater environments. Accordingly, this study proposes a two-stage method based on object detection and point regression models to locate fish keypoints. In the first stage, individual fish are detected with a commonly used object detection model, YOLOv5, whose detection accuracy is further improved by enhancing the network neck. In the second stage, a deep learning model for locating fish keypoints is constructed by applying weight allocation and a distribution-aware strategy within the matched left and right bounding boxes, improving on Lite-HRNet, which was originally designed for capturing human body keypoints. Experimental results show that the proposed method can effectively detect individual underwater fish and accurately estimate their keypoints. The source code and the labeled datasets for fish detection and keypoint location are available at https://github.com/oucvisionlabsanya/fish_keypoint_detection.git.
Object detection / Fish keypoint detection / YOLOv5 / Lite-HRNet / Copy-paste
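The hand-off between the two stages can be illustrated with a minimal sketch: a keypoint heatmap predicted inside a detected bounding box is decoded and mapped back to full-image pixel coordinates. This is not the authors' implementation. The function names, the box format `(x_min, y_min, x_max, y_max)`, and the plain argmax decoding are assumptions chosen for illustration; the paper uses a distribution-aware strategy, which refines the peak location beyond a simple argmax.

```python
from typing import List, Tuple

# Assumed formats (hypothetical, for illustration only):
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in image pixels
Heatmap = List[List[float]]      # rows of scores for ONE keypoint inside the box


def decode_heatmap(heatmap: Heatmap) -> Tuple[int, int]:
    """Return the (col, row) of the highest-scoring heatmap cell (plain argmax)."""
    best_row, best_col, best = 0, 0, float("-inf")
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best:
                best_row, best_col, best = r, c, score
    return best_col, best_row


def heatmap_to_image_coords(heatmap: Heatmap, box: Box) -> Tuple[float, float]:
    """Map the heatmap peak back to full-image pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    rows, cols = len(heatmap), len(heatmap[0])
    col, row = decode_heatmap(heatmap)
    # Scale heatmap cells to the box size and take the centre of the peak cell.
    x = x_min + (col + 0.5) * (x_max - x_min) / cols
    y = y_min + (row + 0.5) * (y_max - y_min) / rows
    return x, y


if __name__ == "__main__":
    # A 4x8 heatmap for one fish box, with its peak at row 1, column 6.
    hm = [[0.0] * 8 for _ in range(4)]
    hm[1][6] = 0.9
    print(heatmap_to_image_coords(hm, (100, 50, 180, 90)))  # (165.0, 65.0)
```

Because the heatmap is coarser than the cropped box, argmax decoding quantizes each keypoint to a cell centre; sub-cell refinement of the kind a distribution-aware strategy provides is intended to reduce that error.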