Hook, line, and spectra: machine learning for fish species identification and body part classification using rapid evaporative ionization mass spectrometry

Jesse Wood; Bach Nguyen; Bing Xue; Mengjie Zhang; Daniel Killeen

doi:10.1007/s44295-025-00066-3

Intelligent Marine Technology and Systems, 2025, Vol. 3, Issue 1: 16

Research Paper

Abstract

Marine biomass composition analysis traditionally requires time-consuming processes and domain expertise. This study demonstrates the effectiveness of rapid evaporative ionization mass spectrometry (REIMS) combined with advanced machine learning (ML) techniques for accurate marine biomass composition determination. Using fish species and body parts as model systems representing diverse biochemical profiles, we investigated various ML methods, including unsupervised pretraining strategies for transformers. The deep learning approaches consistently outperformed traditional ML methods across all tasks. For fish species classification, the pretrained transformer achieved 99.62% accuracy, and for fish body part classification, the transformer achieved 84.06% accuracy. We further explored the explainability of the best-performing, predominantly black-box models using local interpretable model-agnostic explanations (LIME) and gradient-weighted class activation mapping (Grad-CAM) to identify the features driving each classifier's decisions. REIMS analysis with ML can therefore be an accurate and potentially explainable technique for automated marine biomass composition analysis, with potential applications in quality control, product optimization, and food safety monitoring in marine-based industries.
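To make the classification task in the abstract concrete, the following is a minimal sketch of spectrum classification. It is not the authors' method: it substitutes a simple nearest-centroid baseline for the transformer, and it uses synthetic binned "spectra" rather than REIMS data. The class names, peak positions, bin count, and all function names are hypothetical choices made for illustration only.

```python
import random


def make_spectrum(peak_bins, n_bins=50, rng=random):
    """Synthetic stand-in for a binned mass spectrum: low noise plus class-specific peaks."""
    spectrum = [rng.random() * 0.1 for _ in range(n_bins)]
    for b in peak_bins:
        spectrum[b] += 1.0 + rng.random() * 0.2
    total = sum(spectrum)
    # Scale intensities so each spectrum sums to 1.
    return [x / total for x in spectrum]


def centroid(spectra):
    """Mean spectrum of a class, computed bin by bin."""
    n = len(spectra)
    return [sum(col) / n for col in zip(*spectra)]


def predict(spectrum, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(spectrum, centroids[label]))


def run_demo(seed=0, n_train=20, n_test=10):
    """Train on synthetic spectra and return held-out accuracy."""
    rng = random.Random(seed)
    # Hypothetical class-specific peak positions (assumed, not taken from the paper).
    classes = {"species_a": [5, 20, 33], "species_b": [8, 25, 41]}
    train = {c: [make_spectrum(p, rng=rng) for _ in range(n_train)]
             for c, p in classes.items()}
    centroids = {c: centroid(s) for c, s in train.items()}
    test = [(c, make_spectrum(p, rng=rng))
            for c, p in classes.items() for _ in range(n_test)]
    correct = sum(predict(s, centroids) == c for c, s in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"accuracy on synthetic spectra: {run_demo():.2f}")
```

Real spectra have far more bins and overlapping peaks between classes, which is presumably where the deep learning models described above earn their advantage over a baseline like this one.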

Keywords

AI applications / Explainable AI / Machine learning / Marine biomass / Mass spectrometry / Multidisciplinary AI

Cite this article

Jesse Wood, Bach Nguyen, Bing Xue, Mengjie Zhang, Daniel Killeen. Hook, line, and spectra: machine learning for fish species identification and body part classification using rapid evaporative ionization mass spectrometry. Intelligent Marine Technology and Systems, 2025, 3(1): 16. DOI: 10.1007/s44295-025-00066-3
