Identification of Cenozoic Ostracods in the Qaidam Basin Using Convolutional and Transformer-Based Neural Networks
Wenqiang Tang, Hanting Zhong, Zhisong Cao, Kunyu Wu, Dangpeng Xi, Xingxing Zhang, Ping Yang, Yuxuan Zhou, Chao Ma
Journal of Earth Science, 2026, Vol. 37, Issue 3: 968-984.
Microfossils play a crucial role in biostratigraphy and paleoenvironmental reconstruction, because the first appearance datum (FAD) and last appearance datum (LAD) of specific microfossils enable precise stratigraphic correlation and age determination. However, traditional identification methods are time-intensive and depend heavily on expert knowledge. To overcome these limitations, we propose a dual-path deep learning model, MicroViT, which integrates convolutional neural networks (CNNs) and vision transformers (ViTs) to automate the identification of seven Cenozoic ostracod genera (Microlimnocythere, Cyprideis, Qaidamocythere, Hemicyprinotus, Qaibeigouia, Austrocypris, and Candoniella) from the Qaidam Basin. MicroViT achieves an accuracy of 95.34%, demonstrating superior performance across all classification metrics. Furthermore, we used Gradient-weighted Class Activation Mapping (Grad-CAM) to visualize the decision-making process of the model, revealing that deep learning models focus on morphological features such as reticulation and honeycomb-like spots. We also investigated the potential for extending this approach to other microfossil groups, such as charophytes and sporopollen, as well as to more diverse ostracod populations. These results highlight the significant potential of deep learning techniques for rapid and accurate microfossil classification, with promising applications in micropaleontology and stratigraphic studies.
Ostracod identification / deep learning / transformer-based neural networks / convolutional neural networks
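The abstract mentions Grad-CAM as the tool used to see which shell features drive a prediction. As a rough illustration only, the core Grad-CAM computation can be sketched in plain Python: each feature map is weighted by the spatial mean of the class-score gradient with respect to that map, the weighted maps are summed, and a ReLU keeps only positively contributing regions. The function name `grad_cam`, the nested-list inputs, and the final peak normalization are assumptions made for this sketch; they are not taken from the MicroViT implementation, which would obtain activations and gradients from a trained network.

```python
def grad_cam(activations, gradients):
    """Minimal Grad-CAM sketch (hypothetical helper, not the authors' code).

    activations: list of K feature maps, each an H x W nested list.
    gradients:   same shape; d(class score) / d(activation).
    Returns an H x W heatmap scaled to [0, 1] when any value is positive.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    heat = [[0.0] * width for _ in range(height)]

    for fmap, gmap in zip(activations, gradients):
        # Channel weight: global average of the gradient over space.
        weight = sum(sum(row) for row in gmap) / (height * width)
        for i in range(height):
            for j in range(width):
                heat[i][j] += weight * fmap[i][j]

    # ReLU: keep only regions that raise the class score.
    heat = [[max(0.0, v) for v in row] for row in heat]

    # Assumed post-processing: scale by the peak for display.
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat


# Toy input: two 2x2 feature maps with opposite-sign gradients.
toy_activations = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
toy_gradients = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
toy_heatmap = grad_cam(toy_activations, toy_gradients)
```

In the toy input, the first map has a positive weight and the second a negative one, so only the top-left cell survives the ReLU. In practice the heatmap would be upsampled and overlaid on the ostracod image to check whether the highlighted regions coincide with features such as reticulation.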
China University of Geosciences (Wuhan) and Springer-Verlag GmbH Germany, Part of Springer Nature