Selecting text classification model through maximizing posterior evidence over informative sub-space
Zhiwei SUN , Jun BAI , Zhuofan CHEN , Chen LI , Wenge RONG , Zhang XIONG
Front. Comput. Sci. ›› 2025, Vol. 19 ›› Issue (12) : 1912377
Text classification is a pivotal task in natural language understanding, and its performance has advanced remarkably with the rise of Pre-trained Language Models (PLMs). However, the proliferation of PLMs has made it increasingly difficult to choose the most suitable model for a given dataset. Since fine-tuning every candidate model is impractical, Transferability Estimation (TE) has emerged as a promising route to efficient model selection. Unlike current TE methods, which rely solely on fixed, hard class assignments to evaluate the quality of model-encoded features, our approach also accounts for the inter-sample and inter-model variations captured by soft class assignments. We achieve this by using class embeddings to predict posterior class assignments, with the logarithm of the maximum posterior evidence serving as the transferability score. Moreover, we found that an informative sub-space of the dataset yields a more accurate calculation of soft class assignments, where we annotate informative samples efficiently by eliciting the strong judging ability of large language models. The resulting posterior evidence over the informative sub-space, LogIPE, captures subtle differences between models and improves the accuracy of model selection, as validated by extensive experiments on a wide range of text classification datasets and candidate PLMs.
text classification / model selection / posterior evidence / informative sub-space
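To illustrate the general idea of scoring model-encoded features by posterior evidence, the following is a minimal sketch, not the paper's actual LogIPE method. It assumes class embeddings are simple per-class feature means, soft class assignments come from a softmax over negative squared distances, and the score is the mean log maximum posterior over samples; the function name `log_posterior_evidence` and these modeling choices are illustrative assumptions.

```python
import numpy as np

def log_posterior_evidence(features, labels):
    """Hypothetical sketch: score model-encoded features by the mean log of
    the maximum posterior class probability, using class-mean embeddings.

    features: (n_samples, dim) array of model-encoded features
    labels:   (n_samples,) integer class labels
    """
    classes = np.unique(labels)
    # Class embeddings: mean feature vector per class (an assumption).
    class_emb = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Soft class assignments: softmax over negative squared distances.
    d2 = ((features[:, None, :] - class_emb[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Transferability score: mean log of the maximum posterior per sample.
    return float(np.log(post.max(axis=1)).mean())
```

Under this sketch, features that separate the classes cleanly give posteriors near one and a score near zero, while overlapping classes give diffuse posteriors and a lower (more negative) score, so candidate models can be ranked without fine-tuning.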
Higher Education Press