MAL: multilevel active learning with BERT for Chinese textual affective structure analysis
Shufeng XIONG, Guipei ZHANG, Xiaobo FAN, Wenjie TIAN, Lei XI, Hebing LIU, Haiping SI
Front. Inform. Technol. Electron. Eng., 2025, Vol. 26, Issue (6): 833-846.
Chinese textual affective structure analysis (CTASA) is a sequence labeling task that typically relies on supervised deep learning methods. However, acquiring a large annotated dataset for training is costly and time-consuming. Active learning offers a solution by selecting the most valuable samples for annotation, thereby reducing labeling costs. Previous approaches focus on either uncertainty or diversity, but suffer from drawbacks such as biased models or the selection of uninformative samples. To address these issues, multilevel active learning (MAL) is introduced, which leverages deep textual information at both the sentence and word levels, taking into account the complex structure of the Chinese language. By integrating sentence-level features extracted from bidirectional encoder representations from Transformers (BERT) embeddings with word-level probability distributions obtained through a conditional random field (CRF) model, MAL comprehensively captures the Chinese textual affective structure (CTAS). Experimental results demonstrate that MAL reduces annotation costs by approximately 70% and achieves more consistent performance than baseline methods.
Sentiment analysis / Sequence labeling / Active learning (AL) / Bidirectional encoder representations from Transformers (BERT)
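To illustrate the idea of combining sentence-level and word-level signals for sample selection, the following is a minimal sketch, not the paper's actual algorithm. It assumes that sentence embeddings come from BERT and that word-level label probabilities come from CRF marginals; the scoring functions, the weighting parameter alpha, and the combination rule are illustrative assumptions only.

```python
# Illustrative multilevel acquisition score (hedged sketch, not the MAL method itself).
# Assumptions: sentence embeddings are BERT sentence vectors (e.g., [CLS]),
# token_probs are per-token label distributions from a CRF; alpha is a guess.
import numpy as np


def token_uncertainty(token_probs: np.ndarray) -> float:
    """Word-level score: mean entropy of per-token label distributions.
    token_probs has shape (seq_len, num_labels)."""
    eps = 1e-12
    entropy = -np.sum(token_probs * np.log(token_probs + eps), axis=-1)
    return float(entropy.mean())


def sentence_diversity(embedding: np.ndarray, selected: list) -> float:
    """Sentence-level score: cosine distance to the nearest already-selected sample."""
    if not selected:
        return 1.0
    sims = [
        np.dot(embedding, s)
        / (np.linalg.norm(embedding) * np.linalg.norm(s) + 1e-12)
        for s in selected
    ]
    return float(1.0 - max(sims))


def multilevel_score(embedding, token_probs, selected, alpha=0.5):
    """Combine word-level uncertainty and sentence-level diversity."""
    return alpha * token_uncertainty(token_probs) + (1 - alpha) * sentence_diversity(
        embedding, selected
    )


# Toy usage: query the highest-scoring unlabeled sentence from a random pool.
rng = np.random.default_rng(0)
pool = [(rng.normal(size=768), rng.dirichlet(np.ones(5), size=12)) for _ in range(100)]
selected_embeddings = []
best = max(
    range(len(pool)),
    key=lambda i: multilevel_score(pool[i][0], pool[i][1], selected_embeddings),
)
print("query index:", best)
```

In this sketch, word-level entropy favors sentences whose token labels the current model is unsure about, while the diversity term discourages querying near-duplicates of already-selected samples; how MAL actually weights and fuses these levels is described in the paper.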
Zhejiang University Press