Automatic text summarization framework for multi-text and multilingual documents using an ensemble of HIN-MELM-AE and improved DePori model
Sunil Upadhyay, Hemant Kumar Soni
International Journal of Systematic Innovation, 2025, Vol. 9, Issue 6: 27–43.
Automatic text summarization (ATS) has gained increasing significance in recent years owing to the rapid growth of textual data across digital platforms. The main objective of ATS is to generate a concise, informative summary from a lengthy document. Multi-document and multilingual summarization have been largely underexplored in previous research. This study presents an improved ensemble learning-based ATS system with slang filtering, using the Hyperfan-IN multilayer extreme learning machine-based autoencoder (HIN-MELM-AE) and the improved Dehghani poor-and-rich optimization algorithm (DePori). The original text undergoes comprehensive preprocessing, after which slang is detected and removed using DePori. The clean text is then processed through info-squared fuzzy C-means clustering, latent Dirichlet allocation-based topic modeling, term frequency-inverse document frequency (TF-IDF) weighting, and frequent-term extraction. Next, part-of-speech (POS) tagging is performed using a sememe similarity-induced hidden Markov model, and key entities are extracted from the transformed and POS-tagged data. Distilled bidirectional encoder representations from transformers (DBERT) convert these entities into vectors. The final summary is generated by an ensemble of HIN-MELM-AE, stacked autoencoder, variational autoencoder, and DBERT models, followed by cosine similarity calculation, voting-based fusion, re-ranking, and selection of the optimal sentences. Experimental results indicate that the proposed framework outperforms existing ATS methods in 97.92% of the evaluated cases.
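The extractive pipeline sketched in the abstract (TF-IDF weighting, sentence vectorization, cosine similarity scoring, re-ranking) can be illustrated with a minimal baseline. This is a generic TF-IDF/cosine extractive summarizer, not the authors' HIN-MELM-AE ensemble; all function names and the centroid-scoring heuristic are illustrative assumptions:

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Map each sentence to a sparse TF-IDF vector (dict of term -> weight)."""
    docs = [Counter(s.lower().split()) for s in sentences]  # term frequencies
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for d in docs:
        df.update(d.keys())
    vecs = []
    for d in docs:
        total = sum(d.values())
        # Smoothed IDF so terms appearing in every sentence get weight ~0.
        vecs.append({t: (c / total) * math.log((1 + n) / (1 + df[t]))
                     for t, c in d.items()})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def summarize(sentences, k=2):
    """Select the k sentences closest to the document centroid, in original order."""
    vecs = tfidf_vectors(sentences)
    centroid = Counter()                # centroid stands in for the document topic
    for v in vecs:
        for t, w in v.items():
            centroid[t] += w
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vecs[i], centroid), reverse=True)
    chosen = sorted(ranked[:k])         # re-rank: restore original sentence order
    return [sentences[i] for i in chosen]
```

The sketch uses only the standard library; the paper's framework replaces the centroid heuristic with autoencoder-based sentence representations and an ensemble vote.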
Hyperfan-IN Multilayer Extreme Learning Machine Autoencoder / Info-Squared Fuzzy C-Means Clustering / Latent Dirichlet Allocation / Part-of-Speech Tagging / Sentence Bidirectional Encoder Representations from Transformers / Sememe Similarity-Induced Hidden Markov Model / Term Frequency-Inverse Document Frequency / Variational Autoencoder
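The voting-based fusion step, in which the four summarization models' candidate sentences are combined before re-ranking, can be sketched as a weighted vote over each model's ranked candidate list. The Borda-style weighting below is an assumption for illustration, not the paper's exact fusion rule:

```python
from collections import Counter

def vote_fusion(candidate_lists, k=2):
    """Fuse ranked candidate-sentence indices from several models.

    candidate_lists: one ranked list of sentence indices per model,
    best candidate first. Returns the k indices with the highest
    combined vote (Borda-style: higher rank = more points).
    """
    votes = Counter()
    for ranked in candidate_lists:
        for rank, idx in enumerate(ranked):
            votes[idx] += len(ranked) - rank  # top choice earns the most points
    return [idx for idx, _ in votes.most_common(k)]
```

For example, three hypothetical models proposing rankings `[0, 2, 1]`, `[2, 0, 1]`, and `[2, 1, 0]` would fuse to `[2, 0]` for `k=2`, since sentence 2 collects the most rank-weighted votes.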