Automatic text summarization framework for multi-text and multilingual documents using an ensemble of HIN-MELM-AE and improved DePori model
Sunil Upadhyay, Hemant Kumar Soni
International Journal of Systematic Innovation, 2025, Vol. 9, Issue 6: 27–43.
Automatic text summarization (ATS) has gained increasing significance in recent years owing to the rapid growth of textual data across digital platforms. The main objective of ATS is to generate a concise, informative summary from a lengthy document. Multi-document and multilingual summarization have been largely underexplored in previous research. This study presents an improved ensemble learning-based ATS system with slang filtering, using the Hyperfan-IN multilayer extreme learning machine-based autoencoder (HIN-MELM-AE) and the improved Dehghani poor-and-rich optimization algorithm (DePori). The original text undergoes comprehensive preprocessing, after which slang is detected and removed using DePori. The clean text is then processed through info-squared fuzzy C-means clustering, latent Dirichlet allocation-based topic modeling, term frequency-inverse document frequency (TF-IDF) weighting, and frequent-term extraction. Next, part-of-speech (POS) tagging is performed using a sememe similarity-induced hidden Markov model, and key entities are extracted from the transformed and POS-tagged data. Distilled bidirectional encoder representations from transformers (DBERT) convert these entities into vectors. The final summary is generated by an ensemble of HIN-MELM-AE, stacked autoencoder, variational autoencoder, and DBERT models, followed by cosine similarity calculation, voting-based fusion, re-ranking, and selection of the optimal sentences. Experimental results indicate that the proposed framework outperforms existing ATS methods in 97.92% of the evaluated cases.
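The extractive pipeline sketched in the abstract (TF-IDF weighting, sentence vectorization, cosine similarity scoring, re-ranking) can be illustrated with a minimal baseline. This is a generic TF-IDF/cosine extractive summarizer, not the authors' HIN-MELM-AE ensemble; all function names and the centroid-scoring heuristic are illustrative assumptions:

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Map each sentence to a sparse TF-IDF vector (dict of term -> weight)."""
    docs = [Counter(s.lower().split()) for s in sentences]  # term frequencies
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for d in docs:
        df.update(d.keys())
    vecs = []
    for d in docs:
        total = sum(d.values())
        # Smoothed IDF so terms appearing in every sentence get weight ~0.
        vecs.append({t: (c / total) * math.log((1 + n) / (1 + df[t]))
                     for t, c in d.items()})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def summarize(sentences, k=2):
    """Select the k sentences closest to the document centroid, in original order."""
    vecs = tfidf_vectors(sentences)
    centroid = Counter()                # centroid stands in for the document topic
    for v in vecs:
        for t, w in v.items():
            centroid[t] += w
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vecs[i], centroid), reverse=True)
    chosen = sorted(ranked[:k])         # re-rank: restore original sentence order
    return [sentences[i] for i in chosen]
```

The sketch uses only the standard library; the paper's framework replaces the centroid heuristic with autoencoder-based sentence representations and an ensemble vote.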
Hyperfan-IN Multilayer Extreme Learning Machine Autoencoder / Info-Squared Fuzzy C-Means Clustering / Latent Dirichlet Allocation / Part-of-Speech Tagging / Sentence Bidirectional Encoder Representations from Transformers / Sememe Similarity-Induced Hidden Markov Model / Term Frequency-Inverse Document Frequency / Variational Autoencoder
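The voting-based fusion step, in which the four summarization models' candidate sentences are combined before re-ranking, can be sketched as a weighted vote over each model's ranked candidate list. The Borda-style weighting below is an assumption for illustration, not the paper's exact fusion rule:

```python
from collections import Counter

def vote_fusion(candidate_lists, k=2):
    """Fuse ranked candidate-sentence indices from several models.

    candidate_lists: one ranked list of sentence indices per model,
    best candidate first. Returns the k indices with the highest
    combined vote (Borda-style: higher rank = more points).
    """
    votes = Counter()
    for ranked in candidate_lists:
        for rank, idx in enumerate(ranked):
            votes[idx] += len(ranked) - rank  # top choice earns the most points
    return [idx for idx, _ in votes.most_common(k)]
```

For example, three hypothetical models proposing rankings `[0, 2, 1]`, `[2, 0, 1]`, and `[2, 1, 0]` would fuse to `[2, 0]` for `k=2`, since sentence 2 collects the most rank-weighted votes.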