Hybrid Approaches for Advanced Medical Text Summarization: Combining TF-IDF, BERT, and Seq2Seq Models

Matimpati Chitra Rupa; Kasarapu Ramani

doi:10.26877/reh2an46

Authors

Matimpati Chitra Rupa, Jawaharlal Nehru Technological University Anantapur
Kasarapu Ramani, Mohan Babu University

Keywords:

Medical NLP, Hybrid Summarization, Text Mining, Extractive Summary, Abstractive Summary, AutoModelForSeq2SeqLM, BERT, BART-Large-CNN, TextRank

Abstract

Clinicians, researchers, and healthcare professionals face the challenge of efficiently extracting relevant knowledge from vast amounts of textual data. Medical text summarization has emerged as a crucial tool for this task, condensing lengthy medical documents into concise, informative summaries. This work proposes a hybrid approach to medical text summarization that combines extractive and abstractive methods, integrating Term Frequency-Inverse Document Frequency (TF-IDF) weighting from classical Natural Language Processing (NLP) with a large language model loaded through AutoModelForSeq2SeqLM. The performance of the proposed approach is compared with existing methods, namely Bidirectional Encoder Representations from Transformers (BERT), TextRank, K-means, Facebook's BART-Large-CNN, and GPT-2, using the ROUGE-1, ROUGE-2, and ROUGE-L metrics. The experimental results show that the hybrid approach outperforms these existing methods.
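The abstract describes the pipeline only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the extractive stage ranks sentences by the length-normalized sum of their terms' TF-IDF weights (treating each sentence as a document, with scikit-learn-style smoothed IDF), and that ROUGE is reported as F1 over clipped n-gram overlap (ROUGE-1, ROUGE-2) and over the longest common subsequence (ROUGE-L). All function names are hypothetical. The abstractive stage is left as a pluggable callable; a real system would wrap a checkpoint loaded with Hugging Face `AutoModelForSeq2SeqLM.from_pretrained` and its `generate` method, but the checkpoint, sentence count, and preprocessing used in the paper are not given here. Scores from this simplified scorer will not exactly match the `rouge-score` package, which uses its own tokenizer and optional stemming.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercased alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_extract(text, k=3):
    """Extractive stage: keep the k sentences with the highest TF-IDF score."""
    sentences = split_sentences(text)
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # Document frequency, treating each sentence as a "document".
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        # Smoothed IDF, same form as scikit-learn's smooth_idf=True default.
        scores.append(sum((c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                          for t, c in tf.items()))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Restore original document order so the extract reads coherently.
    return " ".join(sentences[i] for i in sorted(top))


def hybrid_summarize(text, abstract_fn=lambda s: s, k=3):
    """Extract with TF-IDF, then hand the extract to an abstractive model.

    abstract_fn is a hypothetical hook (identity by default); a real system
    would wrap a seq2seq model loaded via AutoModelForSeq2SeqLM here.
    """
    return abstract_fn(tfidf_extract(text, k))


def _ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, cand_total, ref_total):
    # Harmonic mean of precision and recall; zero overlap gives zero.
    if overlap == 0:
        return 0.0
    p, r = overlap / cand_total, overlap / ref_total
    return 2 * p * r / (p + r)


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 from clipped n-gram overlap."""
    c, r = _ngrams(tokenize(candidate), n), _ngrams(tokenize(reference), n)
    return _f1(sum((c & r).values()), sum(c.values()), sum(r.values()))


def _lcs_len(a, b):
    # Classic dynamic program over token lists, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 from the longest common subsequence of the two token lists."""
    c, r = tokenize(candidate), tokenize(reference)
    return _f1(_lcs_len(c, r), len(c), len(r))


if __name__ == "__main__":
    note = ("The patient was admitted. The patient was stable. "
            "Troponin elevated suggesting myocardial infarction.")
    summary = hybrid_summarize(note, k=1)
    print(summary)
    print(rouge_n_f1(summary, "Troponin elevated, myocardial infarction suspected."))
```

Normalizing by sentence length keeps long sentences from winning merely because they contain more terms, so rare, content-bearing terms such as "troponin" dominate the ranking; whether the authors normalized this way is not stated in the abstract.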

Author Biographies

Matimpati Chitra Rupa, Jawaharlal Nehru Technological University Anantapur

Department of CSE, JNTU Anantapur, Andhra Pradesh, India.

Kasarapu Ramani, Mohan Babu University

Department of Data Science, Mohan Babu University, Andhra Pradesh, India.

