Improving Medical Abstract Classification Using PEFT-LoRA Fine-Tuned Large and Small Language Models
DOI: https://doi.org/10.47941/ijce.2374
Keywords: Machine Learning, Natural Language Processing, Medical Text Analysis, Language Models
Abstract
Designing intelligent systems to classify text in the medical domain is a challenging task, in part because openly available medical datasets are scarce (HIPAA imposes strict regulations on patients' protected health information). In this paper, we explore the application of open-source medical LLMs (such as Meditron), generic large language models (such as Llama 2), and small language models (such as Phi-2) to medical text classification on a medical abstracts dataset. We show that PEFT approaches such as LoRA perform very well in classifying medical text, which involves interpreting patient conditions and symptoms and determining what medical problems the patients have. These approaches (based on large and small language models) outperform the current state-of-the-art results on the medical abstracts corpus. In addition to medical LLMs, open-source generic LLMs can be adapted to classification tasks on medical text and perform nearly as well as the specialized medical LLMs, and SLMs are a serious contender for domain-specific classification tasks (e.g., medical literature). This shows that carefully selecting the training data and fine-tuning positively impact classification accuracy, precision, and recall. Generic language models such as Llama 2 (an LLM) and Phi-2 (an SLM) were not specifically trained on medical text; medical LLMs such as Meditron outperform them in precision and accuracy, which is expected, as Meditron was originally pretrained on medical text. The micro-averaged F1 score for the fine-tuned Meditron model is 0.64, superior to 0.58 for fine-tuned Llama 2 and 0.62 for fine-tuned Phi-2. Notably, Phi-2 outperforms Llama 2 with far fewer parameters. The approaches used in this work can be extended to other text classification problems in the medical domain.
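The core idea behind the LoRA approach used in this work is to freeze the pretrained weight matrices and learn only a low-rank update, which shrinks the number of trainable parameters dramatically. The following is a minimal illustrative sketch of that idea (the class, dimensions, and hyperparameters here are our own illustration, not the paper's actual training code): the adapted forward pass is y = (W + (alpha/r)·BA)x, with only A and B trained.

```python
import numpy as np

class LoRALinear:
    """Illustrative LoRA-adapted linear layer: frozen W plus low-rank B @ A."""

    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen base weight
        self.A = rng.standard_normal((r, d_in)) * 0.01      # trainable, small init
        self.B = np.zeros((d_out, r))                       # trainable, zero init
        self.scale = alpha / r

    def forward(self, x):
        # Base path plus scaled low-rank adapter path. Because B starts at
        # zero, the adapter initially leaves the pretrained output unchanged.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        # Only A and B receive gradient updates; W stays frozen.
        return self.A.size + self.B.size

# A 4096x4096 projection (typical of a 7B-class model's attention layers):
layer = LoRALinear(d_in=4096, d_out=4096, r=8)
full = layer.W.size
lora = layer.trainable_params()
print(f"full: {full:,}, LoRA: {lora:,} ({100 * lora / full:.2f}%)")
# -> full: 16,777,216, LoRA: 65,536 (0.39%)
```

At rank r=8 the adapter trains well under 1% of the parameters of the full matrix, which is why LoRA makes fine-tuning 7B-scale models practical on modest hardware.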
References
Gasmi, K. (2022, September). Improving bert-based model for medical text classification with an optimization algorithm. In International Conference on Computational Collective Intelligence (pp. 101-111). Cham: Springer International Publishing.
Khadhraoui, M., Bellaaj, H., Ammar, M. B., Hamam, H., & Jmaiel, M. (2022). Survey of BERT-base models for scientific text classification: COVID-19 case study. Applied Sciences, 12(6), 2891.
Gu, Y., Tinn, R., Cheng, H., Lucas, M., Usuyama, N., Liu, X., ... & Poon, H. (2021). Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH), 3(1), 1-23.
Mascio, A., Kraljevic, Z., Bean, D., Dobson, R., Stewart, R., Bendayan, R., & Roberts, A. (2020). Comparative analysis of text classification approaches in electronic health records. arXiv preprint arXiv:2005.06624.
Qasim, R., Bangyal, W. H., Alqarni, M. A., & Ali Almazroi, A. (2022). A Fine-Tuned BERT-Based Transfer Learning Approach for Text Classification. Journal of Healthcare Engineering, 2022(1), 3498123.
Lenivtceva, I., Slasten, E., Kashina, M., & Kopanitsa, G. (2020). Applicability of machine learning methods to multi-label medical text classification. In Computational Science–ICCS 2020: 20th International Conference, Amsterdam, The Netherlands, June 3–5, 2020, Proceedings, Part IV 20 (pp. 509-522). Springer International Publishing.
Gema, A. P., Minervini, P., Daines, L., Hope, T., & Alex, B. (2023). Parameter-efficient fine-tuning of llama for the clinical domain. arXiv preprint arXiv:2307.03042.
Xu, L., Xie, H., Qin, S. Z. J., Tao, X., & Wang, F. L. (2023). Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment. arXiv preprint arXiv:2312.12148.
Hu, Z., Wang, L., Lan, Y., Xu, W., Lim, E. P., Bing, L., ... & Lee, R. K. W. (2023). Llm-adapters: An adapter family for parameter-efficient fine-tuning of large language models. arXiv preprint arXiv:2304.01933.
Clarke, C., Heng, Y., Tang, L., & Mars, J. (2024). PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization. arXiv preprint arXiv:2407.18078.
Wu, C., Lin, W., Zhang, X., Zhang, Y., Xie, W., & Wang, Y. (2024). PMC-LLaMA: toward building open-source language models for medicine. Journal of the American Medical Informatics Association, ocae045.
Zhu, Y., Zhu, M., Liu, N., Ou, Z., Mou, X., & Tang, J. (2024). LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model. arXiv preprint arXiv:2401.02330.
Valade, F. (2024). Accelerating Large Language Model Inference with Self-Supervised Early Exits. arXiv preprint arXiv:2407.21082.
Microsoft Research. Phi-2: The surprising power of small language models. Microsoft Research Blog, December 2023.
Pu, G., Jain, A., Yin, J., & Kaplan, R. (2023). Empirical analysis of the strengths and weaknesses of PEFT techniques for LLMs. arXiv preprint arXiv:2304.14999.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., ... & Chen, W. (2021). Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
Han, Z., Gao, C., Liu, J., & Zhang, S. Q. (2024). Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv preprint arXiv:2403.14608.
Schopf, T., Braun, D., & Matthes, F. (2022, December). Evaluating unsupervised text classification: zero-shot and similarity-based approaches. In Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval (pp. 6-15).
Freestone, M., & Santu, S. K. K. (2024). Word Embeddings Revisited: Do LLMs Offer Something New?. arXiv preprint arXiv:2402.11094.
National Library of Medicine. PubMed. https://pubmed.ncbi.nlm.nih.gov/. Accessed October 8, 2024.
World Health Organization. Clinical guidelines. https://www.who.int/guidelines. Accessed October 8, 2024.
García-Ferrero, I., Agerri, R., Salazar, A. A., Cabrio, E., de la Iglesia, I., Lavelli, A., ... & Zaninello, A. (2024). Medical mT5: an open-source multilingual text-to-text LLM for the medical domain. arXiv preprint arXiv:2404.07613.
Song, Y., Zhang, J., Tian, Z., Yang, Y., Huang, M., & Li, D. (2024). LLM-based privacy data augmentation guided by knowledge distillation with a distribution tutor for medical text classification. arXiv preprint arXiv:2402.16515.
Chen, Z., Cano, A. H., Romanou, A., Bonnet, A., Matoba, K., Salvi, F., ... & Bosselut, A. (2023). Meditron-70b: Scaling medical pretraining for large language models. arXiv preprint arXiv:2311.16079.
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., ... & Scialom, T. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
Jin, D., Pan, E., Oufattole, N., Weng, W. H., Fang, H., & Szolovits, P. (2021). What disease does this patient have? a large-scale open domain question answering dataset from medical exams. Applied Sciences, 11(14), 6421.
Pal, A., Umapathi, L. K., & Sankarasubbu, M. (2022, April). MedMCQA: A large-scale multi-subject multi-choice dataset for medical domain question answering. In Conference on Health, Inference, and Learning (pp. 248-260). PMLR.
License
Copyright (c) 2024 Dr. Rahul Kavi, Jeevan Anne
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution (CC-BY) 4.0 License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.