Document Type
Article
Publication Date
4-16-2024
Abstract
The advent of patient access to complex medical information online has highlighted the need to simplify biomedical text so that patients can better understand it and take ownership of their health. However, comprehension of biomedical text remains difficult because it requires domain-specific expertise. We aimed to study the simplification of biomedical text via large language models (LLMs), which are commonly used for general natural language processing tasks involving text comprehension, summarization, generation, and prediction of new text from prompts. Specifically, we fine-tuned three variants of large language models to substitute complex words and phrases in biomedical text with a related hypernym. The output of the text substitution process was evaluated by comparing the pre- and post-substitution texts using four readability metrics and two measures of sentence complexity. A sample of 1,000 biomedical definitions in the National Library of Medicine's Unified Medical Language System (UMLS) was processed with three LLM approaches, and each showed an improvement in readability and sentence complexity after hypernym substitution. Readability scores moved from a collegiate reading level before processing to a US high-school level after processing. Comparison between the three LLMs showed that the GPT-J-6b approach produced the greatest improvement in measures of sentence complexity. This study demonstrates the merit of hypernym substitution for improving the readability of complex biomedical text for the public and highlights the use case for fine-tuning open-access large language models for biomedical natural language processing.
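The evaluation idea in the abstract (score a text, substitute hypernyms, score it again) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not name its four readability metrics, so the Flesch-Kincaid grade level is used as a stand-in; the syllable counter is a rough vowel-group heuristic; and the `HYPERNYMS` lookup table is hypothetical and only mimics the role played by the fine-tuned LLMs in the study.

```python
import re

# Hypothetical term -> hypernym table. The study used fine-tuned LLMs for
# this step; a fixed dictionary is only a stand-in for illustration.
HYPERNYMS = {
    "adenocarcinoma": "cancer",
    "nephrolithiasis": "kidney condition",
}


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per run of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level (assumed metric, not confirmed by the source)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def substitute_hypernyms(text: str, table: dict) -> str:
    """Replace each listed term with its hypernym (case-insensitive, whole words)."""
    for term, hypernym in table.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        text = re.sub(pattern, hypernym, text, flags=re.IGNORECASE)
    return text


original = "Adenocarcinoma may follow nephrolithiasis."
simplified = substitute_hypernyms(original, HYPERNYMS)
grade_before = flesch_kincaid_grade(original)
grade_after = flesch_kincaid_grade(simplified)
print(simplified)
print(f"grade before: {grade_before:.2f}, after: {grade_after:.2f}")
```

On a single short sentence the grade values are not meaningful in absolute terms; the point of the sketch is only the before-and-after comparison, which the study performed across 1,000 UMLS definitions.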
Recommended Citation
Swanson, Karl; He, Shuhan; Calvano, Josh; Chen, David; Telvizian, Talar; Jiang, Lawrence; Chong, Paul; Schwell, Jacob; Mak, Gin; and Lee, Jarone, "Biomedical Text Readability After Hypernym Substitution with Fine-Tuned Large Language Models" (2024). SKMC Student Presentations and Publications. Paper 13.
https://jdc.jefferson.edu/skmcstudentworks/13
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Language
English
Included in
Medicine and Health Sciences Commons, Psycholinguistics and Neurolinguistics Commons, Psychology Commons
Comments
This article is the author's final published version in PLoS Digital Health, Volume 3, Issue 4, April 2024, Article number e0000489.
The published version is available at https://doi.org/10.1371/journal.pdig.0000489
Copyright © 2024 Swanson et al.