Document Type

Article

Publication Date

1-1-2026

Comments

This article is the author's final published version in JVS-Vascular Insights, Volume 4, 2026, Article Number 100318.

The published version is available at https://doi.org/10.1016/j.jvsvi.2025.100318. Copyright © 2025 The Author(s).

Abstract

Objective: Patient education materials frequently exceed the recommended sixth-grade reading level. Although large language models (LLMs) have shown inconsistent accuracy in responding to medical queries, they have demonstrated promise in simplifying complex text. This capability has not yet been studied in vascular patient education materials. This study evaluates whether ChatGPT-4o and Gemini 1.5 Pro can improve the readability of Society for Vascular Surgery (SVS) patient education flyers.

Methods: SVS health flyers covering five common vascular conditions were selected: abdominal aortic aneurysm, carotid artery disease, deep vein thrombosis, peripheral artery disease, and varicose veins. Each flyer was submitted to ChatGPT-4o and Gemini 1.5 Pro, which generated simplified versions using structured Extensible Markup Language (XML) prompts to guide consistent editing. Vascular surgeons, blinded to the source of each flyer, independently scored the original and LLM-modified flyers on accuracy, comprehensiveness, and understandability using a 0 to 10 Likert scale. Readability was assessed with the Average Reading Level Consensus tool, and textual features, including word count, sentence count, syllables per word, and percentage of complex words, were quantified. Paired t-tests were used to analyze differences in readability scores; analysis of variance with Tukey honestly significant difference post hoc testing was used to assess textual characteristics.

Results: The original SVS flyers had an average reading grade level of 10.61 (standard deviation [SD], 0.88). Gemini and ChatGPT-4o significantly reduced the reading level to 8.18 (SD, 1.24; P = .012) and 8.37 (SD, 0.88; P = .00013), respectively. The SVS flyers averaged 605 words, 29.8 sentences, 1.7 syllables per word, and 20.4% complex words. Both LLMs significantly reduced syllables per word (Gemini: 1.52; P < .0001; ChatGPT-4o: 1.53; P < .0001) and the proportion of complex words (Gemini: 12.7%; P < .0001; ChatGPT-4o: 13.6%; P < .0001). There were no significant differences between the Gemini and ChatGPT-4o outputs in readability or textual metrics. Physician scores for accuracy, comprehensiveness, and understandability showed no significant differences between the SVS originals and either LLM, nor between the two LLMs.

Conclusions: LLMs significantly improved the readability of SVS patient education materials by approximately two grade levels without compromising content accuracy. These findings support using LLMs to enhance the accessibility of medical information when grounded in trusted source material, rather than relying on unprompted content generation.
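The abstract describes guiding each model with a structured XML prompt but does not reproduce the prompt itself. As a minimal sketch of what that submission step could look like, assuming the OpenAI Python client, the "gpt-4o" model identifier, and hypothetical XML tag names (none of which are taken from the study):

```python
# Hedged sketch: submit a flyer to an LLM with a structured XML prompt.
# The tag names and instructions are illustrative assumptions, not the
# study's published prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def simplify_flyer(flyer_text: str) -> str:
    # Hypothetical XML scaffold, per the abstract's description of
    # "structured Extensible Markup Language prompts".
    prompt = f"""<task>
  <role>Patient education editor</role>
  <instruction>Rewrite the flyer below at a sixth-grade reading level.
Keep every medical fact; do not add new claims.</instruction>
  <flyer>{flyer_text}</flyer>
</task>"""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed identifier for ChatGPT-4o access
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```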
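The statistical comparisons are standard enough to sketch as well. Assuming SciPy and statsmodels, and using placeholder numbers in place of the per-flyer measurements (the study's raw data are not given in the abstract), the paired t-tests and the ANOVA with Tukey HSD post hoc test could be run as follows:

```python
# Hedged sketch of the abstract's statistics; the arrays are placeholders,
# not the study's data.
import numpy as np
from scipy.stats import f_oneway, ttest_rel
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Placeholder reading grade levels for the 5 flyers (AAA, carotid artery
# disease, DVT, PAD, varicose veins), one value per flyer per source.
svs    = np.array([10.2, 11.5, 9.8, 10.9, 10.6])  # original SVS flyers
gemini = np.array([8.0, 9.4, 6.9, 8.5, 8.1])      # Gemini 1.5 Pro rewrites
gpt4o  = np.array([8.1, 9.3, 7.4, 8.9, 8.2])      # ChatGPT-4o rewrites

# Paired t-tests: each LLM rewrite against its source flyer.
print(ttest_rel(svs, gemini))
print(ttest_rel(svs, gpt4o))

# One-way ANOVA with Tukey HSD post hoc, as applied to the textual
# features; the reading-level placeholders stand in for any one metric.
print(f_oneway(svs, gemini, gpt4o))
values = np.concatenate([svs, gemini, gpt4o])
groups = ["SVS"] * 5 + ["Gemini"] * 5 + ["GPT-4o"] * 5
print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```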

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Language

English

Included in

Surgery Commons
