SKMC Student Presentations and Publications

Assessment of Correctness, Content Omission, and Risk of Harm in Large Language Model Responses to Ophthalmology Continuing Medical Education Questions

Document Type

Article

Publication Date

5-1-2026

Comments

This article is the author's final published version in Ophthalmology Science, Volume 6, Issue 5, May 2026, Article Number 101130.

The published version is available at https://doi.org/10.1016/j.xops.2026.101130. Copyright © 2026 American Academy of Ophthalmology. Published by Elsevier Inc. This is an open access article under the CC BY license. http://creativecommons.org/licenses/by/4.0/

Abstract

PURPOSE: To evaluate the accuracy and prose responses of 2 large language models (LLMs) to ophthalmology continuing medical education questions.

DESIGN: Question prompts and multiple choice (MC) answer options were input into the 2 LLMs, and responses were analyzed for accuracy and assessed for evidence of correctness, completeness, bias, and potential harm using a previously reported standardized rubric.

SUBJECTS: Basic and Clinical Science Course questions and MC answer options from the American Academy of Ophthalmology question bank were used as inputs into the 2 LLMs (ChatGPT-4 and Google Vertex's Gemini Pro 1.5).

METHODS: The MC responses were assessed for accuracy in comparison to the question bank's designated correct answer. The free-text prose responses from the 2 LLMs were assessed by 3 board-certified ophthalmologists.

MAIN OUTCOME MEASURES: Accuracy and assessment of correct and incorrect reasoning, inappropriate content, missing content, possibility of bias, or possibility of harm.

RESULTS: The MC accuracy rates of ChatGPT-4 and Gemini Pro 1.5 were 82.5% (99/120) and 49.2% (59/120) (

CONCLUSIONS: Though ChatGPT-4 was able to perform well in MC accuracy, both LLMs contained inaccuracies, missing content, and material that could lead to harm in their prose responses. Our findings suggest that provider-guided auditing in ophthalmology is required before the use of the technology in direct patient-facing settings.

FINANCIAL DISCLOSURES: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Recommended Citation

Chen, Jacqueline L.; Lu, Amanda J.; Verma, Rohan; Wang, Li; Koch, Douglas D.; and Chen, Allison J., "Assessment of Correctness, Content Omission, and Risk of Harm in Large Language Model Responses to Ophthalmology Continuing Medical Education Questions" (2026). SKMC Student Presentations and Publications. Paper 87.
https://jdc.jefferson.edu/skmcstudentworks/87

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Language

English

Download

Available for download on Friday, May 01, 2026

Included in

Ophthalmology Commons
