Files

Download Full Text (2.7 MB)

Download Narration (9.6 MB)

Description

Undergraduate medical education (UME) faces several challenges, including the need for an adequate supply of practice items that replicate standardized medical examinations such as the United States Medical Licensing Examination (USMLE).1 This demand suggests a role for innovative, efficient approaches to item generation. Artificial intelligence (AI) large language models (LLMs), such as those underlying ChatGPT, present an attractive solution.

Previous authors have investigated ChatGPT’s ability to “pass” high-stakes assessments, such as the USMLE,2–4 the ophthalmology and radiology board examinations,5,6 and other nations’ certification examinations.7 Far less has been published on ChatGPT’s ability to construct vignette-based, single-best-answer multiple-choice items like those used on these assessments,8–10 and the existing studies rely on broad categories of item flaws and offer little comparative psychometric analysis of item performance.

This study investigated the utility and feasibility of ChatGPT as an author of USMLE-style questions, with the following research questions:

  1. Once fine-tuned, can ChatGPT successfully generate factually accurate questions that adhere to predetermined style and content guidelines?
  2. How efficient is ChatGPT at writing questions, compared to human subject matter experts?
  3. Do the psychometric characteristics of ChatGPT’s items differ from those of human-written items?
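
To make the item-generation task concrete, the following is a minimal sketch, assuming the OpenAI Python SDK, of how an LLM could be prompted to draft a single USMLE-style vignette item under explicit style constraints. The model name, prompt wording, and topic are illustrative assumptions only and do not reproduce this study's actual prompts or tuning.

```python
# Illustrative sketch only (not the authors' workflow): asking a chat model to draft
# one USMLE-style single-best-answer item under explicit style constraints.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Style constraints adapted from common item-writing guidance; wording is an assumption.
STYLE_GUIDE = (
    "Write one USMLE-style clinical vignette with a single best answer, "
    "five answer options labeled A-E, and a brief explanation of the correct answer. "
    "Avoid absolute terms, grammatical cues, and 'all/none of the above' options."
)

response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical model choice; any chat-capable model could be substituted
    messages=[
        {"role": "system", "content": STYLE_GUIDE},
        {"role": "user", "content": "Topic: anatomy and pathology of the rotator cuff."},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

In practice, generated items would still require review by subject matter experts for factual accuracy and adherence to style guidelines before use.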

Publication Date

3-30-2025

Keywords

artificial intelligence, AI, ChatGPT, assessment, USMLE

Disciplines

Medical Education | Medicine and Health Sciences | Pathology

Comments

Presented at Anatomy Connected 2025 (American Association for Anatomy, AAA).

“Coherent Nonsense”: Lessons Learned from Utilizing ChatGPT for USMLE-Style Anatomy and Pathology Questions
