Document Type

Article

Publication Date

12-18-2024

Comments

This article, first published by Frontiers Media, is the author's final published version in Frontiers in Oncology, Volume 14, 2024, Article number 1513608.

The published version is available at 2024 Huang, Yang, Huang, Zeng, Liu, Luo, Lyshchik and Lu.

Copyright © 2024 Huang, Yang, Huang, Zeng, Liu, Luo, Lyshchik and Lu

Abstract

BACKGROUND: Large language models (LLMs) offer opportunities to enhance radiological applications, but their performance in handling complex tasks remains insufficiently investigated.

PURPOSE: To evaluate the performance of LLMs integrated with Contrast-enhanced Ultrasound Liver Imaging Reporting and Data System (CEUS LI-RADS) in diagnosing small (≤20mm) hepatocellular carcinoma (sHCC) in high-risk patients.

MATERIALS AND METHODS: This retrospective study included high-risk HCC patients with untreated small (≤20mm) focal liver lesions (sFLLs) examined from November 2014 to December 2023. ChatGPT-4.0, ChatGPT-4o, ChatGPT-4o mini, and Google Gemini were integrated with imaging features from structured CEUS LI-RADS reports to assess their diagnostic performance for sHCC. The diagnostic efficacy of the LLMs for sHCC was compared using the McNemar test.
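For readers unfamiliar with the McNemar test, the following is a minimal sketch of the exact (binomial) form for paired diagnostic results. It is an illustration only, not the authors' analysis code: the function names and the example counts are hypothetical, and the abstract does not state whether the exact or the chi-square version was used.

```python
from math import comb


def discordant_counts(correct_a, correct_b):
    """Count cases where exactly one of two paired readers is correct.

    correct_a and correct_b are equal-length sequences of booleans, one
    entry per lesion, for two readers judged on the same lesions.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired sequences must have the same length")
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_a, only_b)
    # Under the null hypothesis each discordant pair favours either reader
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, not taken from the study: reader A alone is correct
# on 9 lesions and reader B alone is correct on 1 lesion.
p_value = mcnemar_exact(9, 1)
```

Only the discordant pairs enter the test, which is why it suits comparisons of two readers on the same set of lesions.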

RESULTS: The final population consisted of 403 high-risk patients (52 years ± 11, 323 men). ChatGPT-4.0 and ChatGPT-4o demonstrated substantial to almost perfect intra-agreement for CEUS LI-RADS categorization (κ values: 0.76-1.0 and 0.7-0.94, respectively), outperforming ChatGPT-4o mini (κ values: 0.51-0.72) and Google Gemini (κ values: -0.04-0.47). ChatGPT-4.0 had higher sensitivity in detecting sHCC than ChatGPT-4o (83%-89% vs. 70%-78%, p < 0.02) with comparable specificity (76%-90% vs. 83%-86%, p > 0.05). Compared to human readers, ChatGPT-4.0 showed superior sensitivity (83%-89% vs. 63%-78%, p < 0.004) and comparable specificity (76%-90% vs. 90%-95%, p > 0.05) in diagnosing sHCC.
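The κ values above measure agreement beyond chance. As a rough sketch, and assuming that "intra-agreement" refers to agreement between repeated runs of the same model and that an unweighted Cohen's κ is meant (the abstract specifies neither), the statistic could be computed as below. The function name and the category labels in the example are illustrative and are not data from the study.

```python
from collections import Counter


def cohen_kappa(run1, run2):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(run1) != len(run2) or not run1:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(run1)
    # Observed agreement: fraction of lesions given the same category twice.
    observed = sum(a == b for a, b in zip(run1, run2)) / n
    # Chance agreement: product of the marginal category frequencies.
    counts1, counts2 = Counter(run1), Counter(run2)
    expected = sum(counts1[c] * counts2[c] for c in counts1) / n ** 2
    if expected == 1.0:
        return 1.0  # both runs used one single category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical categorizations of four lesions in two runs of one model.
first_run = ["LR-5", "LR-5", "LR-3", "LR-3"]
second_run = ["LR-5", "LR-3", "LR-5", "LR-3"]
kappa = cohen_kappa(first_run, second_run)
```

A κ of 1 indicates identical categorization in both runs, while values near 0 indicate agreement no better than chance.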

CONCLUSION: LLMs integrated with CEUS LI-RADS offer a potential tool for diagnosing sHCC in high-risk patients. ChatGPT-4.0 demonstrated satisfactory consistency in CEUS LI-RADS categorization, offering higher sensitivity in diagnosing sHCC while maintaining specificity comparable to that of human readers.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Table1(1).docx (231 kB)

PubMed ID

39744002

Language

English
