A Multi-Agent Large Language Model Framework to Automatically Assess Performance of a Clinical AI Triage Tool

Document Type

Article

Publication Date

5-25-2026

Comments

This article is the author’s final published version in npj Health Systems, Volume 3, Issue 1, 2026, Article number 35.

Abstract

Radiology reports can be used as a surrogate for the performance of clinical AI tools. Radiology reports were analyzed by an ensemble of eight open-source LLMs and an internal version of GPT-4o using a single multi-shot prompt that assessed for the presence of intracranial hemorrhage (ICH). The performance of the open-source models, the consensus of models, and GPT-4o was compared to human report review. Three ideal consensus LLM ensembles were tested for rating the performance of the triage tool. The capability of each LLM varied. The highest AUC was achieved with llama3.3:70b and GPT-4o. Using the Matthews correlation coefficient (MCC), the ideal combinations of LLMs were the Full-9 Ensemble, the Top-3 Ensemble, and consensus. No statistically significant differences were observed between Top-3, Full-9, and consensus. An ensemble of open-source LLMs provides a more consistent and reliable method than a single LLM alone for deriving a ground truth for retrospective evaluation of a clinical AI triage tool.
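The two ideas the abstract relies on, combining per-model labels into a consensus and scoring that consensus against human review with MCC, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say how consensus was formed, so a simple majority vote is assumed here, and the model names and labels below are invented toy values (1 = ICH present, 0 = ICH absent) rather than study data.

```python
from math import sqrt


def majority_vote(votes):
    """Return 1 if more than half of the binary votes are 1, else 0.

    Assumption: consensus is a strict majority vote; the source does not
    specify the rule that was actually used.
    """
    return 1 if sum(votes) * 2 > len(votes) else 0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for two binary label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator is 0.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical per-report labels from three hypothetical models.
model_votes = {
    "model_a": [1, 0, 1, 0, 0, 1],
    "model_b": [1, 0, 0, 0, 1, 1],
    "model_c": [1, 1, 1, 0, 0, 1],
}
# Hypothetical human review labels for the same six reports.
human = [1, 0, 1, 0, 0, 1]

# One consensus label per report: vote across models, report by report.
consensus = [majority_vote(v) for v in zip(*model_votes.values())]

for name, labels in model_votes.items():
    print(name, round(mcc(human, labels), 3))
print("consensus", round(mcc(human, consensus), 3))
```

In this toy case the individual models each disagree with the human labels on different reports, and the vote cancels those isolated errors, which is the intuition behind preferring an ensemble to a single model.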

Recommended Citation

Flanders, Adam; Peng, Yifan; Prevedello, Luciano; Ball, Robyn; Colak, Errol; Menon, Prahlad; Shih, George; Lin, Hui-Ming; and Lakhani, Paras, "A Multi-Agent Large Language Model Framework to Automatically Assess Performance of a Clinical AI Triage Tool" (2026). Department of Radiology Faculty Papers. Paper 196.
https://jdc.jefferson.edu/radiologyfp/196

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Supplementary information.docx (14 kB)

PubMed ID

42245913

Language

English
