Document Type

Conference Proceeding


© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

A. Tahmasebi et al., "Ultrasonographic risk stratification of indeterminate thyroid nodules; a comparison of an artificial intelligence algorithm with radiologist performance," 2020 IEEE International Ultrasonics Symposium (IUS), Las Vegas, NV, USA, 2020, pp. 1-4, doi: 10.1109/IUS46767.2020.9251374.


Background, Motivation and Objective: Thyroid nodules with indeterminate or suspicious cytology are commonly encountered in clinical practice, and their clinical management is controversial. Recently, genetic analysis of thyroid fine-needle aspirations (FNAs) was implemented at some institutions to stratify thyroid nodules as high or low risk based on the presence of certain oncogenes commonly associated with aggressive tumor behavior and poor patient outcomes. Our group recently detailed the performance of a machine-learning model based on ultrasonography images of thyroid nodules for the prediction of high- and low-risk mutations. This study evaluated the performance of a second-generation machine-learning algorithm incorporating both object detection and image classification, and compared its performance against that of blinded radiologists.

Statement of Contribution/Methods: This retrospective study was conducted at Thomas Jefferson University and included 262 thyroid nodules that underwent ultrasound imaging, ultrasound-guided FNA, and either next-generation sequencing (NGS) or surgical pathology after resection. An object detection model and an image classification model were employed, first to identify the location of each nodule and then to assess malignancy. A Google Cloud platform (AutoML Vision; Google LLC) was used for this purpose. NGS or surgical pathology, whichever was available, served as the reference standard. Of the 262 nodules, 211 were used for model development and the remaining 51 for model testing. Diagnostic performance in 47 nodules for which pathology or NGS results were available was compared with blinded reads by 3 radiologists, with performance expressed as mean ± standard deviation (%).

Results/Discussion: The algorithm achieved a positive predictive value (PPV) of 68.31% and a sensitivity of 86.81% within the training model. The model was tested on images of the 51 unused nodules, and all 51 nodules were correctly located (100%). For risk stratification, the model demonstrated a sensitivity of 73.9%, specificity of 70.8%, PPV of 70.8%, negative predictive value (NPV) of 73.9% and overall accuracy of 72.3% in the 47 nodules. For comparison, the performance of the 3 radiologists in this same dataset demonstrated a sensitivity of, specificity of, PPV of, NPV of, and overall accuracy of. This work demonstrates that a machine-learning algorithm using image classification performed similarly to, if not slightly better than, 3 experienced radiologists. Future research will focus on incorporating machine-learning findings within radiologist interpretation to potentially improve diagnostic accuracy.
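
The risk-stratification metrics above all derive from a 2×2 confusion matrix against the reference standard. The abstract does not report the underlying counts, so the sketch below assumes a hypothetical set (17 true positives, 7 false positives, 17 true negatives, 6 false negatives; 47 nodules in total) chosen only because it reproduces the stated sensitivity, specificity, PPV and NPV. The names `ConfusionCounts` and `diagnostic_metrics` are illustrative and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 confusion matrix against the reference standard (NGS or pathology)."""
    tp: int  # high-risk nodules called high risk
    fp: int  # low-risk nodules called high risk
    tn: int  # low-risk nodules called low risk
    fn: int  # high-risk nodules called low risk


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard diagnostic metrics as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


# ASSUMPTION: these counts are not reported in the abstract; they are one
# combination over 47 nodules that is consistent with the stated percentages.
counts = ConfusionCounts(tp=17, fp=7, tn=17, fn=6)
metrics = diagnostic_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

The same function could be applied to each radiologist's reads separately, after which the per-reader values would be summarized as mean ± standard deviation, as described in the methods.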


