Document Type

Article

Publication Date

4-1-2026

Comments

This article is the author’s final published version in Medical Physics, Volume 53, Issue 4, 2026, Article number e70406.

The published version is available at https://doi.org/10.1002/mp.70406. Copyright © 2026 The Author(s).

 

Abstract

BACKGROUND: Pulmonary ventilation imaging enables functional avoidance radiotherapy treatment plans by quantifying regional lung function. However, current clinical standards, such as 99mTc-based single-photon emission computed tomography (SPECT), rely on radioactive tracers, which can introduce imaging deposition artifacts. CT ventilation imaging (CTVI) methods, whether based on physical models or on deep learning, currently require multiple CT images as input, such as the inhale and exhale phases of a 4DCT. While the theoretical foundation of physics-based CTVI is built on multi-phase information, the feasibility of single-phase deep learning CTVI models has not been determined.

PURPOSE: While deep learning methods have predicted SPECT ventilation from multi-phase 4DCT, the benefit of including more than one respiratory phase remains unclear. Predicting ventilation from a single-phase CT reduces computational expense, potentially simplifies image acquisition, and avoids artifacts introduced by image registration, thereby making deep learning-based CTVI approaches more feasible for clinical applications outside of radiotherapy. This study (1) develops a deep learning model to predict SPECT ventilation using only the inhale phase of non-contrast 4DCT and (2) evaluates the impact of adding the exhale phase.

METHODS: We developed a SwinUNETR-based architecture using the maximum inhale 4DCT phase to predict pulmonary ventilation. A total of 44 cases with paired inhale CT and SPECT scans were used for training. To assess multi-phase benefits, we compared: (1) the InhaleCT-Swin Model, trained on inhale CT only; (2) the ExhaleCT-Swin Model, trained on exhale CT only; and (3) the Hybrid Models (IECT-Swin-FTD, IECT-Swin-FTDE, and IECT-Swin-FTDES), fine-tuned on inhale/exhale CT pairs (IECT) with different network components updated. A standard U-Net was also trained on inhale CT (InhaleCT-UNet), exhale CT (ExhaleCT-UNet), and IECT (IECT-UNet) for cross-architecture evaluation.

RESULTS: The SwinUNETR-based Hybrid Model, IECT-Swin-FTD, achieved a mean voxel-wise Spearman correlation of 0.762 ± 0.035, outperforming the current state-of-the-art methods. Our transformer-based model trained on inhale CT slightly outperformed the same model trained on exhale CT, but the difference was not significant (p = 0.098). U-Net achieved lower overall accuracy, though its highest performance occurred with IECT. No significant difference was found between the InhaleCT-Swin Model and the best-performing U-Net model, IECT-UNet (p = 0.556).
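The voxel-wise Spearman correlation reported above can be illustrated with a minimal sketch. It is not the authors' evaluation code: it assumes the predicted and reference (SPECT) ventilation volumes have been flattened to equal-length lists, and that voxels are compared only inside a lung mask, a restriction the abstract does not state. The names `average_ranks`, `pearson`, and `masked_spearman` are hypothetical helpers introduced for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def masked_spearman(predicted, reference, lung_mask):
    """Spearman correlation over voxels where lung_mask is truthy.

    Assumption (not from the source): evaluation is restricted to a lung mask.
    """
    pred = [p for p, m in zip(predicted, lung_mask) if m]
    ref = [r for r, m in zip(reference, lung_mask) if m]
    if len(pred) < 2:
        raise ValueError("need at least two voxels inside the mask")
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(average_ranks(pred), average_ranks(ref))
```

Under these assumptions, a per-case score would be computed as `masked_spearman(pred_voxels, spect_voxels, mask_voxels)` and then averaged over the test cases. Because the metric depends only on ranks, it is insensitive to any monotonic rescaling of either ventilation image.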

CONCLUSIONS: A transformer-based model with its decoder fine-tuned on IECT (IECT-Swin-FTD) achieved state-of-the-art accuracy for SPECT ventilation prediction. Moreover, our InhaleCT-Swin Model achieved results comparable to those of widely used U-Net-based models that require multi-phase CT, showing that a single-phase CT may be sufficient for accurate ventilation prediction and may improve clinical workflow by reducing acquisition requirements and registration-related artifacts.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

PubMed ID

41881558

Language

English
