TY - JOUR
T1 - The VAMPIRE challenge: A multi-institutional validation study of CT ventilation imaging
AU - Kipritidis, John
AU - Tahir, Bilal A.
AU - Cazoulat, Guillaume
AU - Hofman, Michael S.
AU - Siva, Shankar
AU - Callahan, Jason
AU - Hardcastle, Nicholas
AU - Yamamoto, Tokihiro
AU - Christensen, Gary E.
AU - Reinhardt, Joseph M.
AU - Kadoya, Noriyuki
AU - Patton, Taylor J.
AU - Gerard, Sarah E.
AU - Duarte, Isabella
AU - Archibald-Heeren, Ben
AU - Byrne, Mikel
AU - Sims, Rick
AU - Ramsay, Scott
AU - Booth, Jeremy T.
AU - Eslick, Enid
AU - Hegi-Johnson, Fiona
AU - Woodruff, Henry C.
AU - Ireland, Rob H.
AU - Wild, Jim M.
AU - Cai, Jing
AU - Bayouth, John E.
AU - Brock, Kristy
AU - Keall, Paul J.
PY - 2019/3
Y1 - 2019/3
AB - Purpose: CT ventilation imaging (CTVI) is being used to achieve functional avoidance lung cancer radiation therapy in three clinical trials (NCT02528942, NCT02308709, NCT02843568). To address the need for common CTVI validation tools, we have built the Ventilation And Medical Pulmonary Image Registration Evaluation (VAMPIRE) Dataset, and present the results of the first VAMPIRE Challenge to compare relative ventilation distributions between different CTVI algorithms and other established ventilation imaging modalities. Methods: The VAMPIRE Dataset includes 50 pairs of 4DCT scans and corresponding clinical or experimental ventilation scans, referred to as reference ventilation images (RefVIs). The dataset includes 25 humans imaged with Galligas 4DPET/CT, 21 humans imaged with DTPA-SPECT, and 4 sheep imaged with Xenon-CT. For the VAMPIRE Challenge, 16 subjects were allocated to a training group (with RefVI provided) and 34 subjects were allocated to a validation group (with RefVI blinded). Seven research groups downloaded the Challenge dataset and uploaded CTVIs based on deformable image registration (DIR) between the 4DCT inhale/exhale phases. Participants used DIR methods broadly classified into B-splines, Free-form, Diffeomorphisms, or Biomechanical modeling, with CT ventilation metrics based on the DIR evaluation of volume change, Hounsfield Unit change, or various hybrid approaches. All CTVIs were evaluated against the corresponding RefVI using the voxel-wise Spearman coefficient r_S, and Dice similarity coefficients evaluated for low function lung (DSC_low) and high function lung (DSC_high). Results: A total of 37 unique combinations of DIR method and CT ventilation metric were either submitted by participants directly or derived from participant-submitted DIR motion fields using the in-house software, VESPIR. The r_S and DSC results reveal a high degree of inter-algorithm and intersubject variability among the validation subjects, with algorithm rankings changing by up to ten positions depending on the choice of evaluation metric. The algorithm with the highest overall cross-modality correlations used a biomechanical model-based DIR with a hybrid ventilation metric, achieving a median (range) of 0.49 (0.27–0.73) for r_S, 0.52 (0.36–0.67) for DSC_low, and 0.45 (0.28–0.62) for DSC_high. All other algorithms exhibited at least one negative r_S value, and/or one DSC value less than 0.5. Conclusions: The VAMPIRE Challenge results demonstrate that the cross-modality correlation between CTVIs and the RefVIs varies not only with the choice of CTVI algorithm but also with the choice of RefVI modality, imaging subject, and the evaluation metric used to compare relative ventilation distributions. This variability may arise from the fact that each of the different CTVI algorithms and RefVI modalities provides a distinct physiologic measurement. Ultimately this variability, coupled with the lack of a “gold standard,” highlights the ongoing importance of further validation studies before CTVI can be widely translated from academic centers to the clinic. It is hoped that the information gleaned from the VAMPIRE Challenge can help inform future validation efforts.
KW - 4DCT
KW - CT ventilation imaging
KW - deformable image registration
KW - lung cancer
UR - http://www.scopus.com/inward/record.url?scp=85060916328&partnerID=8YFLogxK
U2 - 10.1002/mp.13346
DO - 10.1002/mp.13346
M3 - Journal article
C2 - 30575051
AN - SCOPUS:85060916328
SN - 0094-2405
VL - 46
SP - 1198
EP - 1217
JO - Medical Physics
JF - Medical Physics
IS - 3
ER -