%0 Journal Article
%A Weißer, Cedric
%A Netzer, Nils
%A Görtz, Magdalena
%A Schütz, Viktoria
%A Hielscher, Thomas
%A Schwab, Constantin
%A Hohenfellner, Markus
%A Schlemmer, Heinz-Peter
%A Maier-Hein, Klaus H
%A Bonekamp, David
%T Weakly Supervised MRI Slice-Level Deep Learning Classification of Prostate Cancer Approximates Full Voxel- and Slice-Level Annotation: Effect of Increasing Training Set Size.
%J Journal of Magnetic Resonance Imaging
%V 59
%N 4
%@ 1053-1807
%C New York, NY
%I Wiley-Liss
%M DKFZ-2023-01529
%P 1409-1422
%D 2024
%Z #EA:E010#LA:E010# / 2024 Apr;59(4):1409-1422
%X Weakly supervised learning promises reduced annotation effort while maintaining performance. To compare weakly supervised training with full slice-wise annotated training of a deep convolutional classification network (CNN) for prostate cancer (PC). Retrospective. One thousand four hundred eighty-nine consecutive institutional prostate MRI examinations from men with suspicion for PC (65 ± 8 years) between January 2015 and November 2020 were split into training (N = 794, enriched with 204 PROSTATEx examinations) and test set (N = 695). 1.5 and 3 T, T2-weighted turbo-spin-echo and diffusion-weighted echo-planar imaging. Histopathological ground truth was provided by targeted and extended systematic biopsy. Reference training was performed using slice-level annotation (SLA) and compared to iterative training utilizing patient-level annotations (PLAs) with supervised feedback of CNN estimates into the next training iteration at three incremental training set sizes (N = 200, 500, 998). Model performance was assessed by comparing specificity at fixed sensitivity of 0.97 [254/262] emulating PI-RADS ≥ 3, and 0.88-0.90 [231-236/262] emulating PI-RADS ≥ 4 decisions. Receiver operating characteristic (ROC) curves and area under the curve (AUC) were compared using DeLong and Obuchowski tests. Sensitivity and specificity were compared using the McNemar test. Statistical significance threshold was P = 0.05. Test set (N = 695) ROC-AUC performance of SLA (trained with 200/500/998 exams) was 0.75/0.80/0.83, respectively. PLA achieved lower ROC-AUC of 0.64/0.72/0.78. Both increased performance significantly with increasing training set size. ROC-AUC for SLA at 500 exams was comparable to PLA at 998 exams (P = 0.28). ROC-AUC was significantly different between SLA and PLA at the same training set sizes; however, the ROC-AUC difference decreased significantly from 200 to 998 training exams.
Emulating PI-RADS ≥ 3 decisions, the difference between PLA specificity of 0.12 [51/433] and SLA specificity of 0.13 [55/433] became undetectable (P = 1.0) at 998 exams. Emulating PI-RADS ≥ 4 decisions, at 998 exams, SLA specificity of 0.51 [221/433] remained higher than PLA specificity at 0.39 [170/433]. However, PLA specificity at 998 exams became comparable to SLA specificity of 0.37 [159/433] at 200 exams (P = 0.70). Weakly supervised training of a classification CNN using patient-level-only annotation had lower performance compared to training with slice-wise annotations, but improved significantly faster with additional training data. Evidence Level: 3. Technical Efficacy: Stage 2.
%K MRI (Other)
%K PI-RADS (Other)
%K deep learning (Other)
%K prostate cancer (Other)
%K weakly supervised training (Other)
%F PUB:(DE-HGF)16
%9 Journal Article
%$ pmid:37504495
%R 10.1002/jmri.28891
%U https://inrepo02.dkfz.de/record/277861