%0 Journal Article
%A Weißer, Cedric
%A Netzer, Nils
%A Görtz, Magdalena
%A Schütz, Viktoria
%A Hielscher, Thomas
%A Schwab, Constantin
%A Hohenfellner, Markus
%A Schlemmer, Heinz-Peter
%A Maier-Hein, Klaus H
%A Bonekamp, David
%T Weakly Supervised MRI Slice-Level Deep Learning Classification of Prostate Cancer Approximates Full Voxel- and Slice-Level Annotation: Effect of Increasing Training Set Size.
%J Journal of magnetic resonance imaging
%V 59
%N 4
%@ 1053-1807
%C New York, NY
%I Wiley-Liss
%M DKFZ-2023-01529
%P 1409-1422
%D 2024
%Z #EA:E010#LA:E010# / 2024 Apr;59(4):1409-1422
%X Weakly supervised learning promises reduced annotation effort while maintaining performance. To compare weakly supervised training with full slice-wise annotated training of a deep convolutional classification network (CNN) for prostate cancer (PC). Retrospective. One thousand four hundred eighty-nine consecutive institutional prostate MRI examinations from men with suspicion for PC (65 ± 8 years) between January 2015 and November 2020 were split into training (N = 794, enriched with 204 PROSTATEx examinations) and test set (N = 695). 1.5 and 3T, T2-weighted turbo-spin-echo and diffusion-weighted echo-planar imaging. Histopathological ground truth was provided by targeted and extended systematic biopsy. Reference training was performed using slice-level annotation (SLA) and compared to iterative training utilizing patient-level annotations (PLAs) with supervised feedback of CNN estimates into the next training iteration at three incremental training set sizes (N = 200, 500, 998). Model performance was assessed by comparing specificity at fixed sensitivity of 0.97 [254/262] emulating PI-RADS ≥ 3, and 0.88-0.90 [231-236/262] emulating PI-RADS ≥ 4 decisions. Receiver operating characteristic (ROC) and area under the curve (AUC) was compared using DeLong and Obuchowski test. Sensitivity and specificity were compared using McNemar test.
Statistical significance threshold was P = 0.05. Test set (N = 695) ROC-AUC performance of SLA (trained with 200/500/998 exams) was 0.75/0.80/0.83, respectively. PLA achieved lower ROC-AUC of 0.64/0.72/0.78. Both increased performance significantly with increasing training set size. ROC-AUC for SLA at 500 exams was comparable to PLA at 998 exams (P = 0.28). ROC-AUC was significantly different between SLA and PLA at same training set sizes, however the ROC-AUC difference decreased significantly from 200 to 998 training exams. Emulating PI-RADS ≥ 3 decisions, difference between PLA specificity of 0.12 [51/433] and SLA specificity of 0.13 [55/433] became undetectable (P = 1.0) at 998 exams. Emulating PI-RADS ≥ 4 decisions, at 998 exams, SLA specificity of 0.51 [221/433] remained higher than PLA specificity at 0.39 [170/433]. However, PLA specificity at 998 exams became comparable to SLA specificity of 0.37 [159/433] at 200 exams (P = 0.70). Weakly supervised training of a classification CNN using patient-level-only annotation had lower performance compared to training with slice-wise annotations, but improved significantly faster with additional training data. EVIDENCE LEVEL: 3. TECHNICAL EFFICACY: Stage 2.
%K MRI (Other)
%K PI-RADS (Other)
%K deep learning (Other)
%K prostate cancer (Other)
%K weakly supervised training (Other)
%F PUB:(DE-HGF)16
%9 Journal Article
%$ pmid:37504495
%R 10.1002/jmri.28891
%U https://inrepo02.dkfz.de/record/277861