%0 Journal Article
%A Dexl, Jakob
%A Gatidis, Sergios
%A Früh, Marcel
%A Jeblick, Katharina
%A Mittermeier, Andreas
%A Stüber, Anna Theresa
%A Schachtner, Balthasar
%A Topalis, Johanna
%A Fabritius, Matthias P
%A Gu, Sijing
%A Murugesan, Gowtham Krishnan
%A VanOss, Jeff
%A Ye, Jin
%A He, Junjun
%A Alloula, Anissa
%A Papież, Bartłomiej W
%A Mesbah, Zacharia
%A Modzelewski, Romain
%A Hadlich, Matthias
%A Marinov, Zdravko
%A Stiefelhagen, Rainer
%A Isensee, Fabian
%A Maier-Hein, Klaus H
%A Galdran, Adrian
%A Nikolaou, Konstantin
%A la Fougère, Christian
%A Kim, Moon
%A Kallenberg, Nico
%A Kleesiek, Jens
%A Herrmann, Ken
%A Werner, Rudolf
%A Ingrisch, Michael
%A Cyran, Clemens C
%A Küstner, Thomas
%T AutoPET Challenge on Fully Automated Lesion Segmentation in Oncologic PET/CT Imaging, Part 2: Domain Generalization.
%J Journal of nuclear medicine
%V nn
%@ 0097-9058
%C New York, NY
%I Soc.
%M DKFZ-2026-00004
%P nn
%D 2025
%Z epub
%X This article reports the results of the second iteration of the autoPET challenge on automated lesion segmentation in whole-body PET/CT, held in conjunction with the 26th International Conference on Medical Image Computing and Computer Assisted Intervention in 2023. In contrast to the first autoPET challenge, which served as a proof of concept, this study investigates whether machine learning-based segmentation models trained on data from a single source can maintain performance across clinically relevant variations in PET/CT data, reflecting the demands of real-world deployment. Methods: A comprehensive biomedical segmentation challenge on PET/CT domain generalization was designed and conducted. Participants were tasked with training machine learning models on annotated whole-body 18F-FDG data (n = 1,014). These models were then evaluated on a test set of 200 samples from 5 clinically relevant domains, including variations in institutions, pathologies, and populations and a different tracer. Performance was measured in terms of average Dice similarity coefficient, average false-positive volume, and average false-negative volume. The best-performing teams were awarded in 3 categories. Furthermore, a detailed analysis was conducted after the challenge, examining results across domains and unique instances, along with a ranking analysis. Results: Generalization from a single-source domain remains a significant challenge. Seventeen international teams successfully participated in the challenge. The best-performing team reached an average Dice similarity coefficient of 0.5038, a mean false-positive volume of 87.8388 mL, and a mean false-negative volume of 8.4154 mL on the test set. nnU-Net was the most commonly used framework, with most participants using a 3-dimensional U-Net. Despite competitive in-domain results, out-of-domain performance deteriorated substantially, particularly on pediatric and prostate-specific membrane antigen data.
Detailed error analysis revealed frequent false positives due to physiologic uptake and decreased sensitivity in detecting small or low-uptake lesions. A majority-vote ensemble offered minimal performance gains, whereas an oracle ensemble indicates hypothetical gains. Ranking analysis showed that no single team consistently outperformed all others across ranking schemes. Conclusion: The second autoPET challenge provides a comprehensive evaluation of the current state of automated PET/CT tumor segmentation, highlighting both progress and persistent challenges of single-source domain generalization and the need for diverse public datasets to enhance algorithm robustness.
%K PET/CT (Other)
%K biomedical image analysis challenge (Other)
%K deep learning (Other)
%K domain generalization (Other)
%K oncology (Other)
%K segmentation (Other)
%F PUB:(DE-HGF)16
%9 Journal Article
%$ pmid:41469162
%R 10.2967/jnumed.125.270260
%U https://inrepo02.dkfz.de/record/307499