Comparative benchmarking of failure detection methods in medical image segmentation: Unveiling the role of confidence aggregation.

Zenk, Maximilian; Zimmerer, David; Isensee, Fabian; Traub, Jeremias; Norajitra, Tobias; Jäger, Paul F; Maier-Hein, Klaus

doi:10.1016/j.media.2024.103392

Marc 21

001			294922
005			20241216111324.0
024	7	_	\|a 10.1016/j.media.2024.103392 \|2 doi
024	7	_	\|a pmid:39657400 \|2 pmid
024	7	_	\|a 1361-8415 \|2 ISSN
024	7	_	\|a 1361-8431 \|2 ISSN
024	7	_	\|a 1361-8423 \|2 ISSN
037	_	_	\|a DKFZ-2024-02629
041	_	_	\|a English
082	_	_	\|a 610
100	1	_	\|a Zenk, Maximilian \|0 P:(DE-He78)eafef5cb69dd3d85f1cc942c474a220f \|b 0 \|e First author \|u dkfz
245	_	_	\|a Comparative benchmarking of failure detection methods in medical image segmentation: Unveiling the role of confidence aggregation.
260	_	_	\|a Amsterdam [u.a.] \|c 2025 \|b Elsevier Science
336	7	_	\|a article \|2 DRIVER
336	7	_	\|a Output Types/Journal article \|2 DataCite
336	7	_	\|a Journal Article \|b journal \|m journal \|0 PUB:(DE-HGF)16 \|s 1734343984_24909 \|2 PUB:(DE-HGF)
336	7	_	\|a ARTICLE \|2 BibTeX
336	7	_	\|a JOURNAL_ARTICLE \|2 ORCID
336	7	_	\|a Journal Article \|0 0 \|2 EndNote
500	_	_	\|a #EA:E230#LA:E230# / Available online 30 November 2024
520	_	_	\|a Semantic segmentation is an essential component of medical image analysis research, with recent deep learning algorithms offering out-of-the-box applicability across diverse datasets. Despite these advancements, segmentation failures remain a significant concern for real-world clinical applications, necessitating reliable detection mechanisms. This paper introduces a comprehensive benchmarking framework aimed at evaluating failure detection methodologies within medical image segmentation. Through our analysis, we identify the strengths and limitations of current failure detection metrics, advocating for the risk-coverage analysis as a holistic evaluation approach. Utilizing a collective dataset comprising five public 3D medical image collections, we assess the efficacy of various failure detection strategies under realistic test-time distribution shifts. Our findings highlight the importance of pixel confidence aggregation and we observe superior performance of the pairwise Dice score (Roy et al., 2019) between ensemble predictions, positioning it as a simple and robust baseline for failure detection in medical image segmentation. To promote ongoing research, we make the benchmarking framework available to the community.
536	_	_	\|a 315 - Bildgebung und Radioonkologie (POF4-315) \|0 G:(DE-HGF)POF4-315 \|c POF4-315 \|f POF IV \|x 0
588	_	_	\|a Dataset connected to CrossRef, PubMed, Journals: inrepo02.dkfz.de
650	_	7	\|a Distribution shift \|2 Other
650	_	7	\|a Failure detection \|2 Other
650	_	7	\|a Quality control \|2 Other
650	_	7	\|a Semantic segmentation \|2 Other
650	_	7	\|a Uncertainty estimation \|2 Other
700	1	_	\|a Zimmerer, David \|0 P:(DE-He78)c1fcef80eab3d1e4fc187faece1a439c \|b 1 \|u dkfz
700	1	_	\|a Isensee, Fabian \|0 P:(DE-He78)7ea9af59d03ec7deb982a0e0562358fa \|b 2 \|u dkfz
700	1	_	\|a Traub, Jeremias \|0 P:(DE-He78)ee61463b02d49b2e085e3e7c8d6d963e \|b 3 \|u dkfz
700	1	_	\|a Norajitra, Tobias \|0 P:(DE-He78)a70f21a2bf78bbc1306c3d432ae08dc7 \|b 4 \|u dkfz
700	1	_	\|a Jäger, Paul F \|0 P:(DE-He78)04a0b5a49db132d8f00cee326cb743ca \|b 5 \|u dkfz
700	1	_	\|a Maier-Hein, Klaus \|0 P:(DE-He78)33c74005e1ce56f7025c4f6be15321b3 \|b 6 \|e Last author \|u dkfz
773	_	_	\|a 10.1016/j.media.2024.103392 \|g Vol. 101, p. 103392 - \|0 PERI:(DE-600)1497450-2 \|p 103392 \|t Medical image analysis \|v 101 \|y 2025 \|x 1361-8415
909	C	O	\|p VDB \|o oai:inrepo02.dkfz.de:294922
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 0 \|6 P:(DE-He78)eafef5cb69dd3d85f1cc942c474a220f
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 1 \|6 P:(DE-He78)c1fcef80eab3d1e4fc187faece1a439c
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 2 \|6 P:(DE-He78)7ea9af59d03ec7deb982a0e0562358fa
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 3 \|6 P:(DE-He78)ee61463b02d49b2e085e3e7c8d6d963e
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 4 \|6 P:(DE-He78)a70f21a2bf78bbc1306c3d432ae08dc7
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 5 \|6 P:(DE-He78)04a0b5a49db132d8f00cee326cb743ca
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 6 \|6 P:(DE-He78)33c74005e1ce56f7025c4f6be15321b3
913	1	_	\|a DE-HGF \|b Gesundheit \|l Krebsforschung \|1 G:(DE-HGF)POF4-310 \|0 G:(DE-HGF)POF4-315 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-300 \|4 G:(DE-HGF)POF \|v Bildgebung und Radioonkologie \|x 0
914	1	_	\|y 2024
915	_	_	\|a JCR \|0 StatID:(DE-HGF)0100 \|2 StatID \|b MED IMAGE ANAL : 2022 \|d 2023-10-21
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0200 \|2 StatID \|b SCOPUS \|d 2023-10-21
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0300 \|2 StatID \|b Medline \|d 2023-10-21
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0600 \|2 StatID \|b Ebsco Academic Search \|d 2023-10-21
915	_	_	\|a Peer Review \|0 StatID:(DE-HGF)0030 \|2 StatID \|b ASC \|d 2023-10-21
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0199 \|2 StatID \|b Clarivate Analytics Master Journal List \|d 2023-10-21
915	_	_	\|a WoS \|0 StatID:(DE-HGF)0113 \|2 StatID \|b Science Citation Index Expanded \|d 2023-10-21
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0150 \|2 StatID \|b Web of Science Core Collection \|d 2023-10-21
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0160 \|2 StatID \|b Essential Science Indicators \|d 2023-10-21
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1160 \|2 StatID \|b Current Contents - Engineering, Computing and Technology \|d 2023-10-21
915	_	_	\|a IF >= 10 \|0 StatID:(DE-HGF)9910 \|2 StatID \|b MED IMAGE ANAL : 2022 \|d 2023-10-21
920	2	_	\|0 I:(DE-He78)E230-20160331 \|k E230 \|l E230 Medizinische Bildverarbeitung \|x 0
920	1	_	\|0 I:(DE-He78)E230-20160331 \|k E230 \|l E230 Medizinische Bildverarbeitung \|x 0
920	1	_	\|0 I:(DE-He78)E290-20160331 \|k E290 \|l NWG Interaktives maschinelles Lernen \|x 1
920	0	_	\|0 I:(DE-He78)E230-20160331 \|k E230 \|l E230 Medizinische Bildverarbeitung \|x 0
980	_	_	\|a journal
980	_	_	\|a VDB
980	_	_	\|a I:(DE-He78)E230-20160331
980	_	_	\|a I:(DE-He78)E290-20160331
980	_	_	\|a UNRESTRICTED
