Comparative benchmarking of failure detection methods in medical image segmentation: Unveiling the role of confidence aggregation.

Zenk, Maximilian; Zimmerer, David; Isensee, Fabian; Traub, Jeremias; Norajitra, Tobias; Jäger, Paul F; Maier-Hein, Klaus
doi:10.1016/j.media.2024.103392
000294922 001__ 294922
000294922 005__ 20241216111324.0
000294922 0247_ $$2doi$$a10.1016/j.media.2024.103392
000294922 0247_ $$2pmid$$apmid:39657400
000294922 0247_ $$2ISSN$$a1361-8415
000294922 0247_ $$2ISSN$$a1361-8431
000294922 0247_ $$2ISSN$$a1361-8423
000294922 037__ $$aDKFZ-2024-02629
000294922 041__ $$aEnglish
000294922 082__ $$a610
000294922 1001_ $$0P:(DE-He78)eafef5cb69dd3d85f1cc942c474a220f$$aZenk, Maximilian$$b0$$eFirst author$$udkfz
000294922 245__ $$aComparative benchmarking of failure detection methods in medical image segmentation: Unveiling the role of confidence aggregation.
000294922 260__ $$aAmsterdam [u.a.]$$bElsevier Science$$c2025
000294922 3367_ $$2DRIVER$$aarticle
000294922 3367_ $$2DataCite$$aOutput Types/Journal article
000294922 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1734343984_24909
000294922 3367_ $$2BibTeX$$aARTICLE
000294922 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000294922 3367_ $$00$$2EndNote$$aJournal Article
000294922 500__ $$a#EA:E230#LA:E230# / Available online 30 November 2024
000294922 520__ $$aSemantic segmentation is an essential component of medical image analysis research, with recent deep learning algorithms offering out-of-the-box applicability across diverse datasets. Despite these advancements, segmentation failures remain a significant concern for real-world clinical applications, necessitating reliable detection mechanisms. This paper introduces a comprehensive benchmarking framework aimed at evaluating failure detection methodologies within medical image segmentation. Through our analysis, we identify the strengths and limitations of current failure detection metrics, advocating for the risk-coverage analysis as a holistic evaluation approach. Utilizing a collective dataset comprising five public 3D medical image collections, we assess the efficacy of various failure detection strategies under realistic test-time distribution shifts. Our findings highlight the importance of pixel confidence aggregation and we observe superior performance of the pairwise Dice score (Roy et al., 2019) between ensemble predictions, positioning it as a simple and robust baseline for failure detection in medical image segmentation. To promote ongoing research, we make the benchmarking framework available to the community.
000294922 536__ $$0G:(DE-HGF)POF4-315$$a315 - Bildgebung und Radioonkologie (POF4-315)$$cPOF4-315$$fPOF IV$$x0
000294922 588__ $$aDataset connected to CrossRef, PubMed, Journals: inrepo02.dkfz.de
000294922 650_7 $$2Other$$aDistribution shift
000294922 650_7 $$2Other$$aFailure detection
000294922 650_7 $$2Other$$aQuality control
000294922 650_7 $$2Other$$aSemantic segmentation
000294922 650_7 $$2Other$$aUncertainty estimation
000294922 7001_ $$0P:(DE-He78)c1fcef80eab3d1e4fc187faece1a439c$$aZimmerer, David$$b1$$udkfz
000294922 7001_ $$0P:(DE-He78)7ea9af59d03ec7deb982a0e0562358fa$$aIsensee, Fabian$$b2$$udkfz
000294922 7001_ $$0P:(DE-He78)ee61463b02d49b2e085e3e7c8d6d963e$$aTraub, Jeremias$$b3$$udkfz
000294922 7001_ $$0P:(DE-He78)a70f21a2bf78bbc1306c3d432ae08dc7$$aNorajitra, Tobias$$b4$$udkfz
000294922 7001_ $$0P:(DE-He78)04a0b5a49db132d8f00cee326cb743ca$$aJäger, Paul F$$b5$$udkfz
000294922 7001_ $$0P:(DE-He78)33c74005e1ce56f7025c4f6be15321b3$$aMaier-Hein, Klaus$$b6$$eLast author$$udkfz
000294922 773__ $$0PERI:(DE-600)1497450-2$$a10.1016/j.media.2024.103392$$gVol. 101, p. 103392 -$$p103392$$tMedical image analysis$$v101$$x1361-8415$$y2025
000294922 909CO $$ooai:inrepo02.dkfz.de:294922$$pVDB
000294922 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)eafef5cb69dd3d85f1cc942c474a220f$$aDeutsches Krebsforschungszentrum$$b0$$kDKFZ
000294922 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)c1fcef80eab3d1e4fc187faece1a439c$$aDeutsches Krebsforschungszentrum$$b1$$kDKFZ
000294922 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)7ea9af59d03ec7deb982a0e0562358fa$$aDeutsches Krebsforschungszentrum$$b2$$kDKFZ
000294922 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)ee61463b02d49b2e085e3e7c8d6d963e$$aDeutsches Krebsforschungszentrum$$b3$$kDKFZ
000294922 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)a70f21a2bf78bbc1306c3d432ae08dc7$$aDeutsches Krebsforschungszentrum$$b4$$kDKFZ
000294922 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)04a0b5a49db132d8f00cee326cb743ca$$aDeutsches Krebsforschungszentrum$$b5$$kDKFZ
000294922 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)33c74005e1ce56f7025c4f6be15321b3$$aDeutsches Krebsforschungszentrum$$b6$$kDKFZ
000294922 9131_ $$0G:(DE-HGF)POF4-315$$1G:(DE-HGF)POF4-310$$2G:(DE-HGF)POF4-300$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bGesundheit$$lKrebsforschung$$vBildgebung und Radioonkologie$$x0
000294922 9141_ $$y2024
000294922 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bMED IMAGE ANAL : 2022$$d2023-10-21
000294922 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2023-10-21
000294922 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2023-10-21
000294922 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search$$d2023-10-21
000294922 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC$$d2023-10-21
000294922 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2023-10-21
000294922 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2023-10-21
000294922 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2023-10-21
000294922 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2023-10-21
000294922 915__ $$0StatID:(DE-HGF)1160$$2StatID$$aDBCoverage$$bCurrent Contents - Engineering, Computing and Technology$$d2023-10-21
000294922 915__ $$0StatID:(DE-HGF)9910$$2StatID$$aIF >= 10$$bMED IMAGE ANAL : 2022$$d2023-10-21
000294922 9202_ $$0I:(DE-He78)E230-20160331$$kE230$$lE230 Medizinische Bildverarbeitung$$x0
000294922 9201_ $$0I:(DE-He78)E230-20160331$$kE230$$lE230 Medizinische Bildverarbeitung$$x0
000294922 9201_ $$0I:(DE-He78)E290-20160331$$kE290$$lNWG Interaktives maschinelles Lernen$$x1
000294922 9200_ $$0I:(DE-He78)E230-20160331$$kE230$$lE230 Medizinische Bildverarbeitung$$x0
000294922 980__ $$ajournal
000294922 980__ $$aVDB
000294922 980__ $$aI:(DE-He78)E230-20160331
000294922 980__ $$aI:(DE-He78)E290-20160331
000294922 980__ $$aUNRESTRICTED