Automated curation of large-scale cancer histopathology image datasets using deep learning.

Hilgers, Lars; Yuan, Tanwei; West, Nicholas P; Ghaffari Laleh, Narmin; Brobeil, Alexander; Loeffler, Chiara M L; Brenner, Hermann; Brinker, Titus J; Westwood, Alice; Hewitt, Katherine J; Quirke, Philip; Kather, Jakob Nikolas; Grabsch, Heike I; Matthaei, Emylou; Hoffmeister, Michael; Carrero, Zunamys I

doi:10.1111/his.15159

Marc 21

001			288710
005			20250528125727.0
024	7	_	\|a 10.1111/his.15159 \|2 doi
024	7	_	\|a pmid:38409878 \|2 pmid
024	7	_	\|a 0309-0167 \|2 ISSN
024	7	_	\|a 1365-2559 \|2 ISSN
024	7	_	\|a altmetric:160182880 \|2 altmetric
037	_	_	\|a DKFZ-2024-00429
041	_	_	\|a English
082	_	_	\|a 610
100	1	_	\|a Hilgers, Lars \|b 0
245	_	_	\|a Automated curation of large-scale cancer histopathology image datasets using deep learning.
260	_	_	\|a Oxford [u.a.] \|c 2024 \|b Wiley-Blackwell
336	7	_	\|a article \|2 DRIVER
336	7	_	\|a Output Types/Journal article \|2 DataCite
336	7	_	\|a Journal Article \|b journal \|m journal \|0 PUB:(DE-HGF)16 \|s 1714035834_1931 \|2 PUB:(DE-HGF)
336	7	_	\|a ARTICLE \|2 BibTeX
336	7	_	\|a JOURNAL_ARTICLE \|2 ORCID
336	7	_	\|a Journal Article \|0 0 \|2 EndNote
500	_	_	\|a 2024 Jun;84(7):1139-1153
520	_	_	\|a Artificial intelligence (AI) has numerous applications in pathology, supporting diagnosis and prognostication in cancer. However, most AI models are trained on highly selected data, typically one tissue slide per patient. In reality, especially for large surgical resection specimens, dozens of slides can be available for each patient. Manually sorting and labelling whole-slide images (WSIs) is a very time-consuming process, hindering the direct application of AI on the collected tissue samples from large cohorts. In this study we addressed this issue by developing a deep-learning (DL)-based method for automatic curation of large pathology datasets with several slides per patient. We collected multiple large multicentric datasets of colorectal cancer histopathological slides from the United Kingdom (FOXTROT, N = 21,384 slides; CR07, N = 7985 slides) and Germany (DACHS, N = 3606 slides). These datasets contained multiple types of tissue slides, including bowel resection specimens, endoscopic biopsies, lymph node resections, immunohistochemistry-stained slides, and tissue microarrays. We developed, trained, and tested a deep convolutional neural network model to predict the type of slide from the slide overview (thumbnail) image. The primary statistical endpoint was the macro-averaged area under the receiver operating characteristic curve (AUROC) for detection of the type of slide. In the primary dataset (FOXTROT), the algorithm achieved a high classification performance, with an AUROC of 0.995 (95% confidence interval [CI]: 0.994-0.996), and was able to accurately predict the type of slide from the thumbnail image alone. In the two external test cohorts (CR07, DACHS), AUROCs of 0.982 (95% CI: 0.979-0.985) and 0.875 (95% CI: 0.864-0.887) were observed, which indicates the generalizability of the trained model on unseen datasets. With a confidence threshold of 0.95, the model reached an accuracy of 94.6% (7331 classified cases) in CR07 and 85.1% (2752 classified cases) in the DACHS cohort. Our findings show that the low-resolution thumbnail image is sufficient to accurately classify the type of slide in digital pathology. This can help researchers make the vast resource of existing pathology archives accessible to modern AI models with only minimal manual annotations.
536	_	_	\|a 313 - Krebsrisikofaktoren und Prävention (POF4-313) \|0 G:(DE-HGF)POF4-313 \|c POF4-313 \|f POF IV \|x 0
588	_	_	\|a Dataset connected to CrossRef, PubMed, Journals: inrepo02.dkfz.de
650	_	7	\|a colorectal cancer \|2 Other
650	_	7	\|a deep learning \|2 Other
650	_	7	\|a digital pathology \|2 Other
650	_	7	\|a quality control \|2 Other
700	1	_	\|a Ghaffari Laleh, Narmin \|b 1
700	1	_	\|a West, Nicholas P \|0 0000-0002-0346-6709 \|b 2
700	1	_	\|a Westwood, Alice \|b 3
700	1	_	\|a Hewitt, Katherine J \|b 4
700	1	_	\|a Quirke, Philip \|b 5
700	1	_	\|a Grabsch, Heike I \|b 6
700	1	_	\|a Carrero, Zunamys I \|b 7
700	1	_	\|a Matthaei, Emylou \|b 8
700	1	_	\|a Loeffler, Chiara M L \|b 9
700	1	_	\|a Brinker, Titus J \|0 P:(DE-He78)1e33961c8780aca9b76d776d1fdc1ebb \|b 10 \|u dkfz
700	1	_	\|a Yuan, Tanwei \|0 P:(DE-He78)b9e439a1aa1244925f92d547c0919349 \|b 11 \|u dkfz
700	1	_	\|a Brenner, Hermann \|0 P:(DE-He78)90d5535ff896e70eed81f4a4f6f22ae2 \|b 12 \|u dkfz
700	1	_	\|a Brobeil, Alexander \|b 13
700	1	_	\|a Hoffmeister, Michael \|0 P:(DE-He78)6c5d058b7552d071a7fa4c5e943fff0f \|b 14 \|u dkfz
700	1	_	\|a Kather, Jakob Nikolas \|b 15
773	_	_	\|a 10.1111/his.15159 \|g p. his.15159 \|0 PERI:(DE-600)2006447-0 \|n 7 \|p 1139-1153 \|t Histopathology \|v 84 \|y 2024 \|x 0309-0167
856	4	_	\|u https://inrepo02.dkfz.de/record/288710/files/Histopathology%20-%202024%20-%20Hilgers%20-%20Automated%20curation%20of%20large%E2%80%90scale%20cancer%20histopathology%20image%20datasets%20using%20deep.pdf
856	4	_	\|u https://inrepo02.dkfz.de/record/288710/files/Histopathology%20-%202024%20-%20Hilgers%20-%20Automated%20curation%20of%20large%E2%80%90scale%20cancer%20histopathology%20image%20datasets%20using%20deep.pdf?subformat=pdfa \|x pdfa
909	C	O	\|p VDB \|o oai:inrepo02.dkfz.de:288710
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 10 \|6 P:(DE-He78)1e33961c8780aca9b76d776d1fdc1ebb
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 11 \|6 P:(DE-He78)b9e439a1aa1244925f92d547c0919349
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 12 \|6 P:(DE-He78)90d5535ff896e70eed81f4a4f6f22ae2
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 14 \|6 P:(DE-He78)6c5d058b7552d071a7fa4c5e943fff0f
913	1	_	\|a DE-HGF \|b Gesundheit \|l Krebsforschung \|1 G:(DE-HGF)POF4-310 \|0 G:(DE-HGF)POF4-313 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-300 \|4 G:(DE-HGF)POF \|v Krebsrisikofaktoren und Prävention \|x 0
914	1	_	\|y 2024
915	_	_	\|a Nationallizenz \|0 StatID:(DE-HGF)0420 \|2 StatID \|d 2023-08-23 \|w ger
915	_	_	\|a DEAL Wiley \|0 StatID:(DE-HGF)3001 \|2 StatID \|d 2023-08-23 \|w ger
915	_	_	\|a JCR \|0 StatID:(DE-HGF)0100 \|2 StatID \|b HISTOPATHOLOGY : 2022 \|d 2023-08-23
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0200 \|2 StatID \|b SCOPUS \|d 2023-08-23
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0300 \|2 StatID \|b Medline \|d 2023-08-23
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0600 \|2 StatID \|b Ebsco Academic Search \|d 2023-08-23
915	_	_	\|a Peer Review \|0 StatID:(DE-HGF)0030 \|2 StatID \|b ASC \|d 2023-08-23
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0199 \|2 StatID \|b Clarivate Analytics Master Journal List \|d 2023-08-23
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1050 \|2 StatID \|b BIOSIS Previews \|d 2023-08-23
915	_	_	\|a WoS \|0 StatID:(DE-HGF)0113 \|2 StatID \|b Science Citation Index Expanded \|d 2023-08-23
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0150 \|2 StatID \|b Web of Science Core Collection \|d 2023-08-23
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1030 \|2 StatID \|b Current Contents - Life Sciences \|d 2023-08-23
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1190 \|2 StatID \|b Biological Abstracts \|d 2023-08-23
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0160 \|2 StatID \|b Essential Science Indicators \|d 2023-08-23
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1110 \|2 StatID \|b Current Contents - Clinical Medicine \|d 2023-08-23
915	_	_	\|a IF >= 5 \|0 StatID:(DE-HGF)9905 \|2 StatID \|b HISTOPATHOLOGY : 2022 \|d 2023-08-23
920	1	_	\|0 I:(DE-He78)C140-20160331 \|k C140 \|l NWG Digitale Biomarker in der Onkologie \|x 0
920	1	_	\|0 I:(DE-He78)C070-20160331 \|k C070 \|l C070 Klinische Epidemiologie und Alternf. \|x 1
920	1	_	\|0 I:(DE-He78)C120-20160331 \|k C120 \|l Präventive Onkologie \|x 2
920	1	_	\|0 I:(DE-He78)HD01-20160331 \|k HD01 \|l DKTK HD zentral \|x 3
980	_	_	\|a journal
980	_	_	\|a VDB
980	_	_	\|a I:(DE-He78)C140-20160331
980	_	_	\|a I:(DE-He78)C070-20160331
980	_	_	\|a I:(DE-He78)C120-20160331
980	_	_	\|a I:(DE-He78)HD01-20160331
980	_	_	\|a UNRESTRICTED
