| Benchmarking foundation models as feature extractors for weakly supervised computational pathology |
| Journal Article | DKFZ-2025-02019 |
2025
Nature Research
Tokyo
Please use a persistent id in citations: doi:10.1038/s41551-025-01516-3
Abstract: Numerous pathology foundation models have been developed to extract clinically relevant information. However, there is limited literature independently evaluating these models on external cohorts and clinically relevant tasks to identify avenues for improvement. Here we benchmark 19 histopathology foundation models on 13 patient cohorts comprising 6,818 patients and 9,528 slides from lung, colorectal, gastric and breast cancers. The models were evaluated on weakly supervised tasks related to biomarkers, morphological properties and prognostic outcomes. We show that a vision-language foundation model, CONCH, yielded the highest overall performance compared with vision-only foundation models, with Virchow2 a close second, although CONCH's advantage was less pronounced in low-data scenarios and low-prevalence tasks. The experiments reveal that foundation models trained on distinct cohorts learn complementary features for predicting the same label and can be fused to outperform the current state of the art. An ensemble combining CONCH and Virchow2 predictions outperformed the individual models in 55% of tasks, leveraging their complementary strengths in classification scenarios. Moreover, our findings suggest that data diversity outweighs data volume for foundation models.
|