Recommendations for validating omics prediction models: Insights from a lung cancer RNA biomarker study.

Pestarino, Luca; Nøst, Therese H; Urbarova, Ilona; Røe, Oluf D; Rounge, Trine Ballestad; Fotopoulos, Ioannis; Langseth, Hilde; Turzanski-Fortner, Renée

doi:10.1158/1055-9965.EPI-25-0787

Marc 21

001			303479
005			20250814114517.0
024	7	_	\|a 10.1158/1055-9965.EPI-25-0787 \|2 doi
024	7	_	\|a pmid:40794097 \|2 pmid
024	7	_	\|a 1055-9965 \|2 ISSN
024	7	_	\|a 1538-7755 \|2 ISSN
037	_	_	\|a DKFZ-2025-01676
041	_	_	\|a English
082	_	_	\|a 610
100	1	_	\|a Pestarino, Luca \|0 0000-0001-7097-2954 \|b 0
245	_	_	\|a Recommendations for validating omics prediction models: Insights from a lung cancer RNA biomarker study.
260	_	_	\|a Philadelphia, Pa. \|c 2025 \|b AACR
336	7	_	\|a article \|2 DRIVER
336	7	_	\|a Output Types/Journal article \|2 DataCite
336	7	_	\|a Journal Article \|b journal \|m journal \|0 PUB:(DE-HGF)16 \|s 1755152012_13098 \|2 PUB:(DE-HGF)
336	7	_	\|a ARTICLE \|2 BibTeX
336	7	_	\|a JOURNAL_ARTICLE \|2 ORCID
336	7	_	\|a Journal Article \|0 0 \|2 EndNote
500	_	_	\|a epub
520	_	_	\|a External validation of predictive models in medical research is crucial to ensure their generalizability and applicability across diverse populations. However, validation often reveals discrepancies in model performance due to cohort differences, sample collection and storage, overfitting, and inconsistencies in data handling. This study investigates the challenges encountered during external validation of predictive models for early lung cancer detection using small RNA biomarkers, tying these challenges to specific validation outcomes and deriving recommendations. Predictive models based on the XGBoost algorithm, developed from serum samples in the JanusRNA cohort, were externally validated in two independent Norwegian cohorts: HUNT and NOWAC. These cohorts differed in sample types, RNA abundance, library preparation protocols, and lung cancer histological classification. Strategies to harmonize data processing and address these discrepancies were employed to ensure a robust validation process. Validation revealed significant challenges due to cohort heterogeneity. Median AUC values ranged from 0.50 to 0.66 in validation cohorts, compared to 0.62 to 0.76 in the original models. Models performed worse in the female-only NOWAC cohort, where plasma was used, highlighting the impact of sample type and cohort characteristics on predictive accuracy. Based on the challenges encountered during validation, we propose seven recommendations to guide robust external validation of omics-based predictive models, including harmonizing data processing across cohorts, re-evaluating overfitting, and critically assessing model performance for clinical applications. By highlighting practical issues in model validation and providing recommendations, this study supports more reliable and clinically applicable biomarker-based prediction models, ultimately aiding cancer screening and prevention efforts.
536	_	_	\|a 313 - Krebsrisikofaktoren und Prävention (POF4-313) \|0 G:(DE-HGF)POF4-313 \|c POF4-313 \|f POF IV \|x 0
588	_	_	\|a Dataset connected to CrossRef, PubMed, Journals: inrepo02.dkfz.de
700	1	_	\|a Turzanski-Fortner, Renée \|0 P:(DE-He78)74a6af8347ec5cbd4b77e562e10ca1f2 \|b 1 \|u dkfz
700	1	_	\|a Nøst, Therese H \|0 0000-0001-6805-3094 \|b 2
700	1	_	\|a Fotopoulos, Ioannis \|0 0009-0006-3398-3498 \|b 3
700	1	_	\|a Urbarova, Ilona \|0 0000-0001-6626-2917 \|b 4
700	1	_	\|a Røe, Oluf D \|0 0000-0002-4870-5822 \|b 5
700	1	_	\|a Langseth, Hilde \|0 0000-0002-9446-4855 \|b 6
700	1	_	\|a Rounge, Trine Ballestad \|0 0000-0003-2677-2722 \|b 7
773	_	_	\|a 10.1158/1055-9965.EPI-25-0787 \|0 PERI:(DE-600)2036781-8 \|p nn \|t Cancer epidemiology, biomarkers & prevention \|v nn \|y 2025 \|x 1055-9965
909	C	O	\|o oai:inrepo02.dkfz.de:303479 \|p VDB
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 1 \|6 P:(DE-He78)74a6af8347ec5cbd4b77e562e10ca1f2
913	1	_	\|a DE-HGF \|b Gesundheit \|l Krebsforschung \|1 G:(DE-HGF)POF4-310 \|0 G:(DE-HGF)POF4-313 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-300 \|4 G:(DE-HGF)POF \|v Krebsrisikofaktoren und Prävention \|x 0
914	1	_	\|y 2025
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0200 \|2 StatID \|b SCOPUS \|d 2024-12-10
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0300 \|2 StatID \|b Medline \|d 2024-12-10
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0199 \|2 StatID \|b Clarivate Analytics Master Journal List \|d 2024-12-10
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1050 \|2 StatID \|b BIOSIS Previews \|d 2024-12-10
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0160 \|2 StatID \|b Essential Science Indicators \|d 2024-12-10
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1030 \|2 StatID \|b Current Contents - Life Sciences \|d 2024-12-10
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1190 \|2 StatID \|b Biological Abstracts \|d 2024-12-10
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1110 \|2 StatID \|b Current Contents - Clinical Medicine \|d 2024-12-10
915	_	_	\|a WoS \|0 StatID:(DE-HGF)0113 \|2 StatID \|b Science Citation Index Expanded \|d 2024-12-10
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0150 \|2 StatID \|b Web of Science Core Collection \|d 2024-12-10
915	_	_	\|a JCR \|0 StatID:(DE-HGF)0100 \|2 StatID \|b CANCER EPIDEM BIOMAR : 2022 \|d 2024-12-10
915	_	_	\|a IF < 5 \|0 StatID:(DE-HGF)9900 \|2 StatID \|d 2024-12-10
920	1	_	\|0 I:(DE-He78)C180-20160331 \|k C180 \|l Krebsepidemiologie \|x 0
980	_	_	\|a journal
980	_	_	\|a VDB
980	_	_	\|a I:(DE-He78)C180-20160331
980	_	_	\|a UNRESTRICTED
