Recommendations for validating omics prediction models: Insights from a lung cancer RNA biomarker study.

Pestarino, Luca; Nøst, Therese H; Urbarova, Ilona; Røe, Oluf D; Rounge, Trine Ballestad; Fotopoulos, Ioannis; Langseth, Hilde; Turzanski-Fortner, Renée
doi:10.1158/1055-9965.EPI-25-0787
000303479 001__ 303479
000303479 005__ 20251012023059.0
000303479 0247_ $$2doi$$a10.1158/1055-9965.EPI-25-0787
000303479 0247_ $$2pmid$$apmid:40794097
000303479 0247_ $$2ISSN$$a1055-9965
000303479 0247_ $$2ISSN$$a1538-7755
000303479 0247_ $$2altmetric$$aaltmetric:182029261
000303479 037__ $$aDKFZ-2025-01676
000303479 041__ $$aEnglish
000303479 082__ $$a610
000303479 1001_ $$00000-0001-7097-2954$$aPestarino, Luca$$b0
000303479 245__ $$aRecommendations for validating omics prediction models: Insights from a lung cancer RNA biomarker study.
000303479 260__ $$aPhiladelphia, Pa.$$bAACR$$c2025
000303479 3367_ $$2DRIVER$$aarticle
000303479 3367_ $$2DataCite$$aOutput Types/Journal article
000303479 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1759827774_6326
000303479 3367_ $$2BibTeX$$aARTICLE
000303479 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000303479 3367_ $$00$$2EndNote$$aJournal Article
000303479 500__ $$a(2025) 34 (10): 1852–1860
000303479 520__ $$aExternal validation of predictive models in medical research is crucial to ensure their generalizability and applicability across diverse populations. However, validation often reveals discrepancies in model performance due to cohort differences, sample collection and storage, overfitting, and inconsistencies in data handling. This study investigates the challenges encountered during external validation of predictive models for early lung cancer detection using small RNA biomarkers, tying these challenges to specific validation outcomes and deriving recommendations. Predictive models based on the XGBoost algorithm, developed from serum samples in the JanusRNA cohort, were externally validated in two independent Norwegian cohorts: HUNT and NOWAC. These cohorts differed in sample types, RNA abundance, library preparation protocols, and lung cancer histological classification. Strategies to harmonize data processing and address these discrepancies were employed to ensure a robust validation process. Validation revealed significant challenges due to cohort heterogeneity. Median AUC values ranged from 0.50 to 0.66 in validation cohorts, compared to 0.62-0.76 in the original models. Models performed worse in the female-only NOWAC cohort, where plasma was used, highlighting the impact of sample type and cohort characteristics on predictive accuracy. Based on the challenges encountered during validation, we propose seven recommendations to guide robust external validation of omics-based predictive models including harmonizing data processing across cohorts, re-evaluating overfitting, and critically assessing model performance for clinical applications. By highlighting practical issues in model validation and providing recommendations, this study supports more reliable and clinically applicable biomarker-based prediction models, ultimately aiding cancer screening and prevention efforts.
000303479 536__ $$0G:(DE-HGF)POF4-313$$a313 - Krebsrisikofaktoren und Prävention (POF4-313)$$cPOF4-313$$fPOF IV$$x0
000303479 588__ $$aDataset connected to CrossRef, PubMed, Journals: inrepo02.dkfz.de
000303479 7001_ $$0P:(DE-He78)74a6af8347ec5cbd4b77e562e10ca1f2$$aTurzanski-Fortner, Renée$$b1$$udkfz
000303479 7001_ $$00000-0001-6805-3094$$aNøst, Therese H$$b2
000303479 7001_ $$00009-0006-3398-3498$$aFotopoulos, Ioannis$$b3
000303479 7001_ $$00000-0001-6626-2917$$aUrbarova, Ilona$$b4
000303479 7001_ $$00000-0002-4870-5822$$aRøe, Oluf D$$b5
000303479 7001_ $$00000-0002-9446-4855$$aLangseth, Hilde$$b6
000303479 7001_ $$00000-0003-2677-2722$$aRounge, Trine Ballestad$$b7
000303479 773__ $$0PERI:(DE-600)2036781-8$$a10.1158/1055-9965.EPI-25-0787$$n10$$p1852–1860$$tCancer epidemiology, biomarkers & prevention$$v34$$x1055-9965$$y2025
000303479 909CO $$ooai:inrepo02.dkfz.de:303479$$pVDB
000303479 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)74a6af8347ec5cbd4b77e562e10ca1f2$$aDeutsches Krebsforschungszentrum$$b1$$kDKFZ
000303479 9131_ $$0G:(DE-HGF)POF4-313$$1G:(DE-HGF)POF4-310$$2G:(DE-HGF)POF4-300$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bGesundheit$$lKrebsforschung$$vKrebsrisikofaktoren und Prävention$$x0
000303479 9141_ $$y2025
000303479 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2024-12-10
000303479 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2024-12-10
000303479 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2024-12-10
000303479 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews$$d2024-12-10
000303479 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2024-12-10
000303479 915__ $$0StatID:(DE-HGF)1030$$2StatID$$aDBCoverage$$bCurrent Contents - Life Sciences$$d2024-12-10
000303479 915__ $$0StatID:(DE-HGF)1190$$2StatID$$aDBCoverage$$bBiological Abstracts$$d2024-12-10
000303479 915__ $$0StatID:(DE-HGF)1110$$2StatID$$aDBCoverage$$bCurrent Contents - Clinical Medicine$$d2024-12-10
000303479 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2024-12-10
000303479 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2024-12-10
000303479 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bCANCER EPIDEM BIOMAR : 2022$$d2024-12-10
000303479 915__ $$0StatID:(DE-HGF)9900$$2StatID$$aIF < 5$$d2024-12-10
000303479 9201_ $$0I:(DE-He78)C180-20160331$$kC180$$lKrebsepidemiologie$$x0
000303479 980__ $$ajournal
000303479 980__ $$aVDB
000303479 980__ $$aI:(DE-He78)C180-20160331
000303479 980__ $$aUNRESTRICTED
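The abstract compares median AUC values in the validation cohorts (0.50 to 0.66) with those of the original models (0.62 to 0.76). As a minimal illustrative sketch, not the authors' pipeline, the snippet below computes AUC from its rank definition: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. Under this definition an AUC of 0.50 means chance-level discrimination. The function names `roc_auc` and `median_auc`, and the toy labels and scores, are hypothetical and are not data from JanusRNA, HUNT, or NOWAC.

```python
from statistics import median


def roc_auc(labels, scores):
    """AUC as P(score of a random case > score of a random control).

    Ties count as 0.5. This equals the Mann-Whitney U statistic divided by
    (number of cases * number of controls). labels: 1 = case, 0 = control.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # tie contributes half a win
    return wins / (len(cases) * len(controls))


def median_auc(labels, score_sets):
    """Median AUC over several models (hypothetically, repeated training
    runs) that were all scored on the same validation cohort."""
    return median(roc_auc(labels, scores) for scores in score_sets)


if __name__ == "__main__":
    labels = [1, 1, 0, 0]                        # toy data, not study data
    runs = [
        [0.9, 0.4, 0.6, 0.1],                    # partial separation
        [0.5, 0.5, 0.5, 0.5],                    # uninformative model
        [0.9, 0.8, 0.2, 0.1],                    # perfect separation
    ]
    print([roc_auc(labels, r) for r in runs])    # [0.75, 0.5, 1.0]
    print(median_auc(labels, runs))              # 0.75
```

The pairwise loop costs O(cases × controls) and is meant only to make the definition explicit; for real data, scikit-learn's `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity more efficiently.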