Journal Article DKFZ-2025-01676

Recommendations for validating omics prediction models: Insights from a lung cancer RNA biomarker study.


2025
AACR Philadelphia, Pa.

Cancer epidemiology, biomarkers & prevention nn, nn () [10.1158/1055-9965.EPI-25-0787]

Please use a persistent id in citations: doi:10.1158/1055-9965.EPI-25-0787

Abstract: External validation of predictive models in medical research is crucial to ensure their generalizability and applicability across diverse populations. However, validation often reveals discrepancies in model performance due to cohort differences, sample collection and storage, overfitting, and inconsistencies in data handling. This study investigates the challenges encountered during external validation of predictive models for early lung cancer detection using small RNA biomarkers, tying these challenges to specific validation outcomes and deriving recommendations. Predictive models based on the XGBoost algorithm, developed from serum samples in the JanusRNA cohort, were externally validated in two independent Norwegian cohorts: HUNT and NOWAC. These cohorts differed in sample types, RNA abundance, library preparation protocols, and lung cancer histological classification. Strategies to harmonize data processing and address these discrepancies were employed to ensure a robust validation process. Validation revealed significant challenges due to cohort heterogeneity. Median AUC values ranged from 0.50 to 0.66 in validation cohorts, compared to 0.62-0.76 in the original models. Models performed worse in the female-only NOWAC cohort, where plasma was used, highlighting the impact of sample type and cohort characteristics on predictive accuracy. Based on the challenges encountered during validation, we propose seven recommendations to guide robust external validation of omics-based predictive models including harmonizing data processing across cohorts, re-evaluating overfitting, and critically assessing model performance for clinical applications. By highlighting practical issues in model validation and providing recommendations, this study supports more reliable and clinically applicable biomarker-based prediction models, ultimately aiding cancer screening and prevention efforts.

Note: epub

Contributing Institute(s):
  1. Krebsepidemiologie (C180)
Research Program(s):
  1. 313 - Krebsrisikofaktoren und Prävention (POF4-313)

Appears in the scientific report 2025
Database coverage:
Medline ; BIOSIS Previews ; Biological Abstracts ; Clarivate Analytics Master Journal List ; Current Contents - Clinical Medicine ; Current Contents - Life Sciences ; Essential Science Indicators ; IF < 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Public records
Publications database

 Record created 2025-08-13, last modified 2025-08-14


