Journal Article DKFZ-2025-03027

Sequential sample size calculations and learning curves safeguard the robust development of a clinical prediction model for individuals.


2025
Elsevier Science Amsterdam [u.a.]

Journal of clinical epidemiology nn, nn () [10.1016/j.jclinepi.2025.112117]

Abstract: When recruiting participants to a new study developing a clinical prediction model (CPM), sample size calculations are typically conducted before data collection based on sensible assumptions. This leads to a fixed sample size, but if the assumptions are inaccurate, the actual sample size required to develop a reliable model may be higher or even lower. To safeguard against this, adaptive sample size approaches have been proposed, based on sequential evaluation of (changes in) a model's predictive performance.

The aim is to illustrate and extend sequential sample size calculations for CPM development by (i) proposing stopping rules for prospective data collection based on minimising uncertainty (instability) and misclassification of individual-level predictions, and (ii) showcasing how the sequential approach safeguards against inaccurate fixed sample size calculations.

The sequential approach repeats the pre-defined model development strategy every time a chosen number (e.g., 100) of participants are recruited and adequately followed up. At each stage, CPM performance is evaluated using bootstrapping, leading to prediction and classification stability statistics and plots, alongside optimism-adjusted measures of calibration and discrimination. Learning curves display the trend of results against sample size, and recruitment is stopped when a chosen stopping rule is met.

The approach is illustrated for the development of (penalised) logistic regression CPMs for acute kidney injury. Prior to recruitment, based on perceived sensible assumptions, the fixed sample size calculation suggests recruiting 342 patients to minimise overfitting; however, during data collection the sequential approach reveals that a much larger sample size of 1100 is required to minimise overfitting (targeting a bootstrap-corrected calibration slope ≥0.9). If the stopping rule criteria also target small uncertainty and misclassification probability of individual predictions, the sequential approach suggests an even larger sample size of about n=1800.

For CPM development studies involving prospective data collection, a sequential sample size approach allows users to dynamically monitor individual-level prediction and classification instability. This helps determine when enough participants have been recruited and safeguards against using inaccurate assumptions in a sample size calculation prior to recruitment. Engagement with patients and other stakeholders is crucial to identify sensible context-specific stopping rules for robust individual predictions.
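
The sequential procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses simulated data, an essentially unpenalised logistic regression fitted by Newton-Raphson (with a tiny ridge term only for numerical stability), and a single stopping rule (bootstrap-corrected calibration slope ≥0.9). The stopping rules on individual-level prediction instability and misclassification are not covered. All function names, the simulated coefficients, and the number of bootstrap replicates are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, ridge=1e-6, n_iter=25):
    """Newton-Raphson logistic regression; X must contain an intercept column.
    The tiny ridge term is only there to keep the Hessian invertible."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p) - ridge * beta
        hess = (X * w[:, None]).T @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta


def calibration_slope(lp, y):
    """Slope from a logistic regression of the outcome on the linear predictor."""
    Z = np.column_stack([np.ones_like(lp), lp])
    return fit_logistic(Z, y)[1]


def corrected_slope(X, y, n_boot, rng):
    """Bootstrap optimism-corrected calibration slope."""
    beta = fit_logistic(X, y)
    apparent = calibration_slope(X @ beta, y)
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))   # resample with replacement
        b = fit_logistic(X[idx], y[idx])        # repeat the development strategy
        boot_slope = calibration_slope(X[idx] @ b, y[idx])
        test_slope = calibration_slope(X @ b, y)
        optimism.append(boot_slope - test_slope)
    return apparent - np.mean(optimism)


def sequential_sample_size(X, y, batch=100, target=0.9, n_boot=50, seed=0):
    """Re-evaluate after every `batch` participants; stop at the first stage
    where the corrected slope reaches `target`. Returns (n_stop, learning_curve),
    with n_stop=None if the rule is never met in the available data."""
    rng = np.random.default_rng(seed)
    curve = []
    for n in range(batch, len(y) + 1, batch):
        slope = corrected_slope(X[:n], y[:n], n_boot, rng)
        curve.append((n, slope))
        if slope >= target:
            return n, curve
    return None, curve


# Simulated recruitment stream (hypothetical predictors and coefficients).
sim = np.random.default_rng(42)
X_all = np.column_stack([np.ones(2000), sim.normal(size=(2000, 5))])
true_beta = np.array([-1.0, 0.8, -0.6, 0.5, 0.4, 0.3])
prob = 1.0 / (1.0 + np.exp(-(X_all @ true_beta)))
y_all = (sim.random(2000) < prob).astype(float)

stop_n, curve = sequential_sample_size(X_all, y_all)
for n, slope in curve:
    print(f"n={n:5d}  corrected calibration slope={slope:.3f}")
print("recruitment stopped at n =", stop_n)
```

The printed pairs form the learning curve for the calibration slope. In a real study the stopping rule would be pre-specified and would likely combine several criteria, as the abstract describes, rather than rely on this one measure.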

Keyword(s): Clinical Prediction Models ; Instability ; Learning Curves ; Model Development ; Sample Size ; Sequential ; Uncertainty

Note: epub

Contributing Institute(s):
  1. E130 Intelligente Medizinische Systeme (E130)
Research Program(s):
  1. 315 - Bildgebung und Radioonkologie (POF4-315) (POF4-315)

Appears in the scientific report 2025
Database coverage:
Medline ; Clarivate Analytics Master Journal List ; Current Contents - Clinical Medicine ; Current Contents - Life Sciences ; Ebsco Academic Search ; Essential Science Indicators ; IF >= 5 ; JCR ; Nationallizenz ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Public records
Publications database

 Record created 2025-12-22, last modified 2025-12-23


