TY - JOUR
AU - Legha, Amardeep
AU - Ensor, Joie
AU - Whittle, Rebecca
AU - Archer, Lucinda
AU - Van Calster, Ben
AU - Christodoulou, Evangelia
AU - Snell, Kym I E
AU - Sadatsafavi, Mohsen
AU - Collins, Gary S
AU - Riley, Richard D
TI - Sequential sample size calculations and learning curves safeguard the robust development of a clinical prediction model for individuals.
JO - Journal of clinical epidemiology
VL - nn
SN - 0895-4356
CY - Amsterdam [u.a.]
PB - Elsevier Science
M1 - DKFZ-2025-03027
SP - nn
PY - 2025
N1 - epub
AB - When recruiting participants to a new study developing a clinical prediction model (CPM), sample size calculations are typically conducted before data collection based on sensible assumptions. This leads to a fixed sample size, but if the assumptions are inaccurate, the actual sample size required to develop a reliable model may be higher or even lower. To safeguard against this, adaptive sample size approaches have been proposed, based on sequential evaluation of (changes in) a model's predictive performance. To illustrate and extend sequential sample size calculations for CPM development by (i) proposing stopping rules for prospective data collection based on minimising uncertainty (instability) and misclassification of individual-level predictions, and (ii) showcasing how it safeguards against inaccurate fixed sample size calculations. Using the sequential approach repeats the pre-defined model development strategy every time a chosen number (e.g., 100) of participants are recruited and adequately followed up. At each stage, CPM performance is evaluated using bootstrapping, leading to prediction and classification stability statistics and plots, alongside optimism-adjusted measures of calibration and discrimination. Learning curves display the trend of results against sample size and recruitment is stopped when a chosen stopping rule is met. Our approach is illustrated for model development of acute kidney injury using (penalised) logistic regression CPMs. Prior to recruitment based on perceived sensible assumptions, the fixed sample size calculation suggests recruiting 342 patients to minimise overfitting; however, during data collection the sequential approach reveals that a much larger sample size of 1100 is required to minimise overfitting (targeting a bootstrap-corrected calibration slope ≥0.9). If the stopping rule criteria also target small uncertainty and misclassification probability of individual predictions, the sequential approach suggests an even larger sample size of about n=1800. For CPM development studies involving prospective data collection, a sequential sample size approach allows users to dynamically monitor individual-level prediction and classification instability. This helps determine when enough participants have been recruited and safeguards against using inaccurate assumptions in a sample size calculation prior to data recruitment. Engagement with patients and other stakeholders is crucial to identify sensible context-specific stopping rules for robust individual predictions.
KW - Clinical Prediction Models (Other)
KW - Instability (Other)
KW - Learning Curves (Other)
KW - Model Development (Other)
KW - Sample Size (Other)
KW - Sequential (Other)
KW - Uncertainty (Other)
LB - PUB:(DE-HGF)16
C6 - pmid:41423140
DO - 10.1016/j.jclinepi.2025.112117
UR - https://inrepo02.dkfz.de/record/307383
ER -