Sequential sample size calculations and learning curves safeguard the robust development of a clinical prediction model for individuals.

Legha, Amardeep; Riley, Richard D; Whittle, Rebecca; Archer, Lucinda; Van Calster, Ben; Snell, Kym I E; Christodoulou, Evangelia; Collins, Gary S; Sadatsafavi, Mohsen; Ensor, Joie

doi:10.1016/j.jclinepi.2025.112117

Marc 21

001			307383
005			20251223120207.0
024	7	_	\|a 10.1016/j.jclinepi.2025.112117 \|2 doi
024	7	_	\|a pmid:41423140 \|2 pmid
024	7	_	\|a 0895-4356 \|2 ISSN
024	7	_	\|a 1878-5921 \|2 ISSN
037	_	_	\|a DKFZ-2025-03027
041	_	_	\|a English
082	_	_	\|a 610
100	1	_	\|a Legha, Amardeep \|b 0
245	_	_	\|a Sequential sample size calculations and learning curves safeguard the robust development of a clinical prediction model for individuals.
260	_	_	\|a Amsterdam [u.a.] \|c 2025 \|b Elsevier Science
336	7	_	\|a article \|2 DRIVER
336	7	_	\|a Output Types/Journal article \|2 DataCite
336	7	_	\|a Journal Article \|b journal \|m journal \|0 PUB:(DE-HGF)16 \|s 1766414821_3634864 \|2 PUB:(DE-HGF)
336	7	_	\|a ARTICLE \|2 BibTeX
336	7	_	\|a JOURNAL_ARTICLE \|2 ORCID
336	7	_	\|a Journal Article \|0 0 \|2 EndNote
500	_	_	\|a epub
520	_	_	\|a When recruiting participants to a new study developing a clinical prediction model (CPM), sample size calculations are typically conducted before data collection based on sensible assumptions. This leads to a fixed sample size, but if the assumptions are inaccurate, the actual sample size required to develop a reliable model may be higher or even lower. To safeguard against this, adaptive sample size approaches have been proposed, based on sequential evaluation of (changes in) a model's predictive performance.To illustrate and extend sequential sample size calculations for CPM development by (i) proposing stopping rules for prospective data collection based on minimising uncertainty (instability) and misclassification of individual-level predictions, and (ii) showcasing how it safeguards against inaccurate fixed sample size calculations.Using the sequential approach repeats the pre-defined model development strategy every time a chosen number (e.g., 100) of participants are recruited and adequately followed up. At each stage, CPM performance is evaluated using bootstrapping, leading to prediction and classification stability statistics and plots, alongside optimism-adjusted measures of calibration and discrimination. Learning curves display the trend of results against sample size and recruitment is stopped when a chosen stopping rule is met.Our approach is illustrated for model development of acute kidney injury using (penalised) logistic regression CPMs. Prior to recruitment based on perceived sensible assumptions, the fixed sample size calculation suggests recruiting 342 patients to minimise overfitting; however, during data collection the sequential approach reveals that a much larger sample size of 1100 is required to minimise overfitting (targeting a bootstrap-corrected calibration slope ≥0.9). 
If the stopping rule criteria also target small uncertainty and misclassification probability of individual predictions, the sequential approach suggests an even larger sample size of about n=1800.For CPM development studies involving prospective data collection, a sequential sample size approach allows users to dynamically monitor individual-level prediction and classification instability. This helps determine when enough participants have been recruited and safeguards against using inaccurate assumptions in a sample size calculation prior to data recruitment. Engagement with patients and other stakeholders is crucial to identify sensible context-specific stopping rules for robust individual predictions.
536	_	_	\|a 315 - Bildgebung und Radioonkologie (POF4-315) \|0 G:(DE-HGF)POF4-315 \|c POF4-315 \|f POF IV \|x 0
588	_	_	\|a Dataset connected to CrossRef, PubMed, , Journals: inrepo02.dkfz.de
650	_	7	\|a Clinical Prediction Models \|2 Other
650	_	7	\|a Instability \|2 Other
650	_	7	\|a Learning Curves \|2 Other
650	_	7	\|a Model Development \|2 Other
650	_	7	\|a Sample Size \|2 Other
650	_	7	\|a Sequential \|2 Other
650	_	7	\|a Uncertainty \|2 Other
700	1	_	\|a Ensor, Joie \|b 1
700	1	_	\|a Whittle, Rebecca \|b 2
700	1	_	\|a Archer, Lucinda \|b 3
700	1	_	\|a Van Calster, Ben \|b 4
700	1	_	\|a Christodoulou, Evangelia \|0 P:(DE-He78)8da2eca0bc6341c8681c317fe2b8e27b \|b 5 \|u dkfz
700	1	_	\|a Snell, Kym I E \|b 6
700	1	_	\|a Sadatsafavi, Mohsen \|b 7
700	1	_	\|a Collins, Gary S \|b 8
700	1	_	\|a Riley, Richard D \|b 9
773	_	_	\|a 10.1016/j.jclinepi.2025.112117 \|g p. 112117 - \|0 PERI:(DE-600)1500490-9 \|p nn \|t Journal of clinical epidemiology \|v nn \|y 2025 \|x 0895-4356
909	C	O	\|o oai:inrepo02.dkfz.de:307383 \|p VDB
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 5 \|6 P:(DE-He78)8da2eca0bc6341c8681c317fe2b8e27b
913	1	_	\|a DE-HGF \|b Gesundheit \|l Krebsforschung \|1 G:(DE-HGF)POF4-310 \|0 G:(DE-HGF)POF4-315 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-300 \|4 G:(DE-HGF)POF \|v Bildgebung und Radioonkologie \|x 0
914	1	_	\|y 2025
915	_	_	\|a Nationallizenz \|0 StatID:(DE-HGF)0420 \|2 StatID \|d 2024-12-11 \|w ger
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0200 \|2 StatID \|b SCOPUS \|d 2024-12-11
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0300 \|2 StatID \|b Medline \|d 2024-12-11
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0199 \|2 StatID \|b Clarivate Analytics Master Journal List \|d 2024-12-11
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0160 \|2 StatID \|b Essential Science Indicators \|d 2024-12-11
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1030 \|2 StatID \|b Current Contents - Life Sciences \|d 2024-12-11
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1110 \|2 StatID \|b Current Contents - Clinical Medicine \|d 2024-12-11
915	_	_	\|a WoS \|0 StatID:(DE-HGF)0113 \|2 StatID \|b Science Citation Index Expanded \|d 2024-12-11
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0150 \|2 StatID \|b Web of Science Core Collection \|d 2024-12-11
915	_	_	\|a JCR \|0 StatID:(DE-HGF)0100 \|2 StatID \|b J CLIN EPIDEMIOL : 2022 \|d 2024-12-11
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0600 \|2 StatID \|b Ebsco Academic Search \|d 2024-12-11
915	_	_	\|a Peer Review \|0 StatID:(DE-HGF)0030 \|2 StatID \|b ASC \|d 2024-12-11
915	_	_	\|a IF >= 5 \|0 StatID:(DE-HGF)9905 \|2 StatID \|b J CLIN EPIDEMIOL : 2022 \|d 2024-12-11
920	1	_	\|0 I:(DE-He78)E130-20160331 \|k E130 \|l E130 Intelligente Medizinische Systeme \|x 0
980	_	_	\|a journal
980	_	_	\|a VDB
980	_	_	\|a I:(DE-He78)E130-20160331
980	_	_	\|a UNRESTRICTED
