Journal Article DKFZ-2025-01137

Simulation study to evaluate when Plasmode simulation is superior to parametric simulation in comparing classification methods on high-dimensional data.


2025
PLOS San Francisco, California, US

PLOS ONE 20(6), e0322887 [10.1371/journal.pone.0322887]


Abstract: Simulation studies, especially neutral comparison studies, are crucial for evaluating and comparing statistical methods, as they investigate whether methods work as intended and can guide an appropriate choice of method. Typically, the term simulation refers to parametric simulation, i.e. computer experiments using pseudo-random numbers. For these, the full data-generating process (DGP) and outcome-generating model (OGM) are known within the simulation. However, specifying a realistic DGP can be difficult in practice, which leads to oversimplified assumptions. The problem is more severe for higher-dimensional data, as the number of parameters to specify typically increases with the number of variables in the data. Plasmode simulation, which combines resampling covariates from a real-life dataset from the DGP of interest with a specified OGM, is often claimed to solve this problem, since no explicit specification of the DGP is necessary. However, this claim is not well supported by empirical results. Here, parametric and Plasmode simulations are compared in the context of a method comparison study for binary classification methods. We focus on studies conducted with a specific data type or application in mind, whose true, unknown data-generating mechanism is mimicked. Plasmode and parametric comparison studies are compared with respect to how well they estimate classifier performance and how well they reproduce the true method ranking. The influence of misspecifications of the DGP on the results of parametric simulation, and of misspecifications of the OGM on the results of both parametric and Plasmode simulation, is investigated. Moreover, different resampling strategies are compared for Plasmode comparison studies. The study finds that misspecifications of the DGP and OGM impair the ability of the comparison studies to estimate classification performance and method rankings. The best choice of resampling strategy in Plasmode simulation depends on the concrete scenario.
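
The distinction between the two simulation types described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the standard-normal DGP, and the logistic OGM are assumptions chosen for illustration only, and `X_real` stands in for a real covariate matrix. Both approaches share the same specified OGM; they differ only in where the covariates come from.

```python
import numpy as np

def draw_outcome(X, beta, rng):
    # Assumed OGM: binary outcome from a logistic model with coefficients beta.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return rng.binomial(1, p)

def parametric_simulation(n, beta, rng):
    # Parametric: covariates come from an explicitly specified DGP.
    # Independent standard normals are an illustrative (oversimplified) choice.
    X = rng.standard_normal((n, beta.shape[0]))
    return X, draw_outcome(X, beta, rng)

def plasmode_simulation(X_real, n, beta, rng, replace=True):
    # Plasmode: covariates are resampled rows of a real dataset, so no DGP
    # is specified. replace=True gives bootstrap-style resampling,
    # replace=False gives subsampling without replacement (requires n <= rows).
    idx = rng.choice(X_real.shape[0], size=n, replace=replace)
    X = X_real[idx]
    return X, draw_outcome(X, beta, rng)

# Usage with a hypothetical stand-in for real data.
rng = np.random.default_rng(0)
X_real = rng.standard_normal((40, 5))
beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
X_plas, y_plas = plasmode_simulation(X_real, 30, beta, rng)
X_par, y_par = parametric_simulation(30, beta, rng)
```

The `replace` flag is one simple way to vary the resampling strategy; the strategies actually compared in the article are not specified in this record.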

Keyword(s): Computer Simulation (MeSH) ; Models, Statistical (MeSH) ; Humans (MeSH) ; Algorithms (MeSH)


Contributing Institute(s):
  1. C060 Biostatistik (C060)
Research Program(s):
  1. 313 - Krebsrisikofaktoren und Prävention (POF4-313) (POF4-313)

Appears in the scientific report 2025
Database coverage:
Medline ; Creative Commons Attribution CC BY (No Version) ; DOAJ ; Article Processing Charges ; BIOSIS Previews ; Biological Abstracts ; Clarivate Analytics Master Journal List ; DOAJ Seal ; Ebsco Academic Search ; Essential Science Indicators ; Fees ; IF < 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection ; Zoological Record

The record appears in these collections:
Document types > Articles > Journal Article
Public records
Publications database

 Record created 2025-06-03, last modified 2025-06-04


