000301753 001__ 301753
000301753 005__ 20250604113441.0
000301753 0247_ $$2doi$$a10.1371/journal.pone.0322887
000301753 0247_ $$2pmid$$apmid:40455868
000301753 037__ $$aDKFZ-2025-01137
000301753 041__ $$aEnglish
000301753 082__ $$a610
000301753 1001_ $$00009-0002-0711-6789$$aStolte, Marieke$$b0
000301753 245__ $$aSimulation study to evaluate when Plasmode simulation is superior to parametric simulation in comparing classification methods on high-dimensional data.
000301753 260__ $$aSan Francisco, California, US$$bPLOS$$c2025
000301753 3367_ $$2DRIVER$$aarticle
000301753 3367_ $$2DataCite$$aOutput Types/Journal article
000301753 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1748954505_28541
000301753 3367_ $$2BibTeX$$aARTICLE
000301753 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000301753 3367_ $$00$$2EndNote$$aJournal Article
000301753 520__ $$aSimulation studies, especially neutral comparison studies, are crucial for evaluating and comparing statistical methods as they investigate whether methods work as intended and can guide an appropriate method choice. Typically, the term simulation refers to parametric simulation, i.e. computer experiments using pseudo-random numbers. For these, the full data-generating process (DGP) and outcome-generating model (OGM) are known within the simulation. However, specifying realistic DGPs can be difficult in practice, leading to oversimplified assumptions. The problem is more severe for higher-dimensional data as the number of parameters to specify typically increases with the number of variables in the data. Plasmode simulation, which combines resampling covariates from a real-life dataset from the DGP of interest with a specified OGM, is often claimed to solve this problem since no explicit specification of the DGP is necessary. However, this claim is not well supported by empirical results. Here, parametric and Plasmode simulations are compared in the context of a method comparison study for binary classification methods. We focus on studies conducted with some specific data type or application in mind whose true, unknown data-generating mechanism is mimicked. The performance of Plasmode and parametric comparison studies for estimating classifier performance is compared as well as their ability to reproduce the true method ranking. The influence of misspecifications of the DGP on the results of parametric simulation and the influence of misspecifications of the OGM on the results of parametric and Plasmode simulation are investigated. Moreover, different resampling strategies are compared for Plasmode comparison studies. The study finds that misspecifications of the DGP and OGM negatively influence the ability of the comparison studies to estimate the classification performances and method rankings. The best choice of the resampling strategy in Plasmode simulation depends on the concrete scenario.
000301753 536__ $$0G:(DE-HGF)POF4-313$$a313 - Krebsrisikofaktoren und Prävention (POF4-313)$$cPOF4-313$$fPOF IV$$x0
000301753 588__ $$aDataset connected to CrossRef, PubMed, Journals: inrepo02.dkfz.de
000301753 650_2 $$2MeSH$$aComputer Simulation
000301753 650_2 $$2MeSH$$aModels, Statistical
000301753 650_2 $$2MeSH$$aHumans
000301753 650_2 $$2MeSH$$aAlgorithms
000301753 7001_ $$0P:(DE-He78)0d054b6843ace36d1c965b6cb938d1c9$$aSchreck, Nicholas$$b1$$udkfz
000301753 7001_ $$aSlynko, Alla$$b2
000301753 7001_ $$0P:(DE-He78)609d3f1c1420bf59b2332eeab889cb74$$aSaadati, Maral$$b3$$udkfz
000301753 7001_ $$0P:(DE-He78)e15dfa1260625c69d6690a197392a994$$aBenner, Axel$$b4$$udkfz
000301753 7001_ $$aRahnenführer, Jörg$$b5
000301753 7001_ $$aBommert, Andrea$$b6
000301753 7001_ $$atopic group “High-dimensional data”$$b7$$eCollaboration Author
000301753 773__ $$0PERI:(DE-600)2267670-3$$a10.1371/journal.pone.0322887$$gVol. 20, no. 6, p. e0322887 -$$n6$$pe0322887 -$$tPLOS ONE$$v20$$x1932-6203$$y2025
000301753 909CO $$ooai:inrepo02.dkfz.de:301753$$pVDB
000301753 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)0d054b6843ace36d1c965b6cb938d1c9$$aDeutsches Krebsforschungszentrum$$b1$$kDKFZ
000301753 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)609d3f1c1420bf59b2332eeab889cb74$$aDeutsches Krebsforschungszentrum$$b3$$kDKFZ
000301753 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)e15dfa1260625c69d6690a197392a994$$aDeutsches Krebsforschungszentrum$$b4$$kDKFZ
000301753 9131_ $$0G:(DE-HGF)POF4-313$$1G:(DE-HGF)POF4-310$$2G:(DE-HGF)POF4-300$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bGesundheit$$lKrebsforschung$$vKrebsrisikofaktoren und Prävention$$x0
000301753 9141_ $$y2025
000301753 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bPLOS ONE : 2022$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal$$d2024-02-08T09:37:46Z
000301753 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ$$d2024-02-08T09:37:46Z
000301753 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bDOAJ : Anonymous peer review$$d2024-02-08T09:37:46Z
000301753 915__ $$0LIC:(DE-HGF)CCBYNV$$2V:(DE-HGF)$$aCreative Commons Attribution CC BY (No Version)$$bDOAJ$$d2024-02-08T09:37:46Z
000301753 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)1040$$2StatID$$aDBCoverage$$bZoological Record$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)1190$$2StatID$$aDBCoverage$$bBiological Abstracts$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)9900$$2StatID$$aIF < 5$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2024-12-16
000301753 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2024-12-16
000301753 9201_ $$0I:(DE-He78)C060-20160331$$kC060$$lC060 Biostatistik$$x0
000301753 980__ $$ajournal
000301753 980__ $$aVDB
000301753 980__ $$aI:(DE-He78)C060-20160331
000301753 980__ $$aUNRESTRICTED