Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data.

Sill, Martin; Saadati, Maral; Benner, Axel
doi:10.1093/bioinformatics/btv197
000127522 001__ 127522
000127522 005__ 20240228140933.0
000127522 0247_ $$2doi$$a10.1093/bioinformatics/btv197
000127522 0247_ $$2pmid$$apmid:25861969
000127522 0247_ $$2pmc$$apmc:PMC4528629
000127522 0247_ $$2ISSN$$a0266-7061
000127522 0247_ $$2ISSN$$a1367-4803
000127522 0247_ $$2ISSN$$a1367-4811
000127522 0247_ $$2ISSN$$a1460-2059
000127522 0247_ $$2altmetric$$aaltmetric:3898206
000127522 037__ $$aDKFZ-2017-03545
000127522 041__ $$aeng
000127522 082__ $$a004
000127522 1001_ $$0P:(DE-He78)45440b44791309bd4b7dbb4f73333f9b$$aSill, Martin$$b0$$eFirst author$$udkfz
000127522 245__ $$aApplying stability selection to consistently estimate sparse principal components in high-dimensional molecular data.
000127522 260__ $$aOxford$$bOxford Univ. Press$$c2015
000127522 3367_ $$2DRIVER$$aarticle
000127522 3367_ $$2DataCite$$aOutput Types/Journal article
000127522 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1521712773_2109
000127522 3367_ $$2BibTeX$$aARTICLE
000127522 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000127522 3367_ $$00$$2EndNote$$aJournal Article
000127522 520__ $$aPrincipal component analysis (PCA) is a basic tool often used in bioinformatics for visualization and dimension reduction. However, PCA is known not to estimate the true direction of maximal variability consistently in high-dimensional, low sample size settings, which are typical for molecular data. Assuming that the underlying signal is sparse, i.e. that only a fraction of features contribute to a principal component (PC), this estimation consistency can be retained. Most existing sparse PCA methods use L1-penalization, i.e. the lasso, to perform feature selection. However, the lasso is known to lack variable selection consistency in high dimensions, so a subsequent interpretation of the selected features can give misleading results. We present S4VDPCA, a sparse PCA method that incorporates a subsampling approach, namely stability selection. S4VDPCA can consistently select the truly relevant variables contributing to a sparse PC while also consistently estimating the direction of maximal variability. The performance of S4VDPCA is assessed in a simulation study and compared to other PCA approaches, as well as to a hypothetical oracle PCA that knows the truly relevant features in advance and thus finds optimal, unbiased sparse PCs. S4VDPCA is computationally efficient and performs best in simulations regarding parameter estimation consistency and feature selection consistency. Furthermore, S4VDPCA is applied to a publicly available gene expression data set of medulloblastoma brain tumors. Features contributing to the first two estimated sparse PCs represent genes significantly over-represented in pathways typically deregulated between molecular subgroups of medulloblastoma. Software is available at https://github.com/mwsill/s4vdpca. Contact: m.sill@dkfz.de. Supplementary data are available at Bioinformatics online.
000127522 536__ $$0G:(DE-HGF)POF3-313$$a313 - Cancer risk factors and prevention (POF3-313)$$cPOF3-313$$fPOF III$$x0
000127522 588__ $$aDataset connected to CrossRef, PubMed,
000127522 7001_ $$0P:(DE-He78)609d3f1c1420bf59b2332eeab889cb74$$aSaadati, Maral$$b1$$udkfz
000127522 7001_ $$0P:(DE-He78)e15dfa1260625c69d6690a197392a994$$aBenner, Axel$$b2$$eLast author$$udkfz
000127522 773__ $$0PERI:(DE-600)1468345-3$$a10.1093/bioinformatics/btv197$$gVol. 31, no. 16, p. 2683 - 2690$$n16$$p2683 - 2690$$tBioinformatics$$v31$$x1460-2059$$y2015
000127522 909CO $$ooai:inrepo02.dkfz.de:127522$$pVDB
000127522 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)45440b44791309bd4b7dbb4f73333f9b$$aDeutsches Krebsforschungszentrum$$b0$$kDKFZ
000127522 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)609d3f1c1420bf59b2332eeab889cb74$$aDeutsches Krebsforschungszentrum$$b1$$kDKFZ
000127522 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)e15dfa1260625c69d6690a197392a994$$aDeutsches Krebsforschungszentrum$$b2$$kDKFZ
000127522 9131_ $$0G:(DE-HGF)POF3-313$$1G:(DE-HGF)POF3-310$$2G:(DE-HGF)POF3-300$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bGesundheit$$lKrebsforschung$$vCancer risk factors and prevention$$x0
000127522 9141_ $$y2015
000127522 915__ $$0StatID:(DE-HGF)0400$$2StatID$$aAllianz-Lizenz / DFG
000127522 915__ $$0StatID:(DE-HGF)0420$$2StatID$$aNationallizenz
000127522 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS
000127522 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline
000127522 915__ $$0StatID:(DE-HGF)0310$$2StatID$$aDBCoverage$$bNCBI Molecular Biology Database
000127522 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bBIOINFORMATICS : 2015
000127522 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search
000127522 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC
000127522 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bThomson Reuters Master Journal List
000127522 915__ $$0StatID:(DE-HGF)0110$$2StatID$$aWoS$$bScience Citation Index
000127522 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection
000127522 915__ $$0StatID:(DE-HGF)0111$$2StatID$$aWoS$$bScience Citation Index Expanded
000127522 915__ $$0StatID:(DE-HGF)1030$$2StatID$$aDBCoverage$$bCurrent Contents - Life Sciences
000127522 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews
000127522 915__ $$0StatID:(DE-HGF)9905$$2StatID$$aIF >= 5$$bBIOINFORMATICS : 2015
000127522 9201_ $$0I:(DE-He78)C060-20160331$$kC060$$lBiostatistik$$x0
000127522 980__ $$ajournal
000127522 980__ $$aVDB
000127522 980__ $$aI:(DE-He78)C060-20160331
000127522 980__ $$aUNRESTRICTED