Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data.

Sill, Martin; Benner, Axel; Saadati, Maral

doi:10.1093/bioinformatics/btv197

Marc 21

001			127522
005			20240228140933.0
024	7	_	\|a 10.1093/bioinformatics/btv197 \|2 doi
024	7	_	\|a pmid:25861969 \|2 pmid
024	7	_	\|a pmc:PMC4528629 \|2 pmc
024	7	_	\|a 0266-7061 \|2 ISSN
024	7	_	\|a 1367-4803 \|2 ISSN
024	7	_	\|a 1367-4811 \|2 ISSN
024	7	_	\|a 1460-2059 \|2 ISSN
024	7	_	\|a altmetric:3898206 \|2 altmetric
037	_	_	\|a DKFZ-2017-03545
041	_	_	\|a eng
082	_	_	\|a 004
100	1	_	\|a Sill, Martin \|0 P:(DE-He78)45440b44791309bd4b7dbb4f73333f9b \|b 0 \|e First author \|u dkfz
245	_	_	\|a Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data.
260	_	_	\|a Oxford \|c 2015 \|b Oxford Univ. Press
336	7	_	\|a article \|2 DRIVER
336	7	_	\|a Output Types/Journal article \|2 DataCite
336	7	_	\|a Journal Article \|b journal \|m journal \|0 PUB:(DE-HGF)16 \|s 1521712773_2109 \|2 PUB:(DE-HGF)
336	7	_	\|a ARTICLE \|2 BibTeX
336	7	_	\|a JOURNAL_ARTICLE \|2 ORCID
336	7	_	\|a Journal Article \|0 0 \|2 EndNote
520	_	_	\|a Principal component analysis (PCA) is a basic tool often used in bioinformatics for visualization and dimension reduction. However, it is known that PCA may not consistently estimate the true direction of maximal variability in high-dimensional, low sample size settings, which are typical for molecular data. Assuming that the underlying signal is sparse, i.e. that only a fraction of features contribute to a principal component (PC), this estimation consistency can be retained. Most existing sparse PCA methods use L1-penalization, i.e. the lasso, to perform feature selection. But the lasso is known to lack variable selection consistency in high dimensions, and therefore a subsequent interpretation of the selected features can give misleading results. We present S4VDPCA, a sparse PCA method that incorporates a subsampling approach, namely stability selection. S4VDPCA can consistently select the truly relevant variables contributing to a sparse PC while also consistently estimating the direction of maximal variability. The performance of S4VDPCA is assessed in a simulation study and compared to other PCA approaches, as well as to a hypothetical oracle PCA that knows the truly relevant features in advance and thus finds optimal, unbiased sparse PCs. S4VDPCA is computationally efficient and performs best in simulations regarding parameter estimation consistency and feature selection consistency. Furthermore, S4VDPCA is applied to a publicly available gene expression data set of medulloblastoma brain tumors. Features contributing to the first two estimated sparse PCs represent genes significantly over-represented in pathways typically deregulated between molecular subgroups of medulloblastoma. Software is available at https://github.com/mwsill/s4vdpca. Contact: m.sill@dkfz.de. Supplementary data are available at Bioinformatics online.
536	_	_	\|a 313 - Cancer risk factors and prevention (POF3-313) \|0 G:(DE-HGF)POF3-313 \|c POF3-313 \|f POF III \|x 0
588	_	_	\|a Dataset connected to CrossRef, PubMed,
700	1	_	\|a Saadati, Maral \|0 P:(DE-He78)609d3f1c1420bf59b2332eeab889cb74 \|b 1 \|u dkfz
700	1	_	\|a Benner, Axel \|0 P:(DE-He78)e15dfa1260625c69d6690a197392a994 \|b 2 \|e Last author \|u dkfz
773	_	_	\|a 10.1093/bioinformatics/btv197 \|g Vol. 31, no. 16, p. 2683 - 2690 \|0 PERI:(DE-600)1468345-3 \|n 16 \|p 2683 - 2690 \|t Bioinformatics \|v 31 \|y 2015 \|x 1460-2059
909	C	O	\|o oai:inrepo02.dkfz.de:127522 \|p VDB
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 0 \|6 P:(DE-He78)45440b44791309bd4b7dbb4f73333f9b
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 1 \|6 P:(DE-He78)609d3f1c1420bf59b2332eeab889cb74
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 2 \|6 P:(DE-He78)e15dfa1260625c69d6690a197392a994
913	1	_	\|a DE-HGF \|l Krebsforschung \|1 G:(DE-HGF)POF3-310 \|0 G:(DE-HGF)POF3-313 \|2 G:(DE-HGF)POF3-300 \|v Cancer risk factors and prevention \|x 0 \|4 G:(DE-HGF)POF \|3 G:(DE-HGF)POF3 \|b Gesundheit
914	1	_	\|y 2015
915	_	_	\|a Allianz-Lizenz / DFG \|0 StatID:(DE-HGF)0400 \|2 StatID
915	_	_	\|a Nationallizenz \|0 StatID:(DE-HGF)0420 \|2 StatID
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0200 \|2 StatID \|b SCOPUS
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0300 \|2 StatID \|b Medline
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0310 \|2 StatID \|b NCBI Molecular Biology Database
915	_	_	\|a JCR \|0 StatID:(DE-HGF)0100 \|2 StatID \|b BIOINFORMATICS : 2015
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0600 \|2 StatID \|b Ebsco Academic Search
915	_	_	\|a Peer Review \|0 StatID:(DE-HGF)0030 \|2 StatID \|b ASC
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0199 \|2 StatID \|b Thomson Reuters Master Journal List
915	_	_	\|a WoS \|0 StatID:(DE-HGF)0110 \|2 StatID \|b Science Citation Index
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0150 \|2 StatID \|b Web of Science Core Collection
915	_	_	\|a WoS \|0 StatID:(DE-HGF)0111 \|2 StatID \|b Science Citation Index Expanded
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1030 \|2 StatID \|b Current Contents - Life Sciences
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1050 \|2 StatID \|b BIOSIS Previews
915	_	_	\|a IF >= 5 \|0 StatID:(DE-HGF)9905 \|2 StatID \|b BIOINFORMATICS : 2015
920	1	_	\|0 I:(DE-He78)C060-20160331 \|k C060 \|l Biostatistik \|x 0
980	_	_	\|a journal
980	_	_	\|a VDB
980	_	_	\|a I:(DE-He78)C060-20160331
980	_	_	\|a UNRESTRICTED
