Clustering of samples and variables with mixed-type data.

Hummel, Manuela; Edelmann, Dominic; Kopp-Schneider, Annette
doi:10.1371/journal.pone.0188274
000131053 001__ 131053
000131053 005__ 20240228145601.0
000131053 0247_ $$2doi$$a10.1371/journal.pone.0188274
000131053 0247_ $$2pmid$$apmid:29182671
000131053 037__ $$aDKFZ-2017-06120
000131053 041__ $$aeng
000131053 082__ $$a500
000131053 1001_ $$00000-0002-7870-2227$$aHummel, Manuela$$b0$$eFirst author
000131053 245__ $$aClustering of samples and variables with mixed-type data.
000131053 260__ $$aLawrence, Kan.$$bPLoS$$c2017
000131053 3367_ $$2DRIVER$$aarticle
000131053 3367_ $$2DataCite$$aOutput Types/Journal article
000131053 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1525853598_14346
000131053 3367_ $$2BibTeX$$aARTICLE
000131053 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000131053 3367_ $$00$$2EndNote$$aJournal Article
000131053 520__ $$aAnalysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while adding further information, like clinical factors, on top. However, a more integrative approach is desirable, where all available data are analyzed jointly, and where also in the visualization different data sources are combined in a more natural way. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as does clustering of samples. We extend the variables clustering methodology by two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies we evaluate and compare different clustering strategies. Applying specific methods for mixed-type data proves to be comparable and in many cases beneficial as compared to standard approaches applied to corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for the purpose of visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. 
We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix.
000131053 536__ $$0G:(DE-HGF)POF3-313$$a313 - Cancer risk factors and prevention (POF3-313)$$cPOF3-313$$fPOF III$$x0
000131053 588__ $$aDataset connected to CrossRef, PubMed,
000131053 7001_ $$0P:(DE-He78)92820b4867c955a04f642707ecf35b40$$aEdelmann, Dominic$$b1$$udkfz
000131053 7001_ $$0P:(DE-He78)bb6a7a70f976eb8df1769944bf913596$$aKopp-Schneider, Annette$$b2$$eLast author$$udkfz
000131053 773__ $$0PERI:(DE-600)2267670-3$$a10.1371/journal.pone.0188274$$gVol. 12, no. 11, p. e0188274 -$$n11$$pe0188274 -$$tPLoS one$$v12$$x1932-6203$$y2017
000131053 909CO $$ooai:inrepo02.dkfz.de:131053$$pVDB
000131053 9101_ $$0I:(DE-588b)2036810-0$$60000-0002-7870-2227$$aDeutsches Krebsforschungszentrum$$b0$$kDKFZ
000131053 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)92820b4867c955a04f642707ecf35b40$$aDeutsches Krebsforschungszentrum$$b1$$kDKFZ
000131053 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-He78)bb6a7a70f976eb8df1769944bf913596$$aDeutsches Krebsforschungszentrum$$b2$$kDKFZ
000131053 9131_ $$0G:(DE-HGF)POF3-313$$1G:(DE-HGF)POF3-310$$2G:(DE-HGF)POF3-300$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bGesundheit$$lKrebsforschung$$vCancer risk factors and prevention$$x0
000131053 9141_ $$y2017
000131053 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bPLOS ONE : 2015
000131053 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS
000131053 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline
000131053 915__ $$0StatID:(DE-HGF)0310$$2StatID$$aDBCoverage$$bNCBI Molecular Biology Database
000131053 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal
000131053 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ
000131053 915__ $$0LIC:(DE-HGF)CCBYNV$$2V:(DE-HGF)$$aCreative Commons Attribution CC BY (No Version)$$bDOAJ
000131053 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search
000131053 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC
000131053 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bThomson Reuters Master Journal List
000131053 915__ $$0StatID:(DE-HGF)0111$$2StatID$$aWoS$$bScience Citation Index Expanded
000131053 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection
000131053 915__ $$0StatID:(DE-HGF)1040$$2StatID$$aDBCoverage$$bZoological Record
000131053 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews
000131053 915__ $$0StatID:(DE-HGF)9900$$2StatID$$aIF < 5
000131053 9201_ $$0I:(DE-He78)C060-20160331$$kC060$$lBiostatistik$$x0
000131053 980__ $$ajournal
000131053 980__ $$aVDB
000131053 980__ $$aI:(DE-He78)C060-20160331
000131053 980__ $$aUNRESTRICTED