Journal Article DKFZ-2019-02018

RF_Purify: a novel tool for comprehensive analysis of tumor-purity in methylation array data based on random forest regression.


2019
Springer Heidelberg

BMC Bioinformatics 20(1), 428 (2019) [10.1186/s12859-019-3014-z]


Please use a persistent id in citations: doi:

Abstract: With the advent of array-based techniques to measure methylation levels in primary tumor samples, systematic investigations of methylomes have been performed on a large number of tumor entities. Most of these approaches do not measure the methylation of individual cells but that of bulk tumor sample DNA, which contains a mixture of tumor cells, infiltrating immune cells and other stromal components. This raises questions about the purity of a given tumor sample, since the degree of stromal infiltration varies between entities. Previous methods to infer tumor purity require, or are based on, matching control samples, which are rarely available. Here we present a novel, reference-free method to quantify tumor purity, based on two Random Forest classifiers, which were trained on ABSOLUTE as well as ESTIMATE purity values from TCGA tumor samples. We subsequently apply this method to a previously published, large dataset of brain tumors, showing that these models perform well in datasets that have not been characterized with respect to tumor purity.

Using two gold standard methods to infer purity (the ABSOLUTE score, based on whole genome sequencing data, and the ESTIMATE score, based on gene expression data), we have optimized Random Forest classifiers to predict tumor purity in entities that were contained in the TCGA project. We validated these classifiers using an independent test data set and cross-compared them to other methods which have been applied to the TCGA datasets (such as ESTIMATE and LUMP). Using Illumina methylation array data of brain tumor entities (as published in Capper et al. (Nature 555:469-474, 2018)), we applied this model to estimate tumor purity and found that subgroups of brain tumors display substantial differences in tumor purity.

Random forest-based tumor purity prediction is a well-suited tool to extrapolate gold standard measures of purity to novel methylation array datasets. In contrast to other available methylation-based tumor purity estimation methods, our classifiers do not need a priori knowledge about the tumor entity or matching control tissue to predict tumor purity.
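
The general idea described in the abstract, learning a mapping from methylation beta values to a reference purity score with a random forest, can be illustrated with a minimal sketch. This is not the authors' implementation: the data are simulated, the probes, tree depth, number of trees and all function names are assumptions made for illustration, and the forest is a small standard-library version written so the example runs without extra packages.

```python
import random
from statistics import mean

def sum_sq(vals):
    """Sum of squared deviations from the mean (split quality criterion)."""
    m = mean(vals)
    return sum((v - m) ** 2 for v in vals)

def build_tree(X, y, depth, rng, n_feats, min_leaf=3):
    """Grow one regression tree; each split looks at a random feature subset."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return mean(y)
    best = None  # (sse, feature, threshold, left indices, right indices)
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = sum_sq([y[i] for i in left]) + sum_sq([y[i] for i in right])
            if best is None or sse < best[0]:
                best = (sse, f, t, left, right)
    if best is None:
        return mean(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       depth - 1, rng, n_feats, min_leaf),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       depth - 1, rng, n_feats, min_leaf))

def predict_tree(node, row):
    """Walk from the root to a leaf; leaves hold the mean purity of their samples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, depth=4, seed=0):
    """Fit trees on bootstrap resamples of the training set (bagging)."""
    rng = random.Random(seed)
    n_feats = max(1, len(X[0]) // 2)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 depth, rng, n_feats))
    return forest

def predict_purity(forest, row):
    """Average the tree predictions and clamp to the valid purity range [0, 1]."""
    return min(1.0, max(0.0, mean(predict_tree(t, row) for t in forest)))

def simulate(n, rng):
    """Simulated data (assumption): probes 0-2 track purity, probes 3-5 are noise."""
    X, y = [], []
    for _ in range(n):
        purity = rng.uniform(0.2, 1.0)  # stands in for a reference purity score
        row = [min(1.0, max(0.0, purity + rng.gauss(0, 0.05))) for _ in range(3)]
        row += [rng.random() for _ in range(3)]
        X.append(row)
        y.append(purity)
    return X, y

rng = random.Random(42)
X_train, y_train = simulate(100, rng)
X_test, y_test = simulate(40, rng)   # held-out samples, not used for fitting

forest = fit_forest(X_train, y_train)
preds = [predict_purity(forest, row) for row in X_test]

mae = mean(abs(p - t) for p, t in zip(preds, y_test))
baseline = mean(abs(mean(y_train) - t) for t in y_test)  # always predict the mean
print(f"forest MAE: {mae:.3f}  mean-only baseline MAE: {baseline:.3f}")
```

On real data, the simulated matrix would be replaced by beta values from a methylation array and the simulated target by a reference purity score such as ABSOLUTE or ESTIMATE; evaluation on held-out samples mirrors the independent test set mentioned in the abstract.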


Contributing Institute(s):
  1. Pädiatrische Neuroonkologie (B062)
  2. DKTK Heidelberg (L101)
Research Program(s):
  1. 312 - Functional and structural genomics (POF3-312)

Appears in the scientific report 2019
Database coverage:
Medline ; Creative Commons Attribution CC BY (No Version) ; DOAJ ; BIOSIS Previews ; Clarivate Analytics Master Journal List ; DOAJ Seal ; Ebsco Academic Search ; IF < 5 ; JCR ; NCBI Molecular Biology Database ; PubMed Central ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Public records
Publications database

 Record created 2019-08-22, last modified 2024-02-29

