TY  - JOUR
AU  - Aybey, Bogac
AU  - Zhao, Sheng
AU  - Brors, Benedikt
AU  - Staub, Eike
TI  - Immune cell type signature discovery and random forest classification for analysis of single cell gene expression datasets.
JO  - Frontiers in Immunology
VL  - 14
SN  - 1664-3224
CY  - Lausanne
PB  - Frontiers Media
M1  - DKFZ-2023-01700
SP  - 1194745
PY  - 2023
AB  - Robust immune cell gene expression signatures are central to the analysis of single cell studies. Nearly all known sets of immune cell signatures have been derived by making use of only single gene expression datasets. Utilizing the power of multiple integrated datasets could lead to high-quality immune cell signatures which could be used as superior inputs to machine learning-based cell type classification approaches. We established a novel workflow for the discovery of immune cell type signatures based primarily on gene-versus-gene expression similarity. It leverages multiple datasets, here seven single cell expression datasets from six different cancer types and resulted in eleven immune cell type-specific gene expression signatures. We used these to train random forest classifiers for immune cell type assignment for single-cell RNA-seq datasets. We obtained similar or better prediction results compared to commonly used methods for cell type assignment in independent benchmarking datasets. Our gene signature set yields higher prediction scores than other published immune cell type gene sets in random forest-based cell type classification. We further demonstrate how our approach helps to avoid bias in downstream statistical analyses by re-analysis of a published IFN stimulation experiment. We demonstrated the quality of our immune cell signatures and their strong performance in a random forest-based cell typing approach. We argue that classifying cells based on our comparably slim sets of genes accompanied by a random forest-based approach not only matches or outperforms widely used published approaches. It also facilitates unbiased downstream statistical analyses of differential gene expression between cell types for significantly more genes compared to previous cell classification algorithms.
KW  - cell clustering (Other)
KW  - cell type classification (Other)
KW  - gene signature discovery (Other)
KW  - machine learning (Other)
KW  - single-cell RNA sequencing (Other)
KW  - tumor microenvironment (Other)
LB  - PUB:(DE-HGF)16
C6  - pmid:37609075
C2  - pmc:PMC10441575
DO  - 10.3389/fimmu.2023.1194745
UR  - https://inrepo02.dkfz.de/record/278729
ER  -