TY - JOUR
AU - Moslehi, Zahra
AU - AmeriFar, Sareh
AU - de Azevedo, Kevin
AU - Buettner, Florian
TI - Learning interpretable representations of single-cell multi-omics data with multi-output Gaussian processes.
JO - Nucleic acids research
VL - 53
IS - 14
SN - 0305-1048
CY - Oxford
PB - Oxford Univ. Press
M1 - DKFZ-2025-01521
SP - gkaf630
PY - 2025
N1 - ISSN 1362-4962
AB - Learning representations of single-cell genomics data is challenging due to the nonlinear and often multi-modal nature of the data on one hand and the need for interpretable representations on the other hand. Existing approaches tend to focus either on interpretability aspects via linear matrix factorization or on maximizing expressive power via neural network-based embeddings using black-box variational autoencoders or graph embedding approaches. We address this trade-off between expressive power and interpretability by introducing a novel approach that combines highly expressive representation learning via an embedding layer with interpretable multi-output Gaussian processes within a unified framework. In our model, we learn distinct representations for samples (cells) and features (genes) from multi-modal single-cell data. We demonstrate that even a few interpretable latent dimensions can effectively capture the underlying structure of the data. Our model yields interpretable relationships between groups of cells and their associated marker genes: leveraging a gene relevance map, we establish connections between cell clusters (e.g. specific cell types) and feature clusters (e.g. marker genes for those specific cell types) within the learned latent spaces of cells and features.
KW - Single-Cell Analysis: methods
KW - Normal Distribution
KW - Genomics: methods
KW - Humans
KW - Neural Networks, Computer
KW - Algorithms
KW - Machine Learning
KW - Multiomics
LB - PUB:(DE-HGF)16
C6 - pmid:40694853
DO - DOI:10.1093/nar/gkaf630
UR - https://inrepo02.dkfz.de/record/303096
ER -