TY - JOUR
AU - Lujumba, Ibra
AU - Adam, Yagoub
AU - Ziaei Jam, Helyaneh
AU - Isewon, Itunuoluwa
AU - Monnakgotla, Nomakhosazana
AU - Li, Yang
AU - Onyido, Blessing
AU - Fredrick, Kakembo
AU - Adegoke, Faith
AU - Emmanuel, Jerry
AU - Adeyemi, Jumoke
AU - Ibitoye, Olajumoke
AU - Owusu-Ansah, Samuel
AU - Akanle, Matthew Boladele
AU - Joseph, Habi
AU - Nsubuga, Mike
AU - Galiwango, Ronald
AU - Okitwi, Martin
AU - Magdalene, Namuswe
AU - Walter, Odur
AU - Mngadi, Zama
AU - Adebiyi, Marion
AU - Oyelade, Jelili
AU - Nel, Melissa
AU - Jjingo, Daudi
AU - Gymrek, Melissa
AU - Adebiyi, Ezekiel Femi
TI - A practical guide to identifying associations between tandem repeats and complex human traits using consensus genotypes from multiple tools
JO - Nature Protocols
VL - nn
SN - 1754-2189
CY - Basingstoke
PB - Nature Publishing Group
M1 - DKFZ-2025-01826
SP - nn
PY - 2025
N1 - #LA:B330# / epub
AB - Tandem repeats (TRs) are highly variable loci in the human genome that are linked to various human phenotypes. Accurate and reliable genotyping of TRs is important for understanding population TR variation dynamics and their effects in TR-trait association studies. In this protocol, we describe how to generate high-quality consensus TR genotypes for population genomics studies. In particular, we detail steps to: (i) perform TR genotyping from short-read whole-genome sequencing data by using the HipSTR, GangSTR, adVNTR and ExpansionHunter tools, (ii) perform quality control checks on TR genotypes by using TRTools and (iii) integrate TR genotypes from different tools by using EnsembleTR. We further discuss how to visualize and investigate TR variation patterns to identify population-specific expansions and perform TR-trait association analyses. We demonstrate the utility of these steps by analyzing a small dataset from the 1000 Genomes Project. In addition, we recapitulate a previously identified association between TR length and gene expression in the African population and provide a generalized discussion on TR analysis and its relevance to identifying complex traits. The expected time for installing the necessary software for each section is 10 min. The expected run time on the user's desired dataset can vary from hours to days depending on factors such as the size of the data, input parameters and the capacity of the computing infrastructure.
LB - PUB:(DE-HGF)16
C6 - pmid:40890532
DO - 10.1038/s41596-025-01231-y
UR - https://inrepo02.dkfz.de/record/304286
ER -