TY - JOUR
AU - Mate, Sebastian
AU - Kampf, Marvin
AU - Rödle, Wolfgang
AU - Kraus, Stefan
AU - Proynova, Rumyana
AU - Silander, Kaisa
AU - Ebert, Lars
AU - Lablans, Martin
AU - Schüttler, Christina
AU - Knell, Christian
AU - Eklund, Niina
AU - Hummel, Michael
AU - Holub, Petr
AU - Prokosch, Hans-Ulrich
TI - Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC.
JO - Applied Clinical Informatics
VL - 10
IS - 4
SN - 1869-0327
CY - Stuttgart
PB - Schattauer
M1 - DKFZ-2019-02225
SP - 679
EP - 692
PY - 2019
AB - High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. The accompanying ADOPT BBMRI-ERIC project kick-started BBMRI-ERIC by collecting colorectal cancer data from European biobanks. To transform these data into a common representation, a uniform approach for data integration and harmonization had to be developed. This article describes the design and the implementation of a toolset for this task. Based on the semantics of a metadata repository, we developed a lexical bag-of-words matcher, capable of semiautomatically mapping local biobank terms to the central ADOPT BBMRI-ERIC terminology. Its algorithm supports fuzzy matching, utilization of synonyms, and sentiment tagging. To process the anonymized instance data based on these mappings, we also developed a data transformation application. The implementation was used to process the data from 10 European biobanks. The lexical matcher automatically and correctly mapped 78.48
LB - PUB:(DE-HGF)16
C6 - pmid:31509880
C2 - pmc:PMC6739205
DO - 10.1055/s-0039-1695793
UR - https://inrepo02.dkfz.de/record/144793
ER -