TY  - JOUR
AU  - Mate, Sebastian
AU  - Kampf, Marvin
AU  - Rödle, Wolfgang
AU  - Kraus, Stefan
AU  - Proynova, Rumyana
AU  - Silander, Kaisa
AU  - Ebert, Lars
AU  - Lablans, Martin
AU  - Schüttler, Christina
AU  - Knell, Christian
AU  - Eklund, Niina
AU  - Hummel, Michael
AU  - Holub, Petr
AU  - Prokosch, Hans-Ulrich
TI  - Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC
JO  - Applied Clinical Informatics
VL  - 10
IS  - 4
SN  - 1869-0327
CY  - Stuttgart
PB  - Schattauer
M1  - DKFZ-2019-02225
SP  - 679
EP  - 692
PY  - 2019
AB  - High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. The accompanying ADOPT BBMRI-ERIC project kick-started BBMRI-ERIC by collecting colorectal cancer data from European biobanks. To transform these data into a common representation, a uniform approach for data integration and harmonization had to be developed. This article describes the design and the implementation of a toolset for this task. Based on the semantics of a metadata repository, we developed a lexical bag-of-words matcher, capable of semiautomatically mapping local biobank terms to the central ADOPT BBMRI-ERIC terminology. Its algorithm supports fuzzy matching, utilization of synonyms, and sentiment tagging. To process the anonymized instance data based on these mappings, we also developed a data transformation application. The implementation was used to process the data from 10 European biobanks. The lexical matcher automatically and correctly mapped 78.48
LB  - PUB:(DE-HGF)16
C6  - pmid:31509880
C2  - pmc:PMC6739205
DO  - 10.1055/s-0039-1695793
UR  - https://inrepo02.dkfz.de/record/144793
ER  -