%0 Journal Article
%A Mate, Sebastian
%A Kampf, Marvin
%A Rödle, Wolfgang
%A Kraus, Stefan
%A Proynova, Rumyana
%A Silander, Kaisa
%A Ebert, Lars
%A Lablans, Martin
%A Schüttler, Christina
%A Knell, Christian
%A Eklund, Niina
%A Hummel, Michael
%A Holub, Petr
%A Prokosch, Hans-Ulrich
%T Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC.
%J Applied Clinical Informatics
%V 10
%N 4
%@ 1869-0327
%C Stuttgart
%I Schattauer
%M DKFZ-2019-02225
%P 679-692
%D 2019
%X High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. The accompanying ADOPT BBMRI-ERIC project kick-started BBMRI-ERIC by collecting colorectal cancer data from European biobanks. To transform these data into a common representation, a uniform approach for data integration and harmonization had to be developed. This article describes the design and the implementation of a toolset for this task. Based on the semantics of a metadata repository, we developed a lexical bag-of-words matcher, capable of semiautomatically mapping local biobank terms to the central ADOPT BBMRI-ERIC terminology. Its algorithm supports fuzzy matching, utilization of synonyms, and sentiment tagging. To process the anonymized instance data based on these mappings, we also developed a data transformation application. The implementation was used to process the data from 10 European biobanks. The lexical matcher automatically and correctly mapped 78.48
%F PUB:(DE-HGF)16
%9 Journal Article
%$ pmid:31509880
%2 pmc:PMC6739205
%R 10.1055/s-0039-1695793
%U https://inrepo02.dkfz.de/record/144793