In-depth evaluation of machine learning methods for semi-automating article screening in a systematic review of mechanistic literature.

Kebede, Mihiretu; Le Cornet, Charlotte; Turzanski-Fortner, Renée

doi:10.1002/jrsm.1589

Journal Article

DKFZ-2022-01433

Kebede, M. (First author, DKFZ) ; Le Cornet, C. (DKFZ) ; Turzanski-Fortner, R. (Last author, DKFZ)

2023
Wiley

Research Synthesis Methods 14(2), 156-172 (2023) [10.1002/jrsm.1589]

Please use a persistent id in citations: doi:10.1002/jrsm.1589

Abstract: We aimed to evaluate the performance of supervised machine learning algorithms in predicting articles relevant for full-text review in a systematic review. Overall, 16,430 manually screened titles/abstracts, including 861 references identified as relevant for full-text review, were used for the analysis. Of these, 40% (n=6573) were sub-divided for training (70%) and testing (30%) the algorithms. The remaining 60% (n=9857) were used as a validation set. We evaluated down- and up-sampling methods and compared unigram, bigram, and singular value decomposition (SVD) approaches. For each approach, Naïve Bayes, Support Vector Machines (SVM), regularized logistic regression, neural networks, random forest, LogitBoost, and XGBoost were implemented using simple term frequency or Tf-Idf feature representations. Performance was evaluated using sensitivity, specificity, precision, and area under the curve. We combined the predictions of the best-performing algorithms (Youden Index ≥0.3 with sensitivity/specificity ≥70/60%). In the down-sampling unigram approach, Naïve Bayes, SVM/quanteda text models with Tf-Idf, and linear SVM (e1071 package) with Tf-Idf achieved >90% sensitivity at specificity >65%. Combining the predictions of the 10 best-performing algorithms improved performance to 95% sensitivity and 64% specificity in the validation set. Crude screening burden was reduced by 61% (5979) (adjusted: 80.3%) with a 5% (27) false negative rate. All the other approaches yielded relatively poorer performance. The down-sampling unigram approach achieved good performance in our data. Combining the predictions of algorithms improved sensitivity while screening burden was reduced by almost two-thirds. Implementing machine learning approaches in title/abstract screening should be investigated further toward refining these tools and automating their implementation.
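The selection rule described in the abstract (Youden Index ≥0.3 with sensitivity ≥70% and specificity ≥60%) can be illustrated with a minimal Python sketch. This is not the authors' code, which according to the abstract relied on R packages such as quanteda and e1071. The function names, the toy labels, and in particular the "flag if any selected model flags" combination rule are assumptions made for illustration; the abstract does not state how predictions were combined.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float
    specificity: float

    @property
    def youden(self) -> float:
        # Youden Index J = sensitivity + specificity - 1
        return self.sensitivity + self.specificity - 1.0


def evaluate(y_true, y_pred):
    """Sensitivity and specificity for binary labels (1 = relevant).

    Assumes at least one relevant and one irrelevant record are present.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return Metrics(tp / (tp + fn), tn / (tn + fp))


def passes_threshold(m, min_youden=0.3, min_sens=0.70, min_spec=0.60):
    # Thresholds taken from the abstract.
    return (m.youden >= min_youden
            and m.sensitivity >= min_sens
            and m.specificity >= min_spec)


def combine_any(predictions):
    # ASSUMPTION: a record is kept for full-text review if any selected
    # model flags it. The abstract does not specify the combination rule.
    return [int(any(votes)) for votes in zip(*predictions)]


# Toy labels and predictions, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
models = {
    "model_a": [1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
    "model_b": [1, 0, 1, 1, 0, 0, 1, 0, 0, 1],
    "model_c": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}

selected = {name: pred for name, pred in models.items()
            if passes_threshold(evaluate(y_true, pred))}
combined = combine_any(list(selected.values()))
result = evaluate(y_true, combined)
print(sorted(selected), result.sensitivity, result.specificity)
```

In this toy data the combined prediction gains sensitivity and loses specificity relative to each selected model, which matches the direction of the trade-off reported in the abstract (95% sensitivity at 64% specificity).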

Keyword(s): Automated screening ; Citation Screening ; Machine Learning ; NLP ; Natural Language Processing ; Systematic review ; Text mining

Classification:

ddc:900

Note: 2023 Mar;14(2):156-172

Contributing Institute(s):

C020 Cancer Epidemiology (C020)

Research Program(s):

313 - Cancer Risk Factors and Prevention (POF4-313)

Appears in the scientific report 2022

Database coverage: Medline ; Clarivate Analytics Master Journal List ; Current Contents - Clinical Medicine ; DEAL Wiley ; Ebsco Academic Search ; Essential Science Indicators ; IF >= 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Institute Collections > C020
Public records
Publication Charges
Publications database

Record created 2022-07-08, last modified 2024-12-20
