Journal Article DKFZ-2026-01219

Classifying Clinical Evidence Levels of Cancer Variants in Biomedical Literature Using Machine Learning and Large Language Models.


2026
IOS Press Amsterdam

Studies in health technology and informatics 336, 854-858 [DOI:10.3233/SHTI260300]

Abstract: Automating the classification of clinical evidence levels in biomedical literature can support precision oncology by accelerating variant interpretation and supporting informed decision-making. This study compares the performance of two state-of-the-art large language models (LLMs), GPT-4.1-mini and Gemini-2.5-Flash, and two machine learning (ML) algorithms, decision trees and XGBoost, in classifying publications according to the Clinical Interpretation of Variants in Cancer (CIViC) evidence level system. Zero- and few-shot prompting strategies were tested for the LLMs, while Term Frequency-Inverse Document Frequency (TF-IDF) and word embedding representations were evaluated for the ML models. XGBoost with TF-IDF achieved the highest performance (micro-F1 = 0.83), outperforming both LLMs and decision trees. All models performed best on the mid-range evidence levels (B to D) and struggled with the high (A) and inferential (E) levels, reflecting dataset imbalance and linguistic ambiguity. These findings suggest that, at present, abstract-level evidence classification is driven largely by explicit lexical cues, and that standalone LLM-based approaches add limited benefit.
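The abstract relies on two technical ingredients: TF-IDF text features and the micro-F1 score. The following is a minimal, standard-library-only sketch under stated assumptions, not the authors' pipeline. It uses the smoothed IDF variant that scikit-learn's `TfidfVectorizer` applies by default, substitutes a simple nearest-centroid classifier for XGBoost, and trains on invented one-line "abstracts". All function names (`fit_idf`, `tfidf`, `fit_centroids`, `predict`, `micro_f1`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that strips simple punctuation (a simplification)."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


def fit_idf(docs):
    """Smoothed IDF over the training documents: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(doc, idf):
    """L2-normalized sparse TF-IDF vector; terms unseen during training are ignored."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def cosine(a, b):
    """Dot product of two sparse vectors (equals cosine when both are L2-normalized)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def fit_centroids(docs, labels, idf):
    """Hypothetical stand-in classifier (NOT XGBoost): one normalized mean vector per level."""
    sums = {}
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, {})
        for term, weight in tfidf(doc, idf).items():
            acc[term] = acc.get(term, 0.0) + weight
    centroids = {}
    for label, acc in sums.items():
        norm = math.sqrt(sum(w * w for w in acc.values())) or 1.0
        centroids[label] = {t: w / norm for t, w in acc.items()}
    return centroids


def predict(doc, centroids, idf):
    """Assign the evidence level whose centroid is most similar to the document."""
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def micro_f1(gold, pred, labels):
    """Micro-F1: pool true positives, false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, pred):
            tp += int(g == label and p == label)
            fp += int(g != label and p == label)
            fn += int(g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical toy "abstracts", one per CIViC level, invented for illustration only.
TRAIN = [
    ("FDA approved therapy recommended in guidelines", "A"),
    ("phase III clinical trial showed improved response", "B"),
    ("case report of a single patient", "C"),
    ("cell line and mouse xenograft study", "D"),
    ("inferred association without direct evidence", "E"),
]
docs, labels = zip(*TRAIN)
idf = fit_idf(docs)
centroids = fit_centroids(docs, labels, idf)
print(predict("a single patient case report", centroids, idf))  # expected: C
```

Because each abstract receives exactly one predicted level, every misclassification counts once as a false positive and once as a false negative. In this single-label setting, micro-F1 therefore reduces to plain accuracy. Per-class or macro-averaged F1 would be needed to expose the weaker results on levels A and E that the abstract reports.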

Keyword(s): Machine Learning (MeSH) ; Humans (MeSH) ; Neoplasms: genetics (MeSH) ; Neoplasms: classification (MeSH) ; Neoplasms: diagnosis (MeSH) ; Natural Language Processing (MeSH) ; Data Mining: methods (MeSH) ; Algorithms (MeSH) ; Large Language Models (MeSH) ; Clinical Evidence Level ; Large Language Models ; Machine Learning ; Text Classification


Note: ISSN:0926-9630

Contributing Institute(s):
  1. Clinical Trial Office (M130)
Research Program(s):
  1. 319H - Addenda (POF4-319H)

Appears in the scientific report 2026
Database coverage:
Medline ; NCBI Molecular Biology Database ; SCOPUS


 Record created 2026-05-26, last modified 2026-05-27