Classifying Clinical Evidence Levels of Cancer Variants in Biomedical Literature Using Machine Learning and Large Language Models
Journal Article | DKFZ-2026-01219
2026
IOS Press
Amsterdam
Abstract: Automating the classification of clinical evidence levels in biomedical literature can support precision oncology by accelerating variant interpretation and supporting informed decision-making. This study compares the performance of two state-of-the-art large language models (LLMs), GPT-4.1-mini and Gemini-2.5-Flash, and two machine learning (ML) algorithms, decision tree and XGBoost, for classifying publications according to the Clinical Interpretation of Variants in Cancer (CIViC) evidence level system. Zero- and few-shot prompting strategies were tested for the LLMs, while Term Frequency-Inverse Document Frequency (TF-IDF) and word embedding representations were evaluated for the ML models. XGBoost with TF-IDF achieved the highest performance (micro-F1 = 0.83), outperforming both LLMs and the decision trees. All models performed best on the mid-range evidence levels (B to D) and struggled with the high (A) and inferential (E) levels, reflecting dataset imbalance and linguistic ambiguity. These findings suggest that, at present, abstract-level evidence classification is driven largely by explicit lexical cues, with limited added benefit from standalone LLM-based approaches.
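The abstract makes two claims. XGBoost's lead comes from TF-IDF features, and a high micro-F1 coexists with weak results on the rare A and E levels. The stdlib-only Python sketch below illustrates both ideas. It is not the study's pipeline, and several of its choices are assumptions. It uses the classic unsmoothed TF-IDF variant (tf = count / document length, idf = ln(N / df)) with naive whitespace tokenization, whereas libraries such as scikit-learn default to a smoothed IDF. The helper names `tokenize`, `tfidf` and `f1_scores` are hypothetical. The toy abstracts and labels are invented for illustration and are not drawn from the study's data. The sketch shows two things. A term shared by every document gets zero weight, while distinctive lexical cues get positive weight. Micro-averaged F1 can also stay high while the rare classes score zero.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace split (assumption); real pipelines usually
    # also strip punctuation and may remove stop words.
    return text.lower().split()


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Classic TF-IDF: tf = count / doc length, idf = ln(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (c / len(toks)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors


def f1_scores(y_true: list[str], y_pred: list[str]):
    """Return (per-class F1, micro-F1, macro-F1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    labels = sorted(set(y_true) | set(y_pred))
    per_class = {lab: f1(tp[lab], fp[lab], fn[lab]) for lab in labels}
    # Micro: pool counts over all classes, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean over classes, so rare classes count equally.
    macro = sum(per_class.values()) / len(per_class)
    return per_class, micro, macro


# Invented toy inputs for illustration only (not the study's data).
DOCS = [
    "randomized phase iii trial of braf inhibitor",
    "case report of a patient with braf mutation",
    "cell line study of braf inhibitor sensitivity",
]
Y_TRUE = ["A", "B", "B", "B", "C", "C", "C", "D", "D", "E"]
Y_PRED = ["B", "B", "B", "B", "C", "C", "C", "D", "D", "D"]

vectors = tfidf(DOCS)
print(vectors[0]["braf"])                  # 0.0: term occurs in every document
print(round(vectors[0]["randomized"], 3))  # 0.157: term unique to this document

per_class, micro, macro = f1_scores(Y_TRUE, Y_PRED)
print(per_class["A"], per_class["E"])      # 0.0 0.0: rare classes always missed
print(f"micro-F1 = {micro:.2f}, macro-F1 = {macro:.2f}")  # micro-F1 = 0.80, macro-F1 = 0.53
```

In single-label classification, each misclassified abstract counts as one false positive and one false negative. Micro-F1 therefore equals accuracy and is dominated by the frequent mid-range levels. Per-class or macro-averaged scores are what expose weakness on the A and E levels.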
Keyword(s): Machine Learning (MeSH) ; Humans (MeSH) ; Neoplasms: genetics (MeSH) ; Neoplasms: classification (MeSH) ; Neoplasms: diagnosis (MeSH) ; Natural Language Processing (MeSH) ; Data Mining: methods (MeSH) ; Algorithms (MeSH) ; Large Language Models (MeSH) ; Clinical Evidence Level ; Large Language Models ; Machine Learning ; Text Classification