LLM-powered breast cancer staging from PET/CT reports: a comparative performance study.

Spitzl, Daniel; Steinhelfer, Lisa; Eiber, Matthias; Endrös, Lukas; Braren, Rickmer; Mergen, Markus

doi:10.1016/j.ijmedinf.2025.106053

TY  - JOUR
AU  - Spitzl, Daniel
AU  - Mergen, Markus
AU  - Braren, Rickmer
AU  - Endrös, Lukas
AU  - Eiber, Matthias
AU  - Steinhelfer, Lisa
TI  - LLM-powered breast cancer staging from PET/CT reports: a comparative performance study.
JO  - International journal of medical informatics
VL  - 204
SN  - 1386-5056
CY  - Amsterdam [u.a.]
PB  - Elsevier
M1  - DKFZ-2025-01532
SP  - 106053
PY  - 2025
AB  - Imaging reports are crucial in breast cancer management, with the tumor-node-metastasis (TNM) classification serving as a widely used model for assessing disease severity, guiding treatment decisions, and predicting patient outcomes. Large language models (LLMs) offer a potential solution by extracting standardized UICC TNM classifications and the corresponding UICC stage directly from existing PET/CT reports. This approach holds promise to enhance staging accuracy, streamline multidisciplinary discussions, and improve patient outcomes. Here, we evaluated four LLMs (ChatGPT-4o, DeepSeek V3, Claude 3.5 Sonnet, and Gemini 2.0 Flash) for their capacity to determine TNM staging based on UICC/AJCC breast cancer guidelines. A total of 111 fictitious PET/CT reports were analyzed, and each model's outputs were measured against expert-generated TNM classifications and stage categorizations. Among the tested models, Claude 3.5 Sonnet demonstrated superior F1 scores of 0.95
KW  - Artificial intelligence (Other)
KW  - Breast cancer (Other)
KW  - Clinical decision support (Other)
KW  - Diagnostics (Other)
LB  - PUB:(DE-HGF)16
C6  - pmid:40706196
DO  - 10.1016/j.ijmedinf.2025.106053
UR  - https://inrepo02.dkfz.de/record/303113
ER  -
