TY - JOUR
AU - Spitzl, Daniel
AU - Mergen, Markus
AU - Braren, Rickmer
AU - Endrös, Lukas
AU - Eiber, Matthias
AU - Steinhelfer, Lisa
TI - LLM-powered breast cancer staging from PET/CT reports: a comparative performance study.
JO - International journal of medical informatics
VL - 204
SN - 1386-5056
CY - Amsterdam [u.a.]
PB - Elsevier
M1 - DKFZ-2025-01532
SP - 106053
PY - 2025
AB - Imaging reports are crucial in breast cancer management, with the tumor-node-metastasis (TNM) classification serving as a widely used model for assessing disease severity, guiding treatment decisions, and predicting patient outcomes. Large language models (LLMs) offer a potential solution by extracting standardized UICC TNM classifications and the corresponding UICC stage directly from existing PET/CT reports. This approach holds promise to enhance staging accuracy, streamline multidisciplinary discussions, and improve patient outcomes. Here, we evaluated four LLMs (ChatGPT-4o, DeepSeek V3, Claude 3.5 Sonnet, and Gemini 2.0 Flash) for their capacity to determine TNM staging based on UICC/AJCC breast cancer guidelines. A total of 111 fictitious PET/CT reports were analyzed, and each model's outputs were measured against expert-generated TNM classifications and stage categorizations. Among the tested models, Claude 3.5 Sonnet demonstrated superior F1 scores of 0.95
KW - Artificial intelligence (Other)
KW - Breast cancer (Other)
KW - Clinical decision support (Other)
KW - Diagnostics (Other)
LB - PUB:(DE-HGF)16
C6 - pmid:40706196
DO - DOI:10.1016/j.ijmedinf.2025.106053
UR - https://inrepo02.dkfz.de/record/303113
ER -