LLM-powered breast cancer staging from PET/CT reports: a comparative performance study.

Spitzl, Daniel; Steinhelfer, Lisa; Eiber, Matthias; Endrös, Lukas; Braren, Rickmer; Mergen, Markus

doi:10.1016/j.ijmedinf.2025.106053

%0 Journal Article
%A Spitzl, Daniel
%A Mergen, Markus
%A Braren, Rickmer
%A Endrös, Lukas
%A Eiber, Matthias
%A Steinhelfer, Lisa
%T LLM-powered breast cancer staging from PET/CT reports: a comparative performance study.
%J International journal of medical informatics
%V 204
%@ 1386-5056
%C Amsterdam [u.a.]
%I Elsevier
%M DKFZ-2025-01532
%P 106053
%D 2025
%X Imaging reports are crucial in breast cancer management, with the tumor-node-metastasis (TNM) classification serving as a widely used model for assessing disease severity, guiding treatment decisions, and predicting patient outcomes. Large language models (LLMs) offer a potential solution by extracting standardized UICC TNM classifications and the corresponding UICC stage directly from existing PET/CT reports. This approach holds promise to enhance staging accuracy, streamline multidisciplinary discussions, and improve patient outcomes. Here, we evaluated four LLMs (ChatGPT-4o, DeepSeek V3, Claude 3.5 Sonnet, and Gemini 2.0 Flash) for their capacity to determine TNM staging based on UICC/AJCC breast cancer guidelines. A total of 111 fictitious PET/CT reports were analyzed, and each model's outputs were measured against expert-generated TNM classifications and stage categorizations. Among the tested models, Claude 3.5 Sonnet demonstrated superior F1 scores of 0.95
%K Artificial intelligence (Other)
%K Breast cancer (Other)
%K Clinical decision support (Other)
%K Diagnostics (Other)
%F PUB:(DE-HGF)16
%9 Journal Article
%$ pmid:40706196
%R 10.1016/j.ijmedinf.2025.106053
%U https://inrepo02.dkfz.de/record/303113
