TY  - JOUR
AU  - Li, Cheng-Peng
AU  - Kalisa, Aimé Terence
AU  - Roohani, Siyer
AU  - Hummedah, Kamal
AU  - Menge, Franka
AU  - Reißfelder, Christoph
AU  - Albertsmeier, Markus
AU  - Kasper, Bernd
AU  - Jakob, Jens
AU  - Yang, Cui
TI  - The imitation game: large language models versus multidisciplinary tumor boards: benchmarking AI against 21 sarcoma centers from the ring trial.
JO  - Journal of cancer research and clinical oncology
VL  - 151
IS  - 9
SN  - 0301-1585
CY  - Heidelberg
PB  - Springer
M1  - DKFZ-2025-01872
SP  - 248
PY  - 2025
AB  - The study aims to compare the treatment recommendations generated by four leading large language models (LLMs) with those from 21 sarcoma centers' multidisciplinary tumor boards (MTBs) of the sarcoma ring trial in managing complex soft tissue sarcoma (STS) cases. We simulated STS-MTBs using four LLMs (Llama 3.2-vision: 90b, Claude 3.5 Sonnet, DeepSeek-R1, and OpenAI-o1) across five anonymized STS cases from the sarcoma ring trial. Each model was queried 21 times per case using a standardized prompt, and the responses were compared with human MTBs in terms of intra-model consistency, treatment recommendation alignment, alternative recommendations, and source citation. LLMs demonstrated high inter-model and intra-model consistency in only 20
KW  - Humans
KW  - Sarcoma: therapy
KW  - Sarcoma: pathology
KW  - Benchmarking: methods
KW  - Cancer Care Facilities
KW  - Language
KW  - Large Language Models
KW  - Artificial intelligence (Other)
KW  - Clinical decision (Other)
KW  - Large language model (Other)
KW  - Multidisciplinary tumor board (Other)
KW  - Soft tissue sarcoma (Other)
LB  - PUB:(DE-HGF)16
C6  - pmid:40926110
DO  - 10.1007/s00432-025-06304-9
UR  - https://inrepo02.dkfz.de/record/304480
ER  -