The imitation game: large language models versus multidisciplinary tumor boards: benchmarking AI against 21 sarcoma centers from the ring trial.

Li, Cheng-Peng; Albertsmeier, Markus; Reißfelder, Christoph; Hummedah, Kamal; Kalisa, Aimé Terence; Menge, Franka; Roohani, Siyer; Yang, Cui; Kasper, Bernd; Jakob, Jens

doi:10.1007/s00432-025-06304-9

Marc 21

001			304480
005			20250914022644.0
024	7	_	\|a 10.1007/s00432-025-06304-9 \|2 doi
024	7	_	\|a pmid:40926110 \|2 pmid
024	7	_	\|a 0301-1585 \|2 ISSN
024	7	_	\|a 0084-5353 \|2 ISSN
024	7	_	\|a 0171-5216 \|2 ISSN
024	7	_	\|a 1432-1335 \|2 ISSN
024	7	_	\|a altmetric:181292270 \|2 altmetric
037	_	_	\|a DKFZ-2025-01872
041	_	_	\|a English
082	_	_	\|a 610
100	1	_	\|a Li, Cheng-Peng \|b 0
245	_	_	\|a The imitation game: large language models versus multidisciplinary tumor boards: benchmarking AI against 21 sarcoma centers from the ring trial.
260	_	_	\|a Heidelberg \|c 2025 \|b Springer
336	7	_	\|a article \|2 DRIVER
336	7	_	\|a Output Types/Journal article \|2 DataCite
336	7	_	\|a Journal Article \|b journal \|m journal \|0 PUB:(DE-HGF)16 \|s 1757570457_26502 \|2 PUB:(DE-HGF)
336	7	_	\|a ARTICLE \|2 BibTeX
336	7	_	\|a JOURNAL_ARTICLE \|2 ORCID
336	7	_	\|a Journal Article \|0 0 \|2 EndNote
520	_	_	\|a The study aims to compare the treatment recommendations generated by four leading large language models (LLMs) with those from the multidisciplinary tumor boards (MTBs) of 21 sarcoma centers participating in the sarcoma ring trial in managing complex soft tissue sarcoma (STS) cases. We simulated STS-MTBs using four LLMs (Llama 3.2-vision:90b, Claude 3.5 Sonnet, DeepSeek-R1, and OpenAI-o1) across five anonymized STS cases from the sarcoma ring trial. Each model was queried 21 times per case using a standardized prompt, and the responses were compared with human MTBs in terms of intra-model consistency, treatment recommendation alignment, alternative recommendations, and source citation. LLMs demonstrated high inter-model and intra-model consistency in only 20% of cases, and their recommendations aligned with human consensus in only 20-60% of cases. The model with the highest concordance with the most common MTB recommendation, Claude 3.5 Sonnet, aligned with experts in only 60% of cases. Notably, the recommendations across MTBs were highly heterogeneous, contextualizing the variable LLM performance. Discrepancies were notable: common human recommendations were often absent from LLM outputs. Additionally, the sources for the recommendation rationale of LLMs were clearly derived from the German S3 sarcoma guidelines in only 24.8% to 55.2% of the responses. Potentially harmful suggestions were also occasionally observed in the LLMs' alternative recommendations. Despite the considerable heterogeneity observed in MTB recommendations, the significant discrepancies and potentially harmful recommendations highlight current AI tools' limitations, underscoring that referral to high-volume sarcoma centers remains essential for optimal patient care. At the same time, LLMs could serve as an excellent tool to prepare for MDT discussions.
536	_	_	\|a 899 - ohne Topic (POF4-899) \|0 G:(DE-HGF)POF4-899 \|c POF4-899 \|f POF IV \|x 0
588	_	_	\|a Dataset connected to CrossRef, PubMed, Journals: inrepo02.dkfz.de
650	_	7	\|a Artificial intelligence \|2 Other
650	_	7	\|a Clinical decision \|2 Other
650	_	7	\|a Large language model \|2 Other
650	_	7	\|a Multidisciplinary tumor board \|2 Other
650	_	7	\|a Soft tissue sarcoma \|2 Other
650	_	2	\|a Humans \|2 MeSH
650	_	2	\|a Sarcoma: therapy \|2 MeSH
650	_	2	\|a Sarcoma: pathology \|2 MeSH
650	_	2	\|a Benchmarking: methods \|2 MeSH
650	_	2	\|a Cancer Care Facilities \|2 MeSH
650	_	2	\|a Language \|2 MeSH
650	_	2	\|a Large Language Models \|2 MeSH
700	1	_	\|a Kalisa, Aimé Terence \|b 1
700	1	_	\|a Roohani, Siyer \|0 P:(DE-HGF)0 \|b 2
700	1	_	\|a Hummedah, Kamal \|b 3
700	1	_	\|a Menge, Franka \|b 4
700	1	_	\|a Reißfelder, Christoph \|b 5
700	1	_	\|a Albertsmeier, Markus \|b 6
700	1	_	\|a Kasper, Bernd \|b 7
700	1	_	\|a Jakob, Jens \|b 8
700	1	_	\|a Yang, Cui \|b 9
773	_	_	\|a 10.1007/s00432-025-06304-9 \|g Vol. 151, no. 9, p. 248 \|0 PERI:(DE-600)1459285-X \|n 9 \|p 248 \|t Journal of cancer research and clinical oncology \|v 151 \|y 2025 \|x 0301-1585
909	C	O	\|o oai:inrepo02.dkfz.de:304480 \|p VDB
910	1	_	\|a Deutsches Krebsforschungszentrum \|0 I:(DE-588b)2036810-0 \|k DKFZ \|b 2 \|6 P:(DE-HGF)0
913	1	_	\|a DE-HGF \|b Programmungebundene Forschung \|l ohne Programm \|1 G:(DE-HGF)POF4-890 \|0 G:(DE-HGF)POF4-899 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-800 \|4 G:(DE-HGF)POF \|v ohne Topic \|x 0
914	1	_	\|y 2025
915	_	_	\|a DEAL Springer \|0 StatID:(DE-HGF)3002 \|2 StatID \|d 2024-12-20 \|w ger
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0200 \|2 StatID \|b SCOPUS \|d 2024-12-20
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0300 \|2 StatID \|b Medline \|d 2024-12-20
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0199 \|2 StatID \|b Clarivate Analytics Master Journal List \|d 2024-12-20
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1050 \|2 StatID \|b BIOSIS Previews \|d 2024-12-20
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0160 \|2 StatID \|b Essential Science Indicators \|d 2024-12-20
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1030 \|2 StatID \|b Current Contents - Life Sciences \|d 2024-12-20
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)1190 \|2 StatID \|b Biological Abstracts \|d 2024-12-20
915	_	_	\|a WoS \|0 StatID:(DE-HGF)0113 \|2 StatID \|b Science Citation Index Expanded \|d 2024-12-20
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0150 \|2 StatID \|b Web of Science Core Collection \|d 2024-12-20
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0600 \|2 StatID \|b Ebsco Academic Search \|d 2024-12-20
915	_	_	\|a Peer Review \|0 StatID:(DE-HGF)0030 \|2 StatID \|b ASC \|d 2024-12-20
920	1	_	\|0 I:(DE-He78)BE01-20160331 \|k BE01 \|l DKTK Koordinierungsstelle Berlin \|x 0
980	_	_	\|a journal
980	_	_	\|a VDB
980	_	_	\|a I:(DE-He78)BE01-20160331
980	_	_	\|a UNRESTRICTED
