000304480 001__ 304480
000304480 005__ 20250914022644.0
000304480 0247_ $$2doi$$a10.1007/s00432-025-06304-9
000304480 0247_ $$2pmid$$apmid:40926110
000304480 0247_ $$2ISSN$$a0301-1585
000304480 0247_ $$2ISSN$$a0084-5353
000304480 0247_ $$2ISSN$$a0171-5216
000304480 0247_ $$2ISSN$$a1432-1335
000304480 0247_ $$2altmetric$$aaltmetric:181292270
000304480 037__ $$aDKFZ-2025-01872
000304480 041__ $$aEnglish
000304480 082__ $$a610
000304480 1001_ $$aLi, Cheng-Peng$$b0
000304480 245__ $$aThe imitation game: large language models versus multidisciplinary tumor boards: benchmarking AI against 21 sarcoma centers from the ring trial.
000304480 260__ $$aHeidelberg$$bSpringer$$c2025
000304480 3367_ $$2DRIVER$$aarticle
000304480 3367_ $$2DataCite$$aOutput Types/Journal article
000304480 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1757570457_26502
000304480 3367_ $$2BibTeX$$aARTICLE
000304480 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000304480 3367_ $$00$$2EndNote$$aJournal Article
000304480 520__ $$aThe study aims to compare the treatment recommendations generated by four leading large language models (LLMs) with those from 21 sarcoma centers' multidisciplinary tumor boards (MTBs) of the sarcoma ring trial in managing complex soft tissue sarcoma (STS) cases. We simulated STS-MTBs using four LLMs (Llama 3.2-vision: 90b, Claude 3.5 Sonnet, DeepSeek-R1, and OpenAI-o1) across five anonymized STS cases from the sarcoma ring trial. Each model was queried 21 times per case using a standardized prompt, and the responses were compared with human MTBs in terms of intra-model consistency, treatment recommendation alignment, alternative recommendations, and source citation. LLMs demonstrated high inter-model and intra-model consistency in only 20% of cases, and their recommendations aligned with human consensus in only 20-60% of cases. The model with the highest concordance with the most common MTB recommendation, Claude 3.5 Sonnet, aligned with experts in only 60% of cases. Notably, the recommendations across MTBs were highly heterogeneous, contextualizing the variable LLM performance. Discrepancies were particularly notable: common human recommendations were often absent from LLM outputs. Additionally, the sources for the recommendation rationale of LLMs were clearly derived from the German S3 sarcoma guidelines in only 24.8% to 55.2% of the responses. Potentially harmful suggestions from LLMs were also occasionally observed in alternative recommendations. Despite the considerable heterogeneity observed in MTB recommendations, the significant discrepancies and potentially harmful recommendations highlight current AI tools' limitations, underscoring that referral to high-volume sarcoma centers remains essential for optimal patient care. At the same time, LLMs could serve as an excellent tool to prepare for MDT discussions.
000304480 536__ $$0G:(DE-HGF)POF4-899$$a899 - ohne Topic (POF4-899)$$cPOF4-899$$fPOF IV$$x0
000304480 588__ $$aDataset connected to CrossRef, PubMed, Journals: inrepo02.dkfz.de
000304480 650_7 $$2Other$$aArtificial intelligence
000304480 650_7 $$2Other$$aClinical decision
000304480 650_7 $$2Other$$aLarge language model
000304480 650_7 $$2Other$$aMultidisciplinary tumor board
000304480 650_7 $$2Other$$aSoft tissue sarcoma
000304480 650_2 $$2MeSH$$aHumans
000304480 650_2 $$2MeSH$$aSarcoma: therapy
000304480 650_2 $$2MeSH$$aSarcoma: pathology
000304480 650_2 $$2MeSH$$aBenchmarking: methods
000304480 650_2 $$2MeSH$$aCancer Care Facilities
000304480 650_2 $$2MeSH$$aLanguage
000304480 650_2 $$2MeSH$$aLarge Language Models
000304480 7001_ $$aKalisa, Aimé Terence$$b1
000304480 7001_ $$0P:(DE-HGF)0$$aRoohani, Siyer$$b2
000304480 7001_ $$aHummedah, Kamal$$b3
000304480 7001_ $$aMenge, Franka$$b4
000304480 7001_ $$aReißfelder, Christoph$$b5
000304480 7001_ $$aAlbertsmeier, Markus$$b6
000304480 7001_ $$aKasper, Bernd$$b7
000304480 7001_ $$aJakob, Jens$$b8
000304480 7001_ $$aYang, Cui$$b9
000304480 773__ $$0PERI:(DE-600)1459285-X$$a10.1007/s00432-025-06304-9$$gVol. 151, no. 9, p. 248$$n9$$p248$$tJournal of cancer research and clinical oncology$$v151$$x0301-1585$$y2025
000304480 909CO $$ooai:inrepo02.dkfz.de:304480$$pVDB
000304480 9101_ $$0I:(DE-588b)2036810-0$$6P:(DE-HGF)0$$aDeutsches Krebsforschungszentrum$$b2$$kDKFZ
000304480 9131_ $$0G:(DE-HGF)POF4-899$$1G:(DE-HGF)POF4-890$$2G:(DE-HGF)POF4-800$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bProgrammungebundene Forschung$$lohne Programm$$vohne Topic$$x0
000304480 9141_ $$y2025
000304480 915__ $$0StatID:(DE-HGF)3002$$2StatID$$aDEAL Springer$$d2024-12-20$$wger
000304480 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2024-12-20
000304480 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2024-12-20
000304480 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2024-12-20
000304480 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews$$d2024-12-20
000304480 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2024-12-20
000304480 915__ $$0StatID:(DE-HGF)1030$$2StatID$$aDBCoverage$$bCurrent Contents - Life Sciences$$d2024-12-20
000304480 915__ $$0StatID:(DE-HGF)1190$$2StatID$$aDBCoverage$$bBiological Abstracts$$d2024-12-20
000304480 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2024-12-20
000304480 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2024-12-20
000304480 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search$$d2024-12-20
000304480 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC$$d2024-12-20
000304480 9201_ $$0I:(DE-He78)BE01-20160331$$kBE01$$lDKTK Koordinierungsstelle Berlin$$x0
000304480 980__ $$ajournal
000304480 980__ $$aVDB
000304480 980__ $$aI:(DE-He78)BE01-20160331
000304480 980__ $$aUNRESTRICTED