Enhancing clinicians' trust in large language models via transparent source attribution: A randomized controlled evaluation in uro-oncology.

Carl, Nicolas; Wessels, Frederik; Mangold, Maurin Helen; Winterstein, Jana Theres; Haggenmüller, Sarah; Michel, Maurice Stephan; Worst, Thomas Stefan; Hetz, Martin Joachim; Wies, Christoph; Maywald, Lasse; Brinker, Titus; Westhoff, Niklas

doi:10.1016/j.ejca.2025.116168

Journal Article

DKFZ-2025-02974


Carl, N. (First author) DKFZ* ; Hetz, M. J. DKFZ* ; Wies, C. DKFZ* ; Haggenmüller, S. DKFZ* ; Winterstein, J. T. DKFZ* ; Mangold, M. H. ; Maywald, L. ; Worst, T. S. ; Westhoff, N. ; Michel, M. S. ; Wessels, F. ; Brinker, T. (Last author) DKFZ*

2026
Elsevier Amsterdam [u.a.]

European journal of cancer 233, 116168 (2026) [10.1016/j.ejca.2025.116168]

Abstract: Large language models (LLMs) are used to answer queries in urology and oncology, yet their performance is limited by outdated data and missing source transparency, which undermines clinical reliability and therefore adoption. We developed UroBot, a urology-specific chatbot that integrates retrieval-augmented generation (RAG) to provide in-line references and source text previews for each response. In a randomized controlled reader study, UroBot and ChatGPT were compared across ten uro-oncological case rounds. Thirty urologists assessed recommendation correctness, source verifiability and trust, with preference ratings collected after each round. UroBot performed significantly better than ChatGPT in recommendation correctness (73 % vs. 50 %; p < 0.001), source attribution (74 % vs. 30 %; p < 0.001) and verifiability of sources (84 % vs. 35 %; p < 0.001). Furthermore, clinicians consistently preferred UroBot for accuracy, source verifiability and trust. Qualitative analysis showed that ChatGPT often produced vague or incorrect citations: 28 % were non-existent or outdated and 83 % lacked specific sections. UroBot, by contrast, achieved complete alignment at guideline sub-section and page level. These gains in citation precision were mirrored by higher clinician ratings for verifiability and trust. Limitations include the small sample of ten cases, chosen for feasibility reasons, which may not cover the full uro-oncological spectrum. Our findings show that combining LLMs with RAG, in-line references and source text previews markedly enhances perceived source attribution and verifiability compared with state-of-the-art conventional LLMs. Importantly, this approach is readily transferable across medical subspecialties, enabling reliable and up-to-date clinical decision support.
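The abstract describes UroBot only at the level of "RAG with in-line references and source text previews". The Python sketch below illustrates that general pattern. It is not UroBot's actual pipeline. The sketch does three things:

- It retrieves passages.
- It numbers them in the prompt.
- It resolves each [n] marker in the model's answer to a source label (sub-section and page) plus a short preview.

It rests on several assumptions:

- Simple keyword overlap stands in for embedding-based retrieval.
- A fixed answer string replaces the LLM call.
- The documents are placeholders, not guideline text.
- All names (Passage, retrieve, build_prompt, attach_previews) are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    """One retrievable chunk of a source document (hypothetical structure)."""
    source: str    # document title shown in the citation
    location: str  # sub-section and page, e.g. "section 3.4, p. 12"
    text: str


def tokenize(text: str) -> set:
    """Lower-case word set; a crude stand-in for embedding-based similarity."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Return the k passages sharing the most words with the query (ties keep corpus order)."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list) -> str:
    """Number the retrieved passages so the model can cite them in-line as [n]."""
    context = "\n".join(f"[{i}] {p.text}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the numbered sources and cite them in-line as [n].\n"
        f"{context}\n\nQuestion: {query}"
    )


def attach_previews(answer: str, passages: list, width: int = 60) -> list:
    """Resolve each [n] marker in the answer to a source label plus a text preview."""
    previews = []
    for n in sorted({int(m) for m in re.findall(r"\[(\d+)\]", answer)}):
        if not 1 <= n <= len(passages):
            continue  # marker points to no retrieved passage: drop it, do not guess
        p = passages[n - 1]
        snippet = p.text if len(p.text) <= width else p.text[:width].rstrip() + "..."
        previews.append(f"[{n}] {p.source}, {p.location}: {snippet}")
    return previews


# Usage with placeholder documents (not real guideline content).
corpus = [
    Passage("Document A", "section 2.1, p. 5", "placeholder text on radiotherapy dosing"),
    Passage("Document B", "section 3.4, p. 12", "placeholder text on surgical follow-up intervals"),
    Passage("Document C", "section 1.1, p. 2", "unrelated placeholder text"),
]
query = "What are recommended follow-up intervals after surgical treatment?"
passages = retrieve(query, corpus)
prompt = build_prompt(query, passages)
# Stub standing in for the LLM response to `prompt`; [5] is a deliberately invalid marker.
answer = "Follow-up intervals are described in [1]; compare [2] and [5]."
previews = attach_previews(answer, passages)
for line in previews:
    print(line)
```

Run as-is, the sketch prints two preview lines: one for [1] (Document B) and one for [2] (Document A). The invalid [5] marker is dropped rather than resolved. In this sketch, that is how a displayed citation is kept traceable to a passage that was actually retrieved.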

Keyword(s): Chatbot ; Explainability ; In-line references ; Retrieval augmented generation ; Source text preview ; Verifiability

Classification:

ddc:610

Note: #EA:C140#LA:C140# / Available online 11 December 2025 / 2026 Jan 17;233:116168

Contributing Institute(s):

Digital Prevention, Diagnostics and Therapy Guidance (C140)

Research Program(s):

313 - Cancer Risk Factors and Prevention (POF4-313)

Appears in the scientific report 2025

Database coverage:
Medline ; BIOSIS Previews ; Biological Abstracts ; Clarivate Analytics Master Journal List ; Current Contents - Clinical Medicine ; Current Contents - Life Sciences ; Ebsco Academic Search ; Essential Science Indicators ; IF >= 5 ; JCR ; Nationallizenz ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection


The record appears in these collections:
Document types > Articles > Journal Article
Public records
Publications database
Open Access

Record created 2025-12-17, last modified 2026-03-05
