Bots in white coats: are large language models the future of patient education? a multi-center cross-sectional analysis.

Aghamaliyev, Ughur; Ilmer, Matthias; Werner, Jens; Thomas, Michael; Guba, Markus O; Kahlert, Christoph; Schölch, Sebastian; Niess, Hanno; Zamparas, Athanasios; Krautz, Christian; Karimbayli, Javad; Bösch, Florian; Renz, Bernhard W; Schmidt, Thomas; Angele, Martin K

doi:10.1097/JS9.0000000000002250

TY  - JOUR
AU  - Aghamaliyev, Ughur
AU  - Karimbayli, Javad
AU  - Zamparas, Athanasios
AU  - Bösch, Florian
AU  - Thomas, Michael
AU  - Schmidt, Thomas
AU  - Krautz, Christian
AU  - Kahlert, Christoph
AU  - Schölch, Sebastian
AU  - Angele, Martin K
AU  - Niess, Hanno
AU  - Guba, Markus O
AU  - Werner, Jens
AU  - Ilmer, Matthias
AU  - Renz, Bernhard W
TI  - Bots in white coats: are large language models the future of patient education? a multi-center cross-sectional analysis.
JO  - International journal of surgery
VL  - 111
IS  - 3
SN  - 1743-9191
CY  - Amsterdam [u.a.]
PB  - Elsevier Science
M1  - DKFZ-2025-00253
SP  - 2376-2384
PY  - 2025
N1  - 2025 Mar 1;111(3):2376-2384
AB  - Every year, around 300 million surgeries are conducted worldwide, with an estimated 4.2 million deaths occurring within 30 days after surgery. Adequate patient education is crucial, but often falls short due to the stress patients experience before surgery. Large language models (LLMs) can significantly enhance this process by delivering thorough information and addressing patient concerns that might otherwise go unnoticed. This cross-sectional study evaluated ChatGPT-4o's audio-based responses to frequently asked questions (FAQs) regarding six general surgical procedures. Three experienced surgeons and two senior residents formulated seven general and three procedure-specific FAQs for both preoperative and postoperative situations, covering six surgical scenarios (major: pancreatic head resection, rectal resection, total gastrectomy; minor: cholecystectomy, Lichtenstein procedure, hemithyroidectomy). In total, 120 audio responses were generated, transcribed, and assessed by 11 surgeons from six different German university hospitals. ChatGPT-4o demonstrated strong performance, achieving an average score of 4.12/5 for accuracy, 4.46/5 for relevance, and 0.22/5 for potential harm across 120 questions. Postoperative responses surpassed preoperative ones in both accuracy and relevance, while also exhibiting lower potential for harm. Additionally, responses related to minor surgeries were slightly, but significantly, more accurate than those for major surgeries. This study underscores GPT-4o's potential to enhance patient education both before and after surgery by delivering accurate and relevant responses to FAQs about various surgical procedures. Responses regarding the postoperative course proved to be more accurate and less harmful than those addressing the preoperative course. Although a few responses carried moderate risks, the overall performance was robust, indicating GPT-4o's value in patient education.
The study suggests the development of hospital-specific applications or the integration of GPT-4o into interactive robotic systems to provide patients with reliable, immediate answers, thereby improving patient satisfaction and informed decision-making.
LB  - PUB:(DE-HGF)16
C6  - pmid:39878073
DO  - DOI:10.1097/JS9.0000000000002250
UR  - https://inrepo02.dkfz.de/record/298349
ER  -
