TY - JOUR
AU - Stueker, Esther Helene
AU - Kolbinger, Fiona R
AU - Saldanha, Oliver Lester
AU - Digomann, David
AU - Pistorius, Steffen
AU - Oehme, Florian
AU - Van Treeck, Marko
AU - Ferber, Dyke
AU - Löffler, Chiara Maria Lavinia
AU - Weitz, Jürgen
AU - Distler, Marius
AU - Kather, Jakob Nikolas
AU - Muti, Hannah Sophie
TI - Vision-language models for automated video analysis and documentation in laparoscopic surgery: a proof-of-concept study.
JO - International journal of surgery
VL - nn
SN - 1743-9191
CY - Amsterdam [u.a.]
PB - Elsevier Science
M1 - DKFZ-2025-01431
SP - nn
PY - 2025
N1 - epub
AB - The ongoing shortage of medical personnel highlights the urgent need to automate clinical documentation and reduce administrative burden. Large Vision-Language Models (VLMs) offer promising potential for supporting surgical documentation and intraoperative analysis. We conducted an observational, comparative performance study of two general-purpose VLMs, GPT-4o (OpenAI) and Gemini-1.5-pro (Google), from June to September 2024, using 15 cholecystectomy and 15 appendectomy videos (1 fps) from the CholecT45 and LapApp datasets. Tasks included object detection (vessel clips, gauze, retrieval bags, bleeding), surgery type classification, appendicitis grading, and surgical report generation. In-context learning (ICL) was evaluated as an enhancement method. Performance was assessed using descriptive accuracy metrics. Both models identified vessel clips with 100
KW - appendectomy (Other)
KW - cholecystectomy (Other)
KW - minimally invasive surgery (Other)
KW - surgical video analysis (Other)
KW - vision-language models (Other)
LB - PUB:(DE-HGF)16
C6 - pmid:40679978
DO - DOI:10.1097/JS9.0000000000003069
UR - https://inrepo02.dkfz.de/record/302984
ER -