%0 Journal Article
%A Stueker, Esther Helene
%A Kolbinger, Fiona R
%A Saldanha, Oliver Lester
%A Digomann, David
%A Pistorius, Steffen
%A Oehme, Florian
%A Van Treeck, Marko
%A Ferber, Dyke
%A Löffler, Chiara Maria Lavinia
%A Weitz, Jürgen
%A Distler, Marius
%A Kather, Jakob Nikolas
%A Muti, Hannah Sophie
%T Vision-language models for automated video analysis and documentation in laparoscopic surgery: a proof-of-concept study.
%J International journal of surgery
%V nn
%@ 1743-9191
%C Amsterdam [u.a.]
%I Elsevier Science
%M DKFZ-2025-01431
%P nn
%D 2025
%Z epub
%X The ongoing shortage of medical personnel highlights the urgent need to automate clinical documentation and reduce administrative burden. Large Vision-Language Models (VLMs) offer promising potential for supporting surgical documentation and intraoperative analysis. We conducted an observational, comparative performance study of two general-purpose VLMs, GPT-4o (OpenAI) and Gemini-1.5-pro (Google), from June to September 2024, using 15 cholecystectomy and 15 appendectomy videos (1 fps) from the CholecT45 and LapApp datasets. Tasks included object detection (vessel clips, gauze, retrieval bags, bleeding), surgery type classification, appendicitis grading, and surgical report generation. In-context learning (ICL) was evaluated as an enhancement method. Performance was assessed using descriptive accuracy metrics. Both models identified vessel clips with 100
%K appendectomy (Other)
%K cholecystectomy (Other)
%K minimally invasive surgery (Other)
%K surgical video analysis (Other)
%K vision-language models (Other)
%F PUB:(DE-HGF)16
%9 Journal Article
%$ pmid:40679978
%R 10.1097/JS9.0000000000003069
%U https://inrepo02.dkfz.de/record/302984