Journal Article DKFZ-2026-00280

Large-scale self-supervised video foundation model for intelligent surgery.

2026
Macmillan Publishers Limited [Basingstoke]

npj digital medicine nn, nn () [10.1038/s41746-026-02403-0]

Abstract: Computer-Assisted Intervention has the potential to revolutionize modern surgery, with surgical scene understanding serving as a critical component in supporting decision-making and improving procedural efficacy. While existing AI-driven approaches alleviate annotation burdens via self-supervised spatial representation learning, their lack of explicit temporal modeling during pre-training fundamentally restricts the capture of dynamic surgical contexts, resulting in incomplete spatiotemporal understanding. In this work, we introduce the first video-level surgical pre-training framework that enables joint spatiotemporal representation learning from large-scale surgical video data. To achieve this, we constructed a large-scale surgical video dataset comprising 3650 videos and 3.55 million frames, spanning more than 20 surgical procedures and over 10 anatomical structures. Building upon this dataset, we propose SurgVISTA (Surgical Video-level Spatial-Temporal Architecture), a reconstruction-based pre-training method that jointly captures intricate spatial structures and temporal dynamics. Additionally, SurgVISTA incorporates image-level knowledge distillation guided by a surgery-specific expert model to enhance the learning of fine-grained anatomical and semantic features. To validate its effectiveness, we established a comprehensive benchmark comprising 13 video-level datasets spanning six surgical procedures across four tasks. Extensive experiments show that SurgVISTA consistently outperforms both natural- and surgical-domain pre-trained models, indicating strong potential to advance intelligent surgical systems in clinically meaningful scenarios.

Note: #NCTZFB26# / epub

Contributing Institute(s):
  1. E130 Intelligente Medizinische Systeme (E130)
  2. Koordinierungsstelle NCT Heidelberg (HD02)
Research Program(s):
  1. 315 - Bildgebung und Radioonkologie (POF4-315)

Appears in the scientific report 2026
Database coverage:
Medline ; DOAJ ; Article Processing Charges ; Clarivate Analytics Master Journal List ; Current Contents - Clinical Medicine ; DOAJ Seal ; Essential Science Indicators ; Fees ; IF >= 15 ; JCR ; PubMed Central ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Public records
Publications database

 Record created 2026-02-05, last modified 2026-02-05


