At its 2022 annual meeting, the Surgical Outcomes Club—a leading consortium of surgeons and health services researchers dedicated to advancing surgical outcomes science—hosted a panel of four experts to discuss the growing role of predictive analytics and artificial intelligence (AI) in surgical research. The discussion centered on three core domains where AI is poised to make a significant impact: computer vision, digital transformation at the point of care, and the use of electronic health record (EHR) data. The panel addressed both the opportunities and the inherent challenges of integrating AI into surgical practice.
Computer Vision: Giving Machines Eyes in the OR
The increasing capture of surgical video—routinely generated during minimally invasive and robotic procedures, and now expanding into open surgeries—offers a new frontier for AI. Real-time video annotation powered by computer vision can help evaluate surgical performance, identify complex anatomy, and provide intraoperative feedback to mitigate technical errors. Beyond performance assessment, this technology holds promise for surgical education, supporting skills training and behavior review through tool tracking, hand tracking, and operative phase annotation.
With the advent of convolutional neural networks and other advanced models, video-based AI tools can now approach the visual complexity of surgery. These innovations can enhance surgeon training and potentially assist with decision-making during procedures. However, real-world implementation remains limited by several barriers, including the complexity of surgical environments, insufficient generalizability of current models, and a lack of large, annotated, and diverse datasets. Data sharing limitations and institutional barriers further complicate the creation of robust open-source datasets.
While current efforts rely on public video sources of inconsistent quality, a recent consensus suggests that retrospective training tools may be feasible within two years and that real-time applications may emerge within the next decade. Recognizing the early-stage nature of this field is critical to fostering collaboration between surgeons and engineers, ensuring that AI tools ultimately support, rather than replace, surgical expertise.
Building Surgical Intelligence Through Video-Based Analytics
Video analytics offer the potential to assist surgeons during operations by identifying key anatomical landmarks, outlining tumor margins, or analyzing instrument usage patterns. Particularly in rare or unexpected events—such as intraoperative bleeding—AI could provide scenario-based recommendations to guide next steps. Though not yet deployed in operating rooms, emerging research outlines how real-time decision support systems could soon become a reality.
Leveraging Data for Surgical Innovation
Despite the influx of new surgical devices, there is limited insight into how they compare with existing techniques. Video analysis can quantify how new technologies influence surgical workflow and learning curves, particularly during the adoption of minimally invasive or robotic procedures. Side-by-side comparisons of similar cases can inform best practices and highlight improvements or setbacks introduced by novel tools.
Surgical video provides “ground truth” data, offering unparalleled insight into intraoperative behavior. Beyond enhancing individual performance, these data can serve broader purposes, from reducing OR inefficiencies to defending medical decisions in legal contexts to informing medical device development. However, building the necessary infrastructure for data processing and analytics requires technical expertise far beyond traditional clinical training. Successful implementation hinges on interdisciplinary collaboration between clinicians and data scientists.
Addressing the Complexities of Surgical Video Analysis
While promising, the use of surgical video raises significant legal, ethical, and logistical questions. Ownership of the footage remains unclear, and concerns about patient privacy, staff exposure, and potential misuse can discourage open sharing—especially in complex or unfavorable cases. Nonetheless, video evidence can also serve as proof of adherence to standard care protocols.
Other concerns include potential conflicts of interest, data security, and a disconnect between those who generate the data (surgeons) and those capable of analyzing it (engineers). Despite these challenges, carefully selected use cases and clear goals can help bridge these divides.
Calls to Action: Accelerating AI Integration in Neurosurgery
The increasing public awareness of AI offers a strategic moment to integrate it meaningfully into neurosurgical practice. Four key actions are proposed:
- Establish AI Task Forces: Professional societies such as the American Association of Neurological Surgeons (AANS) and the Congress of Neurological Surgeons (CNS) should form joint task forces to define best practices, set data standards, and facilitate clinician-scientist collaboration. Subspecialty task forces should address domain-specific use cases, backed by dedicated research funding and aligned with interdisciplinary partners across related surgical and technical fields.
- Create Multi-Institutional Research Organizations: Single-institution efforts lack the scale and diversity needed to train robust AI models. Instead, we should foster independent, multi-institutional research entities—either nonprofit or for-profit—that can secure funding, manage cross-institutional data, and develop reusable tools for ML integration.
- Launch Conferences and Challenge Frameworks: There is a need for clinician-led conferences and grand challenges to define and advance AI use cases in surgery. Inspired by the “Common Task Framework” (CTF) model, such initiatives can attract diverse collaborators and reward clinically meaningful innovation. Dedicated tracks within surgical and technical conferences can help bridge the divide between these communities.
- Standardize Data Capture and Sharing: Video is just one of many valuable data streams in the modern OR. Integrating and standardizing these streams for AI use remains a challenge due to technical and regulatory hurdles. Collaborative efforts between surgical and technical communities, supported by new regulatory frameworks, can unlock the potential of OR data for clinical improvement.
By aligning clinical insight with technical innovation, the surgical community can unlock the transformative potential of AI. Through multidisciplinary efforts, structured collaboration, and a shared vision, we can bring next-generation surgical analytics from concept to clinical reality—benefiting patients, providers, and the entire healthcare ecosystem.
As a concrete example, one study developed an AI-ready dataset for model training by programmatically querying YouTube for videos of open surgical procedures, then selecting and manually annotating a subset of them. This dataset was used to train a multitask AI model, subsequently applied in two proof-of-concept studies: (1) generating “surgical signatures” that characterize procedural patterns, and (2) identifying hand motion kinematics indicative of surgeon experience and skill level.
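The study's exact retrieval pipeline is not detailed here, but a minimal sketch of programmatic video search against the YouTube Data API v3 might look like the following. The API key and search terms are placeholders, and the query strategy is an assumption for illustration.

```python
from googleapiclient.discovery import build  # pip install google-api-python-client

API_KEY = "YOUR_API_KEY"  # placeholder; requires a YouTube Data API v3 key

youtube = build("youtube", "v3", developerKey=API_KEY)

# Hypothetical procedure-specific queries; the study's actual search
# terms and filtering criteria may differ.
for query in ["open appendectomy procedure", "open thyroidectomy surgery"]:
    response = youtube.search().list(
        q=query, part="id,snippet", type="video", maxResults=50
    ).execute()
    for item in response["items"]:
        print(item["id"]["videoId"], item["snippet"]["title"])
```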
The resulting Annotated Videos of Open Surgery (AVOS) dataset comprises 1,997 videos spanning 23 procedure types, sourced from 50 countries over a 15-year period. To test real-world applicability, additional deidentified surgical videos were prospectively collected from a tertiary academic medical center (Beth Israel Deaconess Medical Center [BIDMC]), with IRB approval and patient consent.
Multitask Model Architecture and Training
A multitask neural network was trained on the AVOS dataset to perform spatiotemporal analysis of hands, tools, and actions in surgical video. The model captured procedural flow and fine motor behaviors in near real time, enabling simultaneous analysis across multiple tasks. To improve generalizability across varied operative conditions, data augmentation techniques—including flipping, scaling, rotation, and occlusion testing—were applied during training. An alternating task training strategy was used to optimize both spatial and temporal branches, with a dedicated training stream for hand-pose estimation.
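As a rough illustration of these two strategies, the sketch below pairs a torchvision augmentation pipeline (flipping, affine scaling and rotation, and random erasing as a stand-in for occlusion) with an alternating-task training loop over a toy two-head network. The architecture, task names, class counts, and batch shapes are assumptions for demonstration, not the published model.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T

# Augmentations approximating those described: flipping, scaling,
# rotation, and occlusion (random erasing masks a random patch).
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomAffine(degrees=15, scale=(0.8, 1.2)),
    T.RandomErasing(p=0.25),
])

class MultitaskNet(nn.Module):
    """Toy stand-in: a shared backbone feeding separate task heads."""
    def __init__(self, n_tools=7, n_actions=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleDict({
            "tools": nn.Linear(16, n_tools),
            "actions": nn.Linear(16, n_actions),
        })

    def forward(self, x, task):
        return self.heads[task](self.backbone(x))

model = MultitaskNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Alternating-task training: each step updates the shared backbone plus
# one task head, cycling between tasks (dummy batches shown here).
for step in range(10):
    for task, n_cls in [("tools", 7), ("actions", 5)]:
        x = torch.stack([augment(img) for img in torch.rand(8, 3, 224, 224)])
        y = torch.randint(0, n_cls, (8,))
        opt.zero_grad()
        loss_fn(model(x, task), y).backward()
        opt.step()
```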
Inference was performed by extracting batches of four frames at five-second intervals from each video. Background actions were filtered to ensure consistent comparisons across procedures.
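A minimal OpenCV sketch of that sampling scheme, assuming a constant-frame-rate video file (the function name and batching details are illustrative):

```python
import cv2

def sample_frame_batches(path, interval_s=5.0, batch_size=4):
    """Yield batches of consecutive frames, one batch per interval_s seconds."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if fps is unreported
    step = int(round(fps * interval_s))      # frames between batch starts
    idx, batch = 0, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step < batch_size:  # take the first frames of each interval
            batch.append(frame)
            if len(batch) == batch_size:
                yield batch
                batch = []
        idx += 1
    cap.release()
```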
Proof-of-Concept: Generating Surgical Signatures
The model was tested on previously unseen videos of appendectomies, pilonidal cystectomies, and thyroidectomies—procedures well-represented in the AVOS dataset. These videos were manually reviewed to confirm the presence of key operative steps, with durations ranging from 2 to 30 minutes. Using temporal averaging of model outputs, distinct surgical signatures were generated for each procedure, reflecting expected progressions in tool use and action (e.g., from cutting to suturing).
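The paper's exact smoothing procedure aside, temporal averaging of per-frame outputs can be sketched as a moving average over class probabilities; the window length here is an arbitrary assumption.

```python
import numpy as np

def surgical_signature(frame_probs, window=12):
    """Smooth per-frame tool/action probabilities into a procedure-level
    signature via a moving temporal average.

    frame_probs: (T, K) array of per-frame class probabilities.
    Returns a (T, K) smoothed trajectory that exposes, e.g., the expected
    shift from cutting toward suturing as a case progresses.
    """
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, frame_probs
    )
```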
These signatures serve as procedural benchmarks, and significant deviations from them may reflect disruptions in surgical flow, variations in technique, or complexity in a given case. This functionality offers the potential for early detection of surgical anomalies or challenges requiring expert intervention.
Proof-of-Concept: Quantifying Surgical Skill
To assess skill, the model was retrospectively applied to 101 surgical videos prospectively collected at BIDMC, including live procedures and simulated wound closures. Participants included 14 operators categorized as either trainees (medical students, residents) or experienced surgeons (fellows, attendings). Hand movements were tracked using bounding boxes and nine anatomical keypoints (thumb, index finger, and palm).
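From such keypoint tracks, frame-level kinematics follow directly. A minimal sketch, assuming a (frames x keypoints x 2) coordinate array and using centroid speed as a representative metric (not the study's exact feature definitions):

```python
import numpy as np

def hand_speed(keypoints, fps=30.0):
    """keypoints: (T, 9, 2) array of per-frame hand keypoint positions
    in pixels. Returns per-frame speed of the hand centroid (px/s)."""
    centroid = keypoints.mean(axis=1)          # (T, 2) mean keypoint position
    disp = np.diff(centroid, axis=0)           # frame-to-frame displacement
    return np.linalg.norm(disp, axis=1) * fps  # convert to pixels per second
```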
Kinematic metrics—including velocity, rotation, and translation—were extracted and summarized into a single compound skill score using principal component analysis. Logistic regression analysis showed this compound feature significantly predicted surgeon experience, with each unit increase associated with a 3.6-fold increase in odds of being an experienced surgeon (95% CI: 1.67–7.62; p = 0.001).
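A compact scikit-learn sketch of that pipeline, with synthetic data standing in for the study's kinematic features; exponentiating the fitted logistic coefficient yields the odds ratio per unit increase in the compound score (reported as 3.6 in the study).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# X: per-video kinematic summaries (velocity, rotation, translation, ...);
# y: 1 = experienced surgeon, 0 = trainee. Values here are synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(101, 6))
y = rng.integers(0, 2, size=101)

# Collapse correlated kinematic metrics into one compound skill score.
score = PCA(n_components=1).fit_transform(StandardScaler().fit_transform(X))

# Logistic regression on the compound score; exp(coef) is the odds ratio.
clf = LogisticRegression().fit(score, y)
odds_ratio = float(np.exp(clf.coef_[0, 0]))
print(f"Odds ratio per unit increase in compound score: {odds_ratio:.2f}")
```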
Implications for Surgical Education and AI-Augmented Assessment
The multitask model demonstrated procedure-agnostic capabilities, performing reliably across variable video conditions such as lighting and camera angles. The ability to analyze both procedural flow and individual surgeon behavior marks a major advance toward automated, objective surgical feedback.
By linking hand motion patterns to surgical expertise, the model offers actionable insights for training. For instance, AI-driven feedback on motion economy and steadiness could allow trainees to iteratively improve performance, aligning with best practices observed in expert surgeons. This scalable, unbiased approach to skill assessment may facilitate faster and more reliable surgical training, especially in simulation-based environments.
Sources:
https://doi.org/10.1001/jamasurg.2022.5444