Medical Multimodal Segmentation Using Foundation Models

Project Overview

This research undertakes a comprehensive reproducibility analysis and extension of "SegVol: Universal and Interactive Volumetric Medical Image Segmentation," a 3D foundation model for interactive volumetric medical image segmentation. The project validates SegVol's claims, assesses its geometric robustness to rotations, and proposes novel improvements using group equivariant convolutions.

Key Features

  • Reproduction of SegVol's core experiments on organ and lesion segmentation tasks
  • Evaluation of different prompting strategies (spatial, semantic, and combined)
  • Assessment of model robustness to global rotations in medical imaging (see the evaluation sketch after this list)
  • Novel contribution using SO(3)-equivariant steerable convolutions
  • Parameter-efficient adaptation techniques (LoRA and side-tuning)
  • Comprehensive analysis of strengths, weaknesses, and improvement potential
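
As a concrete illustration of the rotation-robustness assessment, the sketch below applies global axis-aligned 90-degree rotations to a volume and its label, re-runs segmentation, and reports the worst-case Dice drop. This is a minimal sketch: the `model` callable, the tensor shapes, and the restriction to 90-degree rotations are illustrative assumptions, not the exact protocol used in the project.

    import torch

    def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
        """Dice Similarity Coefficient between two binary masks."""
        pred, target = pred.bool(), target.bool()
        inter = (pred & target).sum().float()
        return (2 * inter / (pred.sum() + target.sum() + eps)).item()

    @torch.no_grad()
    def rotation_robustness(model, volume: torch.Tensor, label: torch.Tensor):
        """Measure the Dice drop under global axis-aligned 90-degree rotations.

        `model` is assumed to map a (1, 1, D, H, W) volume to a binary mask of
        the same shape; `volume` and `label` are (D, H, W) tensors.
        """
        baseline = dice_score(model(volume[None, None])[0, 0], label)
        scores = {}
        for k in (1, 2, 3):                        # 90-, 180-, 270-degree turns
            for dims in ((1, 2), (0, 2), (0, 1)):  # the three anatomical planes
                vol_r = torch.rot90(volume, k, dims)
                lab_r = torch.rot90(label, k, dims)
                scores[(k, dims)] = dice_score(model(vol_r[None, None])[0, 0], lab_r)
        worst_drop = baseline - min(scores.values())
        return baseline, scores, worst_drop

A drop of this kind on specific organs is what motivated the geometry-aware adaptation described under Technical Implementation.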

Technical Implementation

The reproduction experiments were built on the original SegVol codebase, modified to support the reproducibility studies. The model architecture combines a Vision Transformer (ViT) for volumetric encoding, a CLIP text encoder for semantic prompts, and custom spatial prompt encoders. For the novel contribution, SO(3)-equivariant patch embeddings were developed using steerable convolutions from the ESCNN library, combined with LoRA adaptation modules for parameter-efficient fine-tuning. Performance was evaluated with the Dice Similarity Coefficient, and geometric robustness was assessed through systematic rotation experiments.
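
To make the two adaptation components concrete, the sketch below shows a minimal SO(3)-equivariant 3D patch embedding built with ESCNN steerable convolutions, together with a simple LoRA wrapper for a linear layer. The field multiplicities, patch size, and LoRA rank are illustrative assumptions rather than the project's actual configuration.

    import torch
    import torch.nn as nn
    from escnn import gspaces, nn as enn

    class SO3PatchEmbed3D(nn.Module):
        """SO(3)-equivariant patch embedding: one steerable R3Conv with stride equal to the patch size."""

        def __init__(self, in_channels: int = 1, fields: int = 8, patch_size: int = 16):
            super().__init__()
            self.gspace = gspaces.rot3dOnR3()  # continuous rotations SO(3) acting on R^3
            self.in_type = enn.FieldType(self.gspace, in_channels * [self.gspace.trivial_repr])
            # Band-limited output features: low-frequency SO(3) irreps per field.
            self.out_type = enn.FieldType(
                self.gspace, fields * [self.gspace.irrep(0), self.gspace.irrep(1)]
            )
            self.proj = enn.R3Conv(self.in_type, self.out_type,
                                   kernel_size=patch_size, stride=patch_size)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = enn.GeometricTensor(x, self.in_type)  # wrap a (B, C, D, H, W) tensor
            return self.proj(x).tensor                # (B, C', D/ps, H/ps, W/ps)

    class LoRALinear(nn.Module):
        """Frozen linear layer plus a trainable low-rank update: y = W x + scale * B A x."""

        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)               # only the LoRA factors are trained
            self.lora_a = nn.Linear(base.in_features, rank, bias=False)
            self.lora_b = nn.Linear(rank, base.out_features, bias=False)
            nn.init.zeros_(self.lora_b.weight)        # the low-rank update starts at zero
            self.scale = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

In a full pipeline, a wrapper like LoRALinear would typically be applied to the attention projections of the ViT encoder, while the equivariant module replaces the standard strided-convolution patch embedding.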

Key Findings & Impact

The project successfully reproduced SegVol's superior performance over task-specific baselines and validated the benefits of combined spatial-semantic prompting. It also identified critical robustness issues under global rotations (up to a 26% performance drop on certain organs), demonstrating the need for geometry-aware architectures in medical imaging. The proposed SO(3)-equivariant adaptation achieved performance comparable to rotation-augmented baselines while requiring 17% fewer training epochs, a promising data-efficiency gain for geometric robustness.

Publication

Co-authored with Danny van den Berg, Roan van Blanken, and Jesse Brouwers under the supervision of Teaching Assistant Stefanos Achlatis at the University of Amsterdam. This work reproduces and extends the SegVol paper by Du et al. (2023), providing insights into the geometric robustness of foundation-model-based medical image segmentation for clinical applications.