Comparative Evaluation of SAM2 and SAM1 for 2D and 3D Medical Image Segmentation: Performance Insights and Transfer Learning Potential

Recent segmentation foundation models such as the Segment Anything Model (SAM) have shown impressive performance on natural images and videos, but how well they transfer to medical data is still unclear. SAM, trained on a vast dataset of natural images, struggles with medical images because of domain differences such as lower resolution and modality-specific imaging challenges. Although MedSAM has improved 2D medical image segmentation through fine-tuning, its performance on 3D images and videos is limited. SAM2 extends SAM to video segmentation and shows promise, but its effectiveness on medical data, particularly 3D images and videos, has yet to be fully evaluated.

Researchers from the University Health Network and the University of Toronto evaluated the Segment Anything Model 2 (SAM2) across 11 medical image modalities and videos. They compared SAM2 with SAM1 and MedSAM, identifying both strengths and weaknesses. They also developed a transfer learning pipeline to adapt SAM2 for medical use and fine-tuned the model with it. Additionally, they integrated SAM2 into a 3D Slicer plugin and implemented a Gradio API, enabling efficient 3D image and video segmentation for medical data such as CT, MR, and PET, which the official SAM2 interface does not support.

The study used public datasets from the CVPR 2024 Medical Image Segmentation on Laptop Challenge for evaluation, excluding any data from the MedSAM training set. CT images were preprocessed with intensity cutoffs, MR and PET images were clipped and normalized, while other modalities remained unchanged. All images were converted to npz format for batch inference. SAM2, an extension of SAM1, incorporates Hiera for multi-scale feature extraction and a memory attention module for consistent video segmentation across frames. The fine-tuning of SAM2-Tiny involved freezing the prompt encoder, updating the image encoder and mask decoder, and using Dice and cross-entropy losses for robust segmentation.
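The preprocessing described above can be illustrated with a minimal NumPy sketch. The article does not give the exact cutoffs, so the CT window (level 40, width 400) and the MR/PET percentiles (0.5 and 99.5 of nonzero voxels) are assumptions chosen for illustration, as are the function names and the rescaling to the 0-255 range.

```python
import numpy as np


def window_ct(volume, level=40.0, width=400.0):
    """Clip CT intensities to a window and rescale to [0, 255].

    The level/width defaults are illustrative assumptions, not values
    taken from the study.
    """
    lower, upper = level - width / 2.0, level + width / 2.0
    clipped = np.clip(np.asarray(volume, dtype=np.float64), lower, upper)
    scaled = (clipped - lower) / (upper - lower) * 255.0
    return scaled.astype(np.uint8)


def clip_and_normalize(volume, low_pct=0.5, high_pct=99.5):
    """Clip MR/PET intensities to percentiles of the nonzero voxels,
    then rescale to [0, 255]. The percentile defaults are assumptions."""
    volume = np.asarray(volume, dtype=np.float64)
    foreground = volume[volume > 0]
    if foreground.size == 0:
        return np.zeros(volume.shape, dtype=np.uint8)
    lower, upper = np.percentile(foreground, [low_pct, high_pct])
    if upper <= lower:
        # Constant foreground: nothing meaningful to rescale.
        return np.zeros(volume.shape, dtype=np.uint8)
    clipped = np.clip(volume, lower, upper)
    scaled = (clipped - lower) / (upper - lower) * 255.0
    return scaled.astype(np.uint8)


if __name__ == "__main__":
    ct = np.array([-1000.0, -160.0, 40.0, 240.0, 3000.0])
    print(window_ct(ct))
    mr = np.arange(0, 101, dtype=np.float64).reshape(1, 101)
    print(clip_and_normalize(mr).max())
```

Preprocessed arrays like these can then be stored with `np.savez_compressed` to obtain the npz files used for batch inference; the key names inside each file are up to the pipeline.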

The benchmark dataset used in the study includes 11 commonly used medical image modalities, such as CT, MRI, PET, and ultrasound. SAM2, a versatile image and video segmentation model, was evaluated on 2D, 3D, and video datasets. Comparisons were made with SAM1 and MedSAM across various model sizes. The evaluation involved segmenting 2D images directly, while 3D images were treated as sequences of 2D slices, with segmentation masks propagated from the middle slice. SAM2’s video segmentation capability allowed it to handle dynamic object locations across frames, which is particularly useful for ultrasound and endoscopy videos.
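The idea of treating a 3D volume as a slice sequence and propagating from the middle slice can be sketched as follows. This is a simplified stand-in, not SAM2's mechanism: SAM2 carries information across frames with its memory attention module, whereas this sketch simply reuses the bounding box of the previous slice's mask as the prompt for the next slice. The `segment_slice` callable and the `threshold_in_box` toy segmenter are hypothetical placeholders for a promptable model.

```python
import numpy as np


def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) of a binary mask, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def propagate_from_middle(volume, middle_box, segment_slice):
    """Segment the middle slice from a box prompt, then walk towards both
    ends of the volume, prompting each slice with the previous mask's box."""
    depth = volume.shape[0]
    mid = depth // 2
    masks = np.zeros(volume.shape, dtype=bool)
    masks[mid] = segment_slice(volume[mid], middle_box)
    for step in (1, -1):  # forward pass, then backward pass
        box = mask_to_box(masks[mid])
        idx = mid + step
        while 0 <= idx < depth and box is not None:
            masks[idx] = segment_slice(volume[idx], box)
            box = mask_to_box(masks[idx])  # stop once the object disappears
            idx += step
    return masks


def threshold_in_box(image, box, threshold=0.5):
    """Toy segmenter (hypothetical): threshold the pixels inside the box."""
    x0, y0, x1, y1 = box
    mask = np.zeros(image.shape, dtype=bool)
    mask[y0:y1 + 1, x0:x1 + 1] = image[y0:y1 + 1, x0:x1 + 1] > threshold
    return mask


if __name__ == "__main__":
    vol = np.zeros((5, 8, 8))
    vol[1:4, 2:5, 2:5] = 1.0  # a small cube spanning slices 1 to 3
    result = propagate_from_middle(vol, (2, 2, 4, 4), threshold_in_box)
    print(result.sum(axis=(1, 2)))
```

Starting from the middle slice is a natural choice because the target is usually largest there, so the initial prompt is least ambiguous; propagation in each direction stops as soon as a slice yields an empty mask.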

The results showed that SAM2 outperformed SAM1 in several modalities like MR and dermoscopy, but MedSAM consistently achieved better results in most 2D modalities except for PET and light microscopy. In 3D segmentation, SAM2 demonstrated significant improvements over SAM1 and MedSAM in CT and MR images by leveraging its video segmentation capabilities. However, SAM2 struggled with PET images due to over-segmentation errors. Transfer learning was applied to adapt SAM2 to medical domains, resulting in substantial performance gains across various organs in 3D CT scans. To enhance accessibility for medical professionals, user-friendly interfaces based on 3D Slicer and Gradio were developed for general 3D medical image and video segmentation.
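The fine-tuning objective mentioned earlier, a combination of Dice and cross-entropy losses, can be sketched in NumPy for a single binary mask. The equal weighting of the two terms, the smoothing constant `eps`, and the function name are assumptions; the article does not state how the study combined them.

```python
import numpy as np


def dice_ce_loss(logits, target, eps=1e-6):
    """Sum of soft Dice loss and binary cross-entropy for one mask.

    logits: raw (pre-sigmoid) predictions; target: 0/1 ground truth.
    Equal weighting of the two terms is an assumption.
    """
    logits = np.asarray(logits, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Numerically stable sigmoid.
    probs = 0.5 * (1.0 + np.tanh(0.5 * logits))

    # Soft Dice: overlap between predicted probabilities and the target.
    intersection = np.sum(probs * target)
    dice = (2.0 * intersection + eps) / (np.sum(probs) + np.sum(target) + eps)
    dice_loss = 1.0 - dice

    # Binary cross-entropy computed directly from logits (stable form).
    bce = np.mean(
        np.maximum(logits, 0.0)
        - logits * target
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return dice_loss + bce


if __name__ == "__main__":
    target = np.array([[0, 1], [1, 0]])
    good = np.where(target == 1, 20.0, -20.0)  # confident and correct
    bad = -good                                # confident and wrong
    print(dice_ce_loss(good, target), dice_ce_loss(bad, target))
```

The Dice term rewards region overlap and is less sensitive to class imbalance, while cross-entropy gives a per-pixel gradient; combining them is a common choice for medical segmentation.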

In conclusion, the study compares the performance of SAM2 and SAM1 in medical image segmentation, revealing that SAM2 outperforms SAM1 in certain 2D and 3D modalities, like MRI and CT, due to its advanced architecture and training on larger datasets. However, SAM1 performs better in others, such as OCT and PET. The analysis shows that model size alone does not determine performance, as smaller SAM2 variants sometimes outperform larger ones. SAM2’s video segmentation capabilities enhance its utility for 3D medical images, but the model still lags behind the specialized MedSAM in 2D tasks. The paper also highlights the potential for transfer learning to improve SAM2’s medical image segmentation performance.

