Fast Segment Anything

Paper ｜ GitHub

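The Fast Segment Anything Model (FastSAM) is a CNN-based Segment Anything model trained on only 1/50 of the SA-1B dataset released by the SAM authors. FastSAM achieves performance comparable to SAM while running 50 times faster.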

Comparative analysis of FastSAM and SAM

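Model Description

FastSAM decomposes the segment anything task into two sequential stages: all-instance segmentation and prompt-guided selection. The first stage relies on a detector based on a convolutional neural network (CNN), which produces segmentation masks for every instance in the image. The second stage outputs the region of interest that corresponds to the prompt. By exploiting the computational efficiency of CNNs, FastSAM achieves real-time segment anything without much loss in segmentation quality.
Key ideas and contributions: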

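FastSAM is built on YOLOv8-seg. It runs 50 times faster than SAM, uses only 1/50 of SAM's training data, and its running speed does not depend on the number of input points.
FastSAM can perform segmentation tasks driven by prompts.
The segment anything task is split into two stages: the first performs all-instance segmentation of the input image, and the second filters the all-instance segmentation results according to the prompt input.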
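
The second-stage filtering described above can be illustrated with a minimal sketch. It is not the FastSAM implementation: the helper names (`select_by_point`, `select_by_box`) are hypothetical, masks are assumed to be a boolean array of shape (n, H, W), and the box rule (highest IoU between the prompt box and each mask's bounding box) is a simplifying assumption.

```python
import numpy as np

def select_by_point(masks, point):
    """Return indices of all masks that cover the pixel (x, y)."""
    x, y = point
    return [i for i, m in enumerate(masks) if m[y, x]]

def mask_bbox(mask):
    """Tight [x1, y1, x2, y2) box around a mask, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return (xs.min(), ys.min(), xs.max() + 1, ys.max() + 1)

def box_iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def select_by_box(masks, box):
    """Return the index of the mask whose bounding box best matches the prompt box."""
    ious = []
    for m in masks:
        bb = mask_bbox(m)
        ious.append(0.0 if bb is None else box_iou(bb, box))
    return int(np.argmax(ious))

# Two toy instances on an 8x8 image (stand-ins for first-stage output).
masks = np.zeros((2, 8, 8), dtype=bool)
masks[0, 1:4, 1:4] = True
masks[1, 4:8, 4:8] = True

point_hits = select_by_point(masks, (5, 6))   # point given as (x, y)
box_hit = select_by_box(masks, (0, 0, 4, 4))  # box given as (x1, y1, x2, y2)
```

Because the first stage has already produced every mask, each prompt only costs a cheap lookup over the existing masks, which is consistent with the claim that speed does not depend on the number of input points.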

The framework of FastSAM

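In the first stage, YOLOv8-Seg is used directly for all-instance segmentation. The YOLOv8-Seg detection branch outputs object categories and bounding boxes, while the segmentation branch outputs k prototypes together with the corresponding k mask coefficients. The two tasks run in parallel. The segmentation branch takes a high-resolution feature map as input, which preserves spatial detail and semantic information. This feature map passes through a convolution layer, is upsampled, and then passes through two more convolution layers to produce the masks.
In the second stage, various prompts are used to identify the specific objects of interest. Point prompts, box prompts, and text prompts are supported.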
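
How k prototypes and k mask coefficients combine into instance masks in the first stage can be sketched as follows. This is a NumPy illustration of the general prototype-based scheme, not code from YOLOv8-Seg or FastSAM: the function name `assemble_masks`, the sigmoid activation, and the 0.5 threshold are assumptions, and steps such as cropping each mask to its bounding box are omitted.

```python
import numpy as np

def assemble_masks(prototypes, coeffs, threshold=0.5):
    """Combine prototypes with per-instance coefficients.

    prototypes: float array of shape (k, H, W)
    coeffs:     float array of shape (n, k), one row per detected instance
    returns:    boolean array of shape (n, H, W)
    """
    k, h, w = prototypes.shape
    # Each instance mask is a linear combination of the k prototypes.
    logits = coeffs @ prototypes.reshape(k, h * w)   # (n, H*W)
    probs = 1.0 / (1.0 + np.exp(-logits))            # sigmoid
    return (probs > threshold).reshape(-1, h, w)

# Toy example: two 2x2 prototypes and a single instance.
prototypes = np.array([
    [[1.0, 0.0], [0.0, 0.0]],   # prototype 0 fires at the top-left pixel
    [[0.0, 0.0], [0.0, 1.0]],   # prototype 1 fires at the bottom-right pixel
])
coeffs = np.array([[5.0, -5.0]])  # favor prototype 0, suppress prototype 1

instance_masks = assemble_masks(prototypes, coeffs)
```

Here the only pixel that ends up in the mask is the top-left one, since it is the only location where the weighted sum of prototypes is positive.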

Segmentation results:

Segmentation Results of FastSAM

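Intended Use and Scope

The model applies broadly: it can segment most objects of interest in an image (the 80 COCO "things" categories) according to a prompt (point, box, or text).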

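How to Use

Within the ModelScope framework, supply an input image and call the model through a simple Pipeline.

Code Example

Install dependencies

# pip install modelscope (not required in ModelScope notebooks)
# pip install git+https://github.com/openai/CLIP.git (if this fails, retry a few times or follow the installation instructions in the official repository)

Load the model and run inference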

from modelscope.models import Model
from modelscope.pipelines import pipeline
from urllib import request

model = 'damo/cv_fastsam_image-instance-segmentation_sa1b'
pipe = pipeline('fast-sam-task', model=model, model_revision='v1.0.5')

image_path = './input.jpg'
image_url = 'http://k.sinaimg.cn/n/sinacn18/380/w1698h1082/20180810/b678-hhnunsq9451531.png/w700d1q75cms.jpg'
request.urlretrieve(image_url, image_path)

inputs = {
    'img_path': image_path,  # path to the input image
    'device': 'cpu',         # 'cpu' or 'cuda'
    'retina_masks': True,    # whether to use retina masks
    'imgsz': 1024,           # input image resolution
    'conf': 0.4,             # confidence threshold
    'iou': 0.9               # IoU threshold
}
prompt_process = pipe(inputs)

# Segment every instance; returns masks
ann = prompt_process.everything_prompt()

# Segment with a box prompt; bboxes format: [[x1, y1, x2, y2], ...]; returns masks
#ann = prompt_process.box_prompt(bboxes=[[200, 200, 300, 300]])

# Segment with a point prompt; points: [[x, y], ...], pointlabel: 0 = background, 1 = foreground; returns masks
#ann = prompt_process.point_prompt(points=[[620, 360]], pointlabel=[1])

# Segment with a text prompt; text: the query string; returns masks
#ann = prompt_process.text_prompt(text='a photo of a dog')

# Draw the result on the original image and save it
prompt_process.plot(annotations=ann, output_path='./images/output_dog.jpg')
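
To give a feel for the `conf` and `iou` inputs used above, here is a small sketch of confidence filtering followed by greedy non-maximum suppression (NMS). It assumes the two values play their usual roles in a YOLO-style detector (minimum score, and maximum overlap allowed between kept boxes); the helper `filter_detections` is hypothetical and is not part of the ModelScope or FastSAM API.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (m, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def filter_detections(boxes, scores, conf=0.4, iou=0.9):
    """Drop low-score boxes, then greedily suppress boxes overlapping a better one."""
    candidates = np.nonzero(scores >= conf)[0]
    order = candidates[np.argsort(-scores[candidates])]  # best score first
    kept = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        kept.append(int(best))
        if rest.size == 0:
            break
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou]  # keep only boxes that overlap little enough
    return kept

boxes = np.array([
    [0.0, 0.0, 10.0, 10.0],
    [0.0, 0.0, 10.0, 10.5],    # near-duplicate of the first box
    [20.0, 20.0, 30.0, 30.0],
    [0.0, 0.0, 5.0, 5.0],      # low score, removed by the confidence filter
])
scores = np.array([0.9, 0.8, 0.7, 0.2])

kept_default = filter_detections(boxes, scores, conf=0.4, iou=0.9)
kept_loose = filter_detections(boxes, scores, conf=0.4, iou=0.99)
```

Under this reading, a higher `iou` keeps more overlapping instances, while a higher `conf` discards more low-confidence ones.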

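Limitations and Possible Biases

Segmentation quality may suffer when an object of interest occupies too small a portion of the image or is heavily occluded.

Training Data

Segment Anything 1 Billion (SA-1B) is a dataset of 11 million diverse, high-resolution, privacy-protecting images and 1.1B high-quality segmentation masks collected with a data engine.

Evaluation and Results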

Instance segmentation results

Clone with HTTP

 git clone https://www.modelscope.cn/damo/cv_fastsam_image-instance-segmentation_sa1b.git

Citation

If this model is helpful to you, please cite the related paper:

@misc{zhao2023fast,
      title={Fast Segment Anything},
      author={Xu Zhao and Wenchao Ding and Yongqi An and Yinglong Du and Tao Yu and Min Li and Ming Tang and Jinqiao Wang},
      year={2023},
      eprint={2306.12156},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
