# SMT: Scale-Aware Modulation Meet Transformer
## Introduction

SMT was first described in the paper Scale-Aware Modulation Meet Transformer (arXiv:2307.08579) and is proposed as a promising new general-purpose backbone for efficient visual modeling. SMT is a hybrid ConvNet / vision Transformer backbone that effectively models the transition from local to global dependencies as the network gets deeper, yielding performance superior to both ConvNets and vision Transformers.
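The full design of the scale-aware modulation block is given in the paper. As a rough, unofficial illustration of the general pattern only (depth-wise convolutions at several receptive fields produce a modulator that gates a value projection), here is a minimal PyTorch sketch; the class name `ConvModulationSketch`, the kernel sizes, and the channel split are illustrative assumptions and do not reproduce the official implementation.

```python
import torch
import torch.nn as nn

class ConvModulationSketch(nn.Module):
    """Illustrative sketch (NOT the official SMT block): multi-scale
    depth-wise convolutions produce a modulator that gates a value branch."""

    def __init__(self, dim: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Split channels across conv branches with different receptive fields
        # (assumed layout; the paper's multi-head mixed convolution differs in detail).
        assert dim % len(kernel_sizes) == 0
        branch_dim = dim // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv2d(branch_dim, branch_dim, k, padding=k // 2, groups=branch_dim)
            for k in kernel_sizes
        )
        self.value = nn.Conv2d(dim, dim, 1)   # value projection
        self.proj = nn.Conv2d(dim, dim, 1)    # output projection

    def forward(self, x):                     # x: (B, C, H, W)
        chunks = torch.chunk(x, len(self.branches), dim=1)
        modulator = torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)
        return self.proj(torch.sigmoid(modulator) * self.value(x))
```

For example, `ConvModulationSketch(96)(torch.randn(1, 96, 56, 56))` returns a tensor of the same shape.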
## Main Results on ImageNet with Pretrained Models
name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | 22K model | 1K model |
---|---|---|---|---|---|---|---|---|
SMT-T | ImageNet-1K | 224x224 | 82.2 | 96.0 | 12M | 2.4G | - | github |
SMT-S | ImageNet-1K | 224x224 | 83.7 | 96.5 | 21M | 4.7G | - | github |
SMT-B | ImageNet-1K | 224x224 | 84.3 | 96.9 | 32M | 7.7G | - | github |
SMT-L | ImageNet-22K | 224x224 | 87.1 | 98.1 | 81M | 17.6G | github | github |
SMT-L | ImageNet-22K | 384x384 | 88.1 | 98.4 | 81M | 51.6G | github | github |
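A minimal inference sketch for the classification models above is shown below; the module path `models.smt`, the factory name `smt_t`, and the checkpoint file name are assumptions for illustration, so check the repository for the actual entry points.

```python
import torch

# Hypothetical import: assumes the SMT repository exposes a model factory
# named `smt_t` in a `models.smt` module (names are assumptions).
from models.smt import smt_t

model = smt_t()                                            # SMT-T: ~12M params, 2.4 GFLOPs at 224x224
state = torch.load("smt_tiny.pth", map_location="cpu")     # checkpoint path is a placeholder
model.load_state_dict(state.get("model", state))
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))            # ImageNet-1K: 1000 classes
    top1 = logits.softmax(dim=-1).argmax(dim=-1)
print(top1)
```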
## Main Results on Downstream Tasks

### COCO Object Detection (2017 val)
Backbone | Method | pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs |
---|---|---|---|---|---|---|---|
SMT-S | Mask R-CNN | ImageNet-1K | 3x | 49.0 | 43.4 | 40M | 265G |
SMT-B | Mask R-CNN | ImageNet-1K | 3x | 49.8 | 44.0 | 52M | 328G |
SMT-S | Cascade Mask R-CNN | ImageNet-1K | 3x | 51.9 | 44.7 | 78M | 744G |
SMT-S | RetinaNet | ImageNet-1K | 3x | 47.3 | - | 30M | 247G |
SMT-S | Sparse R-CNN | ImageNet-1K | 3x | 50.2 | - | 102M | 171G |
SMT-S | ATSS | ImageNet-1K | 3x | 49.9 | - | 28M | 214G |
SMT-S | DINO | ImageNet-1K | 4scale | 54.0 | - | 40M | 309G |
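Detection results such as those above are commonly reproduced with MMDetection-style configs. Below is a hedged sketch of what a Mask R-CNN + SMT-S config could look like; the `SMT` registry name, the embedding dimensions, and the checkpoint path are assumptions, not the repository's actual configuration.

```python
# Hedged MMDetection-style config sketch (field values for the SMT backbone are
# assumptions; consult the repository's detection configs for the real ones).
_base_ = [
    '../_base_/models/mask_rcnn_r50_fpn.py',
    '../_base_/datasets/coco_instance.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py',
]

model = dict(
    backbone=dict(
        _delete_=True,
        type='SMT',                       # assumed registry name
        embed_dims=[64, 128, 256, 512],   # illustrative values, not verified
        out_indices=(0, 1, 2, 3),
        init_cfg=dict(type='Pretrained', checkpoint='smt_small.pth'),  # placeholder
    ),
    neck=dict(in_channels=[64, 128, 256, 512]),  # must match the backbone's stage dims
)

# The "3x" schedule in the table above corresponds to 36 epochs with multi-scale training.
```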
### ADE20K Semantic Segmentation (val)
Backbone | Method | pretrain | Crop Size | Lr Schd | mIoU (ss) | mIoU (ms) | #params | FLOPs |
---|---|---|---|---|---|---|---|---|
SMT-S | UperNet | ImageNet-1K | 512x512 | 160K | 49.2 | 50.2 | 50M | 935G |
SMT-B | UperNet | ImageNet-1K | 512x512 | 160K | 49.6 | 50.6 | 62M | 1004G |
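In the table above, mIoU (ss) refers to single-scale testing and mIoU (ms) typically to multi-scale testing with flipping. A hedged MMSegmentation-style sketch for UperNet + SMT-S on ADE20K follows; as with the detection config, the backbone registry name, channel dimensions, and checkpoint path are illustrative assumptions.

```python
# Hedged MMSegmentation-style config sketch for UperNet + SMT-S on ADE20K
# (backbone type string and channel numbers are assumptions).
_base_ = [
    '../_base_/models/upernet_r50.py',
    '../_base_/datasets/ade20k.py',
    '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_160k.py',   # the 160K-iteration schedule used above
]

model = dict(
    backbone=dict(
        _delete_=True,
        type='SMT',                        # assumed registry name
        embed_dims=[64, 128, 256, 512],    # illustrative values, not verified
        out_indices=(0, 1, 2, 3),
        init_cfg=dict(type='Pretrained', checkpoint='smt_small.pth'),  # placeholder
    ),
    decode_head=dict(in_channels=[64, 128, 256, 512], num_classes=150),
    auxiliary_head=dict(in_channels=256, num_classes=150),
)
```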
## Citation
```bibtex
@misc{lin2023scaleaware,
      title={Scale-Aware Modulation Meet Transformer},
      author={Weifeng Lin and Ziheng Wu and Jiayu Chen and Jun Huang and Lianwen Jin},
      year={2023},
      eprint={2307.08579},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```