# SMT: Scale-Aware Modulation Meet Transformer
## Introduction

SMT was first described in the paper Scale-Aware Modulation Meet Transformer (arXiv:2307.08579) and is proposed as a promising new general-purpose backbone for efficient visual modeling. SMT is a hybrid ConvNet / vision Transformer backbone that effectively models the transition from local to global dependencies as the network gets deeper, yielding performance superior to both ConvNets and vision Transformers.
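The full design of the scale-aware modulation block is given in the paper. As a rough, unofficial illustration of the general pattern only (depth-wise convolutions at several receptive fields produce a modulator that gates a value projection), here is a minimal PyTorch sketch; the class name `ConvModulationSketch`, the kernel sizes, and the channel split are illustrative assumptions and do not reproduce the official implementation.

```python
import torch
import torch.nn as nn

class ConvModulationSketch(nn.Module):
    """Illustrative sketch (NOT the official SMT block): multi-scale
    depth-wise convolutions produce a modulator that gates a value branch."""

    def __init__(self, dim: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Split channels across conv branches with different receptive fields
        # (assumed layout; the paper's multi-head mixed convolution differs in detail).
        assert dim % len(kernel_sizes) == 0
        branch_dim = dim // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv2d(branch_dim, branch_dim, k, padding=k // 2, groups=branch_dim)
            for k in kernel_sizes
        )
        self.value = nn.Conv2d(dim, dim, 1)   # value projection
        self.proj = nn.Conv2d(dim, dim, 1)    # output projection

    def forward(self, x):                     # x: (B, C, H, W)
        chunks = torch.chunk(x, len(self.branches), dim=1)
        modulator = torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)
        return self.proj(torch.sigmoid(modulator) * self.value(x))
```

For example, `ConvModulationSketch(96)(torch.randn(1, 96, 56, 56))` returns a tensor of the same shape.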
## Main Results on ImageNet with Pretrained Models
name | pretrain | resolution | acc@1 | acc@5 | #params | FLOPs | 22K model | 1K model |
---|---|---|---|---|---|---|---|---|
SMT-T | ImageNet-1K | 224x224 | 82.2 | 96.0 | 12M | 2.4G | - | github |
SMT-S | ImageNet-1K | 224x224 | 83.7 | 96.5 | 21M | 4.7G | - | github |
SMT-B | ImageNet-1K | 224x224 | 84.3 | 96.9 | 32M | 7.7G | - | github |
SMT-L | ImageNet-22K | 224x224 | 87.1 | 98.1 | 81M | 17.6G | github | github |
SMT-L | ImageNet-22K | 384x384 | 88.1 | 98.4 | 81M | 51.6G | github | github |
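A minimal inference sketch for the classification models above is shown below; the module path `models.smt`, the factory name `smt_t`, and the checkpoint file name are assumptions for illustration, so check the repository for the actual entry points.

```python
import torch

# Hypothetical import: assumes the SMT repository exposes a model factory
# named `smt_t` in a `models.smt` module (names are assumptions).
from models.smt import smt_t

model = smt_t()                                            # SMT-T: ~12M params, 2.4 GFLOPs at 224x224
state = torch.load("smt_tiny.pth", map_location="cpu")     # checkpoint path is a placeholder
model.load_state_dict(state.get("model", state))
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))            # ImageNet-1K: 1000 classes
    top1 = logits.softmax(dim=-1).argmax(dim=-1)
print(top1)
```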
## Main Results on Downstream Tasks

### COCO Object Detection (2017 val)
Backbone | Method | pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs |
---|---|---|---|---|---|---|---|
SMT-S | Mask R-CNN | ImageNet-1K | 3x | 49.0 | 43.4 | 40M | 265G |
SMT-B | Mask R-CNN | ImageNet-1K | 3x | 49.8 | 44.0 | 52M | 328G |
SMT-S | Cascade Mask R-CNN | ImageNet-1K | 3x | 51.9 | 44.7 | 78M | 744G |
SMT-S | RetinaNet | ImageNet-1K | 3x | 47.3 | - | 30M | 247G |
SMT-S | Sparse R-CNN | ImageNet-1K | 3x | 50.2 | - | 102M | 171G |
SMT-S | ATSS | ImageNet-1K | 3x | 49.9 | - | 28M | 214G |
SMT-S | DINO | ImageNet-1K | 4scale | 54.0 | - | 40M | 309G |
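Detection results such as those above are commonly reproduced with MMDetection-style configs. Below is a hedged sketch of what a Mask R-CNN + SMT-S config could look like; the `SMT` registry name, the embedding dimensions, and the checkpoint path are assumptions, not the repository's actual configuration.

```python
# Hedged MMDetection-style config sketch (field values for the SMT backbone are
# assumptions; consult the repository's detection configs for the real ones).
_base_ = [
    '../_base_/models/mask_rcnn_r50_fpn.py',
    '../_base_/datasets/coco_instance.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py',
]

model = dict(
    backbone=dict(
        _delete_=True,
        type='SMT',                       # assumed registry name
        embed_dims=[64, 128, 256, 512],   # illustrative values, not verified
        out_indices=(0, 1, 2, 3),
        init_cfg=dict(type='Pretrained', checkpoint='smt_small.pth'),  # placeholder
    ),
    neck=dict(in_channels=[64, 128, 256, 512]),  # must match the backbone's stage dims
)

# The "3x" schedule in the table above corresponds to 36 epochs with multi-scale training.
```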
### ADE20K Semantic Segmentation (val)
Backbone | Method | pretrain | Crop Size | Lr Schd | mIoU (ss) | mIoU (ms) | #params | FLOPs |
---|---|---|---|---|---|---|---|---|
SMT-S | UperNet | ImageNet-1K | 512x512 | 160K | 49.2 | 50.2 | 50M | 935G |
SMT-B | UperNet | ImageNet-1K | 512x512 | 160K | 49.6 | 50.6 | 62M | 1004G |
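In the table above, mIoU (ss) refers to single-scale testing and mIoU (ms) typically to multi-scale testing with flipping. A hedged MMSegmentation-style sketch for UperNet + SMT-S on ADE20K follows; as with the detection config, the backbone registry name, channel dimensions, and checkpoint path are illustrative assumptions.

```python
# Hedged MMSegmentation-style config sketch for UperNet + SMT-S on ADE20K
# (backbone type string and channel numbers are assumptions).
_base_ = [
    '../_base_/models/upernet_r50.py',
    '../_base_/datasets/ade20k.py',
    '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_160k.py',   # the 160K-iteration schedule used above
]

model = dict(
    backbone=dict(
        _delete_=True,
        type='SMT',                        # assumed registry name
        embed_dims=[64, 128, 256, 512],    # illustrative values, not verified
        out_indices=(0, 1, 2, 3),
        init_cfg=dict(type='Pretrained', checkpoint='smt_small.pth'),  # placeholder
    ),
    decode_head=dict(in_channels=[64, 128, 256, 512], num_classes=150),
    auxiliary_head=dict(in_channels=256, num_classes=150),
)
```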
## Citation
```bibtex
@misc{lin2023scaleaware,
      title={Scale-Aware Modulation Meet Transformer},
      author={Weifeng Lin and Ziheng Wu and Jiayu Chen and Jun Huang and Lianwen Jin},
      year={2023},
      eprint={2307.08579},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```