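Description
The Swin Transformer v2 model was pre-trained on ImageNet-1k at a resolution of 256x256. It was introduced by Liu et al. in the paper "Swin Transformer V2: Scaling Up Capacity and Resolution" and first released on GitHub.
This copy is mirrored from the Hugging Face repository, for users who cannot download the model from the original repository for whatever reason.
How to use
Installing dependencies
# Install modelscope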
!pip install modelscope
# Downloading the model with snapshot_download may require the following upgrades (the Runtime may need a restart after these packages are upgraded)
!pip install urllib3 --upgrade
!pip install requests --upgrade
!pip install spotipy --upgrade
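After upgrading, it can help to confirm which versions are actually installed before deciding whether to restart the Runtime. The following is a minimal sketch and not part of the original instructions: the helper name `installed_versions` is hypothetical, and it relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages named in the install step above.
    found = installed_versions(["modelscope", "urllib3", "requests", "spotipy"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
```

Note that an already-imported package keeps its old version in memory until the Runtime restarts, even if this check reports the upgraded one.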
Feature extraction
from modelscope import snapshot_download
from transformers import AutoImageProcessor, Swinv2Model
from PIL import Image
import requests, torch
model_dir = snapshot_download('Alien1996/Mirror_swinv2-base-patch4-window8-256')
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
image_processor = AutoImageProcessor.from_pretrained(model_dir)
model = Swinv2Model.from_pretrained(model_dir)
inputs = image_processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state
list(last_hidden_states.shape)
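The shape printed by the last line can be anticipated from the architecture. The sketch below is an illustration under stated assumptions, not something taken from this model card: it assumes the checkpoint uses the standard base configuration (patch size 4, embedding width 128, four stages), and the function name `swinv2_output_shape` is hypothetical. The actual values can be checked via `model.config.embed_dim` and `len(model.config.depths)`.

```python
def swinv2_output_shape(image_size=256, patch_size=4, embed_dim=128,
                        num_stages=4, batch_size=1):
    """Expected [batch, tokens, channels] of the final-stage output.

    Defaults are ASSUMED values for the base variant, not read from the checkpoint.
    """
    # Patch embedding splits the image into a grid of patch_size x patch_size patches.
    grid = image_size // patch_size
    # Each of the (num_stages - 1) stage transitions merges 2x2 neighbouring
    # patches: the grid side halves and the channel width doubles.
    downsample = 2 ** (num_stages - 1)
    side = grid // downsample
    return [batch_size, side * side, embed_dim * downsample]


print(swinv2_output_shape())  # [1, 64, 1024]
```

Under these assumptions, a 256x256 input yields 64 tokens (an 8x8 grid) of width 1024 each.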
Image classification
from modelscope import snapshot_download
from transformers import AutoImageProcessor, Swinv2ForImageClassification
from PIL import Image
import requests, torch
model_dir = snapshot_download('Alien1996/Mirror_swinv2-base-patch4-window8-256')
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
image_processor = AutoImageProcessor.from_pretrained(model_dir)
model = Swinv2ForImageClassification.from_pretrained(model_dir)
inputs = image_processor(image, return_tensors="pt")
# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
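The example above reports only the single most likely class. To see how confident the model is, the logits can be turned into probabilities with a softmax and the top few classes listed. The following is a minimal sketch in plain Python; the helper name `top_k_probabilities` is hypothetical and not part of `transformers` or `modelscope`.

```python
import math


def top_k_probabilities(logits, k=5):
    """Return the k most likely (class index, probability) pairs from raw logits."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by probability, highest first, and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]
```

With the classification example, this could be called as `top_k_probabilities(logits[0].tolist())`, and each returned index mapped to a label through `model.config.id2label`.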
Citation
@article{DBLP:journals/corr/abs-2111-09883,
author = {Ze Liu and
Han Hu and
Yutong Lin and
Zhuliang Yao and
Zhenda Xie and
Yixuan Wei and
Jia Ning and
Yue Cao and
Zheng Zhang and
Li Dong and
Furu Wei and
Baining Guo},
title = {Swin Transformer {V2:} Scaling Up Capacity and Resolution},
journal = {CoRR},
volume = {abs/2111.09883},
year = {2021},
url = {https://arxiv.org/abs/2111.09883},
eprinttype = {arXiv},
eprint = {2111.09883},
timestamp = {Thu, 02 Dec 2021 15:54:22 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2111-09883.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}