Model card for eva_giant_patch14_224.clip_ft_in1k

An EVA-CLIP image classification model. Pretrained on LAION-400M with CLIP and fine-tuned on ImageNet-1k by the paper authors. EVA-CLIP uses MIM-pretrained image towers and pretrained text towers, FLIP patch dropout, and different optimizers and hyperparameters to accelerate training.

Explore the dataset and runtime metrics of this model in the timm model results.

NOTE: timm checkpoints are float32 for consistency with other models. Original checkpoints are float16 or bfloat16 in some cases; see the originals if that is preferred.
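To put the precision note in concrete terms, here is a rough back-of-the-envelope sketch of weight storage at each precision. It assumes the 1012.56 million parameter count listed for this model in the comparison table and counts weights only (no activations, buffers, or optimizer state). `weight_memory_gib` is a hypothetical helper, not part of timm.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(param_count_millions: float, dtype: str) -> float:
    """Approximate size of the weights alone, in GiB (hypothetical helper)."""
    n_params = param_count_millions * 1e6
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


# param_count for this model, taken from the comparison table in this card.
PARAMS_M = 1012.56

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(PARAMS_M, dtype):.2f} GiB")
```

In PyTorch, the cast itself would typically be done with `model.half()` or `model.to(torch.bfloat16)` after loading; whether reduced precision is acceptable depends on the downstream task.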
Model Usage
Image Classification
from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('eva_giant_patch14_224.clip_ft_in1k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

# convert logits to percentages and keep the five most likely classes
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
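
The last line of the classification snippet turns raw logits into percentages and keeps the five largest. Below is a dependency-free sketch of that same computation on a made-up logit vector. The values and the helper name `top_k_percent` are illustrative only and are not model output.

```python
import math


def top_k_percent(logits, k=5):
    """Softmax over `logits`, scaled to percent.

    Returns the k largest (percent, index) pairs in descending order.
    Illustrative helper only; it mirrors softmax followed by top-k.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    percents = [100.0 * e / total for e in exps]
    ranked = sorted(enumerate(percents), key=lambda p: p[1], reverse=True)
    return [(pct, idx) for idx, pct in ranked[:k]]


# Made-up logits for a 6-class problem (an ImageNet-1k classifier emits 1000).
example_logits = [2.0, 0.5, -1.0, 3.0, 0.0, 1.0]

for pct, idx in top_k_percent(example_logits, k=5):
    print(f"class {idx}: {pct:.2f}%")
```

The returned indices are positions in the logit vector; mapping them to human-readable labels requires the ImageNet-1k class list, which is not shown here.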
Image Embeddings
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'eva_giant_patch14_224.clip_ft_in1k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 257, 1408) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
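
The (1, 257, 1408) shape quoted above follows from the patch grid: a 224-pixel image cut into 14-pixel patches gives 16 x 16 = 256 patch tokens, plus one extra token. The sketch below reproduces that arithmetic. It assumes square inputs and a single prepended class token, and `num_tokens` is a hypothetical helper rather than a timm function.

```python
def num_tokens(img_size: int, patch_size: int, extra_tokens: int = 1) -> int:
    """Sequence length of a ViT-style model for a square image.

    Assumes img_size is a multiple of patch_size and that `extra_tokens`
    non-patch tokens (for example a class token) are prepended.
    """
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    grid = img_size // patch_size  # patches per side
    return grid * grid + extra_tokens


print(num_tokens(224, 14))  # 257
```

The same helper gives 577 tokens for the 336-pixel variants and 1601 for the 560-pixel variant, which is one reason the higher-resolution models cost more to run.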
Model Comparison

| model | top1 (%) | top5 (%) | param_count (M) | img_size |
|---|---|---|---|---|
| eva02_large_patch14_448.mim_m38m_ft_in22k_in1k | 90.054 | 99.042 | 305.08 | 448 |
| eva02_large_patch14_448.mim_in22k_ft_in22k_in1k | 89.946 | 99.01 | 305.08 | 448 |
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.792 | 98.992 | 1014.45 | 560 |
| eva02_large_patch14_448.mim_in22k_ft_in1k | 89.626 | 98.954 | 305.08 | 448 |
| eva02_large_patch14_448.mim_m38m_ft_in1k | 89.57 | 98.918 | 305.08 | 448 |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.56 | 98.956 | 1013.01 | 336 |
| eva_giant_patch14_336.clip_ft_in1k | 89.466 | 98.82 | 1013.01 | 336 |
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.214 | 98.854 | 304.53 | 336 |
| eva_giant_patch14_224.clip_ft_in1k | 88.882 | 98.678 | 1012.56 | 224 |
| eva02_base_patch14_448.mim_in22k_ft_in22k_in1k | 88.692 | 98.722 | 87.12 | 448 |
| eva_large_patch14_336.in22k_ft_in1k | 88.652 | 98.722 | 304.53 | 336 |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.592 | 98.656 | 304.14 | 196 |
| eva02_base_patch14_448.mim_in22k_ft_in1k | 88.23 | 98.564 | 87.12 | 448 |
| eva_large_patch14_196.in22k_ft_in1k | 87.934 | 98.504 | 304.14 | 196 |
| eva02_small_patch14_336.mim_in22k_ft_in1k | 85.74 | 97.614 | 22.13 | 336 |
| eva02_tiny_patch14_336.mim_in22k_ft_in1k | 80.658 | 95.524 | 5.76 | 336 |
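
Model names such as `eva_giant_patch14_224.clip_ft_in1k` follow timm's `architecture.pretrained_tag` pattern, with the patch size and input resolution embedded in the architecture part. Here is a small sketch of pulling those fields out with the standard library. The regular expression and the `parse_name` helper are assumptions made for illustration and only target the EVA names compared in this card.

```python
import re

# Hypothetical pattern covering names such as
# 'eva_giant_patch14_224.clip_ft_in1k' or
# 'eva02_tiny_patch14_336.mim_in22k_ft_in1k'.
NAME_RE = re.compile(
    r"^(?P<family>eva(?:02)?)_(?P<size>[a-z]+)"
    r"_patch(?P<patch>\d+)_(?P<img>\d+)\.(?P<tag>.+)$"
)


def parse_name(name: str) -> dict:
    """Split an EVA model name into its components (illustrative helper)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised model name: {name}")
    return {
        "family": m["family"],
        "size": m["size"],
        "patch_size": int(m["patch"]),
        "img_size": int(m["img"]),
        "tag": m["tag"],  # pretraining / fine-tuning recipe
    }


print(parse_name("eva_giant_patch14_224.clip_ft_in1k"))
```

Parsing the name this way makes it easy to group rows by resolution or model size when comparing accuracy against parameter count.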
Citation
@article{EVA-CLIP,
  title={EVA-CLIP: Improved Training Techniques for CLIP at Scale},
  author={Sun, Quan and Fang, Yuxin and Wu, Ledell and Wang, Xinlong and Cao, Yue},
  journal={arXiv preprint arXiv:2303.15389},
  year={2023}
}
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}