Model Card for InternVL-14B-FlickrCN-FT-364px
What is InternVL?
InternVL scales up the ViT to 6B parameters and aligns it with an LLM.
Model Details
Setting
Performance
See this document for more details about the evaluation.
Model Usage
Note: the prefix 'summarize:' and tokenizer.pad_token_id = 0 are necessary. Their absence will lead to abnormal results.

import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor
from transformers import AutoTokenizer

model = AutoModel.from_pretrained(
    'OpenGVLab/InternVL-14B-FlickrCN-FT-364px',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternVL-14B-FlickrCN-FT-364px')

tokenizer = AutoTokenizer.from_pretrained(
    'OpenGVLab/InternVL-14B-FlickrCN-FT-364px', use_fast=False, add_eos_token=True)
tokenizer.pad_token_id = 0  # set pad_token_id to 0

images = [
    Image.open('./examples/image1.jpg').convert('RGB'),
    Image.open('./examples/image2.jpg').convert('RGB'),
    Image.open('./examples/image3.jpg').convert('RGB')
]
prefix = 'summarize:'
texts = [
    prefix + 'a photo of a red panda',  # English
    prefix + '一张熊猫的照片',  # Chinese: "a photo of a panda"
    prefix + '二匹の猫の写真'  # Japanese: "a photo of two cats"
]
pixel_values = image_processor(images=images, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()
input_ids = tokenizer(texts, return_tensors='pt', max_length=80,
                      truncation=True, padding='max_length').input_ids.cuda()

# InternVL-C (contrastive variant)
logits_per_image, logits_per_text = model(
    image=pixel_values, text=input_ids, mode='InternVL-C')
probs = logits_per_image.softmax(dim=-1)

# InternVL-G (generative variant)
logits_per_image, logits_per_text = model(
    image=pixel_values, text=input_ids, mode='InternVL-G')
probs = logits_per_image.softmax(dim=-1)
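For each image, probs is a probability distribution over the three captions (one row per image, one column per text). As a minimal sketch, not part of the original snippet and building only on the probs and texts variables defined above, you could print the best-matching caption per image like this:

# probs: one row per image, one column per caption; each row sums to 1
for i, row in enumerate(probs):
    best = row.argmax().item()
    print(f'image{i + 1}: best caption = {texts[best]!r} (p = {row[best].item():.4f})')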
Citation
If you find this project useful in your research, please consider citing:
@article{chen2023internvl,
  title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
  author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2312.14238},
  year={2023}
}
Acknowledgement
InternVL is built with reference to the code of the following projects: OpenAI CLIP, Open CLIP, CLIP Benchmark, EVA, InternImage, ViT-Adapter, MMSegmentation, Transformers, DINOv2, BLIP-2, Qwen-VL, and LLaVA-1.5. Thanks for their awesome work!