InternVL-14B-FlickrCN-FT-364px

Anonymous user, July 31, 2024

Technical Information

Official website
https://github.com/opengvlab
Open-source repository
https://modelscope.cn/models/OpenGVLab/InternVL-14B-FlickrCN-FT-364px
License
MIT

Project Details

Model Card for InternVL-14B-FlickrCN-FT-364px

What is InternVL?

[Paper] [GitHub] [Chat Demo]

InternVL scales up the ViT to 6B parameters and aligns it with an LLM.

It is the largest open-source vision/vision-language foundation model (14B) to date, achieving state-of-the-art performance on 32 benchmarks across a wide range of tasks such as visual perception, cross-modal retrieval, and multimodal dialogue.


Model Details

  • Model Type: fine-tuned retrieval model
  • Support Tasks: image-text retrieval
  • Model Stats:
      • Params: 14B
      • Image size: 364 x 364
  • Fine-tune Dataset: FlickrCN
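
As a rough guide to hardware requirements, the weight memory implied by the parameter count can be estimated with simple arithmetic. The sketch below assumes the nominal 14B figure from the list above (not an exact parameter count) and counts weights only, ignoring activations and framework overhead. The helper name `weight_memory_gib` is hypothetical. At 2 bytes per parameter (bfloat16, the dtype used in the usage code), the weights alone come to roughly 26 GiB.

```python
# Rough weight-memory estimate. NOMINAL_PARAMS is the rounded 14B figure from
# the model card (an assumption, not an exact count); activations, buffers,
# and framework overhead are ignored.
NOMINAL_PARAMS = 14e9
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(num_params, dtype):
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gib(NOMINAL_PARAMS, dtype):.1f} GiB")
```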

Setting

[Image: fine-tuning settings]

Performance

See this document for more details about the evaluation.


Model Usage

Note: the prefix 'summarize:' and tokenizer.pad_token_id = 0 are necessary. Their absence will lead to abnormal results.

import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor
from transformers import AutoTokenizer


model = AutoModel.from_pretrained(
    'OpenGVLab/InternVL-14B-FlickrCN-FT-364px',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternVL-14B-FlickrCN-FT-364px')

tokenizer = AutoTokenizer.from_pretrained(
    'OpenGVLab/InternVL-14B-FlickrCN-FT-364px', use_fast=False, add_eos_token=True)
tokenizer.pad_token_id = 0  # set pad_token_id to 0

images = [
    Image.open('./examples/image1.jpg').convert('RGB'),
    Image.open('./examples/image2.jpg').convert('RGB'),
    Image.open('./examples/image3.jpg').convert('RGB')
]
prefix = 'summarize:'
texts = [
    prefix + 'a photo of a red panda',  # English
    prefix + '一张熊猫的照片',  # Chinese
    prefix + '二匹の猫の写真'  # Japanese
]

pixel_values = image_processor(images=images, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()
input_ids = tokenizer(texts, return_tensors='pt', max_length=80,
                      truncation=True, padding='max_length').input_ids.cuda()

# InternVL-C
logits_per_image, logits_per_text = model(
    image=pixel_values, text=input_ids, mode='InternVL-C')
probs = logits_per_image.softmax(dim=-1)

# InternVL-G
logits_per_image, logits_per_text = model(
    image=pixel_values, text=input_ids, mode='InternVL-G')
probs = logits_per_image.softmax(dim=-1)
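
The usage code stops at the softmax over logits_per_image. For image-text retrieval, those scores are typically turned into a ranking of texts per image and summarized with a recall@k figure. The sketch below illustrates that step in plain Python so it runs without a GPU or the model weights. The function names, the toy logits, and the assumption that image i matches text i are all hypothetical; this is not the evaluation code used for the reported results.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def rank_texts(logits_per_image):
    """For each image, return text indices sorted from best to worst match."""
    return [sorted(range(len(row)), key=lambda j: row[j], reverse=True)
            for row in logits_per_image]


def recall_at_k(logits_per_image, ground_truth, k):
    """Fraction of images whose ground-truth text index is in the top k."""
    rankings = rank_texts(logits_per_image)
    hits = sum(1 for ranks, gt in zip(rankings, ground_truth) if gt in ranks[:k])
    return hits / len(ground_truth)


# Hypothetical 3x3 logits (rows: images, columns: texts); not real model output.
toy_logits = [
    [9.0, 2.0, 1.0],
    [1.5, 7.0, 6.5],
    [0.5, 4.0, 3.0],
]
toy_ground_truth = [0, 1, 2]  # assumed pairing: image i matches text i

toy_probs = [softmax(row) for row in toy_logits]
toy_r1 = recall_at_k(toy_logits, toy_ground_truth, k=1)
toy_r2 = recall_at_k(toy_logits, toy_ground_truth, k=2)
print(f"R@1 = {toy_r1:.3f}, R@2 = {toy_r2:.3f}")
```

With the real model, the nested list could be obtained from the tensor via logits_per_image.float().cpu().tolist(); text-to-image retrieval works the same way on logits_per_text.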

Citation

If you find this project useful in your research, please consider citing:

@article{chen2023internvl,
  title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
  author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2312.14238},
  year={2023}
}

Acknowledgement

InternVL is built with reference to the code of the following projects: OpenAI CLIP, Open CLIP, CLIP Benchmark, EVA, InternImage, ViT-Adapter, MMSegmentation, Transformers, DINOv2, BLIP-2, Qwen-VL, and LLaVA-1.5. Thanks for their awesome work!
