llava-v1.6-vicuna-13b-hf


Technical information

Official website
https://github.com/modelscope/swift/
Open-source address
https://modelscope.cn/models/swift/llava-v1.6-vicuna-13b-hf

Model details

LLaVa-NeXT, leveraging liuhaotian/llava-v1.6-vicuna-13b as LLM

The LLaVA-NeXT model was proposed in LLaVA-NeXT: Improved reasoning, OCR, and world knowledge by Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, Yong Jae Lee. LLaVa-NeXT (also called LLaVa-1.6) improves upon LLaVa-1.5 by increasing the input image resolution and training on an improved visual instruction tuning dataset to improve OCR and common sense reasoning.

Disclaimer: The team releasing LLaVa-NeXT did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

LLaVa combines a pre-trained large language model with a pre-trained vision encoder for multimodal chatbot use cases. LLaVA 1.6 improves on LLaVA 1.5 by:

  • A more diverse and higher-quality data mixture
  • Dynamic high resolution


Intended uses & limitations

You can use the raw model for tasks like image captioning, visual question answering, and multimodal chatbot use cases. See the model hub to look for other versions on a task that interests you.

How to use

Here's the prompt template for this model:

"A chat betwee a curious huma ad a artificial itelligece assistat. The assistat gives helpful, detailed, ad polite aswers to the huma's questios. USER: <image>\What is show i this image? ASSISTANT:"

You can load and use the model as follows:

from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
from PIL import Image
import requests

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-vicuna-13b-hf")

model = LlavaNextForConditionalGeneration.from_pretrained("llava-hf/llava-v1.6-vicuna-13b-hf", torch_dtype=torch.float16, low_cpu_mem_usage=True)
model.to("cuda:0")

# prepare image and text prompt, using the appropriate prompt template
url = "https://github.com/haotian-liu/LLaVA/blob/1a91fc274d7c35a9b50b3cb29c4247ae5837ce39/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")

# autoregressively complete prompt
output = model.generate(**inputs, max_new_tokens=100)

print(processor.decode(output[0], skip_special_tokens=True))
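
Note that the decoded string includes the prompt itself. If you only want the newly generated answer, one option is to slice off the prompt tokens before decoding; a minimal sketch, reusing inputs and output from the snippet above:

# keep only the tokens generated after the prompt
generated_tokens = output[0][inputs["input_ids"].shape[1]:]
print(processor.decode(generated_tokens, skip_special_tokens=True))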

Model optimization

4-bit quantization through the bitsandbytes library

First make sure to install bitsandbytes (pip install bitsandbytes) and that you have access to a CUDA-compatible GPU device. Then simply change the snippet above as follows:

model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
+   load_in_4bit=True
)
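
In recent transformers releases, passing load_in_4bit directly to from_pretrained is deprecated in favor of a quantization config object. A minimal sketch of the equivalent call, assuming a recent transformers and bitsandbytes installation, with model_id set as above:

import torch
from transformers import BitsAndBytesConfig, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-vicuna-13b-hf"

# quantize weights to 4 bits via bitsandbytes, computing in float16
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    low_cpu_mem_usage=True,
    device_map="auto",  # place the quantized weights on the available GPU(s)
)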

Use Flash-Attention 2 to further speed up generation

First make sure to install flash-attn. Refer to the original repository of Flash Attention for instructions on installing that package. Then simply change the snippet above as follows:

model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
+   use_flash_attention_2=True
).to(0)
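
Similarly, recent transformers releases replace the use_flash_attention_2 flag with the attn_implementation argument. A minimal sketch of the equivalent call, assuming flash-attn is installed on a supported GPU, with model_id set as above:

import torch
from transformers import LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-vicuna-13b-hf"

# flash-attn requires half-precision weights (float16/bfloat16) and a supported GPU
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    attn_implementation="flash_attention_2",
).to(0)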

BibTeX entry and citation info

@misc{liu2023improved,
      title={Improved Baselines with Visual Instruction Tuning},
      author={Haotian Liu and Chunyuan Li and Yuheng Li and Yong Jae Lee},
      year={2023},
      eprint={2310.03744},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

