llava_v1.5_13b_qinstruct_preview_v0.1

Open-source repository: https://modelscope.cn/models/qfuture/llava_v1.5_13b_qinstruct_preview_v0.1

Model Details

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

1Nanyang Technological University, 2Shanghai Jiaotong University, 3Sensetime Research, 4I2R@A*STAR
*Equal contribution. #Corresponding author.

Quick Start

LLaVA-v1.5

Install LLaVA.

git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
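
Optionally, you can confirm that the editable install succeeded by importing the utilities used in the examples below. This is only a minimal sanity check, not part of the official instructions:

# Sanity check: these imports should succeed once `pip install -e .` has finished.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

print(get_model_name_from_path("teowu/llava_v1.5_7b_qinstruct_preview_v0.1"))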

Simple Interactive Demos.

See the code and scripts below.

Example Code (Single Query)

from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "teowu/llava_v1.5_7b_qinstruct_preview_v0.1"  # Q-Instruct-tuned LLaVA-v1.5 weights in Huggingface format
prompt = "Rate the quality of the image. Think step by step."
image_file = "fig/sausage.jpg"

# Pack the arguments expected by eval_model into a lightweight namespace.
args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
})()

eval_model(args)
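
To rate several images in one go, the same args pattern can be reused in a loop. The snippet below is only a minimal sketch built on the example above; the second image path is a placeholder:

from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "teowu/llava_v1.5_7b_qinstruct_preview_v0.1"
prompt = "Rate the quality of the image. Think step by step."
# Placeholder list; replace with your own image paths.
image_files = ["fig/sausage.jpg", "fig/your_image.jpg"]

for image_file in image_files:
    args = type('Args', (), {
        "model_path": model_path,
        "model_base": None,
        "model_name": get_model_name_from_path(model_path),
        "query": prompt,
        "conv_mode": None,
        "image_file": image_file,
        "sep": ",",
    })()
    eval_model(args)  # prints the model's answer for this image

Note that eval_model loads the checkpoint on each call in the LLaVA reference implementation, so for larger batches the evaluation scripts below are more practical.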

Example Code (CLI Demo for Multi-turn Conversation)

python -m llava.serve.cli \
    --model-path teowu/llava_v1.5_7b_qinstruct_preview_v0.1 \
    --image-file "fig/sausage.jpg"

Note: Outputs may vary between runs because do_sample=True is enabled in conversation mode.

Quantitative Evaluations

Multi-choice question (MCQ) in Q-Bench.

python eval_scripts/llava_v1.5/eval_qbench_mcq.py

Image/Video Quality Assessment

Image Quality Assessment:

python eval_scripts/llava_v1.5/eval_image_quality.py

Video Quality Assessment:

python eval_scripts/llava_v1.5/eval_video_quality.py
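
A common way to turn an MLLM into a quality scorer, used in Q-Bench, is to compare the logits of opposing answer words ("good" vs. "poor") with a softmax instead of parsing generated text. The snippet below is only a minimal sketch of that idea, not the repository script; the function name and its inputs (the next-token logits and the two vocabulary ids) are illustrative:

import torch

def softmax_quality_score(next_token_logits, good_id, poor_id):
    # next_token_logits: 1-D tensor of vocabulary logits at the answer position
    # good_id / poor_id: vocabulary ids of the words "good" and "poor"
    pair = next_token_logits[[good_id, poor_id]]
    # Binary softmax; the probability of "good" serves as the quality score in [0, 1]
    return torch.softmax(pair.float(), dim=0)[0].item()

# Hypothetical usage with dummy logits over a tiny vocabulary
logits = torch.tensor([0.2, 2.5, -1.0, 0.7])
print(softmax_quality_score(logits, good_id=1, poor_id=2))  # close to 1.0, i.e. high quality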

mPLUG-Owl-2

Coming soon.

InternLM-XComposer-VL

Coming soon.

Model Zoo

All weights have been converted to Huggingface format and are fully compatible with the base repositories (LLaVA, mPLUG-Owl, InternLM-XComposer). After installing a base repository, simply replace the HF paths in its original evaluation scripts with the ones listed below to automatically download the Q-Instruct-tuned versions.

Released:

Coming Soon:

  • mPLUG-Owl-2 (mix)
  • InternLM-XComposer-VL (mix)
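
As a concrete example of the path swap described above, the only change to the single-query snippet in the Quick Start is the value of model_path (the base LLaVA-v1.5 path is shown for comparison):

# Original LLaVA-v1.5 weights (base repository)
# model_path = "liuhaotian/llava-v1.5-7b"

# Q-Instruct-tuned weights in Huggingface format
model_path = "teowu/llava_v1.5_7b_qinstruct_preview_v0.1"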

Training

At present, we only provide training scripts for LLaVA-v1.5. Please see the Training Docs for more details.

License

Researchers and open-source developers are free to use the Q-Instruct dataset and the fine-tuned weights provided for the four MLLMs. Commercial use is also allowed, but requires prior permission from our team. Please email haoning001@e.ntu.edu.sg to obtain permission for commercial use.
