Anonymous user · July 31, 2024

Technical Information

Official website
https://github.com/OpenBMB
Open-source repository
https://modelscope.cn/models/OpenBMB/MiniCPM-V

Project Details

MiniCPM-V

News

MiniCPM-V (i.e., OmniLMM-3B) is an efficient version with promising performance for deployment. The model is built based on SigLip-400M and MiniCPM-2.4B, connected by a perceiver resampler. Notable features of OmniLMM-3B include:

  • ⚡️ High Efficiency.

    MiniCPM-V can be efficiently deployed on most GPU cards and personal computers, and even on end devices such as mobile phones. In terms of visual encoding, we compress the image representations into 64 tokens via a perceiver resampler, which is significantly fewer than in other LMMs based on MLP architectures (typically > 512 tokens). This allows OmniLMM-3B to operate with much less memory cost and higher speed during inference.

  • Promising Performance.

    MiniCPM-V achieves state-of-the-art performance on multiple benchmarks (including MMMU, MME, and MMBench, etc.) among models with comparable sizes, surpassing existing LMMs built on Phi-2. It even achieves comparable or better performance than the 9.6B Qwen-VL-Chat.

  • Bilingual Support.

    MiniCPM-V is the first end-deployable LMM supporting bilingual multimodal interaction in English and Chinese. This is achieved by generalizing multimodal capabilities across languages, a technique from the ICLR 2024 spotlight paper.
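The token-compression idea behind the perceiver resampler can be sketched as a single cross-attention step: a fixed set of 64 learned query vectors attends over a variable number of visual patch embeddings and always emits 64 output tokens. This is a minimal illustration only; the hidden size, head count, and single-layer depth below are assumptions, not the actual MiniCPM-V configuration.

```python
import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    """Compress a variable number of visual patch embeddings into a fixed
    set of 64 tokens via cross-attention (simplified sketch)."""

    def __init__(self, dim=1152, num_queries=64, num_heads=8):
        super().__init__()
        # 64 learned query vectors, shared across all images
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patch_embeds):  # patch_embeds: (B, N_patches, dim)
        b = patch_embeds.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)   # (B, 64, dim)
        # Queries attend over the patch embeddings (keys/values)
        out, _ = self.attn(q, patch_embeds, patch_embeds)
        return self.norm(out)  # (B, 64, dim), regardless of N_patches

resampler = PerceiverResampler()
patches = torch.randn(1, 729, 1152)  # e.g. a SigLIP-style patch grid
tokens = resampler(patches)
print(tokens.shape)  # always 64 tokens per image
```

However many patches the vision encoder produces, the language model only ever sees 64 visual tokens, which is where the memory and speed savings come from.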

Evaluation

| Model        | Size  | MME  | MMB dev (en) | MMB dev (zh) | MMMU val | CMMMU val |
|--------------|-------|------|--------------|--------------|----------|-----------|
| LLaVA-Phi    | 3.0B  | 1335 | 59.8         | -            | -        | -         |
| MobileVLM    | 3.0B  | 1289 | 59.6         | -            | -        | -         |
| Imp-v1       | 3B    | 1434 | 66.5         | -            | -        | -         |
| Qwen-VL-Chat | 9.6B  | 1487 | 60.6         | 56.7         | 35.9     | 30.7      |
| CogVLM       | 17.4B | 1438 | 63.7         | 53.8         | 32.1     | -         |
| MiniCPM-V    | 3B    | 1452 | 67.9         | 65.3         | 37.2     | 32.1      |

Examples

Demo

Click here to try out the Demo of MiniCPM-V.

Deployment on Mobile Phone

Currently MiniCPM-V (i.e., OmniLMM-3B) can be deployed on mobile phones with Android and Harmony operating systems. Try it out here.

Usage

Inference using Hugging Face transformers on NVIDIA GPUs or Mac with MPS (Apple silicon or AMD GPUs). Requirements tested on Python 3.10:

Pillow==10.1.0
timm==0.9.10
torch==2.1.2
torchvision==0.16.2
transformers==4.36.0
sentencepiece==0.1.99
# test.py
import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True, torch_dtype=torch.bfloat16)
# For NVIDIA GPUs that support BF16 (like A100, H100, RTX 3090)
model = model.to(device='cuda', dtype=torch.bfloat16)
# For NVIDIA GPUs that do NOT support BF16 (like V100, T4, RTX 2080)
# model = model.to(device='cuda', dtype=torch.float16)
# For Mac with MPS (Apple silicon or AMD GPUs):
# run with `PYTORCH_ENABLE_MPS_FALLBACK=1 python test.py`
# model = model.to(device='mps', dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V', trust_remote_code=True)
model.eval()

image = Image.open('xx.jpg').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': question}]

answer, context, _ = model.chat(
    image=image,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7
)
print(answer)
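The dtype comments in test.py can be folded into a small selection helper, so one script runs on BF16-capable GPUs, older GPUs, and Apple MPS alike. This is a sketch, not part of the upstream repo; `pick_device_and_dtype` is a hypothetical name.

```python
import torch

def pick_device_and_dtype():
    """Pick a device/dtype pair following the logic in the comments above:
    bfloat16 on GPUs that support it, float16 otherwise or on MPS,
    and a CPU/float32 fallback."""
    if torch.cuda.is_available():
        # A100/H100/RTX 3090-class GPUs support BF16; V100/T4/RTX 2080 do not
        dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
        return 'cuda', dtype
    if torch.backends.mps.is_available():
        # Remember to set PYTORCH_ENABLE_MPS_FALLBACK=1 when using MPS
        return 'mps', torch.float16
    return 'cpu', torch.float32

device, dtype = pick_device_and_dtype()
print(device, dtype)
# then: model = model.to(device=device, dtype=dtype)
```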

License

Model License

  • The code in this repo is released under the Apache-2.0 License.
  • The usage of MiniCPM-V series model weights must strictly follow MiniCPM Model License.md.
  • The models and weights of MiniCPM are completely free for academic research. After filling out a "questionnaire" for registration, they are also available for free commercial use.

Statement

  • As an LLM, MiniCPM-V generates content by learning from a large amount of text, but it cannot comprehend, express personal opinions, or make value judgements. Anything generated by MiniCPM-V does not represent the views and positions of the model developers.
  • We will not be liable for any problems arising from the use of the MiniCPM-V open-source model, including but not limited to data security issues, risks of public opinion, or any risks and problems arising from the misguidance, misuse, dissemination, or abuse of the model.

