Hunyuan Captioner

Anonymous user, July 31, 2024

Technical information

Open-source repository
https://modelscope.cn/models/shiertier/HunyuanCaptioner
License
other

Project details

Hunyuan-Captioner

Hunyuan-Captioner meets the needs of text-to-image techniques by maintaining a high degree of image-text consistency. It can generate high-quality image descriptions from a variety of angles, including object descriptions, relationships between objects, background information, and image style. Our code is based on the LLaVA implementation.

Instructions

a. Install dependencies

The dependencies and installation are basically the same as for the base model.

b. Data download

cd HunyuanDiT
wget -O ./dataset/data_demo.zip https://dit.hunyuan.tencent.com/download/HunyuanDiT/data_demo.zip
unzip ./dataset/data_demo.zip -d ./dataset
mkdir ./dataset/porcelain/arrows ./dataset/porcelain/jsons

c. Model download

# Use the huggingface-cli tool to download the model.
huggingface-cli download Tencent-Hunyuan/HunyuanCaptioner --local-dir ./ckpts/captioner

Inference

Currently supported prompt templates:

Mode | Prompt template | Description
caption_zh | 描述这张图片 | Caption in Chinese
insert_content | 根据提示词“{}”,描述这张图片 | Insert specific knowledge into caption
caption_en | Please describe the content of this image | Caption in English
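
To make the role of the "{}" placeholder concrete, here is a minimal Python sketch of how a mode and an optional content string could be turned into the final prompt. The template strings are copied from the table; the names `PROMPT_TEMPLATES` and `build_prompt` are hypothetical and are not taken from mllm/caption_demo.py, whose internals may differ.

```python
from typing import Optional

# Template strings copied from the prompt table. The dict and function
# names are illustrative assumptions, not part of the released code.
PROMPT_TEMPLATES = {
    "caption_zh": "描述这张图片",                    # "Describe this image"
    "insert_content": "根据提示词“{}”,描述这张图片",  # "Based on the hint '{}', describe this image"
    "caption_en": "Please describe the content of this image",
}


def build_prompt(mode: str, content: Optional[str] = None) -> str:
    """Return the prompt text for a mode, filling "{}" when the mode needs it."""
    if mode not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown mode: {mode!r}")
    template = PROMPT_TEMPLATES[mode]
    if mode == "insert_content":
        if not content:
            raise ValueError("insert_content mode needs a non-empty content string")
        return template.format(content)
    return template
```

For example, `build_prompt("insert_content", "宫保鸡丁")` fills the dish name into the Chinese template, while the two caption modes return their template unchanged.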

a. Single-image inference in Chinese

python mllm/caption_demo.py --mode "caption_zh" --image_file "mllm/images/demo1.png" --model_path "./ckpts/captioner"

b. Insert specific knowledge into caption

python mllm/caption_demo.py --mode "insert_content" --content "宫保鸡丁" --image_file "mllm/images/demo2.png" --model_path "./ckpts/captioner"

c. Single-image inference in English

python mllm/caption_demo.py --mode "caption_en" --image_file "mllm/images/demo3.png" --model_path "./ckpts/captioner"

d. Multi-image inference in Chinese

### Convert multiple pictures to a csv file.
python mllm/make_csv.py --img_dir "mllm/images" --input_file "mllm/images/demo.csv"

### Multiple pictures inference
python mllm/caption_demo.py --mode "caption_zh" --input_file "mllm/images/demo.csv" --output_file "mllm/images/demo_res.csv" --model_path "./ckpts/captioner"
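
The CSV step can be pictured roughly as follows. This is a minimal sketch and not the contents of mllm/make_csv.py: the function name `write_image_csv`, the `img_path` column header, and the set of accepted file extensions are all assumptions made for illustration, so check the real script before relying on the file layout.

```python
import csv
from pathlib import Path

# Assumed set of image extensions; the real script may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def write_image_csv(img_dir, csv_path, column="img_path"):
    """List image files in img_dir and write one path per row to csv_path.

    The column name is a hypothetical default, not confirmed by the source.
    Returns the number of image paths written.
    """
    paths = sorted(
        p for p in Path(img_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([column])            # header row
        for p in paths:
            writer.writerow([p.as_posix()])  # one image path per row
    return len(paths)
```

Called as `write_image_csv("mllm/images", "mllm/images/demo.csv")`, it would produce a one-column file listing every image found directly in that directory.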

(Optional) To convert the output csv file to Arrow format, please refer to Data Preparation #3 for detailed instructions.

Gradio

To launch a Gradio demo locally, run the following commands in order. Make sure each command keeps running in the background. For more detailed instructions, please refer to LLaVA.

cd mllm
python -m llava.serve.controller --host 0.0.0.0 --port 10000
python -m llava.serve.gradio_web_server --controller http://0.0.0.0:10000 --model-list-mode reload --port 443
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://0.0.0.0:10000 --port 40000 --worker http://0.0.0.0:40000 --model-path "../ckpts/captioner" --model-name LlavaMistral

The demo can then be accessed at http://0.0.0.0:443. Note that 0.0.0.0 here must be replaced with your server's IP address, X.X.X.X.
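
One way to keep all three services running in the background is to start them from a small Python launcher. The sketch below only assembles the same three commands shown above and starts them with `subprocess.Popen`; the helper names `build_commands` and `launch_all` are hypothetical, and the pause between launches is an arbitrary guess rather than a documented requirement.

```python
import subprocess
import sys
import time


def build_commands(host="0.0.0.0", controller_port=10000, worker_port=40000,
                   web_port=443, model_path="../ckpts/captioner"):
    """Return argv lists for the controller, web server, and model worker."""
    controller_url = f"http://{host}:{controller_port}"
    worker_url = f"http://{host}:{worker_port}"
    py = sys.executable  # use the same interpreter that runs this script
    return [
        [py, "-m", "llava.serve.controller",
         "--host", host, "--port", str(controller_port)],
        [py, "-m", "llava.serve.gradio_web_server",
         "--controller", controller_url,
         "--model-list-mode", "reload", "--port", str(web_port)],
        [py, "-m", "llava.serve.model_worker",
         "--host", host, "--controller", controller_url,
         "--port", str(worker_port), "--worker", worker_url,
         "--model-path", model_path, "--model-name", "LlavaMistral"],
    ]


def launch_all(commands, cwd="mllm", startup_delay=5.0):
    """Start each command as a background process, in order.

    startup_delay is an assumed pause in seconds between launches.
    Returns the Popen handles so the caller can wait on or terminate them.
    """
    procs = []
    for argv in commands:
        procs.append(subprocess.Popen(argv, cwd=cwd))
        time.sleep(startup_delay)
    return procs
```

Running `launch_all(build_commands())` from the repository root would start the three processes inside the mllm directory. Binding to port 443 usually needs elevated privileges, so a different `web_port` may be more convenient for local testing.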

