Hunyuan-Captioner

Hunyuan-Captioner meets the needs of text-to-image techniques by maintaining a high degree of image-text consistency. It can generate high-quality image descriptions from a variety of angles, such as object description, object relationships, background information, and image style. Our code is based on the LLaVA implementation.

a. Install dependencies
The dependencies and installation are basically the same as the
b. Data download
c. Model download

Currently supported prompt templates:

a. Single picture inference in Chinese
b. Insert specific knowledge into caption
c. Single picture inference in English
d. Multiple pictures inference in Chinese

(Optional) To convert the output csv file to Arrow format, please refer to Data Preparation #3 for detailed instructions.

To launch a Gradio demo locally, please execute the following commands sequentially. Ensure each command is running in the background. For more detailed instructions, please refer to LLaVA. The demo can then be accessed through http://0.0.0.0:443. Note that 0.0.0.0 here needs to be replaced with X.X.X.X, your server IP.
Instructions
# Download and unpack the demo dataset.
cd HunyuanDiT
wget -O ./dataset/data_demo.zip https://dit.hunyuan.tencent.com/download/HunyuanDiT/data_demo.zip
unzip ./dataset/data_demo.zip -d ./dataset
mkdir ./dataset/porcelain/arrows ./dataset/porcelain/jsons
# Use the huggingface-cli tool to download the model.
huggingface-cli download Tencent-Hunyuan/HunyuanCaptioner --local-dir ./ckpts/captioner
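
After the data and model downloads, a quick sanity check can confirm that the paths used by the commands above exist. The following is a minimal sketch and not part of the repository: the helper name `missing_paths` is hypothetical, and the paths are simply those that appear in the download commands.

```python
from pathlib import Path

# Paths taken from the download commands above, relative to the HunyuanDiT directory.
EXPECTED_PATHS = [
    "dataset/porcelain/arrows",
    "dataset/porcelain/jsons",
    "ckpts/captioner",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]
```

Called from inside the HunyuanDiT directory, `missing_paths(".")` returns an empty list once every path is in place; anything it returns points at a step that still needs to be run.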
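Inference

| Mode | Prompt template | Description |
| --- | --- | --- |
| caption_zh | 描述这张图片 | Caption in Chinese |
| insert_content | 根据提示词“{}”,描述这张图片 | Insert specific knowledge into caption |
| caption_en | Please describe the content of this image | Caption in English |

The two Chinese templates translate to "Describe this image" and "Describe this image according to the prompt '{}'".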
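
To make the role of the `{}` placeholder concrete, the sketch below shows one way a mode name could be turned into a final prompt. It is an illustration only: the dictionary and the helper `build_prompt` are hypothetical and are not taken from `mllm/caption_demo.py`; only the mode names and template strings come from the table above.

```python
from typing import Optional

# Mode names and templates copied from the table above.
PROMPT_TEMPLATES = {
    "caption_zh": "描述这张图片",
    "insert_content": "根据提示词“{}”,描述这张图片",
    "caption_en": "Please describe the content of this image",
}


def build_prompt(mode: str, content: Optional[str] = None) -> str:
    """Return the prompt for `mode`, filling `{}` with `content` when the template has one."""
    template = PROMPT_TEMPLATES[mode]  # raises KeyError for an unsupported mode
    if "{}" in template:
        if not content:
            raise ValueError(f"mode {mode!r} needs a non-empty content string")
        return template.format(content)
    return template
```

Only `insert_content` takes extra text, which corresponds to the `--content` argument on the command line; the other two modes use their template unchanged.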
### Single picture inference in Chinese
python mllm/caption_demo.py --mode "caption_zh" --image_file "mllm/images/demo1.png" --model_path "./ckpts/captioner"
### Insert specific knowledge into caption ("宫保鸡丁" is Kung Pao chicken)
python mllm/caption_demo.py --mode "insert_content" --content "宫保鸡丁" --image_file "mllm/images/demo2.png" --model_path "./ckpts/captioner"
### Single picture inference in English
python mllm/caption_demo.py --mode "caption_en" --image_file "mllm/images/demo3.png" --model_path "./ckpts/captioner"
### Convert multiple pictures to csv file.
python mllm/make_csv.py --img_dir "mllm/images" --input_file "mllm/images/demo.csv"
### Multiple pictures inference
python mllm/caption_demo.py --mode "caption_zh" --input_file "mllm/images/demo.csv" --output_file "mllm/images/demo_res.csv" --model_path "./ckpts/captioner"
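
For the multiple-picture case, the first command turns a folder of images into a CSV that the second command then consumes. The internals of `mllm/make_csv.py` are not shown here, so the following is only a sketch of the idea: the column name `img_path`, the extension filter, and the helper name `write_image_csv` are all assumptions, not the script's actual behavior.

```python
import csv
from pathlib import Path

# Assumed set of image extensions; the real script may filter differently.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def write_image_csv(img_dir, csv_path, column="img_path"):
    """Write one CSV row per image file in `img_dir` and return the number of images."""
    images = sorted(
        p for p in Path(img_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([column])  # header row (assumed column name)
        for p in images:
            writer.writerow([str(p)])
    return len(images)
```

A call such as `write_image_csv("mllm/images", "mllm/images/demo.csv")` mirrors the arguments of the command above. If the column layout differs from what the repository's script produces, prefer the script's output as input to `caption_demo.py`.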
Gradio
cd mllm
# Start the controller.
python -m llava.serve.controller --host 0.0.0.0 --port 10000
# Start the Gradio web server.
python -m llava.serve.gradio_web_server --controller http://0.0.0.0:10000 --model-list-mode reload --port 443
# Start the model worker.
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://0.0.0.0:10000 --port 40000 --worker http://0.0.0.0:40000 --model-path "../ckpts/captioner" --model-name LlavaMistral
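
Because each of the three commands has to keep running, one option is to start them from a small Python script instead of three separate shells. The sketch below only rebuilds the argument lists shown above and starts them with the standard-library `subprocess.Popen`. The helper names `build_commands` and `launch_all` are hypothetical, and the startup order simply follows the order of the commands above.

```python
import subprocess
import sys


def build_commands(host="0.0.0.0", model_path="../ckpts/captioner"):
    """Return the controller, web server, and model worker commands as argument lists."""
    controller_url = f"http://{host}:10000"
    worker_url = f"http://{host}:40000"
    return [
        [sys.executable, "-m", "llava.serve.controller",
         "--host", host, "--port", "10000"],
        [sys.executable, "-m", "llava.serve.gradio_web_server",
         "--controller", controller_url, "--model-list-mode", "reload", "--port", "443"],
        [sys.executable, "-m", "llava.serve.model_worker",
         "--host", host, "--controller", controller_url, "--port", "40000",
         "--worker", worker_url, "--model-path", model_path, "--model-name", "LlavaMistral"],
    ]


def launch_all(commands, cwd="mllm"):
    """Start every command as a background process and return the Popen handles."""
    return [subprocess.Popen(cmd, cwd=cwd) for cmd in commands]
```

From the HunyuanDiT directory, `procs = launch_all(build_commands())` starts all three processes inside `mllm`, matching the `cd mllm` step, and `p.terminate()` on each handle stops them. The sketch uses `sys.executable` in place of a bare `python` so the child processes run under the same interpreter as the launcher.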