Qwen-VL-Chat-Finetuned-Dense-Captioner

Anonymous user, 2024-07-31

Technical information

Open-source repository
https://modelscope.cn/models/Tongyi-DataEngine/Qwen-VL-Chat-Finetuned-Dense-Captioner

Project details

News

Model introduction

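Qwen-VL-Chat-Finetuned-Dense-Captioner is a large model obtained by LoRA fine-tuning Qwen-VL-Chat on a mix of generated data and human-annotated data. It outputs structured descriptions of images. The model can produce descriptions in Chinese or English, and it can take an original image description as extra input and return a more accurate, more detailed description.

The model's output is a structured data format in which global_caption is the overall description of the image and caption_list is a list of descriptions, each tied to one local part of the image. A concrete example (Chinese output) is shown below. For more examples, see Quick Start.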

{
    "global_caption": "这是一张在自然光照下拍摄的海滩上与狗互动的照片。一位女性坐在沙滩上,身穿格子衬衫,正与一只黄色的拉布拉多犬互动。狗狗似乎在向她伸出爪子,而女性则微笑着回应。他们周围是细腻的沙滩和波光粼粼的海水,背景是温暖的夕阳,为整个场景增添了一抹金色的温暖。",
    "caption_list": [
        "一位女性坐在沙滩上,身穿格子衬衫,正在与一只黄色拉布拉多犬互动。",
        "黄色拉布拉多犬似乎在向女性伸出爪子,表情活泼。",
        "背景是波光粼粼的海水和细腻的沙滩,夕阳的余晖洒在海面上,营造出宁静的氛围。"
    ]
}
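
Because the output is a JSON object, downstream code will usually want to parse and validate it before use. The following is a minimal sketch, assuming the response is a JSON string with the keys `global_caption` and `caption_list`; the `DenseCaption` class and `parse_dense_caption` function are hypothetical names introduced here for illustration and are not part of the model's API.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class DenseCaption:
    """Parsed form of the model's structured output (hypothetical helper type)."""
    global_caption: str       # overall description of the image
    caption_list: List[str]   # one description per local part of the image


def parse_dense_caption(response: str) -> DenseCaption:
    """Parse a response string; raise ValueError if it lacks the expected shape."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response JSON is not an object")

    global_caption = data.get("global_caption")
    caption_list = data.get("caption_list")
    if not isinstance(global_caption, str):
        raise ValueError("missing or non-string 'global_caption'")
    if not isinstance(caption_list, list) or not all(
        isinstance(item, str) for item in caption_list
    ):
        raise ValueError("missing or malformed 'caption_list'")
    return DenseCaption(global_caption, caption_list)


if __name__ == "__main__":
    sample = (
        '{"global_caption": "A woman and a dog on a beach.", '
        '"caption_list": ["A woman in a plaid shirt.", "A yellow Labrador."]}'
    )
    parsed = parse_dense_caption(sample)
    print(parsed.global_caption)
    print(len(parsed.caption_list))
```

Raising an error on malformed output, rather than returning a partial result, is a design choice of this sketch: a generative model may occasionally emit text that is not valid JSON, and the caller can then decide whether to retry.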

Prompt list

The model supports the following 4 prompts. The scenario each one serves is shown in the table below:

Scenario | Prompt
--- | ---
Output a Chinese description | 用中文生成输入图片内容的详细描述和图片中所有实体的描述列表。输出为格式为:{"global_caption":"详细描述", "caption_list":["实体A的描述", "实体B的描述", "实体C的描述", ...]}。
Output a Chinese description based on an original description | 根据输入的图片和描述提示:###original description in Chinese or English###用中文生成图片内容的详细描述和图片中所有实体的描述列表。输出为格式为:{"global_caption":"详细描述", "caption_list":["实体A的描述", "实体B的描述", "实体C的描述", ...]}。
Output an English description | Generate a English detailed description of the content of the input image and a list of descriptions for all entities in the image. The output should be in the format: {"global_caption":"Detailed description", "caption_list":["Description of Entity A", "Description of Entity B", "Description of Entity C", ...]}.
Output an English description based on an original description | Given a image and some tips: ###original description in Chinese or English### related to image, generate a English detailed description of the content of the input image and a list of descriptions for all entities in the image. The output should be in the format: {"global_caption":"Detailed description", "caption_list":["Description of Entity A", "Description of Entity B", "Description of Entity C", ...]}.

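Note: the # characters in the prompt must be kept. The prompt text itself should be used verbatim, including its original wording.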
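
The four prompts differ only in output language and in whether an original description is wrapped in `###`. A small helper can assemble them so the delimiters are never dropped by accident. This is a sketch, not part of the model's API: `build_prompt` and its parameters are hypothetical names, and the prompt strings follow the four prompts listed above, with punctuation as it appears on this page.

```python
from typing import Optional

# Output-format suffixes shared by the prompts. The prompt wording is kept
# verbatim because these are the prompts the model card lists as supported.
_FORMAT_ZH = (
    '{"global_caption":"详细描述", '
    '"caption_list":["实体A的描述", "实体B的描述", "实体C的描述", ...]}'
)
_FORMAT_EN = (
    '{"global_caption":"Detailed description", '
    '"caption_list":["Description of Entity A", "Description of Entity B", '
    '"Description of Entity C", ...]}'
)


def build_prompt(output_lang: str, hint: Optional[str] = None) -> str:
    """Return one of the four prompts.

    output_lang: "zh" for Chinese output, "en" for English output.
    hint: optional original description (Chinese or English); when given,
          it is wrapped in ### delimiters, which must be kept.
    """
    if output_lang not in ("zh", "en"):
        raise ValueError("output_lang must be 'zh' or 'en'")

    if output_lang == "zh":
        if hint is None:
            return ("用中文生成输入图片内容的详细描述和图片中所有实体的描述列表。"
                    "输出为格式为:" + _FORMAT_ZH + "。")
        return ("根据输入的图片和描述提示:###" + hint + "###"
                "用中文生成图片内容的详细描述和图片中所有实体的描述列表。"
                "输出为格式为:" + _FORMAT_ZH + "。")

    if hint is None:
        return ("Generate a English detailed description of the content of the "
                "input image and a list of descriptions for all entities in the "
                "image. The output should be in the format: " + _FORMAT_EN + ".")
    return ("Given a image and some tips: ###" + hint + "### related to image, "
            "generate a English detailed description of the content of the input "
            "image and a list of descriptions for all entities in the image. "
            "The output should be in the format: " + _FORMAT_EN + ".")


if __name__ == "__main__":
    print(build_prompt("zh", "狗伸出右前爪"))
    print(build_prompt("en"))
```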

Dependencies

  • python 3.8 or later
  • pytorch 1.12 or later; 2.0 or later is recommended
  • CUDA 11.4 or later is recommended (applies to GPU users)
pip install modelscope -U
pip install transformers accelerate tiktoken -U
pip install einops transformers_stream_generator -U
pip install "pillow==9.*" -U
pip install torchvision
pip install matplotlib -U
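
Before loading the model, it can help to confirm that the environment matches the list above. The following sketch uses only the Python standard library to report the interpreter version and which of the listed packages are installed. The function names are hypothetical, and the package list is an assumption derived from the pip commands above, plus torch.

```python
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names assumed from the pip commands above, plus torch.
REQUIRED = (
    "torch", "torchvision", "modelscope", "transformers", "accelerate",
    "tiktoken", "einops", "transformers_stream_generator", "pillow",
    "matplotlib",
)


def installed_versions(names: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_ok(minimum=(3, 8)) -> bool:
    """True if the running interpreter meets the minimum (major, minor) version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("python >= 3.8:", python_ok())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

The sketch only reports versions; it does not compare them against the minimums, and it does not check CUDA, which would require importing torch.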

Quick Start

You can call the model with the following code:

from modelscope import (
    snapshot_download, AutoModelForCausalLM, AutoTokenizer, GenerationConfig
)
import torch


model_id = 'Tongyi-DataEngine/Qwen-VL-Chat-Finetuned-Dense-Captioner'

# Download the model files and fix the random seed
model_dir = snapshot_download(model_id)
torch.manual_seed(1234)

# Load the tokenizer, the model (in eval mode), and the generation config
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True).eval()
model.generation_config = GenerationConfig.from_pretrained(model_dir, trust_remote_code=True)

# Chinese prompt, Chinese output
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': '用中文生成输入图片内容的详细描述和图片中所有实体的描述列表。输出为格式为:{"global_caption":"详细描述", "caption_list":["实体A的描述", "实体B的描述", "实体C的描述", ...]}。'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
#{"global_caption": "这是一张在自然光照下拍摄的海滩上与狗互动的照片。一位女性坐在沙滩上,身穿格子衬衫,正与一只黄色的拉布拉多犬互动。狗狗似乎在向她伸出爪子,而女性则微笑着回应。他们周围是细腻的沙滩和波光粼粼的海水,背景是温暖的夕阳,为整个场景增添了一抹金色的温暖。", "caption_list": ["一位女性坐在沙滩上,身穿格子衬衫,正在与一只黄色拉布拉多犬互动。", "黄色拉布拉多犬似乎在向女性伸出爪子,表情活泼。", "背景是波光粼粼的海水和细腻的沙滩,夕阳的余晖洒在海面上,营造出宁静的氛围。"]}

# Chinese output based on a Chinese original description
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': '根据输入的图片和描述提示:###狗伸出右前爪###用中文生成图片内容的详细描述和图片中所有实体的描述列表。输出为格式为:{"global_caption":"详细描述", "caption_list":["实体A的描述", "实体B的描述", "实体C的描述", ...]}。'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
#{"global_caption": "这是一张展现人与动物和谐共处的自然光照摄影风格的照片,画面中一位女性坐在沙滩上,她的右腿伸直,与一只黄色的拉布拉多犬进行互动。狗狗伸出它的右前爪,似乎在与女性进行友好的交流。他们坐在沙滩上,周围是平静的海浪和远处的海平线,天空呈现出温暖的金色调,整个场景洋溢着宁静和温馨的氛围。", "caption_list": ["一位女性坐在沙滩上,穿着格子衬衫,她的右腿伸直,与狗狗互动。", "一只黄色的拉布拉多犬伸出它的右前爪,看起来像是在与女性进行友好的交流。", "背景是平静的海浪和远处的海平线,天空呈现出温暖的金色调。"]}

# Chinese output based on an English original description
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': '根据输入的图片和描述提示:###The dog is stretching out its right front paw###用中文生成图片内容的详细描述和图片中所有实体的描述列表。输出为格式为:{"global_caption":"详细描述", "caption_list":["实体A的描述", "实体B的描述", "实体C的描述", ...]}。'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
#{"global_caption": "这是一张展现人与动物和谐共处的自然光照摄影图像,画面中一位女性坐在沙滩上,她的右腿伸展着,而一只黄色的拉布拉多犬正用它的右前爪与她击掌。他们位于沙滩上,背景是波光粼粼的海面和温暖的夕阳,营造出一种宁静而温馨的氛围。", "caption_list": ["一位女性坐在沙滩上,右腿伸展,与一只拉布拉多犬击掌", "一只黄色的拉布拉多犬正用它的右前爪与女性击掌", "背景是波光粼粼的海面和温暖的夕阳"]}


# English prompt, English output
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': 'Generate a English detailed description of the content of the input image and a list of descriptions for all entities in the image. The output should be in the format: {"global_caption":"Detailed description", "caption_list":["Description of Entity A", "Description of Entity B", "Description of Entity C", ...]}.'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
#{"global_caption": "This is a photo taken on the beach at sunset, showing a woman sitting on the sand, interacting with a large dog. The woman is wearing a plaid shirt, smiling at the dog, and the dog is raising its front paw, seemingly playing or greeting. They are surrounded by the soft light of the sunset, and the waves gently lap the beach, creating a tranquil and harmonious atmosphere.", "caption_list": ["A woman in a plaid shirt is sitting on the beach, smiling at a dog, holding a cell phone in her hand.", "A large dog is sitting on the beach, raising its front paw, seemingly interacting with the woman.", "The background is the beach and the waves at sunset, with the sky presenting a warm orange-yellow hue."]}

# English output based on an English original description
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': 'Given a image and some tips: ###The dog is stretching out its right front paw### related to image, generate a English detailed description of the content of the input image and a list of descriptions for all entities in the image. The output should be in the format: {"global_caption":"Detailed description", "caption_list":["Description of Entity A", "Description of Entity B", "Description of Entity C", ...]}.'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
#{"global_caption": "This is a photo taken on the beach at sunset, showing a woman sitting on the sand, interacting with a yellow Labrador Retriever. The dog is stretching out its right front paw, seemingly playing or seeking attention from the woman. The woman is wearing a plaid shirt, smiling, and seems to be enjoying this moment. The background is the vast ocean and the sky, with the sun about to set, adding a warm tone to the entire scene.", "caption_list": ["A woman in a plaid shirt is sitting on the beach, smiling, interacting with a dog.", "A yellow Labrador Retriever is stretching out its right front paw, seemingly playing or seeking attention.", "The background is the vast ocean and the sky, with the sun about to set, adding a warm tone to the scene."]}

# English output based on a Chinese original description
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': 'Given a image and some tips: ###狗伸出右前爪### related to image, generate a English detailed description of the content of the input image and a list of descriptions for all entities in the image. The output should be in the format: {"global_caption":"Detailed description", "caption_list":["Description of Entity A", "Description of Entity B", "Description of Entity C", ...]}.'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
#{"global_caption": "This is a photo taken on the beach at sunset, showing a woman sitting on the sand, interacting with a yellow Labrador Retriever. The dog is stretching out its right front paw, seemingly asking for a high five. The woman is wearing a plaid shirt, sitting on the sand, smiling at the dog, and holding a mobile phone in her hand. The surrounding environment is peaceful, with the waves gently lapping the beach, and the sky is a light blue, with the sun about to set.", "caption_list": ["A woman in a plaid shirt is sitting on the beach, smiling at a dog, holding a mobile phone in her hand.", "A yellow Labrador Retriever is stretching out its right front paw, seemingly asking for a high five.", "The background is a peaceful beach, with waves gently lapping the shore, and the sky is a light blue, with the sun about to set."]}
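
Each call above repeats the same query-building and chat steps, so captioning several images can be wrapped in a loop. The following is a minimal sketch, assuming `model` and `tokenizer` are the objects loaded in the quick-start code. `caption_images` is a hypothetical helper name, and passing local file paths instead of URLs is an assumption based on Qwen-VL's usual behavior rather than something this page states.

```python
from typing import Any, Dict, Iterable


def caption_images(
    model: Any, tokenizer: Any, images: Iterable[str], prompt: str
) -> Dict[str, str]:
    """Caption each image independently with the same prompt.

    model, tokenizer: objects loaded as in the quick-start code.
    images: image URLs (local paths are an assumption, see lead-in).
    prompt: one of the four supported prompts, used verbatim.
    Returns a dict mapping each image to the raw response string.
    """
    results: Dict[str, str] = {}
    for image in images:
        query = tokenizer.from_list_format([
            {"image": image},
            {"text": prompt},
        ])
        # history=None starts a fresh conversation on every call, so the
        # caption for one image is not conditioned on earlier images.
        response, _history = model.chat(tokenizer, query=query, history=None)
        results[image] = response
    return results
```

A call might look like `caption_images(model, tokenizer, [url_a, url_b], prompt)`, where `url_a` and `url_b` are placeholder image locations. The helper returns raw strings and leaves JSON parsing to the caller, since the model's output is not guaranteed to be well-formed.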

License agreement

Use of this model follows the license agreement of Qwen-VL-Chat.

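Limitations and disclaimer

Like other LLMs, Qwen-VL-Chat-Finetuned-Dense-Captioner may generate inaccurate, biased, or offensive content in certain situations. Please use the model's output with care and do not spread any harmful information.

We strictly warn against using the Qwen-VL-Chat-Finetuned-Dense-Captioner model to create or disseminate harmful information, or to engage in any activity that may endanger the public, national security, or social stability, or that violates laws and regulations. We accept no responsibility for any problems arising from use of the Qwen-VL-Chat-Finetuned-Dense-Captioner model, including data security vulnerabilities, public opinion risks, or any risks and issues related to the model being misunderstood, misused, disseminated, or used in a non-compliant manner.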

Acknowledgments

Thanks to the Tongyi Qianwen and Tongyi Wanxiang teams for their work on open-sourcing their models.

Feature overview

News: 2024-07-18 open-sourced the model Qwen-VL-Chat-Finetuned-Dense-Captioner; 2024-07-16 open-sourced the dataset SA1B-描述-子图对 (SA1B description and sub-image pairs); 2024-0
