QA-CLIP-ViT-L-14

Technical Information

Open-source URL
https://modelscope.cn/models/AI-ModelScope/QA-CLIP-ViT-L-14
License
apache-2.0

Details

中文说明 | English

Introduction

This project aims to provide a better Chinese CLIP model. The training data used in this project consists of publicly accessible image URLs and the related Chinese text descriptions, totaling 400 million pairs. After screening, we ultimately used 100 million of these pairs for training. This project is produced by the QQ-ARC Joint Lab, Tencent PCG. For more detailed information, please refer to the main page of the QA-CLIP project. We have also open-sourced our code on GitHub (QA-CLIP), and you are welcome to give it a star!

Results

We conducted zero-shot tests on the MUGE Retrieval, Flickr30K-CN, and COCO-CN datasets for image-text retrieval. For zero-shot image classification, we tested on the ImageNet dataset. The results are shown in the tables below:

Flickr30K-CN Zero-shot Retrieval (Official Test Set):

| Model | Text-to-Image R@1 | R@5 | R@10 | Image-to-Text R@1 | R@5 | R@10 |
|--------------------|------:|------:|------:|------:|------:|------:|
| CN-CLIP (RN50)     | 48.8 | 76.0 | 84.6 | 60.0 | 85.9 | 92.0 |
| QA-CLIP (RN50)     | 50.5 | 77.4 | 86.1 | 67.1 | 87.9 | 93.2 |
| CN-CLIP (ViT-B/16) | 62.7 | 86.9 | 92.8 | 74.6 | 93.5 | 97.1 |
| QA-CLIP (ViT-B/16) | 63.8 | 88.0 | 93.2 | 78.4 | 96.1 | 98.5 |
| CN-CLIP (ViT-L/14) | 68.0 | 89.7 | 94.4 | 80.2 | 96.6 | 98.2 |
| AltClip (ViT-L/14) | 69.7 | 90.1 | 94.8 | 84.8 | 97.7 | 99.1 |
| QA-CLIP (ViT-L/14) | 69.3 | 90.3 | 94.7 | 85.3 | 97.9 | 99.2 |


MUGE Zero-shot Retrieval (Official Validation Set):

| Model | Text-to-Image R@1 | R@5 | R@10 | Image-to-Text R@1 | R@5 | R@10 |
|--------------------|------:|------:|------:|------:|------:|------:|
| CN-CLIP (RN50)     | 42.6 | 68.5 | 78.0 | 30.0 | 56.2 | 66.9 |
| QA-CLIP (RN50)     | 44.0 | 69.9 | 79.5 | 32.4 | 59.5 | 70.3 |
| CN-CLIP (ViT-B/16) | 52.1 | 76.7 | 84.4 | 38.7 | 65.6 | 75.1 |
| QA-CLIP (ViT-B/16) | 53.2 | 77.7 | 85.1 | 40.7 | 68.2 | 77.2 |
| CN-CLIP (ViT-L/14) | 56.4 | 79.8 | 86.2 | 42.6 | 69.8 | 78.6 |
| AltClip (ViT-L/14) | 29.6 | 49.9 | 58.8 | 21.4 | 42.0 | 51.9 |
| QA-CLIP (ViT-L/14) | 57.4 | 81.0 | 87.7 | 45.5 | 73.0 | 81.4 |


COCO-CN Zero-shot Retrieval (Official Test Set):

| Model | Text-to-Image R@1 | R@5 | R@10 | Image-to-Text R@1 | R@5 | R@10 |
|--------------------|------:|------:|------:|------:|------:|------:|
| CN-CLIP (RN50)     | 48.1 | 81.3 | 90.5 | 50.9 | 81.1 | 90.5 |
| QA-CLIP (RN50)     | 50.1 | 82.5 | 91.7 | 56.7 | 85.2 | 92.9 |
| CN-CLIP (ViT-B/16) | 62.2 | 87.1 | 94.9 | 56.3 | 84.0 | 93.3 |
| QA-CLIP (ViT-B/16) | 62.9 | 87.7 | 94.7 | 61.5 | 87.6 | 94.8 |
| CN-CLIP (ViT-L/14) | 64.9 | 88.8 | 94.2 | 60.6 | 84.4 | 93.1 |
| AltClip (ViT-L/14) | 63.5 | 87.6 | 93.5 | 62.6 | 88.5 | 95.9 |
| QA-CLIP (ViT-L/14) | 65.7 | 90.2 | 95.0 | 64.5 | 88.3 | 95.1 |
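
The R@1/R@5/R@10 columns above are Recall@K: for each text query, candidate images are ranked by cosine similarity between the normalized embeddings, and a query counts as a hit if its ground-truth image appears in the top K (and symmetrically for image-to-text). The snippet below is a minimal sketch of this metric, not the official evaluation script; the feature dimension, dataset sizes, and ground-truth indices are illustrative placeholders.

# Minimal sketch of Recall@K for text-to-image retrieval (not the official
# evaluation script). Shapes, feature dimension, and ground-truth indices
# below are illustrative placeholders.
import torch

def recall_at_k(text_features, image_features, gt_image_idx, k=1):
    # cosine similarity matrix: one row per text query, one column per image
    # (features are assumed to be L2-normalized, as in the inference example below)
    sims = text_features @ image_features.T              # (num_texts, num_images)
    topk_idx = sims.topk(k, dim=-1).indices              # top-K image indices per query
    hits = (topk_idx == gt_image_idx.unsqueeze(-1)).any(dim=-1)
    return hits.float().mean().item()

# illustrative usage with random placeholder embeddings
text_features = torch.nn.functional.normalize(torch.randn(100, 768), dim=-1)
image_features = torch.nn.functional.normalize(torch.randn(500, 768), dim=-1)
gt_image_idx = torch.randint(0, 500, (100,))
print(recall_at_k(text_features, image_features, gt_image_idx, k=5))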


Zero-shot Image Classification on ImageNet:

| Model | ImageNet |
|--------------------|------:|
| CN-CLIP (RN50)     | 33.5 |
| QA-CLIP (RN50)     | 35.5 |
| CN-CLIP (ViT-B/16) | 48.4 |
| QA-CLIP (ViT-B/16) | 49.7 |
| CN-CLIP (ViT-L/14) | 54.7 |
| QA-CLIP (ViT-L/14) | 55.8 |
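
Zero-shot classification with a CLIP-style model is typically done by encoding a text prompt for every class name and assigning each image to the most similar prompt. Below is a minimal sketch of that recipe using the same Hugging Face classes as the inference example further down; the prompt template and the class names are made-up placeholders, not necessarily the setup used to produce the ImageNet numbers above.

# Illustrative zero-shot classification sketch. The prompt template and the
# class names are placeholders, not the exact setup behind the reported numbers.
from transformers import ChineseCLIPProcessor, ChineseCLIPModel

model = ChineseCLIPModel.from_pretrained("TencentARC/QA-CLIP-ViT-L-14")
processor = ChineseCLIPProcessor.from_pretrained("TencentARC/QA-CLIP-ViT-L-14")

class_names = ["金鱼", "猫", "狗"]                        # hypothetical class names
prompts = [f"一张{name}的照片" for name in class_names]   # "a photo of a <class>"

# encode one prompt per class and L2-normalize the text embeddings
text_inputs = processor(text=prompts, padding=True, return_tensors="pt")
text_features = model.get_text_features(**text_inputs)
text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)

def classify(image):
    """Return the index of the class prompt most similar to a PIL image."""
    image_inputs = processor(images=image, return_tensors="pt")
    image_features = model.get_image_features(**image_inputs)
    image_features = image_features / image_features.norm(p=2, dim=-1, keepdim=True)
    return int((image_features @ text_features.T).argmax(dim=-1))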




Getting Started

Inference Code

Inference code example:

from PIL import Image
import requests
from transformers import ChineseCLIPProcessor, ChineseCLIPModel

model = ChineseCLIPModel.from_pretrained("TencentARC/QA-CLIP-ViT-L-14")
processor = ChineseCLIPProcessor.from_pretrained("TencentARC/QA-CLIP-ViT-L-14")

url = "https://clip-cn-beijing.oss-cn-beijing.aliyuncs.com/pokemon.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
# Squirtle, Bulbasaur, Charmander, Pikachu in English
texts = ["杰尼龟", "妙蛙种子", "小火龙", "皮卡丘"]

# compute image features
inputs = processor(images=image, return_tensors="pt")
image_features = model.get_image_features(**inputs)
image_features = image_features / image_features.norm(p=2, dim=-1, keepdim=True)  # normalize

# compute text features
inputs = processor(text=texts, padding=True, return_tensors="pt")
text_features = model.get_text_features(**inputs)
text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)  # normalize

# compute image-text similarity scores
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)
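
Continuing from the snippet above, the resulting probabilities can be turned into a predicted label for a quick sanity check (the exact values depend on the downloaded image):

# pick the candidate text with the highest image-text similarity
predicted_label = texts[probs.argmax(dim=-1).item()]
print("predicted label:", predicted_label)
print("probabilities:", probs.tolist())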



Acknowledgments

The project code is based on the implementation of Chinese-CLIP, and we are very grateful for their outstanding open-source contributions.
