japanese-stable-clip-vit-l-16


Technical Information

Open-source URL
https://modelscope.cn/models/AI-ModelScope/japanese-stable-clip-vit-l-16
License
other

Project Details

Japanese Stable CLIP ViT-L/16

Model Details

Japanese Stable CLIP is a Japanese CLIP (Contrastive Language-Image Pre-Training) model that maps both Japanese text and images into the same embedding space. On its own, the model supports tasks such as zero-shot image classification and text-to-image retrieval. Furthermore, when combined with other components, it can be used as part of generative models, such as image-to-text and text-to-image generation.

Example Code

from typing import Union, List
import ftfy, html, re, io
import requests
from PIL import Image
import torch
from modelscope import AutoModel, AutoTokenizer, AutoImageProcessor, BatchFeature

def basic_clean(text):
    # Fix mojibake and repeatedly unescape HTML entities.
    text = ftfy.fix_text(text)
    text = html.unescape(html.unescape(text))
    return text.strip()

def whitespace_clean(text):
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text)
    text = text.strip()
    return text

def tokenize(
    tokenizer,
    texts: Union[str, List[str]],
    max_seq_len: int = 77,
):
    if isinstance(texts, str):
        texts = [texts]
    texts = [whitespace_clean(basic_clean(text)) for text in texts]

    inputs = tokenizer(
        texts,
        max_length=max_seq_len - 1,
        padding="max_length",
        truncation=True,
        add_special_tokens=False,
    )
    # Prepend the BOS token manually, since add_special_tokens=False above.
    input_ids = [[tokenizer.bos_token_id] + ids for ids in inputs["input_ids"]]
    attention_mask = [[1] + am for am in inputs["attention_mask"]]
    position_ids = [list(range(0, len(input_ids[0])))] * len(texts)

    return BatchFeature(
        {
            "input_ids": torch.tensor(input_ids, dtype=torch.long),
            "attention_mask": torch.tensor(attention_mask, dtype=torch.long),
            "position_ids": torch.tensor(position_ids, dtype=torch.long),
        }
    )

device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "AI-ModelScope/japanese-stable-clip-vit-l-16"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoImageProcessor.from_pretrained(model_path)

# Run!
image = Image.open(io.BytesIO(requests.get('https://images.pexels.com/photos/2253275/pexels-photo-2253275.jpeg?auto=compress&cs=tinysrgb&dpr=3&h=750&w=1260').content))
image = processor(images=image, return_tensors="pt").to(device)
text = tokenize(
    tokenizer=tokenizer,
    texts=["犬", "猫", "象"],  # "dog", "cat", "elephant"
).to(device)

with torch.no_grad():
    image_features = model.get_image_features(**image)
    text_features = model.get_text_features(**text)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)

Usage

  1. Install packages
  pip install ftfy pillow requests transformers torch sentencepiece protobuf

Evaluation

| Model | ImageNet top-1 accuracy* |
|---|---|
| Japanese Stable CLIP ViT-L/16 | 62.06 |
| rinna/japanese-cloob-vit-b-16 | 54.64 |
| laion/CLIP-ViT-H-14-frozen-xlm-roberta-large-laion5B-s13B-b90k | 53 |
| rinna/japanese-clip-vit-b-16 | 50.69 |

* Computed scores based on https://github.com/rinnakk/japanese-clip.
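
As a rough illustration only (the authoritative evaluation code is the rinnakk/japanese-clip repository linked above), zero-shot top-1 accuracy is typically computed by embedding one Japanese prompt per class and counting how often the highest-similarity class matches the ground-truth label. The sketch below assumes hypothetical eval_pairs (PIL image, label index) and japanese_class_names inputs, and reuses the tokenize helper from the example code:

import torch

@torch.no_grad()
def zero_shot_top1(model, tokenizer, processor, eval_pairs, japanese_class_names, device):
    # Embed one Japanese prompt per class once, then classify each image by
    # its nearest text embedding. eval_pairs: list of (PIL.Image, label_index).
    text = tokenize(tokenizer=tokenizer, texts=japanese_class_names).to(device)
    text_features = model.get_text_features(**text)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    correct = 0
    for image, label_idx in eval_pairs:
        pixel = processor(images=image, return_tensors="pt").to(device)
        image_features = model.get_image_features(**pixel)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        pred = (image_features @ text_features.T).argmax(dim=-1).item()
        correct += int(pred == label_idx)
    return correct / len(eval_pairs)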

Training

The model uses a ViT-L/16 Transformer architecture as an image encoder and a 12-layer BERT as a text encoder with the Japanese tokenizer from rinna/japanese-roberta-base. During training, the image encoder was initialized from the AugReg [vit-large-patch16-224](https://huggingface.co/timm/vit_large_patch16_224.augreg_in21k_ft_in1k) model and we applied SigLIP (Sigmoid loss for Language-Image Pre-training).
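
For reference, SigLIP replaces CLIP's softmax-based contrastive objective with an independent binary (sigmoid) classification over every image-text pair in a batch: matched pairs on the diagonal should score high, all other pairs low. The following is a minimal sketch of that loss, not this project's training code; in the paper the temperature t and bias b are learnable parameters, simplified to plain scalars here:

import torch
import torch.nn.functional as F

def siglip_loss(image_features, text_features, t, b):
    # Sigmoid loss for Language-Image Pre-training (Zhai et al., 2023).
    # Each of the N*N image-text pairs is an independent binary problem.
    img = F.normalize(image_features, dim=-1)
    txt = F.normalize(text_features, dim=-1)
    logits = img @ txt.T * t + b                         # (N, N) pairwise logits
    n = logits.size(0)
    labels = 2 * torch.eye(n, device=logits.device) - 1  # +1 diagonal, -1 elsewhere
    return -F.logsigmoid(labels * logits).sum() / n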

Training Dataset

The training dataset includes the following public datasets:

Use and Limitations

Intended Use

This model is intended to be used by the open-source community in vision-language applications.

Limitations and bias

The training dataset may have contained offensive or inappropriate content even though we applied data filters. We recommend users exercise reasonable caution when using these models in production systems. Do not use the model for any applications that may cause harm or distress to individuals or groups.

How to cite

@misc{JapaneseStableCLIP,
    url    = {https://huggingface.co/stabilityai/japanese-stable-clip-vit-l-16},
    title  = {Japanese Stable CLIP ViT-L/16},
    author = {Shing, Makoto and Akiba, Takuya}
}

Contact

  • For questions and comments about the model, please join Stable Community Japan.
  • For future announcements / information about Stability AI models, research, and events, please follow https://twitter.com/StabilityAI_JP.
  • For business and partnership inquiries, please contact partners-jp@stability.ai.
