Source repository
https://modelscope.cn/models/AI-ModelScope/japanese-stable-clip-vit-l-16
License
other

Japanese Stable CLIP ViT-L/16

Model Details

Japanese Stable CLIP is a Japanese CLIP (Contrastive Language-Image Pre-Training) model that maps both Japanese text and images into the same embedding space. On its own, the model can perform tasks such as zero-shot image classification and text-to-image retrieval. Combined with other components, it can also serve as part of generative models for tasks such as image-to-text and text-to-image generation.
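To make the shared embedding space concrete, the following is a minimal sketch of zero-shot classification on toy vectors. The helper names (`l2_normalize`, `softmax`, `zero_shot_probs`) and the 3-dimensional embeddings are made up for illustration; real embeddings come from the model, and the scale of 100 simply mirrors the factor used in the model card's example code.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """Return one probability per candidate label for a single image embedding."""
    img = l2_normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_embs]
    return softmax([scale * s for s in sims])

# Toy embeddings (hypothetical values, not produced by the model).
labels = ["犬", "猫", "象"]  # dog, cat, elephant
image_emb = [0.9, 0.1, 0.2]
text_embs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.1, 0.2, 1.0]]

probs = zero_shot_probs(image_emb, text_embs)
best_label = labels[probs.index(max(probs))]
print(best_label, probs)
```

The label whose text embedding points in the direction closest to the image embedding receives the highest probability; no classifier is trained for the label set.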

Example code

from typing import Union, List
import ftfy, html, re, io
import requests
from PIL import Image
import torch
from modelscope import AutoModel, AutoTokenizer, AutoImageProcessor, BatchFeature

def basic_clean(text):
    # Repair broken Unicode and decode HTML entities (twice, for doubly escaped text)
    text = ftfy.fix_text(text)
    text = html.unescape(html.unescape(text))
    return text.strip()

def whitespace_clean(text):
    # Collapse runs of whitespace into single spaces
    text = re.sub(r"\s+", " ", text)
    text = text.strip()
    return text

def tokenize(
    tokenizer,
    texts: Union[str, List[str]],
    max_seq_len: int = 77,
):
    if isinstance(texts, str):
        texts = [texts]
    texts = [whitespace_clean(basic_clean(text)) for text in texts]

    # Reserve one position for the BOS token that is prepended below
    inputs = tokenizer(
        texts,
        max_length=max_seq_len - 1,
        padding="max_length",
        truncation=True,
        add_special_tokens=False,
    )
    # add bos token at first place
    input_ids = [[tokenizer.bos_token_id] + ids for ids in inputs["input_ids"]]
    attention_mask = [[1] + am for am in inputs["attention_mask"]]
    position_ids = [list(range(0, len(input_ids[0])))] * len(texts)

    return BatchFeature(
        {
            "input_ids": torch.tensor(input_ids, dtype=torch.long),
            "attention_mask": torch.tensor(attention_mask, dtype=torch.long),
            "position_ids": torch.tensor(position_ids, dtype=torch.long),
        }
    )

device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "AI-ModelScope/japanese-stable-clip-vit-l-16"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoImageProcessor.from_pretrained(model_path)

# Run!
image = Image.open(io.BytesIO(requests.get('https://images.pexels.com/photos/2253275/pexels-photo-2253275.jpeg?auto=compress&cs=tinysrgb&dpr=3&h=750&w=1260').content))
image = processor(images=image, return_tensors="pt").to(device)
# Candidate labels: dog, cat, elephant
text = tokenize(
    tokenizer=tokenizer,
    texts=["犬", "猫", "象"],
).to(device)

with torch.no_grad():
    image_features = model.get_image_features(**image)
    text_features = model.get_text_features(**text)
    # Scaled similarities turned into a probability over the candidate labels
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
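The tokenization step in the example code pads or truncates each text to one token less than the maximum length and then prepends a BOS token. The sketch below reproduces that layout on plain lists so the resulting shapes are easy to inspect. The function name `build_inputs`, the token IDs, and the right-side padding are assumptions for illustration; the real tokenizer decides the actual IDs and padding side.

```python
def build_inputs(token_ids, bos_id, pad_id, max_seq_len=77):
    """Truncate or pad to max_seq_len - 1 tokens, then prepend BOS."""
    body_len = max_seq_len - 1
    body = token_ids[:body_len]          # truncation
    n_real = len(body)
    n_pad = body_len - n_real
    body = body + [pad_id] * n_pad       # right padding (assumed)

    input_ids = [bos_id] + body
    # BOS and real tokens are attended to; padding is masked out.
    attention_mask = [1] + [1] * n_real + [0] * n_pad
    position_ids = list(range(len(input_ids)))
    return input_ids, attention_mask, position_ids

# Hypothetical IDs and a short max_seq_len to keep the output readable.
ids, mask, pos = build_inputs([11, 12, 13], bos_id=1, pad_id=0, max_seq_len=8)
print(ids)
print(mask)
print(pos)
```

Every sequence ends up with exactly `max_seq_len` entries, which is why the example code can stack the batch into fixed-size tensors.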

Usage

Install packages

  pip install ftfy pillow requests transformers torch sentencepiece protobuf
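After installation, a quick check like the one below can confirm that each package is importable. This is a small sketch, not part of the model card: the helper `missing_packages` is hypothetical, and the mapping reflects that some pip names differ from their import names (`pillow` is imported as `PIL`, `protobuf` as `google.protobuf`). The `modelscope` entry is an added assumption, included because the example code imports from it even though the install line does not list it.

```python
import importlib.util

# pip distribution name -> importable module name
PACKAGES = {
    "ftfy": "ftfy",
    "pillow": "PIL",
    "requests": "requests",
    "transformers": "transformers",
    "torch": "torch",
    "sentencepiece": "sentencepiece",
    "protobuf": "google.protobuf",
    "modelscope": "modelscope",  # assumption: needed by the example code's imports
}

def missing_packages(packages=PACKAGES):
    """Return the pip names whose modules cannot be located."""
    missing = []
    for pip_name, module_name in packages.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(pip_name)
    return missing

print("Missing:", missing_packages())
```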

Model Details

Developed by: Stability AI
Model type: Contrastive Image-Text, Zero-Shot Image Classification
Language(s): Japanese
License: STABILITY AI JAPANESE STABLE CLIP COMMUNITY LICENSE.

Model	ImageNet top-1 accuracy*
Japanese Stable CLIP ViT-L/16	62.06
rinna/japanese-cloob-vit-b-16	54.64
laion/CLIP-ViT-H-14-frozen-xlm-roberta-large-laion5B-s13B-b90k	53
rinna/japanese-clip-vit-b-16	50.69

* Computed scores based on https://github.com/rinnakk/japanese-clip.
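For readers unfamiliar with the metric, top-1 accuracy is the percentage of images whose highest-scoring label matches the ground truth. The sketch below computes it on made-up scores; the function name `top1_accuracy` and the data are hypothetical, and this is not the evaluation script referenced in the footnote.

```python
def top1_accuracy(score_rows, true_indices):
    """Percentage of rows whose highest-scoring class equals the true class."""
    if len(score_rows) != len(true_indices):
        raise ValueError("scores and labels must have the same length")
    correct = 0
    for scores, truth in zip(score_rows, true_indices):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == truth:
            correct += 1
    return 100.0 * correct / len(true_indices)

# Four toy images scored against three classes (hypothetical values).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
]
truth = [0, 2, 1, 1]
accuracy = top1_accuracy(scores, truth)
print(accuracy)
```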

Training

The model uses a ViT-L/16 Transformer architecture as the image encoder and a 12-layer BERT as the text encoder, with the Japanese tokenizer from rinna/japanese-roberta-base. During training, the image encoder was initialized from the AugReg [vit-large-patch16-224](https://huggingface.co/timm/vit_large_patch16_224.augreg_in21k_ft_in1k) model, and we applied SigLIP (Sigmoid loss for Language-Image Pre-training).
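The model card names SigLIP but does not spell out the loss, so the following is only a sketch of the pairwise sigmoid loss as it is generally described: every image-text pair in a batch is treated as an independent binary decision, positive on the diagonal and negative elsewhere. The function names, the temperature and bias values, and the toy embeddings are assumptions for illustration; this is not the authors' training code.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def siglip_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss over a batch of L2-normalized embeddings.

    Pair (i, j) has label +1 when i == j (matching image and text)
    and -1 otherwise. The sum is averaged over the batch size.
    """
    n = len(image_embs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            logit = temperature * dot(image_embs[i], text_embs[j]) + bias
            label = 1.0 if i == j else -1.0
            total += log_sigmoid(label * logit)
    return -total / n

# Toy unit-length embeddings (hypothetical values).
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = siglip_loss(images, aligned_texts)
swapped_loss = siglip_loss(images, swapped_texts)
print(aligned_loss, swapped_loss)
```

Unlike a softmax-based contrastive loss, each pair contributes its own term, so the loss needs no normalization across the whole batch.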

Training Dataset

The training dataset includes the following public datasets:

CC12M with captions translated into Japanese
MS-COCO with STAIR Captions

Use and Limitations

Intended Use

This model is intended to be used by the open-source community in vision-language applications.

Limitations and bias

The training dataset may have contained offensive or inappropriate content even though we applied data filters. We recommend that users exercise reasonable caution when using these models in production systems. Do not use the model for any applications that may cause harm or distress to individuals or groups.

How to cite

@misc{JapaneseStableCLIP,
    url    = {https://huggingface.co/stabilityai/japanese-stable-clip-vit-l-16},
    title  = {Japanese Stable CLIP ViT-L/16},
    author = {Shing, Makoto and Akiba, Takuya}
}

Contact

For questions and comments about the model, please join Stable Community Japan.
For future announcements and information about Stability AI models, research, and events, please follow https://twitter.com/StabilityAI_JP.
For business and partnership inquiries, please contact partners-jp@stability.ai.
