distilbert-base-multilingual-cased-sentiments-stud

Anonymous user, 2024-07-31

Technical information

Open-source URL
https://modelscope.cn/models/MasterGuda/distilbert-base-multilingual-cased-sentiments-student
License
Apache License 2.0

Project details

distilbert-base-multilingual-cased-sentiments-student

This model is distilled from the zero-shot classification pipeline on the Multilingual Sentiment dataset using this script.

In reality the multilingual-sentiment dataset is annotated, of course, but we'll pretend it is not and ignore the annotations for the sake of the example.

Teacher model: MoritzLaurer/mDeBERTa-v3-base-mnli-xnli
Teacher hypothesis template: "The sentiment of this text is {}."
Student model: distilbert-base-multilingual-cased
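
To make the roles of the teacher and the hypothesis template concrete, here is a minimal sketch of how an NLI-based zero-shot teacher could turn one text into soft labels. It assumes the usual zero-shot setup: each class name is substituted into the template, the teacher scores every (text, hypothesis) pair for entailment, and the entailment scores are softmaxed across classes. The helper names and the logit values are hypothetical placeholders, and no model is loaded.

```python
import math

HYPOTHESIS_TEMPLATE = "The sentiment of this text is {}."
CLASS_NAMES = ["positive", "neutral", "negative"]  # labels that appear in the model's output


def build_nli_pairs(text, class_names, template=HYPOTHESIS_TEMPLATE):
    """Return one (premise, hypothesis) pair per candidate class."""
    return [(text, template.format(name)) for name in class_names]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


pairs = build_nli_pairs("I love this movie!", CLASS_NAMES)

# Hypothetical entailment logits, one per pair; a real teacher model would produce these.
entailment_logits = [4.0, 0.5, -1.0]
soft_labels = dict(zip(CLASS_NAMES, softmax(entailment_logits)))
```

The resulting `soft_labels` dictionary plays the part of the training target for the student, in place of the dataset's real annotations.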

Inference example

from transformers import pipeline

# Load the distilled student; return_all_scores=True returns a score for every label.
distilled_student_sentiment_classifier = pipeline(
    model="lxyuan/distilbert-base-multilingual-cased-sentiments-student",
    return_all_scores=True
)

# english
distilled_student_sentiment_classifier("I love this movie and i would watch it again and again!")
>> [[{'label': 'positive', 'score': 0.9731044769287109},
  {'label': 'neutral', 'score': 0.016910076141357422},
  {'label': 'negative', 'score': 0.009985478594899178}]]

# malay
distilled_student_sentiment_classifier("Saya suka filem ini dan saya akan menontonnya lagi dan lagi!")
>> [[{'label': 'positive', 'score': 0.9760093688964844},
  {'label': 'neutral', 'score': 0.01804516464471817},
  {'label': 'negative', 'score': 0.005945465061813593}]]

# japanese
distilled_student_sentiment_classifier("私はこの映画が大好きで、何度も見ます!")
>> [[{'label': 'positive', 'score': 0.9342429041862488},
  {'label': 'neutral', 'score': 0.040193185210227966},
  {'label': 'negative', 'score': 0.025563929229974747}]]
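
The classifier returns one list of label/score dictionaries per input text. If only the winning label is needed, a small helper can pick it out. This is a minimal sketch: `top_label` is a hypothetical helper name, and the scores are copied from the English example so that no model download is required.

```python
def top_label(scores_for_one_text):
    """Return the (label, score) pair with the highest score."""
    best = max(scores_for_one_text, key=lambda item: item["score"])
    return best["label"], best["score"]


# Output copied from the English example.
english_output = [[
    {"label": "positive", "score": 0.9731044769287109},
    {"label": "neutral", "score": 0.016910076141357422},
    {"label": "negative", "score": 0.009985478594899178},
]]

label, score = top_label(english_output[0])
print(label, round(score, 4))  # positive 0.9731
```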

Training procedure

Notebook link: here
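
As a rough picture of what the training step optimizes, the sketch below computes a cross-entropy between the teacher's soft labels and the student's predicted distribution for a single example. That the student is trained against soft labels in this way is an assumption about the objective; the code is plain Python with made-up numbers, not an excerpt from the notebook or the script.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_total for x in logits]


def soft_label_cross_entropy(teacher_probs, student_logits):
    """Compute -sum_c p_teacher(c) * log p_student(c) for one example."""
    log_probs = log_softmax(student_logits)
    return -sum(p * lp for p, lp in zip(teacher_probs, log_probs))


# Hypothetical teacher distribution over (positive, neutral, negative).
teacher_probs = [0.90, 0.07, 0.03]

# A student that agrees with the teacher is penalized less than one that disagrees.
agreeing = soft_label_cross_entropy(teacher_probs, [3.0, 0.0, -1.0])
disagreeing = soft_label_cross_entropy(teacher_probs, [-1.0, 0.0, 3.0])
```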

Training hyperparameters

The results can be reproduced using the following command:

python transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py \
--data_file ./multilingual-sentiments/train_unlabeled.txt \
--class_names_file ./multilingual-sentiments/class_names.txt \
--hypothesis_template "The sentiment of this text is {}." \
--teacher_name_or_path MoritzLaurer/mDeBERTa-v3-base-mnli-xnli \
--teacher_batch_size 32 \
--student_name_or_path distilbert-base-multilingual-cased \
--output_dir ./distilbert-base-multilingual-cased-sentiments-student \
--per_device_train_batch_size 16 \
--fp16
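
The command reads two plain-text inputs. The sketch below shows one way they could be prepared, assuming the script expects one unlabeled text per line in the data file and one class name per line in the class-names file. The file names match the command; the helper `write_lines` and the three sample texts are illustrative only, and the files are written to a temporary directory so nothing existing is overwritten.

```python
from pathlib import Path
import tempfile


def write_lines(path, lines):
    """Write one entry per line, collapsing whitespace so each entry stays on a single line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    cleaned = [" ".join(line.split()) for line in lines]
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


# Illustrative unlabeled texts; a real run would use the full dataset with labels stripped.
texts = [
    "I love this movie and i would watch it again and again!",
    "Saya suka filem ini dan saya akan menontonnya lagi dan lagi!",
    "私はこの映画が大好きで、何度も見ます!",
]
class_names = ["positive", "neutral", "negative"]

root = Path(tempfile.mkdtemp()) / "multilingual-sentiments"
n_texts = write_lines(root / "train_unlabeled.txt", texts)
n_classes = write_lines(root / "class_names.txt", class_names)
```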

If you are training this model on Colab, make the following code changes to avoid an out-of-memory error:

###### modify L78 to disable fast tokenizer
default=False,

###### update dataset map part at L313
dataset = dataset.map(tokenizer, input_columns="text", fn_kwargs={"padding": "max_length", "truncation": True, "max_length": 512})

###### add following lines to L213
del model
print(f"Manually deleted Teacher model, free some memory for student model.")

###### add following lines to L337
trainer.push_to_hub()
tokenizer.push_to_hub("distilbert-base-multilingual-cased-sentiments-student")
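
The `del model` change works because the teacher is no longer needed once its predictions have been collected. Dropping the last reference lets Python reclaim the object, and on a GPU it can also help to ask PyTorch to release its cached blocks. The following is a minimal sketch of that idea, not part of the original script: the helper name `release` and the dictionary used to hold the model are hypothetical, and `torch` is treated as optional.

```python
import gc


def release(holder, key):
    """Drop the object stored under `key`, then ask Python (and CUDA, if present) to reclaim memory."""
    holder.pop(key, None)   # remove the last reference held by the caller
    gc.collect()            # collect anything kept alive only by reference cycles
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached GPU blocks to the driver
    except ImportError:
        pass  # CPU-only environment without PyTorch: nothing more to do


class PlaceholderModel:
    """Stand-in for a large teacher model."""


models = {"teacher": PlaceholderModel()}
release(models, "teacher")
```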

Training log

Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 2009.8864, 'train_samples_per_second': 73.0, 'train_steps_per_second': 4.563, 'train_loss': 0.6473459283913797, 'epoch': 1.0}
100%|███████████████████████████████████████| 9171/9171 [33:29<00:00,  4.56it/s]
[INFO|trainer.py:762] 2023-05-06 10:56:18,555 >> The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:3129] 2023-05-06 10:56:18,557 >> ***** Running Evaluation *****
[INFO|trainer.py:3131] 2023-05-06 10:56:18,557 >>   Num examples = 146721
[INFO|trainer.py:3134] 2023-05-06 10:56:18,557 >>   Batch size = 128
100%|███████████████████████████████████████| 1147/1147 [08:59<00:00,  2.13it/s]
05/06/2023 11:05:18 - INFO - __main__ - Agreement of student and teacher predictions: 88.29%
[INFO|trainer.py:2868] 2023-05-06 11:05:18,251 >> Saving model checkpoint to ./distilbert-base-multilingual-cased-sentiments-student
[INFO|configuration_utils.py:457] 2023-05-06 11:05:18,251 >> Configuration saved in ./distilbert-base-multilingual-cased-sentiments-student/config.json
[INFO|modeling_utils.py:1847] 2023-05-06 11:05:18,905 >> Model weights saved in ./distilbert-base-multilingual-cased-sentiments-student/pytorch_model.bin
[INFO|tokenization_utils_base.py:2171] 2023-05-06 11:05:18,905 >> tokenizer config file saved in ./distilbert-base-multilingual-cased-sentiments-student/tokenizer_config.json
[INFO|tokenization_utils_base.py:2178] 2023-05-06 11:05:18,905 >> Special tokens file saved in ./distilbert-base-multilingual-cased-sentiments-student/special_tokens_map.json
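
The agreement figure in the log can be read as the share of texts on which the student picks the same class as the teacher. Below is a minimal sketch of that metric on made-up predictions, followed by a cross-check of the step counts and throughput reported in the log. The cross-check assumes a single device and that training used the same 146,721 examples that the log reports for evaluation; the function name `agreement` is hypothetical.

```python
import math


def agreement(teacher_labels, student_labels):
    """Fraction of examples on which both models predict the same class."""
    if len(teacher_labels) != len(student_labels):
        raise ValueError("prediction lists must have the same length")
    matches = sum(t == s for t, s in zip(teacher_labels, student_labels))
    return matches / len(teacher_labels)


# Hypothetical predictions for five texts; the models disagree on the third one.
teacher = ["positive", "negative", "neutral", "positive", "negative"]
student = ["positive", "negative", "positive", "positive", "negative"]
print(f"{agreement(teacher, student):.2%}")  # 80.00%

# Cross-check of the figures in the log.
num_examples = 146721
train_steps = math.ceil(num_examples / 16)      # train batch size 16 gives 9171 steps
eval_steps = math.ceil(num_examples / 128)      # eval batch size 128 gives 1147 steps
samples_per_second = num_examples / 2009.8864   # about 73.0, matching train_samples_per_second
```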

Framework versions

  • Transformers 4.28.1
  • Pytorch 2.0.0+cu118
  • Datasets 2.11.0
  • Tokenizers 0.13.3
