distilbert-base-multilingual-cased-sentiments-student
This model is distilled from the zero-shot classification pipeline on the Multilingual Sentiment dataset using this script. In reality the multilingual-sentiment dataset is, of course, annotated, but we'll pretend it isn't and ignore the annotations for the sake of example.
Notebook link: here
The results can be reproduced with the command under "Training hyperparameters" below. If you are training this model on Colab, make the code changes listed after that command to avoid out-of-memory errors.
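Distillation here means the student learns to match the teacher's predicted probabilities instead of gold labels. The usual objective for that is a cross-entropy against the teacher's soft targets. Below is a minimal plain-Python sketch of that objective. It illustrates the idea and is not claimed to be the exact loss used by the distillation script. The function names are hypothetical.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vector of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def soft_cross_entropy(teacher_probs, student_logits):
    # Cross-entropy between the teacher's distribution (soft targets) and the
    # student's predicted distribution, averaged over the batch.
    total = 0.0
    for p, z in zip(teacher_probs, student_logits):
        log_q = log_softmax(z)
        total += -sum(pi * lqi for pi, lqi in zip(p, log_q))
    return total / len(teacher_probs)
```

The loss is smallest when the student's softmax equals the teacher's distribution, where it equals the teacher's entropy. This means the student is pushed to reproduce the full distribution, not just the top label.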
Teacher model: MoritzLaurer/mDeBERTa-v3-base-mnli-xnli
Teacher hypothesis template: "The sentiment of this text is {}."
Student model: distilbert-base-multilingual-cased
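The teacher is an NLI model, so the template is expanded into one hypothesis per class, and the text is scored against each hypothesis. For single-label zero-shot classification, the entailment logits of those hypotheses are softmaxed across classes to give the teacher's probabilities. The sketch below assumes the class names are positive, neutral and negative, which are the labels the student returns. The entailment logits are hypothetical, and the helper names are illustrative.

```python
import math

# Template copied from above; the class list is an assumption based on the
# labels the student model returns.
HYPOTHESIS_TEMPLATE = "The sentiment of this text is {}."
CLASS_NAMES = ["positive", "neutral", "negative"]

def build_hypotheses(template, class_names):
    # One NLI hypothesis per candidate label.
    return [template.format(name) for name in class_names]

def zero_shot_probs(entailment_logits):
    # Softmax of the per-hypothesis entailment logits across classes
    # (single-label setting), so the scores sum to 1.
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    return [e / total for e in exps]

hypotheses = build_hypotheses(HYPOTHESIS_TEMPLATE, CLASS_NAMES)
# Hypothetical entailment logits for one text, one per hypothesis.
probs = zero_shot_probs([3.1, 0.4, -1.2])
```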
Inference example
from transformers import pipeline
distilled_student_sentiment_classifier = pipeline(
    model="lxyuan/distilbert-base-multilingual-cased-sentiments-student",
    return_all_scores=True  # return a score for every label, not only the top one
)
# english
distilled_student_sentiment_classifier("I love this movie and i would watch it again and again!")
>> [[{'label': 'positive', 'score': 0.9731044769287109},
{'label': 'neutral', 'score': 0.016910076141357422},
{'label': 'negative', 'score': 0.009985478594899178}]]
# malay
distilled_student_sentiment_classifier("Saya suka filem ini dan saya akan menontonnya lagi dan lagi!")
>> [[{'label': 'positive', 'score': 0.9760093688964844},
{'label': 'neutral', 'score': 0.01804516464471817},
{'label': 'negative', 'score': 0.005945465061813593}]]
# japanese
distilled_student_sentiment_classifier("私はこの映画が大好きで、何度も見ます!")
>> [[{'label': 'positive', 'score': 0.9342429041862488},
{'label': 'neutral', 'score': 0.040193185210227966},
{'label': 'negative', 'score': 0.025563929229974747}]]
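With return_all_scores=True, the pipeline returns one list per input, and each list holds one {'label', 'score'} dict per class. Here is a small helper for extracting the predicted label, using the English output above as data. The helper is hypothetical and not part of transformers.

```python
def top_label(scores):
    # `scores` is the inner list the pipeline returns for one input:
    # one {"label": ..., "score": ...} dict per class.
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]

# English example output copied from above.
english = [[{'label': 'positive', 'score': 0.9731044769287109},
            {'label': 'neutral', 'score': 0.016910076141357422},
            {'label': 'negative', 'score': 0.009985478594899178}]]

label, score = top_label(english[0])
print(label, round(score, 3))  # positive 0.973
```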
Training procedure
Training hyperparameters
python transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py \
--data_file ./multilingual-sentiments/train_unlabeled.txt \
--class_names_file ./multilingual-sentiments/class_names.txt \
--hypothesis_template "The sentiment of this text is {}." \
--teacher_name_or_path MoritzLaurer/mDeBERTa-v3-base-mnli-xnli \
--teacher_batch_size 32 \
--student_name_or_path distilbert-base-multilingual-cased \
--output_dir ./distilbert-base-multilingual-cased-sentiments-student \
--per_device_train_batch_size 16 \
--fp16
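The command reads two plain-text files. This sketch assumes the script expects one unlabeled example per line in --data_file and one class name per line in --class_names_file; check the script's argument help to confirm. It writes both files from Python lists. The helper name and the sample texts are hypothetical.

```python
from pathlib import Path

def write_distillation_inputs(out_dir, texts, class_names):
    # Writes one unlabeled example per line and one class name per line
    # (assumed input format for --data_file and --class_names_file).
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # A newline inside an example would split it into two, so collapse whitespace.
    cleaned = [" ".join(t.split()) for t in texts]
    (out / "train_unlabeled.txt").write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    (out / "class_names.txt").write_text("\n".join(class_names) + "\n", encoding="utf-8")
    return out
```

Calling it with out_dir="multilingual-sentiments" produces the two paths used in the command above.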
###### modify L78 to disable the fast tokenizer
default=False,
###### update the dataset map call at L313
dataset = dataset.map(tokenizer, input_columns="text", fn_kwargs={"padding": "max_length", "truncation": True, "max_length": 512})
###### add the following lines at L213
del model
print("Manually deleted the teacher model to free some memory for the student model.")
###### add the following lines at L337
trainer.push_to_hub()
tokenizer.push_to_hub("distilbert-base-multilingual-cased-sentiments-student")
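The updated `dataset.map` call passes padding="max_length", truncation=True and max_length=512 to the tokenizer, so every example becomes exactly 512 tokens long. The sketch below is simplified and shows only that effect on an already-encoded list of token ids. It assumes right-side padding with pad id 0. Unlike the real tokenizer, it does not keep the [CLS]/[SEP] special tokens in place when truncating. The function name is hypothetical.

```python
def pad_and_truncate(token_ids, max_length=512, pad_id=0):
    # Simplified view of padding="max_length" plus truncation=True:
    # cut to max_length, then right-pad; the attention mask marks real tokens.
    ids = token_ids[:max_length]
    attention_mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, attention_mask + [0] * n_pad

ids, mask = pad_and_truncate([101, 7, 8, 102], max_length=6)
print(ids, mask)  # [101, 7, 8, 102, 0, 0] [1, 1, 1, 1, 0, 0]
```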
Training log
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 2009.8864, 'train_samples_per_second': 73.0, 'train_steps_per_second': 4.563, 'train_loss': 0.6473459283913797, 'epoch': 1.0}
100%|███████████████████████████████████████| 9171/9171 [33:29<00:00, 4.56it/s]
[INFO|trainer.py:762] 2023-05-06 10:56:18,555 >> The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
[INFO|trainer.py:3129] 2023-05-06 10:56:18,557 >> ***** Running Evaluation *****
[INFO|trainer.py:3131] 2023-05-06 10:56:18,557 >> Num examples = 146721
[INFO|trainer.py:3134] 2023-05-06 10:56:18,557 >> Batch size = 128
100%|███████████████████████████████████████| 1147/1147 [08:59<00:00, 2.13it/s]
05/06/2023 11:05:18 - INFO - __main__ - Agreement of student and teacher predictions: 88.29%
[INFO|trainer.py:2868] 2023-05-06 11:05:18,251 >> Saving model checkpoint to ./distilbert-base-multilingual-cased-sentiments-student
[INFO|configuration_utils.py:457] 2023-05-06 11:05:18,251 >> Configuration saved in ./distilbert-base-multilingual-cased-sentiments-student/config.json
[INFO|modeling_utils.py:1847] 2023-05-06 11:05:18,905 >> Model weights saved in ./distilbert-base-multilingual-cased-sentiments-student/pytorch_model.bin
[INFO|tokenization_utils_base.py:2171] 2023-05-06 11:05:18,905 >> tokenizer config file saved in ./distilbert-base-multilingual-cased-sentiments-student/tokenizer_config.json
[INFO|tokenization_utils_base.py:2178] 2023-05-06 11:05:18,905 >> Special tokens file saved in ./distilbert-base-multilingual-cased-sentiments-student/special_tokens_map.json
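Two parts of this log can be reproduced by hand. First, the step counts follow from the 146,721 examples and the batch sizes, assuming training ran on a single device. Multiplying the throughput figures by the runtime recovers the same totals. Second, the agreement line is sketched below as top-1 agreement: the fraction of examples where the student and the teacher pick the same class. That definition is an assumption, and the example probabilities are hypothetical.

```python
import math

# Figures copied from the training log above.
train_runtime_s = 2009.8864
train_samples_per_s = 73.0
train_steps_per_s = 4.563
num_examples = 146721
train_batch_size = 16   # --per_device_train_batch_size, single device assumed
eval_batch_size = 128   # "Batch size = 128" in the evaluation log

# One epoch = ceil(examples / batch size) steps.
train_steps = math.ceil(num_examples / train_batch_size)  # 9171, matches the progress bar
eval_steps = math.ceil(num_examples / eval_batch_size)    # 1147, matches the progress bar

# Throughput x runtime should land close to the same totals.
approx_samples = train_samples_per_s * train_runtime_s    # about 146,722
approx_steps = train_steps_per_s * train_runtime_s        # about 9,171

def argmax(values):
    # Index of the largest value; the first one wins on ties.
    return max(range(len(values)), key=values.__getitem__)

def agreement(teacher_probs, student_probs):
    # Fraction of examples where the student's top class equals the teacher's.
    matches = sum(argmax(t) == argmax(s) for t, s in zip(teacher_probs, student_probs))
    return matches / len(teacher_probs)

# Hypothetical probabilities for four examples; three top classes match.
teacher = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]
student = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.5, 0.3, 0.2], [0.5, 0.4, 0.1]]
print(agreement(teacher, student))  # 0.75
```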
Framework versios