Model Card for Zephyr 7B Alpha
Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-α is the first model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). We found that removing the in-built alignment of these datasets boosted performance on MT Bench and made the model more helpful. However, this means that the model is likely to generate problematic text when prompted to do so, and it should only be used for educational and research purposes.
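For readers unfamiliar with DPO: it tunes the chat policy directly on preference pairs, without training a separate reward model. As a brief orientation (standard DPO notation, not taken from this card), the objective is

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

where y_w and y_l are the preferred and rejected completions for prompt x, π_ref is the frozen starting model, and β controls how far the tuned policy may drift from it.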
Model description
Model Sources
Intended uses & limitations
The model was initially fine-tuned on a variant of the UltraChat dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT. We then further aligned the model with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset, which contains 64k prompts and model completions that are ranked by GPT-4 (a sketch of this alignment step is shown after the inference example below). As a result, the model can be used for chat, and you can check out our demo to test its capabilities.
Here's how you can run the model using the pipeline() function from 🤗 Transformers:
import torch
from transformers import pipeline
pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-alpha", torch_dtype=torch.bfloat16, device_map="auto")
# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# Ah, me hearty matey! But yer question be a puzzler! A human cannot eat a helicopter in one sitting, as helicopters are not edible. They be made of metal, plastic, and other materials, not food!
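For completeness, here is a minimal sketch of the DPO alignment step described under "Intended uses & limitations", using TRL's DPOTrainer. It is illustrative only: the dataset variant (the binarized UltraFeedback preference set), the preprocessing, and all hyperparameters are assumptions rather than the exact Zephyr recipe, and the constructor shown follows the older TRL 0.7-style API (recent TRL releases move most of these options into a DPOConfig).

```python
# Illustrative DPO alignment sketch with TRL's DPOTrainer (TRL 0.7-style API).
# The dataset variant, preprocessing, and hyperparameters below are assumptions,
# not the exact recipe used to train Zephyr-7B-alpha.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"  # in practice, the UltraChat SFT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral's tokenizer has no pad token

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
# A frozen copy of the starting model serves as the DPO reference policy.
ref_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# DPOTrainer expects plain-text "prompt", "chosen", and "rejected" columns.
# HuggingFaceH4/ultrafeedback_binarized is a preference-formatted variant of
# openbmb/UltraFeedback; keeping only the final assistant turn is a simplification.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
dataset = dataset.map(
    lambda ex: {
        "prompt": ex["prompt"],
        "chosen": ex["chosen"][-1]["content"],
        "rejected": ex["rejected"][-1]["content"],
    }
)

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=5e-7,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,  # strength of the implicit KL penalty in the DPO loss
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
)
trainer.train()
trainer.save_model("zephyr-7b-dpo")
```

In practice the starting checkpoint would be the UltraChat SFT model rather than the raw base model, and a 7B model at this precision would typically be trained across several GPUs with DeepSpeed or a similar sharding setup.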
Bias, Risks, and Limitations
Zephyr-7B-α has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and composition of the corpus used to train the base model (mistralai/Mistral-7B-v0.1) were; however, it is likely to have included a mix of Web data and technical sources like books and code. See the Falcon 180B model card for an example of this.
Training and evaluation data
Zephyr 7B Alpha achieves the following results on the evaluation set (see the Training results table below):
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5602 | 0.05 | 100 | 0.5589 | -0.3359 | -0.8168 | 0.7188 | 0.4809 | -306.2607 | -293.7161 | -2.6554 | -2.6797 |
| 0.4852 | 0.1 | 200 | 0.5136 | -0.5310 | -1.4994 | 0.8125 | 0.9684 | -319.9124 | -297.6181 | -2.5762 | -2.5957 |
| 0.5212 | 0.15 | 300 | 0.5168 | -0.1686 | -1.1760 | 0.7812 | 1.0074 | -313.4444 | -290.3699 | -2.6865 | -2.7125 |
| 0.5496 | 0.21 | 400 | 0.4835 | -0.1617 | -1.7170 | 0.8281 | 1.5552 | -324.2635 | -290.2326 | -2.7947 | -2.8218 |
| 0.5209 | 0.26 | 500 | 0.5054 | -0.4778 | -1.6604 | 0.7344 | 1.1826 | -323.1325 | -296.5546 | -2.8388 | -2.8667 |
| 0.4617 | 0.31 | 600 | 0.4910 | -0.3738 | -1.5180 | 0.7656 | 1.1442 | -320.2848 | -294.4741 | -2.8234 | -2.8521 |
| 0.4452 | 0.36 | 700 | 0.4838 | -0.4591 | -1.6576 | 0.7031 | 1.1986 | -323.0770 | -296.1796 | -2.7401 | -2.7653 |
| 0.4674 | 0.41 | 800 | 0.5077 | -0.5692 | -1.8659 | 0.7656 | 1.2967 | -327.2416 | -298.3818 | -2.6740 | -2.6945 |
| 0.4656 | 0.46 | 900 | 0.4927 | -0.5279 | -1.6614 | 0.7656 | 1.1335 | -323.1518 | -297.5553 | -2.7817 | -2.8015 |
| 0.4102 | 0.52 | 1000 | 0.4772 | -0.5767 | -2.0667 | 0.7656 | 1.4900 | -331.2578 | -298.5311 | -2.7160 | -2.7455 |
| 0.4663 | 0.57 | 1100 | 0.4740 | -0.8038 | -2.1018 | 0.7656 | 1.2980 | -331.9604 | -303.0741 | -2.6994 | -2.7257 |
| 0.4737 | 0.62 | 1200 | 0.4716 | -0.3783 | -1.7015 | 0.7969 | 1.3232 | -323.9545 | -294.5634 | -2.6842 | -2.7135 |
| 0.4259 | 0.67 | 1300 | 0.4866 | -0.6239 | -1.9703 | 0.7812 | 1.3464 | -329.3312 | -299.4761 | -2.7046 | -2.7356 |
| 0.4935 | 0.72 | 1400 | 0.4747 | -0.5626 | -1.7600 | 0.7812 | 1.1974 | -325.1243 | -298.2491 | -2.7153 | -2.7444 |
| 0.4211 | 0.77 | 1500 | 0.4645 | -0.6099 | -1.9993 | 0.7656 | 1.3894 | -329.9109 | -299.1959 | -2.6944 | -2.7236 |
| 0.4931 | 0.83 | 1600 | 0.4684 | -0.6798 | -2.1082 | 0.7656 | 1.4285 | -332.0890 | -300.5934 | -2.7006 | -2.7305 |
| 0.5029 | 0.88 | 1700 | 0.4595 | -0.5063 | -1.8951 | 0.7812 | 1.3889 | -327.8267 | -297.1233 | -2.7108 | -2.7403 |
| 0.4965 | 0.93 | 1800 | 0.4613 | -0.5561 | -1.9079 | 0.7812 | 1.3518 | -328.0831 | -298.1203 | -2.7226 | -2.7523 |
| 0.4337 | 0.98 | 1900 | 0.4608 | -0.5066 | -1.8718 | 0.7656 | 1.3652 | -327.3599 | -297.1296 | -2.7175 | -2.7469 |
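As an aid to reading the reward columns above (this follows TRL's standard DPO logging and is an interpretation rather than something stated in the card), the "reward" of a completion y for a prompt x is the implicit quantity

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

so Rewards/chosen and Rewards/rejected are the mean implicit rewards of the chosen and rejected completions on the validation set, Rewards/margins is their difference, and Rewards/accuracies is the fraction of pairs whose chosen reward exceeds the rejected one. Logps/* and Logits/* report the corresponding mean log-probabilities and final-layer logits.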
Framework versions