We introduce Starling-7B, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). The model harnesses the power of our new GPT-4 labeled ranking dataset, berkeley-nest/Nectar, and our new reward training and policy tuning pipeline. Starling-7B-alpha scores 8.09 in MT Bench with GPT-4 as a judge, outperforming every model to date on MT-Bench except for OpenAI's GPT-4 and GPT-4 Turbo. We release the ranking dataset Nectar, the reward model Starling-RM-7B-alpha and the language model Starling-LM-7B-alpha on HuggingFace, and an online demo in LMSYS Chatbot Arena. Stay tuned for our forthcoming code and paper, which will provide more details on the whole process.

Starling-LM-7B-alpha is a language model trained from Openchat 3.5 with the reward model berkeley-nest/Starling-RM-7B-alpha and the policy optimization method advantage-induced policy alignment (APA). The evaluation results are listed below. For more detailed discussion, please check out our blog post, and stay tuned for our upcoming code and paper!

Our model follows the exact chat template and usage as Openchat 3.5. Please refer to their model card for more details.
In addition, our model is hosted on LMSYS Chatbot Arena for free testing. The conversation template is the same as Openchat 3.5.

The dataset, model and online demo are a research preview intended for non-commercial use only, subject to the data distillation License of LLaMA, Terms of Use of the data generated by OpenAI, and Privacy Practices of ShareGPT. Please contact us if you find any potential violations.

We would like to thank Wei-Lin Chiang from Berkeley for detailed feedback on the blog and the projects. We would like to thank the LMSYS Organization for their support of the lmsys-chat-1M dataset, evaluation and online demo. We would like to thank the open source community for their efforts in providing the datasets and base models we used to develop the project, including but not limited to Anthropic, Llama, Mistral, Hugging Face H4, LMSYS, OpenChat, OpenBMB, Flan and ShareGPT.
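Advantage-induced policy alignment (APA), the policy optimization method used for the policy tuning stage, regresses the log-ratio between the tuned policy and the initial policy toward scaled advantage estimates. The exact recipe will appear in the forthcoming paper; below is only a minimal sketch of the commonly cited squared-error form of the per-token APA objective, where the function name `apa_loss` and the scaling parameter `lam` are illustrative, not taken from this card.

```python
def apa_loss(logp_policy, logp_init, advantages, lam=1.0):
    """Squared-error APA objective (sketch): push the policy log-ratio
    log pi_theta(a|s) - log pi_init(a|s) toward A(s, a) / lam."""
    losses = [
        (lp - li - adv / lam) ** 2
        for lp, li, adv in zip(logp_policy, logp_init, advantages)
    ]
    return sum(losses) / len(losses)

# When the log-ratio already equals the scaled advantage, the loss is zero.
assert apa_loss([-1.0, -2.0], [-1.5, -2.5], [0.5, 0.5], lam=1.0) == 0.0
```

Unlike PPO, this formulation has no clipped surrogate; the squared error itself keeps the policy from drifting arbitrarily far from the initialization.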
| Model | Tuning Method | MT Bench | AlpacaEval | MMLU |
|---|---|---|---|---|
| GPT-4-Turbo | ? | 9.32 | 97.70 | |
| GPT-4 | SFT + PPO | 8.99 | 95.28 | 86.4 |
| Starling-7B-alpha | C-RLFT + APA | 8.09 | 91.99 | 63.9 |
| Claude-2 | ? | 8.06 | 91.36 | 78.5 |
| GPT-3.5-Turbo | ? | 7.94 | 89.37 | 70 |
| Claude-1 | ? | 7.9 | 88.39 | 77 |
| Tulu-2-dpo-70b | SFT + DPO | 7.89 | 95.1 | |
| Openchat-3.5 | C-RLFT | 7.81 | 88.51 | 64.3 |
| Zephyr-7B-beta | SFT + DPO | 7.34 | 90.60 | 61.4 |
| Llama-2-70b-chat-hf | SFT + PPO | 6.86 | 92.66 | 63 |
| Neural-chat-7b-v3-1 | SFT + DPO | 6.84 | 84.53 | 62.4 |
| Tulu-2-dpo-7b | SFT + DPO | 6.29 | 85.1 | |

Uses
Uses
import modelscope

tokenizer = modelscope.AutoTokenizer.from_pretrained("openchat/openchat_3.5")

# Single-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

# Multi-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

# Coding Mode
tokens = tokenizer("Code User: Implement quicksort using C++<|end_of_turn|>Code Assistant:").input_ids
assert tokens == [1, 7596, 1247, 28747, 26256, 2936, 7653, 1413, 334, 1680, 32000, 7596, 21631, 28747]
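The template strings above can also be assembled programmatically. The helper below is hypothetical (not part of the OpenChat or ModelScope API); it renders a list of (user, assistant) exchanges in the GPT4 Correct format, leaving the prompt open at `Assistant:` when the final reply is `None`. Passing `system="Code"` produces the coding-mode prompt.

```python
END = "<|end_of_turn|>"

def build_prompt(turns, system="GPT4 Correct"):
    """Render alternating user/assistant turns in the Openchat 3.5 template.

    turns: list of (user_message, assistant_reply) pairs; pass None as the
    final assistant_reply to end the prompt at 'Assistant:' for generation.
    """
    parts = []
    for user_msg, assistant_msg in turns:
        parts.append(f"{system} User: {user_msg}{END}")
        if assistant_msg is None:
            parts.append(f"{system} Assistant:")
        else:
            parts.append(f"{system} Assistant: {assistant_msg}{END}")
    return "".join(parts)

# Reproduces the single-turn prompt from the tokenizer example above.
assert build_prompt([("Hello", None)]) == "GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:"
```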
Code Examples
import modelscope

tokenizer = modelscope.AutoTokenizer.from_pretrained("AI-ModelScope/Starling-LM-7B-alpha")
model = modelscope.AutoModelForCausalLM.from_pretrained("AI-ModelScope/Starling-LM-7B-alpha")

def generate_response(prompt):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(
        input_ids,
        max_length=256,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    response_ids = outputs[0]
    response_text = tokenizer.decode(response_ids, skip_special_tokens=True)
    return response_text

# Single-turn conversation
prompt = "Hello, how are you?"
single_turn_prompt = f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(single_turn_prompt)
print("Response:", response_text)

# Multi-turn conversation
prompt = "Hello"
follow_up_question = "How are you today?"
response = ""
multi_turn_prompt = f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: {response}<|end_of_turn|>GPT4 Correct User: {follow_up_question}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(multi_turn_prompt)
print("Multi-turn conversation response:", response_text)

# Coding conversation
prompt = "Implement quicksort using C++"
coding_prompt = f"Code User: {prompt}<|end_of_turn|>Code Assistant:"
response = generate_response(coding_prompt)
print("Coding conversation response:", response)
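Note that `model.generate` returns the prompt tokens followed by the completion, so the decoded text still contains the prompt. One way to recover just the assistant's reply is to split on the last assistant marker; this is a sketch that assumes the GPT4 Correct template shown above, and the helper name `extract_reply` is illustrative.

```python
def extract_reply(decoded_text, marker="GPT4 Correct Assistant:"):
    """Return the text after the last assistant marker, trimming the
    end-of-turn token in case special tokens were kept during decoding."""
    reply = decoded_text.rsplit(marker, 1)[-1]
    return reply.replace("<|end_of_turn|>", "").strip()

assert extract_reply("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi there!") == "Hi there!"
```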
Citation
@misc{starling2023,
    title = {Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF},
    url = {},
    author = {Zhu, Banghua and Frick, Evan and Wu, Tianhao and Zhu, Hanlin and Jiao, Jiantao},
    month = {November},
    year = {2023}
}