Starling-LM-7B-alpha

Anonymous user · July 31, 2024

Technical Information

Open-source URL
https://modelscope.cn/models/AI-ModelScope/Starling-LM-7B-alpha
License
Apache License 2.0

Details

Starling-LM-7B-alpha

  • Developed by: Banghua Zhu * , Evan Frick * , Tianhao Wu * , Hanlin Zhu and Jiantao Jiao.
  • Model type: Language model finetuned with RLHF / RLAIF
  • License: Non-commercial license
  • Finetuned from model: Openchat 3.5 (based on Mistral-7B-v0.1)

We introduce Starling-7B, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). The model harnesses the power of our new GPT-4 labeled ranking dataset, berkeley-nest/Nectar, and our new reward training and policy tuning pipeline. Starling-7B-alpha scores 8.09 in MT Bench with GPT-4 as a judge, outperforming every model to date on MT-Bench except for OpenAI's GPT-4 and GPT-4 Turbo. We release the ranking dataset Nectar, the reward model Starling-RM-7B-alpha and the language model Starling-LM-7B-alpha on HuggingFace, and an online demo in LMSYS Chatbot Arena. Stay tuned for our forthcoming code and paper, which will provide more details on the whole process.

Starling-LM-7B-alpha is a language model trained from Openchat 3.5 with the reward model berkeley-nest/Starling-RM-7B-alpha and the policy optimization method advantage-induced policy alignment (APA). The evaluation results are listed below.

| Model               | Tuning Method | MT Bench | AlpacaEval | MMLU |
|---------------------|---------------|----------|------------|------|
| GPT-4-Turbo         | ?             | 9.32     | 97.70      |      |
| GPT-4               | SFT + PPO     | 8.99     | 95.28      | 86.4 |
| Starling-7B         | C-RLFT + APA  | 8.09     | 91.99      | 63.9 |
| Claude-2            | ?             | 8.06     | 91.36      | 78.5 |
| GPT-3.5-Turbo       | ?             | 7.94     | 89.37      | 70   |
| Claude-1            | ?             | 7.9      | 88.39      | 77   |
| Tulu-2-dpo-70b      | SFT + DPO     | 7.89     | 95.1       |      |
| Openchat-3.5        | C-RLFT        | 7.81     | 88.51      | 64.3 |
| Zephyr-7B-beta      | SFT + DPO     | 7.34     | 90.60      | 61.4 |
| Llama-2-70b-chat-hf | SFT + PPO     | 6.86     | 92.66      | 63   |
| Neural-chat-7b-v3-1 | SFT + DPO     | 6.84     | 84.53      | 62.4 |
| Tulu-2-dpo-7b       | SFT + DPO     | 6.29     | 85.1       |      |

For more detailed discussions, please check out our blog post, and stay tuned for our upcoming code and paper!

  • Blog: https://starling.cs.berkeley.edu/
  • Paper: Coming soon!
  • Code: Coming soon!

Uses

Important: Please use the exact chat template provided below for the model. Otherwise there will be a degradation in performance. The model output can be verbose in rare cases. Please consider setting temperature = 0 to make this happen less often.

Our model follows the exact chat template and usage as Openchat 3.5. Please refer to their model card for more details. In addition, our model is hosted on LMSYS Chatbot Arena for free testing.

The conversation template is the same as Openchat 3.5:

import modelscope
tokenizer = modelscope.AutoTokenizer.from_pretrained("openchat/openchat_3.5")

# Single-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

# Multi-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

# Coding Mode
tokens = tokenizer("Code User: Implement quicksort using C++<|end_of_turn|>Code Assistant:").input_ids
assert tokens == [1, 7596, 1247, 28747, 26256, 2936, 7653, 1413, 334, 1680, 32000, 7596, 21631, 28747]
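Rather than concatenating template strings by hand for every conversation, the turns above can be rendered programmatically. Below is a minimal sketch of such a helper; the function name `build_prompt` and the `(role, text)` turn representation are our own illustration, not part of the official Openchat or Starling API.

```python
# Sketch: render OpenChat-style "GPT4 Correct" prompts from a list of turns.
# build_prompt and its (role, text) input format are illustrative, not an
# official API of the model.
END_OF_TURN = "<|end_of_turn|>"

def build_prompt(turns):
    """turns: list of (role, text) pairs, role in {"user", "assistant"}.
    Returns a prompt string ending with an open assistant turn."""
    parts = []
    for role, text in turns:
        label = "GPT4 Correct User" if role == "user" else "GPT4 Correct Assistant"
        parts.append(f"{label}: {text}{END_OF_TURN}")
    parts.append("GPT4 Correct Assistant:")  # the model continues from here
    return "".join(parts)

prompt = build_prompt([("user", "Hello"), ("assistant", "Hi"),
                       ("user", "How are you today?")])
print(prompt)
# → GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
```

This reproduces the multi-turn string shown in the tokenizer example above, so the resulting prompt tokenizes identically.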

Code Examples

import modelscope

tokenizer = modelscope.AutoTokenizer.from_pretrained("AI-ModelScope/Starling-LM-7B-alpha")
model = modelscope.AutoModelForCausalLM.from_pretrained("AI-ModelScope/Starling-LM-7B-alpha")

def generate_response(prompt):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(
        input_ids,
        max_length=256,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    response_ids = outputs[0]
    response_text = tokenizer.decode(response_ids, skip_special_tokens=True)
    return response_text

# Single-turn conversation
prompt = "Hello, how are you?"
single_turn_prompt = f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(single_turn_prompt)
print("Response:", response_text)

# Multi-turn conversation
prompt = "Hello"
follow_up_question = "How are you today?"
response = ""
multi_turn_prompt = f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: {response}<|end_of_turn|>GPT4 Correct User: {follow_up_question}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(multi_turn_prompt)
print("Multi-turn conversation response:", response_text)

# Coding conversation
prompt = "Implement quicksort using C++"
coding_prompt = f"Code User: {prompt}<|end_of_turn|>Code Assistant:"
response = generate_response(coding_prompt)
print("Coding conversation response:", response)
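Because `generate_response` decodes the full sequence, its output contains the prompt as well as the completion. A small post-processing helper can isolate just the assistant's reply; the function name `extract_reply` below is our own illustration, under the assumption that the decoded text keeps the "GPT4 Correct Assistant:" label (special tokens like `<|end_of_turn|>` are stripped by `skip_special_tokens=True`).

```python
# Sketch: pull only the final assistant reply out of decoded model output.
# extract_reply is an illustrative helper, not part of the model's API; it
# assumes the decoded string still contains the assistant label text.
ASSISTANT_LABEL = "GPT4 Correct Assistant:"

def extract_reply(decoded_text):
    """Return the text after the last assistant label, stripped of whitespace."""
    _, _, reply = decoded_text.rpartition(ASSISTANT_LABEL)
    return reply.strip()

decoded = ("GPT4 Correct User: Hello GPT4 Correct Assistant: Hi "
           "GPT4 Correct User: How are you today? "
           "GPT4 Correct Assistant: I'm doing well, thank you!")
print(extract_reply(decoded))
# → I'm doing well, thank you!
```

For the "Code User" template, swap in the corresponding "Code Assistant:" label.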

License

The dataset, model and online demo are a research preview intended for non-commercial use only, subject to the data distillation License of LLaMA, Terms of Use of the data generated by OpenAI, and Privacy Practices of ShareGPT. Please contact us if you find any potential violation.

Acknowledgment

We would like to thank Wei-Lin Chiang from Berkeley for detailed feedback on the blog and the projects. We would like to thank the LMSYS Organization for their support of the lmsys-chat-1M dataset, evaluation and online demo. We would like to thank the open source community for their efforts in providing the datasets and base models we used to develop the project, including but not limited to Anthropic, Llama, Mistral, Hugging Face H4, LMSYS, OpenChat, OpenBMB, Flan and ShareGPT.

Citation

@misc{starling2023,
    title = {Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF},
    url = {},
    author = {Zhu, Banghua and Frick, Evan and Wu, Tianhao and Zhu, Hanlin and Jiao, Jiantao},
    month = {November},
    year = {2023}
}
