Ziya2-13B-Chat

Technical Information

Official website
https://github.com/IDEA-CCNL/Fengshenbang-LM
Open-source repository
https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat
License
Apache License 2.0

Model Details

Ziya2-13B-Chat

Ziya Series Models

Brief Introduction

Ziya2-13B-Chat is the chat version of Ziya2-13B-Base. It was supervised fine-tuned on 300,000 high-quality general instruction examples and 400,000 knowledge-enhanced instruction examples, and then aligned with full-parameter RLHF against a reward model trained on tens of thousands of high-quality human preference examples.

Model Taxonomy

Demand    Task        Series    Model     Parameter    Extra
General   AGI model   Ziya      LLaMA2    13B          English & Chinese

Model Information

Continual Pretraining

Meta released the Llama2 series of large models in July 2023. Llama2 was pretrained on 2 trillion tokens, up from LLaMA1's 1.4 trillion, and clearly outperforms LLaMA1 across public leaderboards.

Ziya2-13B-Base retains the efficient Chinese encoding and decoding scheme of Ziya-LLaMA-13B but adopts a better initialization algorithm, giving a lower initial training loss. We also optimized the Fengshen-PT continual-pretraining framework. For efficiency, we integrated techniques such as FlashAttention2 and the Apex RMS norm, raising training speed 38% over Ziya-LLaMA-13B (163 TFLOPS per GPU). For stability, we trained in BF16 and fixed bugs in the underlying distributed framework so that training remains stable throughout, resolving the late-stage instability that Ziya-LLaMA-13B had encountered; we live-streamed the run on July 25 and completed continual pretraining on the full dataset. Model quality still shows an upward trend, and we will continue optimizing Ziya2-13B-Base.
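The Fengshen-PT training internals are not part of this card, but as a rough illustration of the same efficiency choices at load time, recent versions of transformers let you request BF16 weights and the FlashAttention2 kernel directly. A minimal sketch, assuming the flash-attn package is installed (this is not the Fengshen-PT training code):

from transformers import AutoModelForCausalLM
import torch

# BF16 keeps FP32's exponent range at half the memory, which is what kept the
# continual-pretraining run numerically stable; FlashAttention2 fuses the
# attention kernels for higher throughput.
model = AutoModelForCausalLM.from_pretrained(
    "IDEA-CCNL/Ziya2-13B-Base",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)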

[Figure: training loss curve]

Supervised Fine-tuning

Building on the strong base capability of Ziya2-13B-Base, we optimized the training strategy for the supervised fine-tuning (SFT) phase.

We found that high-quality, diverse task-instruction data does the most to elicit the knowledge learned during pretraining. We therefore used the Evol-Instruct method to augment the instruction data we had collected, and filtered for high-quality samples with a reward model. From a pool of 20 million instructions, we constructed 300,000 high-quality general instruction fine-tuning examples covering a broad range of tasks: question answering, reasoning, code, common sense, dialogue, writing, natural language understanding, safety, and more.
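The filtering step can be pictured as scoring every augmented sample with the reward model and keeping only those above a quality bar. A minimal sketch, where reward_score is a stand-in for the trained reward model and the threshold is illustrative:

def reward_score(instruction: str, response: str) -> float:
    # Stand-in for the trained reward model: any callable mapping an
    # (instruction, response) pair to a scalar quality score fits here.
    return min(len(response) / 100.0, 1.0)  # trivial heuristic so the sketch runs

def filter_by_reward(samples, threshold=0.7):
    """Keep only samples whose reward score clears the threshold."""
    return [s for s in samples
            if reward_score(s["instruction"], s["response"]) >= threshold]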

We also found that introducing knowledge-enhanced training during supervised fine-tuning further improves the model. We use a retrieval module to fetch knowledge relevant to each instruction and explicitly concatenate it into the training context. We constructed about 100,000 knowledge-enhanced instruction examples this way.
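In sketch form, a knowledge-enhanced sample simply places the retrieved passages ahead of the instruction in the training context (the field names and retrieval interface here are assumptions for illustration):

def build_knowledge_sample(instruction: str, passages: list[str], answer: str) -> dict:
    # The retrieval module (not shown) supplies passages relevant to the
    # instruction; they are concatenated explicitly into the context.
    knowledge = "\n".join(passages)
    return {"input": f"{knowledge}\n{instruction}", "output": answer}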

Finally, starting from the Ziya2-13B-Base model pretrained on 300B tokens, we trained for two epochs on roughly 400,000 instruction examples with an 8k context window to obtain the SFT-stage model.

Reinforcement Learning from Human Feedback

Starting from the SFT-stage model, Ziya2-13B-Chat was aligned to human preferences on a variety of question-answering, writing, and model-safety tasks. We collected tens of thousands of high-quality human preference examples ourselves and trained a human preference feedback (reward) model on Ziya2-13B-Base, reaching over 72% accuracy on the preference data of every task type:

Task type                Preference accuracy (Acc)
Daily QA                 76.8%
Knowledge Quizzing       76.7%
Daily Writing            82.3%
Task-based Writing       72.7%
Story Writing            75.1%
Role-playing             77.6%
Safety & Harmlessness    72.0%
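The card does not spell out the reward-model objective, but the standard recipe for pairwise preference data like this is a Bradley-Terry style loss, with the accuracy above measuring how often the preferred response outscores the rejected one. A minimal PyTorch sketch under that assumption:

import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected): trains the reward model to score
    # the human-preferred response above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def preference_accuracy(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    # Fraction of pairs ranked correctly -- the Acc column above.
    return (chosen_rewards > rejected_rewards).float().mean()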

Using the Fengshen-RLHF framework, Ziya2-13B-Chat was then trained with reinforcement learning from human feedback against the preference model above, making its outputs both closer to human preferences and safer.
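Fengshen-RLHF's internals are likewise not documented here; PPO-style RLHF implementations typically shape the per-token reward as the reward-model score minus a KL penalty that keeps the policy close to the SFT model. A sketch under that assumption:

import torch

def shaped_reward(rm_score: float, policy_logprobs: torch.Tensor,
                  sft_logprobs: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    # Per-token KL penalty anchors the policy to the SFT model; the
    # reward-model score is granted once, at the final generated token.
    kl = policy_logprobs - sft_logprobs
    reward = -beta * kl
    reward[-1] += rm_score
    return reward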

Performance

We ran human evaluations on a general-capability test set covering common-sense question answering, writing, mathematical reasoning, natural language understanding, safety, and other tasks. Under side-by-side comparison, Ziya2-13B-Chat achieved a 66.5% win rate against Ziya-LLaMA-13B-v1.1 and a 58.4% win rate against its own pre-RLHF version.

                            Better    Worse    Same     Win rate
v.s. Ziya-LLaMA-13B-v1.1    53.2%     20.3%    26.5%    66.5%
v.s. w/o RLHF               37.5%     20.8%    41.7%    58.4%
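The win rate appears to count a tie as half a win (Better + Same/2); this matches both rows, though the card does not state the formula:

better, same = 0.532, 0.265
print(better + same / 2)  # 0.6645 -> 66.5% vs. Ziya-LLaMA-13B-v1.1
better, same = 0.375, 0.417
print(better + same / 2)  # 0.5835 -> 58.4% vs. the pre-RLHF version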

Usage

Ziya2-13B-Chat uses "<human>:" and "<bot>:" as the role-recognition prompts for the user and the model, with "\n" separating the turns. At inference time, prepend "<human>:" to each user question and "<bot>:" to each model reply, and join the turns with "\n".
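Concretely, a single user turn is serialized as follows before tokenization; the trailing "<bot>:" cues the model to produce its reply (the question text is illustrative):

<human>:你好,请介绍一下你自己
<bot>: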

Here is how to use it:

from modelscope import AutoTokenizer, AutoModelForCausalLM, snapshot_download
import torch

device = torch.device("cuda")

messages = [{"role": "user", "content": "手机如果贴膜贴了一张防指纹的钢化膜,那屏幕指纹解锁还有效吗?"}]
user_prefix = "<human>:"
assistant_prefix = "<bot>:"
separator = "\n"

# Serialize the conversation: prefix each turn with its role tag, then append
# a trailing "<bot>:" so the model generates the assistant's reply.
prompt = []
for item in messages:
    prefix = user_prefix if item["role"] == "user" else assistant_prefix
    prompt.append(f"{prefix}{item['content']}")
prompt.append(assistant_prefix)
prompt = separator.join(prompt)

# Download the checkpoint and load it in BF16.
model_dir = snapshot_download('Fengshenbang/Ziya2-13B-Chat', revision='master')
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
generate_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.9,
    temperature=0.85,
    repetition_penalty=1.05,
    eos_token_id=tokenizer.encode("</s>"),  # stop at the end-of-sequence token
)
output = tokenizer.batch_decode(generate_ids)[0]
print(output)
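For multi-turn conversations, append the model's reply and the next user question to messages and rebuild the prompt the same way. A sketch (the reply extraction and follow-up question are illustrative):

# Strip the prompt and the end-of-sequence token to recover just the reply.
reply = output.split("<bot>:")[-1].replace("</s>", "").strip()
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "那贴磨砂膜会有影响吗?"})
# ...then rerun the prompt-building loop and model.generate as above.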

The above is a simple question-answering example. For more prompts and ways to play with the model, download it and explore on your own.

Citation

If you use our model in your work, please cite our paper:

@article{fengshenbang,
  author    = {Jiaxing Zhang and Ruyi Gan and Junjie Wang and Yuxiang Zhang and Lin Zhang and Ping Yang and Xinyu Gao and Ziwei Wu and Xiaoqun Dong and Junqing He and Jianheng Zhuo and Qi Yang and Yongfeng Huang and Xiayu Li and Yanghan Wu and Junyu Lu and Xinyu Zhu and Weifeng Chen and Ting Han and Kunhao Pan and Rui Wang and Hao Wang and Xiaojun Wu and Zhongshen Zeng and Chongpei Chen},
  title     = {Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence},
  journal   = {CoRR},
  volume    = {abs/2209.02970},
  year      = {2022}
}

You can also cite our website:

@misc{Fengshenbang-LM,
  title={Fengshenbang-LM},
  author={IDEA-CCNL},
  year={2021},
  howpublished={\url{https://github.com/IDEA-CCNL/Fengshenbang-LM}},
}
