Qwen-7B-Chat
Hugging Face | ModelScope | Paper | Demo
WeChat | Discord | API
Introduction
Qwen-7B is the 7B-parameter version of Qwen (abbr. Tongyi Qianwen), the large language model series proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model pretrained on a large volume of diverse data, including web texts, professional books, code, etc. On top of the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant trained with alignment techniques. Compared with the originally released Qwen-7B models, both the pretrained and the chat models have since been updated to better-performing versions. This repository is the one for Qwen-7B-Chat.
For more details about Qwen-7B, please refer to the GitHub code repository.
Requirements
- Python 3.8 and above
- PyTorch 1.12 and above; 2.0 and above is recommended
- CUDA 11.4 and above is recommended (this is for GPU users, flash-attention users, etc.)
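The version requirements above can be checked from Python before installing the heavier dependencies. The following is a minimal sketch: the helper name `meets_minimum` is hypothetical, and it compares only the leading numeric part of a version string, so a local suffix such as `+cu118` is ignored.

```python
import re
import sys
from importlib import metadata


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Return True if the numeric prefix of `version` is >= `minimum`."""
    # Keep only the leading dotted-number part, e.g. "2.0.1+cu118" -> "2.0.1".
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return False
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad both tuples with zeros so (1, 12) compares correctly with (1, 12, 0).
    width = max(len(parts), len(minimum))
    parts += (0,) * (width - len(parts))
    padded_min = tuple(minimum) + (0,) * (width - len(minimum))
    return parts >= padded_min


print("python >= 3.8:", sys.version_info[:2] >= (3, 8))

try:
    # Reads the installed package metadata without importing torch itself.
    torch_version = metadata.version("torch")
except metadata.PackageNotFoundError:
    torch_version = None

if torch_version is None:
    print("torch is not installed")
else:
    print("torch >= 1.12:", meets_minimum(torch_version, (1, 12)))
    print("torch >= 2.0 (recommended):", meets_minimum(torch_version, (2, 0)))
```

The CUDA version is not checked here, because how it is exposed depends on the installation.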
Dependency
To run Qwen-7B-Chat, make sure you meet the above requirements, and then execute the following pip command to install the dependent libraries.
pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed
In addition, it is recommended to install the flash-attention library (flash attention 2 is now supported) for higher efficiency and lower memory usage.
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
# The installations below are optional and may be slow.
# pip install csrc/layer_norm
# pip install csrc/rotary
Quickstart
The following code shows an example of multi-turn interaction with Qwen-7B-Chat:
from modelscope import AutoModelForCausalLM, AutoTokenizer
from modelscope import GenerationConfig
# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen-7B-Chat", trust_remote_code=True)
# use bf16
# model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained("qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()
# Specify hyperparameters for generation. But if you use transformers>=4.32.0, there is no need to do this.
# model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True) # lets you set the generation length, top_p, and other hyperparameters
# 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# 你好!很高兴为你提供帮助。
# 2nd dialogue turn
response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)
print(response)
# 这是一个关于一个年轻人奋斗创业最终取得成功的故事。
# 故事的主人公叫李明,他来自一个普通的家庭,父母都是普通的工人。从小,李明就立下了一个目标:要成为一名成功的企业家。
# 为了实现这个目标,李明勤奋学习,考上了大学。在大学期间,他积极参加各种创业比赛,获得了不少奖项。他还利用课余时间去实习,积累了宝贵的经验。
# 毕业后,李明决定开始自己的创业之路。他开始寻找投资机会,但多次都被拒绝了。然而,他并没有放弃。他继续努力,不断改进自己的创业计划,并寻找新的投资机会。
# 最终,李明成功地获得了一笔投资,开始了自己的创业之路。他成立了一家科技公司,专注于开发新型软件。在他的领导下,公司迅速发展起来,成为了一家成功的科技企业。
# 李明的成功并不是偶然的。他勤奋、坚韧、勇于冒险,不断学习和改进自己。他的成功也证明了,只要努力奋斗,任何人都有可能取得成功。
# 3rd dialogue turn
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
print(response)
# 《奋斗创业:一个年轻人的成功之路》
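The multi-turn pattern above depends on feeding the returned `history` back into the next call. The sketch below illustrates only that bookkeeping. `fake_chat` is a hypothetical stand-in for `model.chat`, and the assumption that `history` is a list of `(query, response)` pairs is ours, not something this document states.

```python
from typing import List, Optional, Tuple

History = List[Tuple[str, str]]


def fake_chat(query: str, history: Optional[History] = None) -> Tuple[str, History]:
    """Stand-in for model.chat: echoes the query and records the turn."""
    # Copy the list so the caller's history object is never mutated.
    history = list(history) if history else []
    response = f"(turn {len(history) + 1}) you said: {query}"
    history.append((query, response))
    return response, history


# Same call shape as the quickstart: start with None, then thread history through.
response, history = fake_chat("Hello", history=None)
response, history = fake_chat("Tell me a story", history=history)
response, history = fake_chat("Give the story a title", history=history)

for query, answer in history:
    print(f"user: {query}\nassistant: {answer}")
```

Passing `history=None` again at any point starts a fresh conversation, which is why the third turn in the quickstart can refer to "the story" only because the second turn's history was passed along.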
For more usage information, please refer to our GitHub repo.
Tokenizer
Our tokenizer, based on tiktoken, is different from other tokenizers such as the sentencepiece tokenizer. You need to pay attention to special tokens, especially during fine-tuning. For more detailed information on the tokenizer and its use in fine-tuning, please refer to the documentation.
Quantization
Usage
Note: we provide a new solution based on AutoGPTQ and release an Int4 quantized model for Qwen-7B-Chat (click here). Compared with the previous solution, it is nearly lossless on model benchmarks while requiring less storage and delivering faster inference.
Here we demonstrate how to use our provided quantized models for inference. Before you start, make sure you meet the requirements of auto-gptq (e.g., torch 2.0 and above, transformers 4.32.0 and above, etc.) and install the required packages:
pip install auto-gptq optimum
If you meet problems installing auto-gptq, we advise you to check the official repo for a suitable pre-built wheel.
Then you can load the quantized model and run inference in the same way as before:
model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen-7B-Chat-Int4",
device_map="auto",
trust_remote_code=True
).eval()
response, history = model.chat(tokenizer, "你好", history=None)
Performance Evaluation
We report the zero-shot performance of the BF16, Int8 and Int4 models on the benchmarks, and we find that the quantized models do not suffer from significant performance degradation. Results are shown below:
| Quantization | MMLU | CEval (val) | GSM8K | Humaneval |
| BF16 | 55.8 | 59.7 | 50.3 | 37.2 |
| Int8 | 55.4 | 59.4 | 48.3 | 34.8 |
| Int4 | 55.1 | 59.2 | 49.7 | 29.9 |
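To make the size of the degradation concrete, the scores in the table above can be turned into per-benchmark drops relative to BF16. This is a small arithmetic sketch over the published numbers only; the function name `drop_vs_baseline` is hypothetical.

```python
# Zero-shot scores copied from the table above.
scores = {
    "BF16": {"MMLU": 55.8, "CEval (val)": 59.7, "GSM8K": 50.3, "Humaneval": 37.2},
    "Int8": {"MMLU": 55.4, "CEval (val)": 59.4, "GSM8K": 48.3, "Humaneval": 34.8},
    "Int4": {"MMLU": 55.1, "CEval (val)": 59.2, "GSM8K": 49.7, "Humaneval": 29.9},
}


def drop_vs_baseline(table: dict, baseline: str = "BF16") -> dict:
    """Absolute score drop (in points) of each quantized model vs. the baseline."""
    base = table[baseline]
    return {
        name: {bench: round(base[bench] - value, 1) for bench, value in row.items()}
        for name, row in table.items()
        if name != baseline
    }


drops = drop_vs_baseline(scores)
for name, row in drops.items():
    print(name, row)
```

On these numbers the MMLU and C-Eval drops stay below one point, while the largest single drop is Humaneval under Int4 (7.3 points), so code generation is the area most worth re-checking after quantization.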
Inference Speed
We measured the average inference speed of generating 2048 and 8192 tokens under different quantization levels and versions of flash-attention.
| Quantization | FlashAttn | Speed (2048 tokens) | Speed (8192 tokens) |
| BF16 | v2 | 40.93 | 36.14 |
| Int8 | v2 | 37.47 | 32.54 |
| Int4 | v2 | 50.09 | 38.61 |
| BF16 | v1 | 40.75 | 35.34 |
| Int8 | v1 | 37.51 | 32.39 |
| Int4 | v1 | 45.98 | 36.47 |
| BF16 | Disabled | 37.55 | 33.56 |
| Int8 | Disabled | 37.84 | 32.65 |
| Int4 | Disabled | 48.12 | 36.70 |
In detail, the profiling setting is generating 8192 new tokens with 1 context token. The profiling runs on a single A100-SXM4-80G GPU with PyTorch 2.0.1 and CUDA 11.8. The inference speed is averaged over the generated 8192 tokens.
Note: The generation speed of the Int4/Int8 models mentioned above is measured with the autogptq library. Models loaded using "AutoModelForCausalLM.from_pretrained" currently generate approximately 20% slower. We have reported this issue to the HuggingFace team and will update promptly if a solution becomes available.
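The measurement described above amounts to timing a generation call and dividing the number of new tokens by the elapsed time. The sketch below shows that calculation with a placeholder generator. It assumes the table figures are in tokens per second, which this document does not state explicitly, and `dummy_generate` is a hypothetical stand-in for a real model call.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[int], int], new_tokens: int) -> float:
    """Time one call to `generate` and return generated tokens per second."""
    start = time.perf_counter()
    produced = generate(new_tokens)
    elapsed = time.perf_counter() - start
    return produced / elapsed


def dummy_generate(new_tokens: int) -> int:
    """Placeholder for a real generation call; sleeps briefly per token."""
    for _ in range(new_tokens):
        time.sleep(0.0005)
    return new_tokens


speed = tokens_per_second(dummy_generate, 64)
print(f"{speed:.1f} tokens/s")

# Applying the roughly 20% slowdown from the note to the Int4 / v2 / 8192-token figure.
int4_autogptq_speed = 38.61
estimated_from_pretrained = int4_autogptq_speed * 0.8
print(f"estimated speed when loaded via from_pretrained: {estimated_from_pretrained:.2f}")
```

For a real measurement on a GPU, the device should be synchronized before reading the clock, otherwise asynchronous kernels make the elapsed time look shorter than it is.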
GPU Memory Usage
We also profile the peak GPU memory usage for encoding 2048 tokens as context (and generating a single token) and for generating 8192 tokens (with a single token as context) under different quantization levels. (The GPU memory usage is similar with or without flash-attention.) The results are shown below.
| Quantization Level | Peak Usage for Encoding 2048 Tokens | Peak Usage for Generating 8192 Tokens |
| BF16 | 16.99GB | 22.53GB |
| Int8 | 11.20GB | 16.62GB |
| Int4 | 8.21GB | 13.63GB |
The above speed and memory profiling are conducted using this script.
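As a rough sanity check on the memory figures above, the space taken by the weights alone can be estimated from the parameter count and the bits per parameter. This is a back-of-the-envelope sketch: it uses the nominal 7 billion parameters implied by the model name (the exact count is not given in this document), and the helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


n_params = 7e9  # nominal "7B"; an assumption, not the exact parameter count
for name, bits in [("BF16", 16), ("Int8", 8), ("Int4", 4)]:
    print(f"{name}: {weight_memory_gib(n_params, bits):.2f} GiB")
```

The measured peaks in the table are noticeably higher than these weight-only estimates. Plausible contributors are activations, the key/value cache, framework overhead, and any layers that are kept in higher precision, but this document does not break the peak usage down.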
Model
The details of the model architecture of Qwen-7B-Chat, which are the same as those of the pretrained Qwen-7B model, are listed as follows:
| Hyperparameter | Value |
| n_layers | 32 |
| n_heads | 32 |
| d_model | 4096 |
| vocab size | 151851 |
| sequence length | 8192 |
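A few derived quantities follow directly from the hyperparameters above. The sketch below computes them. The key/value cache estimate assumes standard multi-head attention with 2-byte (BF16) values and no sharing of key/value heads, which is an assumption rather than something this document states.

```python
# Hyperparameters copied from the table above.
n_layers = 32
n_heads = 32
d_model = 4096
vocab_size = 151851
sequence_length = 8192

# Each attention head works on an equal slice of the model dimension.
head_dim = d_model // n_heads

# One embedding vector of size d_model per vocabulary entry.
embedding_params = vocab_size * d_model

# Assumed: one key and one value vector of size d_model per layer, 2 bytes each.
kv_bytes_per_token = 2 * n_layers * d_model * 2
kv_gib_full_context = kv_bytes_per_token * sequence_length / 1024**3

print("head_dim:", head_dim)
print("embedding parameters:", embedding_params)
print("KV cache per token (bytes):", kv_bytes_per_token)
print(f"KV cache at {sequence_length} tokens: {kv_gib_full_context:.1f} GiB")
```

Under these assumptions the cache grows by 0.5 MiB per token, so a full 8192-token context adds about 4 GiB on top of the weights.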
For position encoding, FFN activation function, and normalization, we adopt the prevalent practices, i.e., RoPE relative position encoding, SwiGLU as the activation function, and RMSNorm for normalization (flash-attention can optionally be installed for acceleration).
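The three components named above have compact textbook definitions. The sketch below implements them in plain Python on small lists so that the arithmetic is visible. It is an illustration of the general techniques under simplifying assumptions (a single 2-dimensional RoPE pair, no learned projections), not the model's actual implementation, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: divide by the root-mean-square of x, then apply a learned gain."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def silu(v: float) -> float:
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate: List[float], up: List[float]) -> List[float]:
    """SwiGLU gating: SiLU(gate) times `up`, elementwise.

    In a real FFN, `gate` and `up` are two linear projections of the same input.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


def rope_rotate(pair: Tuple[float, float], position: int, inv_freq: float) -> Tuple[float, float]:
    """RoPE on one 2-d pair: rotate by an angle proportional to the position."""
    angle = position * inv_freq
    c, s = math.cos(angle), math.sin(angle)
    x0, x1 = pair
    return (x0 * c - x1 * s, x0 * s + x1 * c)


def dot(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    return a[0] * b[0] + a[1] * b[1]


print(rms_norm([3.0, 4.0], [1.0, 1.0]))
print(swiglu([0.0, 1.0], [5.0, 2.0]))

# The query/key dot product depends only on the position difference (here 3).
q, k = (1.0, 2.0), (0.5, -1.0)
print(dot(rope_rotate(q, 5, 0.1), rope_rotate(k, 2, 0.1)))
print(dot(rope_rotate(q, 105, 0.1), rope_rotate(k, 102, 0.1)))
```

The last two printed values agree up to floating-point error, which is the sense in which RoPE is a relative position encoding.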
For tokenization, whereas current mainstream open-source models rely mainly on Chinese and English vocabularies, Qwen-7B-Chat uses a vocabulary of over 150K tokens.
The vocabulary builds on cl100k_base, the BPE vocabulary used by GPT-4, and is optimized for Chinese and multilingual text. It first considers efficient encoding of Chinese, English, and code data, and is also friendlier to a number of other languages, enabling users to directly enhance the capability for some languages without expanding the vocabulary.
It segments numbers by single digit, and calls the tiktoken tokenizer library for efficient tokenization.
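The digit-by-digit segmentation mentioned above can be illustrated with a short regular expression. This sketch shows only that one pre-splitting idea, not the real BPE tokenizer, and the function name `split_digits` is hypothetical.

```python
import re
from typing import List


def split_digits(text: str) -> List[str]:
    """Split text so that every digit becomes its own piece."""
    # Match either a single digit or a maximal run of non-digit characters.
    return re.findall(r"\d|\D+", text)


pieces = split_digits("GSM8K has 8192 items")
print(pieces)
# ['GSM', '8', 'K has ', '8', '1', '9', '2', ' items']
```

Splitting this way means a number such as 8192 is always represented by the same four digit pieces, rather than by whichever multi-digit chunks happen to be in the vocabulary.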
Evaluation
For Qwen-7B-Chat, we also evaluate the model on standard benchmarks for Chinese understanding (C-Eval), English understanding (MMLU), coding (HumanEval), and mathematics (GSM8K), and we include results for long-context understanding. Because alignment gives Qwen-7B-Chat a strong ability to call external systems, we additionally evaluate its tool usage.
Note: Due to rounding errors caused by hardware and framework, small differences in reproduced results are normal.
Chinese Evaluation
C-Eval
We report the 0-shot and 5-shot accuracy of Qwen-7B-Chat on the C-Eval validation set.
| Model | Avg. Acc. |
| LLaMA2-7B-Chat | 31.9 |
| LLaMA2-13B-Chat | 36.2 |
| LLaMA2-70B-Chat | 44.3 |
| ChatGLM2-6B-Chat | 52.6 |
| InternLM-7B-Chat | 53.6 |
| Baichuan2-7B-Chat | 55.6 |
| Baichuan2-13B-Chat | 56.7 |
| Qwen-7B-Chat (original) (0-shot) | 54.2 |
| Qwen-7B-Chat (0-shot) | 59.7 |
| Qwen-7B-Chat (5-shot) | 59.3 |
| Qwen-14B-Chat (0-shot) | 69.8 |
| Qwen-14B-Chat (5-shot) | 71.7 |
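The 0-shot and 5-shot settings in the table above differ only in how many solved examples are placed before the question. The sketch below shows one way such a prompt could be assembled. The template and the function name `build_few_shot_prompt` are hypothetical and are not the exact evaluation prompt used for these results.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str, k: int) -> str:
    """Prepend the first k solved examples to the question (k=0 gives zero-shot)."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples[:k]]
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


examples = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]

print(build_few_shot_prompt(examples, "What is 3 * 3?", k=0))
print("---")
print(build_few_shot_prompt(examples, "What is 3 * 3?", k=2))
```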
The zero-shot accuracy of Qwen-7B-Chat on the C-Eval test set is provided below:
| Model | Avg. | STEM | Social Sciences | Humanities | Others |
| Chinese-Alpaca-Plus-13B | 41.5 | 36.6 | 49.7 | 43.1 | 41.2 |
| Chinese-Alpaca-2-7B | 40.3 | - | - | - | - |
| ChatGLM2-6B-Chat | 50.1 | 46.4 | 60.4 | 50.6 | 46.9 |
| Baichuan-13B-Chat | 51.5 | 43.7 | 64.6 | 56.2 | 49.2 |
| Qwen-7B-Chat (original) | 54.6 | 47.8 | 67.6 | 59.3 | 50.6 |
| Qwen-7B-Chat | 58.6 | 53.3 | 72.1 | 62.8 | 52.0 |
| Qwen-14B-Chat | 69.1 | 65.1 | 80.9 | 71.2 | 63.4 |
Among models of comparable size, the human-aligned Qwen-7B-Chat ranks near the top in C-Eval accuracy.
English Evaluation
MMLU
The 0-shot and 5-shot accuracy of Qwen-7B-Chat on MMLU is provided below.
Qwen-7B-Chat remains among the top performers when compared with other human-aligned models of comparable size.
| Model | Avg. Acc. |
| ChatGLM2-6B-Chat | 46.0 |
| LLaMA2-7B-Chat | 46.2 |
| InternLM-7B-Chat | 51.1 |
| Baichuan2-7B-Chat | 52.9 |
| LLaMA2-13B-Chat | 54.6 |
| Baichuan2-13B-Chat | 57.3 |
| LLaMA2-70B-Chat | 63.8 |
| Qwen-7B-Chat (original) (0-shot) | 53.9 |
| Qwen-7B-Chat (0-shot) | 55.8 |
| Qwen-7B-Chat (5-shot) | 57.0 |
| Qwen-14B-Chat (0-shot) | 64.6 |
| Qwen-14B-Chat (5-shot) | 66.5 |
Coding Evaluation
The zero-shot Pass@1 of Qwen-7B-Chat on HumanEval is shown below.
| Model | Pass@1 |
| ChatGLM2-6B-Chat | 11.0 |
| LLaMA2-7B-Chat | 12.2 |
| Baichuan2-7B-Chat | 13.4 |
| InternLM-7B-Chat | 14.6 |
| Baichuan2-13B-Chat | 17.7 |
| LLaMA2-13B-Chat | 18.9 |
| LLaMA2-70B-Chat | 32.3 |
| Qwen-7B-Chat (original) | 24.4 |
| Qwen-7B-Chat | 37.2 |
| Qwen-14B-Chat | 43.9 |
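For readers unfamiliar with the metric, Pass@1 is the k=1 case of pass@k, the probability that at least one of k sampled solutions passes the unit tests. This document does not say how the scores above were computed. The sketch below shows the commonly used unbiased estimator, with a hypothetical function name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c passed the tests."""
    # If fewer than k samples failed, every draw of k samples contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# One problem, 10 samples, 3 of which pass.
print(pass_at_k(n=10, c=3, k=1))
print(pass_at_k(n=10, c=3, k=5))

# A benchmark score is the mean over problems, reported as a percentage.
per_problem = [pass_at_k(1, 1, 1), pass_at_k(1, 0, 1), pass_at_k(1, 1, 1), pass_at_k(1, 0, 1)]
print(100.0 * sum(per_problem) / len(per_problem))
```

With a single sample per problem (n = 1, k = 1), the estimator reduces to the fraction of problems solved.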
Mathematics Evaluation
The accuracy of Qwen-7B-Chat on GSM8K, a benchmark of mathematical ability, is shown below.
| Model | Acc. |
| LLaMA2-7B-Chat | 26.3 |
| ChatGLM2-6B-Chat | 28.8 |
| Baichuan2-7B-Chat | 32.8 |
| InternLM-7B-Chat | 33.0 |
| LLaMA2-13B-Chat | 37.1 |
| Baichuan2-13B-Chat | 55.3 |
| LLaMA2-70B-Chat | 59.3 |
| Qwen-7B-Chat (original) (0-shot) | 41.1 |
| Qwen-7B-Chat (0-shot) | 50.3 |
| Qwen-7B-Chat (8-shot) | 54.1 |
| Qwen-14B-Chat (0-shot) | 60.1 |
| Qwen-14B-Chat (8-shot) | 59.3 |
Long-Context Understanding
We introduce NTK-aware interpolation and LogN attention scaling to extend the context length of Qwen-7B-Chat. The Rouge-L results of Qwen-7B-Chat on the long-text summarization dataset VCSUM (the average text length of this dataset is around 15K) are shown below:
(To use these tricks, please set use_dynamic_ntk and use_logn_attn to true in config.json.)
| Model | VCSUM (zh) |
| GPT-3.5-Turbo-16k | 16.0 |
| LLaMA2-7B-Chat | 0.2 |
| InternLM-7B-Chat | 13.0 |
| ChatGLM2-6B-Chat | 16.3 |
| Qwen-7B-Chat | 16.6 |
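The two context-extension techniques named in this section can be summarized by two small formulas. The sketch below is a generic illustration of those ideas, not the model's actual implementation: the RoPE base of 10000 and the rule for choosing `alpha` are assumptions, and the function names are hypothetical. The training length 8192 and head dimension 128 (4096 / 32) come from the model table.

```python
import math


def ntk_scaled_base(base: float, alpha: float, head_dim: int) -> float:
    """NTK-aware interpolation: enlarge the RoPE base so low frequencies stretch."""
    return base * alpha ** (head_dim / (head_dim - 2))


def logn_scale(seq_len: int, train_len: int) -> float:
    """LogN attention scaling: scale queries by log_{train_len}(seq_len) beyond train_len."""
    if seq_len <= train_len:
        return 1.0
    return math.log(seq_len) / math.log(train_len)


train_len = 8192      # sequence length from the model table
head_dim = 4096 // 32  # d_model / n_heads from the model table
rope_base = 10000.0   # assumed default RoPE base

for seq_len in (4096, 8192, 16384):
    # Assumed heuristic: stretch in proportion to how far we exceed the training length.
    alpha = max(1.0, seq_len / train_len)
    print(
        seq_len,
        round(ntk_scaled_base(rope_base, alpha, head_dim), 1),
        round(logn_scale(seq_len, train_len), 4),
    )
```

Within the training length both functions leave the model unchanged (base 10000, scale 1.0). Beyond it, the base grows and the attention scale rises slowly, for example 14/13 at 16384 tokens.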
Tool Usage
ReAct Prompting
Qwen-Chat supports calling plugins/tools/APIs through ReAct Prompting. ReAct is also one of the main approaches used by the LangChain framework. On our open-source evaluation benchmark for assessing tool usage capabilities, Qwen-Chat performs as follows:
| Chinese Tool-Use Benchmark |
| Model | Tool Selection (Acc.↑) | Tool Input (Rouge-L↑) | False Positive Error↓ |
| GPT-4 | 95% | 0.90 | 15.0% |
| GPT-3.5 | 85% | 0.88 | 75.0% |
| Qwen-7B-Chat | 98% | 0.91 | 7.3% |
| Qwen-14B-Chat | 98% | 0.93 | 2.4% |
The plugins that appear in the evaluation set do not appear in the training set of Qwen. This benchmark evaluates the accuracy of the model in selecting the correct plugin from multiple candidate plugins, the rationality of the parameters passed into the plugin, and the false positive rate. False positive: incorrectly invoking a plugin when responding to a query that should not trigger any plugin call.
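The definitions above translate directly into two small calculations. The sketch below uses a toy set of queries where `None` means "no tool should be called". The data and the function names are hypothetical, and the Rouge-L scoring of tool inputs is not covered.

```python
from typing import List, Optional


def tool_selection_accuracy(predicted: List[Optional[str]], expected: List[Optional[str]]) -> float:
    """Share of tool-requiring queries where the chosen tool matches the reference."""
    pairs = [(p, e) for p, e in zip(predicted, expected) if e is not None]
    return sum(p == e for p, e in pairs) / len(pairs)


def false_positive_rate(predicted: List[Optional[str]], expected: List[Optional[str]]) -> float:
    """Share of no-tool queries where the model called a tool anyway."""
    no_tool = [p for p, e in zip(predicted, expected) if e is None]
    return sum(p is not None for p in no_tool) / len(no_tool)


# Toy data: the reference tool per query, and what a model actually chose.
expected = ["search", "calculator", None, None, "weather"]
predicted = ["search", "calculator", None, "search", "maps"]

print(f"tool selection accuracy: {tool_selection_accuracy(predicted, expected):.2%}")
print(f"false positive rate: {false_positive_rate(predicted, expected):.2%}")
```

Keeping the two metrics separate matters: a model that calls a tool on every query can score well on selection accuracy while having a very high false positive rate.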

Code Interpreter
To assess Qwen's ability to use the Python Code Interpreter for tasks such as mathematical problem solving, data visualization, and other general-purpose tasks such as file handling and web scraping, we have created and open-sourced a benchmark specifically designed for evaluating these capabilities. You can find the benchmark at this link.
We have observed that Qwen performs well in terms of code executability and result accuracy when generating code:
| Executable Rate of Generated Code (%) |
| Model | Math↑ | Visualization↑ | General↑ |
| GPT-4 | 91.9 | 85.9 | 82.8 |
| GPT-3.5 | 89.2 | 65.0 | 74.1 |
| LLaMA2-7B-Chat | 41.9 | 33.1 | 24.1 |
| LLaMA2-13B-Chat | 50.0 | 40.5 | 48.3 |
| CodeLLaMA-7B-Instruct | 85.1 | 54.0 | 70.7 |
| CodeLLaMA-13B-Instruct | 93.2 | 55.8 | 74.1 |
| InternLM-7B-Chat-v1.1 | 78.4 | 44.2 | 62.1 |
| InternLM-20B-Chat | 70.3 | 44.2 | 65.5 |
| Qwen-7B-Chat | 82.4 | 64.4 | 67.2 |
| Qwen-14B-Chat | 89.2 | 84.1 | 65.5 |
| Accuracy of Code Execution Results (%) |
| Model | Math↑ | Visualization-Hard↑ | Visualization-Easy↑ |
| GPT-4 | 82.8 | 66.7 | 60.8 |
| GPT-3.5 | 47.3 | 33.3 | 55.7 |
| LLaMA2-7B-Chat | 3.9 | 14.3 | 39.2 |
| LLaMA2-13B-Chat | 8.3 | 8.3 | 40.5 |
| CodeLLaMA-7B-Instruct | 14.3 | 26.2 | 60.8 |
| CodeLLaMA-13B-Instruct | 28.2 | 27.4 | 62.0 |
| InternLM-7B-Chat-v1.1 | 28.5 | 4.8 | 40.5 |
| InternLM-20B-Chat | 34.6 | 21.4 | 45.6 |
| Qwen-7B-Chat | 41.9 | 40.5 | 54.4 |
| Qwen-14B-Chat | 58.4 | 53.6 | 59.5 |
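The executable-rate metric in the first table above can be understood as the percentage of generated snippets that run to completion without raising. The sketch below illustrates that idea only. It is not the benchmark's actual harness, the function names are hypothetical, and calling `exec` on untrusted model output is unsafe outside a sandbox.

```python
from typing import List


def is_executable(code: str) -> bool:
    """Return True if the snippet compiles and runs without raising an exception."""
    try:
        # Fresh globals per snippet so one snippet cannot affect another.
        exec(compile(code, "<generated>", "exec"), {"__name__": "__generated__"})
        return True
    except Exception:
        return False


def executable_rate(snippets: List[str]) -> float:
    """Percentage of snippets that execute successfully."""
    return 100.0 * sum(is_executable(s) for s in snippets) / len(snippets)


snippets = [
    "x = 1 + 1",                         # runs
    "import math\ny = math.sqrt(16)",    # runs
    "undefined_name + 1",                # NameError at run time
    "def f(:\n    pass",                 # SyntaxError at compile time
]

print(f"executable rate: {executable_rate(snippets):.1f}%")
```

Executability says nothing about correctness, which is why the second table reports the accuracy of the execution results separately.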
HuggingFace Agent
Qwen-Chat can also be used as a HuggingFace Agent. Its performance on the run-mode and chat-mode benchmarks provided by HuggingFace is as follows:
| HuggingFace Agent Benchmark - Run Mode |
| Model | Tool Selection↑ | Tool Used↑ | Code↑ |
| GPT-4 | 100 | 100 | 97.4 |
| GPT-3.5 | 95.4 | 96.3 | 87.0 |
| StarCoder-Base-15B | 86.1 | 87.0 | 68.9 |
| StarCoder-15B | 87.0 | 88.0 | 68.9 |
| Qwen-7B-Chat | 87.0 | 87.0 | 71.5 |
| Qwen-14B-Chat | 93.5 | 94.4 | 87.0 |
| HuggingFace Agent Benchmark - Chat Mode |
| Model | Tool Selection↑ | Tool Used↑ | Code↑ |
| GPT-4 | 97.9 | 97.9 | 98.5 |
| GPT-3.5 | 97.3 | 96.8 | 89.6 |
| StarCoder-Base-15B | 97.9 | 97.9 | 91.1 |
| StarCoder-15B | 97.9 | 97.9 | 89.6 |
| Qwen-7B-Chat | 94.7 | 94.7 | 85.1 |
| Qwen-14B-Chat | 97.9 | 97.9 | 95.5 |
x86 Platforms
When deploying quantized models on Core™/Xeon® Scalable Processors or on Arc™ GPUs, the OpenVINO™ Toolkit is recommended to make full use of the hardware and achieve better inference performance. You can install and run this example notebook. For related issues, you are welcome to file an issue at the OpenVINO repo.
FAQ
If you meet problems, please consult the FAQ and the existing issues to look for a solution before you open a new issue.
Citation
If you find our work helpful, feel free to cite it.
@article{qwen,
title={Qwen Technical Report},
author={Jinze Bai and Shuai Bai and Yunfei Chu and Zeyu Cui and Kai Dang and Xiaodong Deng and Yang Fan and Wenbin Ge and Yu Han and Fei Huang and Binyuan Hui and Luo Ji and Mei Li and Junyang Lin and Runji Lin and Dayiheng Liu and Gao Liu and Chengqiang Lu and Keming Lu and Jianxin Ma and Rui Men and Xingzhang Ren and Xuancheng Ren and Chuanqi Tan and Sinan Tan and Jianhong Tu and Peng Wang and Shijie Wang and Wei Wang and Shengguang Wu and Benfeng Xu and Jin Xu and An Yang and Hao Yang and Jian Yang and Shusheng Yang and Yang Yao and Bowen Yu and Hongyi Yuan and Zheng Yuan and Jianwei Zhang and Xingxuan Zhang and Yichang Zhang and Zhenru Zhang and Chang Zhou and Jingren Zhou and Xiaohuan Zhou and Tianhang Zhu},
journal={arXiv preprint arXiv:2309.16609},
year={2023}
}
License Agreement
Our code and checkpoints are fully open for academic research, and commercial use is also allowed. Check LICENSE for more details about the license. If you need to use them commercially, please fill out the form to apply.
Contact Us
If you would like to leave a message for our research team or product team, join our WeChat, DingTalk, or Discord groups! You are also welcome to send an email to qianwen_opensource@alibabacloud.com.