Qwen2-72B-Instruct-GGUF

Introduction
Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 72B Qwen2 model.

Compared with the state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc.

For more details, please refer to our blog, GitHub, and Documentation.

In this repo, we provide quantized models in the GGUF formats, including q5_0, q5_k_m, q6_k, q8_0, fp16 and so on.

Model Details
Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes.

Training details
We pretrained the models with a large amount of data, and we post-trained the models with both supervised finetuning and direct preference optimization.
Requirements
We advise you to clone llama.cpp and install it following the official guide. We follow the latest version of llama.cpp. In the following demonstration, we assume that you are running commands under the repository llama.cpp.
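For reference, a minimal setup might look like the sketch below; the repository URL and the make-based build are assumptions about the upstream project, and the exact build procedure can differ between llama.cpp versions, so please defer to the official guide.

# assumed upstream repository; build steps may vary by llama.cpp version
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make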
How to use
Cloning the repo may be inefficient, and thus you can manually download the GGUF file that you need or use the modelscope CLI (pip install modelscope) as shown below:

modelscope download --model=qwen/Qwen2-72B-Instruct-GGUF --local_dir . qwen2-72b-instruct-q4_0.gguf

However, for large files, we split them into multiple segments due to the limitation of 50 GB for a single uploaded file. Specifically, the split files share a prefix, with a suffix indicating the index. For example, the q5_k_m GGUF files are:
qwen2-72b-instruct-q5_k_m-00001-of-00002.gguf
qwen2-72b-instruct-q5_k_m-00002-of-00002.gguf

They share the prefix of qwen2-72b-instruct-q5_k_m, but have their own suffixes for indexing, namely -00001-of-00002 and -00002-of-00002.
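To fetch both segments, you can reuse the modelscope command shown above, once per segment (a sketch assuming you want the q5_k_m model in the current directory):

modelscope download --model=qwen/Qwen2-72B-Instruct-GGUF --local_dir . qwen2-72b-instruct-q5_k_m-00001-of-00002.gguf
modelscope download --model=qwen/Qwen2-72B-Instruct-GGUF --local_dir . qwen2-72b-instruct-q5_k_m-00002-of-00002.gguf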
To use the split GGUF files, you need to merge them first with the command llama-gguf-split as shown below:

./llama-gguf-split --merge qwen2-72b-instruct-q5_k_m-00001-of-00002.gguf qwen2-72b-instruct-q5_k_m.gguf

(Note: with the upgrade of the llama.cpp APIs, llama-gguf-split is equivalent to the previous gguf-split.) For the arguments of this command, the first is the path to the first split GGUF file, and the second is the path to the output GGUF file.
To run Qwen2, you can use llama-cli (the previous main) or llama-server (the previous server). We recommend using llama-server as it is simple and compatible with the OpenAI API. For example:

./llama-server -m qwen2-72b-instruct-q4_0.gguf -ngl 80 -fa

(Note: -ngl 80 refers to offloading 80 layers to GPUs, and -fa refers to the use of flash attention.)

Then it is easy to access the deployed service with the OpenAI API:

import openai
client = openai.OpenAI(
    base_url="http://localhost:8080/v1",  # "http://<Your api-server IP>:port"
    api_key="sk-no-key-required"
)
completion = client.chat.completions.create(
    model="qwen",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "tell me something about michael jordan"}
    ]
)
print(completion.choices[0].message.content)
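If you prefer not to use the Python client, the same OpenAI-compatible endpoint can be queried directly, for example with curl (a minimal sketch assuming the default host and port from the server command above):

curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "tell me something about michael jordan"}
        ]
    }'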
If you choose to use llama-cli, pay attention to the removal of -cml for the ChatML template. Instead you should use --in-prefix and --in-suffix to tackle this problem:

./llama-cli -m qwen2-72b-instruct-q4_0.gguf \
    -n 512 -co -i -if -f prompts/chat-with-qwen.txt \
    --in-prefix "<|im_start|>user\n" \
    --in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
    -ngl 80 -fa
Evaluation
We implement perplexity evaluation using wikitext following the practice of llama.cpp with ./llama-perplexity (the previous ./perplexity).
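For example, an evaluation run might look like the sketch below; the wikitext file path is an assumption, so point -f at wherever your copy of the wikitext test set lives.

# assumed local path to the wikitext-2 raw test file
./llama-perplexity -m qwen2-72b-instruct-q4_0.gguf -f wikitext-2-raw/wiki.test.raw -ngl 80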
In the following we report the PPL of GGUF models of different sizes and different quantization levels.
| Size | fp16 | q8_0 | q6_k | q5_k_m | q5_0 | q4_k_m | q4_0 | q3_k_m | q2_k | iq1_m |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0.5B | 15.11 | 15.13 | 15.14 | 15.24 | 15.40 | 15.36 | 16.28 | 15.70 | 16.74 | - |
| 1.5B | 10.43 | 10.43 | 10.45 | 10.50 | 10.56 | 10.61 | 10.79 | 11.08 | 13.04 | - |
| 7B | 7.93 | 7.94 | 7.96 | 7.97 | 7.98 | 8.02 | 8.19 | 8.20 | 10.58 | - |
| 57B-A14B | 6.81 | 6.81 | 6.83 | 6.84 | 6.89 | 6.99 | 7.02 | 7.43 | - | - |
| 72B | 5.58 | 5.58 | 5.59 | 5.59 | 5.60 | 5.61 | 5.66 | 5.68 | 5.91 | 6.75 |
Citation
If you find our work helpful, feel free to give us a cite.

@article{qwen2,
  title={Qwen2 Technical Report},
  year={2024}
}