Qwen2-57B-A14B-Instruct-GGUF

Technical Information

Open-source URL
https://modelscope.cn/models/qwen/Qwen2-57B-A14B-Instruct-GGUF
License
apache-2.0

Description

Qwen2-57B-A14B-Instruct-GGUF

Introduction

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model (57B-A14B).

Compared with the state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc.

For more details, please refer to our blog, GitHub, and Documentation.

In this repo, we provide quantized models in the GGUF format, including q2_k, q3_k_m, q4_0, q4_k_m, q5_0, q5_k_m, q6_k, and q8_0.

This is the GGUF repo for Qwen2-57B-A14B-Instruct, which is an MoE model.

Model Details

Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes.

Training details

We pretrained the models with a large amount of data, and we post-trained the models with both supervised finetuning and direct preference optimization.

Requirements

We advise you to clone llama.cpp and install it following the official guide. We follow the latest version of llama.cpp. In the following demonstration, we assume that you are running commands under the repository llama.cpp.
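
For reference, a minimal build sketch is shown below. It assumes a plain CPU build with make on Linux/macOS; follow the official llama.cpp build guide for CUDA/Metal or other backends, which are needed for GPU offloading via -ngl:

# Minimal sketch: fetch and build llama.cpp so that llama-cli, llama-server,
# and llama-gguf-split are available in the repository root.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make -j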

How to use

Cloning the repo may be inefficient, and thus you can manually download the GGUF file that you need or use the modelscope CLI (pip install modelscope) as shown below:

modelscope download --model=qwen/Qwen2-57B-A14B-Instruct-GGUF --local_dir . qwen2-57b-a14b-instruct-q4_0.gguf

However, for large files, we split them into multiple segments due to the limitation of 50G for a single file to be uploaded. Specifically, the split files share a prefix, with a suffix indicating their index. For example, the q8_0 GGUF files are:

qwen2-57b-a14b-instruct-q8_0-00001-of-00002.gguf
qwen2-57b-a14b-instruct-q8_0-00002-of-00002.gguf

They share the prefix qwen2-57b-a14b-instruct-q8_0, but each has its own indexing suffix, e.g., -00001-of-00002. To use the split GGUF files, you need to merge them first with the command llama-gguf-split as shown below:

./llama-gguf-split --merge qwen2-57b-a14b-instruct-q8_0-00001-of-00002.gguf qwen2-57b-a14b-instruct-q8_0.gguf

With the upgrade of APIs of llama.cpp, llama-gguf-split is equivalent to the previous gguf-split. For the arguments of this command, the first is the path to the first split GGUF file, and the second is the path to the output GGUF file.
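
Note that every segment must be present locally before merging. A minimal sketch, reusing the modelscope CLI command shown above for both q8_0 segments:

# Minimal sketch: download both q8_0 segments before merging them.
modelscope download --model=qwen/Qwen2-57B-A14B-Instruct-GGUF --local_dir . qwen2-57b-a14b-instruct-q8_0-00001-of-00002.gguf
modelscope download --model=qwen/Qwen2-57B-A14B-Instruct-GGUF --local_dir . qwen2-57b-a14b-instruct-q8_0-00002-of-00002.gguf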

To run Qwen2, you can use llama-cli (the previous main) or llama-server (the previous server). We recommend using llama-server as it is simple and compatible with the OpenAI API. For example:

./llama-server -m qwen2-57b-a14b-instruct-q5_0.gguf -ngl 28 -fa

(Note: -ngl 28 refers to offloading 28 layers to GPUs, and -fa refers to the use of flash attention.)
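
You can quickly check that the server is up by querying its OpenAI-compatible chat endpoint with curl. A minimal sketch, assuming the default port 8080 (no --port was passed above):

# Minimal sketch: send one chat request to the llama-server OpenAI-compatible API.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "tell me something about michael jordan"}
        ]
      }'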

Then it is easy to access the deployed service with the OpenAI API:

import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1", # "http://<Your api-server IP>:port"
    api_key="sk-no-key-required"
)

completion = client.chat.completions.create(
    model="qwen",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "tell me something about michael jordan"}
    ]
)
print(completion.choices[0].message.content)

If you choose to use llama-cli, note that -cml for the ChatML template has been removed. Instead, you should use --in-prefix and --in-suffix to achieve the same effect:

./llama-cli -m qwen2-57b-a14b-instruct-q5_0.gguf \
  -n 512 -co -i -if -f prompts/chat-with-qwen.txt \
  --in-prefix "<|im_start|>user\n" \
  --in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
  -ngl 28 -fa

Evaluation

We implement perplexity evaluation using wikitext following the practice of llama.cpp with ./llama-perplexity (the previous ./perplexity). In the following we report the PPL (lower is better) of GGUF models of different sizes and different quantization levels.

Size      fp16   q8_0   q6_k   q5_k_m  q5_0   q4_k_m  q4_0   q3_k_m  q2_k   iq1_m
0.5B      15.11  15.13  15.14  15.24   15.40  15.36   16.28  15.70   16.74  -
1.5B      10.43  10.43  10.45  10.50   10.56  10.61   10.79  11.08   13.04  -
7B        7.93   7.94   7.96   7.97    7.98   8.02    8.19   8.20    10.58  -
57B-A14B  6.81   6.81   6.83   6.84    6.89   6.99    7.02   7.43    -      -
72B       5.58   5.58   5.59   5.59    5.60   5.61    5.66   5.68    5.91   6.75
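
For reference, such a measurement can be run roughly as sketched below; the wikitext file path is an assumption, and you need to obtain the raw wikitext-2 test set yourself:

# Minimal sketch: compute wikitext perplexity for one GGUF file with llama.cpp.
# The path wikitext-2-raw/wiki.test.raw is an assumed location for the test set.
./llama-perplexity -m qwen2-57b-a14b-instruct-q4_0.gguf \
  -f wikitext-2-raw/wiki.test.raw \
  -ngl 28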

Citation

If you find our work helpful, feel free to cite us.

@article{qwen2,
  title={Qwen2 Technical Report},
  year={2024}
}
