Open-source repository
https://modelscope.cn/models/qwen/Qwen2-7B-Instruct-GGUF
License
apache-2.0

Qwen2-7B-Instruct-GGUF

Introduction

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 7B Qwen2 model.

Compared with state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc.

For more details, please refer to our blog, GitHub, and Documentation.

In this repo, we provide the fp16 model and quantized models in the GGUF format, including q5_0, q5_k_m, q6_k and q8_0.
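As a rough guide to choosing a file, the sketch below estimates GGUF file size from bits per weight. The bits-per-weight figures are the nominal block sizes of the corresponding llama.cpp tensor types (for example, q8_0 stores 32 int8 weights plus one fp16 scale, i.e. 34 bytes per 32 weights). The total parameter count of about 7.6 billion is an assumption, and "_m" k-quant mixes keep some tensors at higher precision, so real files will be somewhat larger than these estimates.

```python
# Nominal bits per weight for llama.cpp tensor types, derived from their block layouts:
#   q8_0: 32 weights -> 32 int8 + one fp16 scale = 34 bytes   -> 8.5 bpw
#   q6_k: 256 weights -> 210 bytes                            -> 6.5625 bpw
#   q5_k: 256 weights -> 176 bytes                            -> 5.5 bpw
#   q5_0: 32 weights -> 22 bytes                              -> 5.5 bpw
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "q8_0": 8.5,
    "q6_k": 6.5625,
    "q5_k_m": 5.5,  # approximate: the "_m" mix stores some tensors as q6_k
    "q5_0": 5.5,
}

# Assumption: Qwen2-7B has roughly 7.6 billion parameters in total.
NUM_PARAMS = 7.6e9


def estimated_size_gib(quant: str, num_params: float = NUM_PARAMS) -> float:
    """Rough lower-bound file size in GiB, ignoring metadata and mixed-precision tensors."""
    total_bits = BITS_PER_WEIGHT[quant] * num_params
    return total_bits / 8 / 2**30


for quant in BITS_PER_WEIGHT:
    print(f"{quant:>7}: ~{estimated_size_gib(quant):.1f} GiB")
```

The estimate is mainly useful for comparing levels against your available RAM or VRAM; check the actual file sizes listed in the repository before downloading.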

Model Details

Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, grouped-query attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and code.
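Grouped-query attention lets several query heads share one key/value head, which shrinks the KV cache kept during generation. The sketch below shows the head mapping and the resulting per-token cache size. The configuration values (28 layers, 28 query heads, 4 KV heads, head dimension 128) are assumptions based on the publicly released Qwen2-7B config and should be checked against the model's config.json; the 2-byte (fp16) cache element size is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class AttnConfig:
    # Assumed Qwen2-7B values; verify against the model's config.json.
    num_layers: int = 28
    num_query_heads: int = 28
    num_kv_heads: int = 4
    head_dim: int = 128


def kv_head_for_query_head(q_head: int, cfg: AttnConfig) -> int:
    """In GQA, each consecutive group of query heads reads the same KV head."""
    group_size = cfg.num_query_heads // cfg.num_kv_heads
    return q_head // group_size


def kv_cache_bytes_per_token(cfg: AttnConfig, num_kv_heads: int, bytes_per_elem: int = 2) -> int:
    """Keys and values (factor 2) stored for every layer and every KV head."""
    return 2 * cfg.num_layers * num_kv_heads * cfg.head_dim * bytes_per_elem


cfg = AttnConfig()
print([kv_head_for_query_head(h, cfg) for h in range(cfg.num_query_heads)])

gqa = kv_cache_bytes_per_token(cfg, cfg.num_kv_heads)
# Hypothetical comparison: plain multi-head attention with one KV head per query head.
mha = kv_cache_bytes_per_token(cfg, cfg.num_query_heads)
print(f"GQA: {gqa / 1024:.0f} KiB/token, MHA equivalent: {mha / 1024:.0f} KiB/token")
```

With these assumed values, the cache costs 56 KiB per token instead of 392 KiB, a 7x reduction, which matters for long contexts.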

Training details

We pretrained the models with a large amount of data, and we post-trained the models with both supervised finetuning and direct preference optimization.
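For readers unfamiliar with direct preference optimization, the sketch below computes the standard per-pair DPO loss: the policy is pushed to raise the log-probability of the preferred response, relative to a frozen reference model, more than it does for the rejected response. This is the generic textbook objective, not Qwen's training code; the log-probabilities and the beta value of 0.1 are illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss; each argument is the summed token log-probability of a response."""
    chosen_margin = policy_chosen - ref_chosen        # how much the policy boosted the chosen answer
    rejected_margin = policy_rejected - ref_rejected  # how much it boosted the rejected answer
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# Illustrative values: the policy favors the chosen answer more than the reference does,
# so the loss falls below log(2), its value when the policy equals the reference.
print(dpo_loss(-12.0, -20.0, -14.0, -18.0))
```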

Requirements

We advise you to clone llama.cpp and install it following the official guide. We follow the latest version of llama.cpp. In the following demonstration, we assume that you are running commands from the root of the llama.cpp repository.

How to use

Cloning the repo may be inefficient, so you can manually download the GGUF file that you need, or use the modelscope CLI (pip install modelscope) as shown below:

modelscope download --model=qwen/Qwen2-7B-Instruct-GGUF --local_dir . qwen2-7b-instruct-q5_k_m.gguf
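To script the download for a different quantization level, a small helper can build the same modelscope command. The file-naming pattern qwen2-7b-instruct-<quant>.gguf is inferred from the example above, and the quantization list is the one given in this card; the fp16 file name in particular is an assumption, so check the repository's file list.

```python
import shlex
import subprocess

MODEL_ID = "qwen/Qwen2-7B-Instruct-GGUF"
# Levels listed in this card; the fp16 file is assumed to follow the same naming pattern.
QUANTS = ("fp16", "q8_0", "q6_k", "q5_k_m", "q5_0")


def gguf_filename(quant: str) -> str:
    """File name for a quantization level, e.g. qwen2-7b-instruct-q5_k_m.gguf."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quantization level: {quant}")
    return f"qwen2-7b-instruct-{quant}.gguf"


def download_cmd(quant: str, local_dir: str = ".") -> list[str]:
    """Argument list mirroring the modelscope CLI call shown above."""
    return ["modelscope", "download", f"--model={MODEL_ID}",
            "--local_dir", local_dir, gguf_filename(quant)]


def run_download(quant: str, local_dir: str = ".") -> None:
    """Actually run the download (requires `pip install modelscope`)."""
    subprocess.run(download_cmd(quant, local_dir), check=True)


if __name__ == "__main__":
    # Print the command only; call run_download("q5_k_m") to fetch the file.
    print(shlex.join(download_cmd("q5_k_m")))
```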

To run Qwen2, you can use llama-cli (formerly main) or llama-server (formerly server). We recommend llama-server because it is simple and compatible with the OpenAI API. For example:

./llama-server -m qwen2-7b-instruct-q5_k_m.gguf -ngl 28 -fa

(Note: -ngl 28 offloads 28 layers to GPUs, and -fa enables flash attention.)
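Loading the model takes a while after llama-server starts, so scripts usually wait for it before sending requests. The sketch below polls the server's /health endpoint with only the standard library. It assumes the default address http://localhost:8080 and that /health returns HTTP 200 once the model is loaded (and an error status while loading); check your build's server documentation if the behavior differs.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(base_url: str = "http://localhost:8080", timeout: float = 60.0,
                    interval: float = 1.0) -> bool:
    """Poll <base_url>/health until it answers HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            # Connection refused (not started yet) or an error status such as 503 (still loading).
            pass
        time.sleep(interval)
    return False
```

A typical use is `if not wait_for_server(): raise SystemExit("llama-server is not ready")` before creating the client.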

Then it is easy to access the deployed service with the OpenAI API:

import openai

# Point the OpenAI client at the local llama-server instead of the OpenAI cloud.
client = openai.OpenAI(
    base_url="http://localhost:8080/v1",  # "http://<Your api-server IP>:port"
    api_key="sk-no-key-required",
)

completion = client.chat.completions.create(
    model="qwen",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "tell me something about michael jordan"},
    ],
)
print(completion.choices[0].message.content)
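If you would rather not install the openai package, the same endpoint can be called with the standard library. This is a minimal sketch assuming llama-server's OpenAI-compatible /v1/chat/completions route at the base URL used above; the placeholder API key and the model name "qwen" mirror the example above.

```python
import json
import urllib.request


def chat(messages: list[dict], base_url: str = "http://localhost:8080/v1",
         api_key: str = "sk-no-key-required", model: str = "qwen") -> str:
    """POST an OpenAI-style chat request and return the first choice's text."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    # Same response shape the openai client parses: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat([{"role": "user", "content": "tell me something about michael jordan"}])` returns the reply text.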

If you choose to use llama-cli, note that the -cml flag for the ChatML template has been removed. Instead, you should use --in-prefix and --in-suffix to apply the template.

./llama-cli -m qwen2-7b-instruct-q5_k_m.gguf \
  -n 512 -co -i -if -f prompts/chat-with-qwen.txt \
  --in-prefix "<|im_start|>user\n" \
  --in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
  -ngl 24 -fa
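The --in-prefix and --in-suffix strings above wrap each user turn in ChatML markers. To see the full prompt these markers build up, the sketch below renders a conversation in the same format. The helper name and the example messages are illustrative; it assumes each turn ends with <|im_end|> followed by a newline, matching the suffix above.

```python
def to_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render role/content messages with ChatML markers."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete,
        # like the tail of the --in-suffix string.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```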

Evaluation

We implement perplexity evaluation on wikitext following the practice of llama.cpp with ./llama-perplexity (formerly ./perplexity). In the following, we report the PPL (lower is better) of GGUF models of different sizes and different quantization levels.
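Perplexity is the exponential of the average negative log-likelihood the model assigns to each token of the evaluation text. The sketch below computes it from per-token natural-log probabilities; it is the textbook definition only, and does not reproduce how llama-perplexity splits wikitext into context windows.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that gives every token probability 1/8 has perplexity 8 (up to rounding).
print(perplexity([math.log(1 / 8)] * 100))
```

Intuitively, a PPL of about 7.9 means the model is, on average, about as uncertain as choosing uniformly among roughly eight tokens.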

Size	fp16	q8_0	q6_k	q5_k_m	q5_0	q4_k_m	q4_0	q3_k_m	q2_k	iq1_m
0.5B	15.11	15.13	15.14	15.24	15.40	15.36	16.28	15.70	16.74	-
1.5B	10.43	10.43	10.45	10.50	10.56	10.61	10.79	11.08	13.04	-
7B	7.93	7.94	7.96	7.97	7.98	8.02	8.19	8.20	10.58	-
57B-A14B	6.81	6.81	6.83	6.84	6.89	6.99	7.02	7.43	-	-
72B	5.58	5.58	5.59	5.59	5.60	5.61	5.66	5.68	5.91	6.75
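To read the table as a quality cost, it can help to express each quantized PPL as a percentage increase over fp16. The sketch below does this for the 7B row, with the values copied from the table above; the percentage view is only a reading aid and not part of the original evaluation.

```python
# PPL values for the 7B model, copied from the table (lower is better).
PPL_7B = {
    "fp16": 7.93, "q8_0": 7.94, "q6_k": 7.96, "q5_k_m": 7.97, "q5_0": 7.98,
    "q4_k_m": 8.02, "q4_0": 8.19, "q3_k_m": 8.20, "q2_k": 10.58,
}


def relative_increase(ppl: dict[str, float], baseline: str = "fp16") -> dict[str, float]:
    """Percentage PPL increase of each entry over the baseline entry."""
    base = ppl[baseline]
    return {k: 100.0 * (v - base) / base for k, v in ppl.items() if k != baseline}


for quant, pct in relative_increase(PPL_7B).items():
    print(f"{quant:>7}: +{pct:.2f}%")
```

With these values, q5_k_m is about 0.5% above fp16 and q4_k_m about 1.1%, while q2_k is roughly 33% higher.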

Citation

If you find our work helpful, feel free to cite us.

@article{qwen2,
  title={Qwen2 Technical Report},
  year={2024}
}
