Qwen1.5-1.8B-GPTQ-4-Bit


Technical Information

Open-source address
https://modelscope.cn/models/skyxiaobaibai/Qwen1.5-1.8B-GPTQ-4-Bit
License
MIT

Project Details

Introduction

Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. In comparison with the previously released Qwen, the improvements include:

8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B and 72B dense models, and an MoE model of 14B with 2.7B activated;
Significant performance improvement in Chat models (see the chat sketch after this list);
Multilingual support of both base and chat models;
Stable support of 32K context length for models of all sizes;
No need of trust_remote_code.
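
As a quick illustration of the improved chat models, here is a minimal sketch of talking to Qwen1.5-1.8B-Chat through the standard transformers chat template. The prompt text, device choice and generation parameters are illustrative assumptions, not part of this card.

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cpu"  # assumption: switch to "cuda" if a GPU is available

# chat variant of the base model this card quantizes
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B-Chat")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B-Chat")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce Qwen1.5 in one sentence."},
]
# apply_chat_template renders the messages in Qwen's chat format
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(device)

output = model.generate(**inputs, max_new_tokens=128)
# strip the prompt tokens and decode only the newly generated reply
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))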

For more details, please refer to our blog post and GitHub repo.

Model Details

Qwen1.5 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes. For the beta version, temporarily we did not include GQA (except for 32B) and the mixture of SWA and full attention.
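
These architectural choices can be read directly off the model configuration. The following is a minimal sketch, assuming transformers>=4.37.0 and access to the Hugging Face Hub; the printed fields are standard Qwen2 config attributes.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen1.5-1.8B")
print(config.model_type)               # "qwen2": Qwen1.5 ships as the qwen2 architecture
print(config.hidden_act)               # gate activation used inside the SwiGLU MLP
print(config.num_attention_heads)      # number of query heads
print(config.num_key_value_heads)      # equal to query heads here: no GQA below 32B, as noted above
print(config.max_position_embeddings)  # 32768, the stable 32K context length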

Requirements

The code of Qwen1.5 has been merged into the latest Hugging Face transformers, and we advise you to install transformers>=4.37.0, or you might encounter the following error:

KeyError: 'qwen2'
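
As a quick sanity check of that requirement, the sketch below asserts the installed version and loads the base model without trust_remote_code; the version comparison via packaging is an illustrative assumption.

import transformers
from packaging import version
from transformers import AutoModelForCausalLM, AutoTokenizer

# older transformers releases do not know the "qwen2" model type
assert version.parse(transformers.__version__) >= version.parse("4.37.0"), \
    "please upgrade transformers to >=4.37.0"

# no trust_remote_code needed: the architecture is built into transformers
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B")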

The following AutoGPTQ script quantizes Qwen1.5-1.8B to 4 bits and shows how to save, upload, load and run the quantized model:

from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
    level=logging.INFO,
    datefmt="%Y-%m-%d %H:%M:%S",
)

pretrained_model_dir = "Qwen/Qwen1.5-1.8B"
quantized_model_dir = "local"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."
    )
]

quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize model to 4-bit
    group_size=128,  # it is recommended to set the value to 128
    desc_act=False,  # set to False can significantly speed up inference, but the perplexity may be slightly worse
)

# load un-quantized model; by default, the model will always be loaded into CPU memory
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)

# quantize model; the examples should be a list of dicts whose keys can only be "input_ids" and "attention_mask"
model.quantize(examples)

# save quantized model
model.save_quantized(quantized_model_dir)

# save quantized model using safetensors
model.save_quantized(quantized_model_dir, use_safetensors=True)

# push quantized model to Hugging Face Hub.
# to use use_auth_token=True, login first via huggingface-cli login.
# or pass an explicit token with: use_auth_token="hf_xxxxxxx"
# (uncomment the following three lines to enable this feature)
# repo_id = f"YourUserName/{quantized_model_dir}"
# commit_message = f"AutoGPTQ model for {pretrained_model_dir}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
# model.push_to_hub(repo_id, commit_message=commit_message, use_auth_token=True)

# alternatively you can save and push at the same time
# (uncomment the following three lines to enable this feature)
# repo_id = f"YourUserName/{quantized_model_dir}"
# commit_message = f"AutoGPTQ model for {pretrained_model_dir}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
# model.push_to_hub(repo_id, save_dir=quantized_model_dir, use_safetensors=True, commit_message=commit_message, use_auth_token=True)

# load quantized model to the first GPU
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")

# download quantized model from Hugging Face Hub and load to the first GPU
# model = AutoGPTQForCausalLM.from_quantized(repo_id, device="cuda:0", use_safetensors=True, use_triton=False)

# inference with model.generate
print(tokenizer.decode(model.generate(**tokenizer("auto_gptq is", return_tensors="pt").to(model.device))[0]))

# or you can also use pipeline
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline("auto-gptq is")[0]["generated_text"])
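
To use the quantized checkpoint published in this repo rather than requantizing yourself, something like the following should work. This is a minimal sketch that assumes the modelscope package's snapshot_download API and that the repo contains standard AutoGPTQ safetensors output.

from modelscope import snapshot_download
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# fetch this card's checkpoint from ModelScope into a local cache directory
local_dir = snapshot_download("skyxiaobaibai/Qwen1.5-1.8B-GPTQ-4-Bit")

tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoGPTQForCausalLM.from_quantized(local_dir, device="cuda:0", use_safetensors=True)

print(tokenizer.decode(model.generate(**tokenizer("auto-gptq is", return_tensors="pt").to(model.device))[0]))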

