Qwen1.5-1.8B-GPTQ-4-Bit

Anonymous user · 2024-07-31
Category: ai, qwen2
Source: https://modelscope.cn/models/skyxiaobaibai/Qwen1.5-1.8B-GPTQ-4-Bit
License: MIT

Details

Introduction

Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:

8 model sizes, including 0.5B, 1.8B, 4B, 7B, 14B, 32B and 72B dense models, and an MoE model of 14B with 2.7B activated;
Significant performance improvement in Chat models;
Multilingual support of both base and chat models;
Stable support of 32K context length for models of all sizes;
No need for trust_remote_code (see the loading sketch below).
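
Because the qwen2 architecture is natively supported in recent transformers releases, the base checkpoint can be loaded without trust_remote_code. The following is a minimal loading sketch, assuming transformers>=4.37.0 (and accelerate, for device_map="auto") are installed; the prompt and generation settings are illustrative and not taken from this page:

# Minimal sketch: load the base model without trust_remote_code
# (assumes transformers>=4.37.0 and accelerate are installed)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-1.8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # pick a suitable dtype from the checkpoint
    device_map="auto",   # place weights on available GPU(s)/CPU
)

inputs = tokenizer("Qwen1.5 is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))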

For more details, please refer to our blog post and GitHub repo.

Model Details

Qwen1.5 is a language model series that includes decoder language models of different sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, a mixture of sliding window attention and full attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and code. For this beta version, we temporarily did not include GQA (except for the 32B model) or the mixture of SWA and full attention.
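
A quick way to see which of these architectural choices apply to a given size is to inspect the published config. The sketch below is illustrative; the field names are the ones exposed by transformers' Qwen2Config rather than anything stated on this page:

# Sketch: inspect architecture-related fields of the 1.8B config
# (field names from transformers' Qwen2Config; values come from the hub config)
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen1.5-1.8B")
print(config.hidden_act)              # activation inside the SwiGLU MLP
print(config.num_attention_heads)     # number of query heads
print(config.num_key_value_heads)     # KV heads; fewer than query heads would indicate GQA
print(config.max_position_embeddings)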

Requirements

The code of Qwen1.5 has been merged into the latest Hugging Face transformers, and we advise you to install transformers>=4.37.0, or you might encounter the following error:

KeyError: 'qwen2'
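
Before running the quantization example, it can help to verify that the installed transformers version meets this requirement. A small, optional sketch (packaging ships as a transformers dependency):

# Sketch: check that transformers is new enough for the qwen2 architecture
import transformers
from packaging import version

assert version.parse(transformers.__version__) >= version.parse("4.37.0"), (
    f"transformers {transformers.__version__} is too old for qwen2; "
    "upgrade with: pip install -U 'transformers>=4.37.0'"
)

With a compatible environment in place, the following example quantizes the base model to 4 bits with AutoGPTQ: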

from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
    level=logging.INFO,
    datefmt="%Y-%m-%d %H:%M:%S",
)

pretrained_model_dir = "Qwen/Qwen1.5-1.8B"
quantized_model_dir = "local"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."
    )
]

quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize model to 4-bit
    group_size=128,  # it is recommended to set the value to 128
    desc_act=False,  # set to False to significantly speed up inference, but perplexity may be slightly worse
)

# load un-quantized model; by default, the model will always be loaded into CPU memory
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)

# quantize model; the examples should be a list of dicts whose keys can only be "input_ids" and "attention_mask"
model.quantize(examples)

# save quantized model
model.save_quantized(quantized_model_dir)

# save quantized model using safetensors
model.save_quantized(quantized_model_dir, use_safetensors=True)

# push quantized model to Hugging Face Hub.
# to use use_auth_token=True, log in first via `huggingface-cli login`,
# or pass an explicit token with: use_auth_token="hf_xxxxxxx"
# (uncomment the following three lines to enable this feature)
# repo_id = f"YourUserName/{quantized_model_dir}"
# commit_message = f"AutoGPTQ model for {pretrained_model_dir}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
# model.push_to_hub(repo_id, commit_message=commit_message, use_auth_token=True)

# alternatively you can save and push at the same time
# (uncomment the following three lines to enable this feature)
# repo_id = f"YourUserName/{quantized_model_dir}"
# commit_message = f"AutoGPTQ model for {pretrained_model_dir}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
# model.push_to_hub(repo_id, save_dir=quantized_model_dir, use_safetensors=True, commit_message=commit_message, use_auth_token=True)

# load quantized model to the first GPU
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")

# download quantized model from Hugging Face Hub and load to the first GPU
# model = AutoGPTQForCausalLM.from_quantized(repo_id, device="cuda:0", use_safetensors=True, use_triton=False)

# inference with model.generate
print(tokenizer.decode(model.generate(**tokenizer("auto_gptq is", return_tensors="pt").to(model.device))[0]))

# or you can also use pipeline
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline("auto-gptq is")[0]["generated_text"])
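
The 4-bit checkpoint published at the ModelScope address above should load the same way as the locally saved one. A minimal sketch, assuming the repository contains the tokenizer files and that auto_gptq can read it exactly like the quantized_model_dir produced above:

# Sketch: download the published 4-bit checkpoint from ModelScope and run it
# (repo id taken from the source link above; loading path assumed to match the local case)
from modelscope import snapshot_download
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = snapshot_download("skyxiaobaibai/Qwen1.5-1.8B-GPTQ-4-Bit")

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoGPTQForCausalLM.from_quantized(model_dir, device="cuda:0")

inputs = tokenizer("auto-gptq is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))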
