INT4 Weight-only Quantization and Deployment (W4A16)

LMDeploy adopts the AWQ algorithm for 4-bit weight-only quantization. Backed by a high-performance CUDA kernel, inference with the 4-bit quantized model is up to 2.4x faster than FP16.

LMDeploy supports the following NVIDIA GPUs for W4A16 inference:
Turing (sm75): 20 series, T4
Ampere (sm80, sm86): 30 series, A10, A16, A30, A100
Ada Lovelace (sm89): 40 series

Before proceeding with quantization and inference, please ensure that lmdeploy is installed:

pip install lmdeploy[all]

This article comprises the following sections: Inference, Evaluation, and Service.
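If you want to confirm that the installation succeeded and that your GPU falls within the supported ranges above, a minimal check is sketched below. It assumes PyTorch is available (lmdeploy depends on it) and that the package exposes a __version__ attribute:

import lmdeploy
import torch

# print the installed lmdeploy version
print("lmdeploy:", lmdeploy.__version__)
# compute capability, e.g. (7, 5) for Turing, (8, 0)/(8, 6) for Ampere, (8, 9) for Ada Lovelace
print("compute capability:", torch.cuda.get_device_capability())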
Inference
Please download the internlm2-chat-20b-4bits model as follows:

git-lfs install
git clone --depth=1 https://www.modelscope.cn/Shanghai_AI_Laboratory/internlm2-chat-20b-4bits.git
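Alternatively, if you prefer a programmatic download instead of git, the ModelScope Python SDK provides snapshot_download. This is an optional substitute for the commands above, assuming the modelscope package is installed:

# pip install modelscope
from modelscope import snapshot_download

# downloads the 4-bit weights and returns the local directory,
# which can be passed to pipeline() in place of "./internlm2-chat-20b-4bits"
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm2-chat-20b-4bits')
print(model_dir)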
With the following code, you can perform batched offline inference with the quantized model:

from lmdeploy import pipeline, TurbomindEngineConfig

# model_format='awq' tells the TurboMind engine to load the AWQ 4-bit weights
engine_config = TurbomindEngineConfig(model_format='awq')
pipe = pipeline("./internlm2-chat-20b-4bits", backend_config=engine_config)
# the pipeline accepts a batch of prompts
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

For more information about the pipeline parameters, please refer to here.
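As an illustration of those parameters, the engine config accepts additional fields and each call can take a GenerationConfig. The specific values below are arbitrary examples, not recommendations:

from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

# engine-level options: model_format selects the AWQ 4-bit weights;
# session_len and cache_max_entry_count are illustrative values
engine_config = TurbomindEngineConfig(model_format='awq',
                                      session_len=4096,
                                      cache_max_entry_count=0.5)
# sampling options applied per request
gen_config = GenerationConfig(max_new_tokens=256, top_p=0.8, temperature=0.7)

pipe = pipeline("./internlm2-chat-20b-4bits", backend_config=engine_config)
response = pipe(["Hi, pls intro yourself", "Shanghai is"], gen_config=gen_config)
print(response)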
Evaluation

Please refer to this guide about model evaluation with LMDeploy.
Service
LMDeploy's api_server enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:

lmdeploy serve api_server ./internlm2-chat-20b-4bits --backend turbomind --model-format awq
The default port of api_server is 23333. After the server is launched, you can communicate with the server in the terminal through api_client:

lmdeploy serve api_client http://0.0.0.0:23333
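Because the RESTful APIs are OpenAI-compatible, any HTTP client can talk to the launched service as well. The sketch below uses the requests library; the model name in the payload is an assumption, so list the actually served names via the /v1/models endpoint first:

import requests

base_url = "http://0.0.0.0:23333"

# list the model names registered on the server
print(requests.get(f"{base_url}/v1/models").json())

# OpenAI-compatible chat completion; replace "internlm2-chat-20b-4bits"
# with a name returned by /v1/models if it differs
payload = {
    "model": "internlm2-chat-20b-4bits",
    "messages": [{"role": "user", "content": "Hi, pls intro yourself"}],
}
print(requests.post(f"{base_url}/v1/chat/completions", json=payload).json())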
You can overview and try out the api_server APIs online through the Swagger UI at http://0.0.0.0:23333, or you can also read the API specification from here.