Introduction
Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data. For more details, please refer to our blog post and GitHub repo.

Model Details
Qwen1.5-MoE employs a Mixture of Experts (MoE) architecture, where the models are upcycled from dense language models. For instance, Qwen1.5-MoE-A2.7B is upcycled from Qwen-1.8B. It has 14.3B parameters in total and 2.7B activated parameters during runtime. While achieving performance comparable to Qwen1.5-7B, it requires only 25% of the training resources. We also observed that the inference speed is 1.74 times that of Qwen1.5-7B.
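
To see the expert layout behind these parameter counts, you can inspect the model configuration without downloading the full checkpoint. The sketch below is illustrative only: it assumes modelscope re-exports AutoConfig, and that the qwen2_moe configuration exposes fields named num_experts and num_experts_per_tok.

from modelscope import AutoConfig  # assumption: modelscope re-exports AutoConfig

# Fetch only the configuration file, not the 14.3B-parameter weights.
config = AutoConfig.from_pretrained("qwen/Qwen1.5-MoE-A2.7B-Chat")

print(config.model_type)           # expected: "qwen2_moe"
print(config.num_experts)          # assumed field: experts per MoE layer
print(config.num_experts_per_tok)  # assumed field: experts activated per token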

Training details
We pretrained the models with a large amount of data, and we post-trained the models with both supervised finetuning and direct preference optimization. However, DPO leads to improvements in human preference evaluation but degradation in benchmark evaluation. In the very near future, we will fix both problems.

Requirements
The code of Qwen1.5-MoE has been merged into the latest Hugging Face transformers, and we advise you to build from source with the command pip install git+https://github.com/huggingface/transformers; otherwise you might encounter the following error:
KeyError: 'qwen2_moe'
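
As a minimal sanity-check sketch, you can verify that your installed build actually contains the new architecture before downloading any weights:

import transformers

print(transformers.__version__)

# On a build without the qwen2_moe architecture this import fails,
# which is the same root cause as the KeyError above.
from transformers import Qwen2MoeForCausalLM  # noqa: F401
print("qwen2_moe support detected")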

Quickstart
Here we provide a code snippet with apply_chat_template to show you how to load the tokenizer and model and how to generate contents.

from modelscope import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model oto
model = AutoModelForCausalLM.from_pretraied(
"qwe/Qwe1.5-MoE-A2.7B-Chat",
torch_dtype="auto",
device_map="auto"
)
tokeizer = AutoTokeizer.from_pretraied("qwe/Qwe1.5-MoE-A2.7B-Chat")
prompt = "Give me a short itroductio to large laguage model."
messages = [
{"role": "system", "cotet": "You are a helpful assistat."},
{"role": "user", "cotet": prompt}
]
text = tokeizer.apply_chat_template(
messages,
tokeize=False,
add_geeratio_prompt=True
)
model_iputs = tokeizer([text], retur_tesors="pt").to(device)
geerated_ids = model.geerate(
model_iputs.iput_ids,
max_ew_tokes=512
)
geerated_ids = [
output_ids[le(iput_ids):] for iput_ids, output_ids i zip(model_iputs.iput_ids, geerated_ids)
]
respose = tokeizer.batch_decode(geerated_ids, skip_special_tokes=True)[0]
prit(respose)

For quantized models, we advise you to use the GPTQ correspondents, namely Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4.
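
Loading the GPTQ checkpoint uses the same API as the snippet above; only the model name changes, though GPTQ inference additionally requires packages such as auto-gptq and optimum to be installed. A minimal sketch, assuming the quantized checkpoint follows the same naming scheme:

from modelscope import AutoModelForCausalLM, AutoTokenizer

# Same interface as the fp16 model; only the checkpoint name differs.
model = AutoModelForCausalLM.from_pretrained(
    "qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4")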

Tips
If you encounter code switching or other bad cases, we advise you to use our provided hyper-parameters in generation_config.json.
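
These hyper-parameters ship with the checkpoint in generation_config.json and are picked up by model.generate() automatically. The sketch below, reusing model and model_inputs from the Quickstart snippet, shows how to inspect them and how a per-call override would look; the sampling values here are illustrative, not the shipped defaults.

# generation_config.json is loaded into model.generation_config automatically.
print(model.generation_config)

# Per-call arguments override the shipped defaults; the values below
# are illustrative only, not the recommended hyper-parameters.
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
)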