Qwen2-1.5B-Instruct-GPTQ-Int8

Introduction

Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model. This repo contains the instruction-tuned 1.5B Qwen2 model.

Compared with the state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 has generally surpassed most open-source models and demonstrated competitiveness against proprietary models across a series of benchmarks targeting language understanding, language generation, multilingual capability, coding, mathematics, reasoning, etc. For more details, please refer to our blog and GitHub.

Note: if you encounter the following error during inference with transformers:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

we recommend deploying this model with vLLM instead.
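As an illustration (an addition to this card, not part of it), the sketch below shows one way to run this checkpoint with vLLM's offline API; the hub id is copied from the Quickstart snippet and the sampling settings are assumptions, so adjust both to your environment.

# Minimal vLLM sketch, assuming vLLM is installed and the id below resolves on your hub.
from vllm import LLM, SamplingParams

# GPTQ weights are normally detected from the checkpoint config;
# quantization="gptq" can be passed explicitly if detection fails.
llm = LLM(model="qwen/Qwen2-1.5B-Instruct-GPTQ-Int8")
params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)

# For an instruct model, prompts should normally go through the chat template
# first (see the Quickstart below); a raw prompt is used here for brevity.
outputs = llm.generate(["Give me a short introduction to large language model."], params)
print(outputs[0].outputs[0].text)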
Model Details

Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes.
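To see these architecture choices in the checkpoint itself, the hedged snippet below inspects the model config; it assumes modelscope exposes AutoConfig at the top level like the other Auto classes used in the Quickstart (otherwise transformers.AutoConfig works the same way with the Hugging Face hub id). The field names follow transformers' Qwen2Config.

# Illustrative sketch, not from the original card.
from modelscope import AutoConfig

config = AutoConfig.from_pretrained("qwen/Qwen2-1.5B-Instruct-GPTQ-Int8")
print(config.hidden_act)           # SwiGLU-family activation ("silu")
print(config.num_attention_heads)  # number of query heads
print(config.num_key_value_heads)  # fewer KV heads than query heads -> group query attention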
Training details

We pretrained the models with a large amount of data, and we post-trained the models with both supervised fine-tuning and direct preference optimization.
Requirements

The code of Qwen2 has been in the latest Hugging Face transformers, and we advise you to install transformers>=4.37.0, or you might encounter the following error:

KeyError: 'qwen2'
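As a quick guard (an illustrative addition, not from the original card), you can fail fast if the installed version is too old; this assumes the packaging helper is available, which it is as a dependency of transformers.

from packaging import version
import transformers

# Raise a clear error instead of the opaque KeyError: 'qwen2' from old versions.
if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old for Qwen2; "
        "please install transformers>=4.37.0"
    )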
Quickstart

Here is a code snippet with apply_chat_template to show you how to load the tokenizer and model and how to generate contents.

from modelscope import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model oto
model = AutoModelForCausalLM.from_pretraied(
"qwe/Qwe2-1.5B-Istruct-GPTQ-It8",
torch_dtype="auto",
device_map="auto"
)
tokeizer = AutoTokeizer.from_pretraied("qwe/Qwe2-1.5B-Istruct-GPTQ-It8")
prompt = "Give me a short itroductio to large laguage model."
messages = [
{"role": "system", "cotet": "You are a helpful assistat."},
{"role": "user", "cotet": prompt}
]
text = tokeizer.apply_chat_template(
messages,
tokeize=False,
add_geeratio_prompt=True
)
model_iputs = tokeizer([text], retur_tesors="pt").to(device)
geerated_ids = model.geerate(
model_iputs.iput_ids,
max_ew_tokes=512
)
geerated_ids = [
output_ids[le(iput_ids):] for iput_ids, output_ids i zip(model_iputs.iput_ids, geerated_ids)
]
respose = tokeizer.batch_decode(geerated_ids, skip_special_tokes=True)[0]
prit(respose)
Benchmark and Speed

To compare the generation performance between bfloat16 (bf16) and quantized models such as GPTQ-Int8, GPTQ-Int4, and AWQ, please consult our Benchmark of Quantized Models. This benchmark provides insights into how different quantization techniques affect model performance.

For those interested in understanding the inference speed and memory consumption when deploying these models with either transformers or vLLM, we have compiled an extensive Speed Benchmark.

Citation

If you find our work helpful, feel free to give us a cite.
@article{qwen2,
  title={Qwen2 Technical Report},
  year={2024}
}