Model Card for Mixtral-8x7B

The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested. For full details of this model please read our release blog post.

Warning
This repo contains weights that are compatible with vLLM serving of the model as well as the Hugging Face transformers library. It is based on the original Mixtral torrent release, but the file format and parameter names are different. Please note that the model cannot (yet) be instantiated with HF.
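Since the card advertises vLLM-compatible weights but does not show a serving example, here is a minimal, illustrative sketch (not from the original card). It assumes vLLM is installed and that two GPUs with sufficient memory are available; tensor_parallel_size=2 is an assumed value, not a requirement stated by the card.

# Minimal vLLM serving sketch (illustrative; assumes vLLM is installed
# and two GPUs are available — tensor_parallel_size=2 is an assumption).
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mixtral-8x7B-v0.1", tensor_parallel_size=2)
sampling = SamplingParams(max_tokens=20)
outputs = llm.generate(["Hello my name is"], sampling)
print(outputs[0].outputs[0].text)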
Run the model
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_id = "AI-ModelScope/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map='auto' lets accelerate place the model across available devices
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
By default, transformers will load the model in full precision. Therefore you might be interested in further reducing the memory requirements to run the model through the optimizations offered in the HF ecosystem:

In half-precision
Note: float16 precision only works on GPU devices.
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(0)

text = "Hello my name is"
+ inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
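On Ampere or newer GPUs, bfloat16 is a common alternative to float16 with the same memory footprint and better numerical range. This variant is a sketch added for illustration, not part of the original card:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 halves memory like float16 but keeps float32's exponent range
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(0)

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))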
Lower precision (8-bit & 4-bit) using bitsandbytes
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

+ model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

text = "Hello my name is"
+ inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
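The heading above also mentions 8-bit loading, but the card only shows the 4-bit call. The analogous 8-bit sketch below uses the load_in_8bit flag from transformers and assumes bitsandbytes and accelerate are installed:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# load_in_8bit quantizes the linear layers with bitsandbytes
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))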
Load the model with Flash Attention 2
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Flash Attention 2 only supports fp16/bf16, so load in half precision
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, use_flash_attention_2=True)

text = "Hello my name is"
+ inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
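In more recent transformers releases, the use_flash_attention_2 flag is superseded by the attn_implementation argument; an equivalent call, assuming transformers >= 4.36 and the flash-attn package installed:

# Newer spelling of the same option (transformers >= 4.36)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
)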
Notice
Mixtral-8x7B is a pretrained base model and therefore does not have any moderation mechanisms.
The Mistral AI Team
Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.