Aurora: Activating Chinese chat capability for Mistral-8x7B sparse Mixture-of-Experts through Instruction-Tuning

Please follow our Github: https://github.com/WangRongsheng/Aurora
Overview

Existing research has demonstrated that refining large language models (LLMs) with machine-generated instruction-following data enables these models to exhibit impressive zero-shot capabilities on novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets with the aim of enhancing the Chinese conversational capabilities of the Mixtral-8x7B sparse Mixture-of-Experts model. Through instruction fine-tuning on this carefully processed dataset, we construct the instruction-tuned Mixtral-8x7B model named "Aurora." To assess Aurora's performance, we use three widely recognized benchmarks: C-Eval, MMLU, and CMMLU. Empirical studies validate the effectiveness of instruction fine-tuning applied to the Mixtral-8x7B sparse Mixture-of-Experts model. This work is pioneering in performing instruction fine-tuning on a sparse Mixture-of-Experts model, marking a significant step in enhancing the capabilities of this model architecture.
Evaluation

It is known that LLM evaluation remains a significant challenge; we use three public benchmarks in our study: C-Eval, MMLU, and CMMLU.

For reference, approximate GPU memory usage during the training and inference stages is listed below.

|Stage|GPU Memory Usage|
|:-|:-|
|Training|~43 GiB|
|Inference|~25 GiB|
Quick-Use
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer
from threading import Thread
from peft import PeftModel
import time

model_name_or_path = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # download weights from https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
lora_weights = "wangrongsheng/Aurora-Mixtral-8x7B"  # download weights from https://modelscope.cn/models/wangrongsheng/Aurora-Mixtral-8x7B

# Load the base model in 4-bit and attach the Aurora LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model0 = AutoModelForCausalLM.from_pretrained(model_name_or_path, load_in_4bit=True, device_map="auto", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(
    model0,
    lora_weights,
)

class StopOnTokens(StoppingCriteria):
    # Stop generation as soon as the last generated token is one of the stop ids.
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [0,]
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

def convert_history_to_text(history):
    # Build a Mistral-style "[INST] ... [/INST]" prompt from the chat history.
    text = ""
    if len(history) > 1:
        text = "<s> " + "".join(
            [
                "".join(
                    [
                        f"[INST]{item[0]}[/INST] {item[1]} ",
                    ]
                )
                for item in history[:-1]
            ]
        ) + "</s> "
    text += "".join(
        [
            "".join(
                [
                    f"[INST]{history[-1][0]}[/INST]",
                ]
            )
        ]
    )
    return text

def predict(message, history):
    history_transformer_format = history + [[message, ""]]
    stop = StopOnTokens()
    messages = convert_history_to_text(history_transformer_format)
    model_inputs = tokenizer([messages], return_tensors="pt").to("cuda")
    streamer = TextIteratorStreamer(tokenizer, timeout=10., skip_prompt=True, skip_special_tokens=True)
    generate_kwargs = dict(
        model_inputs,
        streamer=streamer,
        max_new_tokens=4096,
        do_sample=True,
        top_p=0.95,
        top_k=1000,
        temperature=1.0,
        num_beams=1,
        pad_token_id=tokenizer.eos_token_id,
        stopping_criteria=StoppingCriteriaList([stop])
    )
    # Run generation in a background thread and stream partial output to the UI.
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()

    partial_message = ""
    t1 = time.time()
    count = 0
    for new_token in streamer:
        if new_token != '<':
            partial_message += new_token
            count += 1
            yield partial_message
    t2 = time.time()
    speed = count / (t2 - t1)
    print("inference speed: %f tok/s" % speed)

gr.ChatInterface(predict, chatbot=gr.Chatbot(height=600), title="MoE").queue().launch()
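To try this snippet, install the dependencies it relies on (gradio, torch, transformers, peft, accelerate, and bitsandbytes for the 4-bit loading), save it as a script such as quick_use.py (the file name is just an example), run it with Python on a machine with a CUDA GPU (roughly 25 GiB of GPU memory for inference, per the table in the Evaluation section), and open the Gradio address it prints, http://127.0.0.1:7860/ by default.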
Easy-to-Use
1. Clone and Set up

git clone https://github.com/WangRongsheng/Aurora.git
cd Aurora
pip install -r requirements.txt
2. Download Model

Base Model:

|Model|Download|
|:-|:-|
|Mixtral-8x7B-Instruct-v0.1|[HuggingFace] [HuggingFace-mirror] [ModelScope]|

LoRA Model:

|Model|Download|
|:-|:-|
|Aurora|[HuggingFace]|

Because the full model parameters are not convenient to manage yourself, we provide LoRA weights; they are merged with the base model before inference, so you do not have to worry about it.
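If you prefer to fetch the weights programmatically instead of through the links above, here is a minimal sketch using huggingface_hub's snapshot_download; the repo IDs match the ones used in Quick-Use, and the local directory names follow the paths used in the commands below but are only illustrative.

from huggingface_hub import snapshot_download

# Base Mixtral-8x7B-Instruct weights (a very large download, on the order of 90+ GB).
snapshot_download(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    local_dir="./Mixtral-8x7B-Instruct-v0.1",
)

# Aurora LoRA adapter weights (small compared to the base model).
snapshot_download(
    repo_id="wangrongsheng/Aurora-Mixtral-8x7B",
    local_dir="./Aurora",
)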
3. Inference

Web:
CUDA_VISIBLE_DEVICES=0 python src/web_demo.py \
--model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
--checkpoint_dir Aurora \
--finetuning_type lora \
--quantization_bit 4 \
--template mistral

Then you can visit: http://127.0.0.1:7860/

CLI:
CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
--model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
--checkpoint_dir Aurora \
--finetuning_type lora \
--quantization_bit 4 \
--template mistral

API:
CUDA_VISIBLE_DEVICES=0 python src/api_demo.py \
--model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
--checkpoint_dir Aurora \
--finetuning_type lora \
--quantization_bit 4 \
--template mistral
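Once the API demo is running, you can send chat requests over HTTP. The snippet below is a minimal sketch that assumes api_demo.py exposes an OpenAI-style /v1/chat/completions endpoint on http://127.0.0.1:8000 (the usual LLaMA-Factory default); check the script's startup output for the actual host, port, and request schema.

import requests

# Assumed OpenAI-compatible endpoint served by api_demo.py.
url = "http://127.0.0.1:8000/v1/chat/completions"
payload = {
    "model": "Aurora",  # the name is illustrative; the server decides which model it serves
    "messages": [{"role": "user", "content": "你好，请用中文介绍一下你自己。"}],
    "temperature": 0.95,
}

resp = requests.post(url, json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])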
If you need to load weights from a specific checkpoint, you can set it like this:

--checkpoint_dir Aurora/checkpoint-5000
4. Train

If you have a single GPU and its GPU memory size is larger than 48 GB, you can train your own model.

Train your MoE model:
CUDA_VISIBLE_DEVICES=5 python src/train_bash.py \
--stage sft \
--model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
--do_train \
--dataset alpaca_zh,alpaca_gpt4_zh,sharegpt \
--finetuning_type lora \
--quantization_bit 4 \
--overwrite_cache \
--output_dir output/ \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 100 \
--save_steps 1000 \
--learning_rate 5e-5 \
--num_train_epochs 3.0 \
--plot_loss \
--fp16 \
--template mistral \
--lora_target q_proj,v_proj
--quantization_bit 4 means you will use QLoRA. If you have a larger GPU memory size, you can remove it and use LoRA.
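If you want standalone merged weights after training instead of a separate LoRA adapter, the sketch below folds a saved checkpoint into the base model with peft. The paths output/checkpoint-5000 and ./Aurora-merged are illustrative, and the merge is done on unquantized bfloat16 weights, which requires enough GPU plus CPU memory to hold the full model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "./Mixtral-8x7B-Instruct-v0.1"
adapter_path = "output/checkpoint-5000"  # illustrative LoRA checkpoint saved during training
merged_path = "./Aurora-merged"          # illustrative output directory

# Load the base model in bf16; merging is not done on 4-bit quantized weights.
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_path)

merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(merged_path)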
5. Evaluate your MoE model
CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
--model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
--checkpoint_dir Aurora/checkpoint-5000 \
--finetuning_type lora \
--quantization_bit 4 \
--template mistral \
--task cmmlu \
--split test \
--lang en \
--n_shot 5 \
--batch_size 8

--task can be cmmlu, mmlu or ceval; --lang can be zh or en.
Acknowledgments

This work was mainly done by the Faculty of Applied Sciences of Macao Polytechnic University. The computational resources used in this work were obtained from AWS servers. The fine-tuning framework we used is LLaMA-Factory, which brings a lot of convenience to our work. We also thank the public datasets from the open-source community, such as shareAI, stanford_alpaca and GPT-4-LLM. Most importantly, we are very grateful to Mistral AI, who are leading a new technology boom that will dramatically change the future of technology development.
Citation

If you find our work helpful, feel free to give us a cite.
@misc{wang2023auroraactivating,
      title={Aurora: Activating Chinese chat capability for Mistral-8x7B sparse Mixture-of-Experts through Instruction-Tuning},
      author={Rongsheng Wang and Haoming Chen and Ruizhe Zhou and Yaofei Duan and Kunyan Cai and Han Ma and Jiaxi Cui and Jian Li and Patrick Cheong-Iao Pang and Yapeng Wang and Tao Tan},
      year={2023},
      eprint={2312.14557},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
License

Please follow the Apache 2.0 License.