Building the Next Generation of Open-Source and Bilingual LLMs
🤗 Hugging Face • 🤖 ModelScope • ✡️ WiseModel
Join us on WeChat (Chinese)!
GPTQ quantized version of the Yi-9B-200K model.
Table of Contents
What is Yi?
Introduction
The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI.
Targeted as a bilingual language model and trained on a 3T multilingual corpus, the Yi series models are among the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example,
Yi-34B-Chat landed in second place (following GPT-4 Turbo), outperforming other LLMs (such as GPT-4, Mixtral, Claude) on the AlpacaEval Leaderboard (based on data available up to January 2024).
Yi-34B ranked first among all existing open-source models (such as Falcon-180B, Llama-70B, Claude) in both English and Chinese on various benchmarks, including the Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval (based on data available up to November 2023).
(Credits to Llama) Thanks to the Transformer and Llama open-source communities for reducing the effort required to build from scratch and enabling the use of the same tools within the AI ecosystem.
If you're interested in Yi's adoption of the Llama architecture and license usage policy, see Yi's relation with Llama. ⬇️
TL;DR
The Yi series models adopt the same model architecture as Llama but are NOT derivatives of Llama.
Both Yi and Llama are based on the Transformer structure, which has been the standard architecture for large language models since 2018.
Grounded in the Transformer architecture, Llama has become a new cornerstone for the majority of state-of-the-art open-source models thanks to its excellent stability, reliable convergence, and robust compatibility. This positions Llama as the recognized foundational framework for models including Yi.
Thanks to the Transformer and Llama architectures, other models can leverage their power, reducing the effort required to build from scratch and enabling the use of the same tools within their ecosystems.
However, the Yi series models are NOT derivatives of Llama, as they do not use Llama's weights.
As Llama's structure is employed by the majority of open-source models, the key factors determining model performance are training datasets, training pipelines, and training infrastructure.
Developing in a unique and proprietary way, Yi has independently created its own high-quality training datasets, efficient training pipelines, and robust training infrastructure entirely from the ground up. This effort has led to excellent performance, with the Yi series models ranking just behind GPT-4 and surpassing Llama on the Alpaca Leaderboard in December 2023.
[
Back to top ⬆️ ]
News
2024-03-07: The long-text capability of Yi-34B-200K has been enhanced.
In the "Needle-in-a-Haystack" test, Yi-34B-200K's performance improved by 10.5%, rising from 89.3% to an impressive 99.8%. We continued to pretrain the model on a 5B-token long-context data mixture and demonstrated near-all-green performance.
2024-03-06: The Yi-9B model is open-sourced and available to the public.
Yi-9B stands out as the top performer among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5, and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension.
2024-01-23: The Yi-VL models, <a href="https://huggingface.co/01-ai/Yi-VL-34B">Yi-VL-34B</a> and <a href="https://huggingface.co/01-ai/Yi-VL-6B">Yi-VL-6B</a>, are open-sourced and available to the public.
<a href="https://huggingface.co/01-ai/Yi-VL-34B">Yi-VL-34B</a> ranked first among all existing open-source models in the latest benchmarks, including MMMU and CMMMU (based on data available up to January 2024).
2023-11-23: Chat models are open-sourced and available to the public.
This release contains two chat models based on previously released base models, two 8-bit models quantized by GPTQ, and two 4-bit models quantized by AWQ.
Yi-34B-Chat
Yi-34B-Chat-4bits
Yi-34B-Chat-8bits
Yi-6B-Chat
Yi-6B-Chat-4bits
Yi-6B-Chat-8bits
You can try some of them interactively at:
2023-11-23: The Yi Series Models Community License Agreement is updated to v2.1.
2023-11-08: Invited test of the Yi-34B chat model.
Application form:
2023-11-05: The base models, Yi-6B-200K and Yi-34B-200K, are open-sourced and available to the public.
This release contains two base models with the same parameter sizes as the previous release, except that the context window is extended to 200K.
2023-11-02: The base models, Yi-6B and Yi-34B, are open-sourced and available to the public.
The first public release contains two bilingual (English/Chinese) base models with parameter sizes of 6B and 34B. Both of them are trained with a 4K sequence length and can be extended to 32K at inference time.
[
Back to top ⬆️ ]
Models
Yi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements.
If you want to deploy Yi models, make sure you meet the software and hardware requirements.
Chat models
- 4-bit series models are quantized by AWQ.
- 8-bit series models are quantized by GPTQ.
- All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., RTX 3090, RTX 4090).
Base models
- 200K is roughly equivalent to 400,000 Chinese characters.
- If you want to use the previous version of Yi-34B-200K (released on Nov 5, 2023), run git checkout 069cd341d60f4ce4b07ec394e82b79e94f656cf to download the weights.
Model info

| Model | Intro | Default context window | Pretrained tokens | Training Data Date |
|---|---|---|---|---|
| 6B series models | They are suitable for personal and academic use. | 4K | 3T | Up to June 2023 |
| 9B model | It is the best at coding and math in the Yi series models. | 4K | Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens. | Up to June 2023 |
| 34B series models | They are suitable for personal, academic, and commercial (particularly small and medium-sized enterprises) purposes. It's a cost-effective solution that's affordable and equipped with emergent ability. | 4K | 3T | Up to June 2023 |
For chat models
For chat model limitations, see the explanations below. ⬇️
The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher-quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training.
However, this higher diversity might amplify certain existing issues, including:
- Hallucination: the model may generate factually incorrect or nonsensical information. With the model's responses being more varied, there is a higher chance of hallucinations that are not based on accurate data or logical reasoning.
- Non-determinism in re-generation: when attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.
- Cumulative error: this occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem-solving, etc.
- To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as temperature, top_p, or top_k. These adjustments help strike a balance between creativity and coherence in the model's outputs.
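A minimal, self-contained sketch of such adjustments with the transformers generate API (it mirrors the pip quick start later in this README; the sampling values are illustrative, not official recommendations):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = '<your-model-path>'  # replace with your local Yi chat model path
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype="auto").eval()

messages = [{"role": "user", "content": "hi"}]
input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

# Illustrative sampling settings; tune them to trade diversity against coherence.
output_ids = model.generate(
    input_ids.to(model.device),
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,         # lower values give more deterministic output
    top_p=0.8,               # nucleus sampling cutoff
    top_k=40,                # keep only the 40 most likely tokens at each step
    repetition_penalty=1.1,  # discourage verbatim repetition
    max_new_tokens=256,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))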
[
Back to top ⬆️ ]
How to use Yi?
Quick start
Getting up and running with Yi models is simple, with multiple choices available.
Choose your path
Select one of the following paths to begin your journey with Yi!

Deploy Yi locally
If you prefer to deploy Yi models locally,
and you have sufficient resources (for example, an NVIDIA A800 80GB), you can choose one of the following methods:
and you have limited resources (for example, a MacBook Pro), you can use llama.cpp.
Not to deploy Yi locally
If you prefer not to deploy Yi models locally, you can explore Yi's capabilities using any of the following options.
Run Yi with APIs
If you want to explore more features of Yi, you can adopt one of these methods:
Run Yi in a playground
If you want to chat with Yi with more customizable options (e.g., system prompt, temperature, repetition penalty, etc.), you can try one of the following options:
Chat with Yi
If you want to chat with Yi, you can use one of these online services, which offer a similar user experience:
Yi-34B-Chat (Yi official on Hugging Face)
No registration is required.
Yi-34B-Chat (Yi official beta)
Access is available through a whitelist. Welcome to apply (fill out a form in English or Chinese).
[
Back to top ⬆️ ]
Quick start - pip
This tutorial guides you through every step of running Yi-34B-Chat locally on an A800 (80G) and then performing inference.
Step 0: Prerequisites
Step 1: Prepare your environment
To set up the environment and install the required packages, execute the following command.
git clone https://github.com/01-ai/Yi.git
cd Yi
pip install -r requirements.txt
Step 2: Download the Yi model
You can download the weights and tokenizer of Yi models from the following sources:
Step 3: Perform inference
You can perform inference with Yi chat or base models as below.
Perform inference with Yi chat model
Create a file named quick_start.py and copy the following content to it.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = '<your-model-path>'
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Since transformers 4.35.0, the GPT-Q/AWQ model can be loaded using AutoModelForCausalLM.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype='auto'
).eval()

# Prompt content: "hi"
messages = [
    {"role": "user", "content": "hi"}
]

input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
output_ids = model.generate(input_ids.to('cuda'))
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Model response: "Hello! How can I assist you today?"
print(response)
Run quick_start.py.
python quick_start.py
Then you can see an output similar to the one below.
Hello! How can I assist you today?
Perform inference with Yi base model
python demo/text_generation.py --model <your-model-path>
Then you can see an output similar to the one below.
Output. ⬇️
Prompt: Let me tell you an interesting story about cat Tom and mouse Jerry,
Generation: Let me tell you an interesting story about cat Tom and mouse Jerry, which happened in my childhood. My father had a big house with two cats living inside it to kill mice. One day when I was playing at home alone, I found one of the tomcats lying on his back near our kitchen door, looking very much like he wanted something from us but couldn't get up because there were too many people around him! He kept trying for several minutes before finally giving up...
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "01-ai/Yi-9B"
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False)

input_text = "# write the quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Output
# write the quick sort algorithm
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

# test the quick sort algorithm
print(quick_sort([3, 6, 8, 10, 1, 2, 1]))
[
Back to top ⬆️ ]
Quick start - Docker
Run Yi-34B-Chat locally with Docker: a step-by-step guide. ⬇️
This tutorial guides you through every step of running Yi-34B-Chat on an A800 GPU or 4*4090 locally and then performing inference.
Step 0: Prerequisites
Make sure you've installed Docker and nvidia-container-toolkit.
Step 1: Start Docker
docker run -it --gpus all \
    -v <your-model-path>:/models \
    ghcr.io/01-ai/yi:latest
Alternatively, you can pull the Yi Docker image from registry.lingyiwanwu.com/ci/01-ai/yi:latest.
Step 2: Perform inference
You can perform inference with Yi chat or base models as below.
Perform inference with Yi chat model
The steps are similar to <a href="#perform-inference-with-yi-chat-model">pip - Perform inference with Yi chat model</a>.
Note that the only difference is to set model_path = '<your-model-mount-path>' instead of model_path = '<your-model-path>'.
Perform inference with Yi base model
The steps are similar to <a href="#perform-inference-with-yi-base-model">pip - Perform inference with Yi base model</a>.
Note that the only difference is to set --model <your-model-mount-path> instead of --model <your-model-path>.
Quick start - conda-lock
You can use <a href="https://github.com/conda/conda-lock">conda-lock</a> to generate fully reproducible lock files for conda environments. ⬇️
You can refer to conda-lock.yml for the exact versions of the dependencies. Additionally, you can utilize <a href="https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html">micromamba</a> for installing these dependencies.
To install the dependencies, follow these steps:
Install micromamba by following the instructions available here.
Execute micromamba install -y -n yi -f conda-lock.yml to create a conda environment named yi and install the necessary dependencies.
Quick start - llama.cpp
Run Yi-chat-6B-2bits locally with llama.cpp: a step-by-step guide. ⬇️
This tutorial guides you through every step of running a quantized model (Yi-chat-6B-2bits) locally and then performing inference.
Step 0: Prerequisites
Step 1: Download llama.cpp
To clone the llama.cpp repository, run the following command.
git clone git@github.com:ggerganov/llama.cpp.git
Step 2: Download the Yi model
2.1 To clone XeIaso/yi-chat-6B-GGUF with just pointers, run the following command.
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/XeIaso/yi-chat-6B-GGUF
2.2 To download a quantized Yi model (yi-chat-6b.Q2_K.gguf), run the following command.
git-lfs pull --include yi-chat-6b.Q2_K.gguf
Step 3: Perform inference
To perform inference with the Yi model, you can use one of the following methods.
Method 1: Perform inference in terminal
To compile llama.cpp using 4 threads and then conduct inference, navigate to the llama.cpp directory, and run the following command.
Tips
Replace /Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf with the actual path of your model.
By default, the model operates in completion mode.
For additional output customization options (for example, system prompt, temperature, repetition penalty, etc.), run ./main -h to check detailed descriptions and usage.
make -j4 && ./main -m /Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf -p "How do you feed your pet fox? Please answer this question in 6 simple steps:\nStep 1:" -n 384 -e
...
How do you feed your pet fox? Please answer this question in 6 simple steps:
Step 1: Select the appropriate food for your pet fox. You should choose high-quality, balanced prey items that are suitable for their unique dietary needs. These could include live or frozen mice, rats, pigeons, or other small mammals, as well as fresh fruits and vegetables.
Step 2: Feed your pet fox once or twice a day, depending on the species and its individual preferences. Always ensure that they have access to fresh water throughout the day.
Step 3: Provide an appropriate environment for your pet fox. Ensure it has a comfortable place to rest, plenty of space to move around, and opportunities to play and exercise.
Step 4: Socialize your pet with other animals if possible. Interactions with other creatures can help them develop social skills and prevent boredom or stress.
Step 5: Regularly check for signs of illness or discomfort in your fox. Be prepared to provide veterinary care as needed, especially for common issues such as parasites, dental health problems, or infections.
Step 6: Educate yourself about the needs of your pet fox and be aware of any potential risks or concerns that could affect their well-being. Regularly consult with a veterinarian to ensure you are providing the best care.
...
Now you have successfully asked a question to the Yi model and got an answer!
Method 2: Perform inference in web
To initialize a lightweight and swift chatbot, run the following command.
cd llama.cpp
./server --ctx-size 2048 --host 0.0.0.0 --n-gpu-layers 64 --model /Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf
Then you can get an output like this:
...
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 5000000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Pro
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: ggml.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/yu/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M2 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 11453.25 MB
ggml_metal_init: maxTransferRate = built-in GPU
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 128.00 MiB, ( 2629.44 / 10922.67)
llama_new_context_with_model: KV self size = 128.00 MiB, K (f16): 64.00 MiB, V (f16): 64.00 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.02 MiB, ( 2629.45 / 10922.67)
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 159.19 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 156.02 MiB, ( 2785.45 / 10922.67)
Available slots:
-> Slot 0 - max context: 2048
llama server listening at http://0.0.0.0:8080
To access the chatbot interface, open your web browser and enter http://0.0.0.0:8080 into the address bar.

Enter a question, such as "How do you feed your pet fox? Please answer this question in 6 simple steps" into the prompt window, and you will receive a corresponding answer.
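Besides the browser UI, the llama.cpp example server also accepts plain HTTP requests on its /completion endpoint. A minimal sketch with curl (the prompt and n_predict value are illustrative):
# Send a completion request to the local llama.cpp server (values are illustrative).
curl -s http://0.0.0.0:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "How do you feed your pet fox? Please answer this question in 6 simple steps:", "n_predict": 256}'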

[
Back to top ⬆️ ]
Web demo
You can build a web UI demo for Yi chat models (note that Yi base models are not supported in this scenario).
Step 1: Prepare your environment.
Step 2: Download the Yi model.
Step 3: To start a web service locally, run the following command.
python demo/web_demo.py -c <your-model-path>
You can access the web UI by entering the address provided in the console into your browser.

[
Back to top ⬆️ ]
Fine-tuning
bash finetune/scripts/run_sft_Yi_6b.sh
Once finished, you can compare the fine-tuned model and the base model with the following command:
bash finetune/scripts/run_eval.sh
For advanced usage (like fine-tuning based on your custom data), see the explanations below. ⬇️
Fine-tuning code for Yi 6B and 34B
Preparation
From Image
By default, we use a small dataset from BAAI/COIG to fine-tune the base model.
You can also prepare your customized dataset in the following jsonl format:
{ "prompt": "Human: Who are you? Assistant:", "chosen": "I'm Yi." }
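If you need to generate such files yourself, a minimal Python sketch (the records are placeholders for your own data):
import json

# Placeholder records in the prompt/chosen format expected by the fine-tuning scripts.
records = [
    {"prompt": "Human: Who are you? Assistant:", "chosen": "I'm Yi."},
    {"prompt": "Human: What can you do? Assistant:", "chosen": "I can answer questions in English and Chinese."},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")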
And then mount them in the container to replace the default ones:
docker run -it \
    -v /path/to/save/finetuned/model/:/finetuned-model \
    -v /path/to/train.jsonl:/yi/finetune/data/train.json \
    -v /path/to/eval.jsonl:/yi/finetune/data/eval.json \
    ghcr.io/01-ai/yi:latest \
    bash finetune/scripts/run_sft_Yi_6b.sh
From Local Server
Make sure you have conda. If not, use:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
source ~/.bashrc
Then, create a conda env:
conda create -n dev_env python=3.10 -y
conda activate dev_env
pip install torch==2.0.1 deepspeed==0.10 tensorboard transformers datasets sentencepiece accelerate ray==2.7
Hardware Setup
For the Yi-6B model, a node with 4 GPUs, each with more than 60 GB of GPU memory, is recommended.
For the Yi-34B model, because the zero-offload technique consumes a lot of CPU memory, please be careful to limit the number of GPUs used for 34B fine-tuning. Use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts/run_sft_Yi_34b.sh).
A typical hardware setup for fine-tuning the 34B model is a node with 8 GPUs (limited to 4 at runtime via CUDA_VISIBLE_DEVICES=0,1,2,3), each with more than 80 GB of GPU memory, and more than 900 GB of total CPU memory.
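For example, a minimal sketch of launching the 34B run restricted to four GPUs (the GPU indices are illustrative and depend on your machine):
# Restrict the 34B fine-tuning run to 4 GPUs so zero-offload CPU memory usage stays manageable.
cd finetune/scripts
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_sft_Yi_34b.sh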
Quick Start
Download an LLM base model to MODEL_PATH (6B and 34B). A typical model folder looks like:
|-- $MODEL_PATH
|   |-- config.json
|   |-- pytorch_model-00001-of-00002.bin
|   |-- pytorch_model-00002-of-00002.bin
|   |-- pytorch_model.bin.index.json
|   |-- tokenizer_config.json
|   |-- tokenizer.model
|   |-- ...
Download a dataset from Hugging Face to local storage DATA_PATH, e.g. Dahoas/rm-static.
|-- $DATA_PATH
|   |-- data
|   |   |-- train-00000-of-00001-2a1df75c6bce91ab.parquet
|   |   |-- test-00000-of-00001-8c7c51afc6d45980.parquet
|   |-- dataset_infos.json
|   |-- README.md
finetune/yi_example_dataset has example datasets, which are modified from BAAI/COIG:
|-- $DATA_PATH
|   |-- data
|   |   |-- train.jsonl
|   |   |-- eval.jsonl
cd into the scripts folder, copy and paste the script, and run. For example:
cd finetune/scripts
bash run_sft_Yi_6b.sh
For the Yi-6B base model, setting training_debug_steps=20 and num_train_epochs=4 can output a chat model, which takes about 20 minutes.
For the Yi-34B base model, it takes a relatively long time for initialization. Please be patient.
Evaluation
cd finetune/scripts
bash run_eval.sh
Then you'll see the answers from both the base model and the fine-tuned model.
[
Back to top ⬆️ ]
Quantization
GPT-Q
python quantization/gptq/quant_autogptq.py \
    --model /base_model \
    --output_dir /quantized_model \
    --trust_remote_code
Once finished, you can then evaluate the resulting model as follows:
python quantization/gptq/eval_quantized_model.py \
    --model /quantized_model \
    --trust_remote_code
For a more detailed explanation, see the section below. ⬇️
GPT-Q quantization
GPT-Q is a PTQ (Post-Training Quantization) method. It saves memory and provides potential speedups while retaining the accuracy of the model.
Yi models can be GPT-Q quantized without a lot of effort. We provide a step-by-step tutorial below.
To run GPT-Q, we will use AutoGPTQ and exllama. Hugging Face Transformers has integrated optimum and auto-gptq to perform GPTQ quantization on language models.
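Because of that integration, the same quantization can also be driven directly from Transformers. A minimal sketch (this is a generic optimum/auto-gptq path, not the repository's quant_autogptq.py script; the model path and calibration dataset are illustrative):
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_path = "/base_model"  # illustrative path to a Yi base model
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# 4-bit GPT-Q with group size 128, calibrated on the "c4" dataset (illustrative choices).
gptq_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)

# Quantization happens while the model is loaded with this config.
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=gptq_config,
    device_map="auto",
)
quantized_model.save_pretrained("/quantized_model")
tokenizer.save_pretrained("/quantized_model")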
Do Quantization
The quant_autogptq.py script is provided for you to perform GPT-Q quantization:
python quant_autogptq.py --model /base_model \
    --output_dir /quantized_model --bits 4 --group_size 128 --trust_remote_code
Run Quantized Model
You can run a quantized model using eval_quantized_model.py:
python eval_quantized_model.py --model /quantized_model --trust_remote_code
AWQ
python quantization/awq/quant_autoawq.py \
    --model /base_model \
    --output_dir /quantized_model \
    --trust_remote_code
Once finished, you can then evaluate the resulting model as follows:
python quantization/awq/eval_quantized_model.py \
    --model /quantized_model \
    --trust_remote_code
For a more detailed explanation, see the section below. ⬇️
AWQ quantization
AWQ is a PTQ (Post-Training Quantization) method. It's an efficient and accurate low-bit weight quantization (INT3/INT4) for LLMs.
Yi models can be AWQ quantized without a lot of effort. We provide a step-by-step tutorial below.
To run AWQ, we will use AutoAWQ.
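A minimal sketch of how AutoAWQ is typically used (this is a generic AutoAWQ recipe, not necessarily what the repository's quant_autoawq.py script does; paths and the quantization config are illustrative):
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "/base_model"        # illustrative path to a Yi base model
quant_path = "/quantized_model"   # illustrative output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and tokenizer, then run AWQ calibration and quantization.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights together with the tokenizer.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)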
Do Quantization
The quant_autoawq.py script is provided for you to perform AWQ quantization:
python quant_autoawq.py --model /base_model \
    --output_dir /quantized_model --bits 4 --group_size 128 --trust_remote_code
Run Quantized Model
You can run a quantized model using eval_quantized_model.py:
python eval_quantized_model.py --model /quantized_model --trust_remote_code
[
Back to top ⬆️ ]
Deployment
If you want to deploy Yi models, make sure you meet the software and hardware requirements.
Software requirements
Before using Yi quantized models, make sure you've installed the correct software listed below.
Hardware requirements
Before deploying Yi in your environment, make sure your hardware meets the following requirements.
Chat models
| Model | Minimum VRAM | Recommended GPU Example |
|---|---|---|
| Yi-6B-Chat | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) |
| Yi-6B-Chat-4bits | 4 GB | 1 x RTX 3060 (12 GB) <br> 1 x RTX 4060 (8 GB) |
| Yi-6B-Chat-8bits | 8 GB | 1 x RTX 3070 (8 GB) <br> 1 x RTX 4060 (8 GB) |
| Yi-34B-Chat | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) |
| Yi-34B-Chat-4bits | 20 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) <br> 1 x A100 (40 GB) |
| Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 (24 GB) <br> 2 x RTX 4090 (24 GB) <br> 1 x A800 (40 GB) |
Below are the detailed minimum VRAM requirements under different batch-size use cases.

| Model | batch=1 | batch=4 | batch=16 | batch=32 |
|---|---|---|---|---|
| Yi-6B-Chat | 12 GB | 13 GB | 15 GB | 18 GB |
| Yi-6B-Chat-4bits | 4 GB | 5 GB | 7 GB | 10 GB |
| Yi-6B-Chat-8bits | 7 GB | 8 GB | 10 GB | 14 GB |
| Yi-34B-Chat | 65 GB | 68 GB | 76 GB | > 80 GB |
| Yi-34B-Chat-4bits | 19 GB | 20 GB | 30 GB | 40 GB |
| Yi-34B-Chat-8bits | 35 GB | 37 GB | 46 GB | 58 GB |
Base models

| Model | Minimum VRAM | Recommended GPU Example |
|---|---|---|
| Yi-6B | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) |
| Yi-6B-200K | 50 GB | 1 x A800 (80 GB) |
| Yi-9B | 20 GB | 1 x RTX 4090 (24 GB) |
| Yi-34B | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) |
| Yi-34B-200K | 200 GB | 4 x A800 (80 GB) |
[
Back to top ⬆️ ]
Learning hub
If you want to learn Yi, you can find a wealth of helpful educational resources here. ⬇️
Welcome to the Yi learning hub!
Whether you're a seasoned developer or a newcomer, you can find a wealth of helpful educational resources to enhance your understanding and skills with Yi models, including insightful blog posts, comprehensive video tutorials, hands-on guides, and more.
The content you find here has been generously contributed by knowledgeable Yi experts and passionate enthusiasts. We extend our heartfelt gratitude for your invaluable contributions!
At the same time, we also warmly invite you to join our collaborative effort by contributing to Yi. If you have already made contributions to Yi, please don't hesitate to showcase your remarkable work in the table below.
With all these resources at your fingertips, you're ready to start your exciting journey with Yi. Happy learning!
Tutorials
English tutorials
Chinese tutorials
Why Yi?
Ecosystem
Yi has a comprehensive ecosystem, offering a range of tools, services, and models to enrich your experience and maximize productivity.
Upstream
The Yi series models follow the same model architecture as Llama. By choosing Yi, you can leverage existing tools, libraries, and resources within the Llama ecosystem, eliminating the need to create new tools and enhancing development efficiency.
For example, the Yi series models are saved in the Llama model format. You can directly use LlamaForCausalLM and LlamaTokenizer to load the model. For more information, see Use the chat model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34b", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-34b", device_map="auto")
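Equivalently, because the checkpoints are stored in the Llama format, the dedicated Llama classes mentioned above can load them as well (a minimal sketch):
from transformers import LlamaForCausalLM, LlamaTokenizer

# Yi weights are saved in the Llama format, so the Llama-specific classes load them directly.
tokenizer = LlamaTokenizer.from_pretrained("01-ai/Yi-34b")
model = LlamaForCausalLM.from_pretrained("01-ai/Yi-34b", device_map="auto", torch_dtype="auto")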
[
Back to top ⬆️ ]
Downstream
Tip
Feel free to create a PR and share the fantastic work you've built using the Yi series models.
To help others quickly understand your work, it is recommended to use the format <model-name>: <model-intro> + <model-highlights>.
Serving
If you want to get up and running with Yi in a few minutes, you can use the following services built upon Yi.
Yi-34B-Chat: you can chat with Yi using one of the following platforms:
Yi-34B-Chat | Hugging Face
Yi-34B-Chat | Yi Platform: Note that it is currently available through a whitelist. Welcome to apply (fill out a form in English or Chinese) and experience it firsthand!
Yi-6B-Chat (Replicate): you can use this model with more options by setting additional parameters and calling APIs.
ScaleLLM: you can use this service to run Yi models locally with added flexibility and customization.
Quantization
If you have limited computational capabilities, you can use Yi's quantized models as follows.
These quantized models have reduced precision but offer increased efficiency, such as faster inference speed and smaller RAM usage.
Fine-tuning
If you're seeking to explore the diverse capabilities within Yi's thriving family, you can delve into Yi's fine-tuned models as below.
API
- amazing-openai-api: this tool converts Yi model APIs into the OpenAI API format out of the box.
- LlamaEdge: this tool builds an OpenAI-compatible API server for Yi-34B-Chat using a portable Wasm (WebAssembly) file, powered by Rust.
[
Back to top ⬆️ ]
Benchmarks
Chat model performance
The Yi-34B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models on benchmarks including MMLU, CMMLU, BBH, GSM8k, and more.
Evaluation methods and challenges. ⬇️
- Evaluation methods: we evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA.
- Zero-shot vs. few-shot: in chat models, the zero-shot approach is more commonly employed.
- Evaluation strategy: our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples). We then isolate relevant answers from the generated text.
- Challenges faced: some models are not well-suited to produce output in the specific format required by instructions in a few datasets, which leads to suboptimal results.
*: C-Eval results are evaluated on the validation datasets.
Base model performance
Yi-34B and Yi-34B-200K
The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more.

Evaluation methods. ⬇️
- Disparity in results: while benchmarking open-source models, a disparity has been noted between results from our pipeline and those reported by public sources like OpenCompass.
- Investigation findings: a deeper investigation reveals that variations in prompts, post-processing strategies, and sampling techniques across models may lead to significant outcome differences.
- Uniform benchmarking process: our methodology aligns with the original benchmarks: consistent prompts and post-processing strategies are used, and greedy decoding is applied during evaluations without any post-processing of the generated content.
- Efforts to retrieve unreported scores: for scores that were not reported by the original authors (including scores reported with different settings), we try to get results with our pipeline.
- Extensive model evaluation: to evaluate the model's capability extensively, we adopted the methodology outlined in Llama 2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common-sense reasoning. SQuAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension.
- Special configurations: CSQA was exclusively tested using a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category "Math & Code".
- Falcon-180B caveat: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average of the other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated.
Yi-9B
Yi-9B is almost the best among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5, and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension.

- In terms of overall ability (Mean-All), Yi-9B performs the best among similarly sized open-source models, surpassing DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B, and Gemma-7B.

- In terms of coding ability (Mean-Code), Yi-9B's performance is second only to DeepSeek-Coder-7B, surpassing Yi-34B, SOLAR-10.7B, Mistral-7B, and Gemma-7B.

- In terms of math ability (Mean-Math), Yi-9B's performance is second only to DeepSeek-Math-7B, surpassing SOLAR-10.7B, Mistral-7B, and Gemma-7B.

- In terms of common sense and reasoning ability (Mean-Text), Yi-9B's performance is on par with Mistral-7B, SOLAR-10.7B, and Gemma-7B.

[
Back to top ⬆️ ]
Who can use Yi?
Everyone! ✅
[
Back to top ⬆️ ]
Misc.
Acknowledgments
A heartfelt thank you to each of you who have made contributions to the Yi community! You have helped make Yi not just a project, but a vibrant, growing home for innovation.
[
Back to top ⬆️ ]
Disclaimer
We use data compliance checking algorithms during the training process to ensure the compliance of the trained model to the best of our ability. Due to complex data and the diversity of language model usage scenarios, we cannot guarantee that the model will generate correct and reasonable output in all scenarios. Please be aware that there is still a risk of the model producing problematic outputs. We will not be responsible for any risks and issues resulting from misuse, misguidance, illegal usage, related misinformation, or any associated data security concerns.
[
Back to top ⬆️ ]
License
The source code in this repo is licensed under the Apache 2.0 license. The Yi series models are fully open for academic research and free for commercial use, with automatic permission granted upon application. All usage must adhere to the Yi Series Models Community License Agreement 2.1.
For free commercial use, you only need to send an email to get official commercial permission.
[
Back to top ⬆️ ]