Building the Next Generation of Open-Source and Bilingual LLMs
🤗 Hugging Face • 🤖 ModelScope • ✡️ WiseModel
Join us on WeChat (Chinese)!
GPTQ quantized version of the Yi-9B-200K model.
Table of Contents
What is Yi?
Introduction
The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI.
Targeted as a bilingual language model and trained on a 3T multilingual corpus, the Yi series models are among the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example,
Yi-34B-Chat landed in second place (following GPT-4 Turbo), outperforming other LLMs (such as GPT-4, Mixtral, Claude) on the AlpacaEval Leaderboard (based on data available up to January 2024).
Yi-34B ranked first among all existing open-source models (such as Falcon-180B, Llama-70B, Claude) in both English and Chinese on various benchmarks, including the Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval (based on data available up to November 2023).
(Credits to Llama) Thanks to the Transformer and Llama open-source communities for reducing the effort required to build from scratch and enabling the use of the same tools within the AI ecosystem.
If you're interested in Yi's adoption of the Llama architecture and license usage policy, see Yi's relation with Llama. ⬇️
TL;DR
The Yi series models adopt the same model architecture as Llama but are NOT derivatives of Llama.
Both Yi and Llama are based on the Transformer structure, which has been the standard architecture for large language models since 2018.
Grounded in the Transformer architecture, Llama has become a new cornerstone for the majority of state-of-the-art open-source models thanks to its excellent stability, reliable convergence, and robust compatibility. This positions Llama as the recognized foundational framework for models including Yi.
Thanks to the Transformer and Llama architectures, other models can leverage their power, reducing the effort required to build from scratch and enabling the use of the same tools within their ecosystems.
However, the Yi series models are NOT derivatives of Llama, as they do not use Llama's weights.
As Llama's structure is employed by the majority of open-source models, the key factors determining model performance are training datasets, training pipelines, and training infrastructure.
Developing in a unique and proprietary way, Yi has independently created its own high-quality training datasets, efficient training pipelines, and robust training infrastructure entirely from the ground up. This effort has led to excellent performance, with the Yi series models ranking just behind GPT-4 and surpassing Llama on the Alpaca Leaderboard in December 2023.
[
Back to top ⬆️ ]
News
2024-03-07: The long-text capability of Yi-34B-200K has been enhanced.
In the "Needle-in-a-Haystack" test, Yi-34B-200K's performance improved by 10.5%, rising from 89.3% to an impressive 99.8%. We continued to pretrain the model on a 5B-token long-context data mixture and demonstrated near-all-green performance.
2024-03-06: The Yi-9B model is open-sourced and available to the public.
Yi-9B stands out as the top performer among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5, and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension.
2024-01-23: The Yi-VL models, <a href="https://huggingface.co/01-ai/Yi-VL-34B">Yi-VL-34B</a> and <a href="https://huggingface.co/01-ai/Yi-VL-6B">Yi-VL-6B</a>, are open-sourced and available to the public.
<a href="https://huggingface.co/01-ai/Yi-VL-34B">Yi-VL-34B</a> ranked first among all existing open-source models in the latest benchmarks, including MMMU and CMMMU (based on data available up to January 2024).
2023-11-23: Chat models are open-sourced and available to the public.
This release contains two chat models based on previously released base models, two 8-bit models quantized by GPTQ, and two 4-bit models quantized by AWQ.
Yi-34B-Chat
Yi-34B-Chat-4bits
Yi-34B-Chat-8bits
Yi-6B-Chat
Yi-6B-Chat-4bits
Yi-6B-Chat-8bits
You can try some of them interactively at:
2023-11-23: The Yi Series Models Community License Agreement is updated to v2.1.
2023-11-08: Invited test of the Yi-34B chat model.
Application form:
2023-11-05: The base models, Yi-6B-200K and Yi-34B-200K, are open-sourced and available to the public.
This release contains two base models with the same parameter sizes as the previous release, except that the context window is extended to 200K.
2023-11-02: The base models, Yi-6B and Yi-34B, are open-sourced and available to the public.
The first public release contains two bilingual (English/Chinese) base models with parameter sizes of 6B and 34B. Both of them are trained with a 4K sequence length and can be extended to 32K at inference time.
[
Back to top ⬆️ ]
Models
Yi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements.
If you want to deploy Yi models, make sure you meet the software and hardware requirements.
Chat models
- 4-bit series models are quantized by AWQ.
- 8-bit series models are quantized by GPTQ.
- All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., RTX 3090, RTX 4090).
Base models
- 200K is roughly equivalent to 400,000 Chinese characters.
- If you want to use the previous version of Yi-34B-200K (released on Nov 5, 2023), run git checkout 069cd341d60f4ce4b07ec394e82b79e94f656cf to download the weights.
Model info

| Model | Intro | Default context window | Pretrained tokens | Training Data Date |
|---|---|---|---|---|
| 6B series models | They are suitable for personal and academic use. | 4K | 3T | Up to June 2023 |
| 9B model | It is the best at coding and math in the Yi series models. | 4K | Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens. | Up to June 2023 |
| 34B series models | They are suitable for personal, academic, and commercial (particularly small and medium-sized enterprises) purposes. It's a cost-effective solution that's affordable and equipped with emergent ability. | 4K | 3T | Up to June 2023 |
For chat models
For chat model limitations, see the explanations below. ⬇️
The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher-quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training.
However, this higher diversity might amplify certain existing issues, including:
- Hallucination: the model may generate factually incorrect or nonsensical information. With the model's responses being more varied, there is a higher chance of hallucinations that are not based on accurate data or logical reasoning.
- Non-determinism in re-generation: when attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.
- Cumulative error: this occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem-solving, etc.
- To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as temperature, top_p, or top_k. These adjustments help strike a balance between creativity and coherence in the model's outputs.
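A minimal, self-contained sketch of such adjustments with the transformers generate API (it mirrors the pip quick start later in this README; the sampling values are illustrative, not official recommendations):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = '<your-model-path>'  # replace with your local Yi chat model path
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype="auto").eval()

messages = [{"role": "user", "content": "hi"}]
input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

# Illustrative sampling settings; tune them to trade diversity against coherence.
output_ids = model.generate(
    input_ids.to(model.device),
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,         # lower values give more deterministic output
    top_p=0.8,               # nucleus sampling cutoff
    top_k=40,                # keep only the 40 most likely tokens at each step
    repetition_penalty=1.1,  # discourage verbatim repetition
    max_new_tokens=256,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))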
[
Back to top ⬆️ ]
How to use Yi?
Quick start
Getting up and running with Yi models is simple, with multiple choices available.
Choose your path
Select one of the following paths to begin your journey with Yi!

Deploy Yi locally
If you prefer to deploy Yi models locally,
and you have sufficient resources (for example, an NVIDIA A800 80GB), you can choose one of the following methods:
and you have limited resources (for example, a MacBook Pro), you can use llama.cpp.
Not to deploy Yi locally
If you prefer not to deploy Yi models locally, you can explore Yi's capabilities using any of the following options.
Run Yi with APIs
If you want to explore more features of Yi, you can adopt one of these methods:
Run Yi in a playground
If you want to chat with Yi with more customizable options (e.g., system prompt, temperature, repetition penalty, etc.), you can try one of the following options:
Chat with Yi
If you want to chat with Yi, you can use one of these online services, which offer a similar user experience:
Yi-34B-Chat (Yi official on Hugging Face)
No registration is required.
Yi-34B-Chat (Yi official beta)
Access is available through a whitelist. Welcome to apply (fill out a form in English or Chinese).
[
Back to top ⬆️ ]
Quick start - pip
This tutorial guides you through every step of running Yi-34B-Chat locally on an A800 (80G) and then performing inference.
Step 0: Prerequisites
Step 1: Prepare your environment
To set up the environment and install the required packages, execute the following command.
git clone https://github.com/01-ai/Yi.git
cd Yi
pip install -r requirements.txt
Step 2: Download the Yi model
You can download the weights and tokenizer of Yi models from the following sources:
Step 3: Perform inference
You can perform inference with Yi chat or base models as below.
Perform inference with Yi chat model
Create a file named quick_start.py and copy the following content to it.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = '<your-model-path>'
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Since transformers 4.35.0, the GPT-Q/AWQ model can be loaded using AutoModelForCausalLM.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype='auto'
).eval()

# Prompt content: "hi"
messages = [
    {"role": "user", "content": "hi"}
]

input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
output_ids = model.generate(input_ids.to('cuda'))
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Model response: "Hello! How can I assist you today?"
print(response)
Run quick_start.py.
python quick_start.py
Then you can see an output similar to the one below.
Hello! How can I assist you today?
Perform inference with Yi base model
python demo/text_generation.py --model <your-model-path>
Then you can see an output similar to the one below.
Output. ⬇️
Prompt: Let me tell you an interesting story about cat Tom and mouse Jerry,
Generation: Let me tell you an interesting story about cat Tom and mouse Jerry, which happened in my childhood. My father had a big house with two cats living inside it to kill mice. One day when I was playing at home alone, I found one of the tomcats lying on his back near our kitchen door, looking very much like he wanted something from us but couldn't get up because there were too many people around him! He kept trying for several minutes before finally giving up...
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "01-ai/Yi-9B"
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False)

input_text = "# write the quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Output
# write the quick sort algorithm
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

# test the quick sort algorithm
print(quick_sort([3, 6, 8, 10, 1, 2, 1]))
[
Back to top ⬆️ ]
Quick start - Docker
Run Yi-34B-Chat locally with Docker: a step-by-step guide. ⬇️
This tutorial guides you through every step of running Yi-34B-Chat on an A800 GPU or 4*4090 locally and then performing inference.
Step 0: Prerequisites
Make sure you've installed Docker and nvidia-container-toolkit.
Step 1: Start Docker
docker run -it --gpus all \
    -v <your-model-path>:/models \
    ghcr.io/01-ai/yi:latest
Alternatively, you can pull the Yi Docker image from registry.lingyiwanwu.com/ci/01-ai/yi:latest.
Step 2: Perform inference
You can perform inference with Yi chat or base models as below.
Perform inference with Yi chat model
The steps are similar to <a href="#perform-inference-with-yi-chat-model">pip - Perform inference with Yi chat model</a>.
Note that the only difference is to set model_path = '<your-model-mount-path>' instead of model_path = '<your-model-path>'.
Perform inference with Yi base model
The steps are similar to <a href="#perform-inference-with-yi-base-model">pip - Perform inference with Yi base model</a>.
Note that the only difference is to set --model <your-model-mount-path> instead of --model <your-model-path>.
Quick start - conda-lock
You can use <a href="https://github.com/conda/conda-lock">conda-lock</a> to generate fully reproducible lock files for conda environments. ⬇️
You can refer to conda-lock.yml for the exact versions of the dependencies. Additionally, you can utilize <a href="https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html">micromamba</a> for installing these dependencies.
To install the dependencies, follow these steps:
Install micromamba by following the instructions available here.
Execute micromamba install -y -n yi -f conda-lock.yml to create a conda environment named yi and install the necessary dependencies.
Quick start - llama.cpp
Run Yi-chat-6B-2bits locally with llama.cpp: a step-by-step guide. ⬇️
This tutorial guides you through every step of running a quantized model (Yi-chat-6B-2bits) locally and then performing inference.
Step 0: Prerequisites
Step 1: Download llama.cpp
To clone the llama.cpp repository, run the following command.
git clone git@github.com:ggerganov/llama.cpp.git
Step 2: Download the Yi model
2.1 To clone XeIaso/yi-chat-6B-GGUF with just pointers, run the following command.
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/XeIaso/yi-chat-6B-GGUF
2.2 To download a quantized Yi model (yi-chat-6b.Q2_K.gguf), run the following command.
git-lfs pull --include yi-chat-6b.Q2_K.gguf
Step 3: Perform inference
To perform inference with the Yi model, you can use one of the following methods.
Method 1: Perform inference in terminal
To compile llama.cpp using 4 threads and then conduct inference, navigate to the llama.cpp directory, and run the following command.
Tips
Replace /Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf with the actual path of your model.
By default, the model operates in completion mode.
For additional output customization options (for example, system prompt, temperature, repetition penalty, etc.), run ./main -h to check detailed descriptions and usage.
make -j4 && ./main -m /Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf -p "How do you feed your pet fox? Please answer this question in 6 simple steps:\nStep 1:" -n 384 -e
...
How do you feed your pet fox? Please answer this question in 6 simple steps:
Step 1: Select the appropriate food for your pet fox. You should choose high-quality, balanced prey items that are suitable for their unique dietary needs. These could include live or frozen mice, rats, pigeons, or other small mammals, as well as fresh fruits and vegetables.
Step 2: Feed your pet fox once or twice a day, depending on the species and its individual preferences. Always ensure that they have access to fresh water throughout the day.
Step 3: Provide an appropriate environment for your pet fox. Ensure it has a comfortable place to rest, plenty of space to move around, and opportunities to play and exercise.
Step 4: Socialize your pet with other animals if possible. Interactions with other creatures can help them develop social skills and prevent boredom or stress.
Step 5: Regularly check for signs of illness or discomfort in your fox. Be prepared to provide veterinary care as needed, especially for common issues such as parasites, dental health problems, or infections.
Step 6: Educate yourself about the needs of your pet fox and be aware of any potential risks or concerns that could affect their well-being. Regularly consult with a veterinarian to ensure you are providing the best care.
...
Now you have successfully asked a question to the Yi model and got an answer!
Method 2: Perform inference in web
To initialize a lightweight and swift chatbot, run the following command.
cd llama.cpp
./server --ctx-size 2048 --host 0.0.0.0 --n-gpu-layers 64 --model /Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf
Then you can get an output like this:
...
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 5000000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Pro
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: ggml.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/yu/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M2 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 11453.25 MB
ggml_metal_init: maxTransferRate = built-in GPU
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 128.00 MiB, ( 2629.44 / 10922.67)
llama_new_context_with_model: KV self size = 128.00 MiB, K (f16): 64.00 MiB, V (f16): 64.00 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.02 MiB, ( 2629.45 / 10922.67)
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 159.19 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 156.02 MiB, ( 2785.45 / 10922.67)
Available slots:
-> Slot 0 - max context: 2048
llama server listening at http://0.0.0.0:8080
To access the chatbot interface, open your web browser and enter http://0.0.0.0:8080 into the address bar.

Enter a question, such as "How do you feed your pet fox? Please answer this question in 6 simple steps" into the prompt window, and you will receive a corresponding answer.
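Besides the browser UI, the llama.cpp example server also accepts plain HTTP requests on its /completion endpoint. A minimal sketch with curl (the prompt and n_predict value are illustrative):
# Send a completion request to the local llama.cpp server (values are illustrative).
curl -s http://0.0.0.0:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "How do you feed your pet fox? Please answer this question in 6 simple steps:", "n_predict": 256}'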

[
Back to top ⬆️ ]
Web demo
You can build a web UI demo for Yi chat models (note that Yi base models are not supported in this scenario).
Step 1: Prepare your environment.
Step 2: Download the Yi model.
Step 3: To start a web service locally, run the following command.
python demo/web_demo.py -c <your-model-path>
You can access the web UI by entering the address provided in the console into your browser.

[
Back to top ⬆️ ]
Fine-tuning
bash finetune/scripts/run_sft_Yi_6b.sh
Once finished, you can compare the fine-tuned model and the base model with the following command:
bash finetune/scripts/run_eval.sh
For advanced usage (like fine-tuning based on your custom data), see the explanations below. ⬇️
Fine-tuning code for Yi 6B and 34B
Preparation
From Image
By default, we use a small dataset from BAAI/COIG to fine-tune the base model.
You can also prepare your customized dataset in the following jsonl format:
{ "prompt": "Human: Who are you? Assistant:", "chosen": "I'm Yi." }
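If you need to generate such files yourself, a minimal Python sketch (the records are placeholders for your own data):
import json

# Placeholder records in the prompt/chosen format expected by the fine-tuning scripts.
records = [
    {"prompt": "Human: Who are you? Assistant:", "chosen": "I'm Yi."},
    {"prompt": "Human: What can you do? Assistant:", "chosen": "I can answer questions in English and Chinese."},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")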
And then mount them in the container to replace the default ones:
docker run -it \
    -v /path/to/save/finetuned/model/:/finetuned-model \
    -v /path/to/train.jsonl:/yi/finetune/data/train.json \
    -v /path/to/eval.jsonl:/yi/finetune/data/eval.json \
    ghcr.io/01-ai/yi:latest \
    bash finetune/scripts/run_sft_Yi_6b.sh
From Local Server
Make sure you have conda. If not, use:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
source ~/.bashrc
Then, create a conda env:
conda create -n dev_env python=3.10 -y
conda activate dev_env
pip install torch==2.0.1 deepspeed==0.10 tensorboard transformers datasets sentencepiece accelerate ray==2.7
Hardware Setup
For the Yi-6B model, a node with 4 GPUs, each with more than 60 GB of GPU memory, is recommended.
For the Yi-34B model, because the zero-offload technique consumes a lot of CPU memory, please be careful to limit the number of GPUs used for 34B fine-tuning. Use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts/run_sft_Yi_34b.sh).
A typical hardware setup for fine-tuning the 34B model is a node with 8 GPUs (limited to 4 at runtime via CUDA_VISIBLE_DEVICES=0,1,2,3), each with more than 80 GB of GPU memory, and more than 900 GB of total CPU memory.
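For example, a minimal sketch of launching the 34B run restricted to four GPUs (the GPU indices are illustrative and depend on your machine):
# Restrict the 34B fine-tuning run to 4 GPUs so zero-offload CPU memory usage stays manageable.
cd finetune/scripts
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_sft_Yi_34b.sh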
Quick Start
Download an LLM base model to MODEL_PATH (6B and 34B). A typical model folder looks like:
|-- $MODEL_PATH
|   |-- config.json
|   |-- pytorch_model-00001-of-00002.bin
|   |-- pytorch_model-00002-of-00002.bin
|   |-- pytorch_model.bin.index.json
|   |-- tokenizer_config.json
|   |-- tokenizer.model
|   |-- ...
Download a dataset from Hugging Face to local storage DATA_PATH, e.g. Dahoas/rm-static.
|-- $DATA_PATH
|   |-- data
|   |   |-- train-00000-of-00001-2a1df75c6bce91ab.parquet
|   |   |-- test-00000-of-00001-8c7c51afc6d45980.parquet
|   |-- dataset_infos.json
|   |-- README.md
finetune/yi_example_dataset has example datasets, which are modified from BAAI/COIG:
|-- $DATA_PATH
|   |-- data
|   |   |-- train.jsonl
|   |   |-- eval.jsonl
cd into the scripts folder, copy and paste the script, and run. For example:
cd finetune/scripts
bash run_sft_Yi_6b.sh
For the Yi-6B base model, setting training_debug_steps=20 and num_train_epochs=4 can output a chat model, which takes about 20 minutes.
For the Yi-34B base model, it takes a relatively long time for initialization. Please be patient.
Evaluation
cd finetune/scripts
bash run_eval.sh
Then you'll see the answers from both the base model and the fine-tuned model.
[
Back to top ⬆️ ]
Quantization
GPT-Q
python quantization/gptq/quant_autogptq.py \
    --model /base_model \
    --output_dir /quantized_model \
    --trust_remote_code
Once finished, you can then evaluate the resulting model as follows:
python quantization/gptq/eval_quantized_model.py \
    --model /quantized_model \
    --trust_remote_code
For a more detailed explanation, see the section below. ⬇️
GPT-Q quantization
GPT-Q is a PTQ (Post-Training Quantization) method. It saves memory and provides potential speedups while retaining the accuracy of the model.
Yi models can be GPT-Q quantized without a lot of effort. We provide a step-by-step tutorial below.
To run GPT-Q, we will use AutoGPTQ and exllama. Hugging Face Transformers has integrated optimum and auto-gptq to perform GPTQ quantization on language models.
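Because of that integration, the same quantization can also be driven directly from Transformers. A minimal sketch (this is a generic optimum/auto-gptq path, not the repository's quant_autogptq.py script; the model path and calibration dataset are illustrative):
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_path = "/base_model"  # illustrative path to a Yi base model
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# 4-bit GPT-Q with group size 128, calibrated on the "c4" dataset (illustrative choices).
gptq_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)

# Quantization happens while the model is loaded with this config.
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=gptq_config,
    device_map="auto",
)
quantized_model.save_pretrained("/quantized_model")
tokenizer.save_pretrained("/quantized_model")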
Do Quantization
The quant_autogptq.py script is provided for you to perform GPT-Q quantization:
python quant_autogptq.py --model /base_model \
    --output_dir /quantized_model --bits 4 --group_size 128 --trust_remote_code
Run Quantized Model
You can run a quantized model using eval_quantized_model.py:
python eval_quantized_model.py --model /quantized_model --trust_remote_code
AWQ
python quantization/awq/quant_autoawq.py \
    --model /base_model \
    --output_dir /quantized_model \
    --trust_remote_code
Once finished, you can then evaluate the resulting model as follows:
python quantization/awq/eval_quantized_model.py \
    --model /quantized_model \
    --trust_remote_code
For a more detailed explanation, see the section below. ⬇️
AWQ quantization
AWQ is a PTQ (Post-Training Quantization) method. It's an efficient and accurate low-bit weight quantization (INT3/INT4) for LLMs.
Yi models can be AWQ quantized without a lot of effort. We provide a step-by-step tutorial below.
To run AWQ, we will use AutoAWQ.
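A minimal sketch of how AutoAWQ is typically used (this is a generic AutoAWQ recipe, not necessarily what the repository's quant_autoawq.py script does; paths and the quantization config are illustrative):
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "/base_model"        # illustrative path to a Yi base model
quant_path = "/quantized_model"   # illustrative output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and tokenizer, then run AWQ calibration and quantization.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights together with the tokenizer.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)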
Do Quantization
The quant_autoawq.py script is provided for you to perform AWQ quantization:
python quant_autoawq.py --model /base_model \
    --output_dir /quantized_model --bits 4 --group_size 128 --trust_remote_code
Run Quantized Model
You can run a quantized model using eval_quantized_model.py:
python eval_quantized_model.py --model /quantized_model --trust_remote_code
[
Back to top ⬆️ ]
Deployment
If you want to deploy Yi models, make sure you meet the software and hardware requirements.
Software requirements
Before using Yi quantized models, make sure you've installed the correct software listed below.
Hardware requirements
Before deploying Yi in your environment, make sure your hardware meets the following requirements.
Chat models
| Model | Minimum VRAM | Recommended GPU Example |
|---|---|---|
| Yi-6B-Chat | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) |
| Yi-6B-Chat-4bits | 4 GB | 1 x RTX 3060 (12 GB) <br> 1 x RTX 4060 (8 GB) |
| Yi-6B-Chat-8bits | 8 GB | 1 x RTX 3070 (8 GB) <br> 1 x RTX 4060 (8 GB) |
| Yi-34B-Chat | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) |
| Yi-34B-Chat-4bits | 20 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) <br> 1 x A100 (40 GB) |
| Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 (24 GB) <br> 2 x RTX 4090 (24 GB) <br> 1 x A800 (40 GB) |
Below are the detailed minimum VRAM requirements under different batch-size use cases.

| Model | batch=1 | batch=4 | batch=16 | batch=32 |
|---|---|---|---|---|
| Yi-6B-Chat | 12 GB | 13 GB | 15 GB | 18 GB |
| Yi-6B-Chat-4bits | 4 GB | 5 GB | 7 GB | 10 GB |
| Yi-6B-Chat-8bits | 7 GB | 8 GB | 10 GB | 14 GB |
| Yi-34B-Chat | 65 GB | 68 GB | 76 GB | > 80 GB |
| Yi-34B-Chat-4bits | 19 GB | 20 GB | 30 GB | 40 GB |
| Yi-34B-Chat-8bits | 35 GB | 37 GB | 46 GB | 58 GB |
Base models

| Model | Minimum VRAM | Recommended GPU Example |
|---|---|---|
| Yi-6B | 15 GB | 1 x RTX 3090 (24 GB) <br> 1 x RTX 4090 (24 GB) <br> 1 x A10 (24 GB) <br> 1 x A30 (24 GB) |
| Yi-6B-200K | 50 GB | 1 x A800 (80 GB) |
| Yi-9B | 20 GB | 1 x RTX 4090 (24 GB) |
| Yi-34B | 72 GB | 4 x RTX 4090 (24 GB) <br> 1 x A800 (80 GB) |
| Yi-34B-200K | 200 GB | 4 x A800 (80 GB) |
[
Back to top ⬆️ ]
Learning hub
If you want to learn Yi, you can find a wealth of helpful educational resources here. ⬇️
Welcome to the Yi learning hub!
Whether you're a seasoned developer or a newcomer, you can find a wealth of helpful educational resources to enhance your understanding and skills with Yi models, including insightful blog posts, comprehensive video tutorials, hands-on guides, and more.
The content you find here has been generously contributed by knowledgeable Yi experts and passionate enthusiasts. We extend our heartfelt gratitude for your invaluable contributions!
At the same time, we also warmly invite you to join our collaborative effort by contributing to Yi. If you have already made contributions to Yi, please don't hesitate to showcase your remarkable work in the table below.
With all these resources at your fingertips, you're ready to start your exciting journey with Yi. Happy learning!
Tutorials
English tutorials
Chinese tutorials
Why Yi?
Ecosystem
Yi has a comprehensive ecosystem, offering a range of tools, services, and models to enrich your experience and maximize productivity.
Upstream
The Yi series models follow the same model architecture as Llama. By choosing Yi, you can leverage existing tools, libraries, and resources within the Llama ecosystem, eliminating the need to create new tools and enhancing development efficiency.
For example, the Yi series models are saved in the Llama model format. You can directly use LlamaForCausalLM and LlamaTokenizer to load the model. For more information, see Use the chat model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34b", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-34b", device_map="auto")
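Equivalently, because the checkpoints are stored in the Llama format, the dedicated Llama classes mentioned above can load them as well (a minimal sketch):
from transformers import LlamaForCausalLM, LlamaTokenizer

# Yi weights are saved in the Llama format, so the Llama-specific classes load them directly.
tokenizer = LlamaTokenizer.from_pretrained("01-ai/Yi-34b")
model = LlamaForCausalLM.from_pretrained("01-ai/Yi-34b", device_map="auto", torch_dtype="auto")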
[
Back to top ⬆️ ]
Downstream
Tip
Feel free to create a PR and share the fantastic work you've built using the Yi series models.
To help others quickly understand your work, it is recommended to use the format <model-name>: <model-intro> + <model-highlights>.
Serving
If you want to get up and running with Yi in a few minutes, you can use the following services built upon Yi.
Yi-34B-Chat: you can chat with Yi using one of the following platforms:
Yi-34B-Chat | Hugging Face
Yi-34B-Chat | Yi Platform: Note that it is currently available through a whitelist. Welcome to apply (fill out a form in English or Chinese) and experience it firsthand!
Yi-6B-Chat (Replicate): you can use this model with more options by setting additional parameters and calling APIs.
ScaleLLM: you can use this service to run Yi models locally with added flexibility and customization.
Quantization
If you have limited computational capabilities, you can use Yi's quantized models as follows.
These quantized models have reduced precision but offer increased efficiency, such as faster inference speed and smaller RAM usage.
Fine-tuning
If you're seeking to explore the diverse capabilities within Yi's thriving family, you can delve into Yi's fine-tuned models as below.
API
- amazing-openai-api: this tool converts Yi model APIs into the OpenAI API format out of the box.
- LlamaEdge: this tool builds an OpenAI-compatible API server for Yi-34B-Chat using a portable Wasm (WebAssembly) file, powered by Rust.
[
Back to top ⬆️ ]
Benchmarks
Chat model performance
The Yi-34B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models on benchmarks including MMLU, CMMLU, BBH, GSM8k, and more.
Evaluation methods and challenges. ⬇️
- Evaluation methods: we evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA.
- Zero-shot vs. few-shot: in chat models, the zero-shot approach is more commonly employed.
- Evaluation strategy: our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples). We then isolate relevant answers from the generated text.
- Challenges faced: some models are not well-suited to produce output in the specific format required by instructions in a few datasets, which leads to suboptimal results.
*: C-Eval results are evaluated on the validation datasets.
Base model performance
Yi-34B and Yi-34B-200K
The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more.

Evaluation methods. ⬇️
- Disparity in results: while benchmarking open-source models, a disparity has been noted between results from our pipeline and those reported by public sources like OpenCompass.
- Investigation findings: a deeper investigation reveals that variations in prompts, post-processing strategies, and sampling techniques across models may lead to significant outcome differences.
- Uniform benchmarking process: our methodology aligns with the original benchmarks: consistent prompts and post-processing strategies are used, and greedy decoding is applied during evaluations without any post-processing of the generated content.
- Efforts to retrieve unreported scores: for scores that were not reported by the original authors (including scores reported with different settings), we try to get results with our pipeline.
- Extensive model evaluation: to evaluate the model's capability extensively, we adopted the methodology outlined in Llama 2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common-sense reasoning. SQuAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension.
- Special configurations: CSQA was exclusively tested using a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category "Math & Code".
- Falcon-180B caveat: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average of the other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated.
Yi-9B
Yi-9B is almost the best among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5, and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension.

- In terms of overall ability (Mean-All), Yi-9B performs the best among similarly sized open-source models, surpassing DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B, and Gemma-7B.

- In terms of coding ability (Mean-Code), Yi-9B's performance is second only to DeepSeek-Coder-7B, surpassing Yi-34B, SOLAR-10.7B, Mistral-7B, and Gemma-7B.

- In terms of math ability (Mean-Math), Yi-9B's performance is second only to DeepSeek-Math-7B, surpassing SOLAR-10.7B, Mistral-7B, and Gemma-7B.

- In terms of common sense and reasoning ability (Mean-Text), Yi-9B's performance is on par with Mistral-7B, SOLAR-10.7B, and Gemma-7B.

[
Back to top ⬆️ ]
Who can use Yi?
Everyone! ✅
[
Back to top ⬆️ ]
Misc.
Acknowledgments
A heartfelt thank you to each of you who have made contributions to the Yi community! You have helped make Yi not just a project, but a vibrant, growing home for innovation.
[
Back to top ⬆️ ]
Disclaimer
We use data compliance checking algorithms during the training process to ensure the compliance of the trained model to the best of our ability. Due to complex data and the diversity of language model usage scenarios, we cannot guarantee that the model will generate correct and reasonable output in all scenarios. Please be aware that there is still a risk of the model producing problematic outputs. We will not be responsible for any risks and issues resulting from misuse, misguidance, illegal usage, related misinformation, or any associated data security concerns.
[
Back to top ⬆️ ]
License
The source code in this repo is licensed under the Apache 2.0 license. The Yi series models are fully open for academic research and free for commercial use, with automatic permission granted upon application. All usage must adhere to the Yi Series Models Community License Agreement 2.1.
For free commercial use, you only need to send an email to get official commercial permission.
[
Back to top ⬆️ ]