Yi-9B-200K-GPTQ


Technical Information

Repository
https://modelscope.cn/models/mirror013/Yi-9B-200K-GPTQ
License
other

Details

Building the Next Generation of Open-Source and Bilingual LLMs

Hugging Face • ModelScope • ✡️ WiseModel

Join us on WeChat (Chinese)!

GPTQ quantized version of the Yi-9B-200K model.
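
A minimal loading sketch is shown below (assuming transformers >= 4.35.0 with the auto-gptq/optimum runtime installed, and that the weights have been downloaded locally; the path and prompt are placeholders):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = '<your-model-path>'  # e.g., a local download of mirror013/Yi-9B-200K-GPTQ

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    # Since transformers 4.35.0, GPTQ checkpoints load directly via AutoModelForCausalLM.
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",
        torch_dtype="auto",
    ).eval()

    inputs = tokenizer("There's a place where time stands still.", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))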


Table of Contents


What is Yi?

Introduction

  • The Yi series models are the next generation of open-source large language models trained from scratch by 01.AI.

  • Targeted as a bilingual language model and trained on a 3T multilingual corpus, the Yi series models have become one of the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example,

  • Yi-34B-Chat model landed in second place (following GPT-4 Turbo), outperforming other LLMs (such as GPT-4, Mixtral, Claude) on the AlpacaEval Leaderboard (based on data available up to January 2024).

  • Yi-34B model ranked first among all existing open-source models (such as Falcon-180B, Llama-70B, Claude) in both English and Chinese on various benchmarks, including the Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval (based on data available up to November 2023).

  • (Credits to Llama) Thanks to the Transformer and Llama open-source communities, as they reduce the effort required to build from scratch and enable the utilization of the same tools within the AI ecosystem.

    If you're interested in Yi's adoption of the Llama architecture and license usage policy, see Yi's relation with Llama. ⬇️

    TL;DR

    The Yi series models adopt the same model architecture as Llama but are NOT derivatives of Llama.

    • Both Yi and Llama are based on the Transformer structure, which has been the standard architecture for large language models since 2018.

    • Grounded in the Transformer architecture, Llama has become a new cornerstone for the majority of state-of-the-art open-source models due to its excellent stability, reliable convergence, and robust compatibility. This positions Llama as the recognized foundational framework for models including Yi.

    • Thanks to the Transformer and Llama architectures, other models can leverage their power, reducing the effort required to build from scratch and enabling the utilization of the same tools within their ecosystems.

    • However, the Yi series models are NOT derivatives of Llama, as they do not use Llama's weights.

    • As Llama's structure is employed by the majority of open-source models, the key factors determining model performance are training datasets, training pipelines, and training infrastructure.

    • Developing in a unique and proprietary way, Yi has independently created its own high-quality training datasets, efficient training pipelines, and robust training infrastructure entirely from the ground up. This effort has led to excellent performance, with Yi series models ranking just behind GPT-4 and surpassing Llama on the Alpaca Leaderboard in Dec 2023.

[ Back to top ⬆️ ]

News

2024-03-07: The long-text capability of the Yi-34B-200K has been enhanced.
In the "Needle-in-a-Haystack" test, the Yi-34B-200K's performance is improved by 10.5%, rising from 89.3% to an impressive 99.8%. We continue to pretrain the model on a 5B-token long-context data mixture and demonstrate a near-all-green performance.

2024-03-06: The Yi-9B is open-sourced and available to the public.
Yi-9B stands out as the top performer among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5, and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension.

2024-01-23: The Yi-VL models, Yi-VL-34B (https://huggingface.co/01-ai/Yi-VL-34B) and Yi-VL-6B (https://huggingface.co/01-ai/Yi-VL-6B), are open-sourced and available to the public.
Yi-VL-34B has ranked first among all existing open-source models in the latest benchmarks, including MMMU and CMMMU (based on data available up to January 2024).

2023-11-23: Chat models are open-sourced and available to the public.
This release contains two chat models based on previously released base models, two 8-bit models quantized by GPTQ, and two 4-bit models quantized by AWQ.

  • Yi-34B-Chat
  • Yi-34B-Chat-4bits
  • Yi-34B-Chat-8bits
  • Yi-6B-Chat
  • Yi-6B-Chat-4bits
  • Yi-6B-Chat-8bits

You can try some of them interactively at:

2023-11-23: The Yi Series Models Community License Agreement is updated to v2.1.

2023-11-08: Invited test of the Yi-34B chat model.
Application form:

2023-11-05: The base models, Yi-6B-200K and Yi-34B-200K, are open-sourced and available to the public.
This release contains two base models with the same parameter sizes as the previous release, except that the context window is extended to 200K.

2023-11-02: The base models, Yi-6B and Yi-34B, are open-sourced and available to the public.
The first public release contains two bilingual (English/Chinese) base models with the parameter sizes of 6B and 34B. Both of them are trained with 4K sequence length and can be extended to 32K during inference time.

[ Back to top ⬆️ ]

Models

Yi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements.

If you want to deploy Yi models, make sure you meet the software and hardware requirements.

Chat models

Model | Download
Yi-34B-Chat | Hugging Face, ModelScope
Yi-34B-Chat-4bits | Hugging Face, ModelScope
Yi-34B-Chat-8bits | Hugging Face, ModelScope
Yi-6B-Chat | Hugging Face, ModelScope
Yi-6B-Chat-4bits | Hugging Face, ModelScope
Yi-6B-Chat-8bits | Hugging Face, ModelScope

- 4-bit series models are quantized by AWQ.
- 8-bit series models are quantized by GPTQ.
- All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., 3090, 4090).

Base models

Model | Download
Yi-34B | Hugging Face, ModelScope
Yi-34B-200K | Hugging Face, ModelScope
Yi-9B | Hugging Face
Yi-6B | Hugging Face, ModelScope
Yi-6B-200K | Hugging Face, ModelScope

- 200K is roughly equivalent to 400,000 Chinese characters.
- If you want to use the previous version of Yi-34B-200K (released on Nov 5, 2023), run git checkout 069cd341d60f4ce4b07ec394e82b79e94f656cf to download the weights.

Model info

  • For chat and base models

Model | Intro | Default context window | Pretrained tokens | Training data date
6B series models | Suitable for personal and academic use. | 4K | 3T | Up to June 2023
9B model | The best at coding and math in the Yi series models. | 4K | Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens. | Up to June 2023
34B series models | Suitable for personal, academic, and commercial purposes (particularly for small and medium-sized enterprises). A cost-effective solution that is affordable and equipped with emergent ability. | 4K | 3T | Up to June 2023

  • For chat models

    For chat model limitations, see the explanations below. ⬇️

    The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher-quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training.

    However, this higher diversity might amplify certain existing issues, including:

  • Hallucination: This refers to the model generating factually incorrect or nonsensical information. With the model's responses being more varied, there is a higher chance of hallucinations that are not based on accurate data or logical reasoning.
  • Non-determinism in re-generation: When attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.
  • Cumulative error: This occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem solving, etc.
  • To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as temperature, top_p, or top_k. These adjustments can help balance creativity and coherence in the model's outputs, as illustrated in the sketch below.
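
A minimal sketch of such an adjustment (assuming a Yi chat model already loaded with transformers as in the quick start; the specific values are illustrative, not official recommendations):

    from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

    model_path = '<your-model-path>'  # placeholder path to a Yi chat model
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype="auto").eval()

    messages = [{"role": "user", "content": "hi"}]
    input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

    # Lower temperature and tighter top_p/top_k trade diversity for coherence.
    gen_config = GenerationConfig(
        do_sample=True,
        temperature=0.6,
        top_p=0.8,
        top_k=40,
        repetition_penalty=1.1,
        max_new_tokens=256,
    )
    output_ids = model.generate(input_ids.to(model.device), generation_config=gen_config)
    print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))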

[ Back to top ⬆️ ]

How to use Yi?

Quick start

Getting up and running with Yi models is simple, with multiple choices available.

Choose your path

Select one of the following paths to begin your journey with Yi!

Quick start - Choose your path

Deploy Yi locally

If you prefer to deploy Yi models locally,

  • and you have sufficient resources (for example, an NVIDIA A800 80GB), you can choose one of the following methods:

  • and you have limited resources (for example, a MacBook Pro), you can use llama.cpp.

Not to deploy Yi locally

If you prefer not to deploy Yi models locally, you can explore Yi's capabilities using any of the following options.

Run Yi with APIs

If you want to explore more features of Yi, you can adopt one of these methods:

Run Yi in playground

If you want to chat with Yi with more customizable options (e.g., system prompt, temperature, repetition penalty, etc.), you can try one of the following options:

Chat with Yi

If you want to chat with Yi, you can use one of these online services, which offer a similar user experience:

  • Yi-34B-Chat (Yi official on Hugging Face)

  • No registration is required.

  • Yi-34B-Chat (Yi official beta)

  • Access is available through a whitelist. Welcome to apply (fill out a form in English or Chinese).

[ Back to top ⬆️ ]

Quick start - pip

This tutorial guides you through every step of running Yi-34B-Chat locally on an A800 (80G) and then performing inference.

Step 0: Prerequisites

Step 1: Prepare your environment

To set up the environment and install the required packages, execute the following command.

git clone https://github.com/01-ai/Yi.git
cd yi
pip install -r requirements.txt

Step 2: Download the Yi model

You can download the weights and tokenizer of Yi models from the following sources:
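
For example, a minimal sketch using huggingface_hub (the repo id below is illustrative; any Yi repository works the same way):

    from huggingface_hub import snapshot_download

    # Download the full model repository to a local directory.
    local_dir = snapshot_download(repo_id="01-ai/Yi-34B-Chat", local_dir="./Yi-34B-Chat")
    print(local_dir)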

Step 3: Perform inference

You can perform inference with Yi chat or base models as below.

Perform inference with Yi chat model
  1. Create a file named quick_start.py and copy the following content to it.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = '<your-model-path>'

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

    # Since transformers 4.35.0, the GPT-Q/AWQ model can be loaded using AutoModelForCausalLM.
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",
        torch_dtype='auto'
    ).eval()

    # Prompt content: "hi"
    messages = [
        {"role": "user", "content": "hi"}
    ]

    input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
    output_ids = model.generate(input_ids.to('cuda'))
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

    # Model response: "Hello! How can I assist you today?"
    print(response)
    
  2. Run quick_start.py.

    python quick_start.py

    Then you can see an output similar to the one below.

    Hello! How can I assist you today?

Perform inference with Yi base model
  python demo/text_generation.py  --model <your-model-path>

Then you can see an output similar to the one below.

Output. ⬇️


Prompt: Let me tell you an interesting story about cat Tom and mouse Jerry,

Generation: Let me tell you an interesting story about cat Tom and mouse Jerry, which happened in my childhood. My father had a big house with two cats living inside it to kill mice. One day when I was playing at home alone, I found one of the tomcats lying on his back near our kitchen door, looking very much like he wanted something from us but couldn't get up because there were too many people around him! He kept trying for several minutes before finally giving up...

  • Yi-9B

    Input

  from transformers import AutoModelForCausalLM, AutoTokenizer

  MODEL_DIR = "01-ai/Yi-9B"
  model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype="auto")
  tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False)

  input_text = "# write the quick sort algorithm"
  inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_length=256)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Output

  # write the quick sort algorithm
  def quick_sort(arr):
      if len(arr) <= 1:
          return arr
      pivot = arr[len(arr) // 2]
      left = [x for x in arr if x < pivot]
      middle = [x for x in arr if x == pivot]
      right = [x for x in arr if x > pivot]
      return quick_sort(left) + middle + quick_sort(right)

  # test the quick sort algorithm
  print(quick_sort([3, 6, 8, 10, 1, 2, 1]))

[ Back to top ⬆️ ]

Quick start - Docker

Run Yi-34B-Chat locally with Docker: a step-by-step guide. ⬇️
This tutorial guides you through every step of running Yi-34B-Chat on an A800 GPU or 4*4090 locally and then performing inference.

Step 0: Prerequisites

Make sure you've installed Docker and nvidia-container-toolkit.

Step 1: Start Docker

docker run -it --gpus all \
-v <your-model-path>:/models \
ghcr.io/01-ai/yi:latest

Alternatively, you can pull the Yi Docker image from registry.lingyiwanwu.com/ci/01-ai/yi:latest.

Step 2: Perform inference

You can perform inference with Yi chat or base models as below.

Perform inference with Yi chat model

The steps are similar to pip - Perform inference with Yi chat model.

Note that the only difference is to set model_path = '<your-model-mount-path>' instead of model_path = '<your-model-path>'.

Perform inference with Yi base model

The steps are similar to pip - Perform inference with Yi base model.

Note that the only difference is to set --model <your-model-mount-path> instead of --model <your-model-path>.

Quick start - conda-lock

You can use conda-lock to generate fully reproducible lock files for conda environments. ⬇️
You can refer to conda-lock.yml for the exact versions of the dependencies. Additionally, you can utilize micromamba for installing these dependencies.
To install the dependencies, follow these steps:

  1. Install micromamba by following the instructions available here.

  2. Execute micromamba install -y -n yi -f conda-lock.yml to create a conda environment named yi and install the necessary dependencies.

Quick start - llama.cpp

Run Yi-chat-6B-2bits locally with llama.cpp: a step-by-step guide. ⬇️
This tutorial guides you through every step of running a quantized model (Yi-chat-6B-2bits) locally and then performing inference.

Step 0: Prerequisites

  • This tutorial assumes you use a MacBook Pro with 16GB of memory and an Apple M2 Pro chip.

  • Make sure git-lfs is installed on your machine.

Step 1: Download llama.cpp

To clone the llama.cpp repository, run the following command.

git clone git@github.com:ggerganov/llama.cpp.git

Step 2: Download the Yi model

2.1 To clone XeIaso/yi-chat-6B-GGUF with just pointers, run the following command.

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/XeIaso/yi-chat-6B-GGUF

2.2 To download a quantized Yi model (yi-chat-6b.Q2_K.gguf), run the following command.

git-lfs pull --include yi-chat-6b.Q2_K.gguf

Step 3: Perform inference

To perform inference with the Yi model, you can use one of the following methods.

Method 1: Perform inference in terminal

To compile llama.cpp using 4 threads and then conduct inference, navigate to the llama.cpp directory, and run the following command.

Tips
  • Replace /Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf with the actual path of your model.

  • By default, the model operates in completion mode.

  • For additional output customization options (for example, system prompt, temperature, repetition penalty, etc.), run ./main -h to check detailed descriptions and usage.

make -j4 && ./main -m /Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf -p "How do you feed your pet fox? Please answer this question in 6 simple steps:\nStep 1:" -n 384 -e

...

How do you feed your pet fox? Please answer this question in 6 simple steps:

Step 1: Select the appropriate food for your pet fox. You should choose high-quality, balanced prey items that are suitable for their unique dietary needs. These could include live or frozen mice, rats, pigeons, or other small mammals, as well as fresh fruits and vegetables.

Step 2: Feed your pet fox once or twice a day, depending on the species and its individual preferences. Always ensure that they have access to fresh water throughout the day.

Step 3: Provide an appropriate environment for your pet fox. Ensure it has a comfortable place to rest, plenty of space to move around, and opportunities to play and exercise.

Step 4: Socialize your pet with other animals if possible. Interactions with other creatures can help them develop social skills and prevent boredom or stress.

Step 5: Regularly check for signs of illness or discomfort in your fox. Be prepared to provide veterinary care as needed, especially for common issues such as parasites, dental health problems, or infections.

Step 6: Educate yourself about the needs of your pet fox and be aware of any potential risks or concerns that could affect their well-being. Regularly consult with a veterinarian to ensure you are providing the best care.

...

Now you have successfully asked a question to the Yi model and got an answer!

Method 2: Perform inference in web
  1. To initialize a lightweight and swift chatbot, run the following command.

    cd llama.cpp
    ./server --ctx-size 2048 --host 0.0.0.0 --n-gpu-layers 64 --model /Users/yu/yi-chat-6B-GGUF/yi-chat-6b.Q2_K.gguf

    Then you can get an output like this:

    ...

    llama_new_context_with_model: n_ctx      = 2048
    llama_new_context_with_model: freq_base  = 5000000.0
    llama_new_context_with_model: freq_scale = 1
    ggml_metal_init: allocating
    ggml_metal_init: found device: Apple M2 Pro
    ggml_metal_init: picking default device: Apple M2 Pro
    ggml_metal_init: ggml.metallib not found, loading from source
    ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
    ggml_metal_init: loading '/Users/yu/llama.cpp/ggml-metal.metal'
    ggml_metal_init: GPU name:   Apple M2 Pro
    ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
    ggml_metal_init: hasUnifiedMemory              = true
    ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
    ggml_metal_init: maxTransferRate               = built-in GPU
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   128.00 MiB, ( 2629.44 / 10922.67)
    llama_new_context_with_model: KV self size  =  128.00 MiB, K (f16):   64.00 MiB, V (f16):   64.00 MiB
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, ( 2629.45 / 10922.67)
    llama_build_graph: non-view tensors processed: 676/676
    llama_new_context_with_model: compute buffer total size = 159.19 MiB
    ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   156.02 MiB, ( 2785.45 / 10922.67)
    Available slots:
    -> Slot 0 - max context: 2048

    llama server listening at http://0.0.0.0:8080

  2. To access the chatbot interface, open your web browser and enter http://0.0.0.0:8080 into the address bar. (You can also query the server's HTTP API directly; see the sketch after this list.)

    Yi model chatbot interface - llama.cpp

  3. Enter a question, such as "How do you feed your pet fox? Please answer this question in 6 simple steps" into the prompt window, and you will receive a corresponding answer.

    Ask a question to Yi model - llama.cpp
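
Besides the browser UI, the llama.cpp server also exposes an HTTP completion endpoint. A minimal sketch (assuming the server above is listening on port 8080; the prompt text and n_predict value are illustrative):

    import requests

    # POST a completion request to the local llama.cpp server started above.
    resp = requests.post(
        "http://0.0.0.0:8080/completion",
        json={
            "prompt": "How do you feed your pet fox? Please answer this question in 6 simple steps:\nStep 1:",
            "n_predict": 256,
        },
    )
    print(resp.json()["content"])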

[ Back to top ⬆️ ]

Web demo

You can build a web UI demo for Yi chat models (note that Yi base models are not supported in this scenario).

Step 1: Prepare your environment.

Step 2: Download the Yi model.

Step 3: To start a web service locally, run the following command.

python demo/web_demo.py -c <your-model-path>

You can access the web UI by entering the address provided in the console into your browser.

Quick start - web demo

[ Back to top ⬆️ ]

Fine-tuning

bash finetune/scripts/run_sft_Yi_6b.sh

Once finished, you can compare the finetuned model and the base model with the following command:

bash finetune/scripts/run_eval.sh

For advanced usage (like fine-tuning based on your custom data), see the explanations below. ⬇️

    Finetune code for Yi 6B and 34B

    Preparation

    From Image

    By default, we use a small dataset from BAAI/COIG to finetune the base model. You can also prepare your customized dataset in the following jsonl format:

    { "prompt": "Human: Who are you? Assistant:", "chosen": "I'm Yi." }
    

    And then mount them in the container to replace the default ones:

    docker run -it \
        -v /path/to/save/finetuned/model/:/finetuned-model \
        -v /path/to/train.jsonl:/yi/finetune/data/train.json \
        -v /path/to/eval.jsonl:/yi/finetune/data/eval.json \
        ghcr.io/01-ai/yi:latest \
        bash finetune/scripts/run_sft_Yi_6b.sh

    From Local Server

    Make sure you have conda. If not, use

    mkdir -p ~/miniconda3
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
    rm -rf ~/miniconda3/miniconda.sh
    ~/miniconda3/bin/conda init bash
    source ~/.bashrc

    Then, create a conda env:

    conda create -n dev_env python=3.10 -y
    conda activate dev_env
    pip install torch==2.0.1 deepspeed==0.10 tensorboard transformers datasets sentencepiece accelerate ray==2.7

    Hardware Setup

    For the Yi-6B model, a node with 4 GPUs, each with GPU memory larger than 60GB, is recommended.

    For the Yi-34B model, because the zero-offload technique consumes a lot of CPU memory, please be careful to limit the number of GPUs used for 34B finetune training. Please use CUDA_VISIBLE_DEVICES to limit the GPU count (as shown in scripts/run_sft_Yi_34b.sh).

    A typical hardware setup for finetuning the 34B model is a node with 8 GPUs (limited to 4 at run time by CUDA_VISIBLE_DEVICES=0,1,2,3), each with GPU memory larger than 80GB, and total CPU memory larger than 900GB.
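
    For example, an illustration of launching the 34B run with the GPU limit applied (the script name follows the repository layout described above):

    cd finetune/scripts
    CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_sft_Yi_34b.sh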

    Quick Start

    Download an LLM-base model to MODEL_PATH (6B and 34B). A typical folder of models is like:

    |-- $MODEL_PATH
    |   |-- config.json
    |   |-- pytorch_model-00001-of-00002.bin
    |   |-- pytorch_model-00002-of-00002.bin
    |   |-- pytorch_model.bin.index.json
    |   |-- tokenizer_config.json
    |   |-- tokenizer.model
    |   |-- ...

    Download a dataset from Hugging Face to local storage DATA_PATH, e.g. Dahoas/rm-static.

    |-- $DATA_PATH
    |   |-- data
    |   |   |-- train-00000-of-00001-2a1df75c6bce91ab.parquet
    |   |   |-- test-00000-of-00001-8c7c51afc6d45980.parquet
    |   |-- dataset_infos.json
    |   |-- README.md

    finetune/yi_example_dataset has example datasets, which are modified from BAAI/COIG:

    |-- $DATA_PATH
        |--data
            |-- train.jsonl
            |-- eval.jsonl

    cd into the scripts folder, copy and paste the script, and run. For example:

    cd finetune/scripts

    bash run_sft_Yi_6b.sh

    For the Yi-6B base model, setting training_debug_steps=20 and num_train_epochs=4 can output a chat model, which takes about 20 minutes.

    For the Yi-34B base model, it takes a relatively long time for initialization. Please be patient.

    Evaluation

    cd finetune/scripts

    bash run_eval.sh

    Then you'll see the answers from both the base model and the finetuned model.

[ Back to top ⬆️ ]

Quantization

GPT-Q

python quantization/gptq/quant_autogptq.py \
  --model /base_model                      \
  --output_dir /quantized_model            \
  --trust_remote_code

Once finished, you can then evaluate the resulting model as follows:

python quantization/gptq/eval_quantized_model.py \
  --model /quantized_model                       \
  --trust_remote_code

For a more detailed explanation, see the explanations below. ⬇️

    GPT-Q quantization

    GPT-Q is a PTQ (Post-Training Quantization) method. It saves memory and provides potential speedups while retaining the accuracy of the model.

    Yi models can be GPT-Q quantized without a lot of effort. We provide a step-by-step tutorial below.

    To run GPT-Q, we will use AutoGPTQ and exllama. Hugging Face transformers has integrated optimum and auto-gptq to perform GPTQ quantization on language models.
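
    For reference, a minimal sketch of that transformers/optimum integration (the calibration dataset name and output paths are illustrative assumptions, not the repo's defaults):

    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    model_path = '/base_model'  # base (unquantized) Yi model
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

    # 4-bit GPTQ quantization; "c4" is used here only as an example calibration dataset.
    gptq_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)

    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",
        quantization_config=gptq_config,
    )
    model.save_pretrained('/quantized_model')
    tokenizer.save_pretrained('/quantized_model')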

    Do Quantization

    The quant_autogptq.py script is provided for you to perform GPT-Q quantization:

    python quant_autogptq.py --model /base_model \
        --output_dir /quantized_model --bits 4 --group_size 128 --trust_remote_code

    Run Quantized Model

    You can run a quantized model using eval_quantized_model.py:

    python eval_quantized_model.py --model /quantized_model --trust_remote_code

AWQ

python quantization/awq/quant_autoawq.py \
  --model /base_model                      \
  --output_dir /quantized_model            \
  --trust_remote_code

Once finished, you can then evaluate the resulting model as follows:

python quantization/awq/eval_quantized_model.py \
  --model /quantized_model                       \
  --trust_remote_code

For detailed explanations, see the explanations below. ⬇️

    AWQ quantization

    AWQ is a PTQ (Post-Training Quantization) method. It is an efficient and accurate low-bit weight quantization (INT3/4) method for LLMs.

    Yi models can be AWQ quantized without a lot of effort. We provide a step-by-step tutorial below.

    To run AWQ, we will use AutoAWQ.
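
    For reference, a minimal AutoAWQ sketch (the 4-bit/group-size-128 settings mirror the script defaults above; the remaining config values and paths are illustrative assumptions):

    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model_path = '/base_model'           # base (unquantized) Yi model
    quant_path = '/quantized_model'

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # 4-bit AWQ quantization with group size 128.
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
    model.quantize(tokenizer, quant_config=quant_config)

    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)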

    Do Quantization

    The quant_autoawq.py script is provided for you to perform AWQ quantization:

    python quant_autoawq.py --model /base_model \
        --output_dir /quantized_model --bits 4 --group_size 128 --trust_remote_code

    Run Quantized Model

    You can run a quantized model using eval_quantized_model.py:

    python eval_quantized_model.py --model /quantized_model --trust_remote_code
    

[ Back to top ⬆️ ]

Deployment

If you want to deploy Yi models, make sure you meet the software and hardware requirements.

Software requirements

Before using Yi quantized models, make sure you've installed the correct software listed below.

Model | Software
Yi 4-bit quantized models | AWQ and CUDA
Yi 8-bit quantized models | GPTQ and CUDA
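
For example, the corresponding runtimes can typically be installed with pip (package names refer to the AutoAWQ and AutoGPTQ projects; verify version compatibility with your CUDA setup):

pip install autoawq              # AWQ runtime for the 4-bit models
pip install auto-gptq optimum    # GPTQ runtime for the 8-bit models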

Hardware requirements

Before deploying Yi in your environment, make sure your hardware meets the following requirements.

Chat models

Model | Minimum VRAM | Recommended GPU example
Yi-6B-Chat | 15 GB | 1 x RTX 3090 (24 GB), 1 x RTX 4090 (24 GB), 1 x A10 (24 GB), 1 x A30 (24 GB)
Yi-6B-Chat-4bits | 4 GB | 1 x RTX 3060 (12 GB), 1 x RTX 4060 (8 GB)
Yi-6B-Chat-8bits | 8 GB | 1 x RTX 3070 (8 GB), 1 x RTX 4060 (8 GB)
Yi-34B-Chat | 72 GB | 4 x RTX 4090 (24 GB), 1 x A800 (80 GB)
Yi-34B-Chat-4bits | 20 GB | 1 x RTX 3090 (24 GB), 1 x RTX 4090 (24 GB), 1 x A10 (24 GB), 1 x A30 (24 GB), 1 x A100 (40 GB)
Yi-34B-Chat-8bits | 38 GB | 2 x RTX 3090 (24 GB), 2 x RTX 4090 (24 GB), 1 x A800 (40 GB)

Below are detailed minimum VRAM requirements under different batch use cases.

Model | batch=1 | batch=4 | batch=16 | batch=32
Yi-6B-Chat | 12 GB | 13 GB | 15 GB | 18 GB
Yi-6B-Chat-4bits | 4 GB | 5 GB | 7 GB | 10 GB
Yi-6B-Chat-8bits | 7 GB | 8 GB | 10 GB | 14 GB
Yi-34B-Chat | 65 GB | 68 GB | 76 GB | > 80 GB
Yi-34B-Chat-4bits | 19 GB | 20 GB | 30 GB | 40 GB
Yi-34B-Chat-8bits | 35 GB | 37 GB | 46 GB | 58 GB

Base models

Model | Minimum VRAM | Recommended GPU example
Yi-6B | 15 GB | 1 x RTX 3090 (24 GB), 1 x RTX 4090 (24 GB), 1 x A10 (24 GB), 1 x A30 (24 GB)
Yi-6B-200K | 50 GB | 1 x A800 (80 GB)
Yi-9B | 20 GB | 1 x RTX 4090 (24 GB)
Yi-34B | 72 GB | 4 x RTX 4090 (24 GB), 1 x A800 (80 GB)
Yi-34B-200K | 200 GB | 4 x A800 (80 GB)

[ Back to top ⬆️ ]

Learning hub

If you want to learn Yi, you can find a wealth of helpful educational resources here. ⬇️

Welcome to the Yi learning hub!

Whether you're a seasoned developer or a newcomer, you can find a wealth of helpful educational resources to enhance your understanding and skills with Yi models, including insightful blog posts, comprehensive video tutorials, hands-on guides, and more.

The content you find here has been generously contributed by knowledgeable Yi experts and passionate enthusiasts. We extend our heartfelt gratitude for your invaluable contributions!

At the same time, we also warmly invite you to join our collaborative effort by contributing to Yi. If you have already made contributions to Yi, please don't hesitate to showcase your remarkable work in the table below.

With all these resources at your fingertips, you're ready to start your exciting journey with Yi. Happy learning!

Tutorials

English tutorials

Type | Deliverable | Date | Author
Video | Run dolphin-2.2-yi-34b on IoT Devices | 2023-11-30 | Second State
Blog | Running Yi-34B-Chat locally using LlamaEdge | 2023-11-30 | Second State
Video | Install Yi 34B Locally - Chinese English Bilingual LLM | 2023-11-05 | Fahd Mirza
Video | Dolphin Yi 34b - Brand New Foundational Model TESTED | 2023-11-27 | Matthew Berman

Chinese tutorials

Type | Deliverable | Date | Author
Blog | 实测零一万物Yi-VL多模态语言模型:能准确“识图吃瓜” | 2024-02-02 | 苏洋
Blog | 本地运行零一万物 34B 大模型,使用 Llama.cpp & 21G 显存 | 2023-11-26 | 苏洋
Blog | 零一万物模型折腾笔记:官方 Yi-34B 模型基础使用 | 2023-12-10 | 苏洋
Blog | CPU 混合推理,非常见大模型量化方案:“二三五六” 位量化方案 | 2023-12-12 | 苏洋
Blog | 单卡 3 小时训练 Yi-6B 大模型 Agent:基于 Llama Factory 实战 | 2024-01-22 | 郑耀威
Blog | 零一万物开源Yi-VL多模态大模型,魔搭社区推理&微调最佳实践来啦! | 2024-01-26 | ModelScope
Video | 只需 24G 显存,用 vllm 跑起来 Yi-34B 中英双语大模型 | 2023-12-28 | 漆妮妮
Video | Yi-VL-34B 多模态大模型 - 用两张 A40 显卡跑起来 | 2023-01-28 | 漆妮妮

Why Yi?

Ecosystem

Yi has a comprehensive ecosystem, offering a range of tools, services, and models to enrich your experience and maximize productivity.

Upstream

The Yi series models follow the same model architecture as Llama. By choosing Yi, you can leverage existing tools, libraries, and resources within the Llama ecosystem, eliminating the need to create new tools and enhancing development efficiency.

For example, the Yi series models are saved in the format of the Llama model. You can directly use LlamaForCausalLM and LlamaTokenizer to load the model. For more information, see Use the chat model.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34b", use_fast=False)

model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-34b", device_map="auto")
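
Since the checkpoints are in Llama format, the Llama-specific classes mentioned above work as well; a minimal sketch (same repo id as the example above):

    from transformers import LlamaForCausalLM, LlamaTokenizer

    # The Yi weights are stored in Llama format, so the Llama classes load them directly.
    tokenizer = LlamaTokenizer.from_pretrained("01-ai/Yi-34b")
    model = LlamaForCausalLM.from_pretrained("01-ai/Yi-34b", device_map="auto")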

[ Back to top ⬆️ ]

Downstream

Tip

  • Feel free to create a PR and share the fantastic work you've built using the Yi series models.

  • To help others quickly understand your work, it is recommended to use the format of <model-name>: <model-intro> + <model-highlights>.

Serving

If you want to get up and running with Yi in a few minutes, you can use the following services built upon Yi.

  • Yi-34B-Chat: you can chat with Yi using one of the following platforms:

  • Yi-34B-Chat | Hugging Face

  • Yi-34B-Chat | Yi Platform: Note that currently it's available through a whitelist. Welcome to apply (fill out a form in English or Chinese) and experience it firsthand!

  • Yi-6B-Chat (Replicate): you can use this model with more options by setting additional parameters and calling APIs.

  • ScaleLLM: you can use this service to run Yi models locally with added flexibility and customization.

Quantization

If you have limited computational capabilities, you can use Yi's quantized models as follows.

These quantized models have reduced precision but offer increased efficiency, such as faster inference speed and smaller RAM usage.

Fine-tuning

If you're seeking to explore the diverse capabilities within Yi's thriving family, you can delve into Yi's fine-tuned models as below.

API

  • amazing-openai-api: this tool converts Yi model APIs into the OpenAI API format out of the box.
  • LlamaEdge: this tool builds an OpenAI-compatible API server for Yi-34B-Chat using a portable Wasm (WebAssembly) file, powered by Rust (a client-side query sketch follows this list).
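
Once one of these OpenAI-compatible servers is running, it can be queried with the standard OpenAI Python client. A minimal sketch (the base URL, port, API key, and model name are placeholders for whatever your server exposes):

    from openai import OpenAI

    # Point the client at the locally running OpenAI-compatible server.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    response = client.chat.completions.create(
        model="Yi-34B-Chat",
        messages=[{"role": "user", "content": "hi"}],
    )
    print(response.choices[0].message.content)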

[ Back to top ⬆️ ]

Benchmarks

Chat model performance

Yi-34B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models in benchmarks including MMLU, CMMLU, BBH, GSM8k, and more.

Chat model performance

Evaluation methods and challenges. ⬇️

  • Evaluation methods: we evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA.
  • Zero-shot vs. few-shot: in chat models, the zero-shot approach is more commonly employed.
  • Evaluation strategy: our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples). We then isolate relevant answers from the generated text.
  • Challenges faced: some models are not well-suited to produce output in the specific format required by instructions in a few datasets, which leads to suboptimal results.

*: C-Eval results are evaluated on the validation datasets

Base model performance

Yi-34B and Yi-34B-200K

The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more.

Base model performance

Evaluation methods. ⬇️

  • Disparity in results: while benchmarking open-source models, a disparity has been noted between results from our pipeline and those reported by public sources like OpenCompass.
  • Investigation findings: a deeper investigation reveals that variations in prompts, post-processing strategies, and sampling techniques across models may lead to significant outcome differences.
  • Uniform benchmarking process: our methodology aligns with the original benchmarks. Consistent prompts and post-processing strategies are used, and greedy decoding is applied during evaluations without any post-processing for the generated content.
  • Efforts to retrieve unreported scores: for scores that were not reported by the original authors (including scores reported with different settings), we try to get results with our pipeline.
  • Extensive model evaluation: to evaluate the model's capability extensively, we adopted the methodology outlined in Llama 2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common sense reasoning. SQuAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension.
  • Special configurations: CSQA was exclusively tested using a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category "Math & Code".
  • Falcon-180B caveat: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average from other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated.

Yi-9B

Yi-9B is almost the best among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5, and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension.

Yi-9B benchmark - details

  • In terms of overall ability (Mean-All), Yi-9B performs the best among similarly sized open-source models, surpassing DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B, and Gemma-7B.

Yi-9B benchmark - overall

  • In terms of coding ability (Mean-Code), Yi-9B's performance is second only to DeepSeek-Coder-7B, surpassing Yi-34B, SOLAR-10.7B, Mistral-7B, and Gemma-7B.

Yi-9B benchmark - code

  • In terms of math ability (Mean-Math), Yi-9B's performance is second only to DeepSeek-Math-7B, surpassing SOLAR-10.7B, Mistral-7B, and Gemma-7B.

Yi-9B benchmark - math

  • In terms of common sense and reasoning ability (Mean-Text), Yi-9B's performance is on par with Mistral-7B, SOLAR-10.7B, and Gemma-7B.

Yi-9B benchmark - text

[ Back to top ⬆️ ]

Who can use Yi?

Everyone! ✅

[ Back to top ⬆️ ]

Misc.

Acknowledgments

A heartfelt thank you to each of you who have made contributions to the Yi community! You have helped make Yi not just a project, but a vibrant, growing home for innovation.

[ Back to top ⬆️ ]

Disclaimer

We use data compliance checking algorithms during the training process, to ensure the compliance of the trained model to the best of our ability. Due to complex data and the diversity of language model usage scenarios, we cannot guarantee that the model will generate correct and reasonable output in all scenarios. Please be aware that there is still a risk of the model producing problematic outputs. We will not be responsible for any risks and issues resulting from misuse, misguidance, illegal usage, and related misinformation, as well as any associated data security concerns.

[ Back to top ⬆️ ]

License

The source code in this repo is licensed under the Apache 2.0 license. The Yi series models are fully open for academic research and free for commercial use, with automatic permission granted upon application. All usage must adhere to the Yi Series Models Community License Agreement 2.1. For free commercial use, you only need to send an email to get official commercial permission.

[ Back to top ⬆️ ]
