SeaLLM-7B-chat

Anonymous user, July 31, 2024

Technical Information

Open-source repository
https://modelscope.cn/models/AI-ModelScope/SeaLLM-7B-chat
License
other

Project Details

SeaLLMs - Large Language Models for Southeast Asia

Tech Memo &nbsp;&nbsp; DEMO &nbsp;&nbsp; Github &nbsp;&nbsp; Technical Report

SeaLLM-chat-7B

This is the 7B Chat version of SeaLLMs. It supports Vietnamese, Indonesian, Thai, Malay, Khmer, Lao, Tagalog and Burmese. It may have lower capability than the 13B models, but it is much more memory-efficient and faster.

Visit our Technical Report and Tech Memo for more details.

Terms of Use and License: By using our released weights, codes, and demos, you agree to and comply with the terms and conditions specified in our SeaLLMs Terms Of Use.

Disclaimer: We must note that even though the weights, codes, and demos are released in an open manner, similar to other pre-trained language models, and despite our best efforts in red teaming and safety fine-tuning and enforcement, our models come with potential risks, including but not limited to inaccurate, misleading or potentially harmful generation. Developers and stakeholders should perform their own red teaming and provide related security measures before deployment, and they must abide by and comply with local governance and regulations. In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, codes, or demos.

The logo was generated by DALL-E 3.

Example code:

SeaLLM models work the same way as Llama-2, so the Llama-2 generation codebase should be sufficient to run them. However, as this is a chat model, you should wrap the prompt/instruction using the following format function.

You should also turn off `add_special_tokens` with `tokenizer.add_special_tokens = False`.

from modelscope import AutoTokenizer, Model
from modelscope import snapshot_download
import torch
from typing import List, Tuple

BOS_TOKEN = '<s>'
EOS_TOKEN = '</s>'

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

SYSTEM_PROMPT = """You are a multilingual, helpful, respectful and honest assistant. \
Please always answer as helpfully as possible, while being safe. Your \
answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure \
that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not \
correct. If you don't know the answer to a question, please don't share false information.

As a multilingual assistant, you must respond and follow instructions in the native language of the user by default, unless told otherwise. \
Your response should adapt to the norms and customs of the respective language and culture.
"""

def chat_multiturn_seq_format(
    message: str,
    history: List[Tuple[str, str]] = [],
):
    """
    ```
        <bos>[INST] B_SYS SystemPrompt E_SYS Prompt [/INST] Answer <eos>
        <bos>[INST] Prompt [/INST] Answer <eos>
        <bos>[INST] Prompt [/INST]
    ```
    As the format auto-adds <bos>, please turn off add_special_tokens with `tokenizer.add_special_tokens = False`.
    Inputs:
      message: the current prompt
      history: list of tuples recording the previous conversation: [(message1, response1), (message2, response2)]
    Outputs:
      full_prompt: the prompt that should go into the chat model

    e.g.:
      full_prompt = chat_multiturn_seq_format("Hello world")
      output = model.generate(tokenizer.encode(full_prompt, add_special_tokens=False), ...)
    """
    text = ''
    for i, (prompt, res) in enumerate(history):
        if i == 0:
            # The first turn carries the system prompt.
            text += f"{BOS_TOKEN}{B_INST} {B_SYS} {SYSTEM_PROMPT} {E_SYS} {prompt} {E_INST}"
        else:
            text += f"{BOS_TOKEN}{B_INST} {prompt} {E_INST}"
        if res is not None:
            text += f" {res} {EOS_TOKEN} "
    if len(history) == 0 or text.strip() == '':
        text = f"{BOS_TOKEN}{B_INST} {B_SYS} {SYSTEM_PROMPT} {E_SYS} {message} {E_INST}"
    else:
        text += f"{BOS_TOKEN}{B_INST} {message} {E_INST}"
    return text

local_dir = snapshot_download("AI-ModelScope/SeaLLM-7B-chat", revision='master')

model = Model.from_pretrained(local_dir, revision='master', device_map='auto', torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(local_dir, revision='master')

full_prompt = chat_multiturn_seq_format("Hello world")
inputs = tokenizer(full_prompt, add_special_tokens=False, return_tensors="pt")
# Generate
generate_ids = model.generate(inputs.input_ids.to(model.device), max_length=512, do_sample=True, top_k=10, num_return_sequences=1)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
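To see exactly what string the model receives in a multi-turn conversation, here is a dependency-free sketch that reproduces the formatting helper above (with a shortened system prompt for readability) and prints the assembled prompt. The example questions and answers are illustrative only, not from the model:

```python
# Minimal sketch of the multi-turn prompt layout, runnable without loading the model.
from typing import List, Tuple

BOS_TOKEN, EOS_TOKEN = '<s>', '</s>'
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
# Shortened stand-in for the full SYSTEM_PROMPT above.
SYSTEM_PROMPT = "You are a multilingual, helpful, respectful and honest assistant."

def chat_multiturn_seq_format(message: str, history: List[Tuple[str, str]] = []):
    text = ''
    for i, (prompt, res) in enumerate(history):
        if i == 0:
            # Only the first turn carries the system prompt.
            text += f"{BOS_TOKEN}{B_INST} {B_SYS} {SYSTEM_PROMPT} {E_SYS} {prompt} {E_INST}"
        else:
            text += f"{BOS_TOKEN}{B_INST} {prompt} {E_INST}"
        if res is not None:
            # Completed turns are closed with the assistant answer and <eos>.
            text += f" {res} {EOS_TOKEN} "
    if len(history) == 0 or text.strip() == '':
        text = f"{BOS_TOKEN}{B_INST} {B_SYS} {SYSTEM_PROMPT} {E_SYS} {message} {E_INST}"
    else:
        # The new message is left open, ending at [/INST], for the model to continue.
        text += f"{BOS_TOKEN}{B_INST} {message} {E_INST}"
    return text

# One completed exchange in history, then a new user message.
full_prompt = chat_multiturn_seq_format(
    "What is its capital?",
    history=[("Where is Vietnam?", "Vietnam is in Southeast Asia.")],
)
print(full_prompt)
```

Note that each turn gets its own `<s>` marker, which is why the tokenizer must not add another one.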

Citation

If you find our project useful, we hope you would kindly star our repo and cite our work as follows. Corresponding Author: l.bing@alibaba-inc.com

@article{damonlpsg2023seallm,
  author = {Xuan-Phi Nguyen*, Wenxuan Zhang*, Xin Li*, Mahani Aljunied*,
            Qingyu Tan, Liying Cheng, Guanzheng Chen, Yue Deng, Sen Yang,
            Chaoqun Liu, Hang Zhang, Lidong Bing},
  title = {SeaLLMs - Large Language Models for Southeast Asia},
  year = 2023,
  Eprint = {arXiv:2312.00738},
}

