Tech Memo &nbsp;&nbsp; DEMO &nbsp;&nbsp; GitHub &nbsp;&nbsp; Technical Report
Visit our Technical Report and Tech Memo for more details. The logo was generated by DALL-E 3. SeaLLM models work the same way as Llama-2, so the Llama-2 generation codebase should be sufficient to run them.
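Since the architecture is Llama-2-compatible, any stack that runs Llama-2 should load SeaLLM unchanged. Below is a minimal loading sketch using Hugging Face transformers; the `SeaLLMs/SeaLLM-7B-chat` hub id is an assumption here, so substitute your own checkpoint path if it differs.

```python
# Minimal Llama-2-style loading sketch (hub id "SeaLLMs/SeaLLM-7B-chat" is an assumption).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM-7B-chat")
model = AutoModelForCausalLM.from_pretrained(
    "SeaLLMs/SeaLLM-7B-chat",
    torch_dtype=torch.float16,   # half precision to fit a single GPU
    device_map="auto",           # requires the accelerate package
)
```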
However, as this is a chat model, you should wrap the prompt/instruction using the format function below. You should also turn off add_special_tokens with `tokenizer.add_special_tokens = False`. If you find our project useful, we hope you will kindly star our repo and cite our work as follows. Corresponding Author: l.bing@alibaba-inc.com
SeaLLMs - Large Language Models for Southeast Asia
SeaLLM-chat-7B
Example code:
tokenizer.add_special_tokens = False
from modelscope import AutoTokenizer, Model
from modelscope import snapshot_download
import torch
from typing import List, Tuple

BOS_TOKEN = '<s>'
EOS_TOKEN = '</s>'
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

SYSTEM_PROMPT = """You are a multilingual, helpful, respectful and honest assistant. \
Please always answer as helpfully as possible, while being safe. Your \
answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure \
that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not \
correct. If you don't know the answer to a question, please don't share false information.
As a multilingual assistant, you must respond and follow instructions in the native language of the user by default, unless told otherwise. \
Your response should adapt to the norms and customs of the respective language and culture.
"""
def chat_multiturn_seq_format(
    message: str,
    history: List[Tuple[str, str]] = [],
):
    """
    Format a (multi-turn) conversation into the Llama-2 chat template:
    ```
    <bos>[INST] B_SYS SystemPrompt E_SYS Prompt [/INST] Answer <eos>
    <bos>[INST] Prompt [/INST] Answer <eos>
    <bos>[INST] Prompt [/INST]
    ```
    As the format auto-adds <bos>, please turn off add_special_tokens with `tokenizer.add_special_tokens = False`.
    Inputs:
        message: the current prompt
        history: list of tuples of previous conversation turns: [(message1, response1), (message2, response2)]
    Outputs:
        full_prompt: the prompt that should go into the chat model
    e.g.:
        full_prompt = chat_multiturn_seq_format("Hello world")
        output = model.generate(tokenizer.encode(full_prompt, add_special_tokens=False), ...)
    """
    text = ''
    for i, (prompt, res) in enumerate(history):
        if i == 0:
            # The first turn carries the system prompt.
            text += f"{BOS_TOKEN}{B_INST} {B_SYS} {SYSTEM_PROMPT} {E_SYS} {prompt} {E_INST}"
        else:
            text += f"{BOS_TOKEN}{B_INST} {prompt} {E_INST}"
        if res is not None:
            text += f" {res} {EOS_TOKEN} "
    if len(history) == 0 or text.strip() == '':
        # No history: a single turn carrying the system prompt.
        text = f"{BOS_TOKEN}{B_INST} {B_SYS} {SYSTEM_PROMPT} {E_SYS} {message} {E_INST}"
    else:
        text += f"{BOS_TOKEN}{B_INST} {message} {E_INST}"
    return text
# Download the checkpoint from ModelScope and load the model and tokenizer.
local_dir = snapshot_download("AI-ModelScope/SeaLLM-7B-chat", revision='master')
model = Model.from_pretrained(local_dir, revision='master', device_map='auto', torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(local_dir, revision='master')

# Build the chat-formatted prompt; add_special_tokens=False since <s> is added by the template.
full_prompt = chat_multiturn_seq_format("Hello world")
inputs = tokenizer(full_prompt, add_special_tokens=False, return_tensors="pt")

# Generate
generate_ids = model.generate(inputs.input_ids.to(model.device), max_length=512, do_sample=True, top_k=10, num_return_sequences=1)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
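For multi-turn chat, pass the earlier (prompt, response) pairs as `history` so that each past turn is re-wrapped in the same `[INST] ... [/INST]` template. A short sketch continuing the script above; the example conversation is purely illustrative.

```python
# Hypothetical multi-turn continuation: previous turns go into `history`.
history = [
    ("Hello, who are you?", "I am SeaLLM, a multilingual assistant for Southeast Asia."),
]
full_prompt = chat_multiturn_seq_format("Can you repeat that in Vietnamese?", history=history)
inputs = tokenizer(full_prompt, add_special_tokens=False, return_tensors="pt")
generate_ids = model.generate(inputs.input_ids.to(model.device), max_length=512, do_sample=True, top_k=10)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0])
```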
Citation
@article{damonlpsg2023seallm,
  author = {Xuan-Phi Nguyen*, Wenxuan Zhang*, Xin Li*, Mahani Aljunied*,
            Qingyu Tan, Liying Cheng, Guanzheng Chen, Yue Deng, Sen Yang,
            Chaoqun Liu, Hang Zhang, Lidong Bing},
  title = {SeaLLMs - Large Language Models for Southeast Asia},
  year = 2023,
  Eprint = {arXiv:2312.00738},
}