Model Download | Evaluation Results | Model Architecture | API Platform | License | Citation
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
1. Introduction
Today, we're introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times.

We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach, as DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation. Due to the constraints of HuggingFace, the open-source code currently runs slower on GPUs than our internal codebase. To facilitate efficient execution of our model, we offer a dedicated vLLM solution that optimizes performance. For more evaluation details, such as few-shot settings and prompts, please check our paper.
2. Model Downloads
3. Evaluation Results
Base Model
Standard Benchmark
Context Window
Evaluation results on the Needle In A Haystack (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to 128K.
Chat Model
Standard Benchmark
English Open Ended Generation Evaluation
We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.
Chinese Open Ended Generation Evaluation
Coding Benchmarks
We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency in LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.
4. Model Architecture
DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:

- For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference.
- For Feed-Forward Networks (FFNs), we adopt the DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.

A conceptual sketch of the key-value compression idea follows.
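As a minimal PyTorch sketch of the low-rank key-value compression at the heart of MLA: the hidden state is down-projected into a small shared latent, only that latent is cached, and keys and values are reconstructed from it at attention time. The class name and all dimensions below are illustrative assumptions, not DeepSeek-V2's actual implementation or sizes.

import torch.nn as nn

class LowRankKVCompression(nn.Module):
    # Illustrative sketch only: compress the hidden state into a small
    # shared latent, cache the latent instead of full per-head K/V, and
    # reconstruct keys and values from it by up-projection.
    def __init__(self, d_model=1024, d_latent=128, n_heads=16, d_head=64):
        super().__init__()
        self.down_kv = nn.Linear(d_model, d_latent, bias=False)        # compression (down-projection)
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # key reconstruction
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # value reconstruction

    def forward(self, hidden_states):
        # hidden_states: [batch, seq_len, d_model]
        latent = self.down_kv(hidden_states)  # [batch, seq_len, d_latent]; this is what gets cached
        keys = self.up_k(latent)              # [batch, seq_len, n_heads * d_head]
        values = self.up_v(latent)            # [batch, seq_len, n_heads * d_head]
        return latent, keys, values

Caching d_latent numbers per token instead of 2 * n_heads * d_head is where the large KV-cache reduction comes from.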
5. Chat Website
You can chat with DeepSeek-V2 on DeepSeek's official website: chat.deepseek.com
6. API Platform
We also provide an OpenAI-Compatible API at DeepSeek Platform: platform.deepseek.com. Sign up to get millions of free tokens, or pay-as-you-go at an unbeatable price.
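Because the endpoint is OpenAI-compatible, a standard OpenAI SDK client should work against it. Below is a minimal sketch; the base URL and model identifier are assumptions, so check platform.deepseek.com for the current values.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",                # issued on platform.deepseek.com
    base_url="https://api.deepseek.com",   # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                 # assumed identifier for the V2 chat model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)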
7. How to run locally
Inference with Huggingface's Transformers

You can directly employ Huggingface's Transformers for model inference.
Text Completion
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16, max_memory=max_memory)
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
Chat Completion
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto", torch_dtype=torch.bfloat16, max_memory=max_memory)
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
The complete chat template can be found within tokenizer_config.json located in the huggingface model repository. An example of the chat template is as follows:

<|begin▁of▁sentence|>User: {user_message_1}
Assistant: {assistant_message_1}<|end▁of▁sentence|>User: {user_message_2}
Assistant:
You can also add an optional system message:

<|begin▁of▁sentence|>{system_message}
User: {user_message_1}
Assistant: {assistant_message_1}<|end▁of▁sentence|>User: {user_message_2}
Assistant:
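Rather than assembling these strings by hand, you can let the tokenizer render the template. A minimal sketch, assuming the chat model's tokenizer ships the template above in its tokenizer_config.json:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V2-Chat", trust_remote_code=True)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # the optional system message
    {"role": "user", "content": "Who are you?"},
]
# tokenize=False returns the rendered prompt string rather than token ids,
# which is handy for inspecting how messages map onto the template.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)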
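The introduction mentions a dedicated vLLM solution for running the model efficiently. As a rough sketch of what that could look like, assuming a vLLM build with DeepSeek-V2 support (the parallelism and length settings below are illustrative and should match your hardware):

from vllm import LLM, SamplingParams

# Shards the model across 8 GPUs; max_model_len caps the context to
# bound KV-cache memory. Both values are assumptions for this sketch.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Chat",
    trust_remote_code=True,
    tensor_parallel_size=8,
    max_model_len=8192,
)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256)

outputs = llm.generate(["Write a piece of quicksort code in C++"], sampling_params)
print(outputs[0].outputs[0].text)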
8. License

This code repository is licensed under the MIT License. The use of DeepSeek-V2 Base/Chat models is subject to the Model License. The DeepSeek-V2 series (including Base and Chat) supports commercial use.
9. Citation
@misc{deepseek-v2,
  author = {DeepSeek-AI},
  title = {DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model},
  year = {2024},
  note = {GitHub repository},
  url = {https://github.com/deepseek-ai/deepseek-v2}
}
10. Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.