Phi-2 is a Transformer with 2.7 billion parameters. Our model hasn't been fine-tuned through reinforcement learning from human feedback. The intention behind crafting this open-source model is to provide the research community with a non-restricted small model to explore vital safety challenges, such as reducing toxicity, understanding societal biases, enhancing controllability, and more. Phi-2 is intended for research purposes only. Given the nature of the training data, the Phi-2 model is best suited for prompts using the QA format, the chat format, and the code format. You can provide the prompt as a standalone question (see the QA format below), where the model generates the text after ".".
To encourage the model to write more concise answers, you can also try the QA format using "Instruct: \<prompt>\nOutput:", where the model generates the text after "Output:". In the chat format, the model generates the text after the first "Bob:". In the code format, the model generates the text after the comments.

There are four types of execution modes (shown in the Sample Code section). To ensure maximum compatibility, we recommend using the second execution mode (FP16 / CUDA).

Limitations of Phi-2:
Generate Inaccurate Code and Facts: The model may produce incorrect code snippets and statements. Users should treat these outputs as suggestions or starting points, not as definitive or accurate solutions.
Limited Scope for Code: The majority of Phi-2's training data is based in Python and uses common packages such as "typing, math, random, collections, datetime, itertools". If the model generates Python scripts that utilize other packages or scripts in other languages, we strongly recommend users manually verify all API uses.
Unreliable Responses to Instruction: The model has not undergone instruction fine-tuning. As a result, it may struggle or fail to adhere to intricate or nuanced instructions provided by users.
Language Limitations: The model is primarily designed to understand standard English. Informal English, slang, or any other languages might pose challenges to its comprehension, leading to potential misinterpretations or errors in response.
Potential Societal Biases: Phi-2 is not entirely free from societal biases despite efforts in assuring training data safety. There's a possibility it may generate content that mirrors these societal biases, particularly if prompted or instructed to do so. We urge users to be aware of this and to exercise caution and critical thinking when interpreting model outputs.
Toxicity: Despite being trained with carefully selected data, the model can still produce harmful content if explicitly prompted or instructed to do so.
We chose to release the model for research purposes only -- we hope to help the open-source community develop the most effective ways to reduce the toxicity of a model directly after pretraining.
Verbosity: Phi-2, being a base model, often produces irrelevant or extra text and responses following its first answer to user prompts within a single turn. This is due to its training dataset being primarily textbooks, which results in textbook-like responses.

Training details:
Architecture: a Transformer-based model with next-word prediction objective
Context length: 2048 tokens
Dataset size: 250B tokens, a combination of NLP synthetic data created by AOAI GPT-3.5 and filtered web data from Falcon RefinedWeb and SlimPajama, which was assessed by AOAI GPT-4
Training tokens: 1.4T tokens
GPUs: 96xA100-80G
Training time: 14 days

License: The model is licensed under the microsoft-research-license.
Trademarks: This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
Iteded Uses
QA Format:
Write a detailed analogy between mathematics and a lighthouse.
Instruct: Write a detailed analogy between mathematics and a lighthouse.
Output: Mathematics is like a lighthouse. Just as a lighthouse guides ships safely to shore, mathematics provides a guiding light in the world of numbers and logic. It helps us navigate through complex problems and find solutions. Just as a lighthouse emits a steady beam of light, mathematics provides a consistent framework for reasoning and problem-solving. It illuminates the path to understanding and helps us make sense of the world around us.
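As an illustration, the Instruct/Output framing can be assembled programmatically before tokenization. A minimal sketch — the helper name `build_instruct_prompt` is ours, not part of the model card:

```python
def build_instruct_prompt(instruction: str) -> str:
    # Concise-QA framing for Phi-2: the model completes after "Output:"
    return f"Instruct: {instruction}\nOutput:"

prompt = build_instruct_prompt("Write a detailed analogy between mathematics and a lighthouse.")
```

The resulting `prompt` string can then be passed to the tokenizer as in the sample code below.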
Chat Format:
Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?
Bob: Well, have you tried creating a study schedule and sticking to it?
Alice: Yes, I have, but it doesn't seem to help much.
Bob: Hmm, maybe you should try studying in a quiet environment, like the library.
Alice: ...
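Because Phi-2 is a base model, it often keeps writing extra turns after its first reply (see the Verbosity limitation). A minimal post-processing sketch, assuming the completion begins right after the first "Bob:" — the helper name `first_reply` is ours:

```python
def first_reply(completion: str, stop_markers=("\nAlice:", "\nBob:")) -> str:
    # Keep only the text up to the next turn marker, discarding any extra
    # turns the base model may have generated.
    cut = len(completion)
    for marker in stop_markers:
        idx = completion.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut].strip()
```

For example, `first_reply(" Try the library.\nAlice: Thanks!")` keeps only the first answer.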
Code Format:
import math

def print_prime(n):
    """
    Print all primes between 1 and n
    """
    primes = []
    for num in range(2, n + 1):
        is_prime = True
        for i in range(2, int(math.sqrt(num)) + 1):
            if num % i == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(num)
    print(primes)
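The function above prints its result; for quick verification it can help to return the list instead. A variant with the same trial-division logic — the name `primes_up_to` is ours:

```python
import math

def primes_up_to(n: int) -> list:
    # Same logic as print_prime, but returning the primes instead of printing.
    primes = []
    for num in range(2, n + 1):
        if all(num % i != 0 for i in range(2, int(math.sqrt(num)) + 1)):
            primes.append(num)
    return primes
```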
Sample Code
If you are using transformers>=4.36.0, always load the model with trust_remote_code=True to prevent side effects.
# FP16 / Flash-Attention / CUDA:
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", flash_attn=True, flash_rotary=True, fused_dense=True, device_map="cuda", trust_remote_code=True)
# FP16 / CUDA (recommended):
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", device_map="cuda", trust_remote_code=True)
# FP32 / CUDA:
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float32, device_map="cuda", trust_remote_code=True)
# FP32 / CPU:
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float32, device_map="cpu", trust_remote_code=True)
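To choose between the recommended second mode (FP16 / CUDA) and the FP32 / CPU fallback at runtime, one option is to compute the `from_pretrained` keyword arguments up front. A sketch, assuming transformers accepts `torch_dtype` as a string; the helper name `pick_execution_mode` is ours:

```python
def pick_execution_mode(cuda_available: bool) -> dict:
    # Keyword arguments for AutoModelForCausalLM.from_pretrained.
    if cuda_available:
        # Second mode: FP16 / CUDA (recommended for maximum compatibility).
        return {"torch_dtype": "auto", "device_map": "cuda"}
    # Fourth mode: FP32 / CPU.
    return {"torch_dtype": "float32", "device_map": "cpu"}
```

Usage would look like `AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True, **pick_execution_mode(torch.cuda.is_available()))`.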
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
In the generation function, our model currently does not support beam search (num_beams > 1).
Furthermore, in the forward pass of the model, we currently do not support outputting hidden states or attention values, or using custom input embeddings.