Using llama.cpp pull request 7225 for quantization.

Original model: https://huggingface.co/microsoft/Phi-3-medium-128k-instruct

All quants made using imatrix option with dataset from here

First, make sure you have huggingface-cli installed:

Then, you can target the specific file you want:

If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:

You can either specify a new local-dir (Phi-3-medium-128k-instruct-Q8_0) or download them all in place (./)

A great write up with charts showing various performances is provided by Artefact2 here

The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.

If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.

If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.

Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.

If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.

If you want to get more into the weeds, you can check out this extremely useful feature chart:

But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.

These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalent, so speed vs performance is a tradeoff you'll have to decide.

The I-quants are not compatible with Vulkan, which is also AMD, so if you have an AMD card double check if you're using the rocBLAS build or the Vulkan build. At the time of writing this, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
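The I-quant versus K-quant guidance above can be condensed into a small decision helper. This is only a sketch of the rules of thumb stated in this section. The function name `suggest_quant_family`, the backend labels, and the `target_bits` parameter are illustrative assumptions and are not part of llama.cpp.

```python
def suggest_quant_family(backend: str, target_bits: int) -> str:
    """Return "I-quant" or "K-quant" following the rules of thumb in the text.

    backend: one of "cublas", "rocblas", "vulkan", "metal", "cpu"
             (hypothetical labels chosen for this sketch).
    target_bits: the rough quant level you are aiming for, e.g. 3 for Q3/IQ3.
    """
    backend = backend.lower()
    if backend == "vulkan":
        # The text states that I-quants are not compatible with Vulkan.
        return "K-quant"
    if target_bits < 4 and backend in ("cublas", "rocblas"):
        # Below Q4 on Nvidia/AMD GPU builds, the text points to I-quants.
        return "I-quant"
    # Default choice. On CPU and Apple Metal, I-quants work but run slower
    # than their K-quant equivalents, so K-quant is the safe suggestion.
    return "K-quant"
```

For example, `suggest_quant_family("rocblas", 3)` suggests an I-quant, while the same target on a Vulkan build falls back to a K-quant.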
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
Llamacpp imatrix Quantizations of Phi-3-medium-128k-instruct
Prompt format
<|user|> {prompt}<|end|><|assistant|><|end|>
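As a minimal sketch, the prompt format can be filled in from Python. It assumes the template is used exactly as printed here, and that the final `<|end|>` marks where the model's reply finishes, so the string sent to the model stops at `<|assistant|>`. The names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical.

```python
# Template up to the point where the model starts generating.
# Assumption: the trailing <|end|> in the card is emitted by the model itself.
PROMPT_TEMPLATE = "<|user|> {prompt}<|end|><|assistant|>"


def build_prompt(user_message: str) -> str:
    """Insert the user's message into the Phi-3 style prompt template."""
    return PROMPT_TEMPLATE.format(prompt=user_message)
```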
Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Description |
| -------- | ---------- | --------- | ----------- |
| Phi-3-medium-128k-instruct-Q8_0.gguf | Q8_0 | 14.83GB | Extremely high quality, generally unneeded but max available quant. |
| Phi-3-medium-128k-instruct-Q6_K.gguf | Q6_K | 11.45GB | Very high quality, near perfect, recommended. |
| Phi-3-medium-128k-instruct-Q5_K_M.gguf | Q5_K_M | 10.07GB | High quality, recommended. |
| Phi-3-medium-128k-instruct-Q5_K_S.gguf | Q5_K_S | 9.62GB | High quality, recommended. |
| Phi-3-medium-128k-instruct-Q4_K_M.gguf | Q4_K_M | 8.56GB | Good quality, uses about 4.83 bits per weight, recommended. |
| Phi-3-medium-128k-instruct-Q4_K_S.gguf | Q4_K_S | 7.95GB | Slightly lower quality with more space savings, recommended. |
| Phi-3-medium-128k-instruct-IQ4_NL.gguf | IQ4_NL | 7.89GB | Decent quality, slightly smaller than Q4_K_S with similar performance, recommended. |
| Phi-3-medium-128k-instruct-IQ4_XS.gguf | IQ4_XS | 7.46GB | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| Phi-3-medium-128k-instruct-Q3_K_L.gguf | Q3_K_L | 7.49GB | Lower quality but usable, good for low RAM availability. |
| Phi-3-medium-128k-instruct-Q3_K_M.gguf | Q3_K_M | 6.92GB | Even lower quality. |
| Phi-3-medium-128k-instruct-IQ3_M.gguf | IQ3_M | 6.47GB | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| Phi-3-medium-128k-instruct-IQ3_S.gguf | IQ3_S | 6.06GB | Lower quality, new method with decent performance, recommended over Q3_K_S quant, same size with better performance. |
| Phi-3-medium-128k-instruct-Q3_K_S.gguf | Q3_K_S | 6.06GB | Low quality, not recommended. |
| Phi-3-medium-128k-instruct-IQ3_XS.gguf | IQ3_XS | 5.80GB | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| Phi-3-medium-128k-instruct-IQ3_XXS.gguf | IQ3_XXS | 5.45GB | Lower quality, new method with decent performance, comparable to Q3 quants. |
| Phi-3-medium-128k-instruct-Q2_K.gguf | Q2_K | 5.14GB | Very low quality but surprisingly usable. |
| Phi-3-medium-128k-instruct-IQ2_M.gguf | IQ2_M | 4.71GB | Very low quality, uses SOTA techniques to also be surprisingly usable. |
| Phi-3-medium-128k-instruct-IQ2_S.gguf | IQ2_S | 4.33GB | Very low quality, uses SOTA techniques to be usable. |
| Phi-3-medium-128k-instruct-IQ2_XS.gguf | IQ2_XS | 4.12GB | Very low quality, uses SOTA techniques to be usable. |
| Phi-3-medium-128k-instruct-IQ2_XXS.gguf | IQ2_XXS | 3.71GB | Lower quality, uses SOTA techniques to be usable. |
| Phi-3-medium-128k-instruct-IQ1_M.gguf | IQ1_M | 3.24GB | Extremely low quality, not recommended. |
| Phi-3-medium-128k-instruct-IQ1_S.gguf | IQ1_S | 2.95GB | Extremely low quality, not recommended. |
Downloading using huggingface-cli
pip install -U "huggingface_hub[cli]"
huggingface-cli download bartowski/Phi-3-medium-128k-instruct-GGUF --include "Phi-3-medium-128k-instruct-Q4_K_M.gguf" --local-dir ./
huggingface-cli download bartowski/Phi-3-medium-128k-instruct-GGUF --include "Phi-3-medium-128k-instruct-Q8_0.gguf/*" --local-dir Phi-3-medium-128k-instruct-Q8_0
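The same downloads can also be scripted from Python through the `huggingface_hub` library that the CLI ships with. The sketch below is one possible way to do it: `snapshot_download` with `allow_patterns` plays the role of `--include`. The helper names `include_pattern` and `download_quant` are hypothetical, and the `split` flag is an assumption standing in for the "bigger than 50GB, split into multiple files" case.

```python
REPO_ID = "bartowski/Phi-3-medium-128k-instruct-GGUF"
MODEL_NAME = "Phi-3-medium-128k-instruct"


def include_pattern(quant: str, split: bool = False) -> str:
    """Build the pattern that --include would receive for a given quant."""
    filename = f"{MODEL_NAME}-{quant}.gguf"
    # Split models live in a folder named after the file, so match its contents.
    return f"{filename}/*" if split else filename


def download_quant(quant: str, local_dir: str = "./", split: bool = False) -> str:
    """Download one quant (or all parts of a split quant) into local_dir."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[include_pattern(quant, split)],
        local_dir=local_dir,
    )
```

Calling `download_quant("Q4_K_M")` would fetch the single Q4_K_M file into the current directory, mirroring the first CLI command.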
Which file should I choose?
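The sizing rule of thumb (pick a file 1-2GB smaller than the memory you have available) can be sketched as a small lookup over the file sizes listed in the table. This is an illustration only: the function name `largest_fitting_quant` and the default 2GB headroom are assumptions, and file size is used as a rough stand-in for quality.

```python
from typing import Optional

# File sizes in GB, copied from the table on this page.
QUANT_SIZES_GB = {
    "Q8_0": 14.83, "Q6_K": 11.45, "Q5_K_M": 10.07, "Q5_K_S": 9.62,
    "Q4_K_M": 8.56, "Q4_K_S": 7.95, "IQ4_NL": 7.89, "IQ4_XS": 7.46,
    "Q3_K_L": 7.49, "Q3_K_M": 6.92, "IQ3_M": 6.47, "IQ3_S": 6.06,
    "Q3_K_S": 6.06, "IQ3_XS": 5.80, "IQ3_XXS": 5.45, "Q2_K": 5.14,
    "IQ2_M": 4.71, "IQ2_S": 4.33, "IQ2_XS": 4.12, "IQ2_XXS": 3.71,
    "IQ1_M": 3.24, "IQ1_S": 2.95,
}


def largest_fitting_quant(memory_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest quant whose file fits in memory_gb minus headroom.

    memory_gb: VRAM alone (for speed) or RAM plus VRAM (for maximum quality).
    headroom_gb: slack to leave free; the text suggests 1-2GB.
    """
    budget = memory_gb - headroom_gb
    fitting = {q: size for q, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None  # nothing fits in the available memory
    return max(fitting, key=fitting.get)
```

With a 12GB GPU and the default headroom, `largest_fitting_quant(12)` returns `"Q5_K_S"`. Because a larger file is not always the better quant (Q3_K_L is slightly larger than IQ4_XS, for instance), it is worth checking the table descriptions before settling on the result.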