This is the smallest quantization level that lets the Falcon-180B chat model fit into 44.49 GB, so it can now run on a single 48 GB GPU.
Clone with HTTPS
git clone https://www.modelscope.cn/whatever1983/falcon-180b-chat.IQ2_XXS.gguf-44.49GB.git