chatglm-6B-GGML

Source: https://modelscope.cn/models/Xorbits/chatglm-6B-GGML
License: apache-2.0


THUDM's ChatGLM-6B GGML

These are GGML-format model files for THUDM's ChatGLM-6B.

GGML files are for CPU + GPU inference using chatglm.cpp and Xorbits Inference.

Prompt template

NOTE: A prompt template is not available yet, since the system prompt is currently hard-coded in chatglm.cpp.

Provided files

Name                    Quant method   Bits   Size
chatglm-ggml-q4_0.bin   q4_0           4      3.5 GB
chatglm-ggml-q4_1.bin   q4_1           4      3.9 GB
chatglm-ggml-q5_0.bin   q5_0           5      4.3 GB
chatglm-ggml-q5_1.bin   q5_1           5      4.6 GB
chatglm-ggml-q8_0.bin   q8_0           8      6.6 GB

How to run in Xorbits Inference

Install

Xinference can be installed via pip from PyPI. It is highly recommended to create a new virtual environment to avoid conflicts.

$ pip install "xinference[all]"
$ pip install chatglm-cpp
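
Because chatglm-cpp is installed alongside Xinference, you can also sanity-check a downloaded GGML file directly from Python, without starting a server. A minimal sketch; the local file path is an assumption, and the chat signature has changed between chatglm-cpp versions (older releases take a list of strings, newer ones take ChatMessage objects), so check the version you installed:

import chatglm_cpp

# Load one of the provided files from the table above
# (the path assumes it was downloaded into the current directory).
pipeline = chatglm_cpp.Pipeline("./chatglm-ggml-q4_0.bin")

# Older chatglm-cpp releases accept the chat history as a list of strings;
# newer releases expect chatglm_cpp.ChatMessage objects instead.
print(pipeline.chat(["你好"]))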

Start Xorbits Inference

To start a local instance of Xinference, run the following command:

$ xinference

Once Xinference is running, an endpoint is exposed for model management via the CLI or the Xinference client. The default endpoint is http://localhost:9997. You can also open a web UI at that endpoint to chat with all of the builtin models, and even chat with two cutting-edge AI models side by side to compare their performance!
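
With the server up, one way to load these weights is via the CLI. The flags below follow the early (GGML-era) Xinference releases and may differ in later versions:

$ xinference launch --model-name "chatglm" \
    --model-format "ggmlv3" \
    --size-in-billions 6 \
    --quantization "q4_0"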

(Screenshot: Xinference web UI)
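
The same endpoint can also be driven programmatically. A minimal sketch using the Xinference Python client; the builtin model name chatglm and the exact launch_model parameters are assumptions based on early releases, so verify them against your installed version:

from xinference.client import Client

# Connect to the default local endpoint started by `xinference` above.
client = Client("http://localhost:9997")

# Ask the server to load ChatGLM-6B in 4-bit GGML form.
model_uid = client.launch_model(
    model_name="chatglm",
    model_format="ggmlv3",
    model_size_in_billions=6,
    quantization="q4_0",
)

# Get a handle to the running model and send it a chat prompt.
model = client.get_model(model_uid)
print(model.chat("Introduce yourself in one sentence."))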

Slack

For further support, and for discussion of these models and AI in general, join our Slack channel!

Original model card: THUDM's ChatGLM-6B

ChatGLM-6B is an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. With quantization, users can deploy it locally on consumer-grade graphics cards (only 6 GB of GPU memory is required at the INT4 quantization level). ChatGLM-6B uses technology similar to ChatGPT, optimized for Chinese QA and dialogue. The model is trained on about 1T tokens of Chinese and English corpus, supplemented by supervised fine-tuning, feedback bootstrapping, and reinforcement learning with human feedback. With only about 6.2 billion parameters, the model is able to generate answers that align with human preferences.
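
As a rough sanity check on the 6 GB figure: 6.2 billion parameters at 4 bits each come to about 6.2e9 × 0.5 bytes ≈ 3.1 GB of weights, which roughly matches the q4_0 file size above; activations and the KV cache during generation plausibly account for the remaining headroom.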

For more instructions, including how to run the CLI and web demos and how to quantize the model, please refer to our GitHub repo.
