internlm2-math-plus-1_8b

Anonymous user · 2024-07-31

Technical Information

Official website
https://www.shlab.org.cn/
Open-source repository
https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-plus-1_8b
License
other

Project Details

InternLM-Math-Plus

State-of-the-art bilingual open-sourced math reasoning LLMs. A **solver**, **prover**, **verifier**, **augmentor**. [Github](https://github.com/InternLM/InternLM-Math) [Demo](https://huggingface.co/spaces/internlm/internlm2-math-7b)
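For quick experimentation with the chat checkpoints, a prompt can be assembled with the ChatML-like template used by InternLM2 chat models. The helper below is a minimal sketch (the function name is illustrative, and in practice the tokenizer's own `apply_chat_template` is preferable so the template always matches the checkpoint):

```python
# Minimal sketch: build an InternLM2-style chat prompt for a math question.
# The <|im_start|>/<|im_end|> markers follow the ChatML-like convention used
# by InternLM2 chat checkpoints; this helper is illustrative, not the
# official API.

def build_math_prompt(question: str,
                      system: str = "You are a helpful math assistant.") -> str:
    """Format a single-turn math query for an InternLM2 chat model."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

if __name__ == "__main__":
    print(build_math_prompt("Solve for x: 2x + 3 = 11."))
```

The trailing `<|im_start|>assistant\n` leaves the prompt open so the model continues with its answer.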

News

  • [2024.05.24] We release the updated version InternLM2-Math-Plus in 4 sizes with state-of-the-art performance: 1.8B, 7B, 20B, and 8x22B. We significantly improve informal math reasoning performance (chain-of-thought and code-interpreter) and formal math reasoning performance (LEAN 4 translation and LEAN 4 theorem proving).
  • [2024.02.10] We add tech reports and citation reference.
  • [2024.01.31] We add MiniF2F results with evaluation codes!
  • [2024.01.29] We add checkpoints from ModelScope. Update results about majority voting and Code Interpreter. Tech report is on the way!
  • [2024.01.26] We add checkpoints from OpenXLab, which makes it easier for Chinese users to download!

Performance

Formal Math Reasoning

We evaluate the performance of InternLM2-Math-Plus on the formal math reasoning benchmark MiniF2F-test. The evaluation setting is the same as Llemma with LEAN 4.
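For context, MiniF2F-style problems are stated as Lean theorems that the model must complete with a machine-checkable proof. A toy example in LEAN 4 (illustrative only, not drawn from the benchmark):

```lean
-- A toy statement in the style of MiniF2F problems (illustrative only):
-- given a = 2 * b, prove a + b = 3 * b over the natural numbers.
theorem toy_example (a b : ℕ) (h : a = 2 * b) : a + b = 3 * b := by
  omega
```

The model's task on MiniF2F is to produce the tactic proof (here, `omega`) that the LEAN 4 checker then verifies.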

| Models | MiniF2F-test |
| --- | --- |
| ReProver | 26.5 |
| LLMStep | 27.9 |
| GPT-F | 36.6 |
| HTPS | 41.0 |
| Llemma-7B | 26.2 |
| Llemma-34B | 25.8 |
| InternLM2-Math-7B-Base | 30.3 |
| InternLM2-Math-20B-Base | 29.5 |
| InternLM2-Math-Plus-1.8B | 38.9 |
| InternLM2-Math-Plus-7B | 43.4 |
| InternLM2-Math-Plus-20B | 42.6 |
| InternLM2-Math-Plus-Mixtral8x22B | 37.3 |

Informal Math Reasoning

We evaluate the performance of InternLM2-Math-Plus on the informal math reasoning benchmarks MATH and GSM8K. InternLM2-Math-Plus-1.8B outperforms MiniCPM-2B in the smallest size setting. InternLM2-Math-Plus-7B outperforms Deepseek-Math-7B-RL, the state-of-the-art open-source math reasoning model. InternLM2-Math-Plus-Mixtral8x22B achieves 68.5 on MATH (with Python) and 91.8 on GSM8K.

| Model | MATH | MATH-Python | GSM8K |
| --- | --- | --- | --- |
| MiniCPM-2B | 10.2 | - | 53.8 |
| InternLM2-Math-Plus-1.8B | 37.0 | 41.5 | 58.8 |
| InternLM2-Math-7B | 34.6 | 50.9 | 78.1 |
| Deepseek-Math-7B-RL | 51.7 | 58.8 | 88.2 |
| InternLM2-Math-Plus-7B | 53.0 | 59.7 | 85.8 |
| InternLM2-Math-20B | 37.7 | 54.3 | 82.6 |
| InternLM2-Math-Plus-20B | 53.8 | 61.8 | 87.7 |
| Mixtral8x22B-Instruct-v0.1 | 41.8 | - | 78.6 |
| Eurux-8x22B-NCA | 49.0 | - | - |
| InternLM2-Math-Plus-Mixtral8x22B | 58.1 | 68.5 | 91.8 |
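The majority-voting results mentioned in the news section above are typically obtained by self-consistency: sample several chain-of-thought generations and keep the most frequent final answer. A minimal sketch of that aggregation step (the sampled answers below are stand-ins, not real model outputs):

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most frequent final answer among sampled generations."""
    counts = Counter(a.strip() for a in answers if a.strip())
    answer, _ = counts.most_common(1)[0]
    return answer

if __name__ == "__main__":
    # Stand-in final answers, as if extracted from several sampled runs.
    samples = ["42", "42", "41", "42", "40"]
    print(majority_vote(samples))  # -> 42
```

In practice the final answer is first extracted from each generation (e.g. from a `\boxed{...}` span on MATH) before voting.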

We also evaluate models on MathBench-A. InternLM2-Math-Plus-Mixtral8x22B has performance comparable to Claude 3 Opus.

| Model | Arithmetic | Primary | Middle | High | College | Average |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-4o-0513 | 77.7 | 87.7 | 76.3 | 59.0 | 54.0 | 70.9 |
| Claude 3 Opus | 85.7 | 85.0 | 58.0 | 42.7 | 43.7 | 63.0 |
| Qwen-Max-0428 | 72.3 | 86.3 | 65.0 | 45.0 | 27.3 | 59.2 |
| Qwen-1.5-110B | 70.3 | 82.3 | 64.0 | 47.3 | 28.0 | 58.4 |
| Deepseek-V2 | 82.7 | 89.3 | 59.0 | 39.3 | 29.3 | 59.9 |
| Llama-3-70B-Instruct | 70.3 | 86.0 | 53.0 | 38.7 | 34.7 | 56.5 |
| InternLM2-Math-Plus-Mixtral8x22B | 77.5 | 82.0 | 63.6 | 50.3 | 36.8 | 62.0 |
| InternLM2-Math-20B | 58.7 | 70.0 | 43.7 | 24.7 | 12.7 | 42.0 |
| InternLM2-Math-Plus-20B | 65.8 | 79.7 | 59.5 | 47.6 | 24.8 | 55.5 |
| Llama3-8B-Instruct | 54.7 | 71.0 | 25.0 | 19.0 | 14.0 | 36.7 |
| InternLM2-Math-7B | 53.7 | 67.0 | 41.3 | 18.3 | 8.0 | 37.7 |
| Deepseek-Math-7B-RL | 68.0 | 83.3 | 44.3 | 33.0 | 23.0 | 50.3 |
| InternLM2-Math-Plus-7B | 61.4 | 78.3 | 52.5 | 40.5 | 21.7 | 50.9 |
| MiniCPM-2B | 49.3 | 51.7 | 18.0 | 8.7 | 3.7 | 26.3 |
| InternLM2-Math-Plus-1.8B | 43.0 | 43.3 | 25.4 | 18.9 | 4.7 | 27.1 |

Citation and Tech Report

@misc{ying2024internlmmath,
      title={InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning}, 
      author={Huaiyuan Ying and Shuo Zhang and Linyang Li and Zhejian Zhou and Yunfan Shao and Zhaoye Fei and Yichuan Ma and Jiawei Hong and Kuikun Liu and Ziyi Wang and Yudong Wang and Zijian Wu and Shuaibin Li and Fengzhe Zhou and Hongwei Liu and Songyang Zhang and Wenwei Zhang and Hang Yan and Xipeng Qiu and Jiayu Wang and Kai Chen and Dahua Lin},
      year={2024},
      eprint={2402.06332},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

