Anonymous user, July 31, 2024

Technical information

Open-source address
https://modelscope.cn/models/feynmanchen/NuExtract
License
MIT

Project details

Structure Extraction Model by NuMind

NuExtract is a version of phi-3-mini, fine-tuned on a private high-quality synthetic dataset for information extraction. To use the model, provide an input text (less than 2000 tokens) and a JSON template describing the information you need to extract.
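As a minimal sketch of what such a JSON template looks like, the snippet below derives one from an already filled record by blanking its values. The helper name `make_template` and the sample record are hypothetical and not part of NuExtract; the only convention taken from this page is that string fields are left as "" and list fields as [].

```python
import json


def make_template(filled):
    """Blank out a filled JSON-like object so only its structure remains."""
    if isinstance(filled, dict):
        return {key: make_template(value) for key, value in filled.items()}
    if isinstance(filled, list):
        # Lists are reduced to an empty-list placeholder.
        return []
    # Every scalar leaf becomes an empty-string placeholder.
    return ""


# Hypothetical filled record, used only to derive the template structure.
filled_example = {
    "Model": {"Name": "Mistral 7B", "Architecture": ["GQA", "SWA"]},
    "Usage": {"Licence": "Apache 2.0"},
}

template = make_template(filled_example)
template_json = json.dumps(template, indent=4)
print(template_json)
```

The resulting string can be passed wherever a template is expected; writing the template by hand works just as well.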

Note: This model is purely extractive, so all text output by the model is present as is in the original text. You can also provide an example of output formatting to help the model understand your task more precisely.
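Because the output is meant to be purely extractive, one possible sanity check is to confirm that every extracted string occurs verbatim in the input. The sketch below is an illustration under that assumption: `iter_leaf_strings` and `non_extractive_values` are hypothetical helpers, and the two JSON strings are hand-written stand-ins rather than real model predictions.

```python
import json


def iter_leaf_strings(node):
    """Yield every non-empty string leaf of a parsed JSON value."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_leaf_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_leaf_strings(item)
    elif isinstance(node, str) and node:
        yield node


def non_extractive_values(prediction_json, source_text):
    """Return the extracted strings that do not occur verbatim in source_text."""
    parsed = json.loads(prediction_json)
    return [s for s in iter_leaf_strings(parsed) if s not in source_text]


source_text = "Our models are released under the Apache 2.0 license."
# Hand-written stand-ins for model output, not real predictions.
good = '{"Usage": {"Licence": "Apache 2.0"}}'
bad = '{"Usage": {"Licence": "MIT"}}'

print(non_extractive_values(good, source_text))  # []
print(non_extractive_values(bad, source_text))   # ['MIT']
```

An empty result means every value was copied from the text; anything returned is worth inspecting by hand.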

Try it here: https://huggingface.co/spaces/numind/NuExtract

We also provide tiny (0.5B) and large (7B) versions of this model: NuExtract-tiny and NuExtract-large.

Check out other models by NuMind:

Benchmark

Benchmark 0 shot (will release soon):

Benchmark fine-tuning (see blog post):

Usage

To use the model:

import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def predict_NuExtract(model, tokenizer, text, schema, example=["", "", ""]):
    # Re-serialize the template so it is formatted with consistent indentation.
    schema = json.dumps(json.loads(schema), indent=4)
    input_llm = "<|input|>\n### Template:\n" + schema + "\n"
    # Append each non-empty example of the expected output formatting.
    for i in example:
        if i != "":
            input_llm += "### Example:\n" + json.dumps(json.loads(i), indent=4) + "\n"

    input_llm += "### Text:\n" + text + "\n<|output|>\n"
    input_ids = tokenizer(input_llm, return_tensors="pt", truncation=True, max_length=4000).to("cuda")

    output = tokenizer.decode(model.generate(**input_ids)[0], skip_special_tokens=True)
    # Keep only the text between the output markers.
    return output.split("<|output|>")[1].split("<|end-output|>")[0]


# We recommend using bf16 as it results in negligible performance loss
model = AutoModelForCausalLM.from_pretrained("numind/NuExtract", torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("numind/NuExtract", trust_remote_code=True)

model.to("cuda")

model.eval()

text = """We introduce Mistral 7B, a 7–billion-parameter language model engineered for
superior performance and efficiency. Mistral 7B outperforms the best open 13B
model (Llama 2) across all evaluated benchmarks, and the best released 34B
model (Llama 1) in reasoning, mathematics, and code generation. Our model
leverages grouped-query attention (GQA) for faster inference, coupled with sliding
window attention (SWA) to effectively handle sequences of arbitrary length with a
reduced inference cost. We also provide a model fine-tuned to follow instructions,
Mistral 7B – Instruct, that surpasses Llama 2 13B – chat model both on human and
automated benchmarks. Our models are released under the Apache 2.0 license.
Code: https://github.com/mistralai/mistral-src
Webpage: https://mistral.ai/news/announcing-mistral-7b/"""

schema = """{
    "Model": {
        "Name": "",
        "Number of parameters": "",
        "Number of max token": "",
        "Architecture": []
    },
    "Usage": {
        "Use case": [],
        "Licence": ""
    }
}"""

prediction = predict_NuExtract(model, tokenizer, text, schema, example=["", "", ""])
print(prediction)
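The function above returns a plain string. Assuming that string is valid JSON following the template (which is not guaranteed), it can be loaded into a Python dictionary for further processing. The sketch below uses a hypothetical helper, `parse_prediction`, and a hand-written placeholder shaped like the schema in the usage example; it is not real model output.

```python
import json


def parse_prediction(prediction):
    """Parse the output string as JSON; return None if it is not valid JSON."""
    try:
        return json.loads(prediction.strip())
    except json.JSONDecodeError:
        return None


# Hand-written placeholder shaped like the schema, not real model output.
sample_prediction = """
{
    "Model": {
        "Name": "Mistral 7B",
        "Number of parameters": "",
        "Number of max token": "",
        "Architecture": []
    },
    "Usage": {
        "Use case": [],
        "Licence": "Apache 2.0"
    }
}
"""

parsed = parse_prediction(sample_prediction)
if parsed is not None:
    print(parsed["Model"]["Name"])
```

Returning None instead of raising lets the caller decide whether to retry, log, or skip an output that could not be parsed.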

