Source
https://modelscope.cn/models/feynmanchen/NuExtract
License
MIT

Structure Extraction Model by NuMind

NuExtract is a version of phi-3-mini, fine-tuned on a private high-quality synthetic dataset for information extraction. To use the model, provide an input text (less than 2000 tokens) and a JSON template describing the information you need to extract.
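As an illustration of what such a template can look like, here is a minimal sketch. It assumes, based on the schema shown in the Usage section of this card, that a template is a JSON object whose leaves are empty strings (single values) or empty lists (multiple values). The helper `is_blank_template` is hypothetical and not part of NuExtract.

```python
import json


def is_blank_template(node):
    """Return True if every leaf is an empty string or an empty list (assumed convention)."""
    if isinstance(node, dict):
        return all(is_blank_template(value) for value in node.values())
    return node == "" or node == []


# Field names are taken from the example schema in this card.
template = json.dumps(
    {
        "Model": {"Name": "", "Number of parameters": "", "Architecture": []},
        "Usage": {"Use case": [], "Licence": ""},
    },
    indent=4,
)

print(template)
print(is_blank_template(json.loads(template)))
```

Building the template from a Python dict and serializing it with `json.dumps` avoids hand-written JSON syntax errors.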

Note: This model is purely extractive, so all text output by the model is present as is in the original text. You can also provide an example of output formatting to help the model understand your task more precisely.
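Because the output is meant to be copied verbatim from the input, it can be sanity-checked after the fact. The sketch below is one possible check, not something shipped with the model: it walks a parsed output and lists any string that does not occur literally in the source text. The function names and the sample data are hypothetical.

```python
def iter_strings(value):
    """Yield every non-empty string leaf in a nested dict/list structure."""
    if isinstance(value, str):
        if value:
            yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def non_extractive_values(output, source_text):
    """Return the extracted strings that do not occur verbatim in source_text."""
    return [s for s in iter_strings(output) if s not in source_text]


source_text = "Mistral 7B is released under the Apache 2.0 license."
output = {"Model": {"Name": "Mistral 7B"}, "Usage": {"Licence": "Apache 2.0"}}

print(non_extractive_values(output, source_text))  # []
```

An empty list means every extracted value was found in the text; anything else points to a value worth inspecting by hand.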

Try it here: https://huggingface.co/spaces/numind/NuExtract

We also provide a tiny (0.5B) and a large (7B) version of this model: NuExtract-tiny and NuExtract-large

Check out other models by NuMind:

SOTA Zero-shot NER Model: NuNER Zero
SOTA Multilingual Entity Recognition Foundation Model: link
SOTA Sentiment Analysis Foundation Model: English, Multilingual

Benchmark

Benchmark 0 shot (will release soon):

Benchmark fine-tuning (see blog post):

Usage

To use the model:

import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def predict_NuExtract(model, tokenizer, text, schema, example=["", "", ""]):
    # Normalize the template to indented JSON
    schema = json.dumps(json.loads(schema), indent=4)
    input_llm = "<|input|>\n### Template:\n" + schema + "\n"
    # Append each non-empty formatting example
    for i in example:
        if i != "":
            input_llm += "### Example:\n" + json.dumps(json.loads(i), indent=4) + "\n"

    input_llm += "### Text:\n" + text + "\n<|output|>\n"
    input_ids = tokenizer(input_llm, return_tensors="pt", truncation=True, max_length=4000).to("cuda")

    output = tokenizer.decode(model.generate(**input_ids)[0], skip_special_tokens=True)
    # Keep only the text between the output markers
    return output.split("<|output|>")[1].split("<|end-output|>")[0]


# We recommend using bf16 as it results in negligible performance loss
model = AutoModelForCausalLM.from_pretrained("numind/NuExtract", torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("numind/NuExtract", trust_remote_code=True)

model.to("cuda")

model.eval()

text = """We introduce Mistral 7B, a 7–billion-parameter language model engineered for
superior performance and efficiency. Mistral 7B outperforms the best open 13B
model (Llama 2) across all evaluated benchmarks, and the best released 34B
model (Llama 1) in reasoning, mathematics, and code generation. Our model
leverages grouped-query attention (GQA) for faster inference, coupled with sliding
window attention (SWA) to effectively handle sequences of arbitrary length with a
reduced inference cost. We also provide a model fine-tuned to follow instructions,
Mistral 7B – Instruct, that surpasses Llama 2 13B – chat model both on human and
automated benchmarks. Our models are released under the Apache 2.0 license.
Code: https://github.com/mistralai/mistral-src
Webpage: https://mistral.ai/news/announcing-mistral-7b/"""

schema = """{
    "Model": {
        "Name": "",
        "Number of parameters": "",
        "Number of max token": "",
        "Architecture": []
    },
    "Usage": {
        "Use case": [],
        "Licence": ""
    }
}"""

prediction = predict_NuExtract(model, tokenizer, text, schema, example=["", "", ""])
print(prediction)
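
To inspect the prompt layout without loading the model, the string assembly can be reproduced on its own. The sketch below mirrors the layout used in predict_NuExtract, assuming the separators are newline characters. The helpers `build_prompt` and `parse_prediction` are hypothetical names introduced here, and the assumption that the returned string parses as JSON is not guaranteed by the model card, so the parser falls back to `None`.

```python
import json


def build_prompt(text, schema, examples=()):
    """Assemble the prompt string in the layout used by predict_NuExtract."""
    prompt = "<|input|>\n### Template:\n" + json.dumps(json.loads(schema), indent=4) + "\n"
    for example in examples:
        if example:  # skip empty example slots
            prompt += "### Example:\n" + json.dumps(json.loads(example), indent=4) + "\n"
    prompt += "### Text:\n" + text + "\n<|output|>\n"
    return prompt


def parse_prediction(raw):
    """Parse the model's string output as JSON; return None if it is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


prompt = build_prompt(
    "Mistral 7B is released under the Apache 2.0 license.",
    '{"Licence": ""}',
    examples=['{"Licence": "MIT"}'],
)
print(prompt)
print(parse_prediction('{"Licence": "Apache 2.0"}'))
```

Printing the prompt makes it easy to confirm that the template, the optional examples, and the text appear in the expected order before spending GPU time on generation.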
