- named-entity-recognition
widgets:
  - task: named-entity-recognition
    inputs:
      - type: text
        validator:
          max_words: 512
    examples:
      - name: 1
        inputs:
          - data: 浙江省杭州市余杭区文一西路969号
      - name: 2
        inputs:
          - data: 浙江省杭州市五常街道淘宝城
      - name: 3
        inputs:
          - data: 海淀区学院路37号北京航空航天大学
domain:
  - nlp
frameworks:
  - PyTorch
model-type:
  - token-classification-for-ner
backbone:
  - bert
metrics:
  - F1
language:
  - cn
license: Apache License 2.0
tags:
  - Alibaba
  - NER
  - 地理语义
  - 信息抽取
datasets:
  train:
    - ccks2021-addrst
  test:
    - ccks2021-addrst
  evaluation:
    - ccks2021-addrst
indexing:
  results:
    - task:
        name: Named Entity Recognition
      dataset:
        name: ccks2021-addrst
      metrics:
        - type: F1
          value: 90.79
          description: F1-score
          args: default
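Introduction to Structured Address Element Parsing
Model Description
This method uses a Transformer-CRF model with StructBERT as the pre-trained backbone. Related sentences retrieved by an external tool are added as extra context, and the model is trained with Multi-view Training. The model structure is shown in the figure below:
For details, see the paper: Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning
Intended Use and Scope
Addresses are an important kind of text in daily life. Many scenarios require registering an address, such as online shopping, food delivery, population censuses, and opening water, electricity, or gas accounts. A typical address contains the following kinds of information:
- Administrative division information, such as province, city, county, and township;
- Road network information, such as road name, road number, and road facilities;
- Detailed address information, such as POI (point of interest), building number, and room number;
- Non-address information, such as supplementary notes and input errors.
Address element parsing splits an address text into semantically independent elements and identifies the type of each element. You can try the model with your own Chinese input; see the code example for how to call it.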
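Since the model is described as a token-classification NER model, the task can be pictured as assigning one label to every character of the address. The sketch below converts typed spans into per-character labels. It assumes a BIO labelling scheme and half-open `[start, end)` character offsets; both are illustrative assumptions, and the helper `spans_to_bio` is hypothetical rather than part of ModelScope or the dataset tooling.

```python
from typing import Dict, List


def spans_to_bio(text: str, spans: List[Dict]) -> List[str]:
    """Convert typed spans into one BIO label per character.

    Assumes half-open [start, end) offsets and a BIO scheme
    (illustrative assumptions, not confirmed dataset details).
    """
    labels = ["O"] * len(text)
    for span in spans:
        start, end, etype = span["start"], span["end"], span["type"]
        labels[start] = f"B-{etype}"      # first character of the element
        for i in range(start + 1, end):   # remaining characters of the element
            labels[i] = f"I-{etype}"
    return labels


text = "杭州市文一西路969号"
spans = [
    {"type": "city", "start": 0, "end": 3},
    {"type": "road", "start": 3, "end": 7},
    {"type": "roadno", "start": 7, "end": 11},
]

# Print each character next to its label.
for char, label in zip(text, spans_to_bio(text, spans)):
    print(char, label)
```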
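How to Use
Once ModelScope is installed, you can use nlp_structbert_address-parsing_chinese_base (structured address element parsing). By default, a single input sentence should not exceed a length of 512.
Code Example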
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build the NER pipeline with the address-parsing model.
pipeline_ins = pipeline(
    task=Tasks.named_entity_recognition,
    model='damo/nlp_structbert_address-parsing_chinese_base')

# Parse one address; each element is returned with its type and character offsets.
print(pipeline_ins(input='浙江省杭州市余杭区文一西路969号亲橙里'))
# {'output': [{'type': 'prov', 'start': 0, 'end': 3, 'span': '浙江省'}, {'type': 'city', 'start': 3, 'end': 6, 'span': '杭州市'}, {'type': 'district', 'start': 6, 'end': 9, 'span': '余杭区'}, {'type': 'road', 'start': 9, 'end': 13, 'span': '文一西路'}, {'type': 'roadno', 'start': 13, 'end': 17, 'span': '969号'}, {'type': 'poi', 'start': 17, 'end': 20, 'span': '亲橙里'}]}
```
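Downstream code often wants the parsed elements as a structured record rather than a flat list. The following is a minimal sketch of such a post-processing step, working on a result dictionary copied from the example output above so that it runs without ModelScope installed. The helper `to_record` is hypothetical, and it assumes the offsets are half-open `[start, end)` character positions, which is what the sample output suggests.

```python
from typing import Dict, List

text = "浙江省杭州市余杭区文一西路969号亲橙里"

# Result copied from the example output; not produced by running the model here.
result = {"output": [
    {"type": "prov", "start": 0, "end": 3, "span": "浙江省"},
    {"type": "city", "start": 3, "end": 6, "span": "杭州市"},
    {"type": "district", "start": 6, "end": 9, "span": "余杭区"},
    {"type": "road", "start": 9, "end": 13, "span": "文一西路"},
    {"type": "roadno", "start": 13, "end": 17, "span": "969号"},
    {"type": "poi", "start": 17, "end": 20, "span": "亲橙里"},
]}


def to_record(text: str, result: Dict) -> Dict[str, List[str]]:
    """Group element texts by type, checking offsets against the input."""
    record: Dict[str, List[str]] = {}
    for ent in result["output"]:
        # Offsets are assumed to be half-open [start, end).
        if text[ent["start"]:ent["end"]] != ent["span"]:
            raise ValueError(f"offset mismatch for {ent}")
        record.setdefault(ent["type"], []).append(ent["span"])
    return record


print(to_record(text, result))
```

A list is kept per type because an address may contain more than one element of the same type.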
Limitations and Possible Bias
This model was trained on the ccks2021-addrst dataset. Please evaluate it on your own data before deciding how to use it.
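One way to run such an evaluation is to compare predicted elements against your own gold annotations with exact-match, span-level precision, recall, and F1. The sketch below assumes both gold and predicted elements are dictionaries with `type`, `start`, and `end` keys, as in the pipeline output; the function names are hypothetical, and the official ccks2021-addrst scoring procedure may differ.

```python
from typing import Dict, Iterable, List, Set, Tuple

Span = Tuple[str, int, int]


def to_span_set(entities: Iterable[Dict]) -> Set[Span]:
    """Reduce each element to a (type, start, end) triple."""
    return {(e["type"], e["start"], e["end"]) for e in entities}


def span_prf(gold: List[List[Dict]], pred: List[List[Dict]]) -> Tuple[float, float, float]:
    """Exact-match span-level precision, recall, and F1 over all sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_set, p_set = to_span_set(g), to_span_set(p)
        tp += len(g_set & p_set)   # type and both boundaries must match
        n_gold += len(g_set)
        n_pred += len(p_set)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: one sentence, one of two elements predicted with the wrong type.
gold = [[{"type": "city", "start": 0, "end": 3}, {"type": "road", "start": 3, "end": 7}]]
pred = [[{"type": "city", "start": 0, "end": 3}, {"type": "poi", "start": 3, "end": 7}]]
print(span_prf(gold, pred))  # (0.5, 0.5, 0.5)
```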
Training Data
- ccks2021-addrst: a Chinese address element parsing dataset.
Evaluation and Results
Evaluation results on the ccks2021-addrst test data:
Dataset | Precision | Recall | F1 |
---|---|---|---|
ccks2021-addrst | 90.39 | 91.20 | 90.79 |
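As a quick consistency check, the reported F1 can be recomputed from the precision and recall in the table, assuming F1 is the usual harmonic mean of the two:

```python
precision, recall = 90.39, 91.20

# F1 as the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 90.79
```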
Related Papers and Citation
If you find this model helpful, please consider citing the following related papers:
@inproceedings{wang-etal-2021-improving,
title = "Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning",
author = "Wang, Xinyu and
Jiang, Yong and
Bach, Nguyen and
Wang, Tao and
Huang, Zhongqiang and
Huang, Fei and
Tu, Kewei",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.142",
pages = "1800--1812",
}
@inproceedings{wang-etal-2022-damo,
title = "{DAMO}-{NLP} at {S}em{E}val-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition",
author = "Wang, Xinyu and
Shen, Yongliang and
Cai, Jiong and
Wang, Tao and
Wang, Xiaobin and
Xie, Pengjun and
Huang, Fei and
Lu, Weiming and
Zhuang, Yueting and
Tu, Kewei and
Lu, Wei and
Jiang, Yong",
booktitle = "Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.semeval-1.200",
pages = "1457--1468",
}
@inproceedings{zhang-etal-2022-domain,
title = "Domain-Specific NER via Retrieving Correlated Samples",
author = "Zhang, Xin and
Jiang, Yong and
Wang, Xiaobin and
Hu, Xuming and
Sun, Yueheng and
Xie, Pengjun and
Zhang, Meishan",
booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
month = oct,
year = "2022",
address = "Gyeongju, Republic of Korea",
publisher = "International Committee on Computational Linguistics"
}