Automatic Post-Editing for Translation - English-German


Technical Information

Open-source URL
https://modelscope.cn/models/iic/nlp_automatic_post_editing_for_translation_en2de
License
Apache License 2.0

Project Details

English-German Automatic Post-Editing Model for Translation

This model performs automatic post-editing for machine translation: given the source text, it further revises the MT output to obtain a better translation. The ensemble system built on this model (MinD) took second place in the English-German track of the WMT 2020 Automatic Post-Editing shared task.

Model Description

The model uses an encoder with a memory to jointly encode the source text and the machine-translated output, and a decoder to generate the revised translation. The memory-encoder is pre-trained on a large amount of parallel data, and the decoder is then trained on <source, MT output, post-edited translation> triplets. The model is trained with the OpenNMT-tf framework.

Intended Use and Applicable Scope

The model can be used for English-German machine translation post-editing. The input is the source text together with the MT output, and the output is the revised, improved translation.

How to Use

The model is ready to use once ModelScope is installed. Note that it only supports Python 3.6 and TensorFlow 1.12~1.15, and cannot be used in a TensorFlow 2.x environment.
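Before loading the model, it can help to verify that the runtime matches these constraints. The snippet below is a minimal sketch of such a check; the check_environment helper is hypothetical and not part of ModelScope.

import sys
import tensorflow as tf

def check_environment():
    # The model card requires Python 3.6 and TensorFlow 1.12~1.15 (TF 2.x is unsupported).
    assert sys.version_info[:2] == (3, 6), 'Python 3.6 is required'
    major, minor = (int(x) for x in tf.__version__.split('.')[:2])
    assert major == 1 and 12 <= minor <= 15, 'TensorFlow 1.12~1.15 is required'

check_environment()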

Code Example

from modelscope.pipelines import pipeline
p = pipeline('translation', model='damo/nlp_automatic_post_editing_for_translation_en2de')
# Join the source text and the MT output with '\005'
print(p('Simultaneously, the Legion took part to the pacification of Algeria, plagued by various tribal rebellions and razzias.\005Gleichzeitig nahm die Legion an der Befriedung Algeriens teil, die von verschiedenen Stammesaufständen und Rasias heimgesucht wurde.'))
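The call above can also be wrapped in a small helper that performs the '\005' joining; the post_edit function below is only an illustrative sketch, not part of the model's API.

def post_edit(pipe, source, mt_output):
    # The model expects the English source and the German MT output
    # concatenated with the '\005' separator; the raw pipeline output is returned.
    return pipe(source + '\005' + mt_output)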

Model Limitations and Possible Biases

The training data is mainly from the Wikipedia domain, so performance may degrade in other domains.

Training Data

The WMT 2020 APE dataset. The training data consists of <source, MT output, post-edited translation> triplets.
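For illustration, triplet corpora of this kind are typically distributed as three line-aligned plain-text files (for example train.src, train.mt and train.pe in the WMT APE releases; the file names here are assumptions). The sketch below shows one way to load them.

from dataclasses import dataclass

@dataclass
class ApeTriplet:
    source: str      # English source sentence
    mt: str          # machine-translated German sentence
    post_edit: str   # human post-edited German sentence

def load_triplets(src_path, mt_path, pe_path):
    # Line i of each file belongs to the same triplet.
    with open(src_path, encoding='utf-8') as s, \
         open(mt_path, encoding='utf-8') as m, \
         open(pe_path, encoding='utf-8') as p:
        return [ApeTriplet(a.strip(), b.strip(), c.strip()) for a, b, c in zip(s, m, p)]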

Evaluation and Results

The evaluation results of this model's ensemble system on the WMT 2020 test set are as follows:

Model           TER↓    BLEU↑
baseline (MT)   31.56   50.21
Ours            26.99   55.77
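For reference, TER and BLEU scores of this kind can be recomputed with the sacrebleu package (the model card does not state which scorer was used, so this is an assumption). In the sketch below, hypotheses are the post-edited outputs and references are the human post-edits from the test set.

from sacrebleu.metrics import BLEU, TER

def evaluate(hypotheses, references):
    # Both arguments are lists of sentence strings; sacrebleu expects a list of reference streams.
    bleu = BLEU().corpus_score(hypotheses, [references]).score  # higher is better
    ter = TER().corpus_score(hypotheses, [references]).score    # lower is better
    return ter, bleu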

Related Papers and Citation Information

If you use this model in your research, please cite the following two papers:

@inproceedings{wang-etal-2020-computer,
    title = "Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing",
    author = "Wang, Ke  and
      Wang, Jiayi  and
      Ge, Niyu  and
      Shi, Yangbin  and
      Zhao, Yu  and
      Fan, Kai",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.197",
    doi = "10.18653/v1/2020.findings-emnlp.197",
    pages = "2175--2186",
    abstract = "With the advent of neural machine translation, there has been a marked shift towards leveraging and consuming the machine translation results. However, the gap between machine translation systems and human translators needs to be manually closed by post-editing. In this paper, we propose an end-to-end deep learning framework of the quality estimation and automatic post-editing of the machine translation output. Our goal is to provide error correction suggestions and to further relieve the burden of human translators through an interpretable model. To imitate the behavior of human translators, we design three efficient delegation modules {--} quality estimation, generative post-editing, and atomic operation post-editing and construct a hierarchical model based on them. We examine this approach with the English{--}German dataset from WMT 2017 APE shared task and our experimental results can achieve the state-of-the-art performance. We also verify that the certified translators can significantly expedite their post-editing processing with our model in human evaluation.",
}
@inproceedings{wang-etal-2020-alibabas,
    title = "{A}libaba{'}s Submission for the {WMT} 2020 {APE} Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual {BERT}",
    author = "Wang, Jiayi  and
      Wang, Ke  and
      Fan, Kai  and
      Zhang, Yuqi  and
      Lu, Jun  and
      Ge, Xin  and
      Shi, Yangbin  and
      Zhao, Yu",
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.84",
    pages = "789--796",
    abstract = "The goal of Automatic Post-Editing (APE) is basically to examine the automatic methods for correcting translation errors generated by an unknown machine translation (MT) system. This paper describes Alibaba{'}s submissions to the WMT 2020 APE Shared Task for the English-German language pair. We design a two-stage training pipeline. First, a BERT-like cross-lingual language model is pre-trained by randomly masking target sentences alone. Then, an additional neural decoder on the top of the pre-trained model is jointly fine-tuned for the APE task. We also apply an imitation learning strategy to augment a reasonable amount of pseudo APE training data, potentially preventing the model to overfit on the limited real training data and boosting the performance on held-out data. To verify our proposed model and data augmentation, we examine our approach with the well-known benchmarking English-German dataset from the WMT 2017 APE task. The experiment results demonstrate that our system significantly outperforms all other baselines and achieves the state-of-the-art performance. The final results on the WMT 2020 test dataset show that our submission can achieve +5.56 BLEU and -4.57 TER with respect to the official MT baseline.",
}

