English-German Automatic Post-Editing Model for Machine Translation

Model description

Automatic post-editing (APE) for machine translation revises MT output against the source sentence to produce a better final translation. The ensemble system built on this model (MiD) took second place in the English-German track of the WMT 2020 automatic post-editing shared task.

The model uses an encoder with a memory component to jointly encode the source sentence and its machine translation, and a decoder to generate the revised translation. The memory-encoder is pre-trained on large-scale parallel corpora, and the decoder is then trained on <source, MT output, post-edited translation> triplets. Training is based on the OpenNMT-tf framework.

Intended use and scope

The model performs English-German post-editing of machine translation: the input is a source sentence together with its machine translation, and the output is a revised, improved translation.

How to use

The model is ready to use once ModelScope is installed. Note that it only supports Python 3.6 and TensorFlow 1.12–1.15; it cannot run under TensorFlow 2.x.

Code example

```python
from modelscope.pipelines import pipeline

p = pipeline('translation', model='damo/nlp_automatic_post_editing_for_translation_en2de')
# Join the source sentence and its machine translation with '\005'
print(p('Simultaneously, the Legion took part to the pacification of Algeria, plagued by various tribal rebellions and razzias.\005Gleichzeitig nahm die Legion an der Befriedung Algeriens teil, die von verschiedenen Stammesaufständen und Rasias heimgesucht wurde.'))
```

Model limitations and possible bias

The training data is drawn mainly from the wiki domain, so performance may degrade on other domains.

Training data

The WMT 2020 APE dataset; the training data consists of <source, MT output, post-edited translation> triplets.

Evaluation results

The evaluation results of the model's ensemble system on the WMT 2020 test set are as follows:
| Model         | TER↓  | BLEU↑ |
|---------------|-------|-------|
| baseline (MT) | 31.56 | 50.21 |
| Ours          | 26.99 | 55.77 |
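The TER and BLEU scores above are normally computed with standard tools such as sacrebleu or tercom. As a rough illustration of what TER measures, the sketch below computes a simplified variant that omits TER's shift operation (making it word-level WER); the function name and sample sentences are illustrative and not part of the model's actual evaluation code.

```python
# Simplified TER: word-level edit distance (insert/delete/substitute)
# divided by reference length. Real TER also allows block shifts.

def ter_no_shifts(hyp: str, ref: str) -> float:
    """Edit distance between hypothesis and reference tokens / |ref|."""
    h, r = hyp.split(), ref.split()
    # Levenshtein distance via one-row dynamic programming.
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cost = 0 if rw == hw else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution / match
        prev = cur
    return prev[-1] / max(len(r), 1)

# A post-edited hypothesis closer to the reference scores a lower TER.
mt  = "the legion took part to the pacification"
ape = "the legion took part in the pacification"
ref = "the legion took part in the pacification"
print(ter_no_shifts(mt, ref))   # 1 substitution over 7 reference words
print(ter_no_shifts(ape, ref))  # 0.0
```

Lower TER and higher BLEU both indicate output closer to the human post-edited reference, which is why the table reports TER↓ and BLEU↑.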
Related papers and citation

If you use this model in your research, please cite the following two papers:
@inproceedings{wang-etal-2020-computer,
    title = "Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing",
    author = "Wang, Ke and
      Wang, Jiayi and
      Ge, Niyu and
      Shi, Yangbin and
      Zhao, Yu and
      Fan, Kai",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.197",
    doi = "10.18653/v1/2020.findings-emnlp.197",
    pages = "2175--2186",
    abstract = "With the advent of neural machine translation, there has been a marked shift towards leveraging and consuming the machine translation results. However, the gap between machine translation systems and human translators needs to be manually closed by post-editing. In this paper, we propose an end-to-end deep learning framework of the quality estimation and automatic post-editing of the machine translation output. Our goal is to provide error correction suggestions and to further relieve the burden of human translators through an interpretable model. To imitate the behavior of human translators, we design three efficient delegation modules {--} quality estimation, generative post-editing, and atomic operation post-editing and construct a hierarchical model based on them. We examine this approach with the English{--}German dataset from WMT 2017 APE shared task and our experimental results can achieve the state-of-the-art performance. We also verify that the certified translators can significantly expedite their post-editing processing with our model in human evaluation.",
}
@inproceedings{wang-etal-2020-alibabas,
    title = "{A}libaba{'}s Submission for the {WMT} 2020 {APE} Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual {BERT}",
    author = "Wang, Jiayi and
      Wang, Ke and
      Fan, Kai and
      Zhang, Yuqi and
      Lu, Jun and
      Ge, Xin and
      Shi, Yangbin and
      Zhao, Yu",
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.84",
    pages = "789--796",
    abstract = "The goal of Automatic Post-Editing (APE) is basically to examine the automatic methods for correcting translation errors generated by an unknown machine translation (MT) system. This paper describes Alibaba{'}s submissions to the WMT 2020 APE Shared Task for the English-German language pair. We design a two-stage training pipeline. First, a BERT-like cross-lingual language model is pre-trained by randomly masking target sentences alone. Then, an additional neural decoder on the top of the pre-trained model is jointly fine-tuned for the APE task. We also apply an imitation learning strategy to augment a reasonable amount of pseudo APE training data, potentially preventing the model to overfit on the limited real training data and boosting the performance on held-out data. To verify our proposed model and data augmentation, we examine our approach with the well-known benchmarking English-German dataset from the WMT 2017 APE task. The experiment results demonstrate that our system significantly outperforms all other baselines and achieves the state-of-the-art performance. The final results on the WMT 2020 test dataset show that our submission can achieve +5.56 BLEU and -4.57 TER with respect to the official MT baseline.",
}