
Improved Data Augmentation for Translation Suggestion
Hongxiao Zhang1, Siyu Lai1, Songming Zhang1, Hui Huang2, Yufeng Chen1∗
Jinan Xu1 and Jian Liu1
1Beijing Jiaotong University, Beijing, China
2Harbin Institute of Technology, Harbin, China
{hongxiaozhang,siyulai,smzhang22,chenyf,jaxu,jianliu}@bjtu.edu.cn,
huanghui_hit@126.com
Abstract
Translation suggestion (TS) models are used to automatically provide alternative suggestions for incorrect spans in sentences generated by machine translation. This paper introduces the system used in our submission to the WMT’22 Translation Suggestion shared task. Our system is based on the ensemble of different translation architectures, including Transformer, SA-Transformer, and DynamicConv. We use three strategies to construct synthetic data from parallel corpora to compensate for the lack of supervised data. In addition, we introduce a multi-phase pre-training strategy, adding an additional pre-training phase with in-domain data. We rank second and third on the English-German and English-Chinese bidirectional tasks, respectively.
1 Introduction
Translation suggestion (TS) is a scheme to simplify post-editing (PE) by automatically providing alternative suggestions for incorrect spans in machine translation outputs. Yang et al. (2021) formally define TS and build a high-quality dataset with human annotation, establishing a benchmark for TS. Based on the machine translation framework, the TS system takes as input the source sentence x spliced with the translation sentence ˜m, where the incorrect span of ˜m is masked, and outputs the correct alternative y for the incorrect span. The TS task is still at an early research stage; to spur research on this task, WMT released the translation suggestion shared task.
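The input format described above can be sketched as follows. This is a minimal illustration only: the <mask> token, the "</s>" separator, and the helper name are assumptions, not the shared task's actual data format.

```python
# Illustrative sketch of the TS input described above: the source
# sentence x is spliced with the translation ~m whose incorrect span
# has been masked; the model should output the alternative y.
# The mask token and separator below are assumptions for illustration.

def build_ts_input(source, translation, span, mask_token="<mask>", sep=" </s> "):
    """Splice the source sentence with the translation whose
    incorrect character span [start, end) is replaced by a mask."""
    start, end = span
    masked = translation[:start] + mask_token + translation[end:]
    return source + sep + masked

# Example: mask a wrongly translated word.
src = "I deposited money at the bank."
mt = "Ich habe Geld am Ufer eingezahlt."  # "Ufer" (riverbank) is wrong
span = (mt.index("Ufer"), mt.index("Ufer") + len("Ufer"))
ts_input = build_ts_input(src, mt, span)
# ts_input == "I deposited money at the bank. </s> Ich habe Geld am <mask> eingezahlt."
```

Given this spliced input, the model is trained to generate only the correct alternative for the masked span (e.g. "der Bank"), not the full corrected sentence.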
This WMT’22 shared task consists of two subtasks: Naive Translation Suggestion and Translation Suggestion with Hints. We participate in the former, which offers bidirectional translation suggestion tasks for two language pairs, English-Chinese and English-German; we participate in all four translation directions.
∗Yufeng Chen is the corresponding author.
Our TS systems are built on several machine translation models, including Transformer (Vaswani et al., 2017), SA-Transformer (Yang et al., 2021), and DynamicConv (Wu et al., 2018). To make up for the lack of training data, we construct synthetic data from parallel corpora based on three strategies. First, we randomly sample a sub-segment in each target sentence of the golden parallel data, mask the sampled sub-segment to simulate an incorrect span, and use the sub-segment as the alternative suggestion. Second, the same strategy is applied to pseudo-parallel data whose target side is replaced by machine translation results. Finally, we use a quality estimation (QE) model (Zheng et al., 2021) to estimate the translation quality of the words in the translation output and select a span with low confidence for masking; we then use an alignment tool to find the sub-segment in the reference sentence corresponding to the masked span and use it as the alternative suggestion.
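The first strategy can be sketched as below. This is a minimal illustration under stated assumptions: whitespace tokenization, a <mask> placeholder, and uniform span sampling; the actual sampling scheme may differ.

```python
import random

# A minimal sketch of the first synthesis strategy: randomly sample a
# sub-segment of the target sentence, mask it to simulate an incorrect
# span, and keep the removed words as the alternative suggestion.
# Whitespace tokenization and the <mask> token are assumptions here.

def make_synthetic_example(source, target, mask_token="<mask>", rng=None):
    """Turn one (source, target) parallel pair into one TS training
    triple: (source, masked target, alternative suggestion)."""
    rng = rng or random.Random()
    tokens = target.split()  # assumes a non-empty target sentence
    # choose a non-empty token span [i, j) at random
    i = rng.randrange(len(tokens))
    j = rng.randrange(i + 1, len(tokens) + 1)
    suggestion = " ".join(tokens[i:j])
    masked = " ".join(tokens[:i] + [mask_token] + tokens[j:])
    return source, masked, suggestion
```

The second strategy reuses this function unchanged, simply passing a machine-translated target instead of the reference.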
Considering that there is a domain gap between the synthetic corpus and the human-annotated corpus, we add an additional pre-training phase. Specifically, we train a discriminator and use it to select sentences from the synthetic corpus that are close to the golden corpus, which we deem in-domain data. After pre-training with large-scale synthetic data, we perform additional pre-training with this in-domain data, thereby reducing the domain gap. We describe our system in detail in Section 3.
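The filtering step can be sketched as follows. The scoring function and threshold below are placeholders; in our system the scores come from a trained discriminator that distinguishes synthetic from golden sentences.

```python
# A minimal sketch of the in-domain filtering step: a discriminator
# scores each synthetic pair, and pairs scoring above a threshold are
# kept as in-domain data for the additional pre-training phase.
# score_fn and the threshold value are placeholders, not our model.

def select_in_domain(pairs, score_fn, threshold=0.5):
    """Keep synthetic pairs the discriminator judges close to the
    golden (human-annotated) corpus."""
    return [pair for pair in pairs if score_fn(pair) >= threshold]

def dummy_score(pair):
    # toy scorer for demonstration: prefers shorter source sentences
    return 1.0 / len(pair[0].split())

pairs = [("src one", "tgt one"), ("a much longer source sentence", "tgt")]
in_domain = select_in_domain(pairs, dummy_score, threshold=0.3)
```

Training then proceeds in phases: large-scale synthetic pre-training, additional pre-training on the filtered in-domain subset, and finally fine-tuning on the human-annotated data.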
2 Related Work
The translation suggestion (TS) task is an important part of post-editing (PE), which combines machine translation (MT) and human translation (HT): human translators improve translation quality by correcting incorrect spans in MT outputs. To simplify PE, early work studied translation prediction (Green
arXiv:2210.06138v1 [cs.CL] 12 Oct 2022