correct, thereby introducing evaluation into translation policy. The goal of translation is to convert sentences from the source language to the target language (Mujadia and Sharma, 2021), so the source and target sentences should contain the same semantics (i.e., global equivalence). To ensure the faithfulness of translation (Weng et al., 2020), the source content that has already been translated should be semantically equivalent to the previously generated target tokens at each step (i.e., partial equivalence) (Zhang and Feng, 2022c). Furthermore, by comparing the changes between adjacent steps, the increment of the source content being translated should be semantically equivalent to the currently generated token (i.e., incremental equivalence). Therefore, the rationality of the generated target token is reflected by the increment of the source content being translated between adjacent steps, which can be used to evaluate the READ and WRITE actions.
In this paper, we propose a method of performing an adaptive policy by integrating post-evaluation into the fixed policy, which directs the model to take a READ or WRITE action based on the evaluation results. Using partial equivalence, our model can recognize the translation degree of source tokens (i.e., the degree to which a source token has been translated), which represents how much of the source content has been translated at each step. Then, by virtue of incremental equivalence, the increment of translated source content can be regarded as the change in the translation degree of the available source tokens. Therefore, we can evaluate each action by measuring the change in translation degree. As shown in Figure 1, if the translation degree changes significantly after generating a candidate token, we consider that the generated token has obtained enough source content, and thus a WRITE action should be taken. Otherwise, the model should continue to take READ actions to wait for the arrival of the required source tokens. Experiments on WMT15 De→En and IWSLT15 En→Vi translation tasks show that our method exceeds strong baselines across all latency levels.
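As an illustration only, the decision rule described above can be sketched as follows, assuming a hypothetical `translation_degree` scorer; the function name, the summed-change criterion, and the threshold are our own illustrative assumptions, not a specification of the model:

```python
def decide_action(translation_degree, src_prefix, tgt_prefix, candidate, threshold=0.5):
    """Post-evaluation sketch: WRITE if generating `candidate` changes the
    translation degree of the available source tokens enough, else READ.

    `translation_degree(src, tgt)` is a hypothetical scorer returning one
    degree in [0, 1] per available source token.
    """
    before = translation_degree(src_prefix, tgt_prefix)
    after = translation_degree(src_prefix, tgt_prefix + [candidate])
    # Incremental equivalence: the increment of translated source content
    # should correspond to the newly generated token.
    change = sum(a - b for a, b in zip(after, before))
    return "WRITE" if change >= threshold else "READ"
```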
2 Background
Transformer (Vaswani et al., 2017), which consists of an encoder and a decoder, is the most widely used neural machine translation model. Given a source sentence $x = (x_1, \ldots, x_I)$, the encoder maps it into a sequence of hidden states $z = (z_1, \ldots, z_I)$. The decoder generates target hidden states $h = (h_1, \ldots, h_M)$ and predicts the target sentence $y = (y_1, \ldots, y_M)$ based on $z$ autoregressively.
Our method is based on the wait-$k$ policy (Ma et al., 2019) and Capsule Networks (Hinton et al., 2011) with Guided Dynamic Routing (Zheng et al., 2019b), so we briefly introduce them.
2.1 Wait-$k$ Policy
The wait-$k$ policy, which belongs to the fixed policies, first takes $k$ READ actions and then takes READ and WRITE actions alternately. Define a monotonic non-decreasing function $g(t)$, which represents the number of available source tokens when translating target token $y_t$. For the wait-$k$ policy, $g(t)$ is calculated as:
$$g(t;k) = \min\{k+t-1,\ I\}, \tag{1}$$
where $I$ is the length of the source sentence.
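For concreteness, Eq. (1) can be implemented directly; the following lines are a minimal illustrative sketch, not the authors' code:

```python
def g(t, k, src_len):
    """Number of available source tokens when generating target token y_t
    under the wait-k policy (Eq. 1); t is 1-indexed."""
    return min(k + t - 1, src_len)

# With k = 3 and a 6-token source: first take 3 READ actions, then
# alternate; once the source is exhausted, only WRITE actions remain.
assert [g(t, k=3, src_len=6) for t in range(1, 7)] == [3, 4, 5, 6, 6, 6]
```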
To avoid recalculating the encoder hidden states whenever a new source token is read, a unidirectional encoder (Elbayad et al., 2020) was proposed, which makes each source token attend only to its previous tokens. Besides, the multi-path method (Elbayad et al., 2020) optimizes the model by sampling $k$ uniformly during training, enabling a single unified model to obtain translation performance comparable to the wait-$k$ policy at all latency levels.
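Both ideas are straightforward to realize; the PyTorch sketch below is a minimal illustration under our own assumptions (a boolean attention-mask convention where True entries are masked out), not the reference implementation:

```python
import random
import torch

def unidirectional_mask(src_len: int) -> torch.Tensor:
    """Causal source-side mask: each source token attends only to itself and
    its previous tokens, so encoder states need not be recomputed when a new
    source token arrives. True entries are masked out."""
    return torch.triu(torch.ones(src_len, src_len, dtype=torch.bool), diagonal=1)

def sample_k(max_k: int) -> int:
    """Multi-path training: draw a wait-k value uniformly for each batch so a
    single model covers all latency levels."""
    return random.randint(1, max_k)
```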
2.2 Capsule Networks with Guided Dynamic
Routing
Guided Dynamic Routing (GDR) is a variant of the routing-by-agreement mechanism (Sabour et al., 2017) in Capsule Networks, which makes input capsules route to the corresponding output capsules driven by the decoding state at each step. In detail, the encoder hidden states $z$ are regarded as a sequence of input capsules, and a layer of output capsules is added on top of the encoder to model different categories of source information. At each step, the decoding state directs each input capsule to find its affiliation to each output capsule, thereby solving the problem of assigning source tokens to different categories.
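To make the mechanism concrete, the sketch below implements routing-by-agreement (Sabour et al., 2017) with a guidance term derived from the decoding state; the tensor shapes, the projection `W_g`, and the initialization of the routing logits are illustrative assumptions rather than the exact GDR formulation:

```python
import torch
import torch.nn.functional as F

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Squash nonlinearity from Sabour et al. (2017)."""
    n2 = (s * s).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / (n2.sqrt() + eps)

def guided_dynamic_routing(z, decoder_state, W, W_g, n_iters=3):
    """Route input capsules (encoder states) to output capsules, guided by
    the current decoding state.

    z:             (I, d_in)            encoder hidden states (input capsules)
    decoder_state: (d_dec,)             current decoding state
    W:             (n_out, d_in, d_out) per-category transformation matrices
    W_g:           (d_dec, n_out)       guidance projection (an assumption)
    """
    I, n_out = z.size(0), W.size(0)
    u_hat = torch.einsum('id,ndo->ino', z, W)         # predictions (I, n_out, d_out)
    guide = decoder_state @ W_g                       # per-category guidance logits
    b = guide.expand(I, n_out).clone()                # initial routing logits
    for _ in range(n_iters):
        c = F.softmax(b, dim=1)                       # affiliation of inputs to outputs
        s = torch.einsum('in,ino->no', c, u_hat)      # weighted sum per output capsule
        v = squash(s)                                 # output capsules (n_out, d_out)
        b = b + torch.einsum('ino,no->in', u_hat, v)  # agreement update
    return v
```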
3 The Proposed Method
The architecture of our method is shown in Figure
2. Our method first guides the model to recognize
the translation degree of available source tokens
based on partial equivalence during training via the