Turning Fixed to Adaptive: Integrating Post-Evaluation
into Simultaneous Machine Translation
Shoutao Guo 1,2, Shaolei Zhang 1,2, Yang Feng 1,2
1Key Laboratory of Intelligent Information Processing
Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS)
2University of Chinese Academy of Sciences, Beijing, China
{guoshoutao22z,zhangshaolei20z,fengyang}@ict.ac.cn
Abstract
Simultaneous machine translation (SiMT) starts its translation before reading the whole source sentence and employs either a fixed or an adaptive policy to generate the target sentence. Compared to the fixed policy, the adaptive policy achieves better latency-quality tradeoffs by adopting a flexible translation policy. If the policy can evaluate rationality before taking an action, the probability of incorrect actions will also decrease. However, previous methods lack evaluation of actions before taking them. In this paper, we propose a method of performing the adaptive policy by integrating post-evaluation into the fixed policy. Specifically, whenever a candidate token is generated, our model evaluates the rationality of the next action by measuring the change in the source content being translated. Our model then takes different actions based on the evaluation results. Experiments on three translation tasks show that our method can exceed strong baselines under all latency levels.¹
1 Introduction
Simultaneous machine translation (SiMT) (Gu et al., 2017; Ma et al., 2019; Arivazhagan et al., 2019; Ma et al., 2020; Zhang and Feng, 2021b, 2022d) starts translation before reading the whole source sentence. It seeks to achieve good latency-quality tradeoffs and is suitable for various scenarios with different latency tolerances. Compared to full-sentence machine translation, SiMT is more challenging because part of the source content is unavailable during translation and the model additionally needs to decide on a translation policy.
The translation policy in SiMT directs the model to decide when to take a READ (i.e., read the next source token) or WRITE (i.e., output the generated token) action, so as to ensure that the model has appropriate source content to translate the target tokens. Because READ and WRITE actions are often decided based on the available source tokens and the generated target tokens, it is difficult to guarantee their accuracy. Therefore, if the SiMT model can evaluate the rationality of actions with the help of the currently generated candidate token, it can reduce the probability of taking incorrect actions.

Corresponding author: Yang Feng.

¹Code is available at https://github.com/ictnlp/PED-SiMT

Figure 1: The change in translation degree of source tokens after generating a candidate token; the READ/WRITE action is taken accordingly.
However, previous methods, including fixed and adaptive policies, lack evaluation before taking the next action. With a fixed policy (Ma et al., 2019; Elbayad et al., 2020; Zhang et al., 2021; Zhang and Feng, 2021c), the model generates the translation according to predefined translation rules. Although it relies only on simple training methods, it cannot make full use of the context to decide on an appropriate translation policy. With an adaptive policy (Gu et al., 2017; Arivazhagan et al., 2019; Ma et al., 2020; Zhang et al., 2022), the model can obtain better translation performance, but it needs complicated training methods to obtain the translation policy and takes actions immediately after making decisions, which usually does not guarantee the accuracy of the actions.
arXiv:2210.11900v1 [cs.CL] 21 Oct 2022

Therefore, we attempt to explore factors in the translation that reflect whether an action is correct, thereby introducing evaluation into the translation policy. The goal of translation is to convert sentences from the source language to the target language (Mujadia and Sharma, 2021), so the source and target sentences should contain the same semantics (i.e., global equivalence). To ensure the faithfulness of translation (Weng et al., 2020), the source content that has already been translated should be semantically equivalent to the previously generated target tokens at each step (i.e., partial equivalence) (Zhang and Feng, 2022c). Furthermore, comparing adjacent steps, the increment of the source content being translated should be semantically equivalent to the currently generated token (i.e., incremental equivalence). Therefore, the rationality of the generated target token can be reflected by the increment of the source content being translated between adjacent steps, which can be used to evaluate the READ and WRITE actions.
In this paper, we propose a method of performing the adaptive policy by integrating post-evaluation into the fixed policy, which directs the model to take a READ or WRITE action based on the evaluation results. Using partial equivalence, our model can recognize the translation degree of source tokens (i.e., the degree to which each source token has been translated), which represents how much of the source content has been translated at each step. Then, by virtue of incremental equivalence, the increment of translated source content can naturally be regarded as the change in the translation degree of the available source tokens. Therefore, we can evaluate an action by measuring the change in translation degree. As shown in Figure 1, if the translation degree changes significantly after generating a candidate token, we consider that the currently generated token has obtained enough source content, and thus the WRITE action should be taken. Otherwise, the model should continue to take READ actions to wait for the arrival of the required source tokens. Experiments on WMT15 De→En and IWSLT15 En→Vi translation tasks show that our method can exceed strong baselines under all latency levels.
2 Background
Transformer (Vaswani et al., 2017), which consists of an encoder and a decoder, is the most widely used neural machine translation model. Given a source sentence $\mathbf{x} = (x_1, ..., x_I)$, the encoder maps it into a sequence of hidden states $\mathbf{z} = (z_1, ..., z_I)$. The decoder generates target hidden states $\mathbf{h} = (h_1, ..., h_M)$ and predicts the target sentence $\mathbf{y} = (y_1, ..., y_M)$ based on $\mathbf{z}$ autoregressively.

Our method is based on the wait-$k$ policy (Ma et al., 2019) and Capsule Networks (Hinton et al., 2011) with Guided Dynamic Routing (Zheng et al., 2019b), so we briefly introduce them.
2.1 Wait-$k$ Policy

The wait-$k$ policy, which belongs to the fixed policy, first takes $k$ READ actions and then takes READ and WRITE actions alternately. Define a monotonic non-decreasing function $g(t)$, which represents the number of available source tokens when translating target token $y_t$. For the wait-$k$ policy, $g(t)$ can be calculated as:

$$g(t; k) = \min\{k + t - 1, I\}, \tag{1}$$

where $I$ is the length of the source sentence.

To avoid recalculating the encoder hidden states whenever a new source token is read, the unidirectional encoder (Elbayad et al., 2020) is proposed, which makes each source token attend only to its previous tokens. Besides, the multi-path method (Elbayad et al., 2020) optimizes the model by sampling $k$ uniformly during training and makes a unified model obtain translation performance comparable to the wait-$k$ policy under all latency levels.
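As an illustration, the READ/WRITE schedule implied by Equation (1) can be enumerated with a short sketch (the function names and arguments are ours, not part of the paper):

```python
def g(t, k, src_len):
    """Equation (1): number of source tokens available when generating
    target token y_t under the wait-k policy; t is 1-indexed."""
    return min(k + t - 1, src_len)

def wait_k_actions(k, src_len, tgt_len):
    """Enumerate the READ/WRITE sequence implied by the wait-k policy."""
    actions, read = [], 0
    for t in range(1, tgt_len + 1):
        # READ until g(t) source tokens are available, then WRITE y_t.
        while read < g(t, k, src_len):
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions

# With k=2 the model first reads 2 tokens, then alternates WRITE and READ
# until the source is exhausted.
print(wait_k_actions(k=2, src_len=4, tgt_len=4))
```

Once the source sentence is fully read ($k + t - 1 \geq I$), the policy degenerates into writing the remaining target tokens, which the `min` in `g` captures.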
2.2 Capsule Networks with Guided Dynamic Routing

Guided Dynamic Routing (GDR) is a variant of the routing-by-agreement mechanism (Sabour et al., 2017) in Capsule Networks that makes input capsules route to the corresponding output capsules, driven by the decoding state at each step. In detail, the encoder hidden states $\mathbf{z}$ are regarded as a sequence of input capsules, and a layer of output capsules is added on top of the encoder to model different categories of source information. The decoding state then directs each input capsule to find its affiliation to each output capsule at each step, thereby solving the problem of assigning source tokens to different categories.
3 The Proposed Method
Figure 2: The architecture of our method. The R/W prediction module obtains the translation degree of the available source tokens and evaluates the next action based on the change in translation degree.

The architecture of our method is shown in Figure 2. Our method first guides the model to recognize the translation degree of the available source tokens based on partial equivalence during training via the introduced GDR module. Then, based on the incremental equivalence between adjacent steps, our method utilizes the changes in translation degree to post-evaluate the rationality of the READ and WRITE actions and make corrections accordingly, thereby performing an adaptive policy during inference. Besides, to enhance the robustness of the model in recognizing the translation degree during inference, our method applies disturbed-path training based on the wait-$k$ policy, which adds some disturbance to the translation policy during training. The details are introduced in the following sections in order.
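To make the control flow concrete, the inference procedure described above can be sketched as follows. This is a minimal sketch under our own assumptions: the candidate generator, the translation-degree function, and the scalar threshold on the change in total translation degree are hypothetical stand-ins for the model's actual components, which the later sections define.

```python
def adaptive_policy(generate_candidate, translation_degree, source_stream,
                    threshold=0.5, eos="</s>"):
    """Sketch of the post-evaluation loop: after generating a candidate
    token, compare the change in total translation degree against a
    threshold to decide between WRITE and READ. All callables and the
    threshold value are illustrative placeholders, not the paper's exact
    formulation."""
    src, tgt = [], []
    prev_degree = 0.0
    while True:
        token = generate_candidate(src, tgt)
        # Total translation degree if the candidate were committed.
        new_degree = sum(translation_degree(src, tgt + [token]))
        if new_degree - prev_degree >= threshold or not source_stream:
            tgt.append(token)  # WRITE: the candidate is judged rational
            prev_degree = new_degree
            if token == eos:
                return tgt
        else:
            # READ: wait for the required source tokens to arrive.
            src.append(source_stream.pop(0))
```

When the source stream is exhausted, the sketch falls back to writing unconditionally, since no further READ action is possible.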
3.1 Recognizing the Translation Degree

As mentioned above, the translation degree represents the degree to which a source token has been translated and is the prerequisite of our method. Therefore, we introduce Capsule Networks with GDR to model the translation degree, guided during training by our two proposed constraints according to partial equivalence.
Translation Degree  We define the translation degree of all source tokens at step $t$ as $\mathbf{d}^{(t)} = (d^{(t)}_1, ..., d^{(t)}_I)$. To obtain the translation degree, we utilize the ability of Capsule Networks with GDR to assign the source tokens to different categories. Assume that there are $J + N$ output capsules modeling the available source information that has already been translated and that has not yet been translated, among which there are $J$ translated capsules $\Phi_T = (\Phi_1, ..., \Phi_J)$ and $N$ untranslated capsules $\Phi_U = (\Phi_{J+1}, ..., \Phi_{J+N})$, respectively. The encoder hidden states $\mathbf{z}$ are regarded as input capsules. To determine how much of $z_i$ needs to be sent to $\Phi_j$ at step $t$, the assignment probability $c^{(t)}_{ij}$ in SiMT is modified as:

$$c^{(t)}_{ij} = \begin{cases} \dfrac{\exp b^{(t)}_{ij}}{\sum_{l} \exp b^{(t)}_{il}} & \text{if } i \leq g(t) \\ 0 & \text{otherwise} \end{cases}, \tag{2}$$
where $b^{(t)}_{ij}$ measures the cumulative similarity between $z_i$ and $\Phi_j$. Then $c^{(t)}_{ij}$ is updated iteratively, driven by the decoding state, and is regarded as the affiliation of $z_i$ to $\Phi_j$ after the last iteration. For more details about Capsule Networks with GDR, please refer to Zheng et al. (2019b). On this basis, the translation degree of $x_i$ is calculated by aggregating the assignment probabilities of routing to the translated capsules at step $t$:

$$d^{(t)}_i = \sum_{j=1}^{J} c^{(t)}_{ij}. \tag{3}$$
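Equations (2) and (3) amount to a masked softmax over the output capsules followed by a sum over the translated ones. A minimal sketch in plain Python (positions are 0-indexed here; the similarity logits $b$ are taken as given, whereas GDR updates them iteratively):

```python
import math

def assignment_probs(b, g_t):
    """Equation (2): softmax of similarity logits b[i][j] over output
    capsules j, zeroed out for source positions that have not been read
    yet (i > g(t) in the paper's 1-indexed notation)."""
    c = []
    for i, row in enumerate(b):
        if i < g_t:
            z = sum(math.exp(v) for v in row)
            c.append([math.exp(v) / z for v in row])
        else:
            c.append([0.0] * len(row))
    return c

def translation_degree(c, J):
    """Equation (3): aggregate the probability routed to the first J
    (translated) capsules for each source position."""
    return [sum(row[:J]) for row in c]

# Two source positions read (g_t=2), J=1 translated and N=1 untranslated capsule.
b = [[2.0, 0.0], [0.0, 0.0], [1.0, 3.0]]
c = assignment_probs(b, g_t=2)
print(translation_degree(c, J=1))
```

Unread positions contribute a translation degree of exactly zero, which matches the masking in Equation (2): only the $g(t)$ available source tokens participate in routing.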
Segment Constraint  To ensure that the model can recognize the translation degree of source tokens, the model requires additional guidance. According to partial equivalence, the translated source content should be semantically equivalent to the generated target tokens. Conversely, the untranslated source content and the unread source tokens should be semantically equivalent to the target tokens not yet generated. We therefore introduce a mean square error to induce the learning of the output capsules:

$$\mathcal{L}_S = \frac{1}{M} \sum_{t=1}^{M} \left( \left\| \Phi^T_t - W^T H^T_t \right\|_2 + \left\| \Phi^U_t + W^U_e Z_t - W^U_d H^U_t \right\|_2 \right), \tag{4}$$

where $W^T$, $W^U_e$ and $W^U_d$ are learnable parameters. $H^T_t$ and $H^U_t$ are the averages of the hidden states of the generated target tokens and of the target tokens not yet generated, which are calculated respectively as:

$$H^T_t = \frac{1}{t-1} \sum_{\tau=1}^{t-1} h_\tau, \tag{5}$$
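Equation (5) is simply a prefix mean over the decoder hidden states. A small sketch follows; returning a zero vector at $t = 1$, when no target token has been generated yet, is our assumption rather than something stated in the paper:

```python
def prefix_average(h, t):
    """Equation (5): H_t^T = (1/(t-1)) * sum_{tau=1}^{t-1} h_tau, where h is
    a list of decoder hidden-state vectors and t is 1-indexed as in the
    paper. For t = 1 the prefix is empty; we return a zero vector by
    assumption."""
    dim = len(h[0])
    if t == 1:
        return [0.0] * dim
    prefix = h[:t - 1]  # hidden states of the already-generated tokens
    return [sum(vals) / (t - 1) for vals in zip(*prefix)]

h = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(prefix_average(h, 3))  # mean of h_1 and h_2
```

The counterpart $H^U_t$ would average the remaining states $h_t, ..., h_M$ in the same fashion.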