correct, thereby introducing evaluation into translation policy. The goal of translation is to convert sentences from the source language to the target language (Mujadia and Sharma, 2021), so the source and target sentences should contain the same semantics (i.e., global equivalence). To ensure the faithfulness of translation (Weng et al., 2020), the source content that has already been translated should be semantically equivalent to the previously generated target tokens at each step (i.e., partial equivalence) (Zhang and Feng, 2022c). Furthermore, by comparing the changes between adjacent steps, the increment of the source content being translated should be semantically equivalent to the currently generated token (i.e., incremental equivalence). Therefore, the rationality of the generated target token is reflected by the increment of the source content being translated between adjacent steps, which can be used to evaluate the READ and WRITE actions.
In this paper, we propose a method of performing an adaptive policy by integrating post-evaluation into the fixed policy, which directs the model to take a READ or WRITE action based on the evaluation results. Using partial equivalence, our model can recognize the translation degree of source tokens (i.e., the degree to which a source token has been translated), which represents how much of the source content has been translated at each step. Then, by virtue of incremental equivalence, the increment of translated source content can be regarded as the change in the translation degree of the available source tokens. Therefore, we can evaluate each action by measuring the change in translation degree. As shown in Figure 1, if the translation degree changes significantly after generating a candidate token, we consider that the generated token has obtained enough source content, and thus a WRITE action should be taken. Otherwise, the model should continue to take READ actions to wait for the arrival of the required source tokens. Experiments on WMT15 De→En and IWSLT15 En→Vi translation tasks show that our method exceeds strong baselines across all latency levels.
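As an illustration only, the decision rule described above can be sketched as follows, assuming a hypothetical `translation_degree` scorer; the function name, the summed-change criterion, and the threshold are our own illustrative assumptions, not a specification of the model:

```python
def decide_action(translation_degree, src_prefix, tgt_prefix, candidate, threshold=0.5):
    """Post-evaluation sketch: WRITE if generating `candidate` changes the
    translation degree of the available source tokens enough, else READ.

    `translation_degree(src, tgt)` is a hypothetical scorer returning one
    degree in [0, 1] per available source token.
    """
    before = translation_degree(src_prefix, tgt_prefix)
    after = translation_degree(src_prefix, tgt_prefix + [candidate])
    # Incremental equivalence: the increment of translated source content
    # should correspond to the newly generated token.
    change = sum(a - b for a, b in zip(after, before))
    return "WRITE" if change >= threshold else "READ"
```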
2 Background
Transformer (Vaswani et al., 2017), which consists of an encoder and a decoder, is the most widely used neural machine translation model. Given a source sentence $x = (x_1, \ldots, x_I)$, the encoder maps it into a sequence of hidden states $z = (z_1, \ldots, z_I)$. The decoder generates target hidden states $h = (h_1, \ldots, h_M)$ and predicts the target sentence $y = (y_1, \ldots, y_M)$ based on $z$ autoregressively.
Our method is based on the wait-$k$ policy (Ma et al., 2019) and Capsule Networks (Hinton et al., 2011) with Guided Dynamic Routing (Zheng et al., 2019b), so we briefly introduce them.
2.1 Wait-$k$ Policy
The wait-$k$ policy, which belongs to the fixed policies, first takes $k$ READ actions and then takes READ and WRITE actions alternately. Define a monotonic non-decreasing function $g(t)$, which represents the number of available source tokens when translating target token $y_t$. For the wait-$k$ policy, $g(t)$ is calculated as:
$$g(t;k) = \min\{k+t-1,\ I\}, \tag{1}$$
where $I$ is the length of the source sentence.
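For concreteness, Eq. (1) can be implemented directly; the following lines are a minimal illustrative sketch, not the authors' code:

```python
def g(t, k, src_len):
    """Number of available source tokens when generating target token y_t
    under the wait-k policy (Eq. 1); t is 1-indexed."""
    return min(k + t - 1, src_len)

# With k = 3 and a 6-token source: first take 3 READ actions, then
# alternate; once the source is exhausted, only WRITE actions remain.
assert [g(t, k=3, src_len=6) for t in range(1, 7)] == [3, 4, 5, 6, 6, 6]
```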
To avoid recalculating the encoder hidden states whenever a new source token is read, a unidirectional encoder (Elbayad et al., 2020) was proposed, which makes each source token attend only to its previous tokens. Besides, the multi-path method (Elbayad et al., 2020) optimizes the model by sampling $k$ uniformly during training, enabling a single unified model to obtain translation performance comparable to the wait-$k$ policy at all latency levels.
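Both ideas are straightforward to realize; the PyTorch sketch below is a minimal illustration under our own assumptions (a boolean attention-mask convention where True entries are masked out), not the reference implementation:

```python
import random
import torch

def unidirectional_mask(src_len: int) -> torch.Tensor:
    """Causal source-side mask: each source token attends only to itself and
    its previous tokens, so encoder states need not be recomputed when a new
    source token arrives. True entries are masked out."""
    return torch.triu(torch.ones(src_len, src_len, dtype=torch.bool), diagonal=1)

def sample_k(max_k: int) -> int:
    """Multi-path training: draw a wait-k value uniformly for each batch so a
    single model covers all latency levels."""
    return random.randint(1, max_k)
```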
2.2 Capsule Networks with Guided Dynamic
Routing
Guided Dynamic Routing (GDR) is a variant of the routing-by-agreement mechanism (Sabour et al., 2017) in Capsule Networks, which makes input capsules route to the corresponding output capsules driven by the decoding state at each step. In detail, the encoder hidden states $z$ are regarded as a sequence of input capsules, and a layer of output capsules is added on top of the encoder to model different categories of source information. At each step, the decoding state directs each input capsule to find its affiliation to each output capsule, thereby solving the problem of assigning source tokens to different categories.
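To make the mechanism concrete, the sketch below implements routing-by-agreement (Sabour et al., 2017) with a guidance term derived from the decoding state; the tensor shapes, the projection `W_g`, and the initialization of the routing logits are illustrative assumptions rather than the exact GDR formulation:

```python
import torch
import torch.nn.functional as F

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Squash nonlinearity from Sabour et al. (2017)."""
    n2 = (s * s).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / (n2.sqrt() + eps)

def guided_dynamic_routing(z, decoder_state, W, W_g, n_iters=3):
    """Route input capsules (encoder states) to output capsules, guided by
    the current decoding state.

    z:             (I, d_in)            encoder hidden states (input capsules)
    decoder_state: (d_dec,)             current decoding state
    W:             (n_out, d_in, d_out) per-category transformation matrices
    W_g:           (d_dec, n_out)       guidance projection (an assumption)
    """
    I, n_out = z.size(0), W.size(0)
    u_hat = torch.einsum('id,ndo->ino', z, W)         # predictions (I, n_out, d_out)
    guide = decoder_state @ W_g                       # per-category guidance logits
    b = guide.expand(I, n_out).clone()                # initial routing logits
    for _ in range(n_iters):
        c = F.softmax(b, dim=1)                       # affiliation of inputs to outputs
        s = torch.einsum('in,ino->no', c, u_hat)      # weighted sum per output capsule
        v = squash(s)                                 # output capsules (n_out, d_out)
        b = b + torch.einsum('ino,no->in', u_hat, v)  # agreement update
    return v
```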
3 The Proposed Method
The architecture of our method is shown in Figure
2. Our method first guides the model to recognize
the translation degree of available source tokens
based on partial equivalence during training via the