Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment

Siyu Lai1, Zhen Yang2, Fandong Meng2, Yufeng Chen1, Jinan Xu1 and Jie Zhou2
1Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
2Pattern Recognition Center, WeChat AI, Tencent Inc, China
{siyulai,chenyf,jaxu}@bjtu.edu.cn, {zieenyang,fandongmeng,withtomzhou}@tencent.com
Abstract
Word alignment, which aims to extract lexical translation equivalents between source and target sentences, serves as a fundamental tool for natural language processing. Recent studies in this area have yielded substantial improvements by generating alignments from contextualized embeddings of pre-trained multilingual language models. However, we find that the existing approaches capture few interactions between the input sentence pairs, which severely degrades word alignment quality, especially for words that are ambiguous in the monolingual context. To remedy this problem, we propose Cross-Align to model deep interactions between the input sentence pairs, in which the source and target sentences are encoded separately with the shared self-attention modules in the shallow layers, while cross-lingual interactions are explicitly constructed by the cross-attention modules in the upper layers. Besides, to train our model effectively, we propose a two-stage training framework, where the model is trained with a simple Translation Language Modeling (TLM) objective in the first stage and then finetuned with a self-supervised alignment objective in the second stage. Experiments show that the proposed Cross-Align achieves state-of-the-art (SOTA) performance on four out of five language pairs.1
Work done when Siyu Lai was interning at Pattern Recognition Center, WeChat AI, Tencent Inc, China. Yufeng Chen is the corresponding author.

1 The code is publicly available at: https://github.com/lisasiyu/Cross-Align

Figure 1: An example from Dou and Neubig (2021). There is a misalignment between "以" and "to" and "for". Red boxes denote the gold alignments.

1 Introduction

Word alignment, which aims to extract the lexical translation equivalents between the input source-target sentence pairs (Brown et al., 1993; Zenkel et al., 2019; Garg et al., 2019; Jalili Sabet et al., 2020), has been widely used in machine translation (Och and Ney, 2000; Arthur et al., 2016; Yang et al., 2020, 2021), transferring text annotations (Fang and Cohn, 2016; Huck et al., 2019), typological analysis (Lewis and Xia, 2008), generating adversarial examples (Lai et al., 2022), etc. Statistical
word aligners based on the IBM translation models
(Brown et al.,1993), such as GIZA++ (Och and
Ney,2003) and FastAlign (Dyer et al.,2013), have
remained popular over the past thirty years for their
good performance. Recently, with the advancement
of deep neural models, neural aligners have devel-
oped rapidly and surpassed the statistical aligners
on many language pairs. Typically, these neural ap-
proaches can be divided into two branches:
Neural
Machine Translation (NMT) based aligners
and
Language Model (LM) based aligners.
NMT based aligners
(Garg et al.,2019;Zenkel
et al.,2020;Chen et al.,2020,2021;Zhang and van
Genabith,2021) take alignments as a by-product of
NMT systems by using attention weights to extract
alignments. As NMT models are unidirectional,
two NMT models (source-to-target and target-to-
source) are required to obtain the final alignments,
which makes the NMT based aligners less efficient.
As opposed to the NMT based aligners, the
LM
based aligners
generate alignments from the con-
textualized embeddings of the directionless mul-
tilingual language models. They extract contex-
tualized embeddings from LMs and induce align-
ments based on the matrix of embedding similar-
ities (Jalili Sabet et al.,2020;Dou and Neubig,
2021). While achieving some improvements in
the alignment quality and efficiency, we find that
the existing LM based aligners capture few inter-
actions between the input source-target sentence
pairs. Specifically, SimAlign (Jalili Sabet et al.,
2020) encodes the source and target sentences sep-
arately without attending to the context in the other
language. Dou and Neubig (2021) further propose
Awesome-Align, which considers the cross-lingual
context by taking the concatenation of the sentence
pairs as inputs during training, but still encodes
them separately during inference.
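To make this extraction step concrete, below is a minimal sketch of how an LM based aligner can induce alignments from the similarity matrix of contextualized embeddings, roughly following the bidirectional softmax-and-threshold recipe of Dou and Neubig (2021); the function name, threshold value, and the random tensors standing in for LM outputs are illustrative assumptions rather than the released implementation.

```python
import torch

def extract_alignments(src_emb, tgt_emb, threshold=1e-3):
    """Induce word alignments from contextualized token embeddings.
    src_emb: (src_len, dim) source token embeddings.
    tgt_emb: (tgt_len, dim) target token embeddings.
    Returns a set of (src_index, tgt_index) pairs."""
    # Similarity between every source token and every target token.
    sim = src_emb @ tgt_emb.T                  # (src_len, tgt_len)
    # Normalize in both directions: source-to-target and target-to-source.
    s2t = torch.softmax(sim, dim=-1)
    t2s = torch.softmax(sim, dim=0)
    # Keep pairs whose bidirectional score is high enough.
    scores = s2t * t2s
    return {(i, j) for i, j in (scores > threshold).nonzero().tolist()}

# Toy usage with random vectors standing in for multilingual LM outputs.
src, tgt = torch.randn(5, 768), torch.randn(6, 768)
print(extract_alignments(src, tgt))
```

Note that in SimAlign and Awesome-Align the embeddings fed into this step are computed for each sentence separately, which is exactly the lack of interaction discussed next.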
However, the lack of interaction between the in-
put source-target sentence pairs degrades the align-
ment quality severely, especially for the ambigu-
ous words in the monolingual context. Figure 1
presents an example of our reproduced results from
Awesome-Align. The ambiguous Chinese word "以" has two different meanings: 1) a preposition ("to", "as", "for" in English), 2) the abbreviation of the word "以色列" ("Israel" in English). In this example, the word "以" is misaligned to "to" and "for" as the model does not fully consider the word "Israel" in the target sentence. Intuitively, the cross-
lingual context is very helpful for alleviating the
meaning confusion in the task of word alignment.
Based on the above observation, we propose
Cross-Align
, which fully considers the cross-
lingual context by modeling deep interactions be-
tween the input sentence pairs. Specifically, Cross-
Align encodes the monolingual information for
source and target sentences independently with the
shared self-attention modules in the shallow layers,
and then explicitly models deep cross-lingual in-
teractions with the cross-attention modules in the
upper layers. Besides, to train Cross-Align effec-
tively, we propose a two-stage training framework,
where the model is trained with the simple TLM
objective (Conneau and Lample,2019) to learn the
cross-lingual representations in the first stage, and
then finetuned with a self-supervised alignment
objective to bridge the gap between training and in-
ference in the second stage. We conduct extensive
experiments on five different language pairs and the
results show that our approach achieves the SOTA
performance on four out of five language pairs.2 Compared to the existing approaches, which apply many complex training objectives, our approach is simple yet effective.

2 In Ro-En, we achieve the best performance among models of the same type, but perform slightly worse than the NMT based models, which have many more parameters than ours.
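For reference, the TLM objective of Conneau and Lample (2019) used in the first stage concatenates a parallel sentence pair, masks random tokens on both sides, and predicts them from the bilingual context. The sketch below illustrates only this masking step under assumed conventions ([SEP] separator, [MASK] token, 15% masking rate); how Cross-Align instantiates the objective in its own encoder is described in Section 3.

```python
import random

MASK_TOKEN, SEP_TOKEN, MASK_PROB = "[MASK]", "[SEP]", 0.15

def make_tlm_example(src_tokens, tgt_tokens, seed=0):
    """Build one TLM training example: concatenate a parallel pair and mask
    tokens in both sentences, so the model must recover each masked word
    from the context of either language."""
    rng = random.Random(seed)
    tokens = src_tokens + [SEP_TOKEN] + tgt_tokens
    inputs, labels = [], []
    for tok in tokens:
        if tok != SEP_TOKEN and rng.random() < MASK_PROB:
            inputs.append(MASK_TOKEN)
            labels.append(tok)     # token to be predicted
        else:
            inputs.append(tok)
            labels.append(None)    # position ignored by the loss
    return inputs, labels

src = "ich habe den Brief gelesen".split()
tgt = "I have read the letter".split()
print(make_tlm_example(src, tgt))
```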
Our main contributions are summarized as fol-
lows:
• We propose Cross-Align, a novel word alignment model which utilizes the self-attention modules to encode monolingual representations and the cross-attention modules to model cross-lingual interactions.
• We propose a two-stage training framework to boost model performance on word alignment, which is simple yet effective.
• Extensive experiments show that the proposed model achieves SOTA performance on four out of five different language pairs.
2 Related Work
2.1 NMT based Aligner
Recently, there has been a surge of interest in studying alignment based on the attention weights (Vaswani et al., 2017) of NMT systems. However, the naive attention may fail to capture clear word alignments (Serrano and Smith, 2019). Therefore, Zenkel et al.
(2019) and Garg et al. (2019) extend the Trans-
former architecture with a separate alignment layer
on top of the decoder, and produce competitive
results compared to GIZA++. Chen et al. (2020)
further improve alignment quality by adapting the
alignment induction with the to-be-aligned target
token. Recently, Chen et al. (2021) and Zhang
and van Genabith (2021) propose self-supervised
models that take advantage of the full context on
the target side, and achieve the SOTA results. Al-
though NMT based aligners achieve promising re-
sults, there are still some disadvantages: 1) The inherent discrepancy between the translation task and word alignment is not eliminated, so the reliability of the attention mechanism is still under suspicion (Li et al., 2019); 2) Since NMT models are unidirectional, NMT models in both directions are required to obtain the final alignments, which is inefficient.
2.2 LM based Aligner
Recent pre-trained multilingual language mod-
els like mBERT (Devlin et al.,2019) and XLM-
R (Conneau and Lample,2019) achieve promis-
ing results on many cross-lingual transfer tasks
(Liang et al.,2020;Hu et al.,2020;Wang et al.,
2022a,b).

Figure 2: Comparison between different LM based aligners. (a) SimAlign (Jalili Sabet et al., 2020) encodes source and target sentences separately. (b) Awesome-Align (Dou and Neubig, 2021) concatenates source and target sentences together as inputs. (c) The proposed Cross-Align model.

Jalili Sabet et al. (2020) prove that multilingual LMs are also helpful in the word alignment
task and propose SimAlign to extract alignments
from similarity matrices of contextualized embed-
dings without relying on parallel data (Figure 2(a)).
Awesome-Align further improves the alignment
quality of LMs by crafting several training objec-
tives based on parallel data, like masked language
modeling, TLM, and a parallel sentence identification task. Although Awesome-Align has achieved
the SOTA performance among LM based aligners,
we find it still has two main problems: 1) During training, they simply concatenate the source and target sentences together as the input of the self-attention module (Figure 2(b)). However, Luo et al. (2021) prove that the self-attention module tends to focus on its own context while ignoring the paired context, leading to few cross-lingual attention patterns in the self-attention module. 2) During inference, they still encode the language pairs individually, which makes the cross-lingual context unavailable when generating alignments.3 Therefore, Awesome-Align models few interactions between cross-lingual pairs. Based on the above observations, we propose Cross-Align, which aims to model deep interactions between cross-lingual pairs to solve these problems.

3 For Awesome-Align, concatenating the input sentence pair during inference leads to poor results compared to encoding separately. Please refer to Table 2 for comparison results.
3 Method
In this section, we first introduce the model archi-
tecture and then illustrate how we extract align-
ments from Cross-Align. Finally, we describe the
proposed two-stage training framework in detail.
3.1 Model Architecture
As shown in Figure 2(c), Cross-Align is composed of a stack of $m$ self-attention modules and $n$ cross-attention modules (Vaswani et al., 2017). Given a sentence $x = \{x_1, x_2, \ldots, x_i\}$ in the source language and its corresponding parallel sentence $y = \{y_1, y_2, \ldots, y_j\}$ in the target language, Cross-Align first encodes them separately with the shared self-attention modules to extract the monolingual representations, and then generates the cross-lingual representations by fusing the source and target monolingual representations with the cross-attention modules. We elaborate on the self-attention module and the cross-attention module as follows.
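A minimal PyTorch sketch of this layout is given below; the dimensions, the residual wiring, and the choice to share the cross-attention weights across the two directions are assumptions made for illustration, and layer normalization and padding masks are omitted for brevity.

```python
import torch
import torch.nn as nn

class CrossAlignEncoder(nn.Module):
    """Sketch of the Cross-Align layout: m shared self-attention layers encode
    each sentence on its own, then n cross-attention layers let each language
    attend to the other (illustrative hyper-parameters)."""

    def __init__(self, dim=768, heads=12, m=8, n=4):
        super().__init__()
        self.self_layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, heads, batch_first=True) for _ in range(m)])
        self.cross_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(n)])
        self.ffn = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n)])

    def forward(self, src, tgt):
        # Shallow layers: shared self-attention, each sentence encoded separately.
        for layer in self.self_layers:
            src, tgt = layer(src), layer(tgt)
        # Upper layers: queries from one language, keys/values from the other.
        for attn, ffn in zip(self.cross_attn, self.ffn):
            src2, _ = attn(src, tgt, tgt)       # source attends to target
            tgt2, _ = attn(tgt, src, src)       # target attends to source
            src, tgt = src + src2, tgt + tgt2   # residual connections
            src, tgt = src + ffn(src), tgt + ffn(tgt)
        return src, tgt

# Toy usage with random embeddings in place of sub-word representations.
encoder = CrossAlignEncoder()
src, tgt = torch.randn(1, 5, 768), torch.randn(1, 6, 768)
print([h.shape for h in encoder(src, tgt)])
```

Reusing one attention module for both directions mirrors the sharing of the self-attention stack, though the actual parameter-sharing scheme of Cross-Align may differ.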
Self-Attention Module. Each self-attention module contains a self-attention sub-layer and a fully connected feed-forward network (FFN). The attention function maps a query ($Q$) and a set of key-value ($K$-$V$) pairs to an output. As for self-attention, all queries, keys and values are from the same language. Formally, the output of a
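For reference, the scaled dot-product attention of Vaswani et al. (2017) that these sub-layers build on computes

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,$$

where $d_k$ is the key dimension. In the self-attention module, $Q$, $K$, and $V$ are all projections of the same sentence's representations, whereas in the cross-attention module the queries come from one language and the keys and values from the other.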