
ments based on the matrix of embedding similarities (Jalili Sabet et al., 2020; Dou and Neubig, 2021). While achieving some improvements in alignment quality and efficiency, we find that existing LM-based aligners capture few interactions between the input source-target sentence pairs. Specifically, SimAlign (Jalili Sabet et al., 2020) encodes the source and target sentences separately, without attending to the context in the other language. Dou and Neubig (2021) further propose Awesome-Align, which considers the cross-lingual context by taking the concatenation of the sentence pairs as input during training, but still encodes them separately during inference.
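
To make this concrete, the following is a minimal sketch of the similarity-matrix alignment extraction that such LM-based aligners share. The exact symmetrization heuristics and thresholds differ between SimAlign and Awesome-Align; the mutual-argmax rule and optional threshold below are simplifying assumptions, not the methods' actual settings.

```python
import numpy as np

def align_from_similarity(sim, threshold=None):
    """Induce word alignments from a (len_src, len_tgt) similarity matrix.

    A pair (i, j) is kept when it is a mutual best match, i.e. the argmax
    of both its row and its column (bidirectional agreement). This is one
    common extraction heuristic; real aligners apply their own variants.
    Returns a set of (i, j) index pairs.
    """
    row_best = sim.argmax(axis=1)   # best target token for each source token
    col_best = sim.argmax(axis=0)   # best source token for each target token
    pairs = {(i, int(j)) for i, j in enumerate(row_best) if col_best[j] == i}
    if threshold is not None:       # optional minimum-similarity filter
        pairs = {(i, j) for i, j in pairs if sim[i, j] >= threshold}
    return pairs

# Toy usage with random stand-ins for contextual embeddings:
rng = np.random.default_rng(0)
src, tgt = rng.normal(size=(5, 8)), rng.normal(size=(6, 8))
print(align_from_similarity(src @ tgt.T))
```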
However, the lack of interaction between the input source-target sentence pairs severely degrades alignment quality, especially for words that are ambiguous in the monolingual context. Figure 1 presents an example of our reproduced results from Awesome-Align. The ambiguous Chinese word “以” has two different meanings: 1) a preposition (“to”, “as”, “for” in English), and 2) the abbreviation of the word “以色列” (“Israel” in English). In this example, the word “以” is misaligned to “to” and “for” because the model does not fully consider the word “Israel” in the target sentence. Intuitively, the cross-lingual context is very helpful for alleviating such meaning confusion in the word alignment task.
Based on the above observation, we propose Cross-Align, which fully considers the cross-lingual context by modeling deep interactions between the input sentence pairs. Specifically, Cross-Align encodes the monolingual information of the source and target sentences independently with shared self-attention modules in the shallow layers, and then explicitly models deep cross-lingual interactions with cross-attention modules in the upper layers. Besides, to train Cross-Align effectively, we propose a two-stage training framework: in the first stage, the model is trained with the simple TLM objective (Conneau and Lample, 2019) to learn cross-lingual representations; in the second stage, it is fine-tuned with a self-supervised alignment objective to bridge the gap between training and inference. We conduct extensive experiments on five different language pairs, and the results show that our approach achieves SOTA performance on four out of five language pairs.² Compared to the existing approaches, which apply many complex training objectives, our approach is simple yet effective.

² In Ro-En, we achieve the best performance among models in the same line of work, but perform slightly worse than the NMT-based models, which have many more parameters than ours.
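
As a concrete illustration of this design, the sketch below is a PyTorch-style approximation of the described layout: shared self-attention in the shallow layers, cross-attention in the upper layers, and a similarity matrix at the top from which alignments can be induced. The layer counts, dimensions, weight sharing of the cross-attention module across directions, and the plain dot-product similarity head are illustrative assumptions, not the actual Cross-Align implementation or its training objectives.

```python
import torch
import torch.nn as nn

class CrossAlignSketch(nn.Module):
    """Illustrative layout only: shared self-attention encoders in the
    shallow layers, cross-attention between the two sentences in the
    upper layers. Hyperparameters here are placeholders."""

    def __init__(self, vocab_size=30000, d_model=256, n_heads=4,
                 n_self_layers=4, n_cross_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Shallow layers: standard Transformer encoder layers, shared
        # between source and target (monolingual encoding).
        self.self_layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_self_layers)])
        # Upper layers: explicit cross-lingual interaction. A single
        # attention module is reused in both directions for brevity
        # (an assumption of this sketch).
        self.cross_layers = nn.ModuleList(
            [nn.MultiheadAttention(d_model, n_heads, batch_first=True)
             for _ in range(n_cross_layers)])

    def forward(self, src_ids, tgt_ids):
        h_src, h_tgt = self.embed(src_ids), self.embed(tgt_ids)
        for layer in self.self_layers:           # shared self-attention
            h_src, h_tgt = layer(h_src), layer(h_tgt)
        for attn in self.cross_layers:           # deep cross-lingual interaction
            new_src, _ = attn(h_src, h_tgt, h_tgt)   # source attends to target
            new_tgt, _ = attn(h_tgt, h_src, h_src)   # target attends to source
            h_src, h_tgt = new_src, new_tgt
        # Source-target similarity matrix for alignment induction.
        return torch.einsum("bid,bjd->bij", h_src, h_tgt)
```

In a full Transformer layer the cross-attention sublayers would additionally be wrapped with residual connections, layer normalization, and feed-forward blocks; they are reduced to bare attention here for brevity.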
Our main contributions are summarized as follows:

• We propose Cross-Align, a novel word alignment model which utilizes self-attention modules to encode monolingual representations and cross-attention modules to model cross-lingual interactions.

• We propose a two-stage training framework to boost model performance on word alignment, which is simple yet effective.

• Extensive experiments show that the proposed model achieves SOTA performance on four out of five different language pairs.
2 Related Work
2.1 NMT based Aligner
Recently, there has been a surge of interest in studying word alignment based on the attention weights (Vaswani et al., 2017) of NMT systems. However, naive attention may fail to capture clear word alignments (Serrano and Smith, 2019). Therefore, Zenkel et al. (2019) and Garg et al. (2019) extend the Transformer architecture with a separate alignment layer on top of the decoder, and produce competitive results compared to GIZA++. Chen et al. (2020) further improve alignment quality by adapting the alignment induction with the to-be-aligned target token. Recently, Chen et al. (2021) and Zhang and van Genabith (2021) propose self-supervised models that take advantage of the full context on the target side, and achieve SOTA results. Although NMT based aligners achieve promising results, there are still some disadvantages: 1) the inherent discrepancy between the translation task and word alignment is not eliminated, so the reliability of the attention mechanism remains in question (Li et al., 2019); 2) since NMT models are unidirectional, models in both directions are required to obtain the final alignments, which is inefficient.
2.2 LM based Aligner
Recent pre-trained multilingual language models like mBERT (Devlin et al., 2019) and XLM-R (Conneau and Lample, 2019) achieve promising results on many cross-lingual transfer tasks (Liang et al., 2020; Hu et al., 2020; Wang et al., 2022a,b). Jalili Sabet et al. (2020) prove that multilingual LMs are also helpful in word alignment