Retrofitting Multilingual Sentence Embeddings
with Abstract Meaning Representation

Deng Cai, Xin Li, Jackie Chun-Sing Ho, Lidong Bing, Wai Lam
The Chinese University of Hong Kong
DAMO Academy, Alibaba Group
thisisjcykcd@gmail.com
{xinting.lx,l.bing}@alibaba-inc.com
{schun,wlam}@se.cuhk.edu.hk

arXiv:2210.09773v1 [cs.CL] 18 Oct 2022

This work was supported by Alibaba Group through the Alibaba Innovative Research (AIR) Program. XL is the corresponding author.
Abstract

We introduce a new method to improve existing multilingual sentence embeddings with Abstract Meaning Representation (AMR). Compared with the original textual input, AMR is a structured semantic representation that presents the core concepts and relations in a sentence explicitly and unambiguously. It also helps reduce surface variations across different expressions and languages. Unlike most prior work that only evaluates the ability to measure semantic similarity, we present a thorough evaluation of existing multilingual sentence embeddings and our improved versions, which includes a collection of five transfer tasks in different downstream applications. Experiment results show that retrofitting multilingual sentence embeddings with AMR leads to better state-of-the-art performance on both semantic textual similarity and transfer tasks. Our codebase and evaluation scripts can be found at https://github.com/jcyk/MSE-AMR.
1 Introduction
Multilingual sentence embedding (MSE) aims to provide universal sentence representations shared across different languages (Hermann and Blunsom, 2014; Pham et al., 2015; Schwenk and Douze, 2017). As an important ingredient of cross-lingual and multilingual natural language processing (NLP), MSE has recently attracted increasing attention in the NLP community. MSE has been widely adopted to bridge the language barrier in several downstream applications such as bitext mining (Guo et al., 2018; Schwenk, 2018), document classification (Eriguchi et al., 2018; Singla et al., 2018; Yu et al., 2018) and natural language inference (Artetxe and Schwenk, 2019). Prior work typically borrows fixed-size embedding vectors from multilingual neural machine translation models (Schwenk and Douze, 2017; Yu et al., 2018) or trains siamese neural networks to align semantically similar sentences written in different languages (Wieting et al., 2019; Yang et al., 2020; Feng et al., 2020).
Despite the recent progress, the current evaluation of multilingual sentence embeddings has focused on cross-lingual Semantic Textual Similarity (STS) (Agirre et al., 2016; Cer et al., 2017) or bi-text mining tasks (Zweigenbaum et al., 2018; Artetxe and Schwenk, 2019). Nevertheless, as pointed out by Gao et al. (2021), the evaluation on semantic similarity may not be sufficient because better performance on STS does not always indicate better embeddings for downstream tasks. Therefore, for a more comprehensive MSE evaluation, it is necessary to additionally evaluate downstream tasks, which is largely ignored in recent work (Chidambaram et al., 2019; Reimers and Gurevych, 2020; Feng et al., 2020). In this paper, we collect a set of multilingual transfer tasks and test various existing multilingual sentence embeddings. We find that different methods excel at different tasks and the conclusions drawn from the STS evaluation do not always hold in the transfer tasks, and vice versa. We aim to establish a standardized evaluation protocol for future research in multilingual sentence embeddings.
To improve the quality of existing MSE models, we explore Abstract Meaning Representation (AMR) (Banarescu et al., 2013), a symbolic semantic representation, for augmenting existing neural semantic representations. Our motivation is two-fold. First, AMR explicitly offers the core concepts and relations in a sentence. This helps prevent learning the superficial patterns or spurious correlations in the training data, which do not generalize well to new domains or tasks (Poliak et al., 2018; Clark et al., 2019). Second, AMR reduces the variances in surface forms with the same meaning. This helps alleviate the data sparsity issue, as there are rich lexical variations across different languages.
On the other hand, although AMR has been advocated as an interlingua (Xue et al., 2014; Hajič et al., 2014; Damonte and Cohen, 2018), little work has examined whether AMR actually benefits downstream tasks. To advance research in AMR and its applications, multilingual sentence embedding can serve as an important benchmark, highlighting the ability of AMR to abstract away from surface realizations and represent the core concepts expressed in a sentence. To our knowledge, this is the first attempt to leverage the AMR semantic representation for multilingual NLP.
We learn AMR embeddings with a contrastive siamese network (Gao et al., 2021) and AMR graphs derived from different languages (Cai et al., 2021). Experiment results on 10 STS tasks and 5 transfer tasks with four state-of-the-art embedding methods show that retrofitting multilingual sentence embeddings with AMR improves performance substantially and consistently.
Our contribution is three-fold.
• We propose a new method to obtain high-quality semantic vectors for multilingual sentence representation, which takes advantage of language-invariant Abstract Meaning Representation that captures the core semantics of sentences.
• We present a thorough evaluation of multilingual sentence embeddings, which goes beyond semantic textual similarity and includes various transfer tasks in downstream applications.
• We demonstrate that retrofitting multilingual sentence embeddings with Abstract Meaning Representation leads to better performance on both semantic textual similarity and transfer tasks.
2 Related Work
Universal Sentence Embeddings
Our work aims to learn universal sentence representations, which should be useful for a broad set of applications. There are two lines of research for universal sentence embeddings: unsupervised approaches and supervised approaches. Early unsupervised approaches (Kiros et al., 2015; Hill et al., 2016; Gan et al., 2017; Logeswaran and Lee, 2018) design various surrounding-sentence reconstruction/prediction objectives for sentence representation learning. Jernite et al. (2017) exploit sentence-level discourse relations as supervision signals for training sentence embedding models. Instead of using the interactions of sentences within a document, Le and Mikolov (2014) propose to learn the embeddings for texts of arbitrary length on top of word vectors. Likewise, Chen (2017); Pagliardini et al. (2018); Yang et al. (2019b) calculate sentence embeddings from compositional n-gram features. Recent approaches often adopt contrastive objectives (Zhang et al., 2020; Giorgi et al., 2021; Wu et al., 2020; Meng et al., 2021; Carlsson et al., 2021; Kim et al., 2021; Yan et al., 2021; Gao et al., 2021) by taking different views (from data augmentation or from different copies of the model) of the same sentence as training examples.

On the other hand, supervised methods (Conneau et al., 2017; Cer et al., 2018; Reimers and Gurevych, 2019; Gao et al., 2021) take advantage of labeled natural language inference (NLI) datasets (Bowman et al., 2015; Williams et al., 2018), where a sentence embedding model is fine-tuned on entailment or contradiction sentence pairs. Furthermore, Wieting and Gimpel (2018); Wieting et al. (2020) demonstrate that bilingual and back-translation corpora provide useful supervision for learning semantic similarity. Another line of work focuses on regularizing embeddings (Li et al., 2020; Su et al., 2021; Huang et al., 2021) to alleviate the representation degeneration problem.
Multilingual Sentence Embeddings
Recently, multilingual sentence representations have attracted increasing attention. Schwenk and Douze (2017); Yu et al. (2018); Artetxe and Schwenk (2019) propose to use encoders from multilingual neural machine translation to produce universal representations across different languages. Chidambaram et al. (2019); Wieting et al. (2019); Yang et al. (2020); Feng et al. (2020) fine-tune siamese networks (Bromley et al., 1993) with contrastive objectives using parallel corpora. Reimers and Gurevych (2020) train a multilingual model to map sentences into the same embedding space as an existing English model. Different from existing work, our work resorts to multilingual AMR, a language-agnostic, disambiguated semantic representation, for performance enhancement.
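For illustration, the following sketch shows one common recipe in this line of work, multilingual knowledge distillation in the style of Reimers and Gurevych (2020): a frozen English teacher provides target vectors, and a multilingual student is trained so that a sentence and its translation both map close to the teacher's vector. It is a minimal sketch under our own assumptions; the model names, mean pooling, and the toy parallel pair are placeholders rather than the configurations used in the cited work.

```python
# Hedged sketch of multilingual knowledge distillation: a frozen English "teacher"
# provides target vectors; a multilingual "student" is trained so that the English
# source and its translation both map toward the teacher's vector.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pool(last_hidden_state, attention_mask):
    # Average token vectors, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

def embed(model, tokenizer, sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch).last_hidden_state
    return mean_pool(out, batch["attention_mask"])

teacher_tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # placeholder English teacher
teacher = AutoModel.from_pretrained("bert-base-uncased").eval()
student_tok = AutoTokenizer.from_pretrained("xlm-roberta-base")    # placeholder multilingual student
student = AutoModel.from_pretrained("xlm-roberta-base")

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
parallel_data = [("A man is playing a guitar.", "Ein Mann spielt Gitarre.")]  # toy (en, de) pair

for en, de in parallel_data:
    with torch.no_grad():
        target = embed(teacher, teacher_tok, [en])            # frozen teacher vector for the English side
    student_vecs = embed(student, student_tok, [en, de])      # student encodes both languages
    loss = F.mse_loss(student_vecs, target.repeat(2, 1))      # pull both toward the teacher vector
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```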
Evaluation of Sentence Embeddings
Traditionally, the mainstream evaluation for assessing the quality of English-only sentence embeddings is based on the Semantic Textual Similarity (STS) tasks and a suite of downstream classification tasks. The STS tasks (Agirre et al., 2012, 2013, 2014, 2015, 2016; Marelli et al., 2014; Cer et al., 2017) calculate the embedding distance of sentence pairs and compare it with human-annotated scores for semantic similarity. The classification tasks (e.g., sentiment analysis) from SentEval (Conneau and Kiela, 2018) take sentence embeddings as fixed input features to a logistic regression classifier. These tasks are commonly used to benchmark the transferability of sentence embeddings on downstream tasks. For multilingual sentence embeddings, most previous work has focused on cross-lingual STS (Agirre et al., 2016; Cer et al., 2017) and the related bi-text mining tasks (Zweigenbaum et al., 2018; Artetxe and Schwenk, 2019). The evaluation on downstream transfer tasks has been largely ignored (Chidambaram et al., 2019; Reimers and Gurevych, 2020; Feng et al., 2020). Nevertheless, as pointed out by Gao et al. (2021) in English scenarios, better performance on semantic similarity tasks does not always indicate better embeddings for transfer tasks. For a more comprehensive evaluation, in this paper we collect a set of multilingual transfer tasks and test various existing multilingual sentence embeddings. We aim to establish a standardized evaluation protocol for future research in multilingual sentence embeddings.
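To make the transfer protocol concrete, the following minimal sketch (not the SentEval toolkit itself) fits a logistic regression classifier on top of frozen sentence embeddings; the encode() function and the two-example dataset are placeholders for the embedding model and task data under evaluation.

```python
# Minimal sketch of SentEval-style transfer evaluation: sentence embeddings are
# frozen features, and only a logistic regression classifier is trained on top.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def encode(sentences):
    # Placeholder: replace with the sentence embedding model under evaluation.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sentences), 768))

train_texts, train_labels = ["great movie", "terrible plot"], [1, 0]   # toy sentiment data
test_texts, test_labels = ["I loved it", "I hated it"], [1, 0]

clf = LogisticRegression(max_iter=1000)
clf.fit(encode(train_texts), train_labels)     # the embedding model itself is never fine-tuned
preds = clf.predict(encode(test_texts))
print("transfer accuracy:", accuracy_score(test_labels, preds))
```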
3 Preliminaries
3.1 Contrastive Siamese Network
The siamese network (Bromley et al., 1993) has attracted considerable attention for self-supervised representation learning. It has been extensively adopted with contrastive learning (Hadsell et al., 2006) for learning dense vector representations of images and sentences (Reimers and Gurevych, 2019; Chen et al., 2020). The core idea of contrastive learning is to pull together the representations of semantically close objects (images or sentences) and to push apart the representations of dissimilar ones (negative pairs). Recent work in computer vision (Caron et al., 2020; Grill et al., 2020; Chen and He, 2021; Zbontar et al., 2021) has demonstrated that negative samples may not be necessary. A similar observation was made in NLP by Zhang et al. (2021), who adopted the BYOL framework (Grill et al., 2020) for sentence representation learning. In this work, we adopt the framework of Gao et al. (2021) with in-batch negatives (Chen et al., 2017; Henderson et al., 2017).
Formally, we assume a set of training examples $\mathcal{D} = \{(x_i, x_i^+, x_i^-)\}_{i=1}^{N}$, where $x_i^+$ and $x_i^-$ are semantically close and semantically irrelevant to $x_i$, respectively. The training is done with stochastic mini-batches. Each mini-batch consists of $M$ examples and the training objective is defined as:

$$\ell_i = -\log \frac{e^{s(x_i, x_i^+)/\tau}}{\sum_{j=1}^{M} e^{s(x_i, x_j^-)/\tau} + \sum_{j=1}^{M} e^{s(x_i, x_j^+)/\tau}} \qquad (1)$$
where $s(\cdot,\cdot)$ measures the similarity of two objects and $\tau$ is a scalar controlling the temperature of training. As seen, the other objects in the same mini-batch (i.e., $\{x_j^-\}_{j \neq i}$ and $\{x_j^+\}_{j \neq i}$) are treated as negatives for $x_i$. More concretely, $s(\cdot,\cdot)$ computes the cosine similarity between the representations of two objects:

$$s(x_i, x_j) = \frac{h_i^{\top} h_j}{\lVert h_i \rVert \cdot \lVert h_j \rVert}$$

where $h_i$ and $h_j$ are obtained from a neural encoder $f_\theta(\cdot)$: $h = f_\theta(x)$. The model parameters $\theta$ are then optimized using the contrastive learning objective.
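To make the objective concrete, the following PyTorch sketch computes Eq. (1) for one mini-batch, assuming each example comes with one positive and one negative, so that the positives and negatives of all other in-batch examples also serve as negatives for $x_i$. The temperature value and the random stand-in embeddings are illustrative only and do not reflect our actual training setup.

```python
# Hedged sketch of the contrastive objective in Eq. (1): cosine similarity with
# temperature tau, using every in-batch positive and negative as a negative for x_i.
import torch
import torch.nn.functional as F

def contrastive_loss(h, h_pos, h_neg, tau=0.05):
    """h, h_pos, h_neg: [M, d] embeddings of x_i, x_i^+, x_i^- for one mini-batch."""
    h = F.normalize(h, dim=-1)
    h_pos = F.normalize(h_pos, dim=-1)
    h_neg = F.normalize(h_neg, dim=-1)
    sim_pos = h @ h_pos.T / tau                          # [M, M]: s(x_i, x_j^+) / tau
    sim_neg = h @ h_neg.T / tau                          # [M, M]: s(x_i, x_j^-) / tau
    logits = torch.cat([sim_pos, sim_neg], dim=1)        # all denominator terms of Eq. (1)
    labels = torch.arange(h.size(0), device=h.device)    # diagonal of sim_pos is the true positive
    return F.cross_entropy(logits, labels)               # equals -log softmax at the positive entry

# Toy usage with random vectors standing in for encoder outputs h = f_theta(x).
M, d = 8, 768
h = torch.randn(M, d, requires_grad=True)
loss = contrastive_loss(h, torch.randn(M, d), torch.randn(M, d))
loss.backward()   # in a real run, gradients would flow into the encoder parameters
```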
3.2 Multilingual AMR Parsing
AMR (Banarescu et al., 2013) is a broad-coverage semantic formalism originally designed for English. The accuracy of AMR parsing has been greatly improved in recent years (Cai and Lam, 2019, 2020a; Bevilacqua et al., 2021; Bai et al., 2022). Because AMR is agnostic to syntactic and wording variations, recent work has suggested the potential of AMR to work as an interlingua (Xue et al., 2014; Hajič et al., 2014; Damonte and Cohen, 2018). That is, we can represent the semantics in other languages using the corresponding AMR graph of the semantically equivalent English sentence. A number of cross-lingual AMR parsers (Damonte and Cohen, 2018; Blloshmi et al., 2020; Sheth et al., 2021; Procopio et al., 2021; Cai et al., 2021) have been developed to transform non-English texts into AMR graphs. Most of them rely on pre-trained multilingual language models and synthetic parallel data. In particular, Cai et al. (2021) proposed to learn a multilingual AMR parser from an English AMR parser via knowledge distillation. Their single parser is trained for five different languages (German, Spanish, Italian, Chinese, and English) and achieves state-of-the-art parsing accuracies. In addition, the one-for-all design maintains parsing efficiency and reduces prediction inconsistency across different languages. Thus, we adopt the multilingual AMR parser of Cai et al. (2021) in our experiments.¹

¹ https://github.com/jcyk/XAMR
It is worth noting that the multilingual parser is capable of parsing many other languages, including those it has not been explicitly trained for, thanks to the generalization power inherited from pre-trained multilingual language models (Tang et al., 2020; Liu et al., 2020). In Section 4.2, we further extend the training of the multilingual parser to French, another major language, for improved performance.
4 Proposed Method
We first introduce how we learn AMR embeddings
and then describe the whole pipeline for enhancing
existing sentence embeddings.
4.1 Learning AMR Embeddings
Linearization & Modeling
Given that AMR is graph-structured, a variety of graph neural networks (Song et al., 2018; Beck et al., 2018; Ribeiro et al., 2019; Guo et al., 2019; Cai and Lam, 2020b; Ribeiro et al., 2019) have been proposed for the representation learning of AMR. However, recent work (Zhang et al., 2019a; Mager et al., 2020; Bevilacqua et al., 2021) has demonstrated that the power of existing pre-trained language models based on the Transformer architecture (Vaswani et al., 2017), such as BERT (Devlin et al., 2019), GPT2 (Radford et al., 2019) and BART (Lewis et al., 2020), can be leveraged for achieving better performance. Following them, we also take BERT as the backbone model.
Since Transformer-based language models are designed for sequential data, to encode graphical AMR we resort to the linearization techniques of Bevilacqua et al. (2021). Figure 1 illustrates the linearization of AMR graphs. For each AMR graph, a DFS traversal is performed starting from the root node of the graph, and the trajectory is recorded. We use parentheses to mark the hierarchy of node depths. Bevilacqua et al. (2021) also proposed to use special tokens for indicating variables in the linearized graph and for handling reentrancies (i.e., a node playing multiple roles in the graph). However, the introduction of special tokens significantly increases the length of the output sequence (by almost 50%). We remove this feature and simply repeat the nodes when revisiting happens. This significantly reduces the length of the output sequence and allows more efficient modeling with Transformer-based language models. The downside is that reentrancy information becomes unrecoverable.
[Figure 1: The parsing and linearization pipeline. Two example sentences, "The facts are accessible to you." and "You have no access to the facts.", are parsed to AMR graphs over the concept access-01 with :ARG0 you and :ARG1 fact (the negated reading additionally carrying :polarity -), which are then linearized, e.g., as "(access-01 :ARG0 you :ARG1 fact)".]
However, we empirically found that the shortened sequences lead to better performance. The linearizations of AMR graphs are then treated as plain token sequences when fed into Transformer-based language models. Note that AMR linearization introduces additional tokens that rarely appear in English text (e.g., "ARG2" and "belong-01"). These tokens may not be included in the original vocabulary of existing language models and could be segmented into sub-tokens (e.g., "belong-01" → "belong", "-", "01"), which are less meaningful and increase the sequence length. To deal with this problem, we extend the original vocabulary of existing language models to include all the relation and frame names occurring at least 5 times in the AMR sembank (LDC2017T10).
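As a rough sketch of this preprocessing (under the assumption that an AMR graph is given as a root id plus an adjacency list of (relation, child) edges; this is not the exact code of our pipeline), the following performs a DFS linearization that marks depth with parentheses and repeats a concept instead of emitting variable tokens, then registers a placeholder list of AMR-specific tokens with a BERT tokenizer.

```python
# Hedged sketch: DFS linearization of an AMR graph with parentheses for depth and
# node repetition on re-entrancy, plus tokenizer vocabulary extension for AMR tokens.
from transformers import AutoModel, AutoTokenizer

def linearize(graph, node):
    """graph: {node_id: (concept, [(relation, child_id), ...])}; returns a token list."""
    concept, edges = graph[node]
    if not edges:
        return [concept]
    tokens = ["(", concept]
    for relation, child in edges:
        tokens.append(relation)
        tokens.extend(linearize(graph, child))   # a re-entrant child is expanded again (no variable tokens)
    tokens.append(")")
    return tokens

# Toy graph for "You have no access to the facts."
graph = {
    "a": ("access-01", [(":polarity", "p"), (":ARG0", "y"), (":ARG1", "f")]),
    "p": ("-", []),
    "y": ("you", []),
    "f": ("fact", []),
}
print(" ".join(linearize(graph, "a")))   # ( access-01 :polarity - :ARG0 you :ARG1 fact )

# Extend the tokenizer so frequent AMR relation and frame names are not split into
# meaningless sub-tokens; the token list here is a small placeholder, and the real
# list would be collected from the AMR sembank with the frequency threshold of 5.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
new_tokens = [":ARG0", ":ARG1", ":ARG2", ":polarity", "access-01", "belong-01"]
tokenizer.add_tokens(new_tokens)
model = AutoModel.from_pretrained("bert-base-uncased")
model.resize_token_embeddings(len(tokenizer))    # new embedding rows for the added tokens
```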
Positive & Negative Examples
Contrastive learning aims to learn effective representations by pulling semantically similar examples together and pushing apart dissimilar examples. Following the discussion in Section 3.1, the most critical question in contrastive learning is how to obtain positive and negative examples. In language representation learning, positive examples $x_i^+$ are often constructed by applying minimal distortions (e.g., word deletion, reordering, and substitution) to $x_i$ (Wu et al., 2020; Meng et al., 2021) or by introducing some random noise (e.g., dropout (Srivastava et al., 2014)) into the modeling function $f_\theta$ (Gao et al., 2021). On the other hand, negative examples $x_i^-$ are usually sampled from other sentences. However, prior work (Conneau et al., 2017; Gao et al., 2021) has demonstrated that entailment/contradiction sentence pairs in supervised natural language inference (NLI) datasets (Bowman et al., 2015; Williams et al., 2018) are better positive/negative pairs for learning sentence embeddings. Following Gao et al. (2021), we borrow the supervision from two NLI datasets, namely SNLI (Bowman et al., 2015) and MNLI (Williams et al., 2018). In the NLI datasets,
given one premise, there are one entailment hy-