Retrofitting Multilingual Sentence Embeddings
with Abstract Meaning Representation

Deng Cai, Xin Li, Jackie Chun-Sing Ho, Lidong Bing, Wai Lam
The Chinese University of Hong Kong
DAMO Academy, Alibaba Group
thisisjcykcd@gmail.com
{xinting.lx,l.bing}@alibaba-inc.com
{schun,wlam}@se.cuhk.edu.hk

arXiv:2210.09773v1 [cs.CL] 18 Oct 2022

This work was supported by Alibaba Group through the Alibaba Innovative Research (AIR) Program. XL is the corresponding author.
Abstract

We introduce a new method to improve existing multilingual sentence embeddings with Abstract Meaning Representation (AMR). Compared with the original textual input, AMR is a structured semantic representation that presents the core concepts and relations in a sentence explicitly and unambiguously. It also helps reduce surface variations across different expressions and languages. Unlike most prior work that only evaluates the ability to measure semantic similarity, we present a thorough evaluation of existing multilingual sentence embeddings and our improved versions, which includes a collection of five transfer tasks in different downstream applications. Experiment results show that retrofitting multilingual sentence embeddings with AMR leads to better state-of-the-art performance on both semantic textual similarity and transfer tasks. Our codebase and evaluation scripts can be found at https://github.com/jcyk/MSE-AMR.
1 Introduction
Multilingual sentence embedding (MSE) aims to provide universal sentence representations shared across different languages (Hermann and Blunsom, 2014; Pham et al., 2015; Schwenk and Douze, 2017). As an important ingredient of cross-lingual and multilingual natural language processing (NLP), MSE has recently attracted increasing attention in the NLP community. MSE has been widely adopted to bridge the language barrier in several downstream applications such as bitext mining (Guo et al., 2018; Schwenk, 2018), document classification (Eriguchi et al., 2018; Singla et al., 2018; Yu et al., 2018) and natural language inference (Artetxe and Schwenk, 2019). Prior work typically borrows fixed-size embedding vectors from multilingual neural machine translation models (Schwenk and Douze, 2017; Yu et al., 2018) or trains siamese neural networks to align semantically similar sentences written in different languages (Wieting et al., 2019; Yang et al., 2020; Feng et al., 2020).
Despite the recent progress, the current evaluation of multilingual sentence embeddings has focused on cross-lingual Semantic Textual Similarity (STS) (Agirre et al., 2016; Cer et al., 2017) or bi-text mining tasks (Zweigenbaum et al., 2018; Artetxe and Schwenk, 2019). Nevertheless, as pointed out by Gao et al. (2021), the evaluation on semantic similarity may not be sufficient because better performance on STS does not always indicate better embeddings for downstream tasks. Therefore, for a more comprehensive MSE evaluation, it is necessary to additionally evaluate downstream tasks, which is largely ignored in recent work (Chidambaram et al., 2019; Reimers and Gurevych, 2020; Feng et al., 2020). In this paper, we collect a set of multilingual transfer tasks and test various existing multilingual sentence embeddings. We find that different methods excel at different tasks and the conclusions drawn from the STS evaluation do not always hold in the transfer tasks, and vice versa. We aim to establish a standardized evaluation protocol for future research in multilingual sentence embeddings.
To improve the quality of existing MSE models, we explore Abstract Meaning Representation (AMR) (Banarescu et al., 2013), a symbolic semantic representation, for augmenting existing neural semantic representations. Our motivation is two-fold. First, AMR explicitly offers the core concepts and relations in a sentence. This helps prevent learning the superficial patterns or spurious correlations in the training data, which do not generalize well to new domains or tasks (Poliak et al., 2018; Clark et al., 2019). Second, AMR reduces the variances in surface forms with the same meaning. This helps alleviate the data sparsity issue, as there are rich lexical variations across different languages.
On the other hand, although AMR has been advocated as an interlingua (Xue et al., 2014; Hajič et al., 2014; Damonte and Cohen, 2018), little work has examined whether AMR actually benefits downstream tasks. To advance research in AMR and its applications, multilingual sentence embedding can serve as an important benchmark, highlighting the ability of AMR to abstract away from surface realizations and represent the core concepts expressed in a sentence. To our knowledge, this is the first attempt to leverage the AMR semantic representation for multilingual NLP.
We learn AMR embeddings with a contrastive siamese network (Gao et al., 2021) and AMR graphs derived from different languages (Cai et al., 2021). Experiment results on 10 STS tasks and 5 transfer tasks with four state-of-the-art embedding methods show that retrofitting multilingual sentence embeddings with AMR improves performance substantially and consistently.
Our contribution is three-fold.
• We propose a new method to obtain high-quality semantic vectors for multilingual sentence representation, which takes advantage of language-invariant Abstract Meaning Representation that captures the core semantics of sentences.
• We present a thorough evaluation of multilingual sentence embeddings, which goes beyond semantic textual similarity and includes various transfer tasks in downstream applications.
• We demonstrate that retrofitting multilingual sentence embeddings with Abstract Meaning Representation leads to better performance on both semantic textual similarity and transfer tasks.
2 Related Work
Universal Sentence Embeddings
Our work aims to learn universal sentence representations, which should be useful for a broad set of applications. There are two lines of research for universal sentence embeddings: unsupervised approaches and supervised approaches. Early unsupervised approaches (Kiros et al., 2015; Hill et al., 2016; Gan et al., 2017; Logeswaran and Lee, 2018) design various surrounding-sentence reconstruction/prediction objectives for sentence representation learning. Jernite et al. (2017) exploit sentence-level discourse relations as supervision signals for training sentence embedding models. Instead of using the interactions of sentences within a document, Le and Mikolov (2014) propose to learn the embeddings for texts of arbitrary length on top of word vectors. Likewise, Chen (2017); Pagliardini et al. (2018); Yang et al. (2019b) calculate sentence embeddings from compositional n-gram features. Recent approaches often adopt contrastive objectives (Zhang et al., 2020; Giorgi et al., 2021; Wu et al., 2020; Meng et al., 2021; Carlsson et al., 2021; Kim et al., 2021; Yan et al., 2021; Gao et al., 2021) by taking different views (from data augmentation or from different copies of the model) of the same sentence as training examples.

On the other hand, supervised methods (Conneau et al., 2017; Cer et al., 2018; Reimers and Gurevych, 2019; Gao et al., 2021) take advantage of labeled natural language inference (NLI) datasets (Bowman et al., 2015; Williams et al., 2018), where a sentence embedding model is fine-tuned on entailment or contradiction sentence pairs. Furthermore, Wieting and Gimpel (2018); Wieting et al. (2020) demonstrate that bilingual and back-translation corpora provide useful supervision for learning semantic similarity. Another line of work focuses on regularizing embeddings (Li et al., 2020; Su et al., 2021; Huang et al., 2021) to alleviate the representation degeneration problem.
Multilingual Sentence Embeddings
Recently, multilingual sentence representations have attracted increasing attention. Schwenk and Douze (2017); Yu et al. (2018); Artetxe and Schwenk (2019) propose to use encoders from multilingual neural machine translation to produce universal representations across different languages. Chidambaram et al. (2019); Wieting et al. (2019); Yang et al. (2020); Feng et al. (2020) fine-tune siamese networks (Bromley et al., 1993) with contrastive objectives using parallel corpora. Reimers and Gurevych (2020) train a multilingual model to map sentences into the same embedding space as an existing English model. Different from existing work, our work resorts to multilingual AMR, a language-agnostic, disambiguated semantic representation, for performance enhancement.
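For illustration, the following sketch shows one common recipe in this line of work, multilingual knowledge distillation in the style of Reimers and Gurevych (2020): a frozen English teacher provides target vectors, and a multilingual student is trained so that a sentence and its translation both map close to the teacher's vector. It is a minimal sketch under our own assumptions; the model names, mean pooling, and the toy parallel pair are placeholders rather than the configurations used in the cited work.

```python
# Hedged sketch of multilingual knowledge distillation: a frozen English "teacher"
# provides target vectors; a multilingual "student" is trained so that the English
# source and its translation both map toward the teacher's vector.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pool(last_hidden_state, attention_mask):
    # Average token vectors, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

def embed(model, tokenizer, sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch).last_hidden_state
    return mean_pool(out, batch["attention_mask"])

teacher_tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # placeholder English teacher
teacher = AutoModel.from_pretrained("bert-base-uncased").eval()
student_tok = AutoTokenizer.from_pretrained("xlm-roberta-base")    # placeholder multilingual student
student = AutoModel.from_pretrained("xlm-roberta-base")

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
parallel_data = [("A man is playing a guitar.", "Ein Mann spielt Gitarre.")]  # toy (en, de) pair

for en, de in parallel_data:
    with torch.no_grad():
        target = embed(teacher, teacher_tok, [en])            # frozen teacher vector for the English side
    student_vecs = embed(student, student_tok, [en, de])      # student encodes both languages
    loss = F.mse_loss(student_vecs, target.repeat(2, 1))      # pull both toward the teacher vector
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```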
Evaluation of Sentence Embeddings
Traditionally, the mainstream evaluation for assessing the quality of English-only sentence embeddings is based on the Semantic Textual Similarity (STS) tasks and a suite of downstream classification tasks. The STS tasks (Agirre et al., 2012, 2013, 2014, 2015, 2016; Marelli et al., 2014; Cer et al., 2017) calculate the embedding distance of sentence pairs and compare it with human-annotated scores for semantic similarity. The classification tasks (e.g., sentiment analysis) from SentEval (Conneau and Kiela, 2018) take sentence embeddings as fixed input features to a logistic regression classifier. These tasks are commonly used to benchmark the transferability of sentence embeddings on downstream tasks. For multilingual sentence embeddings, most previous work has focused on cross-lingual STS (Agirre et al., 2016; Cer et al., 2017) and the related bi-text mining tasks (Zweigenbaum et al., 2018; Artetxe and Schwenk, 2019). The evaluation on downstream transfer tasks has been largely ignored (Chidambaram et al., 2019; Reimers and Gurevych, 2020; Feng et al., 2020). Nevertheless, as pointed out by Gao et al. (2021) in English scenarios, better performance on semantic similarity tasks does not always indicate better embeddings for transfer tasks. For a more comprehensive evaluation, in this paper we collect a set of multilingual transfer tasks and test various existing multilingual sentence embeddings. We aim to establish a standardized evaluation protocol for future research in multilingual sentence embeddings.
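To make the transfer protocol concrete, the following minimal sketch (not the SentEval toolkit itself) fits a logistic regression classifier on top of frozen sentence embeddings; the encode() function and the two-example dataset are placeholders for the embedding model and task data under evaluation.

```python
# Minimal sketch of SentEval-style transfer evaluation: sentence embeddings are
# frozen features, and only a logistic regression classifier is trained on top.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def encode(sentences):
    # Placeholder: replace with the sentence embedding model under evaluation.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(sentences), 768))

train_texts, train_labels = ["great movie", "terrible plot"], [1, 0]   # toy sentiment data
test_texts, test_labels = ["I loved it", "I hated it"], [1, 0]

clf = LogisticRegression(max_iter=1000)
clf.fit(encode(train_texts), train_labels)     # the embedding model itself is never fine-tuned
preds = clf.predict(encode(test_texts))
print("transfer accuracy:", accuracy_score(test_labels, preds))
```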
3 Preliminaries
3.1 Contrastive Siamese Network
The siamese network (Bromley et al., 1993) has attracted considerable attention for self-supervised representation learning. It has been extensively adopted with contrastive learning (Hadsell et al., 2006) for learning dense vector representations of images and sentences (Reimers and Gurevych, 2019; Chen et al., 2020). The core idea of contrastive learning is to pull together the representations of semantically close objects (images or sentences) and to push apart the representations of dissimilar ones (negative pairs). Recent work in computer vision (Caron et al., 2020; Grill et al., 2020; Chen and He, 2021; Zbontar et al., 2021) has demonstrated that negative samples may not be necessary. A similar observation was made in NLP by Zhang et al. (2021), who adopted the BYOL framework (Grill et al., 2020) for sentence representation learning. In this work, we adopt the framework of Gao et al. (2021) with in-batch negatives (Chen et al., 2017; Henderson et al., 2017).
Formally, we assume a set of training examples $\mathcal{D} = \{(x_i, x_i^+, x_i^-)\}_{i=1}^{N}$, where $x_i^+$ and $x_i^-$ are semantically close and semantically irrelevant to $x_i$, respectively. The training is done with stochastic mini-batches. Each mini-batch consists of $M$ examples and the training objective is defined as:

$$\ell_i = -\log \frac{e^{s(x_i, x_i^+)/\tau}}{\sum_{j=1}^{M} e^{s(x_i, x_j^-)/\tau} + \sum_{j=1}^{M} e^{s(x_i, x_j^+)/\tau}} \qquad (1)$$
where $s(\cdot,\cdot)$ measures the similarity of two objects and $\tau$ is a scalar controlling the temperature of training. As seen, the other objects in the same mini-batch (i.e., $\{x_j^-\}_{j \neq i}$ and $\{x_j^+\}_{j \neq i}$) are treated as negatives for $x_i$. More concretely, $s(\cdot,\cdot)$ computes the cosine similarity between the representations of two objects:

$$s(x_i, x_j) = \frac{h_i^{\top} h_j}{\lVert h_i \rVert \cdot \lVert h_j \rVert}$$

where $h_i$ and $h_j$ are obtained from a neural encoder $f_\theta(\cdot)$: $h = f_\theta(x)$. The model parameters $\theta$ are then optimized using the contrastive learning objective.
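To make the objective concrete, the following PyTorch sketch computes Eq. (1) for one mini-batch, assuming each example comes with one positive and one negative, so that the positives and negatives of all other in-batch examples also serve as negatives for $x_i$. The temperature value and the random stand-in embeddings are illustrative only and do not reflect our actual training setup.

```python
# Hedged sketch of the contrastive objective in Eq. (1): cosine similarity with
# temperature tau, using every in-batch positive and negative as a negative for x_i.
import torch
import torch.nn.functional as F

def contrastive_loss(h, h_pos, h_neg, tau=0.05):
    """h, h_pos, h_neg: [M, d] embeddings of x_i, x_i^+, x_i^- for one mini-batch."""
    h = F.normalize(h, dim=-1)
    h_pos = F.normalize(h_pos, dim=-1)
    h_neg = F.normalize(h_neg, dim=-1)
    sim_pos = h @ h_pos.T / tau                          # [M, M]: s(x_i, x_j^+) / tau
    sim_neg = h @ h_neg.T / tau                          # [M, M]: s(x_i, x_j^-) / tau
    logits = torch.cat([sim_pos, sim_neg], dim=1)        # all denominator terms of Eq. (1)
    labels = torch.arange(h.size(0), device=h.device)    # diagonal of sim_pos is the true positive
    return F.cross_entropy(logits, labels)               # equals -log softmax at the positive entry

# Toy usage with random vectors standing in for encoder outputs h = f_theta(x).
M, d = 8, 768
h = torch.randn(M, d, requires_grad=True)
loss = contrastive_loss(h, torch.randn(M, d), torch.randn(M, d))
loss.backward()   # in a real run, gradients would flow into the encoder parameters
```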
3.2 Multilingual AMR Parsing
AMR (Banarescu et al., 2013) is a broad-coverage semantic formalism originally designed for English. The accuracy of AMR parsing has been greatly improved in recent years (Cai and Lam, 2019, 2020a; Bevilacqua et al., 2021; Bai et al., 2022). Because AMR is agnostic to syntactic and wording variations, recent work has suggested the potential of AMR to work as an interlingua (Xue et al., 2014; Hajič et al., 2014; Damonte and Cohen, 2018). That is, we can represent the semantics in other languages using the corresponding AMR graph of the semantically equivalent English sentence. A number of cross-lingual AMR parsers (Damonte and Cohen, 2018; Blloshmi et al., 2020; Sheth et al., 2021; Procopio et al., 2021; Cai et al., 2021) have been developed to transform non-English texts into AMR graphs. Most of them rely on pre-trained multilingual language models and synthetic parallel data. In particular, Cai et al. (2021) proposed to learn a multilingual AMR parser from an English AMR parser via knowledge distillation. Their single parser is trained for five different languages (German, Spanish, Italian, Chinese, and English) and achieves state-of-the-art parsing accuracies. In addition, the one-for-all design maintains parsing efficiency and reduces prediction inconsistency across different languages. Thus, we adopt the multilingual AMR parser of Cai et al. (2021) in our experiments.¹

¹ https://github.com/jcyk/XAMR
It is worth noting that the multilingual parser is capable of parsing many other languages, including those it has not been explicitly trained for, thanks to the generalization power inherited from pre-trained multilingual language models (Tang et al., 2020; Liu et al., 2020). In Section 4.2, we further extend the training of the multilingual parser to French, another major language, for improved performance.
4 Proposed Method
We first introduce how we learn AMR embeddings
and then describe the whole pipeline for enhancing
existing sentence embeddings.
4.1 Learning AMR Embeddings
Linearization & Modeling
Given that AMR is graph-structured, a variety of graph neural networks (Song et al., 2018; Beck et al., 2018; Ribeiro et al., 2019; Guo et al., 2019; Cai and Lam, 2020b; Ribeiro et al., 2019) have been proposed for the representation learning of AMR. However, recent work (Zhang et al., 2019a; Mager et al., 2020; Bevilacqua et al., 2021) has demonstrated that the power of existing pre-trained language models based on the Transformer architecture (Vaswani et al., 2017), such as BERT (Devlin et al., 2019), GPT2 (Radford et al., 2019) and BART (Lewis et al., 2020), can be leveraged for achieving better performance. Following them, we also take BERT as the backbone model.
Since Transformer-based language models are designed for sequential data, to encode graphical AMR we resort to the linearization techniques of Bevilacqua et al. (2021). Figure 1 illustrates the linearization of AMR graphs. For each AMR graph, a DFS traversal is performed starting from the root node of the graph, and the trajectory is recorded. We use parentheses to mark the hierarchy of node depths. Bevilacqua et al. (2021) also proposed to use special tokens for indicating variables in the linearized graph and for handling reentrancies (i.e., a node playing multiple roles in the graph). However, the introduction of special tokens significantly increases the length of the output sequence (by almost 50%). We remove this feature and simply repeat the nodes when revisiting happens. This significantly reduces the length of the output sequence and allows more efficient modeling with Transformer-based language models. The downside is that reentrancy information becomes unrecoverable.
[Figure 1: The parsing and linearization pipeline. Two example sentences, "The facts are accessible to you." and "You have no access to the facts.", are parsed to AMR graphs over the concept access-01 with :ARG0 you and :ARG1 fact (the negated reading additionally carrying :polarity -), which are then linearized, e.g., as "(access-01 :ARG0 you :ARG1 fact)".]
However, we empirically found that the shortened sequences lead to better performance. The linearizations of AMR graphs are then treated as plain token sequences when fed into Transformer-based language models. Note that AMR linearization introduces additional tokens that rarely appear in English text (e.g., "ARG2" and "belong-01"). These tokens may not be included in the original vocabulary of existing language models and could be segmented into sub-tokens (e.g., "belong-01" → "belong", "-", "01"), which are less meaningful and increase the sequence length. To deal with this problem, we extend the original vocabulary of existing language models to include all the relation and frame names occurring at least 5 times in the AMR sembank (LDC2017T10).
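As a rough sketch of this preprocessing (under the assumption that an AMR graph is given as a root id plus an adjacency list of (relation, child) edges; this is not the exact code of our pipeline), the following performs a DFS linearization that marks depth with parentheses and repeats a concept instead of emitting variable tokens, then registers a placeholder list of AMR-specific tokens with a BERT tokenizer.

```python
# Hedged sketch: DFS linearization of an AMR graph with parentheses for depth and
# node repetition on re-entrancy, plus tokenizer vocabulary extension for AMR tokens.
from transformers import AutoModel, AutoTokenizer

def linearize(graph, node):
    """graph: {node_id: (concept, [(relation, child_id), ...])}; returns a token list."""
    concept, edges = graph[node]
    if not edges:
        return [concept]
    tokens = ["(", concept]
    for relation, child in edges:
        tokens.append(relation)
        tokens.extend(linearize(graph, child))   # a re-entrant child is expanded again (no variable tokens)
    tokens.append(")")
    return tokens

# Toy graph for "You have no access to the facts."
graph = {
    "a": ("access-01", [(":polarity", "p"), (":ARG0", "y"), (":ARG1", "f")]),
    "p": ("-", []),
    "y": ("you", []),
    "f": ("fact", []),
}
print(" ".join(linearize(graph, "a")))   # ( access-01 :polarity - :ARG0 you :ARG1 fact )

# Extend the tokenizer so frequent AMR relation and frame names are not split into
# meaningless sub-tokens; the token list here is a small placeholder, and the real
# list would be collected from the AMR sembank with the frequency threshold of 5.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
new_tokens = [":ARG0", ":ARG1", ":ARG2", ":polarity", "access-01", "belong-01"]
tokenizer.add_tokens(new_tokens)
model = AutoModel.from_pretrained("bert-base-uncased")
model.resize_token_embeddings(len(tokenizer))    # new embedding rows for the added tokens
```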
Positive & Negative Examples
Contrastive learning aims to learn effective representations by pulling semantically similar examples together and pushing apart dissimilar examples. Following the discussion in Section 3.1, the most critical question in contrastive learning is how to obtain positive and negative examples. In language representation learning, positive examples $x_i^+$ are often constructed by applying minimal distortions (e.g., word deletion, reordering, and substitution) to $x_i$ (Wu et al., 2020; Meng et al., 2021) or by introducing some random noise (e.g., dropout (Srivastava et al., 2014)) into the modeling function $f_\theta$ (Gao et al., 2021). On the other hand, negative examples $x_i^-$ are usually sampled from other sentences. However, prior work (Conneau et al., 2017; Gao et al., 2021) has demonstrated that entailment/contradiction sentence pairs in supervised natural language inference (NLI) datasets (Bowman et al., 2015; Williams et al., 2018) are better positive/negative pairs for learning sentence embeddings. Following Gao et al. (2021), we borrow the supervision from two NLI datasets, namely SNLI (Bowman et al., 2015) and MNLI (Williams et al., 2018). In the NLI datasets,
given one premise, there are one entailment hy-