Translate First Reorder Later:
Leveraging Monotonicity in Semantic Parsing
Francesco Cazzaro  Davide Locatelli  Ariadna Quattoni
Universitat Politècnica de Catalunya
name.lastname@upc.edu
Xavier Carreras
dMetrics
xavier.carreras@dmetrics.com
Abstract
Prior work in semantic parsing has shown that conventional seq2seq models fail at compositional generalization tasks. This limitation led to a resurgence of methods that model alignments between sentences and their corresponding meaning representations, either implicitly through latent variables or explicitly by taking advantage of alignment annotations. We take the second direction and propose TPOL, a two-step approach that first translates input sentences monotonically and then reorders them to obtain the correct output. This is achieved with a modular framework comprising a Translator and a Reorderer component. We test our approach on two popular semantic parsing datasets. Our experiments show that by means of the monotonic translations, TPOL can learn reliable lexico-logical patterns from aligned data, significantly improving compositional generalization both over conventional seq2seq models and over other approaches that exploit gold alignments. Our code is publicly available at https://github.com/interact-erc/TPol.git
1 Introduction
The goal of a semantic parser is to map natural language sentences (NLs) into meaning representations (MRs). Most current semantic parsers are based on deep sequence-to-sequence (seq2seq) approaches and presume that it is unnecessary to model token alignments between NLs and MRs because the attention mechanism can automatically learn the correspondences (Dong and Lapata, 2016; Jia and Liang, 2016). However, recent work has shown that such seq2seq models find compositional generalization challenging, i.e., they struggle to predict unseen structures made up of components observed at training (Lake and Baroni, 2018; Finegan-Dollak et al., 2018).
∗ Equal contribution.
Figure 1: Examples from the GEOALIGNED dataset. (a) is a monotonic alignment, (b) is non-monotonic.
This limitation motivated the resurgence of approaches that model alignments between NL sentences and their corresponding MRs more similarly to classical grammar- and translation-based parsers (Herzig and Berant, 2021). Alignments can be modeled either implicitly through latent variables (Wang et al., 2021), or explicitly by leveraging gold alignment annotations (Shi et al., 2020; Liu et al., 2021a). We take the second direction and exploit a recently released multilingual dataset for semantic parsing annotated with word alignments: GEOALIGNED (Locatelli and Quattoni, 2022), which augments the popular GEO benchmark (Zelle and Mooney, 1996).

Figure 1 shows some examples of the annotations provided. One key observation is that a significant percentage of the alignments are monotonic, i.e., they require no reordering of the target MR (Figure 1a), as opposed to non-monotonic alignments (Figure 1b). This suggests that learning reliable lexico-logical translation patterns from aligned data should be possible. If there are simple patterns, shouldn't an ideal model be able to exploit them?
With this in mind, we propose TPOL, a Two-step Parsing approach that leverages mOnotonic transLations. TPOL introduces a modular framework with two components: a Monotonic Translator and a Reorderer. The Translator is trained from pairs of NLs and MRs, where the MRs have been permuted to be monotonically aligned.
Hence, the Translator's output will be an MR whose order might not correspond to that of the gold truth. For this reason, the Reorderer is trained to restore the correct order of the original MR.
Our experiments on GEOALIGNED demonstrate that compared to a multilingual BART model (Liu et al., 2020), TPOL achieves similar performance on the random test split but significantly outperforms it on the compositional split across all languages. For example, on the query split in English, mBART obtains 69.4% in exact-match accuracy and TPOL obtains 87.8%. This result also improves on the 74.6% obtained by SPANBASED (Herzig and Berant, 2021), another approach that leverages alignment annotations.
Because most semantic parsing datasets do not contain alignment information, we experiment with alignments generated automatically. On GEO, TPOL trained with automatic alignments still outperforms mBART, and in particular on the English query split it improves by almost 10 points. Furthermore, we show competitive results on the popular SCAN dataset (Lake and Baroni, 2018).
In summary, the main contributions of this paper are:
1. We propose TPOL, a modular two-step approach for semantic parsing which explicitly leverages monotonic alignments;
2. Our experiments show that TPOL improves compositional generalization without compromising overall performance;
3. We show that even without gold alignments TPOL can achieve competitive results.
2 Related Work
Recently, the semantic parsing community has raised the question of whether current models can generalize compositionally, along with an effort to test for it (Lake and Baroni, 2018; Finegan-Dollak et al., 2018; Kim and Linzen, 2020). The consensus is that conventional seq2seq models struggle to generalize compositionally (Loula et al., 2018; Keysers et al., 2020). Moreover, large pre-trained language models have been shown not to improve compositional generalization (Oren et al., 2020; Qiu et al., 2022b). This has prompted the community to realize that parsers should be designed intentionally with compositionality in mind (Lake, 2019; Gordon et al., 2020; Weißenhorn et al., 2022). It has also been pointed out that compositional architectures are often designed for synthetic datasets and that compositionality on non-synthetic data is under-tested (Shaw et al., 2021).
Data augmentation techniques have been proposed to improve compositional generalization (Andreas, 2020; Yang et al., 2022; Qiu et al., 2022a). Another strategy is to exploit some level of word alignments. In general, there has been a resurgent interest in alignments as it has been shown that they can be beneficial to neural models (Shi et al., 2020). It has also been conjectured that the lack of alignment information might hamper progress in semantic parsing (Zhang et al., 2019). As a result, the field has seen some annotation efforts in this regard (Shi et al., 2020; Herzig and Berant, 2021; Locatelli and Quattoni, 2022).

Alignments have been modeled implicitly: Wang et al. (2021) treat alignments as discrete structured latent variables within a neural seq2seq model, employing a framework that first reorders the NL and then decodes the MR. Explicit use of alignment information has also been explored: Herzig and Berant (2021) use alignments and predict a span tree over the NL. Sun et al. (2022) recently proposed an approach to data augmentation via sub-tree substitutions. In text-to-SQL, attention-based models that try to capture alignments have been proposed (Lei et al., 2020; Liu et al., 2021b), as well as attempts that try to leverage them directly (Sun et al., 2022).
Our two-step approach resembles statistical machine translation, which decomposes the translation task into lexical translation and reordering (Chang et al., 2022). Machine translation techniques have previously been applied to semantic parsing. The first attempt was by Wong and Mooney (2006), who argued that a parsing model can be viewed as a syntax-based translation model and used a statistical word alignment algorithm. Later, a machine translation approach was used on the GEO dataset, obtaining what were at the time state-of-the-art results (Andreas et al., 2013). More recently, Agarwal et al. (2020) employed machine translation to aid semantic parsing.
3 Preliminaries: Word Alignments
This section briefly explains word alignments, showing the difference between monotonic and non-monotonic alignments, and illustrates the notion of monotonic translations.
Assume that we have a pair of sequences x = x_1, ..., x_n and y = y_1, ..., y_m, where n and m are the respective sequence lengths. A bi-sequence is defined as the tuple (x, y). In our application, x is an NL sentence, and y is its corresponding MR. For example:

x = which city has the highest population density?
y = answer(largest(density(city(all))))
A word alignment is a set of bi-symbols A, where each bi-symbol defines an alignment from a token in the NL to a token in the MR. For instance, the bi-symbol (x_i, y_j) aligns token x_i to token y_j. In our example, the tokens "which" and "answer" could be paired by a bi-symbol (which, answer). If a token x_i does not align to anything in y, an ε is introduced in y: the resulting bi-symbol (x_i, ε) corresponds to a deletion. In our example, the token "has" in the NL can be deleted with a bi-symbol (has, ε). Similarly, if a token y_j is not aligned to a token in x, an ε is introduced in x: (ε, y_j) is an insertion. In our example, the token "all" in the MR is inserted with the bi-symbol (ε, all).
The bi-symbols in A are all one-to-one. Hence, to map a single token to a phrase, i.e., to multiple tokens, it is necessary to choose a head token in the phrase, while the remaining tokens require insertion or deletion. In our example, the token "density" in the MR corresponds to "population density" in the NL, and, if "density" is chosen as the head token in the NL, "population" needs a deletion: the alignment will be given by the bi-symbols (population, ε) and (density, density).¹ Following this strategy, this notation can account for one-to-many and many-to-one alignments with deletion and insertion operations.
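To make the notation concrete, the sketch below encodes the running example as a set of bi-symbols in Python. It is an illustrative representation rather than the authors' code, and the full alignment is a plausible completion of the bi-symbols mentioned above, not the gold GEOALIGNED annotation.

```python
EPS = "ε"  # empty token used for insertions and deletions

nl = "which city has the highest population density ?".split()
mr = "answer largest density city all".split()  # brackets removed for readability

# Each bi-symbol pairs an NL token (or ε) with an MR token (or ε).
alignment = [
    ("which", "answer"),
    ("city", "city"),
    ("has", EPS),          # deletion: NL token with no MR counterpart
    ("the", EPS),
    ("highest", "largest"),
    ("population", EPS),   # "density" acts as head token of "population density"
    ("density", "density"),
    ("?", EPS),
    (EPS, "all"),          # insertion: MR token with no NL counterpart
]

# Every non-ε NL token must occur in the sentence, and every non-ε MR token in the MR.
assert all(a == EPS or a in nl for a, _ in alignment)
assert all(b == EPS or b in mr for _, b in alignment)
```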
Figure 2a shows a possible bi-sequence word alignment for the aforementioned example. Each bi-symbol is conveniently represented by a horizontal line connecting the tokens it aligns.
¹ Locatelli and Quattoni (2022) showed that annotators are consistent in the way they pick head-tokens, and reported high inter-annotator agreement scores on GEOALIGNED.

Figure 2: (a) A possible alignment for an NL-MR pair. (b) The corresponding monotonic translation. For simplicity, we removed the brackets and question mark.

Alignments can be monotonic or non-monotonic. An alignment is monotonic if it does not involve any crossing, i.e., a mapping that does not require reordering tokens. In our example, the alignment is non-monotonic because the bi-symbol (city, city) crosses over others. By permuting the MR, we can obtain a monotonic translation of the NL: Figure 2b shows such a permutation. The next section illustrates how TPOL can leverage these translations.
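Operationally, monotonicity can be checked directly from token positions: if the aligned bi-symbols are sorted by their NL position, the corresponding MR positions must be non-decreasing. The helper below is a minimal hypothetical sketch of this check, not part of TPOL; the index pairs for the running example are assumed from the token order shown above.

```python
def is_monotonic(index_pairs):
    """index_pairs: (nl_position, mr_position) tuples for the aligned (non-ε) bi-symbols.
    The alignment is monotonic if sorting by NL position leaves the MR positions
    in non-decreasing order, i.e., no two bi-symbols cross."""
    mr_positions = [j for _, j in sorted(index_pairs)]
    return all(a <= b for a, b in zip(mr_positions, mr_positions[1:]))

# Running example: which->answer (0,0), city->city (1,3),
# highest->largest (4,1), density->density (6,2).
print(is_monotonic([(0, 0), (1, 3), (4, 1), (6, 2)]))  # False: (city, city) crosses others
print(is_monotonic([(0, 0), (1, 1), (4, 2), (6, 3)]))  # True once the MR is permuted
```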
4 Translate First Reorder Later
We propose TPOL, a two-step parsing approach with a modular framework made up of two components: a Monotonic Translator and a Reorderer. Figure 3 shows how our semantic parser takes an input sentence x and predicts the corresponding MR y. In the first step, x is fed to the Translator, which outputs a monotonic translation z. In other words, z is the target MR that has been permuted so that it aligns monotonically to the input NL. Then, in a second step, z is fed to the Reorderer, which is trained to place the MR tokens back into the correct order to produce the final prediction y.
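In code, the two-step composition can be summarized as follows. This is a minimal sketch assuming each component exposes a generic text-to-text predict interface; the class and method names are placeholders, not the actual TPOL implementation.

```python
from typing import Protocol

class TextToTextModel(Protocol):
    # Placeholder interface: any model mapping a source string to a target string.
    def predict(self, source: str) -> str: ...

def tpol_parse(x: str, translator: TextToTextModel, reorderer: TextToTextModel) -> str:
    """Map an NL sentence x to its MR y with TPOL's two steps."""
    z = translator.predict(x)   # step 1: monotonic translation of the input NL
    y = reorderer.predict(z)    # step 2: restore the original MR token order
    return y
```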
The main idea behind TPOL is to decompose the task into lexical translation and reordering, in order to learn more reliable translation patterns. We posit that modeling monotonic alignments eases the learning of novel combinations of seen structures, improving compositional generalization.
An alternative approach would be to permute the NL inputs monotonically rather than the MRs. We do not follow this direction because, in semantic parsing, multiple NLs can map to the same MR. In other words, the NL domain is larger than that of the MRs, and thus we believe that learning to reorder the MRs is more feasible.
4.1 Monotonic Translator
The Monotonic Translator is responsible for making an initial prediction of the MR sequence, which will contain the correct tokens in monotonic order. To create the training bi-sequences, we use alignment information and permute the gold MR so that it aligns monotonically with the input NL.
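A sketch of how such a permuted training target could be derived from a word alignment is shown below, reusing the bi-symbol representation assumed earlier. The placement convention for inserted tokens and the helper name are illustrative assumptions, not necessarily the exact procedure used for GEOALIGNED or in the released code.

```python
EPS = "ε"

def monotonic_mr(alignment, mr_tokens):
    """alignment: (nl_token_or_ε, mr_token_or_ε) bi-symbols listed in NL order,
    with inserted MR tokens (ε on the NL side) placed where they should surface.
    Returns the gold MR permuted into monotonic order."""
    permuted = [m for _, m in alignment if m != EPS]
    # Sanity check: the permutation must preserve exactly the gold MR tokens.
    assert sorted(permuted) == sorted(mr_tokens)
    return permuted

alignment = [("which", "answer"), ("city", "city"), ("has", EPS), ("the", EPS),
             ("highest", "largest"), ("population", EPS), ("density", "density"),
             ("?", EPS), (EPS, "all")]
print(monotonic_mr(alignment, "answer largest density city all".split()))
# -> ['answer', 'city', 'largest', 'density', 'all']
```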