Structural generalization is hard for sequence-to-sequence models
Yuekun Yao and Alexander Koller
Department of Language Science and Technology
Saarland Informatics Campus
Saarland University, Saarbrücken, Germany
{ykyao, koller}@coli.uni-saarland.de
Abstract
Sequence-to-sequence (seq2seq) models have
been successful across many NLP tasks, in-
cluding ones that require predicting linguistic
structure. However, recent work on composi-
tional generalization has shown that seq2seq
models achieve very low accuracy in general-
izing to linguistic structures that were not seen
in training. We present new evidence that this
is a general limitation of seq2seq models that
is present not just in semantic parsing, but also
in syntactic parsing and in text-to-text tasks,
and that this limitation can often be overcome
by neurosymbolic models that have linguistic
knowledge built in. We further report on some
experiments that give initial answers on the
reasons for these limitations.
1 Introduction
Humans are able to understand and produce lin-
guistic structures they have never observed before
(Chomsky, 1957; Fodor and Pylyshyn, 1988; Fodor
and Lepore, 2002). From limited, finite observations,
they generalize at an early age to an infinite
variety of novel structures using recursion. They
can also assign meaning to these, using the Princi-
ple of Compositionality. This ability to generalize
to unseen structures is important for NLP systems
in low-resource settings, such as underresourced
languages or projects with a limited annotation bud-
get, where a user can easily use structures that had
no annotations in training.
Over the past few years, large pretrained
sequence-to-sequence (seq2seq) models, such as
BART (Lewis et al., 2020) and T5 (Raffel et al.,
2020), have brought tremendous progress to many
NLP tasks. This includes linguistically complex
tasks such as broad-coverage semantic parsing,
where e.g. a lightly modified BART set a new
state of the art on AMR parsing (Bevilacqua et al.,
2021). However, there have been some concerns
that seq2seq models may have difficulties with com-
positional generalization, a class of tasks in seman-
tic parsing where the training data is structurally
impoverished in comparison to the test data (Lake
and Baroni, 2018; Keysers et al., 2020). We focus
on the COGS dataset of Kim and Linzen (2020) be-
cause some of its generalization types specifically
target structural generalization, i.e. the ability to
generalize to unseen structures.
In this paper, we make two contributions. First,
we offer evidence that structural generalization is
systematically hard for seq2seq models. On the
semantic parsing task of COGS, seq2seq mod-
els don’t fail on compositional generalization as
a whole, but specifically on the three COGS gener-
alization types that require generalizing to unseen
linguistic structures, achieving accuracies below
10%. This is true both for BART and T5 and for
seq2seq models that were specifically developed
for COGS. What’s more, BART and T5 fail simi-
larly on syntax and even POS tagging variants of
COGS (introduced in this paper), indicating that
they do not only struggle with compositional gen-
eralization in semantics, but with structural gener-
alization more generally. Structure-aware models,
such as the compositional semantic parsers of Liu
et al. (2021) and Weißenhorn et al. (2022) and the
Neural Berkeley Parser (Kitaev and Klein, 2018),
achieve perfect accuracy on these tasks.
Second, we conduct a series of experiments to
investigate what makes structural generalization so
hard for seq2seq models. It is not because the en-
coder loses structurally relevant information: One
can train a probe to predict COGS syntax from
BART encodings, in line with earlier work (Hewitt
and Manning, 2019; Tenney et al., 2019a); but the
decoder does not learn to use it for structural gen-
eralization. We find further that the decoder does
not even learn to generalize semantically when the
input is enriched with syntactic structure. Finally,
it is not merely because the COGS tasks require
the mapping of language into symbolic representa-
tions. We introduce a new text-to-text variant of
COGS called QA-COGS, where questions about
COGS sentences must be answered in English. We
find that T5 performs well on structural generaliza-
tion with the original COGS sentences, but all
models still struggle with a harder text-to-text task
involving structural disambiguation.
The code1 and datasets2 are available online.
arXiv:2210.13050v1 [cs.CL] 24 Oct 2022

Figure 1: Some examples from the COGS dataset. LEX represents lexical generalization and STRUCT denotes
structural generalization.

(a) LEX, subj_to_obj (common noun)
  Training: A hedgehog ate the cake.
    *cake(x4); hedgehog(x1) AND eat.agent(x2, x1) AND eat.theme(x2, x4)
  Generalization: The baby liked the hedgehog.
    *baby(x1); *hedgehog(x4); like.agent(x2, x1) AND like.theme(x2, x4)

(b) STRUCT, PP recursion
  Training: Ava saw a ball in a bowl on the table.
    *table(x9); see.agent(x1, Ava) AND see.theme(x1, x3) AND ball(x3) AND
    ball.nmod.in(x3, x6) AND bowl(x6) AND bowl.nmod.on(x6, x9)
  Generalization: Ava saw a ball in a bowl on the table on the floor.
    *table(x9); *floor(x12); see.agent(x1, Ava) AND see.theme(x1, x3) AND ball(x3) AND
    ball.nmod.in(x3, x6) AND bowl(x6) AND bowl.nmod.on(x6, x9) AND table.nmod.on(x9, x12)

(c) STRUCT, obj_to_subj PP
  Training: Noah ate the cake on the plate.
    *cake(x3); *plate(x6); eat.agent(x1, Noah) AND eat.theme(x1, x3) AND cake.nmod.on(x3, x6)
  Generalization: The cake on the table burned.
    *cake(x1); *table(x4); cake.nmod.on(x1, x4) AND burn.theme(x3, x1)
2 Related work
The recent interest in compositional generalization
has raised concerns about limitations of seq2seq
models. For instance, the SCAN dataset (Lake
and Baroni, 2018) requires a model to translate
natural-language instructions into symbolic action
sequences; it has multiple splits in which the test
data contains new combinations of commands or
instructions that are systematically longer than in
training. The PCFG dataset (Hupkes et al.,2020)
builds upon SCAN and adds instructions with re-
cursive structure. The CFQ dataset (Keysers et al.,
2020) maps questions to SPARQL queries, and
splits the data according to a measure of compo-
sitional complexity (MCD). In all of these papers,
simple seq2seq models based on LSTMs and trans-
formers were shown to perform poorly when the
test data was more complex than the training data.
Since then, followup research has shown that
generic transformer-based models (Ontanon et al.,
2022; Csordás et al., 2021), general-purpose pre-
trained models (Furrer et al., 2020), and seq2seq
models that are specialized for the task can all
achieve higher accuracies than the ones reported
in the papers introducing the datasets. Nonethe-
less, there is a sense that despite the best efforts
of the community, pure seq2seq models are hitting
a ceiling on compositional generalization tasks.
1 https://github.com/coli-saar/Seq2seq-on-COGS
2 https://github.com/coli-saar/Syntax-COGS
In this paper, we shed some light on the issue
by (a) clarifying that seq2seq models do not strug-
gle with compositional generalization per se, but
with structural generalization, and (b) demonstrat-
ing that this type of generalization remains hard
for seq2seq models even after heavy pretraining.
This is in contrast to most previous research, which
has avoided pretraining and focused on length or
MCD as the primary source of difficulty. Our data
includes instances where the structure, but not the
length, differs between training and testing, and
therefore allows us to differentiate between the two.
The importance of structure to compositional gener-
alization is also recognized by Bogin et al. (2022).
The difficulty of structural generalization for
neural models has also been studied in more tar-
geted ways. For instance, Yu et al. (2019) show
empirically that LSTM-based seq2seq models can-
not learn to close the brackets of Dyck languages,
and Hahn (2020) proves that transformers cannot
learn to distinguish well-bracketed Dyck expres-
sions. McCoy et al. (2020) find empirically that
seq2seq models struggle to learn the structural op-
erations necessary to rewrite declarative English
sentences into questions, whereas tree-based mod-
els work better.
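The Dyck results above can be made concrete. A string belongs to a Dyck language exactly when every opening bracket is closed in the right order, which a simple stack check decides; this is the structural computation that, per the findings cited above, seq2seq models struggle to learn. A minimal sketch (the particular two bracket pairs are an illustrative choice):

```python
def is_dyck(s, pairs={")": "(", "]": "["}):
    """Decide membership in the Dyck language over the given bracket pairs."""
    stack = []
    for ch in s:
        if ch in pairs.values():   # opening bracket: remember it
            stack.append(ch)
        elif ch in pairs:          # closing bracket: must match latest opener
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:                      # any non-bracket symbol is rejected
            return False
    return not stack               # well-formed iff every opener was closed
```

The stack is the crux: recognizing Dyck words requires unbounded memory of nesting order, which a fixed-size recurrent state or attention pattern must somehow emulate.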
3 Structural generalization in COGS
COGS (Kim and Linzen, 2020) is a synthetic
semantic parsing dataset in which English sen-
tences must be mapped to logic-based meaning
representations (see Fig. 1 for some examples).

Figure 2: Structural generalization in COGS.
(a) object PP to subject PP:
  Training:       (S (NP Noah) (VP (V ate) (NP (NP the cake) (PP on the plate))))
  Generalization: (S (NP (NP the cake) (PP on the table)) (VP (V burned)))
(b) PP recursion:
  Training:       (S (NP Ava) (VP (V saw) (NP (NP a ball) (PP in (NP (NP a bowl) (PP on (NP the table)))))))
  Generalization: (S (NP Ava) (VP (V saw) (NP (NP a ball) (PP in (NP (NP a bowl) (PP on (NP (NP the table) (PP on the floor))))))))
It distinguishes 21 generalization types, each of
which requires generalizing from training instances
to test instances in a particular systematic and
linguistically-informed way. COGS was designed
to measure compositional generalization, the abil-
ity of a semantic parser to assign correct mean-
ing representations to out-of-distribution sentences.
Unlike SCAN and CFQ, it includes generalization
types with unbounded recursion and separates them
cleanly from other generalization types, both of
which are crucial for the experiments reported here.
Most generalization types in COGS are lexical:
they recombine known grammatical structures with
words that were not observed in these particular
structures in training. An example is the general-
ization type “subject to object” (Fig. 1a), in which
a noun (“hedgehog”) is only seen as a subject in
training, whereas it is only used as an object at test
time. The syntactic structure at test time was al-
ready observed in training; only the words change.
By contrast, structural generalization involves
generalizing to linguistic structures that were not
seen in training (cf. Fig. 1b,c). Examples are the
generalization types “PP recursion”, where training
instances contain prepositional phrases of depth up
to two and generalization instances have PPs of
depth 3–12; and “object PP to subject PP”, where
PPs modify only objects in training and only sub-
jects at test time. These structural changes are
illustrated in Fig. 2.
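The PP-recursion pattern can be made explicit with a small generator. The sketch below builds a nested-PP sentence of a given depth in the style of Fig. 2b; it is illustrative only (the noun inventory, preposition alternation, and determiner choice are our assumptions, not the COGS grammar), but it shows why depth-3 to depth-12 generalization instances have shapes that never occur among the depth ≤ 2 training instances:

```python
def pp_recursion_sentence(depth, nouns=("ball", "bowl", "table", "floor", "box", "bed")):
    """Build an 'Ava saw a ball in a bowl on the table ...' sentence with
    `depth` stacked PP modifiers (illustrative, not the actual COGS generator)."""
    preps = ["in", "on"]
    words = ["Ava", "saw", "a", nouns[0]]
    for i in range(depth):
        det = "the" if i == depth - 1 else "a"   # innermost NP is definite
        words += [preps[i % 2], det, nouns[(i + 1) % len(nouns)]]
    return " ".join(words)
```

Each additional level adds one preposition, determiner, and noun, so the structure (and the meaning representation) grows without bound while the vocabulary stays fixed.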
Structural generalization requires learning about
recursion and compositionality, and is thus a more
thorough test of human-like language use, whereas
lexical generalization amounts to smart template
filling. In this paper, we investigate how well
structural generalization can be solved by differ-
ent classes of model architectures: seq2seq models
and structure-aware models. We define a model
as “structure-aware” if it is explicitly designed to
encode linguistic knowledge beyond the fact that
sentences are sequences of tokens. This captures
a large class of models that can be as “deep” as a
compositional semantic parser or as “shallow” as a
POS tagger that requires that each input token gets
exactly one POS tag.
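Even the "shallow" end of this class builds in structure: a POS tagger commits to the constraint that the output is one tag per input token. As a sketch, the pairs can be read off a bracketed tree by matching its preterminals (the Penn-style tags and tree format here are placeholders, not the actual Syntax-COGS labels):

```python
import re

def pos_tags(bracketed):
    """Extract (tag, token) pairs from a Penn-style bracketed tree string,
    e.g. '(S (NP (NNP Noah)) ...)'. Format and tagset are assumptions."""
    # a preterminal is '(TAG word)' with no nested parentheses inside
    return re.findall(r"\(([^()\s]+)\s+([^()\s]+)\)", bracketed)
```

Because the output length is tied to the input length, a model with this interface cannot produce the malformed, wrong-length outputs a free-form seq2seq decoder can.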
4 Structural generalization is hard for
seq2seq
We begin with some evidence that structural gen-
eralization in COGS is hard for seq2seq models,
while structure-aware models learn it quite easily.
We first collect some results on the original se-
mantic parsing task of COGS, extending it with
numbers for BART and T5. We then transform
COGS into a corpus for syntactic parsing and POS
tagging and investigate the ability of BART and T5
to generalize structurally on these tasks.
4.1 Experimental setup: COGS
We follow standard COGS practice and evaluate all
models on the generalization set. We report exact
match accuracies, averaged across 5 training runs.
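Exact match is the strictest reasonable metric here: a prediction counts as correct only if it equals the gold output string, and per-model numbers are then averaged over independent training runs. A minimal sketch of this evaluation (the list-of-lists data layout is an assumption for illustration):

```python
def exact_match(preds, golds):
    """Fraction of predictions that exactly equal the gold output string."""
    assert len(preds) == len(golds)
    hits = sum(p.strip() == g.strip() for p, g in zip(preds, golds))
    return hits / len(golds)

def mean_over_runs(per_run_preds, golds):
    """Average exact-match accuracy across independent training runs."""
    accs = [exact_match(preds, golds) for preds in per_run_preds]
    return sum(accs) / len(accs)
```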
Seq2seq models.
We train BART (Lewis et al.,
2020) and T5 (Raffel et al., 2020) as semantic
parsers on COGS. Both models are strong represen-
tatives of seq2seq models and perform well across
many NLP tasks. To apply these models on COGS,
we directly fine-tune the pretrained bart-base and
t5-base models on it with the corresponding tok-
enizer; see Appendix A for details. We also report
results for a wide range of published seq2seq mod-
els for COGS (Kim and Linzen, 2020; Conklin
et al., 2021; Csordás et al., 2021; Akyürek and An-
dreas, 2021; Zheng and Lapata, 2022; Qiu et al.,
2021).
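In the released COGS data, each example is one tab-separated line containing the sentence, its logical form, and a generalization-type label, so fine-tuning a seq2seq model reduces to mapping the first field to the second. A minimal loader sketch (the three-field TSV layout reflects our reading of the released files; verify it against the copy you use):

```python
def load_cogs_examples(lines):
    """Parse COGS-style TSV lines: sentence <TAB> logical form <TAB> gen. type."""
    examples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:                      # skip blank lines
            continue
        sentence, logical_form, gen_type = line.split("\t")
        examples.append({"src": sentence, "tgt": logical_form, "type": gen_type})
    return examples
```

Keeping the generalization-type field makes it easy to report per-type accuracies, which is what separates structural from lexical generalization in the results.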
Structure-aware models.
We report evaluation
results for LeAR (Liu et al.,2021) and the AM
parser (Weißenhorn et al.,2022). Both models
learn to predict a tree structure which is decoded
into COGS meaning representations using the Prin-
ciple of Compositionality. Thus both models are
structure-aware.