Structural generalization is hard for sequence-to-sequence models
Yuekun Yao and Alexander Koller
Department of Language Science and Technology
Saarland Informatics Campus
Saarland University, Saarbrücken, Germany
{ykyao, koller}@coli.uni-saarland.de
Abstract
Sequence-to-sequence (seq2seq) models have been successful across many NLP tasks, including ones that require predicting linguistic structure. However, recent work on compositional generalization has shown that seq2seq models achieve very low accuracy in generalizing to linguistic structures that were not seen in training. We present new evidence that this is a general limitation of seq2seq models that is present not just in semantic parsing, but also in syntactic parsing and in text-to-text tasks, and that this limitation can often be overcome by neurosymbolic models that have linguistic knowledge built in. We further report on some experiments that give initial answers on the reasons for these limitations.
1 Introduction
Humans are able to understand and produce linguistic structures they have never observed before (Chomsky, 1957; Fodor and Pylyshyn, 1988; Fodor and Lepore, 2002). From limited, finite observations, they generalize at an early age to an infinite variety of novel structures using recursion. They can also assign meaning to these structures, using the Principle of Compositionality. This ability to generalize to unseen structures is important for NLP systems in low-resource settings, such as underresourced languages or projects with a limited annotation budget, where a user can easily produce structures that were never annotated in training.
Over the past few years, large pretrained sequence-to-sequence (seq2seq) models, such as BART (Lewis et al., 2020) and T5 (Raffel et al., 2020), have brought tremendous progress to many NLP tasks. This includes linguistically complex tasks such as broad-coverage semantic parsing, where, e.g., a lightly modified BART set a new state of the art on AMR parsing (Bevilacqua et al., 2021). However, there have been some concerns that seq2seq models may have difficulties with compositional generalization, a class of tasks in semantic parsing where the training data is structurally impoverished in comparison to the test data (Lake and Baroni, 2018; Keysers et al., 2020). We focus on the COGS dataset of Kim and Linzen (2020) because some of its generalization types specifically target structural generalization, i.e. the ability to generalize to unseen structures.
In this paper, we make two contributions. First, we offer evidence that structural generalization is systematically hard for seq2seq models. On the semantic parsing task of COGS, seq2seq models do not fail on compositional generalization as a whole, but specifically on the three COGS generalization types that require generalizing to unseen linguistic structures, achieving accuracies below 10%. This is true both for BART and T5 and for seq2seq models that were specifically developed for COGS. Moreover, BART and T5 fail similarly on syntax and even POS tagging variants of COGS (introduced in this paper), indicating that they struggle not only with compositional generalization in semantics, but with structural generalization more generally. Structure-aware models, such as the compositional semantic parsers of Liu et al. (2021) and Weißenhorn et al. (2022) and the Neural Berkeley Parser (Kitaev and Klein, 2018), achieve perfect accuracy on these tasks.
Second, we conduct a series of experiments to investigate what makes structural generalization so hard for seq2seq models. It is not because the encoder loses structurally relevant information: one can train a probe to predict COGS syntax from BART encodings, in line with earlier work (Hewitt and Manning, 2019; Tenney et al., 2019a); but the decoder does not learn to use this information for structural generalization. We find further that the decoder does not even learn to generalize semantically when the input is enriched with syntactic structure. Finally, it is not merely because the COGS tasks require the mapping of language into symbolic represen-