
CTL++: Evaluating Generalization on Never-Seen Compositional Patterns
of Known Functions, and Compatibility of Neural Representations
Róbert Csordás1, Kazuki Irie1, Jürgen Schmidhuber1,2
1The Swiss AI Lab IDSIA, USI & SUPSI, Lugano, Switzerland
2AI Initiative, KAUST, Thuwal, Saudi Arabia
{robert, kazuki, juergen}@idsia.ch
Abstract
Well-designed diagnostic tasks have played a key role in studying the failure of neural nets (NNs) to generalize systematically. Famous examples include SCAN and Compositional Table Lookup (CTL). Here we introduce CTL++, a new diagnostic dataset based on compositions of unary symbolic functions. While the original CTL is used to test length generalization or productivity, CTL++ is designed to test systematicity of NNs, that is, their capability to generalize to unseen compositions of known functions. CTL++ splits functions into groups and tests performance on group elements composed in a way not seen during training. We show that recent CTL-solving Transformer variants fail on CTL++. The simplicity of the task design allows for fine-grained control of task difficulty, as well as many insightful analyses. For example, we measure how much overlap between groups is needed by tested NNs for learning to compose. We also visualize how learned symbol representations in outputs of functions from different groups are compatible in case of success but not in case of failure. These results provide insights into failure cases reported on more complex compositions in the natural language domain. Our code is public.1
1 https://github.com/robertcsordas/ctlpp
1 Introduction
Neural networks (NNs) should ideally learn from training data to generalize systematically (Fodor et al., 1988), by learning generally applicable rules instead of pure pattern matching. Existing NNs, however, typically don’t. For example, in the context of sequence-processing NNs, superficial differences between training and test distributions, e.g., with respect to input sequence length or unseen input/word combinations, are enough to prevent current NNs from generalizing (Lake and Baroni, 2018). Training on large amounts of data might alleviate the problem, but it is infeasible to cover all possible lengths and combinations.
Indeed, while large language models trained on large amounts of data have obtained impressive results (Brown et al., 2020), they often fail on tasks requiring simple algorithmic reasoning, e.g., simple arithmetic (Rae et al., 2021). A promising way to achieve systematic generalization is to make NNs more compositional (Schmidhuber, 1990), by reflecting and exploiting the hierarchical structure of many problems either within some NN’s learned weights, or through tailored NN architectures. For example, recent work by Csordás et al. (2022) proposes architectural modifications to the standard Transformer (Vaswani et al., 2017) motivated by the principles of compositionality. The resulting Neural Data Router (NDR) exhibits strong length generalization or productivity on representative datasets such as Compositional Table Lookup (CTL; Liska et al., 2018; Hupkes et al., 2019).
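To make the task format concrete, the following is a minimal sketch of how CTL-style examples can be generated: each function is a randomly drawn lookup table over a small symbol set, and an input such as "f3 f1 f2 a5" asks for f3(f1(f2(a5))). The concrete function names, symbol tokens, group split, and generation details below are illustrative assumptions, not the exact setup of the released dataset.

```python
# Illustrative sketch of CTL-style data and a CTL++-style held-out split.
# Names (f0..f5, a0..a7) and the grouping are assumptions for this sketch.
import random

N_SYMBOLS = 8
N_FUNCTIONS = 6
random.seed(0)

# Each "function" is a random bijective lookup table over the symbol set.
tables = {f"f{i}": random.sample(range(N_SYMBOLS), N_SYMBOLS)
          for i in range(N_FUNCTIONS)}

def apply_chain(funcs, symbol):
    # Apply the composition right-to-left, as in f3(f1(f2(symbol))).
    for name in reversed(funcs):
        symbol = tables[name][symbol]
    return symbol

def make_example(depth):
    # Source: a chain of function tokens followed by an argument symbol.
    funcs = [f"f{random.randrange(N_FUNCTIONS)}" for _ in range(depth)]
    arg = random.randrange(N_SYMBOLS)
    src = " ".join(funcs + [f"a{arg}"])
    tgt = f"a{apply_chain(funcs, arg)}"
    return src, tgt

# CTL tests productivity: train on short chains, test on longer ones.
train_example = make_example(depth=3)
test_example = make_example(depth=8)

# A CTL++-style systematicity split (hypothetical grouping): functions are
# divided into groups, and certain cross-group orderings never occur in
# training and only appear at test time.
GROUP_A = {"f0", "f1", "f2"}
GROUP_B = {"f3", "f4", "f5"}

def has_held_out_pattern(funcs):
    # Held out here: a group-A token immediately followed by a group-B token.
    return any(a in GROUP_A and b in GROUP_B
               for a, b in zip(funcs, funcs[1:]))

print(train_example, test_example)
```

In a split of this kind, which cross-group orderings are withheld and how much overlap between groups remains visible during training are the quantities one can control and measure.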
The focus of the present work is on systematicity: the capability to generalize to unseen compositions of known functions/words. That is crucial for learning to process natural language or to reason on algorithmic problems without an excessive amount of training examples. Some of the existing benchmarks (such as COGS (Kim and Linzen, 2020) and PCFG (Hupkes et al., 2020)) are almost solvable by plain NNs with careful tuning (Csordás et al., 2021), while others, such as CFQ (Keysers et al., 2020), are much harder. A recent analysis of CFQ by Bogin et al. (2022) suggests that the difficult examples have a common characteristic: they contain some local structures (describable by parse trees) which are not present in the training examples. These findings provide hints for constructing both challenging and intuitive (simple to define and analyze) diagnostic tasks for testing systematicity.
We propose CTL++, a new diagnostic dataset building upon CTL. CTL++ is basically as simple as the