
CTL++: Evaluating Generalization on Never-Seen Compositional Patterns
of Known Functions, and Compatibility of Neural Representations
Róbert Csordás1, Kazuki Irie1, Jürgen Schmidhuber1,2
1The Swiss AI Lab IDSIA, USI & SUPSI, Lugano, Switzerland
2AI Initiative, KAUST, Thuwal, Saudi Arabia
{robert, kazuki, juergen}@idsia.ch
Abstract
Well-designed diagnostic tasks have played a key role in studying the failure of neural nets (NNs) to generalize systematically. Famous examples include SCAN and Compositional Table Lookup (CTL). Here we introduce CTL++, a new diagnostic dataset based on compositions of unary symbolic functions. While the original CTL is used to test length generalization or productivity, CTL++ is designed to test systematicity of NNs, that is, their capability to generalize to unseen compositions of known functions. CTL++ splits functions into groups and tests performance on group elements composed in a way not seen during training. We show that recent CTL-solving Transformer variants fail on CTL++. The simplicity of the task design allows for fine-grained control of task difficulty, as well as many insightful analyses. For example, we measure how much overlap between groups is needed by tested NNs for learning to compose. We also visualize how learned symbol representations in outputs of functions from different groups are compatible in case of success but not in case of failure. These results provide insights into failure cases reported on more complex compositions in the natural language domain. Our code is public.1
1 https://github.com/robertcsordas/ctlpp
1 Introduction
Neural networks (NNs) should ideally learn from training data to generalize systematically (Fodor et al., 1988), by learning generally applicable rules instead of pure pattern matching. Existing NNs, however, typically don’t. For example, in the context of sequence-processing NNs, superficial differences between training and test distributions, e.g., with respect to input sequence length or unseen input/word combinations, are enough to prevent current NNs from generalizing (Lake and Baroni, 2018). Training on large amounts of data might alleviate the problem, but it is infeasible to cover all possible lengths and combinations.
Indeed, while large language models trained on large amounts of data have obtained impressive results (Brown et al., 2020), they often fail on tasks requiring simple algorithmic reasoning, e.g., simple arithmetic (Rae et al., 2021). A promising way to achieve systematic generalization is to make NNs more compositional (Schmidhuber, 1990), by reflecting and exploiting the hierarchical structure of many problems either within some NN’s learned weights, or through tailored NN architectures. For example, recent work by Csordás et al. (2022) proposes architectural modifications to the standard Transformer (Vaswani et al., 2017) motivated by the principles of compositionality. The resulting Neural Data Router (NDR) exhibits strong length generalization or productivity on representative datasets such as Compositional Table Lookup (CTL; Liska et al., 2018; Hupkes et al., 2019).
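To make the task format concrete, the following is a minimal sketch of how CTL-style examples can be generated: each function is a randomly drawn lookup table over a small symbol set, and an input such as "f3 f1 f2 a5" asks for f3(f1(f2(a5))). The concrete function names, symbol tokens, group split, and generation details below are illustrative assumptions, not the exact setup of the released dataset.

```python
# Illustrative sketch of CTL-style data and a CTL++-style held-out split.
# Names (f0..f5, a0..a7) and the grouping are assumptions for this sketch.
import random

N_SYMBOLS = 8
N_FUNCTIONS = 6
random.seed(0)

# Each "function" is a random bijective lookup table over the symbol set.
tables = {f"f{i}": random.sample(range(N_SYMBOLS), N_SYMBOLS)
          for i in range(N_FUNCTIONS)}

def apply_chain(funcs, symbol):
    # Apply the composition right-to-left, as in f3(f1(f2(symbol))).
    for name in reversed(funcs):
        symbol = tables[name][symbol]
    return symbol

def make_example(depth):
    # Source: a chain of function tokens followed by an argument symbol.
    funcs = [f"f{random.randrange(N_FUNCTIONS)}" for _ in range(depth)]
    arg = random.randrange(N_SYMBOLS)
    src = " ".join(funcs + [f"a{arg}"])
    tgt = f"a{apply_chain(funcs, arg)}"
    return src, tgt

# CTL tests productivity: train on short chains, test on longer ones.
train_example = make_example(depth=3)
test_example = make_example(depth=8)

# A CTL++-style systematicity split (hypothetical grouping): functions are
# divided into groups, and certain cross-group orderings never occur in
# training and only appear at test time.
GROUP_A = {"f0", "f1", "f2"}
GROUP_B = {"f3", "f4", "f5"}

def has_held_out_pattern(funcs):
    # Held out here: a group-A token immediately followed by a group-B token.
    return any(a in GROUP_A and b in GROUP_B
               for a, b in zip(funcs, funcs[1:]))

print(train_example, test_example)
```

In a split of this kind, which cross-group orderings are withheld and how much overlap between groups remains visible during training are the quantities one can control and measure.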
The focus of the present work is on systematicity: the capability to generalize to unseen compositions of known functions/words. That is crucial for learning to process natural language or to reason on algorithmic problems without an excessive amount of training examples. Some of the existing benchmarks (such as COGS (Kim and Linzen, 2020) and PCFG (Hupkes et al., 2020)) are almost solvable by plain NNs with careful tuning (Csordás et al., 2021), while others, such as CFQ (Keysers et al., 2020), are much harder. A recent analysis of CFQ by Bogin et al. (2022) suggests that the difficult examples have a common characteristic: they contain some local structures (describable by parse trees) which are not present in the training examples. These findings provide hints for constructing both challenging and intuitive (simple to define and analyze) diagnostic tasks for testing systematicity.
We propose CTL++, a new diagnostic dataset building upon CTL. CTL++ is basically as simple as the