Functional Indirection Neural Estimator for Better Out-of-distribution Generalization

Kha Pham1, Hung Le1, Man Ngo2, Truyen Tran1
1Applied Artificial Intelligence Institute, Deakin University
2Faculty of Mathematics and Computer Science, VNUHCM-University of Science
1{phti, thai.le, truyen.tran}@deakin.edu.au
2nmman@hcmus.edu.vn
Abstract
The capacity to achieve out-of-distribution (OOD) generalization is a hallmark of human intelligence and yet remains out of reach for machines. This remarkable capability has been attributed to our abilities to make conceptual abstraction and analogy, and to a mechanism known as indirection, which binds two representations and uses one representation to refer to the other. Inspired by these mechanisms, we hypothesize that OOD generalization may be achieved by performing analogy-making and indirection in the functional space instead of the data space, as in current methods. To realize this, we design FINE (Functional Indirection Neural Estimator), a neural framework that learns to compose functions that map data input to output on-the-fly. FINE consists of a backbone network and a trainable semantic memory of basis weight matrices. Upon seeing a new input-output data pair, FINE dynamically constructs the backbone weights by mixing the basis weights. The mixing coefficients are indirectly computed through querying a separate corresponding semantic memory using the data pair. We demonstrate empirically that FINE can strongly improve out-of-distribution generalization on IQ tasks that involve geometric transformations. In particular, we train FINE and competing models on IQ tasks using images from the MNIST, Omniglot and CIFAR100 datasets, and test on tasks with unseen image classes from the same or different datasets and unseen transformation rules. FINE not only achieves the best performance on all tasks but is also able to adapt to small-scale data scenarios.
1 Introduction
Every computer science problem can be solved with a higher level of indirection.
—Andrew Koenig, Butler Lampson, David J. Wheeler
Generalizing to new circumstances is a hallmark of intelligence [16, 4, 11]. In some Intelligence Quotient (IQ) tests (a popular benchmark for human intelligence), one must leverage prior experience to identify the hidden abstract rules from a concrete example (e.g., a transformation of an image) and then apply the rules to the next (e.g., a new set of images of totally different appearance). These tasks necessitate several key capabilities, including conceptual abstraction and analogy-making [22]. Abstraction allows us to extend a concept to novel situations. It is also driven by analogy-making, which maps the current situation to previous experience stored in memory. Indeed, analogy-making has been argued to be one of the most important abilities of human cognition, or even further, "a concept is a package of analogies" [12]. The ability of humans to traverse seamlessly across concrete and abstract levels suggests another mechanism known as indirection, which binds two representations and uses one representation to refer to the other [16, 20].
Figure 1: FINE architecture. Above: FINE uses a pre-defined deep backbone architecture to approximate a function mapping a given input embedding $x$ to a given output embedding $y$. Below: given the input $x_t$ and pseudo-output $y_t$ of the $t$-th backbone layer, FINE first computes the query $W^q_t$, which represents the relation between $x_t$ and $y_t$. Then FINE performs analogy-making to compare the query with past experiences in the form of value memories. Finally, FINE binds the value memories with associated key memories via indirection and computes the weight $W_t$ for the $t$-th backbone layer.
Several deep learning models have successfully utilized analogy and indirection. The Transformer [30] and RelationNet [28] learn analogies between data through self-attention or pairwise functions. The ESBN [33] goes further by incorporating the indirection mechanism to bind an entity to a symbol and then reason on the symbols; this has proved to be efficient on tasks involving abstract rules, similar to the aforementioned IQ tests. However, a common drawback of these approaches is that they operate on the data space, and thus are susceptible to out-of-distribution samples.
In this paper, we propose to perform analogy-making and indirection in functional spaces instead. We aim to learn to compose a functional mapping from a given input to an output on-the-fly. Doing so offers two clear advantages. First, since the class of possible mappings is often restricted, learning the distribution of functions may not require a large amount of training data. Second, and more importantly, since this approach performs indirection in functional spaces, it avoids bindings between numerous entities and symbols in data spaces, and thus may help improve out-of-distribution generalization capability.
To this end, we introduce a new class of problems that requires functional analogy-making and indirection, and which we deem challenging for current data-oriented approaches. The tasks are similar to popular IQ tasks in which the model is given hints about the hidden rules and then has to predict the missing answer following those rules. A reasonable approach is for models to compare the current task with what they saw previously to identify the rules between the appearing entities, and thus to search in functional spaces instead of data spaces. More concretely, we construct the IQ tasks by applying geometric transformations to images from the MNIST dataset, the hand-written Omniglot dataset, and the real-image CIFAR100 dataset, where the training set and test set contain disjoint image classes from the same or different datasets, and possibly disjoint transformation rules.
Second, we present a novel framework named Functional Indirection Neural Estimator (FINE) to solve this class of problems (see Fig. 1 for the overall architecture of FINE). FINE consists of (a) a neural backbone to approximate the functions and (b) a trainable key-value memory module that stores a basis of network weights spanning the space of possible functions defined by the backbone. The weight basis memories allow FINE to perform analogy-making and indirection in the function space. More concretely, when a new IQ task arrives, FINE first (1) takes the hint images to make analogies with the value memories, then (2) performs indirection to bind the value memories with key memories, and finally (3) computes the approximated functions based on the key memories.
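To make steps (1)-(3) concrete, the following PyTorch-style sketch shows one plausible reading of the per-layer function composer in Figure 1. The class name, the rank-one query (standing in for the query $W^q_t$ hinted at in the figure), and the softmax mixing are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionComposer(nn.Module):
    """Illustrative sketch (not the official code) of FINE's per-layer
    weight construction: query -> analogy with value memories ->
    indirection to key memories -> composed weight W_t."""

    def __init__(self, d_in, d_out, n_basis):
        super().__init__()
        # Trainable semantic memories of basis weight matrices.
        self.value_memory = nn.Parameter(0.02 * torch.randn(n_basis, d_out, d_in))
        self.key_memory = nn.Parameter(0.02 * torch.randn(n_basis, d_out, d_in))

    def forward(self, x_t, y_t):
        # (1) Query relating input x_t (B, d_in) and pseudo-output y_t (B, d_out);
        #     a rank-one outer product is assumed here.
        w_query = torch.einsum('bo,bi->boi', y_t, x_t)
        # (2) Analogy-making: similarity of the query to each value memory,
        #     turned into mixing coefficients.
        sims = torch.einsum('boi,noi->bn', w_query, self.value_memory)
        coeffs = F.softmax(sims, dim=-1)
        # (3) Indirection: coefficients found on the value memories are used
        #     to mix the bound key memories into the layer weight W_t.
        w_t = torch.einsum('bn,noi->boi', coeffs, self.key_memory)
        # Apply the composed weight to the layer input.
        return torch.einsum('boi,bi->bo', w_t, x_t)
```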
Figure 2: Examples of two IQ tasks involving geometric transformations; choices marked with green circles are the correct solutions. Left: a 90-degree rotation. Right: a syntactic black-white transformation, in which a part of the image is transformed to the opposite colors.
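As a concrete illustration of the two rules shown in Figure 2, the snippet below sketches how such transformations could be applied to an image array; the specific angle and region are illustrative choices, not the paper's exact parameterization.

```python
import numpy as np

def rotate_90(img: np.ndarray) -> np.ndarray:
    """90-degree rotation, one of the affine transformations."""
    return np.rot90(img, k=1)

def black_white(img: np.ndarray, box=(0, 0, 14, 14)) -> np.ndarray:
    """Syntactic black-white transformation: invert the pixel values
    inside a rectangular region (the region here is illustrative)."""
    out = img.copy()
    r0, c0, r1, c1 = box
    out[r0:r1, c0:c1] = img.max() - out[r0:r1, c0:c1]
    return out
```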
Throughout a comprehensive suite of experiments, we demonstrate that FINE outperforms the competing methods and adapts more effectively in small-data scenarios.
2 Tasks
For concreteness, we will focus on Intelligence Quotient (IQ) tests, which have been widely accepted as a reliable benchmark for measuring human intelligence [26]. We will study a popular class of IQ tasks that provides hints following some hidden rules and requires the player to choose among given choices to fill in a placeholder so that the filled-in entity obeys the rules of the current task. In order to succeed in these tasks, the player must be able to figure out the hidden rules and perform analogy-making to select the correct choice. Moreover, once the rules of the current task have been figured out, a human player can almost always solve tasks with similar rules regardless of the entities appearing in them. This remarkable out-of-distribution generalization indicates that humans treat objects and relations abstractly instead of relying on the raw sensory data.
We aim to solve IQ tasks that involve geometric transformations (see Fig. 2 for examples), which include affine transformations (translation, rotation, shear, scale, and reflection), non-linear transformations (fisheye, horizontal wave), and syntactic transformations (black-white, swap). Details of the transformations are given in the Supplementary. In a task, the models are given three images $x$, $y$ and $x'$, where $y$ is the result of applying a geometric transformation to $x$. The models are then asked to select $y'$ among 4 choices $y_1, y_2, y_3, y_4$ so that $(x', y')$ follows the same rule as $(x, y)$ (i.e., if $y = f(x)$ then $y' = f(x')$ for transformation $f$). The 4 choices include (i) one with the correct object/image and correct transformation (which is the solution), (ii) one with the correct object/image and an incorrect transformation, (iii) one with an incorrect object/image and the correct transformation, and (iv) one with both an incorrect object/image and an incorrect transformation.
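To make the four-choice construction concrete, here is a minimal sketch of assembling one task instance; the way the distractor image and the alternative transformation are sampled is our assumption, not the paper's exact procedure.

```python
import random

def make_task(x, x_prime, wrong_image, transforms):
    """Assemble one IQ task: hint pair (x, f(x)), query x', and 4 choices.
    `transforms` is a list of callables; `wrong_image` is an image of a
    different object used for the incorrect-image distractors."""
    f = random.choice(transforms)                      # hidden rule
    g = random.choice([t for t in transforms if t is not f])
    y = f(x)                                           # hint output
    choices = [
        f(x_prime),      # (i)   correct image, correct transformation (solution)
        g(x_prime),      # (ii)  correct image, incorrect transformation
        f(wrong_image),  # (iii) incorrect image, correct transformation
        g(wrong_image),  # (iv)  incorrect image, incorrect transformation
    ]
    order = list(range(4))
    random.shuffle(order)
    shuffled = [choices[i] for i in order]
    answer = order.index(0)                            # position of the solution
    return (x, y, x_prime), shuffled, answer
```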
Inspired by this human capability, a reasonable approach to solving these tasks is for models to figure out the transformation (or relation) between objects/images and apply the transformation to novel objects/images. The datasets can be classified into two main categories: single-transformation datasets and multi-transformation datasets. Single-transformation datasets include only a particular transformation type, e.g., rotation; note that individual transformations of the same type still vary, e.g., rotations by different angles. Multi-transformation datasets, on the other hand, consist of several transformation types. To test the generalization capability of the models, we build test sets containing classes of images that have never been seen during training (see Section 4.1 and Section 4.2), or even more challenging tasks involving unseen rules and unseen datasets (see Section 4.3). Models must be able to leverage knowledge and memory gained from the training dataset to solve a new task.
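One simple way to realize the disjoint-class split described above is sketched here; the shuffling and seeding details are illustrative assumptions.

```python
import random

def split_classes(all_classes, n_test_classes, seed=0):
    """Hold out a disjoint set of image classes for OOD testing: training
    tasks use only `train_classes`, test tasks only `test_classes`."""
    rng = random.Random(seed)
    classes = list(all_classes)
    rng.shuffle(classes)
    train_classes = classes[n_test_classes:]
    test_classes = classes[:n_test_classes]
    return train_classes, test_classes
```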
3 Method
3.1 Functional Hypothesis
Let $\mathcal{X}$ and $\mathcal{Y}$ be the data input and output spaces, respectively. Denote by $(\mathcal{X}_{\mathrm{train}}, \mathcal{Y}_{\mathrm{train}}) \subset (\mathcal{X}, \mathcal{Y})$ the training set, and $(\mathcal{X}_{\mathrm{test}}, \mathcal{Y}_{\mathrm{test}}) \subset (\mathcal{X}, \mathcal{Y})$ the non-overlapping test set. Classical ML assumes that $\mathcal{X}_{\mathrm{train}}$ and $\mathcal{X}_{\mathrm{test}}$ are drawn from the same distribution. Under this hypothesis, it is reasonable (for frequentists) to find a function $f: \mathcal{X} \to \mathcal{Y}$ in the functional space $\mathcal{F}$ that fits both the train and the test data.
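For orientation, the classical assumption can be written in the familiar empirical-risk form below; this display is a standard formalization added here for clarity, not a formula taken from the paper.
\[
\hat{f} \;=\; \arg\min_{f \in \mathcal{F}} \; \frac{1}{|\mathcal{X}_{\mathrm{train}}|} \sum_{(x, y) \in (\mathcal{X}_{\mathrm{train}}, \mathcal{Y}_{\mathrm{train}})} \ell\big(f(x), y\big),
\]
where $\ell$ is a loss function and both training and test pairs are assumed to be drawn from one underlying distribution; in the OOD setting considered in this paper, that identical-distribution assumption no longer holds.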