Functional Indirection Neural Estimator for Better Out-of-distribution Generalization

Kha Pham1, Hung Le1, Man Ngo2, Truyen Tran1
1Applied Artificial Intelligence Institute, Deakin University
2Faculty of Mathematics and Computer Science, VNUHCM-University of Science
1{phti, thai.le, truyen.tran}@deakin.edu.au
2nmman@hcmus.edu.vn
Abstract
The capacity to achieve out-of-distribution (OOD) generalization is a hallmark of human intelligence and yet remains out of reach for machines. This remarkable capability has been attributed to our abilities to make conceptual abstraction and analogy, and to a mechanism known as indirection, which binds two representations and uses one representation to refer to the other. Inspired by these mechanisms, we hypothesize that OOD generalization may be achieved by performing analogy-making and indirection in the functional space instead of the data space, as in current methods. To realize this, we design FINE (Functional Indirection Neural Estimator), a neural framework that learns to compose functions that map data input to output on-the-fly. FINE consists of a backbone network and a trainable semantic memory of basis weight matrices. Upon seeing a new input-output data pair, FINE dynamically constructs the backbone weights by mixing the basis weights. The mixing coefficients are indirectly computed through querying a separate corresponding semantic memory using the data pair. We demonstrate empirically that FINE can strongly improve out-of-distribution generalization on IQ tasks that involve geometric transformations. In particular, we train FINE and competing models on IQ tasks using images from the MNIST, Omniglot and CIFAR100 datasets, and test on tasks with unseen image classes from the same or different datasets and unseen transformation rules. FINE not only achieves the best performance on all tasks but is also able to adapt to small-scale data scenarios.
1 Introduction
Every computer science problem can be solved with a higher level of indirection.
—Andrew Koenig, Butler Lampson, David J. Wheeler
Generalizing to new circumstances is a hallmark of intelligence [16, 4, 11]. In some Intelligence Quotient (IQ) tests (a popular benchmark for human intelligence), one must leverage prior experience to identify the hidden abstract rules from a concrete example (e.g., a transformation of an image) and then apply the rules to the next (e.g., a new set of images of totally different appearance). These tasks necessitate several key capabilities, including conceptual abstraction and analogy-making [22]. Abstraction allows us to extend a concept to novel situations. It is also driven by analogy-making, which maps the current situation to previous experience stored in memory. Indeed, analogy-making has been argued to be one of the most important abilities of human cognition, or even further, "a concept is a package of analogies" [12]. The ability of humans to traverse seamlessly across concrete and abstract levels suggests another mechanism known as indirection, which binds two representations and uses one representation to refer to the other [16, 20].
Figure 1: FINE architecture. Above: FINE uses a pre-defined deep backbone architecture to approximate a function mapping a given input embedding $x$ to a given output embedding $y$. Below: given the input $x_t$ and pseudo-output $y_t$ of the $t$-th backbone layer, FINE first computes the query $W^q_t$, which represents the relation between $x_t$ and $y_t$. Then FINE performs analogy-making to compare the query with past experiences in the form of value memories. Finally, FINE binds the value memories with associated key memories via indirection and computes the weight $W_t$ for the $t$-th backbone layer.
Several deep learning models have successfully utilized analogy and indirection. The Transformer [30] and RelationNet [28] learn analogies between data through self-attention or pairwise functions. The ESBN [33] goes further by incorporating the indirection mechanism to bind an entity to a symbol and then reason on the symbols; this has proved to be efficient on tasks involving abstract rules, similar to the aforementioned IQ tests. However, a common drawback of these approaches is that they operate on the data space, and thus are susceptible to out-of-distribution samples.
In this paper, we propose to perform analogy-making and indirection in functional spaces instead. We aim to learn to compose a functional mapping from a given input to an output on-the-fly. Doing so offers two clear advantages. First, since the class of possible mappings is often restricted, learning the distribution of functions may not require a large amount of training data. Second, and more importantly, since this approach performs indirection in functional spaces, it avoids bindings between numerous entities and symbols in data spaces, and thus may help improve out-of-distribution generalization capability.
To this end, we introduce a new class of problems that requires functional analogy-making and indirection, and which we deem challenging for current data-oriented approaches. The tasks are similar to popular IQ tasks in which the model is given hints about the hidden rules and then has to predict the missing answer following those rules. A reasonable approach is for models to compare the current task with what they saw previously to identify the rules between the appearing entities, and thus to search in functional spaces instead of data spaces. More concretely, we construct the IQ tasks by applying geometric transformations to images from the MNIST dataset, the hand-written Omniglot dataset, and the real-image CIFAR100 dataset, where the training set and test set contain disjoint image classes from the same or different datasets, and possibly disjoint transformation rules.
Second, we present a novel framework named Functional Indirection Neural Estimator (FINE) to solve this class of problems (see Fig. 1 for the overall architecture of FINE). FINE consists of (a) a neural backbone to approximate the functions and (b) a trainable key-value memory module that stores a basis of network weights spanning the space of possible functions defined by the backbone. The weight basis memories allow FINE to perform analogy-making and indirection in the function space. More concretely, when a new IQ task arrives, FINE first (1) takes the hint images to make analogies with the value memories, then (2) performs indirection to bind the value memories with key memories, and finally (3) computes the approximated functions based on the key memories.
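To make steps (1)-(3) concrete, the following PyTorch-style sketch shows one plausible reading of the per-layer function composer in Figure 1. The class name, the rank-one query (standing in for the query $W^q_t$ hinted at in the figure), and the softmax mixing are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionComposer(nn.Module):
    """Illustrative sketch (not the official code) of FINE's per-layer
    weight construction: query -> analogy with value memories ->
    indirection to key memories -> composed weight W_t."""

    def __init__(self, d_in, d_out, n_basis):
        super().__init__()
        # Trainable semantic memories of basis weight matrices.
        self.value_memory = nn.Parameter(0.02 * torch.randn(n_basis, d_out, d_in))
        self.key_memory = nn.Parameter(0.02 * torch.randn(n_basis, d_out, d_in))

    def forward(self, x_t, y_t):
        # (1) Query relating input x_t (B, d_in) and pseudo-output y_t (B, d_out);
        #     a rank-one outer product is assumed here.
        w_query = torch.einsum('bo,bi->boi', y_t, x_t)
        # (2) Analogy-making: similarity of the query to each value memory,
        #     turned into mixing coefficients.
        sims = torch.einsum('boi,noi->bn', w_query, self.value_memory)
        coeffs = F.softmax(sims, dim=-1)
        # (3) Indirection: coefficients found on the value memories are used
        #     to mix the bound key memories into the layer weight W_t.
        w_t = torch.einsum('bn,noi->boi', coeffs, self.key_memory)
        # Apply the composed weight to the layer input.
        return torch.einsum('boi,bi->bo', w_t, x_t)
```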
Figure 2: Examples of two IQ tasks involving geometric transformations; choices marked with green circles are the correct solutions. Left: a 90-degree rotation. Right: a syntactic black-white transformation, in which a part of the image is transformed to the opposite colors.
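As a concrete illustration of the two rules shown in Figure 2, the snippet below sketches how such transformations could be applied to an image array; the specific angle and region are illustrative choices, not the paper's exact parameterization.

```python
import numpy as np

def rotate_90(img: np.ndarray) -> np.ndarray:
    """90-degree rotation, one of the affine transformations."""
    return np.rot90(img, k=1)

def black_white(img: np.ndarray, box=(0, 0, 14, 14)) -> np.ndarray:
    """Syntactic black-white transformation: invert the pixel values
    inside a rectangular region (the region here is illustrative)."""
    out = img.copy()
    r0, c0, r1, c1 = box
    out[r0:r1, c0:c1] = img.max() - out[r0:r1, c0:c1]
    return out
```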
Throughout a comprehensive suite of experiments, we demonstrate that FINE outperforms the competing methods and adapts more effectively in small-data scenarios.
2 Tasks
For concreteness, we will focus on Intelligence Quotient (IQ) tests, which have been widely accepted as a reliable benchmark for measuring human intelligence [26]. We will study a popular class of IQ tasks that provides hints following some hidden rules and requires the player to choose among given choices to fill in a placeholder so that the filled-in entity obeys the rules of the current task. In order to succeed in these tasks, the player must be able to figure out the hidden rules and perform analogy-making to select the correct choice. Moreover, once the rules of the current task have been figured out, a human player can almost always solve tasks with similar rules regardless of the entities appearing in them. This remarkable out-of-distribution generalization indicates that humans treat objects and relations abstractly instead of relying on the raw sensory data.
We aim to solve IQ tasks that involve geometric transformations (see Fig. 2 for examples), which include affine transformations (translation, rotation, shear, scale, and reflection), non-linear transformations (fisheye, horizontal wave), and syntactic transformations (black-white, swap). Details of the transformations are given in the Supplementary. In a task, the models are given three images $x$, $y$ and $x'$, where $y$ is the result of applying a geometric transformation to $x$. The models are then asked to select $y'$ among 4 choices $y_1, y_2, y_3, y_4$ so that $(x', y')$ follows the same rule as $(x, y)$ (i.e., if $y = f(x)$ then $y' = f(x')$ for transformation $f$). The 4 choices include (i) one with the correct object/image and correct transformation (which is the solution), (ii) one with the correct object/image and an incorrect transformation, (iii) one with an incorrect object/image and the correct transformation, and (iv) one with both an incorrect object/image and an incorrect transformation.
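To make the four-choice construction concrete, here is a minimal sketch of assembling one task instance; the way the distractor image and the alternative transformation are sampled is our assumption, not the paper's exact procedure.

```python
import random

def make_task(x, x_prime, wrong_image, transforms):
    """Assemble one IQ task: hint pair (x, f(x)), query x', and 4 choices.
    `transforms` is a list of callables; `wrong_image` is an image of a
    different object used for the incorrect-image distractors."""
    f = random.choice(transforms)                      # hidden rule
    g = random.choice([t for t in transforms if t is not f])
    y = f(x)                                           # hint output
    choices = [
        f(x_prime),      # (i)   correct image, correct transformation (solution)
        g(x_prime),      # (ii)  correct image, incorrect transformation
        f(wrong_image),  # (iii) incorrect image, correct transformation
        g(wrong_image),  # (iv)  incorrect image, incorrect transformation
    ]
    order = list(range(4))
    random.shuffle(order)
    shuffled = [choices[i] for i in order]
    answer = order.index(0)                            # position of the solution
    return (x, y, x_prime), shuffled, answer
```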
Inspired by this human capability, a reasonable approach to solving these tasks is for models to figure out the transformation (or relation) between objects/images and apply the transformation to novel objects/images. The datasets can be classified into two main categories: single-transformation datasets and multi-transformation datasets. Single-transformation datasets include only a particular transformation type, e.g., rotation; note that individual transformations of the same type still vary, e.g., rotations by different angles. Multi-transformation datasets, on the other hand, consist of several transformation types. To test the generalization capability of the models, we build test sets containing classes of images that have never been seen during training (see Section 4.1 and Section 4.2), or even more challenging tasks involving unseen rules and unseen datasets (see Section 4.3). Models must be able to leverage knowledge and memory gained from the training dataset to solve a new task.
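One simple way to realize the disjoint-class split described above is sketched here; the shuffling and seeding details are illustrative assumptions.

```python
import random

def split_classes(all_classes, n_test_classes, seed=0):
    """Hold out a disjoint set of image classes for OOD testing: training
    tasks use only `train_classes`, test tasks only `test_classes`."""
    rng = random.Random(seed)
    classes = list(all_classes)
    rng.shuffle(classes)
    train_classes = classes[n_test_classes:]
    test_classes = classes[:n_test_classes]
    return train_classes, test_classes
```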
3 Method
3.1 Functional Hypothesis
Let $\mathcal{X}$ and $\mathcal{Y}$ be the data input and output spaces, respectively. Denote by $(\mathcal{X}_{\mathrm{train}}, \mathcal{Y}_{\mathrm{train}}) \subset (\mathcal{X}, \mathcal{Y})$ the training set, and $(\mathcal{X}_{\mathrm{test}}, \mathcal{Y}_{\mathrm{test}}) \subset (\mathcal{X}, \mathcal{Y})$ the non-overlapping test set. Classical ML assumes that $\mathcal{X}_{\mathrm{train}}$ and $\mathcal{X}_{\mathrm{test}}$ are drawn from the same distribution. Under this hypothesis, it is reasonable (for frequentists) to find a function $f: \mathcal{X} \to \mathcal{Y}$ in the functional space $\mathcal{F}$ that fits both the train and the test data.
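For orientation, the classical assumption can be written in the familiar empirical-risk form below; this display is a standard formalization added here for clarity, not a formula taken from the paper.
\[
\hat{f} \;=\; \arg\min_{f \in \mathcal{F}} \; \frac{1}{|\mathcal{X}_{\mathrm{train}}|} \sum_{(x, y) \in (\mathcal{X}_{\mathrm{train}}, \mathcal{Y}_{\mathrm{train}})} \ell\big(f(x), y\big),
\]
where $\ell$ is a loss function and both training and test pairs are assumed to be drawn from one underlying distribution; in the OOD setting considered in this paper, that identical-distribution assumption no longer holds.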