Continual task learning in natural and artificial agents
Timo Flesch1, Andrew Saxe2 and Christopher Summerfield1
1Department of Experimental Psychology, University of Oxford, Oxford, UK.
2Gatsby Computational Neuroscience Unit & Sainsbury Wellcome Centre, UCL, London, UK.
Correspondence: {timo.flesch, christopher.summerfield}@psy.ox.ac.uk
Abstract
How do humans and other animals learn new tasks? A wave of brain recording studies has
investigated how neural representations change during task learning, with a focus on how tasks
can be acquired and coded in ways that minimise mutual interference. We review recent work
that has explored the geometry and dimensionality of neural task representations in neocortex,
and computational models that have exploited these findings to understand how the brain may
partition knowledge between tasks. We discuss how ideas from machine learning, including
those that combine supervised and unsupervised learning, are helping neuroscientists
understand how natural tasks are learned and coded in biological brains.
Keywords
Continual learning, neural networks, representational geometry, Hebbian gating.
Highlights
Both natural and artificial agents face the challenge of learning in ways that support
effective future behaviour
This may be achieved by different learning regimes, associated with distinct dynamics,
and differing dimensionality and geometry of neural task representations
Where two different tasks are learned, neural codes for task-relevant information may
be factorised in neocortex
Combinations of supervised and unsupervised learning mechanisms may help partition
task knowledge and avoid catastrophic interference
Declaration
Authors declare no competing interests
1. Natural tasks
In the natural world, humans and other animals behave in temporally structured ways that
depend on environmental context. For example, many mammals cycle systematically through
daily activities such as foraging, grooming, napping, and socialising. Humans live in complex
societies in which labour is shared among group members, with each adult performing multiple
successive roles, such as securing resources, caring for young, or exchanging social
information. In many settings, we can describe the behaviour of natural agents as comprising
a succession of distinct tasks for which a desired outcome (reward) is achieved by taking
actions (responses) to observations (stimuli) through the learning of latent causal processes
(rules).
The nature of task-driven behaviour, and the way that tasks are represented and implemented
in neural circuits, have been widely studied by cognitive and neural scientists. One important
finding is that switching between distinct tasks incurs a cost in decision accuracy and latency
[1]. This switch cost implies the existence of control mechanisms that ensure we remain “on
task”, possibly protecting ongoing behavioural routines from interference [2,3]. In primates,
there is good evidence that control signals originate in the prefrontal cortex (PFC) and
encourage task-appropriate behaviours by biasing neural activity in sensory and motor regions
[4]. For example, single cells in the PFC have been observed to respond to task rules [5], and
patients with PFC damage tend to select tasks erroneously, leading to disinhibited or
inappropriate behaviours [6].
How, then, are tasks coded in the PFC and interconnected regions? One key insight is that
mutual interference among tasks can be mitigated when they are coded in independent
subspaces of neural activity, such that the neural population vector evoked during task A is
uncorrelated with that occurring during task B [7,8]. Over the past decade, evidence for this
coding principle has emerged in domains as varied as skilled motor control [9], auditory
prediction [10], memory processes [11,12], and visual categorisation [13,14]. However, the
precise computational mechanisms by which tasks are encoded and implemented remain a
matter of ongoing debate.
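The subspace idea above can be illustrated in a few lines of numpy: if the population vectors evoked by two tasks have disjoint (hence orthogonal) support across the neural population, activity during one task has zero projection onto the code for the other. This is a toy sketch with hypothetical activity patterns, not real recordings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 100

# Hypothetical population: task A engages the first 50 neurons,
# task B the remaining 50, so their activity subspaces are orthogonal.
mask_a = np.r_[np.ones(50), np.zeros(50)]
mask_b = 1.0 - mask_a

act_a = rng.normal(size=n_neurons) * mask_a   # task-A population vector
act_b = rng.normal(size=n_neurons) * mask_b   # task-B population vector

# Disjoint support -> zero overlap between the two task codes
print(float(act_a @ act_b))  # 0.0
```

In real data the subspaces need not involve disjoint sets of neurons; it is the near-zero correlation between the population vectors, not silence of individual cells, that defines the independence.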
2. Rich and lazy learning
One way to study how tasks could be neurally encoded is to simulate learning in a simple class
of computational model: a neural network trained with gradient descent. Neural networks
uniquely allow researchers to form hypotheses about how neural codes form in biological
brains, because their representations emerge through optimisation rather than being hand-
crafted by the researcher [15]. One recent observation is that neural networks can learn to
perform tasks in different regimes that are characterised by qualitatively diverging learning
dynamics and distinct neural patterns at convergence [16,17]. In the lazy regime, which occurs
when network weights are initialised with a broader range of values (e.g., higher connection
strengths), the dimensionality of the input signals is rapidly expanded via random projections
to the hidden layer such that learning is mostly confined to the readout weights, and error
decreases exponentially [17-20]. By contrast, in the rich regime, which occurs when weights
are initialised with low variance (weak connectivity), the hidden units learn highly structured
representations that are tailored to the specific demands of the task, and the loss curve tends to
pass through one or more saddle points before convergence [21-24]. We illustrate this with a
simple example, learning an “exclusive or” (XOR) problem, in Fig. 1A-D.
Figure 1. Rich and Lazy learning in neural networks. (A) The XOR (exclusive or) problem requires the same
response (A) when exactly one of two input units is set to one, and a different response (B) when both are 0 or
both are 1. A linear classifier cannot distinguish between the two classes. (B) Feedforward neural
network architecture that can solve the XOR task. The inputs are mapped into a hidden layer with non-linear
outputs, and from there into a non-linear response layer. (C) Effect of different initial weight variances on the
training dynamics of the network shown in (B). We distinguish between rich learning (with small initial weight
variance, light blue) and lazy learning (with large initial weight variance, dark blue). The change in magnitude
of the input-to-hidden (left) and hidden-to-output (middle) weights depends strongly on initialisation strength. In
lazily initialised networks, the input-to-hidden weights remain very close to their initial values and learning is
confined to the readout weights. In richly initialised networks, all weights adapt substantially. Moreover, rich-
initialised networks learn much more slowly than lazy-initialised networks (right). (D) Learned input-to-hidden weights after rich
(top) and lazy (bottom) learning. Under rich learning, the weights learn to point to the four input types. In contrast,
under lazy learning, the weights point in arbitrary directions, effectively performing a random mapping into a
higher-dimensional space.
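The initialisation effect described in the caption can be reproduced with a short numpy simulation. This is a minimal sketch, not the authors' actual code; the architecture, learning rate, and initialisation scales are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR data: four input patterns and their binary targets
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

def relative_hidden_change(init_scale, n_hidden=64, lr=0.02, steps=3000):
    """Train a one-hidden-layer tanh network on XOR with gradient descent;
    return the relative change in the input-to-hidden weights W1."""
    W1 = rng.normal(scale=init_scale, size=(2, n_hidden))
    w2 = rng.normal(scale=init_scale, size=n_hidden)
    W1_init = W1.copy()
    for _ in range(steps):
        h = np.tanh(X @ W1)                  # hidden-layer activations
        err = h @ w2 - y                     # gradient of squared error
        grad_w2 = h.T @ err / len(X)
        grad_W1 = X.T @ (np.outer(err, w2) * (1 - h ** 2)) / len(X)
        w2 -= lr * grad_w2
        W1 -= lr * grad_W1
    return np.linalg.norm(W1 - W1_init) / np.linalg.norm(W1_init)

rich = relative_hidden_change(init_scale=0.01)  # small weights: rich regime
lazy = relative_hidden_change(init_scale=2.0)   # large weights: lazy regime

# Under lazy initialisation the hidden weights barely move relative to their
# starting values; under rich initialisation they change substantially.
print(rich > lazy)  # True
```

The comparison of relative weight change mirrors panels C-D: with large initial weights, the random hidden expansion is good enough that learning is absorbed by the readout.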
These schemes may have complementary costs and benefits. High-dimensional coding
schemes maximise the number of discriminations that can be linearly read out from the
network, allowing agents to rapidly learn a new decision rule for a task [25]. Low-dimensional
coding schemes confer robustness through redundancy, because neurons exhibit overlapping
tuning properties, and promote generalisation, because they tend to correspond to simpler
input-output functions when the neural manifold extends in fewer directions [26,27].
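The dimensionality contrast drawn above can be quantified with the participation ratio, a standard measure of effective linear dimensionality. Below is a toy sketch on synthetic data; the low- and high-dimensional codes are our own analogues of the two schemes, not measurements from any study:

```python
import numpy as np

rng = np.random.default_rng(2)

def participation_ratio(acts):
    """Effective dimensionality: (sum of eigenvalues)^2 / sum of squares."""
    eig = np.linalg.eigvalsh(np.cov((acts - acts.mean(0)).T))
    eig = np.clip(eig, 0.0, None)
    return eig.sum() ** 2 / (eig ** 2).sum()

n_stim, n_neurons = 500, 100
latent = rng.normal(size=(n_stim, 2))      # two task-relevant variables

# Low-dimensional code: neurons linearly mix the two latent variables
low_d = latent @ rng.normal(size=(2, n_neurons))

# High-dimensional code: random nonlinear expansion of the same latents
high_d = np.tanh(latent @ rng.normal(scale=3.0, size=(2, n_neurons))
                 + rng.normal(size=n_neurons))

pr_low = participation_ratio(low_d)    # capped at 2: rank-2 covariance
pr_high = participation_ratio(high_d)  # nonlinear mixing inflates dimension
print(pr_high > pr_low)  # True
```

The same stimuli thus yield very different effective dimensionalities depending on whether the code mixes the latent variables linearly or nonlinearly.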
Neural recordings have offered evidence for both rich and lazy coding schemes. One important
observation is that the variables that define a task (observations, actions, outcomes and rules)
are often encoded jointly by single neurons. For example, when monkeys make choices on
the basis of distinct cues, single cells tend to multiplex input and choice variables [28,29]. In
another study, the dimensionality of neural codes recorded during performance of a dual
memory task was found to approach its theoretical maximum, implying that neurons represent
every possible combination of relevant variables [12]. This finding is consistent with “lazy”
learning, implying that brains encode tasks via high-dimensional representations that enmesh
multiple task-relevant variables across the neural population.
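The "theoretical maximum" dimensionality just described is often operationalised by counting how many dichotomies of the task conditions a linear readout can implement. The sketch below illustrates the logic with four conditions; the mixed-selectivity expansion is a hypothetical construction in the spirit of this literature, not a reanalysis of the data in [12]:

```python
import numpy as np

rng = np.random.default_rng(3)

# Four task conditions: all combinations of two binary variables
conds = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

def linearly_separable(reps, labels, iters=5000):
    """Perceptron check: converges iff the labelling is linearly separable."""
    X = np.hstack([reps, np.ones((len(reps), 1))])   # append bias term
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        wrong = np.sign(X @ w) != labels
        if not wrong.any():
            return True
        w += (labels[wrong][:, None] * X[wrong]).sum(axis=0)
    return False

def n_balanced_dichotomies(reps):
    """Count the three 2-vs-2 splits of four conditions a readout can realise."""
    count = 0
    for other in (1, 2, 3):                 # pair condition 0 with each other
        labels = -np.ones(4)
        labels[[0, other]] = 1.0
        count += linearly_separable(reps, labels)
    return count

pure = conds                                # 2-D "pure selectivity" code
mixed = np.tanh(conds @ rng.normal(size=(2, 50)) + rng.normal(size=50))

print(n_balanced_dichotomies(pure))   # 2: the XOR split is not separable
print(n_balanced_dichotomies(mixed))  # 3: the expanded code shatters all splits
```

A code that reaches the maximum number of implementable dichotomies is, in this sense, at its theoretical dimensionality ceiling.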
However, there is also important evidence that neural systems learn representations that mirror
the structure of the task, as might be predicted in the “rich” regime. For example, it is often
observed that neurons active in one task are silent during another, and vice versa. For example,
when macaques were trained to categorise morphed animal images according to two
independent classification schemes, 29% of PFC neurons became active during a single
scheme, whereas only 2% of neurons were active during both [30]. Much more recently, a
similar finding was reported using modern two-photon imaging methods in the parietal cortex
of mice trained to perform both a grating discrimination task and a T-maze task. Over half of
the recorded neurons were active in at least one task, but a much smaller fraction was active in
both tasks [31]. In other words, the brain learns to partition task knowledge across independent
sets of neurons.
One recent study used a neural network model to explicitly compare the predictions of the rich
and lazy learning schemes to neural signals recorded from the human brain [13]. They
developed a task (similar to ref [30]) that involved discriminating naturalistic images in two
independent contexts. Human participants learned to make “plant / don’t plant” decisions about
quasi-naturalistic images of trees with continuously varying branch density and leaf density,
whose growth success was determined by leaf density in one “garden” (task A) and branch
density in the other (task B) (Fig. 2A). Neural networks could be trained to perform a stylised
version of this task under either rich or lazy learning schemes by varying the different initial
connection strengths (Fig. 2B). Multivariate methods used to visualise the representational
dissimilarity matrix (RDM) and corresponding neural geometry for the network hidden layer
under either scheme revealed that they made quite different predictions (Fig. 2C). Under lazy
learning, the network learned a high dimensional solution whose RDM simply recapitulated
the organisation of the input signals (into a grid defined by “leafiness” and “branchiness”). This
is expected because randomly expanding the dimensionality of the inputs does not distort their
similarity structure. However, under the rich scheme, the network compressed information along
the dimension that was irrelevant in each context, so that the hidden layer represented the relevant input
dimensions (leafiness and branchiness) on two neural planes lying at right angles in neural state
space (Fig. 2C-D). Strikingly, BOLD signals in the posterior parietal cortex (PPC) and
dorsomedial prefrontal cortex (dmPFC) exhibited a similar dimensionality and RDMs revealed
a comparable geometric arrangement onto “orthogonal planes”, providing evidence in support
of “rich” task representations in the human brain (Fig. 2E-F).
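The “orthogonal planes” geometry can be illustrated with idealised coordinates: a toy construction matching the description above, not fitted neural data.

```python
import numpy as np

vals = np.linspace(-1.0, 1.0, 5)                 # five levels of each feature

# Idealised rich geometry: in each context, hidden activity varies only along
# that context's relevant stimulus dimension, plus a context-specific offset.
acts_a = np.array([[v, 0., 1., 0.] for v in vals])   # context A: leafiness
acts_b = np.array([[0., v, 0., 1.] for v in vals])   # context B: branchiness

def coding_axis(acts):
    """First principal component of centred activity (stimulus coding axis)."""
    centred = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0]

# The stimulus coding directions of the two contexts lie at right angles
print(float(coding_axis(acts_a) @ coding_axis(acts_b)))  # 0.0
```

Computing an RDM over these activity patterns would reproduce the qualitative structure reported for PPC and dmPFC: within-context similarity ordered by the relevant feature, and orthogonal coding directions across contexts.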
Neural systems can thus learn both structured, low-dimensional task representations, and
unstructured, high-dimensional codes. In artificial neural networks, the emerging regime
depends on the magnitude of initial connection strengths in the network [15]. In the brain, these
regimes may arise through other mechanisms such as pressure toward metabolic efficiency
(regularisation), or architectural constraints that enforce unlearned nonlinear expansions.
Whilst it remains unclear when, how and why either coding scheme might be adopted in the
biological brain, neural theory is emerging that may help clarify this issue. One recent paper
explored how representational structure is shaped by specific task demands [32]. Comparing
recurrent artificial neural networks trained on different cognitive tasks, the authors found that
those tasks that required flexible input-output mappings, such as the context-dependent
decision task outlined above, induced task-specific representations, similar to the ones
observed under rich learning in feedforward networks. In contrast, for tasks that did not require
such a flexible mapping, the authors observed completely random, unstructured
representations. This suggests that representational geometry is not only determined by
intrinsic factors such as initial connection strength, but flexibly adapts to the computational
demands of specific tasks. Rich task-specific representations might therefore arise when there