ON NEURAL CONSOLIDATION FOR TRANSFER IN
REINFORCEMENT LEARNING
Valentin Guillet
ISAE-SUPAERO, Université de Toulouse, France
valentin.guillet@isae-supaero.fr
Dennis Wilson
ISAE-SUPAERO, Université de Toulouse, France
dennis.wilson@isae-supaero.fr
Carlos Aguilar-Melchor
Sandbox AQ, France
carlos@sandboxaq.com
Emmanuel Rachelson
ISAE-SUPAERO, Université de Toulouse, France
emmanuel.rachelson@isae-supaero.fr
ABSTRACT
Although transfer learning is considered to be a milestone in deep reinforcement learning, the mech-
anisms behind it are still poorly understood. In particular, predicting if knowledge can be transferred
between two given tasks is still an unresolved problem. In this work, we explore the use of network
distillation as a feature extraction method to better understand the context in which transfer can
occur. Notably, we show that distillation does not prevent knowledge transfer, including when trans-
ferring from multiple tasks to a new one, and we compare these results with transfer without prior
distillation. We focus our work on the Atari benchmark both because of the variability between games and because of their similarities in terms of visual features.
1 Introduction
In spite of the rapid progress made in Deep Reinforcement Learning (RL) [1] over the last decade, and although state-of-the-art algorithms are increasingly efficient, many fundamental issues remain unsolved and constitute major limitations of current approaches. In particular, existing algorithms train networks from scratch on each new task, which is very computationally costly. This issue motivated the development of transfer learning [2, 3], the study of how to transfer and reuse knowledge from one neural network to another in order to accelerate learning and benefit from previously acquired abilities. Various methods for transfer have been proposed recently, ranging from plain fine-tuning to more elaborate uses of distillation in multi-task settings [4].
Although primarily developed for network compression [5], distillation is a technique that aims to copy the behavior of a teacher neural network into a student one by ensuring the two represent the same function. It has been successfully used to compress multiple teachers into a single student, thus achieving multi-task learning [6]. We investigate how distillation in a multi-task context, which we refer to as network consolidation, is useful for knowledge transfer and can help understand the mechanisms underlying transfer. Specifically, we compare different methods to achieve consolidation on multiple visual RL tasks and discuss the importance of key details in the algorithmic design. We also show that transfer can occur even when the consolidation process does not reach convergence. Finally, we argue that consolidation filters out networks that would lead to negative transfer, while preserving the benefits of positive transfer cases.
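To make the idea concrete, the following sketch illustrates multi-teacher policy distillation of the kind we build on: each expert's Q-values are softened into a target policy and a shared student network is trained to match it with a cross-entropy loss. The function and variable names, as well as the temperature value, are hypothetical; this is a simplified illustration, not the exact procedure used in our experiments.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_q_values, tau=0.01):
    # Cross-entropy between a softened teacher policy and the student policy.
    teacher_policy = F.softmax(teacher_q_values / tau, dim=-1)  # soft targets
    student_log_policy = F.log_softmax(student_logits, dim=-1)
    return -(teacher_policy * student_log_policy).sum(dim=-1).mean()

def consolidation_step(student, teachers, batches, optimizer):
    # One gradient step: each teacher contributes a loss on states from its own task.
    loss = torch.zeros(())
    for teacher, states in zip(teachers, batches):
        with torch.no_grad():
            q_values = teacher(states)
        loss = loss + distillation_loss(student(states), q_values)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()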
In order to study these claims, we propose an experimental protocol based on the use of the Actor-Mimic algorithm
[7] that alternates between training and consolidation phases. Our approach is motivated by current neurobiological
theories modeling knowledge transfer and lifelong learning in the mammalian brain, such as the Complementary
Learning Systems (CLS) theory [8]. CLS states that memorization is based on two distinct parts of the brain: the
hippocampus, responsible for short-term adaptation and rapid learning, and the neocortex, which assimilates this
knowledge slowly and retains it on a long-term basis.
Work published at the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (IEEE ADPRL),
2022
We present an overview of related works in Section 2, before describing our experimental protocol in Section 3.
Section 4 discusses the effect of using the Actor-Mimic algorithm for consolidation on the performance obtained
within a given set of tasks. Section 5 focuses on knowledge transfer and generalization to new tasks. Section 6
provides a comparison baseline illustrating that consolidation mitigates the effects of negative transfer. We conclude
in Section 7.
2 Background and Related Work
Distillation was originally proposed by [4] and is among the most promising methods to achieve transfer between
tasks. [7] and [9] extend the core idea of learning several functions as one, and add an incentive to also copy the
features in order to guide the training process. Similarly, [10] builds a central network, encoding common behaviors,
to share knowledge between tasks.
One major challenge in RL today is lifelong learning, i.e. how to solve different tasks sequentially while avoiding
catastrophic forgetting. Different approaches exist to tackle this problem. We follow the division into three categories
proposed in [11]. One possibility is to periodically modify the network architecture when facing new tasks in order
to enhance its representative power [12, 13, 14]. Another approach is to use regularization to preserve previously
acquired knowledge [15, 16, 17]. Finally, the lifelong learning problem can be reduced to a multi-task learning one
by using a rehearsal strategy, memorizing every task encountered [18, 19, 20, 21, 22]. These three main categories are
not mutually exclusive, and many of these algorithms make use of techniques that belong to two of these categories.
The idea of alternating between an active phase of pattern-separated learning and a passive phase of generalization, as inspired by the CLS theory, has also been explored before. In particular, [23] introduces the PLAID algorithm that
progressively grows a central network using distillation on newly encountered tasks. Similarly, [24] successively
compresses different expert networks in a knowledge base that is then reused by new experts via lateral layer-wise
connections [13]. [25] introduces a Self-Organising Map into DRL to simulate complementary learning in a neocortical
and a hippocampal system, improving learning on grid world control and demonstrating the biological plausibility of
artificial CLS.
Instead of learning to solve multiple tasks, another possibility is to learn how to be efficient at learning: this is the
meta-learning approach. One intuitive way to achieve this is by using a meta-algorithm to output a set of neural
network weights which are then used as initialization for solving new tasks [26]. On the other hand, [27] proposes the
use of a second network whose role is to deactivate part of a typical neural network. By analogy with the human brain,
this network is called the neuromodulatory network as it is responsible for activating or deactivating part of the main
network depending on the current task to solve. Finally, [28] proposes a framework for meta-algorithms which divides
them into a “What” part whose objective is to identify the current running task from context data, and a “How” part
responsible for producing a set of parameters for a neural network that will be able to solve this task.
3 Actor-Mimic Networks for Consolidation in Lifelong Learning
In order to study the consolidation process and its interaction with knowledge transfer, we explore the use of the Actor-Mimic Network (AMN) algorithm [7], which acts as a policy distillation algorithm with an additional incentive to imitate the teacher's features. In standard policy distillation, as proposed by [4], the distilled network, also called the student network, learns to reproduce the output of multiple expert networks (policy regression objective) using supervised learning. The AMN algorithm adds a feature regression objective that regularizes the features of the student network (defined as the outputs of its second-to-last layer) towards the features of the experts. Intuitively, the policy regression objective teaches the student how it should act, while the feature regression objective conveys the result of the expert's "thinking process", indicating why it should act that way.
The AMN algorithm makes it possible to consolidate several expert networks at the same time while extracting features
containing the same information as the experts. As the input states of target tasks can be quite different in nature (e.g.
graphical features, color palette, etc.), it is a desirable property for the extracted features in the consolidated network
to represent abstract concepts that facilitate generalization across tasks. To evaluate this property, we propose a new
training protocol composed of two phases that emulate day-night cycles: an active learning phase in which neural
networks —coined “expert networks”— are trained individually on a set of visual RL tasks, and a passive imitation
phase in which the knowledge acquired by all experts is consolidated into a central AMN that retains knowledge in
the long term.
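Schematically, one cycle of this protocol can be written as follows; the callables (init_expert, train_with_rl, init_central_network, train_actor_mimic) are hypothetical placeholders for the actual implementation, and the number of cycles depends on the experiment.

# Sketch of the two-phase (day-night) training protocol. The callables passed in
# stand for the actual RL training and Actor-Mimic consolidation code.
def training_protocol(tasks, n_cycles, init_expert, train_with_rl,
                      init_central_network, train_actor_mimic):
    amn = init_central_network()  # long-term, "neocortex"-like network
    for _ in range(n_cycles):
        # Active ("day") phase: train one expert per task with standard RL.
        experts = [train_with_rl(init_expert(task), task) for task in tasks]
        # Passive ("night") phase: consolidate all experts into the central AMN
        # using the policy and feature regression objectives described above.
        amn = train_actor_mimic(amn, experts, tasks)
    return amn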
During the active phase, each expert network is trained on its corresponding task using a standard RL algorithm. We
use Rainbow [29] in the present study, as implemented in the Dopamine framework [30], with the typical architecture