
We present an overview of related work in Section 2, before describing our experimental protocol in Section 3.
Section 4 discusses the effect of using the Actor-Mimic algorithm for consolidation on the performance obtained
within a given set of tasks. Section 5 focuses on knowledge transfer and generalization to new tasks. Section 6
provides a comparison baseline illustrating that consolidation mitigates the effects of negative transfer. We conclude
in Section 7.
2 Background and Related Work
Distillation was originally proposed by [4] and is among the most promising methods to achieve transfer between
tasks. [7] and [9] extend the core idea of learning several functions as one, and add an incentive to also copy the
features in order to guide the training process. Similarly, [10] builds a central network, encoding common behaviors,
to share knowledge between tasks.
One major challenge in RL today is lifelong learning, i.e. how to solve different tasks sequentially while avoiding
catastrophic forgetting. Different approaches exist to tackle this problem. We follow the division into three categories
proposed in [11]. One possibility is to periodically modify the network architecture when facing new tasks in order
to enhance its representative power [12, 13, 14]. Another approach is to use regularization to preserve previously
acquired knowledge [15, 16, 17]. Finally, the lifelong learning problem can be reduced to a multi-task learning one
by using a rehearsal strategy, memorizing every task encountered [18, 19, 20, 21, 22]. These three main categories are
not mutually exclusive, and many of these algorithms combine techniques from more than one category.
The idea of alternating between an active phase of pattern-separated learning and a passive phase of generalization,
as inspired by CLS theory, has also been explored before. In particular, [23] introduces the PLAID algorithm that
progressively grows a central network using distillation on newly encountered tasks. Similarly, [24] successively
compresses different expert networks into a knowledge base that is then reused by new experts via lateral layer-wise
connections [13]. [25] introduces a Self-Organising Map into DRL to simulate complementary learning in a neocortical
and a hippocampal system, improving learning on grid-world control tasks and demonstrating the biological plausibility
of an artificial CLS.
Instead of learning to solve multiple tasks, another possibility is to learn how to be efficient at learning: this is the
meta-learning approach. One intuitive way to achieve this is by using a meta-algorithm to output a set of neural
network weights which are then used as initialization for solving new tasks [26]. On the other hand, [27] proposes the
use of a second network whose role is to deactivate parts of a main neural network. By analogy with the human brain,
this second network is called the neuromodulatory network, as it is responsible for activating or deactivating parts of
the main network depending on the task at hand. Finally, [28] proposes a framework for meta-algorithms which divides
them into a “What” part, whose objective is to identify the task currently being solved from context data, and a “How”
part, responsible for producing a set of parameters for a neural network able to solve this task.
3 Actor-Mimic Networks for Consolidation in Lifelong Learning
In order to study the consolidation process and its interaction with knowledge transfer, we explore the use of the Actor-
Mimic Network (AMN) algorithm [7], which acts as a policy distillation algorithm with an additional incentive to imitate
the teacher’s features. In standard policy distillation, as proposed by [4], the distilled network — also called student
network — learns to reproduce the output of multiple expert networks (policy regression objective) using supervised
learning. On top of this, the AMN algorithm adds a feature regression objective that regularizes the features of
the student network (defined as the outputs of the second-to-last layer) towards the features of the experts. Intuitively,
the policy regression objective teaches the student how it should act, while the feature regression objective conveys the
result of the expert’s “thinking process”, indicating why it should act that way.
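To make this concrete, the following PyTorch sketch combines the two objectives in a single loss. It is an illustration under simplifying assumptions rather than the exact formulation of [7]: the expert policy is taken as a softmax over its Q-values, the student features are regressed directly onto the expert features, and temperature and beta are placeholder hyper-parameters.

import torch.nn.functional as F

def amn_loss(student_logits, student_features,
             expert_q_values, expert_features,
             temperature=1.0, beta=0.01):
    """Simplified Actor-Mimic objective: policy regression + feature regression."""
    # Policy regression: cross-entropy between the expert policy
    # (softmax over its Q-values) and the student's policy.
    teacher_policy = F.softmax(expert_q_values / temperature, dim=-1)
    policy_loss = -(teacher_policy
                    * F.log_softmax(student_logits, dim=-1)).sum(dim=-1).mean()
    # Feature regression: pull the student's penultimate-layer features
    # towards the expert's.
    feature_loss = F.mse_loss(student_features, expert_features)
    return policy_loss + beta * feature_loss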
The AMN algorithm makes it possible to consolidate several expert networks at the same time while extracting features
containing the same information as those of the experts. As the input states of target tasks can be quite different in nature (e.g.
graphical features, color palette, etc.), it is a desirable property for the extracted features in the consolidated network
to represent abstract concepts that facilitate generalization across tasks. To evaluate this property, we propose a new
training protocol composed of two phases that emulate day-night cycles: an active learning phase in which neural
networks — coined “expert networks” — are trained individually on a set of visual RL tasks, and a passive imitation
phase in which the knowledge acquired by all experts is consolidated into a central AMN that retains knowledge in
the long term.
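Schematically, and with purely illustrative helper names (train_on, sample_states and distill_from are assumptions, not actual APIs), one cycle of this protocol could be written as:

def day_night_training(tasks, experts, amn, num_cycles):
    """Two-phase training protocol; all helper methods are illustrative."""
    for cycle in range(num_cycles):
        # Active ("day") phase: each expert is trained on its own task
        # with a standard RL algorithm (Rainbow in our experiments).
        for task, expert in zip(tasks, experts):
            expert.train_on(task)
        # Passive ("night") phase: the knowledge of all experts is
        # consolidated into the central AMN via the Actor-Mimic loss.
        for task, expert in zip(tasks, experts):
            states = expert.sample_states(task)
            amn.distill_from(expert, states)
    return amn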
During the active phase, each expert network is trained on its corresponding task using a standard RL algorithm. We
use Rainbow [29] in the present study, as implemented in the Dopamine framework [30], with the typical architecture