Extended abstract accepted at the 2nd RL-CONFORM Workshop at IEEE/RSJ IROS’22 Conference, Kyoto, Japan, 2022.
Broad-persistent Advice for Interactive Reinforcement Learning Scenarios
Francisco Cruz1, Adam Bignold2, Hung Son Nguyen3, Richard Dazeley3, and Peter Vamplew2
Abstract— The use of interactive advice in reinforcement learning scenarios allows for speeding up the learning process for autonomous agents. Current interactive reinforcement learning research has been limited to real-time interactions that offer relevant user advice to the current state only. Moreover, the information provided by each interaction is not retained and is instead discarded by the agent after a single use. In this paper, we present a method for retaining and reusing provided knowledge, allowing trainers to give general advice relevant to more than just the current state. Results obtained show that the use of broad-persistent advice substantially improves the performance of the agent while reducing the number of interactions required from the trainer.
I. INTRODUCTION
Reinforcement learning (RL) is a method used for robot control in which an optimal policy is learned through trial-and-error interaction with the environment [1]. Previous research shows great potential for using RL in robotic scenarios [2], [3]. In particular, deep RL (DRL) has achieved promising results in manipulation and grasping skills [4], [5], as well as in legged locomotion [6]. However, an open issue for both RL and DRL algorithms is the excessive time and resources the agent requires to achieve acceptable outcomes [7], [8]. The larger and more complex the state space, the greater the computational cost of finding the optimal policy.
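To make the trial-and-error loop described above concrete, the following minimal tabular Q-learning sketch illustrates how an agent incrementally estimates action values; the environment interface, action set, and hyperparameters are illustrative assumptions rather than details taken from the works cited above.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Minimal tabular Q-learning loop (illustrative sketch only).

    `env` is assumed to expose reset() -> state, step(action) -> (state, reward, done),
    and a discrete `actions` list; this interface is an assumption for the sketch.
    """
    Q = defaultdict(float)  # Q[(state, action)] -> value estimate

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration: the trial-and-error part of RL
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # temporal-difference update toward the bootstrapped target
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```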
In this regard, interactive RL (IntRL) allows for speeding up the learning process by including a trainer to guide or evaluate a learning agent's behavior [9], [10]. The assistance provided by the trainer reinforces the behavior the agent is learning and shapes the exploration policy, resulting in a reduced search space [11]. Figure 1 depicts the IntRL approach. Current IntRL techniques discard the advice sourced from the human shortly after it has been used [12], [13], increasing the dependency on the advisor to repeatedly provide the same advice to maximize the agent's use of it.
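A common way to incorporate such interactive advice is to override the agent's exploratory choice whenever the trainer has advised the current state, discarding the advice after a single use; the sketch below illustrates this pattern. The function and variable names are hypothetical and not taken from a specific IntRL implementation.

```python
import random

def select_action(Q, state, actions, advice, epsilon=0.1):
    """Pick an action, preferring one-shot trainer advice when available.

    `advice` is assumed to map a state to a recommended action; the entry is
    removed after use, reflecting the single-use behaviour of current IntRL
    methods described in the text.
    """
    if state in advice:
        return advice.pop(state)          # use the advice once, then discard it
    if random.random() < epsilon:
        return random.choice(actions)     # unguided exploration
    return max(actions, key=lambda a: Q[(state, a)])  # greedy w.r.t. learned values
```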
Moreover, current IntRL approaches allow trainers to
evaluate or recommend actions based only on the current
state of the environment [14], [15]. This constraint restricts
the trainer to providing advice relevant to the current state
1Francisco Cruz is with the School of Computer Science and Engineering, University of New South Wales, Sydney, Australia. f.cruz@unsw.edu.au
2Adam Bignold and Peter Vamplew are with the School of Engineering, IT and Physical Sciences, Federation University, Ballarat, Australia. {a.bignold, p.vamplew}@federation.edu.au
3Hung Son Nguyen and Richard Dazeley are with the School of Information Technology, Deakin University, Geelong, Australia. {hsngu, richard.dazeley}@deakin.edu.au
[Fig. 1 diagram: the RL agent selects action a_t and the environment returns state s_t+1 and reward r_t+1; a user provides advice λ_t, which is retained as persistent advice λ.]
Fig. 1: Interactive reinforcement learning framework. In traditional RL an agent performs an action and observes a new state and reward. In the figure, the environment is represented by the simulated self-driving car scenario and the RL agent may control the direction and speed of the car. IntRL adds advice from a user acting as an external expert in certain situations. Our proposal includes the use of broad-persistent advice in order to minimize the interaction with the trainer.
and no other, even when such advice may be applicable to multiple states [16]. Restricting the time and utility of advice in this way negatively affects the interactive approach, both by creating an increasing demand on the user's time and by withholding potentially useful information from the agent [17].
This work presents a broad-persistent advising (BPA) approach for IntRL that provides the agent with a method for retaining and reusing previous advice from a trainer. The approach includes two components: generalization and persistence. Agents using the BPA approach exhibit better results than their counterparts without it, with a substantially reduced interaction count.
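As a rough illustration of these two components, the sketch below keys stored advice by a generalized version of the state (here a simple feature binning, which is an assumption made for illustration; the actual generalization model may differ) and keeps it in a persistent store so that one piece of advice can be reused across many similar states.

```python
class BroadPersistentAdvice:
    """Sketch of an advice store with generalization and persistence.

    Generalization is approximated here by discretizing each state feature
    into coarse bins; the original method may use a different state model
    (this binning is an assumption made for illustration).
    """

    def __init__(self, bin_size=0.5):
        self.bin_size = bin_size
        self.store = {}  # generalized state -> advised action

    def generalise(self, state):
        # Map a continuous state vector to a coarse, hashable key so that
        # similar states share the same advice entry.
        return tuple(round(x / self.bin_size) for x in state)

    def give(self, state, action):
        # Persist the trainer's advice for the whole generalized region.
        self.store[self.generalise(state)] = action

    def retrieve(self, state):
        # Reuse retained advice whenever a similar state is visited again.
        return self.store.get(self.generalise(state))
```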
II. BROAD-PERSISTENT ADVICE
Recent studies [18], [19] suggest persistent agents that record each interaction and the circumstances surrounding particular states, so that the advised actions can be taken again when the same conditions are met in the future. As a consequence, the recommendations from the advisor are used more effectively and the agent's performance improves. Furthermore, as there is no need to provide advice for each repeated state, less interaction with the advisor is required.
However, as inaccurate advice is also possible, a mechanism for discarding or ignoring advice after a certain amount of time is needed. Probabilistic policy reuse (PPR) is a strategy for improving RL agents that use advice [20]. Where various exploration policies are available, PPR uses a probabilistic bias to decide which one to choose, with the intention of balancing between random exploration, the use of the past (advised) policy, and the exploitation of the policy currently being learned.
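A minimal sketch of a PPR-style selection step is shown below: with probability psi the agent reuses retained advice for the current state, otherwise it falls back to its own policy, and decaying psi over episodes gradually phases out potentially inaccurate advice. The decay schedule and function signatures are illustrative assumptions, not the paper's exact formulation.

```python
import random

def ppr_select(state, advised_action, own_policy, psi):
    """One step of probabilistic policy reuse (illustrative sketch).

    `advised_action(state)` returns a retained piece of advice or None, and
    `own_policy(state)` is the agent's usual (e.g. epsilon-greedy) choice.
    With probability `psi` the agent reuses available advice; decaying `psi`
    across episodes lets stale or inaccurate advice be ignored over time.
    """
    advice = advised_action(state)
    if advice is not None and random.random() < psi:
        return advice
    return own_policy(state)

# A simple decay schedule (an illustrative assumption):
# psi_t = 0.9 * (0.99 ** episode)
```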