Extended abstract accepted at the 2nd RL-CONFORM Workshop at IEEE/RSJ IROS’22 Conference, Kyoto, Japan, 2022.
Broad-persistent Advice for Interactive Reinforcement Learning Scenarios
Francisco Cruz1, Adam Bignold2, Hung Son Nguyen3, Richard Dazeley3, and Peter Vamplew2
Abstract— The use of interactive advice in reinforcement learning scenarios allows for speeding up the learning process for autonomous agents. Current interactive reinforcement learning research has been limited to real-time interactions that offer relevant user advice to the current state only. Moreover, the information provided by each interaction is not retained and is instead discarded by the agent after a single use. In this paper, we present a method for retaining and reusing provided knowledge, allowing trainers to give general advice relevant to more than just the current state. Results obtained show that the use of broad-persistent advice substantially improves the performance of the agent while reducing the number of interactions required from the trainer.
I. INTRODUCTION
Reinforcement learning (RL) is a method used for robot control in which an optimal policy is learned through trial-and-error interaction with the environment [1]. Previous research shows great potential for using RL in robotic scenarios [2], [3]. In particular, deep RL (DRL) has achieved promising results in manipulation and grasping skills [4], [5], as well as in legged locomotion [6]. However, an open issue for both RL and DRL algorithms is the excessive time and resources the agent requires to achieve acceptable outcomes [7], [8]. The larger and more complex the state space, the greater the computational cost of finding the optimal policy.
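To make the trial-and-error loop described above concrete, the following minimal tabular Q-learning sketch illustrates how an agent incrementally estimates action values; the environment interface, action set, and hyperparameters are illustrative assumptions rather than details taken from the works cited above.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Minimal tabular Q-learning loop (illustrative sketch only).

    `env` is assumed to expose reset() -> state, step(action) -> (state, reward, done),
    and a discrete `actions` list; this interface is an assumption for the sketch.
    """
    Q = defaultdict(float)  # Q[(state, action)] -> value estimate

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration: the trial-and-error part of RL
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # temporal-difference update toward the bootstrapped target
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```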
In this regard, interactive RL (IntRL) allows for speeding up the learning process by including a trainer to guide or evaluate a learning agent's behavior [9], [10]. The assistance provided by the trainer reinforces the behavior the agent is learning and shapes the exploration policy, resulting in a reduced search space [11]. Figure 1 depicts the IntRL approach. Current IntRL techniques discard the advice sourced from the human shortly after it has been used [12], [13], increasing the dependency on the advisor to repeatedly provide the same advice to maximize the agent's use of it.
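A common way to incorporate such interactive advice is to override the agent's exploratory choice whenever the trainer has advised the current state, discarding the advice after a single use; the sketch below illustrates this pattern. The function and variable names are hypothetical and not taken from a specific IntRL implementation.

```python
import random

def select_action(Q, state, actions, advice, epsilon=0.1):
    """Pick an action, preferring one-shot trainer advice when available.

    `advice` is assumed to map a state to a recommended action; the entry is
    removed after use, reflecting the single-use behaviour of current IntRL
    methods described in the text.
    """
    if state in advice:
        return advice.pop(state)          # use the advice once, then discard it
    if random.random() < epsilon:
        return random.choice(actions)     # unguided exploration
    return max(actions, key=lambda a: Q[(state, a)])  # greedy w.r.t. learned values
```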
Moreover, current IntRL approaches allow trainers to
evaluate or recommend actions based only on the current
state of the environment [14], [15]. This constraint restricts
the trainer to providing advice relevant to the current state
1Francisco Cruz is with the School of Computer Science and Engineering, University of New South Wales, Sydney, Australia. f.cruz@unsw.edu.au
2Adam Bignold and Peter Vamplew are with the School of Engineering, IT and Physical Sciences, Federation University, Ballarat, Australia. {a.bignold, p.vamplew}@federation.edu.au
3Hung Son Nguyen and Richard Dazeley are with the School of Information Technology, Deakin University, Geelong, Australia. {hsngu, richard.dazeley}@deakin.edu.au
[Fig. 1 diagram: the RL agent selects action a_t and the environment returns state s_t+1 and reward r_t+1; a user provides advice λ_t, which is retained as persistent advice λ.]
Fig. 1: Interactive reinforcement learning framework. In traditional RL an agent performs an action and observes a new state and reward. In the figure, the environment is represented by the simulated self-driving car scenario and the RL agent may control the direction and speed of the car. IntRL adds advice from a user acting as an external expert in certain situations. Our proposal includes the use of broad-persistent advice in order to minimize the interaction with the trainer.
and no other, even when such advice may be applicable to multiple states [16]. Restricting the time and utility of advice in this way negatively affects the interactive approach, both by creating an increasing demand on the user's time and by withholding potentially useful information from the agent [17].
This work presents a broad-persistent advising (BPA) approach for IntRL that provides the agent with a method for retaining and reusing previous advice from a trainer. The approach includes two components: generalization and persistence. Agents using the BPA approach exhibit better results than their counterparts without it, with a substantially reduced interaction count.
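As a rough illustration of these two components, the sketch below keys stored advice by a generalized version of the state (here a simple feature binning, which is an assumption made for illustration; the actual generalization model may differ) and keeps it in a persistent store so that one piece of advice can be reused across many similar states.

```python
class BroadPersistentAdvice:
    """Sketch of an advice store with generalization and persistence.

    Generalization is approximated here by discretizing each state feature
    into coarse bins; the original method may use a different state model
    (this binning is an assumption made for illustration).
    """

    def __init__(self, bin_size=0.5):
        self.bin_size = bin_size
        self.store = {}  # generalized state -> advised action

    def generalise(self, state):
        # Map a continuous state vector to a coarse, hashable key so that
        # similar states share the same advice entry.
        return tuple(round(x / self.bin_size) for x in state)

    def give(self, state, action):
        # Persist the trainer's advice for the whole generalized region.
        self.store[self.generalise(state)] = action

    def retrieve(self, state):
        # Reuse retained advice whenever a similar state is visited again.
        return self.store.get(self.generalise(state))
```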
II. BROAD-PERSISTENT ADVICE
Recent studies [18], [19] suggest persistent agents that record each interaction and the circumstances surrounding particular states, so that the advised actions can be taken again when the same conditions are met in the future. As a consequence, the recommendations from the advisor are used more effectively and the agent's performance improves. Furthermore, as there is no need to provide advice for each repeated state, less interaction with the advisor is required.
However, as inaccurate advice is also possible, a mechanism for discarding or ignoring advice after a certain amount of time is needed. Probabilistic policy reuse (PPR) is a strategy for improving RL agents that use advice [20]. Where various exploration policies are available, PPR uses a probabilistic bias to decide which one to choose, with the intention of balancing between random exploration, the use of the past (advised) policy, and the exploitation of the policy currently being learned.
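A minimal sketch of a PPR-style selection step is shown below: with probability psi the agent reuses retained advice for the current state, otherwise it falls back to its own policy, and decaying psi over episodes gradually phases out potentially inaccurate advice. The decay schedule and function signatures are illustrative assumptions, not the paper's exact formulation.

```python
import random

def ppr_select(state, advised_action, own_policy, psi):
    """One step of probabilistic policy reuse (illustrative sketch).

    `advised_action(state)` returns a retained piece of advice or None, and
    `own_policy(state)` is the agent's usual (e.g. epsilon-greedy) choice.
    With probability `psi` the agent reuses available advice; decaying `psi`
    across episodes lets stale or inaccurate advice be ignored over time.
    """
    advice = advised_action(state)
    if advice is not None and random.random() < psi:
        return advice
    return own_policy(state)

# A simple decay schedule (an illustrative assumption):
# psi_t = 0.9 * (0.99 ** episode)
```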