Observed Adversaries in Deep Reinforcement Learning
Eugene Lim and Harold Soh
National University of Singapore
13 Computing Drive
Singapore 117417
{elimwj,hsoh}@comp.nus.edu.sg

Presented at the AI-HRI Symposium at the AAAI Fall Symposium Series (FSS) 2022.
Abstract

In this work, we point out the problem of observed adversaries for deep policies. Specifically, recent work has shown that deep reinforcement learning is susceptible to adversarial attacks where an observed adversary acts under environmental constraints to invoke natural but adversarial observations. This setting is particularly relevant for HRI since HRI-related robots are expected to perform their tasks around and with other agents. We demonstrate that this effect persists even with low-dimensional observations. We further show that these adversarial attacks transfer across victims, which potentially allows malicious attackers to train an adversary without access to the target victim.
1 Introduction

Recent years have seen a significant gain in robot capabilities, driven in part by progress in artificial intelligence and machine learning. In particular, deep learning has emerged as a dominant methodology for crafting data-driven components in robot systems (Punjani and Abbeel 2015; Levine et al. 2016). However, the robustness of such methods has recently come under scrutiny. Specifically, concerns have been raised about the susceptibility of deep methods to adversarial attacks (Szegedy et al. 2013). For example, recent work has shown that small optimized pixel perturbations can drastically change the predictions of computer vision models (Szegedy et al. 2013; Goodfellow, Shlens, and Szegedy 2014).
In this work, we focus on deep reinforcement learning (DRL) (Mnih et al. 2015; Schulman et al. 2015, 2017), which has been used to obtain policies for various robot tasks, including those involving human-robot interaction (HRI) (Modares et al. 2016; Khamassi et al. 2018; Xie and Park 2021). Early work (Huang et al. 2017) showed that adversarially modified inputs (similar to those used against computer vision models) can be detrimental to agent behavior. Recently, Gleave et al. (2020) demonstrated that artificial agents are vulnerable under a more realistic threat model: natural observations that occur as a result of an adversary's behavior under environmental constraints. These observed adversaries are not able to arbitrarily modify the victim's inputs, yet are able to significantly affect the victim's behavior.
Here, we build upon Gleave et al. (2020) and show that observed adversary attacks are potentially even more insidious. While it is natural to suspect that this vulnerability mainly stems from the faulty perception of high-dimensional observations, our experiments show that deep policies remain susceptible in low-dimensional settings where the environmental state is fully observed. In other words, deep policies are not robust to observed adversaries even in arguably simple settings. We further show that an observed adversary can successfully attack previously unseen victims, which has broader downstream implications.
In the following, we first detail experiments designed to investigate observed adversary attacks. We focus on Proximal Policy Optimization (PPO) (Schulman et al. 2017), a popular model-free RL method that has been widely used, including for HRI (Xie and Park 2021). We then present our results on the severity and transferability of the attacks. Finally, we discuss the implications of our findings for HRI and the future work needed to address the robustness of deep RL and advance the development of trustworthy robots.
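As a point of reference, the snippet below sketches one common way such a PPO policy might be obtained. It is illustrative only, assuming the Stable-Baselines3 implementation of PPO and a placeholder Gymnasium task rather than the environments and hyperparameters used in our experiments.

```python
# Illustrative sketch only: obtaining a PPO "victim" policy with
# Stable-Baselines3 on a placeholder task (not the environments used
# in this paper's experiments).
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")              # placeholder environment
victim = PPO("MlpPolicy", env, verbose=0)  # standard PPO with an MLP policy
victim.learn(total_timesteps=100_000)      # train the victim policy
victim.save("victim_ppo")                  # saved policy can later be attacked
```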
2 Background & Related Work

There is a rich literature on adversarial attacks on machine learning algorithms. Famous examples include adversarial attacks on deep computer vision models (Szegedy et al. 2013; Goodfellow, Shlens, and Szegedy 2014). Typically, these attacks involve solving an optimization problem to find the smallest perturbation of image pixels that is required to raise the classification loss. White-box attacks such as the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) approximate the solution to this optimization problem. This approach has been extended to black-box settings mainly by exploiting the transferability of adversarial examples (Papernot et al. 2016a,b).
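As a minimal sketch of this class of attacks (illustrative only, not code from any of the cited works), the following FGSM-style step perturbs each pixel by a single signed-gradient step of size epsilon so as to increase the classification loss; PGD iterates such steps while projecting back into the allowed perturbation set. The classifier `model`, the labels, and the budget `epsilon` are assumed placeholders.

```python
# Illustrative FGSM sketch in PyTorch; `model` is assumed to be any
# differentiable image classifier and `epsilon` a placeholder budget.
import torch
import torch.nn.functional as F


def fgsm_attack(model, image, label, epsilon=0.03):
    """Return an adversarial image within an L-infinity ball of radius epsilon."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)   # classification loss to increase
    loss.backward()
    # Step in the direction that increases the loss: one epsilon-sized signed
    # step per pixel (the "fast" single-step approximation; PGD iterates this).
    adv_image = image + epsilon * image.grad.sign()
    return adv_image.clamp(0.0, 1.0).detach()     # keep pixels in valid range
```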
Recently, Huang et al. (2017) showed that gradient-based attacks are also effective in RL settings. However, these attacks assume a powerful adversary who is able to directly modify the victim/robot's observations. Gleave et al. (2020) worked under a more realistic setting where the adversaries are simply agents acting in a multi-agent environment alongside the victims. In their work, they train an adversary to attack a fixed victim policy through the actions it takes in the shared environment.
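To make this threat model concrete, the sketch below shows one way a frozen victim can be folded into a single-agent view of a two-agent environment so that a standard RL algorithm can train the adversary. It is an illustration under assumed interfaces: the two-agent environment, its observation/action spaces, the reward dictionary, and `victim_policy` are hypothetical placeholders, not the setup of Gleave et al. (2020) or of our experiments.

```python
# Illustrative sketch only: a single-agent "adversary view" of a hypothetical
# two-agent environment, with the victim policy held fixed. The two-agent
# environment interface (paired observations, reward dict) is assumed.
import gymnasium as gym


class AdversaryView(gym.Env):
    """Exposes only the adversary's side of a two-agent environment."""

    def __init__(self, two_agent_env, victim_policy):
        self.env = two_agent_env                  # hypothetical two-agent env
        self.victim_policy = victim_policy        # frozen, pre-trained victim
        self.observation_space = two_agent_env.adversary_observation_space
        self.action_space = two_agent_env.adversary_action_space

    def reset(self, seed=None, options=None):
        obs_victim, obs_adversary = self.env.reset(seed=seed)
        self._obs_victim = obs_victim
        return obs_adversary, {}

    def step(self, adversary_action):
        # The victim acts with its fixed policy; only the adversary learns.
        victim_action = self.victim_policy(self._obs_victim)
        (obs_victim, obs_adversary), rewards, terminated, info = self.env.step(
            victim_action, adversary_action
        )
        self._obs_victim = obs_victim
        # Reward the adversary for the victim's failure (zero-sum assumption).
        return obs_adversary, -rewards["victim"], terminated, False, info
```

With such a wrapper, the adversary can be trained by an off-the-shelf single-agent algorithm (e.g., PPO) while the victim's parameters never change, matching the fixed-victim assumption above.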