
Observed Adversaries in Deep Reinforcement Learning
Eugene Lim and Harold Soh
National University of Singapore
13 Computing Drive
Singapore 117417
{elimwj,hsoh}@comp.nus.edu.sg
Presented at the AI-HRI Symposium at AAAI Fall Symposium Series (FSS) 2022
Abstract
In this work, we point out the problem of observed adversaries for deep policies. Specifically, recent work has shown that deep reinforcement learning is susceptible to adversarial attacks in which an observed adversary acts under environmental constraints to induce natural but adversarial observations. This setting is particularly relevant for human-robot interaction (HRI), since robots are expected to perform their tasks around and with other agents. We demonstrate that this effect persists even with low-dimensional observations. We further show that these adversarial attacks transfer across victims, which potentially allows malicious attackers to train an adversary without access to the target victim.
1 Introduction
Recent years have seen a significant gain in robot capabilities, driven in part by progress in artificial intelligence and machine learning. In particular, deep learning has emerged as a dominant methodology for crafting data-driven components in robot systems (Punjani and Abbeel 2015; Levine et al. 2016). However, the robustness of such methods has recently come under scrutiny. Specifically, concerns have been raised about the susceptibility of deep methods to adversarial attacks (Szegedy et al. 2013). For example, recent work has shown that small optimized pixel perturbations can drastically change the predictions of computer vision models (Szegedy et al. 2013; Goodfellow, Shlens, and Szegedy 2014).
In this work, we focus on deep reinforcement learning (DRL) (Mnih et al. 2015; Schulman et al. 2015, 2017), which has been used to obtain policies for various robot tasks, including those involving human-robot interaction (HRI) (Modares et al. 2016; Khamassi et al. 2018; Xie and Park 2021). Early work (Huang et al. 2017) showed that adversarially modified inputs (similar to those used against computer vision models) can be detrimental to agent behavior. Recently, Gleave et al. (2020) demonstrated that artificial agents are vulnerable under a more realistic threat model: natural observations that occur as a result of an adversary's behavior under environmental constraints. These observed adversaries are not able to arbitrarily modify the
victim's inputs, yet are able to significantly affect the victim's behavior.
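To make this threat model concrete, the sketch below illustrates how an observed adversary interacts with a frozen victim: both are ordinary agents acting in a shared environment, and the adversary can influence the victim only through its own, environmentally constrained actions. This is our own illustration with a hypothetical two-agent environment interface, not code from any particular implementation.

def rollout(env, victim_policy, adversary_policy):
    # One episode in a shared two-agent environment (interface names are hypothetical).
    # The adversary never edits the victim's observations directly; it only acts.
    obs_victim, obs_adversary = env.reset()
    victim_return, done = 0.0, False
    while not done:
        a_victim = victim_policy(obs_victim)            # frozen victim acts as trained
        a_adversary = adversary_policy(obs_adversary)   # adversary obeys the same env constraints
        (obs_victim, obs_adversary), r_victim, done = env.step(a_victim, a_adversary)
        victim_return += r_victim
    # An attacker would optimize adversary_policy to drive victim_return down.
    return victim_return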
Here, we build upon Gleave et al. (2020) and show that observed adversary attacks are potentially even more insidious. While it is natural to suspect that this vulnerability stems mainly from the faulty perception of high-dimensional observations, our experiments show that deep policies remain susceptible in low-dimensional settings where the environmental state is fully observed. In other words, deep policies are not robust to observed adversaries even in arguably simple settings. We further show that an observed adversary can successfully attack previously unseen victims, which has broader downstream implications.
In the following, we first detail experiments designed to investigate observed adversary attacks. We focus on Proximal Policy Optimization (PPO) (Schulman et al. 2017), a popular model-free RL method that has been widely used, including for HRI (Xie and Park 2021). We then present our results on the severity and transferability of these attacks. Finally, we discuss the implications of our findings for HRI and the future work needed to address the robustness of deep RL and advance the development of trustworthy robots.
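For reference, the snippet below sketches how such a PPO victim policy might be trained on a low-dimensional, fully observed task. This is a minimal sketch assuming Stable-Baselines3 and a standard Gymnasium environment; the paper does not prescribe this implementation, environment, or these hyperparameters.

import gymnasium as gym
from stable_baselines3 import PPO

# Illustrative only: any task with a compact, fully observed state vector.
env = gym.make("CartPole-v1")

victim = PPO(
    policy="MlpPolicy",    # small MLP over the low-dimensional state
    env=env,
    n_steps=2048,
    batch_size=64,
    learning_rate=3e-4,
    verbose=0,
)
victim.learn(total_timesteps=200_000)
victim.save("victim_ppo")  # frozen victim that an adversary could later be trained against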
2 Background & Related Work
There is a rich literature on adversarial attacks against machine learning algorithms. Well-known examples include attacks on deep computer vision models (Szegedy et al. 2013; Goodfellow, Shlens, and Szegedy 2014). Typically, these attacks solve an optimization problem to find the smallest perturbation of the image pixels required to raise the classification loss. White-box attacks such as the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) approximate the solution to this optimization problem. This approach has been extended to black-box settings, mainly by exploiting the transferability of adversarial examples (Papernot et al. 2016a,b).
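As a concrete illustration, FGSM perturbs an input x to x + epsilon * sign(grad_x L(f(x), y)), a single gradient step that increases the loss. The following minimal PyTorch sketch of this step is our own illustration, not code from the paper:

import torch

def fgsm_attack(model, loss_fn, x, y, epsilon):
    # Single-step FGSM: move each pixel by +/- epsilon in the direction
    # that increases the classification loss.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in a valid image range

PGD can be viewed as repeating this step several times, projecting back onto the epsilon-ball around the original input after each step.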
Recently, Huang et al. (2017) showed that gradient-based attacks are also effective in RL settings. However, these attacks assume a powerful adversary who is able to directly modify the victim/robot's observations. Gleave et al. (2020) worked under a more realistic setting where the adversaries are just agents acting in a multi-agent environment alongside the victims. In their work, they train an adversary to