Real-Time Reinforcement Learning for Vision-Based Robotics
Utilizing Local and Remote Computers
Yan Wang∗†, Gautham Vasan∗†, A. Rupam Mahmood†
Abstract: Real-time learning is crucial for robotic agents adapting to ever-changing, non-stationary environments. A common setup for a robotic agent is to have two different computers simultaneously: a resource-limited local computer tethered to the robot and a powerful remote computer connected wirelessly. Given such a setup, it is unclear to what extent the performance of a learning system can be affected by resource limitations and how to efficiently use the wirelessly connected powerful computer to compensate for any performance loss. In this paper, we implement a real-time learning system called the Remote-Local Distributed (ReLoD) system to distribute computations of two deep reinforcement learning (RL) algorithms, Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), between a local and a remote computer. The performance of the system is evaluated on two vision-based control tasks developed using a robotic arm and a mobile robot. Our results show that SAC's performance degrades heavily on a resource-limited local computer. Strikingly, when all computations of the learning system are deployed on a remote workstation, SAC fails to compensate for the performance loss, indicating that, without careful consideration, using a powerful remote computer may not result in a performance improvement. However, a carefully chosen distribution of the computations of SAC consistently and substantially improves its performance on both tasks. On the other hand, the performance of PPO remains largely unaffected by the distribution of computations. In addition, when all computations happen solely on a powerful tethered computer, the performance of our system remains on par with that of an existing system well-tuned for a single machine. ReLoD is the only publicly available system for real-time RL that applies to multiple robots for vision-based tasks. The source code can be found at https://github.com/rlai-lab/relod

∗Equal contribution.
†Department of Computing Science, University of Alberta, Edmonton, AB, Canada, T6G 2E8
CIFAR AI Chair, Alberta Machine Intelligence Institute (Amii)
Email: {yan28, vasan, armahmood}@ualberta.ca
Video: https://youtu.be/7iZKryi1xSY
I. INTRODUCTION
Building robotic agents capable of adapting to their environments based on environmental interactions is one of the long-standing goals of embodied artificial intelligence. Such a capability entails learning on the fly as the agent interacts with the physical world, also known as real-time learning. When learning in real time, the real world does not pause while the agent computes actions or makes learning updates (Mahmood et al. 2018a, Ramstedt & Pal 2019). Moreover, the agent obtains sensorimotor information from various onboard devices and executes action commands at a specific frequency. Given these constraints, a real-time learning agent must compute an action within a chosen action-cycle time and perform learning updates without disrupting the periodic execution of actions (Yuan & Mahmood 2022).

Fig. 1: Our proposed system ReLoD distributes computations of a learning system between a local and a remote computer. [Diagram: environment processes on the local machine (an environment interface process plus robot and camera communicator processes with sensor and actuator threads) exchange image, proprioception, and actuation arrays through shared memory with agent processes, which may run locally or remotely and handle gradient updates, the replay buffer, etc.]
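To make this timing constraint concrete, the sketch below shows one way such a loop could be structured, with learning updates running in a background thread so that an action is still issued every cycle. It is illustrative only: the `agent`, `robot`, and `replay_buffer` objects and the 40 ms cycle time are assumptions for the sketch, not part of ReLoD.

```python
import time
import threading

ACTION_CYCLE = 0.04  # hypothetical 40 ms action-cycle time (25 Hz)

def learner_loop(agent, replay_buffer, stop):
    # Learning updates run in a background thread so that they
    # never block the periodic execution of actions.
    while not stop.is_set():
        agent.update(replay_buffer.sample())

def actor_loop(agent, robot, replay_buffer, stop):
    obs = robot.reset()
    while not stop.is_set():
        start = time.time()
        action = agent.act(obs)          # must finish within the cycle
        robot.apply(action)              # the world does not pause
        next_obs, reward = robot.observe()
        replay_buffer.add(obs, action, reward, next_obs)
        obs = next_obs
        # Sleep away whatever remains of the action-cycle time.
        time.sleep(max(0.0, ACTION_CYCLE - (time.time() - start)))

# Wiring (hypothetical objects): acting and learning run concurrently.
# stop = threading.Event()
# threading.Thread(target=learner_loop, args=(agent, buffer, stop)).start()
# actor_loop(agent, robot, buffer, stop)
```

If a learning update stalls, only the learner thread falls behind; the actor keeps issuing actions on schedule, which is the essence of the real-time requirement described above.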
Reinforcement learning (RL) is a natural way of formulating real-time control learning tasks. Although many deep RL methods have been developed to solve complex motor control problems (Schulman et al. 2017, Abdolmaleki et al. 2018, Haarnoja et al. 2018), they do not easily extend to the real-time learning setting that operates under time and resource constraints, for example, on quadrotors and mobile robot bases. While approaches including learning from demonstrations (Gupta et al. 2016, Vasan & Pilarski 2017), sim-to-real transfer (Peng et al. 2018, Bousmalis et al. 2018), and offline RL (Levine et al. 2020) have been used to develop pre-trained agents, there has been relatively little interest in studying real-time learning in the real world.
State-of-the-art RL algorithms such as SAC are computationally intensive, and hence, for real-time robotic control, they go together with a computationally powerful computer tethered to the robot (Yuan & Mahmood 2022). On the other hand, a robotic agent deployed in the real world typically uses a tethered resource-limited computer and a wirelessly connected workstation (Haarnoja et al. 2019, Bloesch et al. 2021). In this paper, we use local to refer to the computer tethered to the robot and remote to refer to the wirelessly connected computer. Computations of a learning system using these two computers can be distributed in different ways (e.g., see Fig. 1). However, it is unclear how much
the performance of a learning system is impacted by the wireless connection or by resource limitations. Moreover, prior works neither systematically study distributions of computations between local and remote computers nor suggest how to achieve an effective distribution.
In this paper, we develop two vision-based tasks using a robotic arm and a mobile robot, and we propose a real-time RL system called the Remote-Local Distributed (ReLoD) system. Similarly to Yuan and Mahmood's (2022) work, ReLoD parallelizes computations of RL algorithms to maintain small action-cycle times and reduce the computational overhead of real-time learning. But unlike the prior work, it is designed to utilize both a local and a remote computer. ReLoD supports three modes of distribution: Remote-Only, which allocates all computations to the remote computer; Local-Only, which allocates all computations to the local computer; and Remote-Local, which carefully distributes the computations between the two computers in a specific way.
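As an illustration of the idea behind the Remote-Local mode, the following minimal sketch keeps only cheap policy inference on the local computer, streams transitions to a remote learner over a connection, and periodically pulls back updated weights. The `agent` and `robot` interfaces and the address are hypothetical; the actual distribution used by ReLoD is implemented in the linked repository.

```python
from multiprocessing.connection import Client

def local_worker(agent, robot, remote_address=('192.168.0.2', 6000)):
    # Local side of a Remote-Local style split: inference stays on the
    # resource-limited computer; gradient updates and the replay buffer
    # live on the remote workstation at the other end of the connection.
    conn = Client(remote_address)            # wireless link to the workstation
    obs = robot.reset()
    while True:
        action = agent.act(obs)              # cheap forward pass, done locally
        next_obs, reward, done = robot.step(action)
        conn.send((obs, action, reward, next_obs, done))  # stream transition
        if conn.poll():                      # fresh weights from the learner?
            agent.load_weights(conn.recv())  # sync without blocking the loop
        obs = robot.reset() if done else next_obs
```

The design point is that only small messages (transitions and weights) cross the wireless link, while the heavy optimization stays on the remote machine.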
Our results show that the performance of SAC on a tethered resource-limited computer drops substantially compared to its performance on a tethered powerful workstation. Surprisingly, when all computations of SAC are deployed on a wirelessly connected powerful workstation, the performance does not improve notably, which contradicts our intuition since this mode fully utilizes the workstation. On the other hand, SAC's Remote-Local mode consistently improves its performance by a large margin on both tasks, which indicates that a careful distribution of computations is essential to utilize a powerful remote workstation. However, the Remote-Local mode only benefits computationally expensive and sample-efficient methods like SAC, since the relatively simpler learning algorithm PPO learns similar policies in all three modes. We also notice that the highest average return attained by PPO is about one-third of the highest average return attained by SAC, which indicates that SAC is more effective in complex robotic control tasks.
Our system in the Local-Only mode can achieve a performance that is on par with a system well-tuned for a single computer (Yuan & Mahmood 2022), though the latter overall learns slightly faster. This property makes our system suitable for conventional RL studies as well.
II. RELATED WORK
A system comparable to ours is SenseAct, which provides a computational framework for robotic learning experiments to be reproducible in different locations and under diverse conditions (Mahmood et al. 2018b). Although SenseAct enables the systematic design of robotic tasks for RL, it does not address how to distribute computations of a real-time learning agent between two computers, and the original work does not contain vision-based tasks. We use the guiding principles of SenseAct to design the vision-based tasks and systematically study the effectiveness of different distributions of computations of a learning agent.
Krishnan et al. (2019) introduced an open-source simulator and a Gym environment for quadrotors. Since these aerial robots must accomplish their tasks with limited onboard energy and computation, running existing computationally intensive RL methods on the onboard hardware is prohibitive, so they carefully designed policies that fit the power and computational resources available onboard. However, they focused on sim-to-real techniques for learning, making their approach unsuited for real-time learning.
Nair et al. (2015) proposed a distributed learning architecture called the GORILA framework, which mainly focuses on using multiple actors and learners to collect data in parallel and accelerate training in simulation using clusters of CPUs and GPUs. GORILA is conceptually akin to the DistBelief (Dean et al. 2012) architecture. In contrast to the GORILA framework, our system focuses primarily on how best to distribute the computations of a learning system between a resource-limited local computer and a powerful remote computer to enable effective real-time learning. In addition, the GORILA framework is customized to Deep Q-Networks (DQN), while our system supports two different policy gradient algorithms using a common agent interface.
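The common agent interface is not spelled out in this section; a hypothetical version of such an interface might look like the sketch below, where SAC and PPO each supply their own implementation of the same small set of methods so that the rest of the system is algorithm-agnostic. The method names here are assumptions, not ReLoD's actual API.

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    """Hypothetical common interface shared by SAC and PPO agents."""

    @abstractmethod
    def act(self, observation):
        """Return an action for the current observation."""

    @abstractmethod
    def receive(self, transition):
        """Store a transition (replay buffer for SAC, rollout for PPO)."""

    @abstractmethod
    def update(self):
        """Perform one learning update if enough data is available."""
```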
Lambert et al. (2019) used a model-based reinforcement learning approach for high-frequency control of a small quadcopter. Their proposed system is similar to our Remote-Only mode. A recent paper by Smith et al. (2022) demonstrated real-time learning of a walking gait from scratch on a Unitree A1 quadrupedal robot on various terrains. Their real-time synchronous training of SAC on a laptop is similar to our Local-Only mode. The effectiveness of both of these approaches on vision-based tasks is untested.
Bloesch et al. (2021) used a distributed version of Maximum a Posteriori Policy Optimization (MPO) (Abdolmaleki et al. 2018) to learn a vision-based control policy that can walk with Robotis OP3 bipedal robots. The robot's onboard computer samples actions and periodically synchronizes the policy's neural network weights with a remote learning process at the start of each episode. Haarnoja et al. (2019) also proposed a similar asynchronous learning system tailored to learning a stable gait using SAC and the Minitaur robot (Kenneally et al. 2016). These tasks do not use images. Although their proposed systems are similar to our Remote-Local mode, these two papers aim at solving tasks rather than systematically comparing different distributions of computations of a learning agent between a resource-limited computer and a powerful computer. In addition, their systems are tailored to specific tasks and algorithms and are not publicly available, whereas our system is open-source, task-agnostic, and compatible with multiple algorithms.
III. BACKGROUND
Reinforcement learning is a setting in which an agent learns to control through trial-and-error interactions with its environment. The agent-environment interaction is modeled with a Markov Decision Process (MDP), where an agent interacts with its environment at discrete timesteps. At the current timestep $t$, the agent is in state $S_t \in \mathcal{S}$, where it takes an action $A_t \in \mathcal{A}$ using a probability distribution $\pi(\cdot \mid S_t)$, known as the policy.
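For reference, the agent's goal in this setup is typically to maximize the expected discounted return; the following is the textbook formulation, not notation specific to this paper:

```latex
% Discounted return and objective for the MDP above: at timestep t the
% agent samples A_t ~ \pi(. | S_t), receives reward R_{t+1}, and
% transitions to state S_{t+1}.
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},
\qquad
J(\pi) = \mathbb{E}_{\pi}\bigl[ G_0 \bigr],
\qquad \gamma \in [0, 1).
```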