the performance of a learning system is affected by the wireless connection or by resource limitations. Moreover, prior works neither systematically study how the computations of a learning system can be distributed between local and remote computers nor suggest how to achieve an effective distribution.
In this paper, we develop two vision-based tasks using a
robotic arm and a mobile robot, and propose a real-time RL
system called the Remote-Local Distributed (ReLoD) system.
Similar to the work of Yuan and Mahmood (2022), ReLoD parallelizes the computations of RL algorithms to maintain small action-cycle times and reduce the computational overhead of real-time learning. Unlike the prior work, however, it is designed to utilize both a local and a remote computer. ReLoD supports three modes of distribution: Remote-Only, which allocates all computations to the remote computer; Local-Only, which allocates all computations to the local computer; and Remote-Local, which carefully distributes the computations between the two computers in a specific way.
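As a rough illustration, the three modes can be viewed as different placements of a learning system's major computations. The sketch below is a minimal, hypothetical example: the component names and the particular split shown for Remote-Local are assumptions made for exposition, not ReLoD's actual code or configuration.

```python
# Hypothetical sketch of the three distribution modes as component placements.
# Component names and the Remote-Local split are illustrative assumptions,
# not ReLoD's actual implementation.
from enum import Enum


class Mode(Enum):
    REMOTE_ONLY = "remote_only"    # all computations on the remote workstation
    LOCAL_ONLY = "local_only"      # all computations on the resource-limited local computer
    REMOTE_LOCAL = "remote_local"  # computations split between the two computers


def component_placement(mode: Mode) -> dict:
    """Map each major computation of the learning system to a machine."""
    if mode is Mode.REMOTE_ONLY:
        return {"image_encoding": "remote", "action_sampling": "remote",
                "replay_buffer": "remote", "gradient_updates": "remote"}
    if mode is Mode.LOCAL_ONLY:
        return {"image_encoding": "local", "action_sampling": "local",
                "replay_buffer": "local", "gradient_updates": "local"}
    # One plausible Remote-Local split: keep latency-critical inference on the
    # local computer and offload expensive learning updates to the workstation.
    return {"image_encoding": "local", "action_sampling": "local",
            "replay_buffer": "remote", "gradient_updates": "remote"}


if __name__ == "__main__":
    for mode in Mode:
        print(f"{mode.value}: {component_placement(mode)}")
```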
Our results show that the performance of SAC on a teth-
ered resource-limited computer drops substantially compared
to its performance on a tethered powerful workstation. Sur-
prisingly, when all computations of SAC are deployed on a
wirelessly connected powerful workstation, the performance
does not improve notably, which contradicts our intuition
since this mode fully utilizes the workstation. On the other
hand, SAC’s Remote-Local mode consistently improves its
performance by a large margin on both tasks, which indicates
that a careful distribution of computations is essential to
utilize a powerful remote workstation. However, the Remote-Local mode benefits only computationally expensive and sample-efficient methods like SAC, since the relatively simple learning algorithm PPO learns similar policies in all three modes. We also notice that the highest average return attained by PPO is about one-third of that attained by SAC, which indicates that SAC is more effective in complex robotic control tasks.
Our system in the Local-Only mode can achieve a perfor-
mance that is on par with a system well-tuned for a single
computer (Yuan & Mahmood 2022), though the latter learns slightly faster overall. This property makes our system suitable for conventional RL studies as well.
II. RELATED WORK
A system comparable to ours is SenseAct, which provides
a computational framework for robotic learning experiments
to be reproducible in different locations and under diverse
conditions (Mahmood et al. 2018b). Although SenseAct
enables the systematic design of robotic tasks for RL, it
does not address how to distribute computations of a real-
time learning agent between two computers, and the original
work does not contain vision-based tasks. We use the guiding principles of SenseAct to design our vision-based tasks and to systematically study the effectiveness of different distributions of a learning agent's computations.
Krishnan et al. (2019) introduced an open-source simulator and a Gym environment for quadrotors. Since these aerial robots must accomplish their tasks with limited onboard energy, and since running current computationally intensive RL methods on the onboard hardware is prohibitive, they carefully designed policies to fit the power and computational resources available onboard. However, they focused on sim-to-real techniques for learning, making their approach unsuited for real-time learning.
Nair et al. (2015) proposed a distributed learning architecture called the GORILA framework, which uses multiple actors and learners to collect data in parallel and accelerate training in simulation on clusters of CPUs and GPUs. GORILA is conceptually akin to the
DistBelief (Dean et al. 2012) architecture. In contrast to
the GORILA framework, our system focuses primarily on
how best to distribute the computations of a learning system
between a resource-limited local computer and a powerful
remote computer to enable effective real-time learning. In
addition, the GORILA framework is customized to Deep Q-
Networks (DQN), while our system supports two different
policy gradient algorithms using a common agent interface.
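As a hypothetical sketch of what such a common agent interface could look like, an abstract base class might specify the per-step and learning operations that each algorithm implements; all names below are assumptions rather than ReLoD's actual API.

```python
# Hypothetical sketch of a common agent interface; not ReLoD's actual API.
from abc import ABC, abstractmethod

import numpy as np


class Agent(ABC):
    """Minimal contract that every learning algorithm must satisfy."""

    @abstractmethod
    def compute_action(self, observation: np.ndarray) -> np.ndarray:
        """Return an action for the current observation (runs every step)."""

    @abstractmethod
    def observe(self, observation, action, reward, next_observation, done) -> None:
        """Record a transition for later learning updates."""

    @abstractmethod
    def update(self) -> None:
        """Perform one learning update; may run on a different computer."""


class RandomAgent(Agent):
    """Trivial implementation used only to show the interface."""

    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim

    def compute_action(self, observation: np.ndarray) -> np.ndarray:
        return np.random.uniform(-1.0, 1.0, self.action_dim)

    def observe(self, observation, action, reward, next_observation, done) -> None:
        pass  # a real agent would store the transition in a buffer

    def update(self) -> None:
        pass  # a real agent would run SAC or PPO updates here


if __name__ == "__main__":
    agent = RandomAgent(action_dim=2)
    obs = np.zeros(3)
    action = agent.compute_action(obs)
    agent.observe(obs, action, reward=0.0, next_observation=obs, done=False)
    agent.update()
    print(action)
```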
Lambert et al. (2019) used a model-based reinforcement
learning approach for high-frequency control of a small
quadcopter. Their proposed system is similar to our Remote-
Only mode. A recent paper by Smith et al. (2022) demonstrated real-time learning of a walking gait from scratch on a Unitree A1 quadrupedal robot on various terrains. Their real-time synchronous training of SAC on a laptop is similar to our Local-Only mode. The effectiveness of both of these approaches on vision-based tasks is untested.
Bloesch et al. (2021) used a distributed version of Maximum a Posteriori Policy Optimization (MPO) (Abdolmaleki et al. 2018) to learn a vision-based walking policy for Robotis OP3 bipedal robots. The robot's onboard computer samples actions and synchronizes the policy's neural network weights with a remote learning process at the start of each episode. Haarnoja et al. (2019) also proposed a similar asynchronous learning system, designed to learn a stable gait with SAC on the Minitaur robot (Kenneally et al. 2016). These tasks do not use images. Although their proposed systems are similar to our Remote-Local mode, these two papers aim at solving specific tasks rather than systematically comparing different distributions of a learning agent's computations between a resource-limited computer and a powerful computer. In addition, their systems are tailored to specific tasks and algorithms and are not publicly available, while our system is open-source, task-agnostic, and compatible with multiple algorithms.
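To make this synchronization pattern concrete, the following is a minimal, single-process sketch of episode-boundary weight synchronization between an onboard actor and a remote learner. The class names and the toy linear policy are hypothetical; in the cited systems the two components run as separate processes on separate machines and exchange weights over a network, which this simplified sketch omits.

```python
# Hypothetical, simplified sketch of episode-boundary weight synchronization
# between an onboard actor and a remote learner; not the cited systems' code.
import numpy as np


class RemoteLearner:
    """Stand-in for the remote learning process."""

    def __init__(self, obs_dim: int) -> None:
        self.weights = np.zeros(obs_dim)

    def update(self) -> None:
        # Placeholder for gradient-based updates (e.g., MPO or SAC) on buffered data.
        self.weights += 0.01 * np.random.randn(*self.weights.shape)

    def latest_weights(self) -> np.ndarray:
        return self.weights.copy()


class OnboardActor:
    """Stand-in for the robot's onboard computer."""

    def __init__(self, obs_dim: int) -> None:
        self.weights = np.zeros(obs_dim)

    def sync(self, weights: np.ndarray) -> None:
        self.weights = weights  # called only at episode boundaries

    def act(self, observation: np.ndarray) -> float:
        return float(np.tanh(self.weights @ observation))  # toy linear policy


if __name__ == "__main__":
    obs_dim, episodes, steps = 4, 3, 5
    learner, actor = RemoteLearner(obs_dim), OnboardActor(obs_dim)
    for episode in range(episodes):
        actor.sync(learner.latest_weights())  # synchronize at episode start
        for _ in range(steps):
            action = actor.act(np.random.randn(obs_dim))
            # ...send the action to the robot and log the transition...
        learner.update()  # learning continues off-robot between syncs
        print(f"episode {episode}: synced weights {actor.weights.round(3)}")
```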
III. BACKGROUND
Reinforcement learning is a setting where an agent learns to control through trial-and-error interactions with its environment. The agent-environment interaction is modeled as a Markov Decision Process (MDP), in which the agent interacts with its environment at discrete timesteps. At the current timestep $t$, the agent is in state $S_t \in \mathcal{S}$, where it takes an action $A_t \in \mathcal{A}$ using a probability distribution $\pi$