Coordination With Humans Via Strategy Matching
Michelle Zhao, Reid Simmons, Henny Admoni
Abstract— Human and robot partners increasingly need to
work together to perform tasks as a team. Robots designed for
such collaboration must reason about how their task-completion
strategies interplay with the behavior and skills of their human
team members as they coordinate on achieving joint goals. Our
goal in this work is to develop a computational framework
for robot adaptation to human partners in human-robot team
collaborations. We first present an algorithm for autonomously
recognizing available task-completion strategies by observing
human-human teams performing a collaborative task. By
transforming team actions into low-dimensional representations
using hidden Markov models, we can identify strategies without
prior knowledge. Robot policies are learned on each of the
identified strategies to construct a Mixture-of-Experts model
that adapts to the task strategies of unseen human partners.
We evaluate our model on a collaborative cooking task using an
Overcooked simulator. Results of an online user study with 125
participants demonstrate that our framework improves the task
performance and collaborative fluency of human-agent teams,
as compared to state-of-the-art reinforcement learning methods.
I. INTRODUCTION
Robots increasingly serve as collaborative partners in
applications where humans cannot operate alone, such as
robot-assisted elder care [1] and cooking [2]. In such ap-
plications, humans may employ any number of equally
reasonable strategies to achieve their goals. Robots designed
for collaboration should be able to adapt their behavior in
order to coordinate with the different strategies employed by
different human partners [3].
Poor coordination between partners can lead to inefficient
collaboration on tasks. Consider a robot and human partner
working together to make a sandwich. Both partners get
condiments, believing the other is getting the bread. This
disfluency results in task inefficiencies: production time
increases since one agent now must locate the bread before
they are able to begin assembling the sandwich; and cleanup
time increases since double the amount of condiments have
been taken out. In collaborative tasks without explicit verbal
communication, teams can be even more susceptible to
disfluencies like these. In order to overcome disfluencies,
team members must be able to coordinate by inferring their
partner’s strategy from their observable actions.
In this work, we explore the following question: How
can a robot partner recognize the task strategy employed
by a human partner, and adapt its own response online?
Prior work leverages human-human team demonstrations to
learn robot behavior policies for collaborative tasks [4], using
Authors are with the Robotics Institute, Carnegie Mellon
University, Pittsburgh, PA, USA {mzhao2, rsimmons,
hadmoni}@andrew.cmu.edu
[Fig. 1 diagram: Offline Strategy Recognition and Policy Training (annotated human team trajectories → latent HMM representation → strategy clustering → apprenticeship learning → policy library) feeding an Online Mixture-of-Experts Belief Update with an unseen human partner.]
Fig. 1: The offline training paradigm consists of strategy recognition and
training a policy library of strategy-specific agent policies. Beliefs over
strategies are updated online during interactions with new partners. Actions
taken by the robot are sampled from a belief-weighted combination over
action distributions generated by each strategy-specific policy.
human data in aggregate, without separating the demonstra-
tions by strategy. When a dataset contains one class that is
underrepresented, the trained model often generalizes well
to the majority class but poorly to the minority class, a
problem addressed by techniques including undersampling
[5]. This type of aggregate model performs well with the
“average” or most common human behavior, but may miss
underrepresented strategies. On the other hand, an agent
that can distinguish between strategies will likely generalize
better to a greater diversity of human behavior.
We propose a method for a robot to identify and adapt
to discrete, task-oriented strategies that determine the team’s
behavioral patterns. We situate our work in the simulated
Overcooked domain, a human-robot collaborative cooking
testbed. Using data collected in [4] of human-human team
task demonstrations for five different kitchen environments,
we annotate the trajectories to represent them as high-level
task sequences. Next, we transform the annotated sequences
of team actions into low-dimensional representations using
hidden Markov models. Clustering on the low-dimensional
representations extracts groups of similar team behavior,
which define discrete strategies representing different team
approaches employed on a collaborative task. Robot policies
are trained using apprenticeship learning [6] to imitate dis-
tinct strategies. The resulting agent is a Mixture-of-Experts
model that maintains a dynamic belief over the strategy space
for unseen human partners at test time.
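To make the offline pipeline concrete, the sketch below shows one plausible instantiation of the HMM embedding and clustering steps. The library choices (hmmlearn, scikit-learn), the number of hidden states, and the integer task labels are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the offline strategy-recognition step, assuming
# hmmlearn (>= 0.2.8 for CategoricalHMM) and scikit-learn are available.
# Task labels, n_states, and n_clusters are illustrative placeholders.
import numpy as np
from hmmlearn import hmm
from sklearn.cluster import KMeans

def embed_trajectories(task_sequences, n_states=4):
    """Fit one categorical HMM over all annotated team task sequences and
    summarize each trajectory by its mean posterior state occupancy."""
    lengths = [len(seq) for seq in task_sequences]
    X = np.concatenate(task_sequences).reshape(-1, 1)
    model = hmm.CategoricalHMM(n_components=n_states, n_iter=100,
                               random_state=0)
    model.fit(X, lengths)
    # Each trajectory becomes a low-dimensional vector in R^{n_states}.
    return np.vstack([
        model.predict_proba(np.asarray(seq).reshape(-1, 1)).mean(axis=0)
        for seq in task_sequences
    ])

# Toy demos: integer labels for high-level tasks, e.g.
# 0 = pick up onion, 1 = place onion in pot, 2 = pick up plate, 3 = serve.
demos = [np.array([0, 1, 0, 1, 3]), np.array([0, 0, 1, 1, 3]),
         np.array([2, 3, 2, 3, 0]), np.array([2, 2, 3, 3, 0])]
strategies = KMeans(n_clusters=2, n_init=10).fit_predict(
    embed_trajectories(demos))
print(strategies)  # cluster index = discovered strategy per demonstration
```

In the full pipeline, each resulting cluster of human-human demonstrations then seeds apprenticeship learning of one strategy-specific policy in the library.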
We conducted an online user study to investigate the
utility of coordinated behavior in human-robot teams.
125 participants performed a collaborative task with our
proposed agent as well as with an existing baseline agent
[4] in the five environments for which we had data. Our
approach improved team task performance in two of the five
tested environments and team collaborative fluency in three
of the five tested environments.
II. RELATED WORK
1) Ad-Hoc Teaming: Ad-hoc teaming in Human-Robot
Interaction (HRI) requires robot agents to adapt
to unseen partners [7, 8], who may differ in knowledge,
skill, and behavior. Prior work [7] proposes a general pur-
pose algorithm that reuses knowledge learned from previous
teammates or experts to quickly adapt to new teammates. The
approach takes two forms: (1) model-based, which develops
a model of previous teammates’ behaviors to predict and
plan online, and (2) policy-based, which learns policies
for previous teammates and selects an appropriate policy
online. Another important challenge in ad-hoc teaming is
modeling uncertainty over partner characteristics [9, 10]. In
the Overcooked environment, [4] showed that incorporating
human models learned from data improves the performance
of agents compared to agents trained purely via self-play.
Instead of training agents to partner with a general human
proxy model as in [4], we train a library of strategy-specific
agent policies that represent different coordination behavior
patterns. Distinguishing between strategies allows for a policy library
that captures differences in team coordination patterns that
may otherwise wash out in a single general model.
2) Multi-agent Reinforcement Learning: In cooperative
multi-agent settings, self-play (SP) trains a team of agents
that work well together. A collaborative agent that excels
with the partners with which it was trained may not gen-
eralize well to new partners at test time, especially when
the new partners differ significantly from the pool used for
training [11]. Other-play (OP) [12] addresses this problem,
demonstrating improved zero-shot coordination and human-AI
performance on the Hanabi benchmark [13]. A self-play training
paradigm that assembles agents representing untrained, partially
trained, and fully trained partners, by extracting agent models
at different checkpoints during training, has been shown to
produce agents robust to that suite of partners [14]. Prior
work [15] models opponents in
deep multi-agent reinforcement learning settings by training
neural-based models on the hidden state observations of
opponents. A Mixture-of-Experts architecture maintains a
distribution over different opponent strategies, allowing this
model to integrate different strategy patterns.
3) Adaptation in Human-Robot Interaction: Past research
has studied how robots can adapt and learn from human
partners. Key to robot-to-human adaptation is understand-
ing people’s behavior through observation. Markov Deci-
sion Processes (MDPs) are a common framework for goal
recognition [16]. By learning a model of human intent and
preferences [17], robots can reason over different types of
human partners [18, 19]. In a similar vein to our work, [20]
applied a best-response approach to selecting policies from
a library of response policies that best match a particular
player type. Building an understanding of the human partner
requires multi-faceted models of humans that capture nu-
anced differences. Our work on adaptation focuses primarily
on adapting robot behavior to the task approach (strategy) of
a human partner. Our adaptation approach is similar to [21],
[Fig. 2 layout panels: Cramped Room, Asymmetric Advantages, Coordination Ring, Forced Coordination, Counter Circuit]
Fig. 2: The Overcooked experimental layouts. Environments vary in the
amount of constrained space, actions available to different player positions,
and interdependence of player actions to achieve the objective.
where human demonstrations are clustered into dominant
types and a reward function is learned for each type;
Bayesian inference is then used to adapt to new users.
III. PRELIMINARIES
1) Task Scenario: To study human-robot collaboration,
we use the Overcooked environment [4], a
collaborative cooking task. Dyads (consisting of robot agents
or humans) collaborate in a constrained shared environment
(Fig. 2). Their objective is to prepare an order (onion soup)
and serve it as many times as possible in an allotted time.
2) Strategies: In the Overcooked task, agents must per-
form sequences of high-level tasks to serve orders. Examples
of high-level tasks include picking up onions and plates,
placing onions into pots, and serving soup. Each high-level
task requires a sequence of lower-level subtasks (i.e., motion
primitives). Teams collaborate on shared tasks in different
ways. For example, in role specialization, players take sole
responsibility for particular tasks, whereas in complete-as-
needed approaches, each partner performs the next required
task. In addition to role-oriented strategies, collaborative
approaches also prescribe the order in which tasks are
performed. Teams that serve dishes while the next orders
are cooking employ more time-efficient strategies. We define
collaborative strategies as the sequence in which high-level
tasks are interleaved and distributed across teammates. Since
the actions of all team members contribute to the task approach,
strategy is computed at the team level.
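As a toy illustration of this team-level encoding (the exact annotation scheme may differ), a strategy-bearing sequence can record which player performed which high-level task at each step, capturing both interleaving and distribution; the task labels below are hypothetical.

```python
# Illustrative encoding of a team-level strategy sequence: time-ordered
# (player_id, task_id) pairs capture both which tasks are performed and
# which teammate performs them. Task labels are hypothetical.
from typing import List, Tuple

PICK_ONION, PLACE_ONION, PICK_PLATE, SERVE_SOUP = range(4)

def team_sequence(events: List[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """events: (timestep, player_id, task_id) tuples from an annotated demo."""
    return [(player, task) for _, player, task in sorted(events)]

# Role specialization: player 0 handles onions, player 1 plates and serves.
demo = [(0, 0, PICK_ONION), (1, 1, PICK_PLATE),
        (2, 0, PLACE_ONION), (3, 1, SERVE_SOUP)]
print(team_sequence(demo))  # [(0, 0), (1, 2), (0, 1), (1, 3)]
```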
3) MDP Formulation: The task is modeled as a two-player
Markov decision process (MDP) defined by the tuple
$\langle \mathcal{S}, A = \{A_1, A_2\}, T, R \rangle$. $\mathcal{S}$
is the set of states. The action space of a game with two players
is $A = A_1 \times A_2$, where $A_i$ is the set of actions available
to player $i$. The transition function $T$ determines how the state
changes based on a joint action by both players,
$T: \mathcal{S} \times (A_1 \times A_2) \to \mathcal{S}$.
$R: \mathcal{S} \to \mathbb{R}$ is the team reward function.
$\pi_i$ denotes agent $i$'s policy. $Z = \{z_1, \ldots, z_K\}$
denotes the set of possible team collaborative strategies, and we
further denote a policy corresponding to strategy $z_k$ as $\pi_k$.
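The online Mixture-of-Experts step described in Fig. 1 can be sketched directly from these definitions: maintain a belief $b$ over $Z$, update it from the human's observed action, and sample the robot action from the belief-weighted mixture of the strategy-specific policies $\pi_k$. The TabularPolicy interface below is a hypothetical placeholder for policies learned offline, not the paper's implementation.

```python
# Minimal sketch of the online belief update and belief-weighted action
# sampling, under an assumed (hypothetical) tabular policy interface.
import numpy as np

class TabularPolicy:
    """Strategy-specific policy pi_k: per-state action distributions for
    the human and robot, standing in for policies learned offline."""
    def __init__(self, human_probs, robot_probs):
        self.human_probs = human_probs  # {state: np.ndarray over human actions}
        self.robot_probs = robot_probs  # {state: np.ndarray over robot actions}

    def human_action_prob(self, state, action):
        return self.human_probs[state][action]

    def robot_action_dist(self, state):
        return self.robot_probs[state]

def belief_update(belief, policies, state, human_action, eps=1e-6):
    """Bayesian update: b'(k) proportional to P(a_H | s, z_k) * b(k)."""
    likelihood = np.array([p.human_action_prob(state, human_action)
                           for p in policies])
    belief = belief * (likelihood + eps)  # eps keeps beliefs nonzero
    return belief / belief.sum()

def sample_robot_action(belief, policies, state, rng):
    """Sample from the belief-weighted combination of action distributions."""
    mixture = sum(b * p.robot_action_dist(state)
                  for b, p in zip(belief, policies))
    return rng.choice(len(mixture), p=mixture / mixture.sum())
```

The eps floor prevents any strategy's belief from collapsing to exactly zero, so the agent can still recover if the human switches strategies mid-task.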
IV. APPROACH
We introduce MESH (Matching Emergent Strategies to
Humans) as an approach for coordination of collaborative