
II. RELATED WORK
1) Ad-Hoc Teaming: Ad-hoc teaming in Human-Robot Interaction (HRI) requires robot agents to adapt to unseen partners [7, 8], who may differ in knowledge, skill, and behavior. Prior work [7] proposes a general-purpose algorithm that reuses knowledge learned from previous teammates or experts to quickly adapt to new teammates. The approach takes two forms: (1) model-based, which builds a model of previous teammates' behaviors to predict and plan online, and (2) policy-based, which learns policies for previous teammates and selects an appropriate policy online. Another important challenge in ad-hoc teaming is modeling uncertainty over partner characteristics [9, 10]. In the Overcooked environment, [4] showed that incorporating human models learned from data improves agent performance compared to agents trained only with copies of themselves. Instead of training agents to partner with a general human proxy model as in [4], we train a library of strategy-specific agent policies that represent different coordination behavior patterns. Distinguishing strategies yields a policy library that captures differences in team coordination patterns that would otherwise wash out in a single general model.
2) Multi-agent Reinforcement Learning: In cooperative
multi-agent settings, self-play (SP) trains a team of agents
that work well together. A collaborative agent that excels
with the partners with which it was trained may not gen-
eralize well to new partners at test time, especially when
the new partners differ significantly from the pool used for
training [11]. Other-play (OP) [12] addresses this problem, demonstrating improved zero-shot coordination and human-AI performance on the Hanabi game [13]. A self-play training paradigm that assembles a suite of untrained, partially trained, and fully trained partners by extracting agent models at different training checkpoints has been shown to produce agents that are robust to this suite of partners [14]. Prior work [15] models opponents in deep multi-agent reinforcement learning by training neural models on the hidden-state observations of opponents. A Mixture-of-Experts architecture maintains a distribution over different opponent strategies, allowing the model to integrate different strategy patterns.
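As a rough illustration of the Mixture-of-Experts idea (a sketch for intuition only, not the architecture of [15]; all dimensions and parameters below are placeholders), each expert predicts the opponent's next-action distribution and a gating network over the observation weights the experts:

import numpy as np

rng = np.random.default_rng(0)
obs_dim, num_actions, num_experts = 16, 6, 3

# Randomly initialized parameters stand in for learned weights.
expert_weights = rng.normal(size=(num_experts, obs_dim, num_actions))
gate_weights = rng.normal(size=(obs_dim, num_experts))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def predict_opponent_action(obs):
    # P(strategy | obs): gating distribution over experts (opponent strategies).
    gate = softmax(obs @ gate_weights)
    # P(action | obs, strategy): each expert's predicted action distribution.
    per_expert = softmax(obs @ expert_weights, axis=-1)
    # Mixture: marginalize over strategies to obtain P(action | obs).
    return gate @ per_expert

obs = rng.normal(size=obs_dim)
print(predict_opponent_action(obs))  # distribution over num_actions opponent actions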
3) Adaptation in Human-Robot Interaction: Past research
has studied how robots can adapt and learn from human
partners. Key to robot-to-human adaptation is understand-
ing people’s behavior through observation. Markov Deci-
sion Processes (MDPs) are a common framework for goal
recognition [16]. By learning a model of human intent and
preferences [17], robots can reason over different types of
human partners [18, 19]. In a similar vein to our work, [20] applied a best-response approach, selecting from a library of response policies the one that best matches a particular player type. Building an understanding of the human partner
requires multi-faceted models of humans that capture nu-
anced differences. Our work on adaptation focuses primarily
on adapting robot behavior to the task approach (strategy) of
a human partner. Our adaptation approach is similar to [21], where human demonstrations are clustered into dominant types and a reward function is learned for each type, with Bayesian inference then used to adapt to new users.

Fig. 2: The Overcooked experimental layouts (Cramped Room, Forced Coordination, Coordination Ring, Counter Circuit, Asymmetric Advantages). Environments vary in the amount of constrained space, actions available to different player positions, and interdependence of player actions to achieve the objective.
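The type-based adaptation pattern above can be summarized with a minimal sketch (an illustration for intuition, not the implementation of [20] or [21]): maintain a belief over partner types, update it from observed human actions with Bayes' rule, and select the response policy for the most probable type. The likelihood models, action names, and policy labels below are hypothetical placeholders.

import numpy as np

def update_type_belief(belief, likelihoods, state, human_action):
    # One Bayesian update: posterior[k] is proportional to belief[k] * P(a | s, type k).
    posterior = np.array([belief[k] * likelihoods[k](state, human_action)
                          for k in range(len(belief))])
    posterior += 1e-12  # guard against an all-zero posterior
    return posterior / posterior.sum()

def select_response_policy(belief, policy_library):
    # Best response to the most probable partner type; a soft mixture is another option.
    return policy_library[int(np.argmax(belief))]

# Toy usage with two hypothetical partner types and placeholder policies.
likelihoods = [lambda s, a: 0.8 if a == "get_onion" else 0.2,   # type 0 favors onions
               lambda s, a: 0.8 if a == "get_plate" else 0.2]   # type 1 favors plates
policy_library = ["response_policy_0", "response_policy_1"]
belief = np.ones(2) / 2
belief = update_type_belief(belief, likelihoods, state=None, human_action="get_plate")
print(select_response_policy(belief, policy_library))  # -> response_policy_1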
III. PRELIMINARIES
1) Task Scenario: To study human-robot collaboration, we use the Overcooked environment [4], a collaborative cooking task. Dyads (consisting of robot agents or humans) collaborate in a constrained shared environment (Fig. 2). Their objective is to prepare an order (onion soup) and serve it as many times as possible in an allotted time.
2) Strategies: In the Overcooked task, agents must per-
form sequences of high-level tasks to serve orders. Examples
of high-level tasks include picking up onions and plates,
placing onions into pots, and serving soup. Each high-level
task requires a sequence of lower-level subtasks (i.e., motion primitives). Teams collaborate on shared tasks in different
ways. For example, in role specialization, players take sole
responsibility for particular tasks, whereas in complete-as-
needed approaches, each partner performs the next required
task. In addition to role-oriented strategies, collaborative
approaches also prescribe the order in which tasks are
performed. Teams that serve dishes while the next orders
are cooking employ more time-efficient strategies. We define
collaborative strategies as the sequence in which high-level
tasks are interleaved and distributed across teammates. Since the actions of all team members contribute to the task approach, strategy is computed at the team level.
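For concreteness, the following sketch illustrates this team-level notion of strategy (the task names and the (player, task) log format are illustrative assumptions, not the exact representation used in our system): a strategy is read off from the ordered sequence of high-level tasks and which teammate performed each, so role-specialized and complete-as-needed patterns show up in how tasks are distributed.

from collections import Counter

# (player, high-level task) pairs, in completion order, for one episode.
team_task_sequence = [
    (0, "pickup_onion"), (0, "place_onion_in_pot"),
    (0, "pickup_onion"), (0, "place_onion_in_pot"),
    (1, "pickup_plate"), (0, "pickup_onion"),
    (0, "place_onion_in_pot"), (1, "serve_soup"),
]

def task_distribution(sequence):
    # Count, per high-level task, how often each teammate performed it.
    counts = {}
    for player, task in sequence:
        counts.setdefault(task, Counter())[player] += 1
    return counts

# Near-exclusive assignments (as here) indicate role specialization, while
# evenly split counts suggest a complete-as-needed approach.
for task, by_player in task_distribution(team_task_sequence).items():
    print(task, dict(by_player))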
3) MDP Formulation: The task is modeled as a two-player Markov decision process (MDP) defined by the tuple $\langle \mathcal{S}, A = \{A_1, A_2\}, T, R \rangle$. $\mathcal{S}$ is the set of states. The action space of a game with two players is $A = A_1 \times A_2$. The set of actions available to each player $i$ is $A_i$. The transition function $T$ determines how the state changes based on a joint action by both players, $T : \mathcal{S} \times (A_1 \times A_2) \rightarrow \mathcal{S}$. $R : \mathcal{S} \rightarrow \mathbb{R}$ is the team reward function. $\pi_i$ represents agent $i$'s policy. $Z = \{z_1, \ldots, z_K\}$ represents the set of possible team collaborative strategies. We further denote a policy that corresponds to strategy $z_k$ as $\pi_k$.
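A lightweight encoding of this formulation is sketched below for illustration (the type names are placeholders; in the experiments the components are instantiated by the Overcooked environment):

from dataclasses import dataclass
from typing import Callable, Hashable, List, Tuple

State = Hashable
Action = Hashable
JointAction = Tuple[Action, Action]                       # A = A1 x A2

@dataclass
class TwoPlayerMDP:
    states: List[State]                                   # S
    actions: Tuple[List[Action], List[Action]]            # (A1, A2)
    transition: Callable[[State, JointAction], State]     # T: S x (A1 x A2) -> S
    reward: Callable[[State], float]                      # R: S -> real-valued team reward

# An agent policy maps states to actions; a library {pi_1, ..., pi_K} would hold
# one such policy per collaborative strategy z_k in Z.
Policy = Callable[[State], Action]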
IV. APPROACH
We introduce MESH (Matching Emergent Strategies to
Humans) as an approach for coordination of collaborative