and $\epsilon^j_t$ is random, i.i.d. additive noise. $(x^j_t, y^j_t)$ is agent $j$'s measurement at time $t$. Note that agents using this region sensing model must trade off between sensing a wider area with lower accuracy and executing a highly accurate sensing action over a smaller region. The support of the vector $x_t$ is appropriately weighted so that $\|x_t\|_2 = 1$, ensuring each sensing action has constant power. This helps us model observation noise as a function of the agent's distance from the region [9]. Since each action has constant power and every observation has i.i.d. additive noise with constant variance, the signal-to-noise ratio in the unit squares comprising the rectangular sensing block decreases as the size of the sensing region increases.
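As a concrete illustration of the constant-power weighting, consider the following minimal sketch of a single region sensing action; the flattened grid representation of the environment and the Gaussian noise model are illustrative assumptions, not an exact specification of our implementation:

```python
import numpy as np

def region_sense(beta, cells, noise_std=1.0, rng=None):
    """One region sensing action over a rectangular block (a sketch).

    beta      : flattened grid of the environment (sparse target signal)
    cells     : indices of the grid cells inside the sensing block
    noise_std : std. deviation of the i.i.d. additive Gaussian noise
    """
    rng = rng or np.random.default_rng()
    x = np.zeros_like(beta, dtype=float)
    # Uniform weights over the sensed block, scaled so that ||x||_2 = 1.
    x[cells] = 1.0 / np.sqrt(len(cells))
    # Observation y_t = x_t^T beta + noise; the per-cell signal shrinks
    # as the block grows, so the effective SNR decreases with region size.
    y = x @ beta + rng.normal(0.0, noise_std)
    return x, y
```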
Cost model: We introduce the additional realistic setting that the sensing actions have different associated costs. First, we consider that the agent travelling from location $a$ to location $b$ incurs a travel time cost $c_d(a, b)$.¹ Second, we assume that executing each sensing action at location $b$ incurs a time cost $c_s(b)$. Therefore, $T$ time steps after starting from location $x_0$, an agent $j$ has executed actions $\{x^j_t\}_{t=1}^{T}$ and incurs a total cost defined by $C^j(T) = \sum_{t=1}^{T} \big( c_d(x^j_{t-1}, x^j_t) + c_s(x^j_t) \big)$.
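For concreteness, the total cost can be accumulated as in the sketch below, which assumes a constant travelling speed with Euclidean distances (per the footnote) and, purely for illustration, a constant per-action sensing cost:

```python
import numpy as np

def total_cost(locations, travel_speed=1.0, sense_cost=1.0):
    """Total cost C(T) of a trajectory x_0, x_1, ..., x_T (a sketch).

    locations : sequence of positions, starting at x_0
    Assumes constant travelling speed (Euclidean distance) and, for
    illustration only, a constant time cost per sensing action.
    """
    cost = 0.0
    for prev, curr in zip(locations[:-1], locations[1:]):
        c_d = np.linalg.norm(np.subtract(curr, prev)) / travel_speed  # travel time
        c_s = sense_cost                                              # sensing time
        cost += c_d + c_s
    return cost

# Example: three sensing actions after starting at (0, 0).
print(total_cost([(0, 0), (0, 3), (4, 3), (4, 0)]))  # travel 3+4+3, sensing 3 -> 13.0
```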
Communication setup:
We assume that communication,
although unreliable, will be available sometimes and the agents
should utilize it when possible. The agents share their own past
measurements asynchronously with teammates, but do not wait
for communication from their teammates at any time since this
wait could be arbitrarily long and thus cause a loss of valuable
sensing time. In the absence of synchronicity, we also do not
require that the set of available past measurements remain
consistent across agents since communication problems can
disrupt it. We will denote the set of measurements available to an agent $j$ at time $t$ by $D^j_t = \{(x_{t'}, y_{t'}) \mid \{t'\} \subseteq \{1, \ldots, t-1\}\}$, with $|D^j_t| \le t-1$, which includes its own past observations and those received from its teammates until time $t$.
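The non-blocking nature of this exchange can be sketched as follows; the mailbox abstraction and its methods are purely illustrative and not part of our formulation:

```python
import queue

class AgentInbox:
    """Non-blocking measurement exchange (a sketch, not our actual system).

    An agent merges whatever teammate measurements have already arrived,
    but never waits for delivery, so no sensing time is lost."""

    def __init__(self):
        self.mailbox = queue.Queue()  # filled asynchronously by teammates
        self.D = set()                # this agent's measurement set D_t

    def record_own(self, x, y):
        # (x, y) is kept abstract here; in our model x is the sensing vector.
        self.D.add((x, y))

    def poll_teammates(self):
        # Drain only messages already delivered; never block on an empty queue.
        while True:
            try:
                self.D.add(self.mailbox.get_nowait())
            except queue.Empty:
                break
```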
III. RELATED WORK
Autonomous target search has diverse applications in environment monitoring [10], wildlife protection [11], as well as search and rescue operations [12]. In robotics, informative path planning (IPP) problems focus on adaptive decision making to reach a specified goal state or region. In contrast to our setting, common IPP algorithms consider a known environment [13], are myopic or use non-adaptive lookahead [14], and assume weakly coupled sensing and movement actions [15].
Bayesian optimization and active learning methods are another approach to active search [16], [17], [18]. Unfortunately, they mostly address single-agent systems, or, if multi-agent, they assume central coordination [19], [20] and, except for [21], lack realistic assumptions on sensing actions. The multi-agent asynchronous active search algorithms proposed in [9], [22] tackle several of these challenges but are myopic in nature. Further, [23] introduced cost-effective non-myopic active search, but their simplified setting excludes simultaneous evaluation of multiple search points with different costs.
¹We assume a constant travelling speed and compute the Euclidean distance between locations $a$ and $b$.
Our active search formulation has close similarities with planning under uncertainty using a Partially Observable Markov Decision Process (POMDP) [24]. Monte Carlo Tree Search (MCTS) [25], [26] has found success as a generic online planning algorithm in large POMDPs [27], but is mostly limited to the single-agent setting [28], [29].
Decentralized POMDP (Dec-POMDP) [30], [31] is another framework for decentralized active information gathering using multiple agents; it is typically solved using offline, centralized planning followed by online, decentralized execution [32], [33]. Decentralized MCTS (Dec-MCTS) algorithms have also been proposed for multi-robot active perception under a cost budget [34], [35], but they typically rely on each agent maintaining a joint probability distribution over its own belief as well as those of the other agents.
Finally, cost-aware active search can be viewed as a multi-objective sequential decision making problem. [36] developed an MCTS algorithm for cost-budgeted POMDPs using a scalarized version of the reward-cost trade-off, whereas [37] introduced multi-objective MCTS (MO-MCTS) for discovering globally Pareto-optimal decision sequences in the search tree. Unfortunately, MO-MCTS is computationally expensive and unsuitable for online planning. [38] proposed the Pareto MCTS algorithm for multi-objective IPP, but they ignore uncertainty due to partial observability in the search space.
IV. OUR PROPOSED ALGORITHM: CAST
A. Background
We first briefly describe the concepts essential to the planning
and decision making components of our algorithm.
Monte Carlo Tree Search (MCTS) is an online algorithm that combines tree search with random sampling in a domain-independent manner. In our setup, a cost-aware agent would benefit from the ability to look ahead into the adaptive evolution of its belief about the target's distribution in the environment in response to possible observations from the actions it might execute. We therefore consider MCTS as the basis for developing our online planning method with finite-horizon lookahead. Unfortunately, the presence of uncertainty about the targets (number and location) in the unknown environment, together with the noisy observations, introduces additional challenges in our problem formulation.
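For reference, the sketch below shows the four canonical MCTS phases (selection via UCB1, expansion, random rollout, backpropagation) for a generic model; the `Model` interface (`actions`, `step`, `is_terminal`) and the constants are illustrative assumptions rather than our CAST planner:

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}          # action -> child Node
        self.visits, self.value = 0, 0.0

def uct_search(model, root_state, n_iters=1000, c=1.4, horizon=10):
    """Generic MCTS/UCT sketch with a finite-horizon random rollout."""
    root = Node(root_state)
    for _ in range(n_iters):
        node, reward_sum = root, 0.0
        # 1. Selection: descend via UCB1 while the node is fully expanded.
        while node.children and len(node.children) == len(model.actions(node.state)):
            node = max(node.children.values(),
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: try one untried action, if any remain.
        untried = [a for a in model.actions(node.state) if a not in node.children]
        if untried and not model.is_terminal(node.state):
            a = random.choice(untried)
            s, r = model.step(node.state, a)
            child = Node(s, parent=node)
            node.children[a] = child
            node, reward_sum = child, reward_sum + r
        # 3. Rollout: random simulation up to the lookahead horizon.
        s = node.state
        for _ in range(horizon):
            if model.is_terminal(s):
                break
            s, r = model.step(s, random.choice(model.actions(s)))
            reward_sum += r
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward_sum
            node = node.parent
    # Recommend the action whose child was visited most often.
    return max(root.children, key=lambda a: root.children[a].visits)
```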
Pareto optimality: Our formulation of cost-aware active search described in Section II can be viewed as a multi-objective sequential decision making problem. A common approach to solving such multi-objective optimization problems is scalarization, i.e., considering a weighted combination that results in a single-objective problem trading off the different objectives. However, tuning the weight attributed to each objective is challenging, since the objectives may be quantities with different units and scales, and their relative importance may be context dependent. In contrast, Pareto optimization builds on the idea that some solutions to the multi-objective optimization problem are categorically worse than others and are dominated by a set of Pareto-optimal solution vectors forming a Pareto front.
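To illustrate dominance, the short sketch below filters a set of (reward, cost) outcome vectors down to their Pareto front; treating reward as maximized and cost as minimized is an illustrative convention:

```python
def dominates(u, v):
    """u dominates v if u is at least as good in both objectives and
    strictly better in one. Points are (reward, cost) tuples; reward is
    maximized and cost is minimized."""
    return (u[0] >= v[0] and u[1] <= v[1]) and (u[0] > v[0] or u[1] < v[1])

def pareto_front(points):
    """Keep only the points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Example: (reward, cost) outcomes of candidate action sequences.
print(pareto_front([(5, 3), (4, 1), (3, 2), (5, 4)]))
# -> [(5, 3), (4, 1)]   # (3, 2) is dominated by (4, 1); (5, 4) by (5, 3)
```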