Cost Aware Asynchronous Multi-Agent
Active Search
Arundhati Banerjee
School of Computer Science
Carnegie Mellon University
Pittsburgh, U.S.A
arundhat@cs.cmu.edu
Ramina Ghods
School of Computer Science
Carnegie Mellon University
Pittsburgh, U.S.A
rghods@andrew.cmu.edu
Jeff Schneider
School of Computer Science
Carnegie Mellon University
Pittsburgh, U.S.A
schneide@cs.cmu.edu
Abstract—Multi-agent active search requires autonomous
agents to choose sensing actions that efficiently locate targets. In a
realistic setting, agents also must consider the costs that their deci-
sions incur. Previously proposed active search algorithms simplify
the problem by ignoring uncertainty in the agent’s environment,
using myopic decision making, and/or overlooking costs. In this
paper, we introduce an online active search algorithm to detect
targets in an unknown environment by making adaptive cost-
aware decisions regarding the agent’s actions. Our algorithm
combines principles from Thompson Sampling (for search space
exploration and decentralized multi-agent decision making),
Monte Carlo Tree Search (for long horizon planning) and pareto-
optimal confidence bounds (for multi-objective optimization in an
unknown environment) to propose an online lookahead planner
that removes all of these simplifications. We analyze the algorithm's
performance in simulation to show its efficacy in cost aware active
search.
I. INTRODUCTION
Active search [1], [2] in real world applications like
environmental monitoring, or search and rescue, involves
autonomous robots (agents) accurately detecting targets by
making sequential adaptive data collection decisions while
minimizing the usage of resources like energy and time.
Previous studies have used various constraints and reductions
for resource efficient adaptive search. Such algorithms generally
include parameters to trade-off the informativeness of the
collected data with the cost of such data collection. One
approach to adaptive sensing in robotics is to reduce it
to a planning problem assuming full observability of the
environment [3], [4]. Imposing a cost budget then reduces
to constrained path planning between the known start and goal
locations. Unfortunately, this is in contrast with the real world
where the agent’s environment, the number of targets and their
locations may be unknown and the agent may have access only
to noisy observations from sensing actions. All these factors
increase the difficulty of cost effective active search.
Besides cost efficiency, executing active search with multiple
agents creates an additional challenge. Centralized planning in
a multi-agent setting is often impractical due to communication
constraints [5], [6]. Further, a real world system dependent on
a central coordinator that expects synchronicity from all agents
is susceptible to communication or agent failure.
In our problem formulation, the agents are not entirely
independent actors and therefore still share information with
their peers in the team when possible. However, we do not
require them to communicate synchronously and instead assume
that each agent is able to independently plan and execute its
next sensing action using whatever information it already has
and happens to receive.
In this paper, we propose a novel cost-aware asynchronous
multi-agent active search algorithm called CAST (Cost Aware
Active Search of Sparse Targets) to enable agents to detect
sparsely distributed targets in an unknown environment using
noisy observations from region sensing actions, without any
central control or synchronous inter-agent communication.
CAST performs cost-aware active search knowing only the
costs of its feasible actions without requiring a pre-specified
cost budget. It combines Thompson sampling with Monte Carlo
Tree Search for lookahead planning and multi-agent decision
making, along with Lower Confidence Bound (LCB) style
pareto optimization to tradeoff expected future reward with
the associated costs. We demonstrate the efficacy of CAST
with a set of simulation results across different team sizes and
number of targets in the search space.
II. PROBLEM FORMULATION
Consider a team of autonomous agents actively sensing
regions of the search space looking for targets. To plan its
next sensing action, each cost-aware agent has to trade-off the
expected future reward of detecting a target with the overall
costs it will incur in travelling to the appropriate location and
executing the action. Given previous observations, it adaptively
makes such data-collection decisions online while minimizing
the associated costs as much as possible. Unfortunately, this
problem is NP-hard [7].
Sensing setup: We first describe our setup for active search with region sensing. Consider a gridded search environment with ground truth described by a sparse matrix having $k$ non-zero entries at the locations of the $k$ targets. We define the flattened ground truth vector as $\beta \in \mathbb{R}^n$ where each entry is either 1 (target) or 0 (no target). $\beta$ is the search vector that we want to recover. The sensing model for an agent $j$ at time $t$ is

$$y^j_t = (x^j_t)^\top \beta + \epsilon^j_t, \quad \text{where } \epsilon^j_t \sim \mathcal{N}(0, \sigma^2). \tag{1}$$

$x^j_t \in \mathbb{R}^n$ is the (flattened) sensing action at time $t$. We define our action space $\mathcal{A}$ ($x^j_t \in \mathcal{A}$) to include only hierarchical spatial pyramid sensing actions [8]. $y^j_t$ is the agent's observation
and $\epsilon^j_t$ is random, i.i.d. additive noise. $(x^j_t, y^j_t)$ is agent $j$'s measurement at time $t$. Note that agents using this region sensing model must trade off between sensing a wider area with lower accuracy versus a highly accurate sensing action over a smaller region. The support of the vector $x_t$ is appropriately weighted so that $\|x_t\|_2 = 1$ to ensure each sensing action has constant power. This helps us in modeling observation noise as a function of the agent's distance from the region [9]. Since each action has constant power and every observation has i.i.d. additive noise with constant variance, the signal to noise ratio in the unit squares comprising the rectangular sensing block reduces as the size of the sensing region is increased.
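For concreteness, the sketch below (Python/NumPy) illustrates one way to build such a region sensing action and generate a noisy observation following (1). It is an illustrative sketch rather than our implementation; the grid size, sensed block, target locations and noise level are arbitrary choices for the example.

```python
import numpy as np

def region_sensing_action(grid_shape, rows, cols):
    """Flattened sensing vector x over a rectangular block, weighted so
    that ||x||_2 = 1, i.e. every action has constant power."""
    mask = np.zeros(grid_shape)
    mask[rows, cols] = 1.0
    x = mask.ravel()
    return x / np.linalg.norm(x)

def observe(x, beta, sigma=0.1, rng=None):
    """Noisy observation y = x^T beta + eps with eps ~ N(0, sigma^2), as in (1)."""
    rng = rng if rng is not None else np.random.default_rng()
    return x @ beta + rng.normal(0.0, sigma)

# Example: an 8x8 grid with k = 2 targets; sense the top-left 4x4 block.
grid_shape = (8, 8)
beta = np.zeros(grid_shape).ravel()
beta[[10, 45]] = 1.0                      # sparse ground truth vector
x = region_sensing_action(grid_shape, slice(0, 4), slice(0, 4))
y = observe(x, beta)
```

Because the unit $\ell_2$ norm spreads a fixed power over the sensed block, a larger block yields a lower per-cell signal to noise ratio, matching the trade-off described above.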
Cost model: We introduce the additional realistic setting that the sensing actions have different associated costs. First, we consider that an agent travelling from location $a$ to location $b$ incurs a travel time cost $c_d(a, b)$.¹ Second, we assume that executing each sensing action at location $b$ incurs a time cost $c_s(b)$. Therefore, $T$ time steps after starting from location $x_0$, an agent $j$ has executed actions $\{x^j_t\}_{t=1}^{T}$ and incurs a total cost defined by $C^j(T) = \sum_{t=1}^{T} c_d(x^j_{t-1}, x^j_t) + c_s(x^j_t)$.
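As a simple illustration of this cost model, the sketch below accumulates $C^j(T)$ along a trajectory of action locations; the constant speed and the constant per-action sensing time are assumptions made only for the example.

```python
import numpy as np

def travel_cost(a, b, speed=1.0):
    """c_d(a, b): Euclidean distance between locations, at constant speed."""
    return np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) / speed

def sensing_cost(b):
    """c_s(b): time to execute the sensing action at location b (constant here)."""
    return 1.0

def total_cost(locations):
    """C^j(T) for a trajectory [x_0, x_1, ..., x_T] of action locations."""
    return sum(travel_cost(prev, cur) + sensing_cost(cur)
               for prev, cur in zip(locations[:-1], locations[1:]))

# Starting at (0, 0) and sensing at (3, 4) then (3, 7):
# travel 5.0 + sense 1.0 + travel 3.0 + sense 1.0 = 10.0
print(total_cost([(0, 0), (3, 4), (3, 7)]))
```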
Communication setup:
We assume that communication,
although unreliable, will be available sometimes and the agents
should utilize it when possible. The agents share their own past
measurements asynchronously with teammates, but do not wait
for communication from their teammates at any time since this
wait could be arbitrarily long and thus cause a loss of valuable
sensing time. In the absence of synchronicity, we also do not
require that the set of available past measurements remain
consistent across agents since communication problems can
disrupt it. We will denote the set of measurements available to an agent $j$ at time $t$ by $D^j_t = \{(x_{t'}, y_{t'}) \mid \{t'\} \subseteq \{1, \ldots, t-1\}\}$, $|D^j_t| \leq t-1$, which includes its own past observations and those received from its teammates till time $t$.
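The following sketch (Python) illustrates this asynchronous bookkeeping: before planning, an agent folds in whichever teammate measurements have already arrived and never blocks waiting for more. The queue-based message transport is an assumption made only for illustration.

```python
import queue

class MeasurementStore:
    """Maintains an agent's measurement set D^j_t."""
    def __init__(self):
        self.measurements = []                 # list of (x, y) pairs

    def add_own(self, x, y):
        self.measurements.append((x, y))

    def merge_incoming(self, inbox: "queue.Queue"):
        """Non-blocking merge of teammates' shared measurements."""
        while True:
            try:
                x, y = inbox.get_nowait()      # returns immediately
            except queue.Empty:
                break                          # never wait for slow or failed peers
            self.measurements.append((x, y))

# Usage: at each planning step, call store.merge_incoming(inbox) and then plan
# the next sensing action from store.measurements, even if nothing new arrived.
```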
III. RELATED WORK
Autonomous target search has diverse applications in envi-
ronment monitoring [10], wildlife protection [11] as well as search and rescue operations [12]. In robotics, informative path
planning (IPP) problems focus on adaptive decision making to
reach a specified goal state or region. In contrast to our setting,
common IPP algorithms consider a known environment [13], are myopic or use non-adaptive lookahead [14], and assume weakly coupled sensing and movement actions [15].
Bayesian optimization and active learning methods are
another approach to active search [16], [17], [18]. Unfortunately, they mostly address single-agent systems, or if multi-agent they assume central coordination [19], [20] and except for [21] lack any realistic assumptions on sensing actions. Multi-agent asynchronous active search algorithms proposed in [9], [22] tackle several of these challenges but they are myopic in nature. Further, [23] introduced cost effective non-myopic active search but their simplified setting excludes simultaneous evaluation of multiple search points with different costs.
¹We assume a constant travelling speed and compute the Euclidean distance between locations $a$ and $b$.
Our active search formulation has close similarities with
planning under uncertainty using a Partially Observable Markov
Decision Process (POMDP) [24]. Monte Carlo Tree Search (MCTS) [25], [26] has found success as a generic online planning algorithm in large POMDPs [27], but is mostly limited to the single agent setting [28], [29].
Decentralized POMDP (Dec-POMDP) [30], [31] is another framework for decentralized active information gathering using multiple agents which is typically solved using offline, centralized planning followed by online, decentralized execution [32], [33]. Decentralized MCTS (Dec-MCTS) algorithms have also been proposed for multi-robot active perception under a cost budget [34], [35] but they typically rely on each agent
maintaining a joint probability distribution over its own belief
as well as those of the other agents.
Finally, cost aware active search can be viewed as a multi-
objective sequential decision making problem. [36] developed an MCTS algorithm for cost budgeted POMDPs using a scalarized version of the reward-cost trade-off, whereas [37] introduced multi-objective MCTS (MO-MCTS) for discovering global pareto-optimal decision sequences in the search tree. Unfortunately, MO-MCTS is computationally expensive and unsuitable for online planning. [38] proposed the Pareto MCTS
algorithm for multi-objective IPP but they ignore uncertainty
due to partial observability in the search space.
IV. OUR PROPOSED ALGORITHM: CAST
A. Background
We first briefly describe the concepts essential to the planning
and decision making components of our algorithm.
Monte Carlo Tree Search (MCTS)
is an online algorithm
that combines tree search with random sampling in a domain-
independent manner. In our setup, a cost-aware agent would
benefit from the ability to look ahead into the adaptive evolution of its belief about the distribution of targets in the environment
in response to possible observations from the actions it
might execute. We therefore consider MCTS as the basis for
developing our online planning method with finite horizon
lookahead. Unfortunately, the presence of uncertainty about
targets (number and location) in the unknown environment
together with the noisy observations introduces additional
challenges in our problem formulation.
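For reference, a generic UCT-style MCTS loop is sketched below (Python). This is not CAST itself; `step` and `rollout_value` stand in for a problem-specific simulator (e.g., sampling a hypothetical observation and updating the belief) and a rollout evaluation, both of which are assumptions made only for illustration.

```python
import math
import random

class Node:
    """Search tree node with UCT statistics."""
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value = [], 0, 0.0

def uct_select(node, c=1.4):
    # Child maximizing the UCB1 score (exploitation + exploration bonus).
    return max(node.children,
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts_plan(root_state, actions, step, rollout_value, iters=200, horizon=5):
    """Generic UCT: `step(state, action)` simulates a transition and
    `rollout_value(state)` estimates the return from `state`."""
    root = Node(root_state)
    for _ in range(iters):
        node, depth = root, 0
        # 1. Selection: descend while nodes are fully expanded.
        while node.children and len(node.children) == len(actions) and depth < horizon:
            node = uct_select(node)
            depth += 1
        # 2. Expansion: try one untried action if the horizon allows.
        if depth < horizon:
            tried = {ch.action for ch in node.children}
            a = random.choice([act for act in actions if act not in tried])
            node.children.append(Node(step(node.state, a), parent=node, action=a))
            node = node.children[-1]
        # 3. Simulation: evaluate the reached state.
        reward = rollout_value(node.state)
        # 4. Backpropagation: update statistics back up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most visited root action.
    return max(root.children, key=lambda ch: ch.visits).action
```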
Pareto optimality:
Our formulation of cost aware active
search described in Section II can be viewed as a multi-
objective sequential decision making problem. A common
approach to solving such multi-objective optimization problems
is scalarization, i.e., considering a weighted combination resulting in a single-objective problem that trades off the different objectives. However, tuning the weight attributed to each objective is challenging since the objectives may be quantities with different scales and units, and their relative importance might be
context dependent. In contrast, pareto optimization builds on
the idea that some solutions to the multi-objective optimization
problem are categorically worse than others and are dominated
by a set of pareto-optimal solution vectors forming a pareto front.
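As a concrete illustration of dominance (not our exact LCB-based criterion), the sketch below filters candidate actions with (expected reward, cost) values down to the pareto-optimal ones, maximizing reward while minimizing cost; the candidate values are made up for the example.

```python
def dominates(a, b):
    """a = (reward, cost) dominates b if a is no worse in both objectives
    (reward >= and cost <=) and strictly better in at least one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def pareto_front(candidates):
    """Candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# (expected reward, cost) for four candidate sensing actions:
actions = [(0.9, 5.0), (0.7, 2.0), (0.6, 4.0), (0.4, 1.0)]
print(pareto_front(actions))   # (0.6, 4.0) is dominated by (0.7, 2.0)
```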