Phantom - A RL-driven multi-agent framework to model complex systems

Leo Ardon
J.P. Morgan AI Research
leo.ardon@jpmorgan.com
Jared Vann
J.P. Morgan AI Research
jared.vann@jpmorgan.com
Deepeka Garg
J.P. Morgan AI Research
deepeka.garg@jpmorgan.com
Thomas Spooner
Sutter Hill Ventures
spooner10000@gmail.com
Sumitra Ganesh
J.P. Morgan AI Research
sumitra.ganesh@jpmorgan.com
ABSTRACT
Agent-based modeling (ABM) is a computational approach to modeling
complex systems by specifying the behavior of autonomous
decision-making components or agents in the system and allowing
the system dynamics to emerge from their interactions. Recent
advances in the field of multi-agent reinforcement learning (MARL)
have made it feasible to study the equilibrium of complex envi-
ronments where multiple agents learn simultaneously. However,
most ABM frameworks are not RL-native, in that they do not offer
concepts and interfaces that are compatible with the use of MARL
to learn agent behaviors. In this paper, we introduce a new open-
source framework, Phantom, to bridge the gap between ABM and
MARL. Phantom is an RL-driven framework for agent-based modeling
of complex multi-agent systems including, but not limited to,
economic systems and markets. The framework aims to provide the
tools to simplify the ABM specification in a MARL-compatible way,
including features to encode dynamic partial observability, agent
utility functions, heterogeneity in agent preferences or types, and
constraints on the order in which agents can act (e.g. Stackelberg
games, or more complex turn-taking environments). In this paper,
we present these features and their design rationale, and introduce
two new environments leveraging the framework.
KEYWORDS
Reinforcement Learning, Agent-based Model, Multi-agent, Simula-
tion Framework
1 INTRODUCTION
Agent-based modeling (ABM) is a paradigm to model complex
systems in a bottom-up manner by specifying the behavior of the
autonomous decision-making components in the system (the agents)
and allowing the system dynamics to emerge from their interactions.
Like the real-world counterparts they seek to model, agents assess
the state of the world and make decisions that affect the rest of
the system, inducing the emergence of non-trivial phenomena. ABM
offers several advantages over the traditional differential-equation
modeling often used to study system dynamics. First, the description
of problems is more natural because the real world is composed of
autonomous entities. Second, it offers flexibility in the way the
agents are modeled, with the option to replicate the heterogeneity
of behaviors observed in real life.
Recent advances in the field of Reinforcement Learning (RL) have
brought another dimension to the study of complex multi-agent
systems with the introduction of an autonomous learning compo-
nent to the ABM paradigm. This line of research seeks to study the
equilibrium of such non-stationary environments where multiple
agents learn at the same time, by playing against or with each other.
Multi-agent reinforcement learning (MARL) techniques have been
applied to autonomous vehicles, cooperative agent systems and
trading simulators [4].
However, most frameworks for agent-based modeling are not
RL-native, in that they do not offer concepts and interfaces that
are compatible with the use of MARL to learn agent behaviors
in a specified ABM. Our goal with Phantom is to bridge the gap
between ABMs and MARL. Phantom is an RL-driven framework
for agent-based modeling of complex multi-agent systems such as
economic systems and markets. It leverages the power of MARL to
automatically learn agent behaviors or policies, and the equilibria
of complex general-sum games. To enable this, the framework
provides tools to specify the ABM in MARL-compatible terms -
including features to encode dynamic partial observability, agent
utility / reward functions, heterogeneity in agent preferences or
types, and constraints on the order in which agents can act.
In this paper, we elaborate on the architecture and design of
the Phantom framework and provide details about the main features
and their rationale.¹ Finally, we show how this framework
can be used to model complex environments such as markets (for
example the digital ads market) or operational problems in a
supply chain. We also evaluate the scalability of the proposed
framework by running experiments involving a large number of
agents.
2 PRINCIPAL FEATURES
2.1 Partial Observability
The agents in an ABM interact by sharing information with each
other, which can affect their behavior and eventually lead to
uncovering interesting phenomena. However, in many real-world
applications not all the information shared across the system is
available for all the agents to consume, e.g. a bidder entering an
auction does not know how much its competitors are willing to bid,
a market maker might only be able to observe the pricing inquiries
it receives and its own transactions, a customer using a ride-sharing
app might only see local drivers.
¹ We provide the code in the supplementary materials and will open source the
framework.
arXiv:2210.06012v3 [cs.AI] 19 May 2023
Most real-world problems have a strong component of partial
observability and it was therefore crucial for our framework to
support partially observable environments seamlessly and with
the guarantee that there will be no information leakage among
the agents. In our framework, we propose a customizable network
model to design complex relationships between the different agents
in the system, and we offer a safe mechanism to ensure that only the
specified information is shared with the other agents, guaranteeing
true partial observability.
2.1.1 Network Model.
In Phantom, we model the relationship between agents in the
system as a network or graph where each vertex / node represents
an agent and each edge represents an open line of communication
between two agents. One of our main desiderata for the framework
was the ability to support complex and dynamic connectivity pat-
terns between the agents. For this reason, we decided to treat the
network component as a first-class citizen of the framework. The
network can be seen as the physical layer through which information
is sent, which means that two agents will only be
able to communicate if an edge exists between the two vertices
representing them. This property of the framework turns out to be
particularly powerful to express partial observability.
Because the network is a component in its own right, it can
encapsulate logic to update the network dynamically and replicate
real-world interactions as closely as possible. For example,
in a global currency (FX) market, agents might enter and exit the
market at different times depending on their time zone. In a
ride-sharing market, the connectivity of a customer to drivers depends
on geographical proximity, which might vary with time as the agent
moves. These examples require dynamic or stochastic networks,
which can be implemented in Phantom by extending a well-defined
network interface. Users can implement their logic in a custom
Network class and update the network topology at any point during
the experiments, with the guarantee that two agents will be able to
share information if and only if they are connected.
As part of the framework, we provide two different implementations
of the network that already cover a range of use cases [1, 7, 14].
The first one is a static network, where the connectivity between
the agents is defined upfront and remains static throughout training
and simulation. The second implementation, more robust, is
a stochastic network, where each edge connecting two agents
is associated with a probability of existing. The network can be
're-sampled' between episodes during RL training to yield a
new structure, which can impact the behaviors of the agents in the
system (Figure 1). Adding stochasticity to the connectivity among
agents helps prevent the MARL algorithms from overfitting to a
specific network topology and is particularly useful to generalize
the learned policies over a range of possible connectivity patterns
when the actual graph is not known a priori [1].
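
To make the re-sampling idea concrete, the following is a minimal sketch of a stochastic network, not Phantom's actual API. The class name StochasticNetworkSketch and its methods are hypothetical, and the sketch assumes undirected edges that are drawn independently of one another.

```python
import random
from typing import Dict, FrozenSet, Optional, Set, Tuple

Edge = Tuple[str, str]


class StochasticNetworkSketch:
    """Hypothetical stand-in for a stochastic network (not Phantom's API)."""

    def __init__(self, edge_probs: Dict[Edge, float], seed: Optional[int] = None) -> None:
        for p in edge_probs.values():
            if not 0.0 <= p <= 1.0:
                raise ValueError("edge probabilities must lie in [0, 1]")
        # Assumption: edges are undirected, so store them as unordered pairs.
        self._edge_probs: Dict[FrozenSet[str], float] = {
            frozenset(edge): p for edge, p in edge_probs.items()
        }
        self._rng = random.Random(seed)
        self._active: Set[FrozenSet[str]] = set()
        self.resample()

    def resample(self) -> None:
        """Draw a new topology; each edge exists independently with its probability."""
        self._active = {
            edge for edge, p in self._edge_probs.items() if self._rng.random() < p
        }

    def is_connected(self, a: str, b: str) -> bool:
        return frozenset((a, b)) in self._active

    def neighbors(self, agent: str) -> Set[str]:
        return {
            other
            for edge in self._active
            if agent in edge
            for other in edge
            if other != agent
        }


# Re-sample between episodes, as described above.
net = StochasticNetworkSketch(
    {("A", "B"): 1.0, ("A", "C"): 0.0, ("B", "C"): 0.5}, seed=0
)
for episode in range(3):
    net.resample()
    print(episode, sorted(net.neighbors("B")))
```

In this sketch an edge with probability 1.0 is always present and an edge with probability 0.0 never is, so a static network is the special case where every probability is 0 or 1.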
2.1.2 Messages and Views.
In Phantom, we offer two mechanisms to share information with
a neighboring agent. The first one, which we qualify as 'active', takes
the form of a Message intentionally sent at a time 𝑡 from one agent
to another. This active way of sharing information ensures that
the information has been consumed by the receiver of the message.
A message is triggered by an event in the system, such as a new
time step or the reception of another message. The emission of a
new message is often associated with the decision-making process
choosing the information the agent wants to share.

[Figure 1 image: a "Stochastic Network Definition" over agents A to F with edge probabilities 0.5, 0.8, 0.2, 1.0 and 0.6, and the "Network Instances" obtained by sampling it.]
Figure 1: Overview of the Stochastic Network where each
edge is associated with a probability of connecting two
agents. The network can be sampled during the experiment
and yields different structures of the relationships between
the agents. This type of network can be used to model
complex systems with dynamic relationships.
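
The active mechanism can be illustrated with a small sketch in which the reception of one message triggers the emission of another, and a message is only accepted when an edge links sender and receiver. All names here (Message, Agent, Router, Buyer, Seller) are hypothetical illustrations and are not taken from Phantom's code base.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict, FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    payload: Any


class Agent:
    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id

    def handle(self, message: Message) -> List[Message]:
        """React to a received message; return any messages to send in response."""
        return []


class Seller(Agent):
    def handle(self, message: Message) -> List[Message]:
        # Receiving a price request is the event that triggers a reply.
        if message.payload == "price?":
            return [Message(self.agent_id, message.sender, 10.0)]
        return []


class Buyer(Agent):
    def __init__(self, agent_id: str) -> None:
        super().__init__(agent_id)
        self.quotes: List[float] = []

    def handle(self, message: Message) -> List[Message]:
        self.quotes.append(message.payload)
        return []


class Router:
    """Delivers messages, but only along existing (undirected) edges."""

    def __init__(self, agents: Iterable[Agent], edges: Iterable[Tuple[str, str]]) -> None:
        self.agents: Dict[str, Agent] = {a.agent_id: a for a in agents}
        self.edges: Set[FrozenSet[str]] = {frozenset(e) for e in edges}
        self.queue: Deque[Message] = deque()

    def send(self, message: Message) -> None:
        if frozenset((message.sender, message.receiver)) not in self.edges:
            raise PermissionError(
                f"no edge between {message.sender} and {message.receiver}"
            )
        self.queue.append(message)

    def resolve(self) -> int:
        """Deliver queued messages until none remain; return how many were delivered."""
        delivered = 0
        while self.queue:
            message = self.queue.popleft()
            for reply in self.agents[message.receiver].handle(message):
                self.send(reply)
            delivered += 1
        return delivered


buyer = Buyer("buyer")
router = Router(
    [buyer, Seller("seller"), Seller("stranger")], edges=[("buyer", "seller")]
)
router.send(Message("buyer", "seller", "price?"))
delivered = router.resolve()
print(delivered, buyer.quotes)  # prints: 2 [10.0]
```

Trying to send a message from "buyer" to "stranger" raises PermissionError in this sketch, because no edge connects the two agents.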
The second mechanism to share information, on the other hand,
is referred to as 'passive'. The agent simply exposes specific
information for others to consume but does not actively send it. We use
Views to encapsulate the data to be shared. Each agent generates a
customized view for each of its neighbors with only the information
required. The views are regularly updated but no notification is
sent to the other agents. It is entirely up to them to decide if and
when they consume that information. The collection of views from
all the neighbors of a given agent represents the context of that
agent at a given time 𝑡 and can be used to make decisions. Views are
particularly useful when the data exposed changes frequently and
does not necessarily require an action from the other agent: instead
of sending a message for every change, the View is updated
with little processing by the system, leading to higher overall
performance.
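
The passive mechanism can be sketched in the same spirit. In the hypothetical example below, loosely based on the ride-sharing setting, a driver exposes a different view to each neighbor and an observer builds its context by pulling those views when it chooses to. The names DriverView, Driver and build_context are assumptions made for illustration only, not Phantom's API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class DriverView:
    """Read-only snapshot a driver exposes to one specific neighbor."""

    available: bool
    price_per_km: Optional[float]  # None when hidden from this neighbor


class Driver:
    def __init__(self, driver_id: str, price_per_km: float, customers: Set[str]) -> None:
        self.driver_id = driver_id
        self.available = True
        self.price_per_km = price_per_km
        # Neighbors that are allowed to see the price (assumed for this example).
        self._customers = customers

    def view(self, neighbor_id: str) -> DriverView:
        # The view is customized per neighbor: only customers see the price.
        price = self.price_per_km if neighbor_id in self._customers else None
        return DriverView(self.available, price)


def build_context(observer_id: str, neighbors: Iterable[Driver]) -> Dict[str, DriverView]:
    """Collect the views that each neighbor exposes to the observer."""
    return {d.driver_id: d.view(observer_id) for d in neighbors}


d1 = Driver("d1", 1.5, customers={"alice"})
d2 = Driver("d2", 2.0, customers={"alice"})

print(build_context("alice", [d1, d2]))

# State changes require no message; the next view simply reflects them.
d1.available = False
print(build_context("alice", [d1, d2])["d1"])

# A competing driver sees availability but not the price.
print(build_context("d2", [d1])["d1"])
```

The design choice this is meant to show is that nothing is pushed: the observer decides when to read, and each read reflects whatever the neighbor currently exposes to that particular observer.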
As opposed to some other frameworks using message buses to
expose an agent’s state to the other participants in the system,
Phantom enforces the communication to only occur through the
edges of the underlying network characterizing the connectivity
between agents. In effect, Messages and Views can only be shared
with an agent’s neighbors when there exists an edge connecting the
two nodes representing the agents. This aspect of the framework,
as well as the ability to have different Views for different neighbors,
is designed to easily encode the partial observability associated
with many real-world problems. Implementing such a property in a
subscription-based model over a message bus would require ad-hoc
validation logic to evaluate whether an agent can subscribe to a
particular topic. The direct use of the underlying network to pass