Phantom - A RL-driven multi-agent framework to model complex systems


Leo Ardon

J.P. Morgan AI Research

leo.ardon@jpmorgan.com

Jared Vann

J.P. Morgan AI Research

jared.vann@jpmorgan.com

Deepeka Garg

J.P. Morgan AI Research

deepeka.garg@jpmorgan.com

Thomas Spooner

Sutter Hill Ventures

spooner10000@gmail.com

Sumitra Ganesh

J.P. Morgan AI Research

sumitra.ganesh@jpmorgan.com

ABSTRACT

Agent-based modeling (ABM) is a computational approach to modeling complex systems by specifying the behavior of autonomous decision-making components or agents in the system and allowing the system dynamics to emerge from their interactions. Recent advances in the field of multi-agent reinforcement learning (MARL) have made it feasible to study the equilibrium of complex environments where multiple agents learn simultaneously. However, most ABM frameworks are not RL-native, in that they do not offer concepts and interfaces that are compatible with the use of MARL to learn agent behaviors. In this paper, we introduce a new open-source framework, Phantom, to bridge the gap between ABM and MARL. Phantom is an RL-driven framework for agent-based modeling of complex multi-agent systems including, but not limited to, economic systems and markets. The framework aims to provide the tools to simplify the ABM specification in a MARL-compatible way, including features to encode dynamic partial observability, agent utility functions, heterogeneity in agent preferences or types, and constraints on the order in which agents can act (e.g. Stackelberg games, or more complex turn-taking environments). We present these features and their design rationale, along with two new environments that leverage the framework.

KEYWORDS

Reinforcement Learning, Agent-based Model, Multi-agent, Simulation Framework

1 INTRODUCTION

Agent-based modeling (ABM) is a paradigm for modeling complex systems in a bottom-up manner by specifying the behavior of the autonomous decision-making components in the system (the agents) and allowing the system dynamics to emerge from their interactions. Like the real-world counterparts they seek to model, agents assess the state of the world and make decisions that affect the rest of the system, inducing the emergence of non-trivial phenomena. ABM offers several advantages over the traditional differential-equation modeling often used to study system dynamics. First, the description of problems is more natural because the real world is composed of autonomous entities. Second, it offers flexibility in the way the agents are modeled, with the option to replicate the heterogeneity of behaviors observed in real life.

Recent advances in the eld of Reinforcement Learning (RL) have

brought another dimension to the study of complex multi-agent

systems with the introduction of an autonomous learning compo-

nent to the ABM paradigm. This line of research seeks to study the

equilibrium of such non-stationary environments where multiple

agents learn at the same time, by playing against or with each other.

Multi-agents reinforcement learning (MARL) techniques have been

applied to autonomous vehicles, cooperative agents systems and

trading simulators [4].

However, most frameworks for agent-based modeling are not RL-native, in that they do not offer concepts and interfaces that are compatible with the use of MARL to learn agent behaviors in a specified ABM. Our goal with Phantom is to bridge the gap between ABMs and MARL. Phantom is an RL-driven framework for agent-based modeling of complex multi-agent systems such as economic systems and markets. It leverages the power of MARL to automatically learn agent behaviors or policies, and the equilibria of complex general-sum games. To enable this, the framework provides tools to specify the ABM in MARL-compatible terms, including features to encode dynamic partial observability, agent utility / reward functions, heterogeneity in agent preferences or types, and constraints on the order in which agents can act.

In this paper, we elaborate on the architecture and design of the Phantom framework and provide details about the main features and their rationale. Finally, we show how this framework can be used to model complex environments, from markets such as the digital ads market to operational problems in a supply chain. We also evaluate the scalability of the proposed framework by running experiments involving a high number of agents.

2 PRINCIPAL FEATURES

2.1 Partial Observability

The agents in an ABM interact by sharing information with each other, which can affect their behavior and eventually lead to uncovering interesting phenomena. However, in many real-world applications not all the information shared across the system is available for all the agents to consume. For example, a bidder entering an auction does not know how much its competitors are willing to bid, a market maker might only be able to observe the pricing inquiries it receives and its own transactions, and a customer using a ride-sharing app might only see local drivers.

We provide the code in the supplementary materials and will open source the

framework.

arXiv:2210.06012v3 [cs.AI] 19 May 2023

Most real-world problems have a strong component of partial observability. It was therefore crucial for our framework to support partially observable environments seamlessly, with the guarantee that no information leaks between agents. Our framework provides a customizable network model to design complex relationships between the different agents in the system, and a safe mechanism to ensure that only the specified information is shared with the other agents, guaranteeing true partial observability.

2.1.1 Network Model.

In Phantom, we model the relationships between agents in the system as a network or graph where each vertex / node represents an agent and each edge represents an open line of communication between two agents. One of our main desiderata for the framework was the ability to support complex and dynamic connectivity patterns between the agents. For this reason, we decided to treat the network component as a first-class citizen of the framework. The network can be seen as the physical layer over which information is sent, which means that two agents can only communicate if an edge exists between the two vertices representing them. This property of the framework turns out to be particularly powerful for expressing partial observability.

Because the network is a component in its own right, it can encapsulate logic that updates the network dynamically and replicates real-world interactions as closely as possible. For example, in a global currency (FX) market, agents might enter and exit the market at different times depending on their time zone. In a ride-sharing market, the connectivity of a customer to drivers depends on geographical proximity, which might vary with time as the agent moves. These examples require dynamic or stochastic networks, which can be implemented in Phantom by extending a well-defined network interface. Users can implement their logic in a custom Network class and update the network topology at any point during the experiments, with the guarantee that two agents will be able to share information if and only if they are connected.
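To make the idea of a dynamic network concrete, here is a minimal, self-contained Python sketch of the ride-sharing example. It is not Phantom's actual Network interface: the class names `SketchNetwork` and `ProximityNetwork`, their methods, and the `radius` parameter are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, FrozenSet, Set, Tuple

Position = Tuple[float, float]


class SketchNetwork:
    """Hypothetical undirected agent graph (not Phantom's Network class)."""

    def __init__(self) -> None:
        self._neighbours: Dict[str, Set[str]] = {}

    def add_agent(self, agent_id: str) -> None:
        self._neighbours.setdefault(agent_id, set())

    def connect(self, a: str, b: str) -> None:
        self.add_agent(a)
        self.add_agent(b)
        self._neighbours[a].add(b)
        self._neighbours[b].add(a)

    def disconnect(self, a: str, b: str) -> None:
        self._neighbours.get(a, set()).discard(b)
        self._neighbours.get(b, set()).discard(a)

    def can_communicate(self, a: str, b: str) -> bool:
        # Two agents may share information if and only if an edge exists.
        return b in self._neighbours.get(a, set())

    def neighbours(self, agent_id: str) -> FrozenSet[str]:
        return frozenset(self._neighbours.get(agent_id, set()))


class ProximityNetwork(SketchNetwork):
    """Assumed ride-sharing rule: a customer sees drivers within `radius`."""

    def __init__(self, radius: float) -> None:
        super().__init__()
        self.radius = radius

    def update(
        self, customers: Dict[str, Position], drivers: Dict[str, Position]
    ) -> None:
        # Recompute every customer-driver edge from the current positions.
        for agent_id in list(customers) + list(drivers):
            self.add_agent(agent_id)
        for c, (cx, cy) in customers.items():
            for d, (dx, dy) in drivers.items():
                if math.hypot(cx - dx, cy - dy) <= self.radius:
                    self.connect(c, d)
                else:
                    self.disconnect(c, d)
```

Calling `update` whenever positions change keeps the topology in step with the simulated world, and every communication check goes through `can_communicate`, so connectivity remains the single source of truth for who can talk to whom.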

As part of the framework, we provide two different implementations of the network that already cover a range of use cases [
The first one is a static network, where the connectivity between the agents is defined upfront and remains static throughout training and simulation. The second implementation, more robust, is a stochastic network, where each edge connecting two agents is associated with a probability of existing. The network can be 're-sampled' between episodes during RL training to yield a new structure, which can impact the behaviors of the agents in the system (Figure 1). Adding stochasticity to the connectivity among agents helps prevent the MARL algorithms from overfitting to a specific network topology and is particularly useful for generalizing the learned policies over a range of possible connectivity patterns when the actual graph is not known a priori [1].
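The re-sampling step can be sketched in a few lines of Python. This is an illustrative sketch only, assuming that each edge is kept independently with its own probability; the function name `sample_edges` and the dictionary layout are hypothetical and are not taken from Phantom.

```python
import random
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[str, str]


def sample_edges(
    edge_probs: Dict[Edge, float], rng: random.Random
) -> Set[FrozenSet[str]]:
    """Draw one network instance from a stochastic network definition.

    Assumption: each edge is kept independently with its own probability.
    """
    return {
        frozenset(edge)
        for edge, prob in edge_probs.items()
        if rng.random() < prob
    }


def sample_per_episode(
    edge_probs: Dict[Edge, float], n_episodes: int, seed: int = 0
) -> list:
    # One fresh topology per training episode, so a learned policy
    # is exposed to a range of connectivity patterns.
    rng = random.Random(seed)
    return [sample_edges(edge_probs, rng) for _ in range(n_episodes)]
```

Because `random.Random.random()` returns values in [0, 1), an edge with probability 1.0 is always present and an edge with probability 0.0 never is, so a static network is the special case in which every probability is 0 or 1.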

2.1.2 Messages and Views.

In Phantom, we oer two mechanisms to share information with

a neighbour agent. The rst one, which we qualify as ‘active‘, take

the form of a Message intentionally sent at a time

𝑡

from one agent

to another. This active way of sharing information ensures that

the information has been consumed by the receiver of the message.

A message is triggered by an event in the system, such as a new

Sampling

0.5

0.8

0.2

1.0

0.6

...

Stochastic Network Definition Network Instances

Figure 1: Overview of the Stochastic Network where each

edge is associated with a probability of connecting two

agents. The network can be sampled during the experiment

and yields dierent structures of the relationships between

the agents. This type of network can be used to model

complex system with dynamic relationships.

time step or the reception of another message. The emission of a

new message is often associated with the decision making process

choosing the information the agent wants to share.

The second mechanism to share information, by contrast, is referred to as 'passive'. The agent simply exposes specific information for others to consume but does not actively send it. We use Views to encapsulate the data to be shared. Each agent generates a customized view for each of its neighbors containing only the information required. The views are regularly updated but no notification is sent to the other agents. It is entirely up to them to decide if and when they consume that information. The collection of views from all the neighbors of a given agent represents the context of that agent at a given time 𝑡 and can be used to make decisions. Views are particularly useful when the exposed data changes frequently and does not necessarily require an action from the other agent: instead of sending a message for every change, the View is updated without much processing from the system, leading to higher overall performance.
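The difference between the two mechanisms can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Phantom's implementation: `SketchAgent`, `connect`, `send`, `view_for` and the `visible_to` mapping are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple


@dataclass
class SketchAgent:
    """Hypothetical agent holding private state, an inbox and neighbours."""

    agent_id: str
    state: Dict[str, Any] = field(default_factory=dict)
    neighbours: Set[str] = field(default_factory=set)
    inbox: List[Tuple[str, Any]] = field(default_factory=list)
    # For each neighbour, the state keys that neighbour is allowed to see.
    visible_to: Dict[str, Set[str]] = field(default_factory=dict)

    def view_for(self, neighbour_id: str) -> Dict[str, Any]:
        # Passive sharing: the neighbour reads a tailored snapshot
        # whenever it chooses; nothing is pushed to it.
        if neighbour_id not in self.neighbours:
            raise PermissionError(f"{neighbour_id} is not a neighbour")
        allowed = self.visible_to.get(neighbour_id, set())
        return {k: v for k, v in self.state.items() if k in allowed}


def connect(a: SketchAgent, b: SketchAgent) -> None:
    a.neighbours.add(b.agent_id)
    b.neighbours.add(a.agent_id)


def send(sender: SketchAgent, receiver: SketchAgent, payload: Any) -> None:
    # Active sharing: the payload is delivered to the receiver's inbox,
    # but only along an existing edge.
    if receiver.agent_id not in sender.neighbours:
        raise PermissionError("no edge between sender and receiver")
    receiver.inbox.append((sender.agent_id, payload))
```

In this sketch a market maker could expose its quote to a client through `view_for` while keeping its inventory hidden, and the client could respond with an order through `send`; an unconnected agent gets a `PermissionError` from either path.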

Unlike some other frameworks that use message buses to expose an agent’s state to the other participants in the system, Phantom requires all communication to occur through the edges of the underlying network that characterizes the connectivity between agents. In effect, an agent can share Messages and Views only with its neighbors, that is, the agents to which it is connected by an edge. This aspect of the framework, together with the ability to expose different Views to different neighbors, is designed to make it easy to encode the partial observability associated with many real-world problems. Implementing such a property in a subscription-based model over a message bus would require ad-hoc validation logic to evaluate whether an agent can subscribe to a particular topic. The direct use of the underlying network to pass
