
Most real-world problems have a strong component of partial
observability. It was therefore crucial for our framework to
support partially observable environments seamlessly, with
the guarantee that no information leaks among the agents. Our
framework provides a customizable network model for designing
complex relationships between the different agents in the
system, and a safe mechanism to ensure that only the specified
information is shared with the other agents, guaranteeing
true partial observability.
2.1.1 Network Model.
In Phantom, we model the relationships between agents in the
system as a network, or graph, where each vertex (node) represents
an agent and each edge represents an open line of communication
between two agents. One of our main desiderata for the framework
was the ability to support complex and dynamic connectivity
patterns between the agents. For this reason, we decided to treat
the network component as a first-class citizen of the framework.
The network can be seen as the physical layer through which
information is sent, which means that two agents are only able to
communicate if an edge exists between the two vertices
representing them. This property of the framework turns out to be
particularly powerful for expressing partial observability.
Because the network is a component in its own right, it can
encapsulate logic that updates the topology dynamically and
replicates real-world interactions as closely as possible. For
example, in a global currency (FX) market, agents might enter and
exit the market at different times depending on their time zone.
In a ride-sharing market, the connectivity of a customer to drivers
depends on geographical proximity, which might vary with time as
the agent moves. These examples require dynamic or stochastic
networks, which can be implemented in Phantom by extending a
well-defined network interface. Users can implement their logic in
a custom Network class and update the network topology at any point
during the experiments, with the guarantee that two agents will be
able to share information if and only if they are connected.
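The ride-sharing case above can be illustrated with a minimal, self-contained sketch. It does not use Phantom's actual network interface: the class ProximityNetwork, its update and is_connected methods, and the radius parameter are all hypothetical names chosen for illustration, and agents are assumed to live on a 2D plane.

```python
import math
from typing import Dict, FrozenSet, Set, Tuple


class ProximityNetwork:
    """Hypothetical network (not Phantom's API): a customer-driver edge
    exists only while the two agents are within `radius` of each other."""

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.customers: Dict[str, Tuple[float, float]] = {}
        self.drivers: Dict[str, Tuple[float, float]] = {}
        self.edges: Set[FrozenSet[str]] = set()

    def update(self) -> None:
        """Rebuild the topology from the agents' current positions."""
        self.edges = {
            frozenset((c, d))
            for c, c_pos in self.customers.items()
            for d, d_pos in self.drivers.items()
            if math.dist(c_pos, d_pos) <= self.radius
        }

    def is_connected(self, a: str, b: str) -> bool:
        """Two agents may share information only if an edge joins them."""
        return frozenset((a, b)) in self.edges


network = ProximityNetwork(radius=1.0)
network.customers["customer"] = (0.0, 0.0)
network.drivers["near_driver"] = (0.5, 0.5)
network.drivers["far_driver"] = (3.0, 4.0)
network.update()
near_before = network.is_connected("customer", "near_driver")  # within radius
far_before = network.is_connected("customer", "far_driver")    # out of range

# The far driver moves next to the customer and the topology follows.
network.drivers["far_driver"] = (0.0, 0.8)
network.update()
far_after = network.is_connected("customer", "far_driver")
```

In this sketch the topology is recomputed explicitly by calling update; when to trigger that call (every step, every episode, or on an event) is a design choice left to the user.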
As part of the framework, we provide two different implementations
of the network that already cover a range of use cases [1, 7, 14].
The first one is a static network where the connectivity between
the agents is defined upfront and remains static throughout
training and simulation. The second implementation, more robust, is
a stochastic network where each edge connecting two agents is
associated with a probability of existing. The network can be
‘re-sampled’ during RL training, between episodes, to yield a
new structure which can impact the behaviors of the agents in the
system (Figure 1). Adding stochasticity to the connectivity among
agents helps prevent the MARL algorithms from overfitting to a
specific network topology and is particularly useful for
generalizing the learned policies over a range of possible
connectivity patterns when the actual graph is not known a priori [1].
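As a rough illustration of re-sampling, the sketch below keeps each candidate edge with its associated probability every time the network is re-drawn. It is not Phantom's implementation: the class SampledNetwork and its resample method are hypothetical, and the sketch assumes that edges are sampled independently of one another, which the text does not state.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple


class SampledNetwork:
    """Hypothetical stochastic network (not Phantom's API): each edge
    exists with a given probability, drawn independently (an assumption)."""

    def __init__(self, edge_probs: Dict[Tuple[str, str], float], seed: int = 0) -> None:
        self.edge_probs = {frozenset(pair): p for pair, p in edge_probs.items()}
        self.rng = random.Random(seed)  # private generator, reproducible runs
        self.edges: Set[FrozenSet[str]] = set()
        self.resample()

    def resample(self) -> None:
        """Draw a new topology, keeping each edge with its probability."""
        self.edges = {
            edge for edge, p in self.edge_probs.items() if self.rng.random() < p
        }

    def is_connected(self, a: str, b: str) -> bool:
        return frozenset((a, b)) in self.edges


network = SampledNetwork({("A", "B"): 1.0, ("A", "C"): 0.5, ("B", "D"): 0.0})

instances: List[Set[FrozenSet[str]]] = []
for episode in range(5):
    network.resample()  # re-drawn between episodes, never within one
    instances.append(set(network.edges))
```

An edge with probability 1.0 appears in every instance and an edge with probability 0.0 in none, so a static network is the special case in which all probabilities are 0 or 1.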
2.1.2 Messages and Views.
In Phantom, we offer two mechanisms to share information with
a neighbouring agent. The first one, which we qualify as ‘active’,
takes the form of a Message intentionally sent at a time 𝑡 from one
agent to another. This active way of sharing information ensures
that the information has been consumed by the receiver of the
message.
A message is triggered by an event in the system, such as a new
time step or the reception of another message. The emission of a
new message is often associated with the decision-making process
that chooses the information the agent wants to share.

[Figure 1 panels: "Stochastic Network Definition" (agents A to F,
edges labeled with probabilities), "Sampling", and the resulting
"Network Instances".]
Figure 1: Overview of the Stochastic Network where each
edge is associated with a probability of connecting two
agents. The network can be sampled during the experiment
and yields different structures of the relationships between
the agents. This type of network can be used to model
complex systems with dynamic relationships.

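The active mechanism can be sketched as a small router that delivers a message only when an edge joins sender and receiver, and that lets the reception of one message trigger another. Everything here is an illustrative assumption rather than Phantom's API: the Message dataclass, the Mailroom class, its send and deliver_all methods, and the handler signature are hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, Deque, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    payload: Any


class Mailroom:
    """Hypothetical router (not Phantom's API): messages travel only
    along existing edges of the network."""

    def __init__(self, edges: Set[FrozenSet[str]]) -> None:
        self.edges = edges
        self.queue: Deque[Message] = deque()
        self.inbox: DefaultDict[str, List[Message]] = defaultdict(list)

    def send(self, sender: str, receiver: str, payload: Any) -> None:
        if frozenset((sender, receiver)) not in self.edges:
            raise PermissionError(f"no edge between {sender} and {receiver}")
        self.queue.append(Message(sender, receiver, payload))

    def deliver_all(
        self, handlers: Dict[str, Callable[["Mailroom", Message], None]]
    ) -> None:
        """Deliver queued messages; a handler may send further messages."""
        while self.queue:
            msg = self.queue.popleft()
            self.inbox[msg.receiver].append(msg)  # the receiver consumes it
            handler = handlers.get(msg.receiver)
            if handler is not None:
                handler(self, msg)


def seller_handler(room: Mailroom, msg: Message) -> None:
    # Receiving an order is the event that triggers the reply.
    room.send("seller", msg.sender, {"filled": msg.payload["size"]})


room = Mailroom({frozenset(("buyer", "seller"))})
room.send("buyer", "seller", {"size": 10})  # e.g. triggered by a new time step
room.deliver_all({"seller": seller_handler})

try:
    room.send("buyer", "outsider", {"size": 1})  # no edge: rejected
    blocked = False
except PermissionError:
    blocked = True
```

Rejecting the send at the router, rather than trusting each agent to check its own neighbors, keeps the no-leakage guarantee in one place.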
The second mechanism to share information, on the other hand,
is referred to as ‘passive’. The agent simply exposes specific
information for others to consume but does not actively send it.
We use Views to encapsulate the data to be shared. Each agent
generates a customized view for each of its neighbors with only the
information required. The views are regularly updated but no
notification is sent to the other agents. It is entirely up to
them to decide if and when they consume that information. The
collection of views from all the neighbors of a given agent
represents the context of that agent at a given time 𝑡 and can be
used to make decisions. Views are particularly useful when the data
exposed changes frequently and does not necessarily require an
action from the other agent: instead of sending a message for every
change, the View is updated with little processing from the
system, leading to higher overall performance.
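The passive mechanism can be sketched as follows, assuming a dealer that quotes a different price to each neighbor while keeping its inventory private. The names PriceView, Dealer, view and context are hypothetical and only illustrate the idea; they are not Phantom's classes. Prices are integers to keep the arithmetic exact.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PriceView:
    """Hypothetical view: only the field this neighbor is allowed to see."""
    price: int


class Dealer:
    def __init__(self, mid_price: int, spreads: Dict[str, int]) -> None:
        self.mid_price = mid_price
        self.inventory = 100    # private state, never placed in a view
        self.spreads = spreads  # neighbor id -> spread quoted to that neighbor

    def view(self, neighbor_id: str) -> PriceView:
        """Build the view customized for one neighbor."""
        return PriceView(price=self.mid_price + self.spreads[neighbor_id])


def context(agent_id: str, neighbors: Dict[str, Dealer]) -> Dict[str, PriceView]:
    """Collect the views exposed to `agent_id` by all of its neighbors.
    The agent pulls them when it chooses to; nothing is pushed to it."""
    return {nid: neighbor.view(agent_id) for nid, neighbor in neighbors.items()}


dealer = Dealer(mid_price=110, spreads={"client_a": 1, "client_b": 2})
ctx_a = context("client_a", {"dealer": dealer})

dealer.mid_price = 120  # the state changes, but no message is sent
ctx_a_later = context("client_a", {"dealer": dealer})
ctx_b = context("client_b", {"dealer": dealer})
```

Here the two clients read different prices from the same dealer at the same time, and neither can read the inventory, because it is never copied into a view.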
Unlike some other frameworks, which use message buses to
expose an agent’s state to the other participants in the system,
Phantom enforces that communication occurs only through the
edges of the underlying network characterizing the connectivity
between agents. In effect, Messages and Views can only be shared
with an agent’s neighbors, that is, when there exists an edge
connecting the two nodes representing the agents. This aspect of
the framework, together with the ability to have different Views
for different neighbors, is designed to easily encode the partial
observability associated with many real-world problems.
Implementing such a property in a subscription-based model over a
message bus would require ad hoc validation logic to evaluate
whether an agent can subscribe to a
particular topic. The direct use of the underlying network to pass