Preprint
Entity Divider with Language Grounding in Multi-Agent
Reinforcement Learning
Ziluo Ding1† Wanpeng Zhang1† Junpeng Yue2 Xiangjun Wang3 Tiejun Huang1 Zongqing Lu1
1Peking University 2Tianjin University 3inspir.ai
Abstract
We investigate the use of natural language to drive the generalization of policies
in multi-agent settings. Unlike single-agent settings, the generalization of policies
should also consider the influence of other agents. Besides, with the increas-
ing number of entities in multi-agent settings, more agent-entity interactions are
needed for language grounding, and the enormous search space could impede the
learning process. Moreover, given a simple general instruction, e.g., beating all
enemies, agents are required to decompose it into multiple subgoals and figure out
the right one to focus on. Inspired by previous work, we try to address these is-
sues at the entity level and propose a novel framework for language grounding in
multi-agent reinforcement learning, entity divider (EnDi). EnDi enables agents to
independently learn subgoal division at the entity level and act in the environment
based on the associated entities. The subgoal division is regularized by opponent
modeling to avoid subgoal conflicts and promote coordinated strategies. Empirically, EnDi demonstrates strong generalization to unseen games with new dynamics and shows superiority over existing methods.
1 Introduction
The generalization of reinforcement learning (RL) agents to new environments is challenging, even
to environments slightly different from those seen during training (Finn et al.,2017). Recently,
language grounding has been proven to be an effective way to grant RL agents the generalization
ability (Zhong et al.,2019;Hanjie et al.,2021). By relating the dynamics of the environment with
the text manual specifying the environment dynamics at the entity level, the language-based agent
can adapt to new settings with unseen entities or dynamics. In addition, language-based RL provides
a framework for enabling agents to reach user-specified goal states described by natural language
(Küttler et al., 2020; Tellex et al., 2020; Branavan et al., 2012). Language descriptions can express abstract goals as sets of constraints on states and drive generalization.
However, multi-agent settings introduce additional difficulties. First, the policies of others also affect the dynamics of the environment, leading to non-stationarity. Therefore, the generalization of policies should also consider the influence of others. Second, as the number of entities in multi-agent settings increases, so does the number of agent-entity interactions needed for language grounding, and the enormous search space could impede the learning process. Third, it is sometimes unrealistic to give detailed instructions telling each agent exactly what to do. Instead, a simple goal instruction, e.g., beating all enemies or collecting all the treasures, is more convenient and effective. Therefore, agents need to learn subgoal division and cultivate coordinated strategies based on a single general instruction.
The key to generalization in previous works (Zhong et al., 2019; Hanjie et al., 2021) is grounding language to dynamics at the entity level. By doing so, agents can reason over the dynamic rules of all the entities in the environment. Since the dynamics of individual entities are the basic components of the environment dynamics, such language grounding is invariant to a new distribution of dynamics or tasks, making the generalization more reliable. Inspired by this, in multi-agent settings,
the influence of policies of others should also be reflected at the entity level for better generalization.
†Equal contribution
In addition, after jointly grounding the text manual and the language-based goal (goal instruction) to environment entities, each entity is associated with explicit dynamic rules and a relationship with the goal state. Thus, the grounded entities can also be utilized to form better
strategies.
We present two goal-based multi-agent environments based on two previous single-agent settings,
i.e., MESSENGER (Hanjie et al.,2021) and RTFM (Zhong et al.,2019), which require generaliza-
tion to new dynamics (i.e., how entities behave), entity references, and partially observable envi-
ronments. Agents are given a document that specifies environment dynamics and a language-based
goal. Note that one goal may contain multiple subgoals; thus, a single agent may struggle to complete it or be unable to do so. In particular, after identifying relevant information in the language descriptions,
agents need to decompose the general goal into many subgoals and figure out the optimal subgoal di-
vision strategy. Note that we focus on interactive environments that are easily converted to symbolic
representations, instead of raw visual observations, for efficiency, interpretability, and emphasis on
abstractions over perception.
In this paper, we propose a novel framework for language grounding in multi-agent reinforcement
learning (MARL), entity divider (EnDi), to enable agents to independently learn subgoal division
strategies at the entity level. Specifically, each EnDi agent first generates a language-based rep-
resentation for the environment and decomposes the goal instruction into two subgoals: self and
others. Note that the subgoal can be described at the entity level since language descriptions have
given the explicit relationship between the goal and all entities. Then, the EnDi agent acts in the
environment based on the associated entities of the self subgoal. To consider the influence of oth-
ers, the EnDi agent has two policy heads. One is to interact with the environment, and another is
for opponent modeling. The EnDi agent is trained end-to-end, using reinforcement learning and supervised learning for the two policy heads, respectively. The gradient signal from the supervised opponent-modeling loss is used to regularize the subgoal division of others.
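To make the two-head design concrete, the following is a minimal PyTorch sketch of such an agent. The module sizes, the linear encoder standing in for the grounding module, and all names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class TwoHeadAgent(nn.Module):
    """Illustrative agent with a self-policy head (trained by RL) and an
    opponent-modeling head (trained by supervised learning) on top of a
    shared grounded representation."""

    def __init__(self, repr_dim: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Linear(repr_dim, 128)            # stand-in for the grounding module
        self.self_policy = nn.Linear(128, n_actions)       # acts in the environment
        self.opponent_policy = nn.Linear(128, n_actions)   # predicts others' actions

    def forward(self, grounded_repr: torch.Tensor):
        h = torch.relu(self.encoder(grounded_repr))
        return self.self_policy(h), self.opponent_policy(h)


agent = TwoHeadAgent(repr_dim=64, n_actions=5)
self_logits, opp_logits = agent(torch.randn(1, 64))
# In training, a policy-gradient loss would be applied to self_logits, and a
# cross-entropy loss against the observed actions of other agents to opp_logits.
```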
Our framework is the first attempt to address the challenges of grounding language for generalization
to unseen dynamics in multi-agent settings. EnDi can be instantiated on many existing language
grounding modules and is currently built and evaluated in two multi-agent environments mentioned
above. Empirically, we demonstrate that EnDi outperforms existing language-based methods in all
tasks by a large margin. Importantly, EnDi also exhibits the best generalization ability on unseen games, i.e., zero-shot transfer. Through ablation studies, we verify the effectiveness of each component and show that EnDi indeed obtains coordinated subgoal division strategies via opponent modeling. We also
argue that many language grounding problems can be addressed at the entity level.
2 Related Work
Language grounded policy-learning. Language grounding refers to learning the meaning of nat-
ural language units, e.g., utterances, phrases, or words, by leveraging the non-linguistic context. In
many previous works (Wang et al., 2019; Blukis et al., 2019; Janner et al., 2018; Küttler et al., 2020; Tellex et al., 2020; Branavan et al., 2012), the text conveys the goal or instruction to the agent, and
the agent produces behaviors in response after the language grounding. Thus, it encourages a strong
connection between the given instruction and the policy.
More recently, many works have explored generalization from different perspectives. Hill et al. (2020a; 2019; 2020b) investigated generalization regarding novel entity combinations, from synthetic template commands to natural instructions given by humans, and to varying numbers of objects. Choi et al. (2021) proposed a language-guided policy learning algorithm, enabling agents to learn new tasks quickly from language corrections. In addition, Co-Reyes et al. (2018)
proposed to guide policies by language to generalize on new tasks by meta learning. Huang et al.
(2022) utilized the generalization of large language models to achieve zero-shot planners.
However, all these works may not generalize to a new distribution of dynamics or tasks since they
encourage a strong connection between the given instruction and the policy, not the dynamics of the
environment.
Language grounding to dynamics of environments. A different line of research has focused
on utilizing manuals as auxiliary information to aid generalization. These text manuals provide
descriptions of the entities in the environment and their dynamics, e.g., how they interact with other
entities. Agents can figure out the dynamics of the environment based on the manual.
Narasimhan et al. (2018) explored transfer methods by simply concatenating the text description of
an entity and the entity itself to facilitate policy generalization across tasks. RTFM (Zhong et al.,
2019) builds codependent representations of the text manual and the observation of the environment, denoted txt2π, based on bidirectional feature-wise linear modulation (FiLM²). A key challenge
in RTFM is multi-modal, multi-step reasoning over texts associated with multiple entities. EMMA (Hanjie et al., 2021) uses an entity-conditioned attention module that allows selective focus on the relevant descriptions in the text manual for each entity, in an environment called MESSENGER.
A key challenge in MESSENGER is the adversarial train-evaluation split without prior entity-text
grounding. Recently, SILG (Zhong et al.,2021) unifies a collection of diverse grounded language
learning environments under a common interface, including RTFM and MESSENGER. All these
works have demonstrated the generalization of policies to a new environment with unseen entities
and text descriptions.
Compared with the previous works, our work moves one step forward, investigating language
grounding at the entity level in multi-agent settings.
Subgoal Assignment. In the goal-based multi-agent setting, in order to complete the goal more
efficiently, agents need to coordinate with each other, e.g., making a reasonable subgoal division.
There are many language-free MARL models (Wang et al.,2020;2019;Jeon et al.,2022;Tang et al.,
2018;Yang et al.,2019) focusing on task allocation or behavioral diversity, which exhibit a similar
effect as subgoal assignment.
However, without the inherent generalization of natural language, the agent is unlikely to perform well in environments unseen during training, as evidenced by many previous works (Zhong et al., 2019; Hanjie et al., 2021). In addition, it is sometimes hard to describe the goal state without natural language (Liu et al., 2022), which makes the subgoal assignment even
more challenging.
3 Preliminaries
Our objective is to ground language to environment dynamics and entities for generalization to
unseen multi-agent environments. Note that an entity is an object represented as a symbol in the
observation, and dynamics refer to how entities behave in the environment.
POSG. We model the multi-agent decision-making problem as a partially observable stochastic game (POSG). For $n$ agents, at each timestep $t$, agent $i$ obtains its own partial observation $o_{i,t}$ from the global state $s_t$, takes action $a_{i,t}$ following its policy $\pi_\theta(a_{i,t} \mid o_{i,t})$, and receives a reward $r_{i,t}(s_t, \boldsymbol{a}_t)$, where $\boldsymbol{a}_t$ denotes the joint action of all agents. The environment then transitions to the next state $s_{t+1}$ given the current state and joint action, according to the transition probability function $\mathcal{T}(s_{t+1} \mid s_t, \boldsymbol{a}_t)$. Agent $i$ aims to maximize the expected return $R_i = \sum_{t=1}^{T} \gamma^{t-1} r_{i,t}$, where $\gamma$ is the discount factor and $T$ is the episode time horizon.
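As a small worked example of the return defined above, the following plain-Python sketch computes $R_i = \sum_{t=1}^{T} \gamma^{t-1} r_{i,t}$ for one agent's reward sequence; the numbers are illustrative only.

```python
def discounted_return(rewards, gamma):
    """R_i = sum over t of gamma^(t-1) * r_{i,t}, with t starting at 1."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Example: rewards of 0, 0, 1 over three timesteps with gamma = 0.9 give 0.81.
print(discounted_return([0.0, 0.0, 1.0], gamma=0.9))
```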
Figure 1: Illustrations of two modified multi-agent environments, i.e., multi-agent RTFM (left) and multi-agent MESSENGER (right).
Figure 2: Overview architecture of EnDi. EnDi has two policy heads. One is to interact with the environment, and another is for opponent modeling. The gradients from supervised learning and reinforcement learning jointly influence the formation of entity masks.
Multi-Agent RTFM is extended from Zhong et al. (2019), where for each task all agents are given
the same text information based on collected human-written natural language templates, including
a document of environment dynamics and an underspecified goal. Figure 1 illustrates an instance
of the game. Concretely, the dynamics consist of monsters (e.g., wolf, goblin), teams (e.g., or-
der of the forest), element types (e.g., fire, poison), item modifiers (e.g., fanatical, arcane), and
items (e.g., sword, hammer). When the agents encounter a weapon/monster, they can pick up the
weapon/engage in combat with the monster. Moreover, a monster moves towards the nearest ob-
servable agent with 60% probability, otherwise moves randomly. The general goal is to eliminate
all monsters in a given team with the appropriate weapons. The game environment is rendered as a
matrix of texts where each grid describes the entity occupying the grid.
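A minimal sketch of the stated monster movement rule follows; it is a hypothetical re-implementation for illustration, and the environment's actual tie-breaking and pathing may differ.

```python
import random


def monster_step(monster_pos, agent_positions, chase_prob=0.6):
    """With probability chase_prob, step one cell toward the nearest observable
    agent (by Manhattan distance); otherwise move in a uniformly random direction."""
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    if agent_positions and random.random() < chase_prob:
        target = min(agent_positions,
                     key=lambda p: abs(p[0] - monster_pos[0]) + abs(p[1] - monster_pos[1]))
        dx, dy = target[0] - monster_pos[0], target[1] - monster_pos[1]
        if abs(dx) >= abs(dy) and dx != 0:
            step = (1 if dx > 0 else -1, 0)   # close the larger gap first
        elif dy != 0:
            step = (0, 1 if dy > 0 else -1)
        else:
            step = (0, 0)                     # already on the agent's cell
    else:
        step = random.choice(moves)
    return monster_pos[0] + step[0], monster_pos[1] + step[1]
```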
Multi-Agent MESSENGER is built on Hanjie et al. (2021). For each task, all agents are provided
with the same text manual. The manual contains descriptions of the entities, the dynamics, and the
goal, obtained through crowdsourced human writers. In addition, each entity can take one of three
roles: an enemy, a message, or a target. There are three possible movement types for each entity,
i.e., stationary, chasing, or fleeing. Agents are required to bring all the messages to the targets while
avoiding enemies. If agents touch an enemy in the game or reach the target without first obtaining
the message, they lose the game. Unlike RTFM, the game environment is rendered as a matrix of
symbols without any prior mapping between the entity symbols and their descriptions.
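The per-step win/lose rules described above can be sketched as follows; this is a simplified single-agent check with hypothetical cell sets, not the environment's actual code.

```python
def step_outcome(agent_cell, has_message, enemy_cells, target_cells):
    """Return 'lose', 'win', or None (episode continues) for one agent's step."""
    if agent_cell in enemy_cells:
        return "lose"                          # touching an enemy loses the game
    if agent_cell in target_cells:
        # reaching a target only wins if the message was obtained first
        return "win" if has_message else "lose"
    return None
```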
4 Methodology
In goal-based MARL, it is important for agents to coordinate with each other. Otherwise, subgoal
conflicts (multiple agents focusing on the same subgoal) can impede the completion of the general
goal. To this end, we introduce EnDi for language grounding in MARL, enabling agents to independently learn subgoal division at the entity level for better generalization. For each task, agents are given a text manual $z \in Z$ describing the environment dynamics and a language-based goal $g \in G$ as language knowledge. Apart from this, at each timestep $t$, each agent $i$ obtains an $h \times w$ grid observation $o_{i,t}$ and outputs a distribution over actions $\pi(a_{i,t} \mid o_{i,t}, z, g)$. Note that we omit the subscript $t$ in the following for convenience.
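For intuition, here is a minimal runnable sketch of this per-agent decision interface, in which a random tensor stands in for the grounded representation $X = \mathrm{Ground}(o, z, g)$ introduced in the next subsection; the sizes and the toy linear policy head are assumptions for illustration only.

```python
import torch

# Hypothetical sizes: a 10 x 10 grid, 32-dimensional grounded features, 5 actions.
h, w, d, n_actions = 10, 10, 32, 5

# Stand-in for X = Ground(o, z, g): a language-conditioned feature map over the grid.
X = torch.randn(h, w, d)

# A toy policy head producing pi(a_i | o_i, z, g) from the flattened representation.
policy_head = torch.nn.Linear(h * w * d, n_actions)
logits = policy_head(X.reshape(1, -1))
action = torch.distributions.Categorical(logits=logits).sample()
print(action.item())  # a discrete action for agent i at this timestep
```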
4.1 Overall Framework
Language Grounding Module. Given the language knowledge, the agent first grounds the language to the observation of the environment, adopting an existing language grounding module to generate a language-based representation $X = \mathrm{Ground}(o, z, g) \in \mathbb{R}^{h \times w \times d}$. This representation captures the relationship between the goal, the manual, and the observation. Thus, agents are likely to understand the environment dynamics instead of memorizing any particular information, which is verified by