
In addition, after jointly grounding the text manual and the language-based goal (goal instruction) to environment entities, each entity is associated with explicit dynamic rules and an explicit relationship to the goal state. Thus, the language-grounded entities can also be leveraged to form better strategies.
We present two goal-based multi-agent environments based on two previous single-agent settings, i.e., MESSENGER (Hanjie et al., 2021) and RTFM (Zhong et al., 2019), which require generalization to new dynamics (i.e., how entities behave), new entity references, and partial observability. Agents are given a document that specifies the environment dynamics and a language-based goal. Note that one goal may contain multiple subgoals, so a single agent may struggle to complete it, or be unable to do so at all. In particular, after identifying the relevant information in the language descriptions, agents need to decompose the overall goal into subgoals and figure out the optimal subgoal division strategy. Note that we focus on interactive environments that are easily converted to symbolic representations, rather than raw visual observations, for efficiency, interpretability, and an emphasis on abstraction over perception.
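To make this setup concrete, the following Python snippet is a purely illustrative sketch of what a symbolic observation, text manual, goal, and entity-level subgoal division might look like; the entity names, fields, and format are our own assumptions, not the actual MESSENGER or RTFM data layout.

# Illustrative symbolic observation and goal; not the benchmarks' real format.
observation = {
    "entities": {            # entity id -> (row, col) on the grid
        "mage":   (1, 4),
        "goblin": (3, 0),
    },
    "manual": "The goblin chases the nearest agent. The mage holds the message.",
    "goal": "Deliver the message to the mage and avoid the goblin.",
}

# Each agent decomposes the shared goal into entity-level subgoals,
# committing to some entities itself and leaving others to its teammates.
subgoal_division = {
    "self":   ["reach mage"],        # entities this agent handles
    "others": ["distract goblin"],   # entities it expects teammates to handle
}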
In this paper, we propose a novel framework for language grounding in multi-agent reinforcement learning (MARL), entity divider (EnDi), to enable agents to independently learn subgoal division strategies at the entity level. Specifically, each EnDi agent first generates a language-based representation of the environment and decomposes the goal instruction into two subgoals: self and others. Note that the subgoals can be described at the entity level since the language descriptions specify the explicit relationships between the goal and all entities. Then, the EnDi agent acts in the environment based on the entities associated with its self subgoal. To account for the influence of others, the EnDi agent has two policy heads: one interacts with the environment, and the other performs opponent modeling. The EnDi agent is trained end-to-end, using reinforcement learning and supervised learning for the two policy heads, respectively. The gradient signal of the supervised opponent-modeling loss is used to regularize the subgoal division for others.
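As a rough illustration of this two-headed design, the following PyTorch-style sketch pairs a shared language-grounded encoder with one head trained by reinforcement learning and one head trained by supervised opponent modeling; the encoder, dimensions, loss form, and weighting are assumptions for illustration, not the paper's exact architecture.

# Minimal sketch of a two-headed agent; details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadAgentSketch(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # Shared representation of the grounded observation, manual, and goal.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Head 1: the agent's own policy, trained with reinforcement learning.
        self.self_head = nn.Linear(hidden, n_actions)
        # Head 2: opponent model predicting other agents' actions,
        # trained with supervised learning on their observed behavior.
        self.other_head = nn.Linear(hidden, n_actions)

    def forward(self, grounded_obs: torch.Tensor):
        h = self.encoder(grounded_obs)
        return self.self_head(h), self.other_head(h)

def joint_loss(agent, grounded_obs, action, advantage, other_action, beta=0.5):
    self_logits, other_logits = agent(grounded_obs)
    # Policy-gradient loss for the agent's own (self-subgoal) policy.
    log_prob = F.log_softmax(self_logits, dim=-1)
    log_prob = log_prob.gather(-1, action.unsqueeze(-1)).squeeze(-1)
    rl_loss = -(log_prob * advantage).mean()
    # Supervised opponent-modeling loss; its gradient flows back through the
    # shared representation and thereby regularizes the subgoal division.
    sl_loss = F.cross_entropy(other_logits, other_action)
    return rl_loss + beta * sl_loss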
Our framework is the first attempt to address the challenge of grounding language for generalization to unseen dynamics in multi-agent settings. EnDi can be instantiated on top of many existing language grounding modules and is currently built and evaluated in the two multi-agent environments mentioned above. Empirically, we demonstrate that EnDi outperforms existing language-based methods in all tasks by a large margin. Importantly, EnDi also exhibits the best generalization to unseen games, i.e., zero-shot transfer. Through ablation studies, we verify the effectiveness of each component and show that EnDi indeed obtains coordinated subgoal division strategies via opponent modeling. We also argue that many language grounding problems can be addressed at the entity level.
2 Related Work
Language-grounded policy learning. Language grounding refers to learning the meaning of natural language units, e.g., utterances, phrases, or words, by leveraging the non-linguistic context. In many previous works (Wang et al., 2019; Blukis et al., 2019; Janner et al., 2018; Küttler et al., 2020; Tellex et al., 2020; Branavan et al., 2012), the text conveys a goal or instruction to the agent, and the agent produces behaviors in response after grounding the language. This setup thus encourages a strong connection between the given instruction and the policy.
More recently, many works have explored generalization from different perspectives. Hill et al. (2020a; 2019; 2020b) investigated generalization to novel entity combinations, from synthetic template commands to natural instructions given by humans, and to varying numbers of objects. Choi et al. (2021) proposed a language-guided policy learning algorithm that enables learning new tasks quickly from language corrections. In addition, Co-Reyes et al. (2018) proposed guiding policies with language to generalize to new tasks via meta-learning. Huang et al. (2022) utilized the generalization ability of large language models to obtain zero-shot planners.
However, all these works may not generalize to a new distribution of dynamics or tasks since they
encourage a strong connection between the given instruction and the policy, not the dynamics of the
environment.
Language grounding to dynamics of environments. A different line of research has focused
on utilizing manuals as auxiliary information to aid generalization. These text manuals provide