Disentangling Reasoning Capabilities from Language Models with Compositional Reasoning Transformers

Wanjun Zhong1, Tingting Ma2, Jiahai Wang1, Jian Yin1, Tiejun Zhao2, Chin-Yew Lin3 and Nan Duan3
1Sun Yat-sen University  2Harbin Institute of Technology  3Microsoft Research Asia
zhongwj25@mail2.sysu.edu.cn, hittingtingma@gmail.com
{wangjiah,issjyin}@mail.sysu.edu.cn, tjzhao@hit.edu.cn
{cyl, nanduan}@microsoft.com
(Equal contributions during internship at Microsoft Research Asia.)

arXiv:2210.11265v2 [cs.CL] 7 Dec 2022
Abstract
This paper presents ReasonFormer, a unified reasoning framework that mirrors the modular and compositional reasoning process of humans in complex decision-making. Inspired by dual-process theory in cognitive science, the representation module (automatic thinking) and reasoning modules (controlled thinking) are decoupled to capture different levels of cognition. On top of the representation module, the pre-trained reasoning modules are modular and specialized in specific, fundamental reasoning skills (e.g., logic, simple QA). To mimic the controlled, compositional thinking process, different reasoning modules are dynamically activated and composed in both parallel and cascaded manners, controlling which reasoning skills are activated and how deep the reasoning process goes to solve the current problem. The unified reasoning framework solves multiple tasks with a single model and is trained and run at inference in an end-to-end manner. Evaluated on 11 datasets requiring different reasoning skills and complexity, ReasonFormer demonstrates substantial performance boosts, revealing its compositional reasoning ability. Few-shot experiments exhibit better generalization ability, achieved by learning to compose pre-trained skills for new tasks with limited data and by decoupling the representation module and the reasoning modules. Further analysis shows the modularity of the reasoning modules, as different tasks activate distinct reasoning skills at different reasoning depths.
1 Introduction
Figure 1: Compositional reasoning process of humans in complex decision-making. Humans solve problems by cascaded executions of fundamental skills. [The example asks "What causes a car accident?": System 1 (intuitive) performs semantic understanding, then System 2 (controlled) proceeds through Step 1, memorizing factual knowledge (driving relates to speed, attention, rule following; alcohol hurts attention); Step 2, logical deduction (alcohol affects attention, so driving leads to accident); and Step 3, answering the question (alcohol, over-speeding, distraction, ...).]

Prevailing language models (LMs) (Devlin et al., 2018; Brown et al., 2020) demonstrate impressive performance in natural language processing tasks and have ushered in a new trend in AI research. Despite the emerging fervor, homogeneous LMs relying on a single call of the model are less modular and struggle to explicitly model the complex reasoning process (Helwe et al., 2021) the way humans do.
In the dual-process theory (Daniel, 2017) of cognitive psychology, two cognitive systems interact to form the whole reasoning process. System 1 (automatic thinking) generates intuitive patterns of ideas, and System 2 (controlled thinking) constructs reasoning as an orderly, logical series of compositional reasoning steps. Moreover, during the System 2 process, different functional brain areas can be modular and interact with each other, and System 2 can decide how to compose different reasoning skills and when to stop thinking. As the example in Fig. 1 shows, when finding the cause of a car accident, humans intuitively comprehend the question (System 1) and then conduct compositional reasoning (System 2: recalling facts → logical deduction → answering the question).
We would like to incorporate this mechanism into AI models for decision-making, and we make the following assumptions: (1) the representation module (System 1) and reasoning modules (System 2)
can be decoupled, and (2) the "complicated" reasoning process can be disentangled into multi-step executions of compositional "fundamental" reasoning modules, whose compositionality can be learnt with limited data. Also, the "fundamental" nature of basic reasoning skills allows them to have rich training instances for reliable skill pre-training.
Under these motivations, this paper proposes a modular and compositional reasoning framework, ReasonFormer, to mirror the human compositional reasoning process, with the following characteristics: (1) the representation module and reasoning modules are decoupled; (2) the reasoning modules are modular and specialized in fundamental reasoning skills; (3) the reasoning modules are compositional in parallel and cascaded manners, to dynamically decide the activated reasoning skills and the reasoning complexity; (4) the general-purpose reasoning framework is end-to-end and unified, solving multiple tasks with one model.
Specifically, the representation module learns contextual representations of problems. On top of it, cascaded reasoning modules perform compositional multi-step reasoning. The reasoning modules are pre-trained to be expert in specific reasoning skills (e.g., logic, QA, facts). These pre-trained reasoning skills are considered relatively fundamental and have rich resources. Two additional blocks complete the whole framework: the reasoning router and the reasoning adapter. The reasoning router decides which reasoning skills are activated in each reasoning step, and when to stop the reasoning process. The adapter adapts the reused reasoning modules to different steps of the reasoning process.
We comprehensively evaluate the framework on 11 datasets emphasizing different reasoning skills and complexity, and highlight the following findings: (1) Substantial performance boosts demonstrate the model's acquisition of compositional reasoning ability, and both the reasoning-centric pre-training and the reasoning adapter bring compounding performance gains. (2) Few-shot experiments show that the specialized modules enable better generalization by learning to compose pre-trained skills for low-resource tasks and by decoupling the representation module and the reasoning modules. (3) Further analysis reveals the distinct reasoning skills required by different tasks at different reasoning depths, shoring up the modularity of the reasoning modules.
2 Reasoning Skills Formulation
The compositional reasoning process of LMs relies on the pre-training of several fundamental reasoning skills and their compositionality. Hence, the selection of skills is critical.
Selection Principles. There are two major principles in selecting skills: (1) Fundamental: complex problems can be decomposed and solved by simpler basic skills, so the basic skills should be fundamental, well-defined, and covered in the required skill set of as many tasks as possible; (2) Resourceful: reliable skill pre-training requires large-scale pre-training data. However, in real-world scenarios, annotated data is expensive to obtain for most reasoning tasks, so the skills should either already have rich resources or allow data to be collected in a self-(semi-)supervised manner.
Basic Skills Selection. Humans usually solve complex problems with fundamental skills, such as understanding key information (e.g., an entity and its type) in events, recalling related facts, understanding causal relations between events, and extracting answers to questions. This motivates us to select the following basic skills: logic ability, to logically deduce the cause or consequence of events; simple question answering (QA), to understand the context and answer simple questions; named entity recognition (NER), to identify important entities in the context; natural language inference (NLI), to identify the semantic relevance of two sentences; and factual knowledge, to memorize commonsense knowledge and understand daily events. There is an additional general skill to learn the knowledge commonly shared across the selected skills. We keep this setting in our paper as these skills are relatively well defined and resourceful.1

1 It is worth noting that this selection is tentative. There are other plausible ways to select basic skills or knowledge domains, which also suggest future directions.
We adopt self-supervised methods to construct pre-training corpora for {logic ability, factual knowledge, NER}, a semi-supervised method to construct the pre-training corpus for simple QA, and large-scale supervised data for NLI. Further details are given in § 4.2 and examples are given in Appendix A.
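Purely as an illustration of what a self-supervised skill instance can look like in the unified text-to-text format of § 3 (this is not the paper's actual pipeline, which is described in § 4.2), the sketch below turns a raw sentence into an NER-style prompt/answer pair with an off-the-shelf tagger; the spaCy model name and the prompt wording are assumptions made for this sketch.

```python
# Illustrative only: minting a self-supervised NER-style pre-training
# instance in a text-to-text format. NOT the paper's procedure (see SS 4.2);
# spaCy and the prompt wording are assumptions for this sketch.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed off-the-shelf tagger

def make_ner_instance(sentence: str) -> dict:
    """Build a (prompted input, target) pair from raw text."""
    doc = nlp(sentence)
    entities = [ent.text for ent in doc.ents]
    return {
        "input": f"{sentence} Please list the named entities:",
        "target": ", ".join(entities) if entities else "none",
    }

example = make_ner_instance("Alcohol impairs attention, says a 2019 WHO report.")
print(example)  # target is something like "2019, WHO"; exact spans depend on the tagger
```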
3 ReasonFormer Framework

Figure 2: ReasonFormer framework. The representation module (§ 3.1) and reasoning modules (RMs) (§ 3.2) are decoupled to form the compositional reasoning process. The RMs are pre-trained with different reasoning skills R_skill (§ 2). The reasoning adapter (§ 3.2.1) adapts the shared RMs to different reasoning steps. The router decides the activated skills; the stop gate decides when to stop reasoning (§ 3.2.2). Red lines indicate the cascaded reasoning process. [Figure detail: the encoder stacks the representation module (System 1, n Transformer layers) and the compositional reasoning modules (System 2); at the i-th step a skill router and stop gate act over RMs such as R_fact, R_QA, R_logic, and R_general; panel (b) shows the RM layer with adapters (LN, MHA, Adapter, FFN, Adapter, with residual connections); the decoder generates the output.]

As shown in Fig. 2, the general-purpose reasoning framework is built on an encoder-decoder
architecture to process multiple tasks (i.e., all pre-training tasks and downstream tasks) with a unified model, where all tasks are tackled as unified text-to-text generation tasks. We first reformat all the tasks into the same format using hard prompts (Sanh et al., 2021). For example, the question-answering task input can be prompted with the template "The question is {Question}. Please give the answer:", and the expected output is the answer text.
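For concreteness, the snippet below is a minimal sketch of this reformatting step using plain Python string templates. Only the QA template is quoted from the paper; the NLI template and the function name are illustrative assumptions.

```python
# Minimal sketch of reformatting heterogeneous tasks into one text-to-text
# format with hard prompts. Only the QA template is quoted from the paper;
# the NLI template is an illustrative assumption.
TEMPLATES = {
    "qa":  "The question is {question}. Please give the answer:",
    "nli": "Premise: {premise} Hypothesis: {hypothesis}. Are they entailed?",
}

def to_text2text(task: str, fields: dict, answer: str) -> tuple[str, str]:
    """Return (prompted input text, target output text) for any task."""
    source = TEMPLATES[task].format(**fields)
    return source, answer

src, tgt = to_text2text("qa", {"question": "What causes car accidents?"}, "alcohol")
# src: "The question is What causes car accidents?. Please give the answer:"
# tgt: "alcohol"
```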
Given the prompted task inputs, the modular and compositional framework consists of two components in its encoder: the representation module (System 1) and the reasoning modules (System 2). The representation module (§ 3.1) captures the intuitive understanding of problems by computing initial contextual representations. On top of the representation module, there are several pre-trained reasoning modules (§ 3.2) with different reasoning skills, waiting to interact and form a compositional reasoning process. To organize the reasoning process, reasoning routers (§ 3.2.2) decide the (parallel) activated skills and when to stop the (cascaded) reasoning process.
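As a rough sketch of how one such cascaded reasoning step could be wired, the snippet below routes over a set of skill modules from the [CLS] state, combines their outputs with the router weights, and exposes a sigmoid stop gate. The paper's exact composition and stopping rule are defined in § 3.2.2, so every design choice here (softmax routing, weighted-sum combination, class and variable names) should be read as an assumption.

```python
# Rough sketch of one cascaded reasoning step (System 2), NOT the paper's
# exact formulation (SS 3.2.2): softmax skill routing over the [CLS] state,
# weighted-sum combination of module outputs, and a sigmoid stop gate.
import torch
import torch.nn as nn

class ReasoningStep(nn.Module):
    def __init__(self, hidden: int, skill_modules: nn.ModuleList):
        super().__init__()
        self.skills = skill_modules                    # pre-trained RMs (logic, QA, fact, ...)
        self.router = nn.Linear(hidden, len(skill_modules))
        self.stop_gate = nn.Linear(hidden, 1)

    def forward(self, h):                              # h: (batch, seq, hidden)
        cls = h[:, 0]                                  # [CLS] summary of the current state
        weights = self.router(cls).softmax(-1)         # which skills to activate (parallel)
        outputs = torch.stack([m(h) for m in self.skills], dim=1)  # (batch, n_skill, seq, hidden)
        h_next = (weights[:, :, None, None] * outputs).sum(dim=1)  # compose activated skills
        p_stop = torch.sigmoid(self.stop_gate(cls))    # when to stop cascading (reasoning depth)
        return h_next, p_stop
```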
3.1 Representation Module
Similar to the perceptive function of System 1, the representation module targets basic contextual understanding and builds the foundation of the follow-up reasoning process. As LMs exhibit impressive contextual understanding ability, we build the representation module with cascaded Transformer layers. Given the tokenized input $X$ with length $m$, the initial representations learnt from the representation module are denoted as:

$H^0 = \{h^0_{[\mathrm{CLS}]}, h^0_1, h^0_2, \ldots, h^0_m\}$  (1)

where [CLS] is a special token.
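A minimal sketch of this module is given below, assuming it is instantiated from a pre-trained Transformer encoder; the HuggingFace backbone name and the tokenizer's handling of [CLS] are assumptions for illustration, not the paper's configuration.

```python
# Minimal sketch of the representation module (System 1): cascaded Transformer
# layers producing H^0 of Eq. (1). The choice of backbone is an assumption.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

@torch.no_grad()
def represent(text: str) -> torch.Tensor:
    """Return H^0 = {h_[CLS], h_1, ..., h_m} for one prompted input."""
    batch = tokenizer(text, return_tensors="pt")   # prepends [CLS] (and appends [SEP])
    return encoder(**batch).last_hidden_state      # (1, seq_len, hidden)

H0 = represent("The question is What causes car accidents?. Please give the answer:")
print(H0.shape)  # (1, seq_len, 768); H0[:, 0] is the [CLS] state h^0_[CLS]
```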
3.2 Reasoning Modules
To simulate the cognitive process (System 2) formed by the controlled interaction between various functional areas of the human brain, the reasoning modules are modular and compositional. Reasoning modules (RMs) learn different reasoning skills specified during pre-training, and are automatically composed during downstream adaptation (§ 3.3) with the reasoning router (§ 3.2.2). Compositionality exists not only at the parallel level (different skills), but also at the cascaded level (multi-step reasoning). Since different reasoning steps intuitively model different levels of information, additional reasoning adapters adapt the reused modules to different reasoning steps.
3.2.1 Reasoning Modules Architecture
Each reasoning module is implemented with several Transformer layers. As shown in Fig. 2(b), each RM layer augments a standard Transformer layer, with its multi-head attention (MHA) and feed-forward (FFN) sublayers, layer normalization, and residual connections, by inserting adapter blocks after the MHA and FFN sublayers.