Multi-View Reasoning:
Consistent Contrastive Learning for Math Word Problem
Wenqi Zhang1, Yongliang Shen1, Yanna Ma2, Xiaoxia Cheng1,
Zeqi Tan1, Qingpeng Nong3, Weiming Lu1
1College of Computer Science and Technology, Zhejiang University
2University of Shanghai for Science and Technology
3Zhongxing Telecommunication Equipment Corporation
{zhangwenqi, luwm}@zju.edu.cn
Abstract
A math word problem solver requires both precise relation reasoning about quantities in the text and reliable generation of diverse equations. Current sequence-to-tree or relation extraction methods regard this task only from a fixed view, struggling to simultaneously handle complex semantics and diverse equations. However, human solving naturally involves two consistent reasoning views, top-down and bottom-up, just as math equations can also be expressed in multiple equivalent forms: pre-order and post-order. We propose multi-view consistent contrastive learning for a more complete semantics-to-equation mapping. The entire process is decoupled into two independent but consistent views, top-down decomposition and bottom-up construction, and the two reasoning views are aligned at multiple granularities for consistency, enhancing global generation and precise reasoning. Experiments on multiple datasets across two languages show that our approach significantly outperforms existing baselines, especially on complex problems.1 We also show that after consistent alignment, the multi-view model can absorb the merits of both views and generate more diverse results consistent with mathematical laws.
1 Introduction
Math word problem (MWP) solving is a significant and challenging task with a wide range of applications in both natural language processing and general artificial intelligence (Bobrow, 1964). The MWP task is to predict the mathematical equation and the final answer based on a natural language description of a scenario and a math problem. It requires mathematical reasoning over the text (Mukherjee and Garain, 2008), which is very
challenging for conventional methods (Patel et al., 2021).

* Corresponding author.
1 Our source code and data are open-sourced at https://github.com/zwq2018/Multi-view-Consistency-for-MWP

[Figure 1 shows the example problem "Xiao Ming and Zhang work in the orchard to pick fruit. Xiao Ming picks 2 fruits per minute, Zhang picks 3 fruits per minute, they worked for 4 minutes, and then ate 5 fruits. How many fruits are left?" (Answer: 15, Expr: (2 + 3) × 4 − 5), together with its top-down reasoning tree (remaining fruits → total fruits → total pick rate), its bottom-up reasoning tree, the pre-order (−, ×, +, 2, 3, 4, 5), post-order (2, 3, +, 4, ×, 5, −), and mid-order ((2 + 3) × 4 − 5) traversals, and the two views aligned in the latent space.]

Figure 1: Human solving has multiple reasoning views, and a math equation can also be expressed in multiple orders. Pre-order traversal can be seen as a top-down reasoning view, and post-order traversal corresponds exactly to the bottom-up reasoning view. Consistent contrastive learning aligns the two views in the same latent space.
MWP tasks have attracted a great deal of re-
search attention. In the early days, MWP was
treated as a sequence-to-sequence (seq2seq) trans-
lation task, translating human language into mathe-
matical language (Wang et al.,2017,2019). Then,
Xie and Sun (2019); Zhang et al. (2020); Faldu et al. (2021) proposed that tree or graph structures were more suitable for MWP. These generation methods (Seq2Tree and Graph2Tree) further improved generation capabilities through a specific structure. Although very flexible in generating complex equation combinations, the fixed-structure decoder also limits fine-grained mapping. Recently, Cao et al.
(2021); Jie et al. (2022) introduced an iterative rela-
tion extraction approach, providing a new solving
view for MWP. It performs well at capturing local
relations, but lacks global generation capabilities,
especially for complex mathematical problems.
From the seq2seq translation to the seq2tree gen-
eration and relation extraction, those are essentially
seeking a suitable solving view for MWP.

arXiv:2210.11694v2 [cs.CL] 26 Aug 2023

However, MWP is more challenging than that as it requires
both precise relation reasoning about quantities and
reliable generation for diverse equation combina-
tions. Both are necessary for mathematical reason-
ing. Existing methods all consider the MWP from
a single view and thus bring certain limitations.
We argue that multiple views are required to comprehensively solve the MWP. As shown in Figure 1, the process of human solving inherently involves multiple reasoning views, i.e., top-down decomposition (remaining fruits → total fruits → pick rate) and bottom-up construction (pick rate → total fruits → remaining fruits). The two reasoning views are reversed in process but consistent in results. Meanwhile, a mathematical equation can be expressed in multiple traversal orders, i.e., pre-order (−, ×, +, 2, 3, 4, 5) and post-order (2, 3, +, 4, ×, 5, −). The two sequences are quite dissimilar in form but equivalent in logic. The two traversal orders correspond exactly to the two reasoning processes, i.e., the pre-order equation is a top-down reasoning view, while the post-order can be seen as a bottom-up reasoning view.
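To make the correspondence concrete, the pre-order and post-order sequences above both arise from one expression tree for (2 + 3) × 4 − 5, and both evaluate to the same answer. A minimal sketch (the class and function names are illustrative, not from the paper's code):

```python
# Minimal sketch: one expression tree, multiple traversal orders.
# Node layout and function names are illustrative, not from the paper.

class Node:
    def __init__(self, token, left=None, right=None):
        self.token, self.left, self.right = token, left, right

def pre_order(n):   # top-down view: root, then left, then right
    return [] if n is None else [n.token] + pre_order(n.left) + pre_order(n.right)

def post_order(n):  # bottom-up view: left, then right, then root
    return [] if n is None else post_order(n.left) + post_order(n.right) + [n.token]

def evaluate(n):    # recursive evaluation of the tree
    if n.left is None:
        return n.token
    ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
           '*': lambda a, b: a * b, '/': lambda a, b: a / b}
    return ops[n.token](evaluate(n.left), evaluate(n.right))

# (2 + 3) * 4 - 5 from the Figure 1 example
tree = Node('-', Node('*', Node('+', Node(2), Node(3)), Node(4)), Node(5))
print(pre_order(tree))   # ['-', '*', '+', 2, 3, 4, 5]
print(post_order(tree))  # [2, 3, '+', 4, '*', 5, '-']
print(evaluate(tree))    # 15
```

The two sequences look very different, yet they denote the same tree and the same answer, which is exactly the consistency the two reasoning views are expected to share.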
Inspired by this, we design multi-view reasoning using multi-order traversal. The MWP solving is decoupled into two independent but consistent views: top-down reasoning using pre-order traversal to decompose the problem from global to local, and a bottom-up process following post-order traversal for relation construction from local to global. Pre-order and post-order traversals should be equivalent in math, just as top-down decomposition and bottom-up construction should be consistent. As shown in Figure 1, we add multi-granularity contrastive learning to align the intermediate expressions generated by the two views in the same latent space. Through consistent alignment, the two views constrain each other and jointly learn an accurate and complete representation for math reasoning.
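The alignment objective can be sketched with a standard InfoNCE-style contrastive loss that pulls together the top-down and bottom-up representations of the same sub-expression and pushes apart the other expressions in the batch. This is a hedged sketch of the general technique, not the paper's exact loss; the dimensions and temperature are illustrative.

```python
import numpy as np

# Hedged sketch of the consistency objective: an InfoNCE-style loss over
# paired top-down (h_td) and bottom-up (h_bu) expression representations.
# Matched sub-expressions sit on the diagonal of the similarity matrix.

def info_nce(h_td, h_bu, tau=0.1):
    # h_td, h_bu: (batch, dim) representations of the same expressions
    h_td = h_td / np.linalg.norm(h_td, axis=1, keepdims=True)
    h_bu = h_bu / np.linalg.norm(h_bu, axis=1, keepdims=True)
    logits = h_td @ h_bu.T / tau                     # pairwise cosine / tau
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # matched pairs on diagonal

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))
aligned = info_nce(h, h.copy())                  # identical views
random_ = info_nce(h, rng.normal(size=(8, 16)))  # unrelated views
print(aligned < random_)                         # aligned views incur a lower loss
```

Minimizing such a loss drives the two views' intermediate expressions toward a shared latent space, which is the "consistent alignment" the paragraph describes.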
Besides, math operators must conform to mathematical laws (e.g., the commutative law). We devise a knowledge-enhanced augmentation to incorporate mathematical rules into the learning process, making multi-view reasoning more consistent with mathematical laws.
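As a concrete illustration of rule-based augmentation, the commutative law (a + b = b + a, a × b = b × a) can be applied to an annotated expression tree to produce equivalent training equations. The nested-tuple encoding below is illustrative, not the paper's actual data format.

```python
# Hedged sketch of knowledge-enhanced augmentation: apply the commutative
# law to an expression tree to enumerate mathematically equivalent
# equations. Tree encoding (nested tuples) is illustrative only.

COMMUTATIVE = {'+', '*'}

def augment(expr):
    """Yield all equivalent trees obtained by swapping commutative operands."""
    if not isinstance(expr, tuple):      # leaf: a quantity or constant
        yield expr
        return
    op, left, right = expr
    for l in augment(left):
        for r in augment(right):
            yield (op, l, r)
            if op in COMMUTATIVE:
                yield (op, r, l)         # commutative swap

def evaluate(expr):
    if not isinstance(expr, tuple):
        return expr
    op, l, r = expr
    return {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
            '*': lambda a, b: a * b}[op](evaluate(l), evaluate(r))

# (2 + 3) * 4 - 5 from the Figure 1 example
tree = ('-', ('*', ('+', 2, 3), 4), 5)
variants = list(augment(tree))
print(len(variants))                             # 4 equivalent forms
print(all(evaluate(v) == 15 for v in variants))  # all yield the same answer
```

Training on such variants exposes the solver to equivalent but differently ordered equations, encouraging predictions that respect the underlying laws rather than a single annotated form.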
Our contributions are threefold:
• We treat multi-order traversal as a multi-view reasoning process, which contains a top-down decomposition using pre-order traversal and a bottom-up construction following post-order traversal. Both views are necessary for MWP.
• We introduce consistent contrastive learning to align the two views' reasoning processes, fusing flexible global generation and accurate semantics-to-equation mapping. We also design an augmentation process for rule injection and understanding.
• Extensive experiments on multiple standard datasets show our method significantly outperforms existing baselines. Our method can also generate equivalent but non-annotated math equations, demonstrating the reliable reasoning ability behind our multi-view framework.
2 Related Work
Reliable reasoning is a necessary capability to
move towards general-purpose AI. How to achieve
human-like reasoning has been extensively re-
searched in areas such as natural language process-
ing, reinforcement learning, and robotics (Fu et al.,
2021;Zhang et al.,2021,2022a). In particular,
mathematical reasoning is an important manifesta-
tion of intelligence. Automatically solving mathe-
matical problems has been studied for a long time,
from rule-based methods (Fletcher,1985;Bakman,
2007;Yuhui et al.,2010) with hand-crafted fea-
tures and templates-based methods (Kushman et al.,
2014;Roy and Roth,2018) to deep learning meth-
ods (Wang et al.,2017;Ling et al.,2017) with
the encoder-decoder framework. The introduction
of Transformer (Vaswani et al.,2017) and pre-
trained language models (Devlin et al.,2019;Liu
et al.,2019b) greatly improves the performance
of MWPs. From the perspective of proxy tasks,
we divide the recent works into three categories:
seq2seq-based translation, seq2structure genera-
tion, and iterative relation extraction.
Seq2seq-based translation MWPs are treated
as a translation task, translating human language
into mathematical language (Liang and Zhang,
2021). Wang et al. (2017) proposed a large-scale
dataset Math23K and used the vanilla seq2seq
method (Chiang and Chen,2019). Li et al. (2019)
introduced a group attention mechanism to en-
hance seq2seq method performance. Huang et al.
(2018) used reinforcement learning to optimize
translation task. Huang et al. (2017) incorporated
semantic-parsing methods to solve MWPs. Although seq2seq-based methods have made great progress in the field, their performance is still unsatisfactory, since generating mathematical equations requires relation reasoning over quantities rather than mere natural-language translation.
Seq2structure-based generation Liu et al.
(2019a); Xie and Sun (2019) introduced tree-
structured decoder to generate mathematical ex-
pressions. This explicit tree-based design rapidly
dominated the MWPs community. Other re-
searchers have begun to explore reasonable struc-
tures for encoder. Li et al. (2020); Zhang et al.
(2020,2022b) used graph neural networks to ex-
tract effective logical information from the natu-
ral language problem. Liang and Zhang (2021)
adopted a teacher model with contrastive learning to improve the encoder. Several researchers
have attempted to extract multi-level features from
the problems using the hierarchical encoder (Lin
et al.,2021) and pre-trained model (Yu et al.,2021).
Many auxiliary tasks are used to enhance the sym-
bolic reasoning ability (Qin et al.,2021). Wu
et al. (2020,2021) tried to introduce mathemat-
ical knowledge to solve the difficult mathematical
reasoning. These structured generation approaches
show strong generation capabilities towards com-
plex mathematical reasoning tasks.
Iterative relation extraction Recently, some re-
searchers have borrowed ideas from the field of in-
formation extraction (Shen et al.,2021b), and have
designed iterative relation extraction frameworks
for predicting math relations between two numeric
tokens. Kim et al. (2020) designed an expression-
pointer transformer model to predict expression
fragmentation. Cao et al. (2021) introduced a DAG
structure to extract numerical token relation from
bottom to top. Jie et al. (2022) further treated the
MWP task as an iterative relation extraction task,
achieving impressive performance. These works
provide a new perspective to tackle MWP from
a local relation construction view, improving the
fine-grained relation reasoning between quantities.
The above proxy tasks are designed from differ-
ent solving views. The seq2seq is a left-to-right
consecutive view, while seq2tree is a tree view, and
the relation extraction method emphasizes a local
relation view. Unlike these single-view methods,
our approach employs multiple consistent reason-
ing views to address the challenges of MWP.
3 Approach
3.1 Overview
The MWP task is to predict the equation Y and the answer based on a problem description T = {w1, w2, · · · , wn} containing n words and m quantity words Q = {q1, q2, · · · , qm}. The equation Y is a sequence of constant words (e.g., 3.14), mathematical operators op = {+, −, ×, ÷, · · · }, and quantity words from Q. Solving the MWP is to find the optimal mapping T → Ŷ, allowing the predicted Ŷ to derive the correct answer. Existing methods learn this mapping from a single view, e.g., seq2tree generation or iterative relation extraction. Our consistent contrastive learning approach instead reasons from multiple views: both the top-down and bottom-up views are necessary for a complete semantics-to-equation mapping.
3.2 Multi-View using Multi-Order
We use the labeled mid-order equation to generate two different sequences Ypre = {y^f_1, y^f_2, · · · , y^f_L} and Ypost = {y^b_1, y^b_2, · · · , y^b_L} using pre-order and post-order traversal. As shown in Figure 1, we treat Ypre as the label for training the top-down process and Ypost as the label for training the bottom-up process.
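This label-generation step can be sketched as a standard shunting-yard pass that converts the annotated mid-order (infix) equation into Ypost, followed by a stack pass that rebuilds Ypre. The token format and function names are illustrative, not the paper's preprocessing code.

```python
# Hedged sketch: derive the post-order label Y_post and pre-order label
# Y_pre from the annotated mid-order equation. Token format illustrative.

PREC = {'+': 1, '-': 1, '*': 2, '/': 2}

def to_postfix(tokens):
    """Shunting-yard: mid-order tokens -> post-order (bottom-up label)."""
    out, stack = [], []
    for t in tokens:
        if t == '(':
            stack.append(t)
        elif t == ')':
            while stack[-1] != '(':
                out.append(stack.pop())
            stack.pop()                  # discard the '('
        elif t in PREC:
            while stack and stack[-1] != '(' and PREC[stack[-1]] >= PREC[t]:
                out.append(stack.pop())
            stack.append(t)
        else:                            # quantity or constant word
            out.append(t)
    return out + stack[::-1]

def postfix_to_prefix(post):
    """Stack pass: post-order -> pre-order (top-down label)."""
    stack = []
    for t in post:
        if t in PREC:
            r, l = stack.pop(), stack.pop()
            stack.append([t] + l + r)    # operator first: pre-order
        else:
            stack.append([t])
    return stack[0]

mid = ['(', '2', '+', '3', ')', '*', '4', '-', '5']
y_post = to_postfix(mid)
y_pre = postfix_to_prefix(y_post)
print(y_post)  # ['2', '3', '+', '4', '*', '5', '-']
print(y_pre)   # ['-', '*', '+', '2', '3', '4', '5']
```

Both labels are deterministic functions of the same annotated equation, so the two training targets are equivalent in logic by construction.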
Global shared embedding Firstly, we design three types of globally shared embedding matrices: text word embeddings Ew, quantity word embeddings Eq, and mathematical operator embeddings Eop. Text embeddings and quantity word embeddings are extracted from a pre-trained language model (Devlin et al., 2019; Liu et al., 2019b), while operator embeddings are randomly initialized. Besides, all constant word embeddings are also randomly initialized and added to Eq. As shown in Figure 2, the three global embeddings are shared by the two reasoning processes. Then, the text embeddings Ew are fused into a target vector t_root by a bidirectional Gated Recurrent Unit (GRU) (Cho et al., 2014), where t_root represents the global target for top-down reasoning. The quantity embeddings Eq are used for quantity relation construction in bottom-up reasoning.
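The fusion of Ew into t_root can be sketched with a minimal bidirectional GRU: a forward and a backward pass over the word embeddings, whose final hidden states are concatenated. The NumPy cell below stands in for the trained encoder; all weights are random and the dimensions are illustrative.

```python
import numpy as np

# Hedged sketch: fuse text embeddings E_w into a global target vector
# t_root with a bidirectional GRU. Random weights, illustrative sizes.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    def __init__(self, d_in, d_h, rng):
        s = 1.0 / np.sqrt(d_h)
        self.W = rng.uniform(-s, s, (3, d_h, d_in))  # update, reset, candidate
        self.U = rng.uniform(-s, s, (3, d_h, d_h))

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h)           # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h)           # reset gate
        h_tilde = np.tanh(self.W[2] @ x + self.U[2] @ (r * h))
        return (1 - z) * h + z * h_tilde

def bi_gru_root(E_w, d_h=8, seed=0):
    """Run forward and backward GRUs over the word embeddings and
    concatenate the two final states into t_root."""
    rng = np.random.default_rng(seed)
    fwd = GRUCell(E_w.shape[1], d_h, rng)
    bwd = GRUCell(E_w.shape[1], d_h, rng)
    h_f = h_b = np.zeros(d_h)
    for x in E_w:            # left-to-right pass
        h_f = fwd.step(x, h_f)
    for x in E_w[::-1]:      # right-to-left pass
        h_b = bwd.step(x, h_b)
    return np.concatenate([h_f, h_b])

E_w = np.random.default_rng(1).normal(size=(30, 16))  # 30 words, dim 16
t_root = bi_gru_root(E_w)
print(t_root.shape)  # (16,)
```

In practice the paper's encoder would operate on pre-trained embeddings and learned weights; the point here is only the shape of the computation: a whole-problem summary vector that seeds the top-down decomposition.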
Top-down view using Pre-order The top-down view is a global-to-local decomposition that follows the pre-order equation Ypre (e.g., −, ×, +, 2, 3, 4, 5). This process is similar to Xie and Sun (2019). Starting from the root node, each node conducts node prediction, and each operator node also conducts node decomposition. E.g., in Figure 1, the root node predicts its node type as "operator" with output token "−", and is then decomposed into two child nodes. The two child nodes are predicted as "×" in step 2 and "5" in step 7.
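The predict-then-decompose loop can be sketched as consuming the pre-order sequence with a count of pending node slots: an operator opens two child slots, a quantity closes one, and decoding terminates when no slot remains. The driver below replays gold tokens; the real model would predict each token from the node's target vector.

```python
# Hedged sketch of the top-down decomposition loop over the pre-order
# sequence. Gold tokens are replayed here for illustration only.

OPERATORS = {'+', '-', '*', '/'}

def top_down_decode(y_pre):
    pending = 1                 # start with the root's single slot
    steps = []
    for step, token in enumerate(y_pre, start=1):
        pending -= 1            # predict (fill) the current node slot
        if token in OPERATORS:
            pending += 2        # decompose: open two child slots
        steps.append((step, token, pending))
        if pending == 0:
            break               # a complete equation has been generated
    return steps

for step, token, pending in top_down_decode(['-', '*', '+', '2', '3', '4', '5']):
    print(step, token, pending)
# step 7 emits '5' and leaves no pending slot: decoding terminates
```

This mirrors the Figure 1 trace: the root "−" at step 1 opens two slots, "×" at step 2 continues the decomposition, and "5" at step 7 closes the final slot.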
Node prediction Each node has a target vector t_n decomposed from its parent (for the root node,