Structure-Unified M-Tree Coding Solver for Math Word Problem Bin Wang Jiangzhou Ju Yang Fan Xinyu Dai Shujian Huang Jiajun Chen National Key Laboratory for Novel Software Technology Nanjing University

2025-05-02 0 0 917.8KB 12 页 10玖币
侵权投诉
Structure-Unified M-Tree Coding Solver for Math Word Problem
Bin Wang, Jiangzhou Ju, Yang Fan, Xinyu Dai
, Shujian Huang, Jiajun Chen
National Key Laboratory for Novel Software Technology, Nanjing University
Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing
{wangbin, jujiangzhou, fanyang}@smail.nju.edu.cn
{daixinyu, huangsj, chenjj}@nju.edu.cn
Abstract
As one of the challenging NLP tasks, design-
ing math word problem (MWP) solvers has
attracted increasing research attention for the
past few years. In previous work, models de-
signed by taking into account the properties
of the binary tree structure of mathematical
expressions at the output side have achieved
better performance. However, the expressions
corresponding to a MWP are often diverse
(e.g., n1+n2×n3n4,n3×n2n4+n1,
etc.), and so are the corresponding binary trees,
which creates difficulties in model learning
due to the non-deterministic output space. In
this paper, we propose the Structure-Unified
M-Tree Coding Solver (SUMC-Solver), which
applies a tree with any M branches (M-tree) to
unify the output structures. To learn the M-
tree, we use a mapping to convert the M-tree
into the M-tree codes, where codes store the
information of the paths from tree root to leaf
nodes and the information of leaf nodes them-
selves, and then devise a Sequence-to-Code
(seq2code) model to generate the codes. Ex-
perimental results on the widely used MAWPS
and Math23K datasets have demonstrated that
SUMC-Solver not only outperforms several
state-of-the-art models under similar experi-
mental settings but also performs much better
under low-resource conditions1.
1 Introduction
Given the description text of a MWP, an automatic
solver needs to output an expression for solving
the unknown variable asked in the problem that
consists of mathematical operands (numerical val-
ues) and operation symbols (
+,,×,÷
), as shown
in Fig. 1. It requires that the solver not only un-
derstand the natural-language problem but also be
able to model the relationships between the numer-
ical values to perform arithmetic reasoning. These
*Corresponding author
1
Code and data are available at
https://github.com/
devWangBin/SUMC-Solver
Problem: Mike bought a new book. He read 3 pages every day
on the first 2 days. Then, on the third day, he read 4 pages in the
morning and 5 pages in the afternoon. How many pages has
Mike read so far?
Solution expressions: 2 × 3 + 4 + 5
3 × 2 + (5 + 4)
5 + 3 × 2 + 4
……
Post expression sequences: 2 3 × 4 + 5 +
32×54++
5 3 2 × + 4 +
……
Expression binary trees:
……
M-Tree of expressions:
Answer: 15
Figure 1: An example of math word problems, which
has multiple solution expressions and binary trees, but
only one M-tree output.
challenges mean MWP solvers are often broadly
considered good test beds for evaluating the intelli-
gence level of agents (Lin et al.,2021).
In recent years, the research on designing auto-
matic MWP solvers has also made great progress
due to the success of neural network models on
NLP. Wang et al. (2017) first apply a Sequence-to-
Sequence (seq2seq) model to solve the MWP, and
since then more methods (Wang et al.,2018;Chi-
ang and Chen,2019;Wang et al.,2019) based on
seq2seq models have been proposed for further im-
provement. To use the structural information from
expressions more effectively, other methods (Liu
arXiv:2210.12432v2 [cs.CL] 25 Oct 2022
et al.,2019a;Xie and Sun,2019) use Sequence-
to-Tree (seq2tree) models with a well-designed
tree-structured decoder to generate the pre-order
sequence of a binary tree in a top-down manner
and have achieved better performance. Zhang et al.
(2020b) combines the merits of the graph-based en-
coder and tree-based decoder (graph2tree) to gen-
erate better solution expressions.
Promising results have been achieved in solving
MWP, but the existing methods learn to output only
one expression sequence or binary tree, ignoring
that there is likely to be far more than one that
can obtain the correct answer. For example, in Fig.
1, to answer the question “How many pages has
Mike read so far?”, we can add up the number of
pages that Mike read each day to get the expression
2×3+4+5
”; we can also calculate the first two
days followed by the sum of pages read on the third
day to get the expression “
3×2 + (5 + 4)
”; or we
could even calculate the problem in a less obvious
way with the expression “
5 + 3 ×2 + 4
”. The
number of different expression sequences or binary
trees can grow very large with different combina-
tions, which results in a large non-deterministic
output space and creates difficulties in model learn-
ing. Specifically, when a problem has multiple
correct outputs, but the solver only obtains one of
them, the knowledge learned by the model will be
incomplete, and the demand for data will also in-
crease, making most data-driven methods perform
poorly under low-resource conditions.
In previous work, to overcome these limitations,
Wang et al. (2018,2019) used the equation nor-
malization method, which normalizes the output
sequence by restricting the order of operands and
has only a limited effect. Zhang et al. (2020a) pro-
posed to use multiple decoders to learn different ex-
pression sequences simultaneously. However, the
large and varying number of sequences for MWPS
makes the strategy less adaptable. For the mod-
els that learn the binary-tree output (Xie and Sun,
2019;Wu et al.,2020;Zhang et al.,2020b), they
generally use a tree decoder to perform top-down
and left-to-right generation that only generate one
binary tree at a time, which dose not propose a
solution to these limitations.
To address the challenge that the output diversity
in MWP increases the difficulty of model learning,
we analyzed the causes for the diversity, which can
be summarized as the following:
Uncertainty of computation order of the math-
ematical operations: This is caused by 1) giv-
ing the same priority to the same or different
mathematical operations. For example, in the
expression
n1+n2+n3n4
, three operations
have the same priority. Consequently, the calcu-
lations in any order can obtain the correct answer,
which leads to many equivalent expressions and
binary trees. And 2) brackets can also lead to
many equivalent outputs with different forms.
For example,
n1+n2n3
,
n1(n3n2)
and
(n1+n2)n3
are equivalent expressions and
can be represented as different binary trees.
The uncertainty caused by the exchange of
operands or sub-expressions: Among the four
basic mathematical operations {
+,,×,÷
}, ad-
dition “
+
” and multiplication “
×
” have the prop-
erty that the operands or sub-expressions of both
sides are allowed to be swapped. For example,
the expression
n1+n2×n3
can be transformed
to get: n1+n3×n2,n2×n3+n1, etc.
In this paper, to account for the aforementioned
challenge, we propose SUMC-Solver for solving
math word problems. The following describes the
main contents of our work:
We designed the M-tree to unify the diverse out-
put. Existing work (Xie and Sun,2019;Wu et al.,
2020,2021b) has demonstrated through extensive
experiments that taking advantage of the tree struc-
ture information of MWP expressions can achieve
better performance. We retain the use of a tree
structure but further develop on top of the binary
tree with an M-tree which contains any M branches.
The ability of the M-tree to unify output structures
is reflected in both horizontal and vertical direc-
tions:
To deal with the uncertainty of computation or-
ders for mathematical operations, we set the root
to a specific operation and allow any number
of branches for internal nodes in the M-tree, re-
ducing the diversity of the tree structure in the
vertical direction.
To deal with the uncertainty caused by the ex-
change between the left and right sibling nodes
in original binary trees, we redefine the opera-
tions in the M-tree to make sure that the exchange
between any sibling nodes will not affect the cal-
culation process and treat M-trees that differ only
in the left-to-right order of their sibling nodes
as the same. Like the M-tree example shown in
Fig. 1. The exchange between node “
5
”, “
×
”,
and “
4
” will neither affect the calculation process
nor form a new tree. With this method, the struc-
tural diversity in the horizontal direction is also
reduced.
We designed the M-tree codes and a seq2code
framework for the M-tree learning. We abandoned
the top-down and left-to-right autoregressive gen-
eration used for binary trees in previous methods.
The reason is that the generation can not avoid
the diversity caused by the generation order of sib-
ling nodes. Instead, we encode the M-tree into
M-tree codes that can be restored to the original
M-tree, where the codes store the information of
the paths from the root to leaf nodes and leaf nodes
themselves. And inspired by the sequence labeling
methods used in studies mentioned in 2.2, we inno-
vatively use a seq2code framework to generate the
M-tree codes in a non-autoregressive way, which
takes the problem text as the input sequence and
outputs the M-tree codes of the numbers (numer-
ical values) in the math word problem. Then we
restore the codes to a M-tree that can represent the
calculation logic between the numbers and finally
calculate the answer.
Our contributions can be summarized as follows:
We analyze the causes of output diversity in
MWP and design a novel M-tree-based solu-
tion to unify the output.
We design the M-tree codes to represent the
M-tree and propose a seq2code model to gen-
erate the codes in a non-autoregressive fash-
ion. To the best of our knowledge, this is
the first work to analyze mathematical expres-
sions with M-tree codes and seq2code.
Experimental results on MAWPS (Koncel-
Kedziorski et al.,2016) and Math23K datasets
(Wang et al.,2017) show that SUMC-Solver
outperforms previous methods with similar
settings. This is especially the case in low-
resource scenarios, where our solver achieves
superior performance.
2 Related Work
2.1 Math Word Problem Solver
With the success of deep learning (DL) in various
NLP tasks, designing a DL-Based MWP solver has
become a major research focus lately. Wang et al.
(2017) first addresses the MWP with a seq2seq
model, which implements the solver as a generative
model from problem text sequence to expression
sequence. By utilizing the semantic meanings of
operation symbols, Chiang and Chen (2019) ap-
ply a stack to help generate expressions. To bet-
ter utilize expression structure information, other
methods (Liu et al.,2019a;Xie and Sun,2019;Li
et al.,2020) transform expressions into binary-tree-
structured representations and learn the tree out-
put. Zhang et al. (2020b) additionally introduces
a graph-based encoder to enrich the representation
of problems.
There are also approaches that explore the use of
more extensive networks and external knowledge to
help solve the MWP. Li et al. (2019) builds several
special attention mechanisms to extract the critical
information of the input sequence, and Zhang et al.
(2020a) propose using teacher-student networks
that combine two solvers to solve the MWP. Wu
et al. (2020) utilizes external knowledge through
the use of an entity graph extracted from the prob-
lem sequence and Lin et al. (2021) proposes a hier-
archical encoder with a dependency parsing mod-
ule and hierarchical attention mechanism to make
better use of the input information. Following the
previous work, Wu et al. (2021b) continues to use
external knowledge to enrich the problem represen-
tations and further explicitly incorporate numerical
value information encoded by an external network
into solving math word problems. Based on the
graph2tree framework, Wu et al. (2021a) uses exter-
nal knowledge to further enrich the input graph in-
formation. Yu et al. (2021) uses both a pre-trained
knowledge encoder and a hierarchical reasoning
encoder to encode the input problems. Qin et al.
(2021) constructed different auxiliary tasks using
supervised or self-supervised methods and external
knowledge (common sense and problem’s part-of-
speech) to help the model learn the solution of
MWPs.
Different from the above methods that mainly
focused on the input side and directly generated
expression sequences or binary trees, we designed
the structure-unified M-tree and the M-tree codes
on the output side. Also, we design a simple model
to test the advances of the M-tree and M-tree codes
by comparing with methods under similar experi-
mental settings, which means that methods using
additional knowledge or multiple encoders will not
become our baselines.
摘要:

Structure-UniedM-TreeCodingSolverforMathWordProblemBinWang,JiangzhouJu,YangFan,XinyuDai,ShujianHuang,JiajunChenNationalKeyLaboratoryforNovelSoftwareTechnology,NanjingUniversityCollaborativeInnovationCenterofNovelSoftwareTechnologyandIndustrialization,Nanjing{wangbin,jujiangzhou,fanyang}@smail.nju....

展开>> 收起<<
Structure-Unified M-Tree Coding Solver for Math Word Problem Bin Wang Jiangzhou Ju Yang Fan Xinyu Dai Shujian Huang Jiajun Chen National Key Laboratory for Novel Software Technology Nanjing University.pdf

共12页,预览3页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:12 页 大小:917.8KB 格式:PDF 时间:2025-05-02

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 12
客服
关注