Structure-Uniﬁed M-Tree Coding Solver for Math Word Problem Bin Wang Jiangzhou Ju Yang Fan Xinyu Dai Shujian Huang Jiajun Chen National Key Laboratory for Novel Software Technology Nanjing University

2025-05-02 1 0 917.8KB 12 页 10玖币

侵权投诉

Structure-Uniﬁed M-Tree Coding Solver for Math Word Problem

Bin Wang, Jiangzhou Ju, Yang Fan, Xinyu Dai∗

, Shujian Huang, Jiajun Chen

National Key Laboratory for Novel Software Technology, Nanjing University

Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing

{wangbin, jujiangzhou, fanyang}@smail.nju.edu.cn

{daixinyu, huangsj, chenjj}@nju.edu.cn

Abstract

As one of the challenging NLP tasks, design-

ing math word problem (MWP) solvers has

attracted increasing research attention for the

past few years. In previous work, models de-

signed by taking into account the properties

of the binary tree structure of mathematical

expressions at the output side have achieved

better performance. However, the expressions

corresponding to a MWP are often diverse

(e.g., n1+n2×n3−n4,n3×n2−n4+n1,

etc.), and so are the corresponding binary trees,

which creates difﬁculties in model learning

due to the non-deterministic output space. In

this paper, we propose the Structure-Uniﬁed

M-Tree Coding Solver (SUMC-Solver), which

applies a tree with any M branches (M-tree) to

unify the output structures. To learn the M-

tree, we use a mapping to convert the M-tree

into the M-tree codes, where codes store the

information of the paths from tree root to leaf

nodes and the information of leaf nodes them-

selves, and then devise a Sequence-to-Code

(seq2code) model to generate the codes. Ex-

perimental results on the widely used MAWPS

and Math23K datasets have demonstrated that

SUMC-Solver not only outperforms several

state-of-the-art models under similar experi-

mental settings but also performs much better

under low-resource conditions1.

1 Introduction

Given the description text of a MWP, an automatic

solver needs to output an expression for solving

the unknown variable asked in the problem that

consists of mathematical operands (numerical val-

ues) and operation symbols (

+,−,×,÷

), as shown

in Fig. 1. It requires that the solver not only un-

derstand the natural-language problem but also be

able to model the relationships between the numer-

ical values to perform arithmetic reasoning. These

*Corresponding author

Code and data are available at

https://github.com/

devWangBin/SUMC-Solver

Problem: Mike bought a new book. He read 3 pages every day

on the first 2 days. Then, on the third day, he read 4 pages in the

morning and 5 pages in the afternoon. How many pages has

Mike read so far?

Solution expressions: 2 × 3 + 4 + 5

3 × 2 + (5 + 4)

5 + 3 × 2 + 4

……

Post expression sequences: 2 3 × 4 + 5 +

32×54++

5 3 2 × + 4 +

……

Expression binary trees:

……

M-Tree of expressions:

Answer: 15

Figure 1: An example of math word problems, which

has multiple solution expressions and binary trees, but

only one M-tree output.

challenges mean MWP solvers are often broadly

considered good test beds for evaluating the intelli-

gence level of agents (Lin et al.,2021).

In recent years, the research on designing auto-

matic MWP solvers has also made great progress

due to the success of neural network models on

NLP. Wang et al. (2017) ﬁrst apply a Sequence-to-

Sequence (seq2seq) model to solve the MWP, and

since then more methods (Wang et al.,2018;Chi-

ang and Chen,2019;Wang et al.,2019) based on

seq2seq models have been proposed for further im-

provement. To use the structural information from

expressions more effectively, other methods (Liu

arXiv:2210.12432v2 [cs.CL] 25 Oct 2022

et al.,2019a;Xie and Sun,2019) use Sequence-

to-Tree (seq2tree) models with a well-designed

tree-structured decoder to generate the pre-order

sequence of a binary tree in a top-down manner

and have achieved better performance. Zhang et al.

(2020b) combines the merits of the graph-based en-

coder and tree-based decoder (graph2tree) to gen-

erate better solution expressions.

Promising results have been achieved in solving

MWP, but the existing methods learn to output only

one expression sequence or binary tree, ignoring

that there is likely to be far more than one that

can obtain the correct answer. For example, in Fig.

1, to answer the question “How many pages has

Mike read so far?”, we can add up the number of

pages that Mike read each day to get the expression

“

2×3+4+5

”; we can also calculate the ﬁrst two

days followed by the sum of pages read on the third

day to get the expression “

3×2 + (5 + 4)

”; or we

could even calculate the problem in a less obvious

way with the expression “

5 + 3 ×2 + 4

”. The

number of different expression sequences or binary

trees can grow very large with different combina-

tions, which results in a large non-deterministic

output space and creates difﬁculties in model learn-

ing. Speciﬁcally, when a problem has multiple

correct outputs, but the solver only obtains one of

them, the knowledge learned by the model will be

incomplete, and the demand for data will also in-

crease, making most data-driven methods perform

poorly under low-resource conditions.

In previous work, to overcome these limitations,

Wang et al. (2018,2019) used the equation nor-

malization method, which normalizes the output

sequence by restricting the order of operands and

has only a limited effect. Zhang et al. (2020a) pro-

posed to use multiple decoders to learn different ex-

pression sequences simultaneously. However, the

large and varying number of sequences for MWPS

makes the strategy less adaptable. For the mod-

els that learn the binary-tree output (Xie and Sun,

2019;Wu et al.,2020;Zhang et al.,2020b), they

generally use a tree decoder to perform top-down

and left-to-right generation that only generate one

binary tree at a time, which dose not propose a

solution to these limitations.

To address the challenge that the output diversity

in MWP increases the difﬁculty of model learning,

we analyzed the causes for the diversity, which can

be summarized as the following:

•

Uncertainty of computation order of the math-

ematical operations: This is caused by 1) giv-

ing the same priority to the same or different

mathematical operations. For example, in the

expression

n1+n2+n3−n4

, three operations

have the same priority. Consequently, the calcu-

lations in any order can obtain the correct answer,

which leads to many equivalent expressions and

binary trees. And 2) brackets can also lead to

many equivalent outputs with different forms.

For example,

n1+n2−n3

n1−(n3−n2)

and

(n1+n2)−n3

are equivalent expressions and

can be represented as different binary trees.

•

The uncertainty caused by the exchange of

operands or sub-expressions: Among the four

basic mathematical operations {

+,−,×,÷

}, ad-

dition “

” and multiplication “

” have the prop-

erty that the operands or sub-expressions of both

sides are allowed to be swapped. For example,

the expression

n1+n2×n3

can be transformed

to get: n1+n3×n2,n2×n3+n1, etc.

In this paper, to account for the aforementioned

challenge, we propose SUMC-Solver for solving

math word problems. The following describes the

main contents of our work:

We designed the M-tree to unify the diverse out-

put. Existing work (Xie and Sun,2019;Wu et al.,

2020,2021b) has demonstrated through extensive

experiments that taking advantage of the tree struc-

ture information of MWP expressions can achieve

better performance. We retain the use of a tree

structure but further develop on top of the binary

tree with an M-tree which contains any M branches.

The ability of the M-tree to unify output structures

is reﬂected in both horizontal and vertical direc-

tions:

•

To deal with the uncertainty of computation or-

ders for mathematical operations, we set the root

to a speciﬁc operation and allow any number

of branches for internal nodes in the M-tree, re-

ducing the diversity of the tree structure in the

vertical direction.

•

To deal with the uncertainty caused by the ex-

change between the left and right sibling nodes

in original binary trees, we redeﬁne the opera-

tions in the M-tree to make sure that the exchange

between any sibling nodes will not affect the cal-

culation process and treat M-trees that differ only

in the left-to-right order of their sibling nodes

as the same. Like the M-tree example shown in

Fig. 1. The exchange between node “

”, “

”,

and “

” will neither affect the calculation process

nor form a new tree. With this method, the struc-

tural diversity in the horizontal direction is also

reduced.

We designed the M-tree codes and a seq2code

framework for the M-tree learning. We abandoned

the top-down and left-to-right autoregressive gen-

eration used for binary trees in previous methods.

The reason is that the generation can not avoid

the diversity caused by the generation order of sib-

ling nodes. Instead, we encode the M-tree into

M-tree codes that can be restored to the original

M-tree, where the codes store the information of

the paths from the root to leaf nodes and leaf nodes

themselves. And inspired by the sequence labeling

methods used in studies mentioned in 2.2, we inno-

vatively use a seq2code framework to generate the

M-tree codes in a non-autoregressive way, which

takes the problem text as the input sequence and

outputs the M-tree codes of the numbers (numer-

ical values) in the math word problem. Then we

restore the codes to a M-tree that can represent the

calculation logic between the numbers and ﬁnally

calculate the answer.

Our contributions can be summarized as follows:

•

We analyze the causes of output diversity in

MWP and design a novel M-tree-based solu-

tion to unify the output.

•

We design the M-tree codes to represent the

M-tree and propose a seq2code model to gen-

erate the codes in a non-autoregressive fash-

ion. To the best of our knowledge, this is

the ﬁrst work to analyze mathematical expres-

sions with M-tree codes and seq2code.

•

Experimental results on MAWPS (Koncel-

Kedziorski et al.,2016) and Math23K datasets

(Wang et al.,2017) show that SUMC-Solver

outperforms previous methods with similar

settings. This is especially the case in low-

resource scenarios, where our solver achieves

superior performance.

2 Related Work

2.1 Math Word Problem Solver

With the success of deep learning (DL) in various

NLP tasks, designing a DL-Based MWP solver has

become a major research focus lately. Wang et al.

(2017) ﬁrst addresses the MWP with a seq2seq

model, which implements the solver as a generative

model from problem text sequence to expression

sequence. By utilizing the semantic meanings of

operation symbols, Chiang and Chen (2019) ap-

ply a stack to help generate expressions. To bet-

ter utilize expression structure information, other

methods (Liu et al.,2019a;Xie and Sun,2019;Li

et al.,2020) transform expressions into binary-tree-

structured representations and learn the tree out-

put. Zhang et al. (2020b) additionally introduces

a graph-based encoder to enrich the representation

of problems.

There are also approaches that explore the use of

more extensive networks and external knowledge to

help solve the MWP. Li et al. (2019) builds several

special attention mechanisms to extract the critical

information of the input sequence, and Zhang et al.

(2020a) propose using teacher-student networks

that combine two solvers to solve the MWP. Wu

et al. (2020) utilizes external knowledge through

the use of an entity graph extracted from the prob-

lem sequence and Lin et al. (2021) proposes a hier-

archical encoder with a dependency parsing mod-

ule and hierarchical attention mechanism to make

better use of the input information. Following the

previous work, Wu et al. (2021b) continues to use

external knowledge to enrich the problem represen-

tations and further explicitly incorporate numerical

value information encoded by an external network

into solving math word problems. Based on the

graph2tree framework, Wu et al. (2021a) uses exter-

nal knowledge to further enrich the input graph in-

formation. Yu et al. (2021) uses both a pre-trained

knowledge encoder and a hierarchical reasoning

encoder to encode the input problems. Qin et al.

(2021) constructed different auxiliary tasks using

supervised or self-supervised methods and external

knowledge (common sense and problem’s part-of-

speech) to help the model learn the solution of

MWPs.

Different from the above methods that mainly

focused on the input side and directly generated

expression sequences or binary trees, we designed

the structure-uniﬁed M-tree and the M-tree codes

on the output side. Also, we design a simple model

to test the advances of the M-tree and M-tree codes

by comparing with methods under similar experi-

mental settings, which means that methods using

additional knowledge or multiple encoders will not

become our baselines.

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

Structure-UniedM-TreeCodingSolverforMathWordProblemBinWang,JiangzhouJu,YangFan,XinyuDai,ShujianHuang,JiajunChenNationalKeyLaboratoryforNovelSoftwareTechnology,NanjingUniversityCollaborativeInnovationCenterofNovelSoftwareTechnologyandIndustrialization,Nanjing{wangbin,jujiangzhou,fanyang}@smail.nju....

展开>> 收起<<

Structure-Uniﬁed M-Tree Coding Solver for Math Word Problem Bin Wang Jiangzhou Ju Yang Fan Xinyu Dai Shujian Huang Jiajun Chen National Key Laboratory for Novel Software Technology Nanjing University.pdf

共12页,预览3页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Structure-Uniﬁed M-Tree Coding Solver for Math Word Problem Bin Wang Jiangzhou Ju Yang Fan Xinyu Dai Shujian Huang Jiajun Chen National Key Laboratory for Novel Software Technology Nanjing University

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: