et al.,2019a;Xie and Sun,2019) use Sequence-
to-Tree (seq2tree) models with a well-designed
tree-structured decoder to generate the pre-order
sequence of a binary tree in a top-down manner
and have achieved better performance. Zhang et al.
(2020b) combines the merits of the graph-based en-
coder and tree-based decoder (graph2tree) to gen-
erate better solution expressions.
Promising results have been achieved in solving
MWP, but the existing methods learn to output only
one expression sequence or binary tree, ignoring
that there is likely to be far more than one that
can obtain the correct answer. For example, in Fig.
1, to answer the question “How many pages has
Mike read so far?”, we can add up the number of
pages that Mike read each day to get the expression
“
2×3+4+5
”; we can also calculate the first two
days followed by the sum of pages read on the third
day to get the expression “
3×2 + (5 + 4)
”; or we
could even calculate the problem in a less obvious
way with the expression “
5 + 3 ×2 + 4
”. The
number of different expression sequences or binary
trees can grow very large with different combina-
tions, which results in a large non-deterministic
output space and creates difficulties in model learn-
ing. Specifically, when a problem has multiple
correct outputs, but the solver only obtains one of
them, the knowledge learned by the model will be
incomplete, and the demand for data will also in-
crease, making most data-driven methods perform
poorly under low-resource conditions.
In previous work, to overcome these limitations,
Wang et al. (2018,2019) used the equation nor-
malization method, which normalizes the output
sequence by restricting the order of operands and
has only a limited effect. Zhang et al. (2020a) pro-
posed to use multiple decoders to learn different ex-
pression sequences simultaneously. However, the
large and varying number of sequences for MWPS
makes the strategy less adaptable. For the mod-
els that learn the binary-tree output (Xie and Sun,
2019;Wu et al.,2020;Zhang et al.,2020b), they
generally use a tree decoder to perform top-down
and left-to-right generation that only generate one
binary tree at a time, which dose not propose a
solution to these limitations.
To address the challenge that the output diversity
in MWP increases the difficulty of model learning,
we analyzed the causes for the diversity, which can
be summarized as the following:
•
Uncertainty of computation order of the math-
ematical operations: This is caused by 1) giv-
ing the same priority to the same or different
mathematical operations. For example, in the
expression
n1+n2+n3−n4
, three operations
have the same priority. Consequently, the calcu-
lations in any order can obtain the correct answer,
which leads to many equivalent expressions and
binary trees. And 2) brackets can also lead to
many equivalent outputs with different forms.
For example,
n1+n2−n3
,
n1−(n3−n2)
and
(n1+n2)−n3
are equivalent expressions and
can be represented as different binary trees.
•
The uncertainty caused by the exchange of
operands or sub-expressions: Among the four
basic mathematical operations {
+,−,×,÷
}, ad-
dition “
+
” and multiplication “
×
” have the prop-
erty that the operands or sub-expressions of both
sides are allowed to be swapped. For example,
the expression
n1+n2×n3
can be transformed
to get: n1+n3×n2,n2×n3+n1, etc.
In this paper, to account for the aforementioned
challenge, we propose SUMC-Solver for solving
math word problems. The following describes the
main contents of our work:
We designed the M-tree to unify the diverse out-
put. Existing work (Xie and Sun,2019;Wu et al.,
2020,2021b) has demonstrated through extensive
experiments that taking advantage of the tree struc-
ture information of MWP expressions can achieve
better performance. We retain the use of a tree
structure but further develop on top of the binary
tree with an M-tree which contains any M branches.
The ability of the M-tree to unify output structures
is reflected in both horizontal and vertical direc-
tions:
•
To deal with the uncertainty of computation or-
ders for mathematical operations, we set the root
to a specific operation and allow any number
of branches for internal nodes in the M-tree, re-
ducing the diversity of the tree structure in the
vertical direction.
•
To deal with the uncertainty caused by the ex-
change between the left and right sibling nodes
in original binary trees, we redefine the opera-
tions in the M-tree to make sure that the exchange
between any sibling nodes will not affect the cal-
culation process and treat M-trees that differ only
in the left-to-right order of their sibling nodes
as the same. Like the M-tree example shown in